I am asking for advice on diagnosing a Dell R710 running Xen on Debian Stretch that panics about once a day.
The server runs fine for ~24 hours and then hits a kernel panic, usually mentioning bnx2 (the full panic output is posted below). The system stops responding and usually needs a reboot. After the reboot the system is fine, but it eventually panics again.
I have an identical server that does not crash when it has one or two virtual machines, but does crash when all the virtual machines run on it. I also have another R710 with an H700 that runs ~10 virtual machines without problems.
I am also having trouble reproducing the panic. At one point I was able to crash the second server reliably by maxing out the CPU and generating heavy IO (sha1sum /dev/zero and dd).
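For anyone who wants to try reproducing this, the combined CPU and IO load can be sketched roughly as below. This is only a sketch under assumptions: the post does not give the exact dd arguments, so the write size, the scratch file and the helper name `stress_load` are hypothetical, and `timeout` is used only so the run ends on its own.

```shell
#!/bin/sh
# Hypothetical helper: CPU load from sha1sum plus disk writes from dd.
# Usage: stress_load WORKERS SECONDS SCRATCH_FILE SIZE_MB
stress_load() {
    workers=$1; seconds=$2; scratch=$3; size_mb=$4
    pids=""
    dd_status=0
    i=0
    while [ "$i" -lt "$workers" ]; do
        # sha1sum over /dev/zero never finishes by itself, so bound it with timeout
        timeout "$seconds" sha1sum /dev/zero >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Sequential write, flushed to disk at the end (assumed dd usage)
    dd if=/dev/zero of="$scratch" bs=1M count="$size_mb" conv=fsync 2>/dev/null || dd_status=$?
    for p in $pids; do
        wait "$p" || true   # timeout reports a non-zero status when it kills sha1sum
    done
    rm -f "$scratch"
    return "$dd_status"
}
```

For example, `stress_load 8 600 /var/tmp/stress.img 4096` would keep 8 hashing processes busy for 10 minutes while writing 4 GiB; the numbers are placeholders to adjust to the machine.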
The Dell R710 specs are as follows:
- 2x 4-core CPU
- 72GB RAM
- SAS 6/iR Integrated firmware version: 00.25.47.00.06.22.03.00
- SAS6 passthrough to 1x 1TB disk with the OS, 2x 2TB disks in an mdadm array (03:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08))
- Broadcom NetXtreme II NIC (01:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM5709 Gigabit Ethernet (rev 20))
- all BIOS and firmware updated with the Lifecycle Controller.
It is running Debian Stretch with the following details:
- Debian GNU/Linux 9 \n \l
- Linux xen01 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
- running xen-hypervisor-4.8-amd64 4.8.1-1+deb9u1 with 22 virtual machines.
- ii firmware-bnx2 20161130-3 all Binary firmware for Broadcom NetXtremeII
- mptsas version 3.04.20
- mpt3sas version 13.100.00.00
- ii bridge-utils 1.5-13 amd64 Utilities for configuring the Linux Ethernet bridge
So far I have tried the following (alone and in various combinations):
- disabled MSI-X for bnx2: modprobe bnx2 disable_msi=1
- disabled MSI-X for mpt3sas: modprobe mpt3sas msix_disable=1
- added intremap=off to the kernel options and turned off Intel virtualization in the BIOS to avoid the Intel 55x0 chipset errata. https://support.citrix.com/article/CTX136517
- lowered vm.dirty_background_ratio=5 and vm.dirty_ratio=10 to work around the Linux server hang with "INFO: task blocked for more than 120 seconds"
- set the NIC rx value higher and raised net.core.netdev_max_backlog=30000 as in https://unix.stackexchange.com/questions/37727/solving-ethernet-watchdog-timer-deadlocks
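To keep the module options and sysctl values above across reboots, they could be written to configuration files along these lines. This is a sketch, not what was actually done on these servers: the file names and the `write_tuning` helper are hypothetical, and the optional root argument exists only so the files can be generated somewhere harmless first. The NIC rx change is not included because the post does not give the value used.

```shell
#!/bin/sh
# Hypothetical helper: write the tunables from the list above to config files.
# Usage: write_tuning [ROOT]   (ROOT defaults to the real filesystem root)
write_tuning() {
    root=${1:-}
    mkdir -p "$root/etc/modprobe.d" "$root/etc/sysctl.d"

    # Module options, read the next time each module is loaded
    cat > "$root/etc/modprobe.d/r710-debug.conf" <<'EOF'
options bnx2 disable_msi=1
options mpt3sas msix_disable=1
EOF

    # Writeback and network backlog settings from the list above
    cat > "$root/etc/sysctl.d/90-r710-debug.conf" <<'EOF'
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
net.core.netdev_max_backlog = 30000
EOF
}
```

Running `write_tuning /tmp/preview` shows the generated files without touching /etc. On the real system the sysctl file can be applied with `sysctl --system`, while the module options only take effect once the modules are reloaded or the machine is rebooted.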
One example of the panic:
Aug 18 14:45:16 xen02 kernel: [54277.859415] ------------[ cut here ]------------
Aug 18 14:45:16 xen02 kernel: [54277.859451] WARNING: CPU: 0 PID: 0 at /build/linux-me40Ry/linux-4.9.30/net/sched/sch_generic.c:316 dev_watchdog+0x22d/0x230
Aug 18 14:45:16 xen02 kernel: [54277.859456] NETDEV WATCHDOG: eno2 (bnx2): transmit queue 5 timed out
Aug 18 14:45:16 xen02 kernel: [54277.859457] Modules linked in: ipmi_si xt_tcpudp xt_physdev br_netfilter iptable_filter xen_netback xen_blkback mpt3sas raid_class mptctl bridge stp llc dell_rbu xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_devintf iTCO_wdt iTCO_vendor_support evdev joydev mgag200 ttm drm_kms_helper intel_powerclamp coretemp drm i2c_algo_bit serio_raw dcdbas sg pcspkr acpi_power_meter ipmi_msghandler wmi button shpchp i7core_edac lpc_ich mfd_core edac_core ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod uas usb_storage sr_mod cdrom ata_generic hid_generic usbhid hid crc32c_intel psmouse
Aug 18 14:45:16 xen02 kernel: [54277.859517] ehci_pci uhci_hcd mptsas ehci_hcd ata_piix scsi_transport_sas mptscsih libata mptbase usbcore usb_common scsi_mod bnx2 [last unloaded: ipmi_si]
Aug 18 14:45:16 xen02 kernel: [54277.859533] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
Aug 18 14:45:16 xen02 kernel: [54277.859534] Hardware name: Dell Inc. PowerEdge R710/0YDJK3, BIOS 6.4.0 07/23/2013
Aug 18 14:45:16 xen02 kernel: [54277.859537] 0000000000000000 ffffffff81328574 ffff8811f5a03e20 0000000000000000
Aug 18 14:45:16 xen02 kernel: [54277.859540] ffffffff81076ebe 0000000000000005 ffff8811f5a03e78 ffff8811dee04000
Aug 18 14:45:16 xen02 kernel: [54277.859542] 0000000000000000 ffff8811e4b9c940 0000000000000008 ffffffff81076f3f
Aug 18 14:45:16 xen02 kernel: [54277.859544] Call Trace:
Aug 18 14:45:16 xen02 kernel: [54277.859547] <IRQ>
Aug 18 14:45:16 xen02 kernel: [54277.859553] [<ffffffff81328574>] ? dump_stack+0x5c/0x78
Aug 18 14:45:16 xen02 kernel: [54277.859558] [<ffffffff81076ebe>] ? __warn+0xbe/0xe0
Aug 18 14:45:16 xen02 kernel: [54277.859560] [<ffffffff81076f3f>] ? warn_slowpath_fmt+0x5f/0x80
Aug 18 14:45:16 xen02 kernel: [54277.859563] [<ffffffff8152a98d>] ? dev_watchdog+0x22d/0x230
Aug 18 14:45:16 xen02 kernel: [54277.859564] [<ffffffff8152a760>] ? qdisc_rcu_free+0x40/0x40
Aug 18 14:45:16 xen02 kernel: [54277.859570] [<ffffffff810e3e90>] ? call_timer_fn+0x30/0x110
Aug 18 14:45:16 xen02 kernel: [54277.859571] [<ffffffff810e43ce>] ? run_timer_softirq+0x1ce/0x420
Aug 18 14:45:16 xen02 kernel: [54277.859575] [<ffffffff810d0f91>] ? handle_irq_event_percpu+0x51/0x70
Aug 18 14:45:16 xen02 kernel: [54277.859576] [<ffffffff810d4dc7>] ? handle_percpu_irq+0x37/0x50
Aug 18 14:45:16 xen02 kernel: [54277.859581] [<ffffffff81608d95>] ? __do_softirq+0x105/0x290
Aug 18 14:45:16 xen02 kernel: [54277.859583] [<ffffffff8107cf6e>] ? irq_exit+0xae/0xb0
Aug 18 14:45:16 xen02 kernel: [54277.859587] [<ffffffff814052e1>] ? xen_evtchn_do_upcall+0x31/0x40
Aug 18 14:45:16 xen02 kernel: [54277.859588] [<ffffffff8160724e>] ? xen_do_hypervisor_callback+0x1e/0x40
Aug 18 14:45:16 xen02 kernel: [54277.859589] <EOI>
Aug 18 14:45:16 xen02 kernel: [54277.859592] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Aug 18 14:45:16 xen02 kernel: [54277.859594] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Aug 18 14:45:16 xen02 kernel: [54277.859597] [<ffffffff8101b30c>] ? xen_safe_halt+0xc/0x20
Aug 18 14:45:16 xen02 kernel: [54277.859600] [<ffffffff8160584a>] ? default_idle+0x1a/0xd0
Aug 18 14:45:16 xen02 kernel: [54277.859603] [<ffffffff810b957a>] ? cpu_startup_entry+0x1ca/0x240
Aug 18 14:45:16 xen02 kernel: [54277.859608] [<ffffffff81d38f57>] ? start_kernel+0x443/0x463
Aug 18 14:45:16 xen02 kernel: [54277.859611] [<ffffffff81d3e098>] ? xen_start_kernel+0x526/0x530
Aug 18 14:45:16 xen02 kernel: [54277.859613] ---[ end trace 213eed970c44d2fa ]---
Aug 18 14:45:16 xen02 kernel: [54277.859619] bnx2 0000:01:00.1 eno2: <--- start FTQ dump --->
Aug 18 14:45:16 xen02 kernel: [54277.859658] bnx2 0000:01:00.1 eno2: RV2P_PFTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859682] bnx2 0000:01:00.1 eno2: RV2P_TFTQ_CTL 00020000
Aug 18 14:45:16 xen02 kernel: [54277.859707] bnx2 0000:01:00.1 eno2: RV2P_MFTQ_CTL 00004000
Aug 18 14:45:16 xen02 kernel: [54277.859730] bnx2 0000:01:00.1 eno2: TBDR_FTQ_CTL 00004000
Aug 18 14:45:16 xen02 kernel: [54277.859753] bnx2 0000:01:00.1 eno2: TDMA_FTQ_CTL 00010002
Aug 18 14:45:16 xen02 kernel: [54277.859776] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859799] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859822] bnx2 0000:01:00.1 eno2: TPAT_FTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859845] bnx2 0000:01:00.1 eno2: RXP_CFTQ_CTL 00008000
Aug 18 14:45:16 xen02 kernel: [54277.859868] bnx2 0000:01:00.1 eno2: RXP_FTQ_CTL 00100000
Aug 18 14:45:16 xen02 kernel: [54277.859891] bnx2 0000:01:00.1 eno2: COM_COMXQ_FTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859916] bnx2 0000:01:00.1 eno2: COM_COMTQ_FTQ_CTL 00020000
Aug 18 14:45:16 xen02 kernel: [54277.859941] bnx2 0000:01:00.1 eno2: COM_COMQ_FTQ_CTL 00010000
Aug 18 14:45:16 xen02 kernel: [54277.859965] bnx2 0000:01:00.1 eno2: CP_CPQ_FTQ_CTL 00004000
Aug 18 14:45:16 xen02 kernel: [54277.859988] bnx2 0000:01:00.1 eno2: CPU states:
Aug 18 14:45:16 xen02 kernel: [54277.860017] bnx2 0000:01:00.1 eno2: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc
Aug 18 14:45:16 xen02 kernel: [54277.860063] bnx2 0000:01:00.1 eno2: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a4c pc 8000a5c instr 1440fffc
Aug 18 14:45:16 xen02 kernel: [54277.860108] bnx2 0000:01:00.1 eno2: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c10 pc 8004c14 instr 32050003
Aug 18 14:45:16 xen02 kernel: [54277.860154] bnx2 0000:01:00.1 eno2: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000aa4 instr 3c020800
Aug 18 14:45:16 xen02 kernel: [54277.860199] bnx2 0000:01:00.1 eno2: 145000 mode b880 state 80000000 evt_mask 500 pc 800ae38 pc 800ae40 instr 24130001
Aug 18 14:45:16 xen02 kernel: [54277.860245] bnx2 0000:01:00.1 eno2: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000c6c pc 8000c6c instr 1180000b
Aug 18 14:45:16 xen02 kernel: [54277.860285] bnx2 0000:01:00.1 eno2: <--- end FTQ dump --->
Aug 18 14:45:16 xen02 kernel: [54277.860308] bnx2 0000:01:00.1 eno2: <--- start TBDC dump --->
Aug 18 14:45:16 xen02 kernel: [54277.860332] bnx2 0000:01:00.1 eno2: TBDC free cnt: 32
Aug 18 14:45:16 xen02 kernel: [54277.860353] bnx2 0000:01:00.1 eno2: LINE CID BIDX CMD VALIDS
Aug 18 14:45:16 xen02 kernel: [54277.860382] bnx2 0000:01:00.1 eno2: 00 001100 d618 00 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860411] bnx2 0000:01:00.1 eno2: 01 001300 61b8 00 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860440] bnx2 0000:01:00.1 eno2: 02 001280 63c8 00 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860469] bnx2 0000:01:00.1 eno2: 03 000800 79c8 00 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860498] bnx2 0000:01:00.1 eno2: 04 000800 40f8 00 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860526] bnx2 0000:01:00.1 eno2: 05 16fd80 9ef8 bf [0]
Aug 18 14:45:16 xen02 kernel: [54277.860555] bnx2 0000:01:00.1 eno2: 06 1b5f80 f7c8 7f [0]
Aug 18 14:45:16 xen02 kernel: [54277.860584] bnx2 0000:01:00.1 eno2: 07 1bef80 fbd8 7f [0]
Aug 18 14:45:16 xen02 kernel: [54277.860612] bnx2 0000:01:00.1 eno2: 08 1bcd80 f5f8 7c [0]
Aug 18 14:45:16 xen02 kernel: [54277.860641] bnx2 0000:01:00.1 eno2: 09 1fff80 f9f8 96 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860669] bnx2 0000:01:00.1 eno2: 0a 077f00 e7b8 7f [0]
Aug 18 14:45:16 xen02 kernel: [54277.860698] bnx2 0000:01:00.1 eno2: 0b 1dff80 f9f8 e7 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860727] bnx2 0000:01:00.1 eno2: 0c 1f9c00 7a78 f0 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860756] bnx2 0000:01:00.1 eno2: 0d 0ff680 fdf8 ff [0]
Aug 18 14:45:16 xen02 kernel: [54277.860784] bnx2 0000:01:00.1 eno2: 0e 067980 ffe8 f7 [0]
Aug 18 14:45:16 xen02 kernel: [54277.860813] bnx2 0000:01:00.1 eno2: 0f 0ef300 fb78 7e [0]
Aug 18 14:45:16 xen02 kernel: [54277.860842] bnx2 0000:01:00.1 eno2: 10 1be600 dff8 df [0]
Aug 18 14:45:16 xen02 kernel: [54277.860870] bnx2 0000:01:00.1 eno2: 11 1fff80 faf8 bf [0]
Aug 18 14:45:16 xen02 kernel: [54277.860899] bnx2 0000:01:00.1 eno2: 12 05fd80 7ef8 ff [0]
Aug 18 14:45:16 xen02 kernel: [54277.860928] bnx2 0000:01:00.1 eno2: 13 1fba00 d6f0 ff [0]
Aug 18 14:45:16 xen02 kernel: [54277.860957] bnx2 0000:01:00.1 eno2: 14 1fed80 7fd8 db [0]
Aug 18 14:45:16 xen02 kernel: [54277.860985] bnx2 0000:01:00.1 eno2: 15 17cf80 73b0 dd [0]
Aug 18 14:45:16 xen02 kernel: [54277.861014] bnx2 0000:01:00.1 eno2: 16 1ff700 eff8 1b [0]
Aug 18 14:45:16 xen02 kernel: [54277.861042] bnx2 0000:01:00.1 eno2: 17 1dfd80 eeb8 7f [0]
Aug 18 14:45:16 xen02 kernel: [54277.861071] bnx2 0000:01:00.1 eno2: 18 1bd780 fff8 ff [0]
Aug 18 14:45:16 xen02 kernel: [54277.861099] bnx2 0000:01:00.1 eno2: 19 17fb80 fef0 df [0]
Aug 18 14:45:16 xen02 kernel: [54277.861128] bnx2 0000:01:00.1 eno2: 1a 1ffe80 6a70 df [0]
Aug 18 14:45:16 xen02 kernel: [54277.861157] bnx2 0000:01:00.1 eno2: 1b 1efe80 dfe8 ff [0]
Aug 18 14:45:16 xen02 kernel: [54277.861186] bnx2 0000:01:00.1 eno2: 1c 0f7f80 dfb0 7f [0]
Aug 18 14:45:16 xen02 kernel: [54277.861214] bnx2 0000:01:00.1 eno2: 1d 1f7f80 fad8 fb [0]
Aug 18 14:45:16 xen02 kernel: [54277.861243] bnx2 0000:01:00.1 eno2: 1e 1fff80 fbd8 d7 [0]
Aug 18 14:45:16 xen02 kernel: [54277.861272] bnx2 0000:01:00.1 eno2: 1f 0bbf80 ffd8 bb [0]
Aug 18 14:45:16 xen02 kernel: [54277.861297] bnx2 0000:01:00.1 eno2: <--- end TBDC dump --->
Aug 18 14:45:16 xen02 kernel: [54277.861327] bnx2 0000:01:00.1 eno2: DEBUG: intr_sem[0] PCI_CMD[00100406]
Aug 18 14:45:16 xen02 kernel: [54277.861358] bnx2 0000:01:00.1 eno2: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Aug 18 14:45:16 xen02 kernel: [54277.861851] bnx2 0000:01:00.1 eno2: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Aug 18 14:45:16 xen02 kernel: [54277.862318] bnx2 0000:01:00.1 eno2: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Aug 18 14:45:16 xen02 kernel: [54277.862770] bnx2 0000:01:00.1 eno2: DEBUG: HC_STATS_INTERRUPT_STATUS[01df0020]
Aug 18 14:45:16 xen02 kernel: [54277.863211] bnx2 0000:01:00.1 eno2: DEBUG: PBA[00000000]
Aug 18 14:45:16 xen02 kernel: [54277.863653] bnx2 0000:01:00.1 eno2: <--- start MCP states dump --->
Aug 18 14:45:16 xen02 kernel: [54277.864102] bnx2 0000:01:00.1 eno2: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Aug 18 14:45:16 xen02 kernel: [54277.864570] bnx2 0000:01:00.1 eno2: DEBUG: MCP mode[0000b880] state[80008000] evt_mask[00000500]
Aug 18 14:45:16 xen02 kernel: [54277.865039] bnx2 0000:01:00.1 eno2: DEBUG: pc[080009b8] pc[0800d240] instr[1440002c]
Aug 18 14:45:16 xen02 kernel: [54277.865515] bnx2 0000:01:00.1 eno2: DEBUG: shmem states:
Aug 18 14:45:16 xen02 kernel: [54277.865993] bnx2 0000:01:00.1 eno2: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006e]
Aug 18 14:45:16 xen02 kernel: [54277.866494] drv_pulse_mb[00004ed3]
Aug 18 14:45:16 xen02 kernel: [54277.866498] bnx2 0000:01:00.1 eno2: DEBUG: dev_info_signature[44564903] reset_type[01005254]
Aug 18 14:45:16 xen02 kernel: [54277.867006] condition[0003610e]
Aug 18 14:45:16 xen02 kernel: [54277.867012] bnx2 0000:01:00.1 eno2: DEBUG: 000001c0: 01005254 42530000 0003610e 00000000
Aug 18 14:45:16 xen02 kernel: [54277.867565] bnx2 0000:01:00.1 eno2: DEBUG: 000003cc: 44444444 44444444 44444444 00000a28
Aug 18 14:45:16 xen02 kernel: [54277.868094] bnx2 0000:01:00.1 eno2: DEBUG: 000003dc: 0004ffff 00000000 00000000 00000000
Aug 18 14:45:16 xen02 kernel: [54277.868638] bnx2 0000:01:00.1 eno2: DEBUG: 000003ec: 00000000 00000000 00000000 00000000
Aug 18 14:45:16 xen02 kernel: [54277.869161] bnx2 0000:01:00.1 eno2: DEBUG: 0x3fc[0000ffff]
Aug 18 14:45:16 xen02 kernel: [54277.869686] bnx2 0000:01:00.1 eno2: <--- end MCP states dump --->
Aug 18 14:45:16 xen02 kernel: [54277.952626] bnx2 0000:01:00.1 eno2: NIC Copper Link is Down
Aug 18 14:45:16 xen02 kernel: [54277.953376] br-eno2: port 1(eno2) entered disabled state
Aug 18 14:45:19 xen02 kernel: [54281.121380] bnx2 0000:01:00.1 eno2: NIC Copper Link is Up, 1000 Mbps full duplex
Aug 18 14:45:19 xen02 kernel: [54281.121395] , receive & transmit flow control ON
Aug 18 14:45:19 xen02 kernel: [54281.121506] br-eno2: port 1(eno2) entered blocking state
Aug 18 14:45:19 xen02 kernel: [54281.121518] br-eno2: port 1(eno2) entered forwarding state
Aug 18 14:45:21 xen02 kernel: [54282.291106] bnx2 0000:01:00.1 eno2: NIC Copper Link is Down
Aug 18 14:45:21 xen02 kernel: [54282.292209] br-eno2: port 1(eno2) entered disabled state
Aug 18 14:45:23 xen02 kernel: [54284.644260] bnx2 0000:01:00.1 eno2: NIC Copper Link is Up, 1000 Mbps full duplex
Aug 18 14:45:23 xen02 kernel: [54284.644275] , receive & transmit flow control ON
Aug 18 14:45:23 xen02 kernel: [54284.644373] br-eno2: port 1(eno2) entered blocking state
Aug 18 14:45:23 xen02 kernel: [54284.644386] br-eno2: port 1(eno2) entered forwarding state
Aug 18 14:45:31 xen02 kernel: [54292.350727] usb 6-3: reset high-speed USB device number 4 using ehci-pci
Aug 18 14:45:47 xen02 kernel: [54308.549804] usb 6-3: device not accepting address 4, error -110
Aug 18 14:45:47 xen02 kernel: [54308.669880] usb 6-3: reset high-speed USB device number 4 using ehci-pci
Aug 18 14:46:03 xen02 kernel: [54324.676957] usb 6-3: device not accepting address 4, error -110
Aug 18 14:46:03 xen02 kernel: [54324.796936] usb 6-3: reset high-speed USB device number 4 using ehci-pci
Aug 18 14:46:10 xen02 kernel: [54331.872538] bnx2 0000:01:00.1 eno2: <--- start FTQ dump --->
Aug 18 14:46:10 xen02 kernel: [54331.873570] bnx2 0000:01:00.1 eno2: RV2P_PFTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.874207] bnx2 0000:01:00.1 eno2: RV2P_TFTQ_CTL 00020000
Aug 18 14:46:10 xen02 kernel: [54331.874793] bnx2 0000:01:00.1 eno2: RV2P_MFTQ_CTL 00004000
Aug 18 14:46:10 xen02 kernel: [54331.875336] bnx2 0000:01:00.1 eno2: TBDR_FTQ_CTL 00004002
Aug 18 14:46:10 xen02 kernel: [54331.875876] bnx2 0000:01:00.1 eno2: TDMA_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.876408] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.876953] bnx2 0000:01:00.1 eno2: TXP_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.877475] bnx2 0000:01:00.1 eno2: TPAT_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.877999] bnx2 0000:01:00.1 eno2: RXP_CFTQ_CTL 00008000
Aug 18 14:46:10 xen02 kernel: [54331.878524] bnx2 0000:01:00.1 eno2: RXP_FTQ_CTL 00100000
Aug 18 14:46:10 xen02 kernel: [54331.879061] bnx2 0000:01:00.1 eno2: COM_COMXQ_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.879595] bnx2 0000:01:00.1 eno2: COM_COMTQ_FTQ_CTL 00020000
Aug 18 14:46:10 xen02 kernel: [54331.880129] bnx2 0000:01:00.1 eno2: COM_COMQ_FTQ_CTL 00010000
Aug 18 14:46:10 xen02 kernel: [54331.880673] bnx2 0000:01:00.1 eno2: CP_CPQ_FTQ_CTL 00004000
Aug 18 14:46:10 xen02 kernel: [54331.881209] bnx2 0000:01:00.1 eno2: CPU states:
Aug 18 14:46:10 xen02 kernel: [54331.881754] bnx2 0000:01:00.1 eno2: 045000 mode b84c state 80001000 evt_mask 500 pc 8001294 pc 8001284 instr 8e260000
Aug 18 14:46:10 xen02 kernel: [54331.882330] bnx2 0000:01:00.1 eno2: 085000 mode b84c state 80005000 evt_mask 500 pc 8000a4c pc 8000a4c instr 10400016
Aug 18 14:46:10 xen02 kernel: [54331.882917] bnx2 0000:01:00.1 eno2: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c14 instr 10e00088
Aug 18 14:46:10 xen02 kernel: [54331.883497] bnx2 0000:01:00.1 eno2: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000aa4 pc 8000b28 instr 3c028000
Aug 18 14:46:10 xen02 kernel: [54331.884088] bnx2 0000:01:00.1 eno2: 145000 mode b880 state 80004000 evt_mask 500 pc 800adec pc 800ae00 instr 8c6366e4
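The NETDEV WATCHDOG line in the log above names the interface, the driver and the transmit queue that timed out. A minimal sketch for tallying such lines over a saved kernel log, to see whether the same queue or interface is always involved, could look like this. The helper name `watchdog_summary` is hypothetical, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "count interface driver queue" for every watchdog timeout combination.
watchdog_summary() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9]*\) timed out.*/\1 \2 \3/p' |
        sort | uniq -c | awk '{print $1, $2, $3, $4}'
}
```

It can be fed from a log file, for example `watchdog_summary < /var/log/kern.log`, or from a pipe such as `dmesg | watchdog_summary`.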
Any advice would be greatly appreciated.
Thanks!
Update 20170926: upgraded the second machine with 2 dual-port Intel NICs and disabled bnx2, and the machine still keeps crashing. Without any domU, the first machine stayed up for 6 days.