I installed a ConnectX-3 EN 40GbE Ethernet card, model CX313A.
The Linux kernel is: 2.6.32-5-amd64 #1 SMP Sat Jul 12 16:47:57 UTC 2014 x86_64 GNU/Linux
ethtool -i eth6
driver: mlx4_en
version: 2.4-1.0.0.1 (Feb 17 2015)
firmware-version: 2.33.5000
bus-info: 0000:08:00.0
I ran the ethtool offline self-test on the card:
ethtool -t eth6 offline
The test result is FAIL
The test extra info:
Interrupt Test -5
Link Test -12
Speed Test -12
Register Test 0
Loopback Test 0
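The non-zero self-test values can be read as negated Linux errno codes, which is the usual kernel convention for return values. That mlx4_en follows this convention for these particular tests is an assumption, not something the ethtool output states. A minimal Python sketch that decodes the results above under that assumption:

```python
import errno
import re

# Self-test "extra info" lines copied from the ethtool -t output above.
SELFTEST_OUTPUT = """\
Interrupt Test -5
Link Test -12
Speed Test -12
Register Test 0
Loopback Test 0
"""

def decode_selftest(text):
    """Return {test name: (value, meaning)} for each '<name> Test <int>' line.

    Assumption: 0 means pass and a negative value is a negated errno code.
    """
    results = {}
    for line in text.splitlines():
        m = re.match(r"^(.*\bTest)\s+(-?\d+)$", line.strip())
        if not m:
            continue  # skip headers or unrelated lines
        name, value = m.group(1), int(m.group(2))
        if value == 0:
            meaning = "pass"
        elif value < 0:
            # Map e.g. -5 -> errno 5 -> symbolic name
            meaning = errno.errorcode.get(-value, "unknown errno")
        else:
            meaning = "fail (non-errno code)"
        results[name] = (value, meaning)
    return results

for name, (value, meaning) in decode_selftest(SELFTEST_OUTPUT).items():
    print(f"{name:15} {value:4}  {meaning}")
```

On Linux this maps -5 to EIO (I/O error) and -12 to ENOMEM. An ENOMEM from the link and speed tests need not mean the host ran out of memory, since drivers sometimes reuse errno values loosely; the mlx4_en source is the authority on what each code means here.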
dmesg log:
[ 2837.440991] mlx4_core 0000:08:00.0: command NOP (0x31) timed out: in_param=0x0, in_mod=0x1f, op_mod=0x0, get_status err=0, status_reg=0x31004000, go_bit=0, t_bit=0, toggle=0x1
[ 2837.440999] mlx4_core 0000:08:00.0: mlx4_enter_error_state: device is going to be reset
[ 2837.948504] mlx4_core 0000:08:00.0: mlx4_enter_error_state: device was reset successfully
[ 2837.948608] mlx4_en 0000:08:00.0: Internal error detected, restarting device
[ 2837.948673] mlx4_core 0000:08:00.0: mlx4_enter_error_state: end
[ 2838.845428] mlx4_core 0000:08:00.0: Internal error mark was detected on device ffff881024660000
[ 2838.845680] mlx4_core 0000:08:00.0: mlx4_handle_error_state was started
[ 2838.845744] mlx4_handle_error_state: calling mlx4_restart_one
[ 2843.886584] mlx4_core 0000:08:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 2843.886586] mlx4_core 0000:08:00.0: PCIe link width is x8, device supports x8
[ 2845.487271] mlx4_core 0000:08:00.0: irq 82 for MSI/MSI-X
[ 2845.487273] mlx4_core 0000:08:00.0: irq 83 for MSI/MSI-X
[ 2845.487275] mlx4_core 0000:08:00.0: irq 84 for MSI/MSI-X
[ 2845.487277] mlx4_core 0000:08:00.0: irq 85 for MSI/MSI-X
[ 2845.487278] mlx4_core 0000:08:00.0: irq 86 for MSI/MSI-X
[ 2845.487280] mlx4_core 0000:08:00.0: irq 87 for MSI/MSI-X
[ 2845.487281] mlx4_core 0000:08:00.0: irq 88 for MSI/MSI-X
[ 2845.487283] mlx4_core 0000:08:00.0: irq 89 for MSI/MSI-X
[ 2845.487284] mlx4_core 0000:08:00.0: irq 90 for MSI/MSI-X
[ 2845.487286] mlx4_core 0000:08:00.0: irq 91 for MSI/MSI-X
[ 2845.487288] mlx4_core 0000:08:00.0: irq 92 for MSI/MSI-X
[ 2845.487289] mlx4_core 0000:08:00.0: irq 93 for MSI/MSI-X
[ 2845.487291] mlx4_core 0000:08:00.0: irq 94 for MSI/MSI-X
[ 2845.521456] mlx4_en 0000:08:00.0: Activating port:1
[ 2845.529510] mlx4_en: eth6: Using 8 TX rings
[ 2845.529512] mlx4_en: eth6: Using 8 RX rings
[ 2845.529657] mlx4_en: eth6: Initializing port
[ 2845.530099] mlx4_handle_error_state: mlx4_restart_one was ended, ret=0
[ 2845.530100] mlx4_handle_error_state end
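After the reset, the log shows the slot negotiating its full 8.0 GT/s x8 link. As a rough check that the PCIe slot is not the bottleneck for a 40GbE port, the Python sketch below extracts speed and width from those two lines and estimates raw bandwidth. It assumes 8b/10b encoding below 8 GT/s and 128b/130b from 8 GT/s (PCIe 3.0 through 5.0), and it ignores packet and protocol overhead.

```python
import re

# The two PCIe lines from the dmesg log above.
DMESG = """\
[ 2843.886584] mlx4_core 0000:08:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 2843.886586] mlx4_core 0000:08:00.0: PCIe link width is x8, device supports x8
"""

def pcie_link(text):
    """Return (negotiated GT/s, max GT/s, negotiated lanes, max lanes)."""
    speed = re.search(r"link speed is ([\d.]+)GT/s, device supports ([\d.]+)GT/s", text)
    width = re.search(r"link width is x(\d+), device supports x(\d+)", text)
    return (float(speed.group(1)), float(speed.group(2)),
            int(width.group(1)), int(width.group(2)))

def raw_bandwidth_gbps(gts, lanes):
    """Per-direction raw bandwidth in Gb/s after line encoding only."""
    # Assumption: 8b/10b below 8 GT/s, 128b/130b at 8 GT/s and above (Gen3-Gen5).
    encoding = 128 / 130 if gts >= 8.0 else 8 / 10
    return gts * lanes * encoding

cur_gts, max_gts, cur_lanes, max_lanes = pcie_link(DMESG)
print("link degraded" if (cur_gts, cur_lanes) != (max_gts, max_lanes) else "full link")
print(f"~{raw_bandwidth_gbps(cur_gts, cur_lanes):.1f} Gb/s per direction vs a 40 Gb/s port")
```

With the values in this log the estimate is about 63 Gb/s in each direction, so a downgraded slot does not look like the cause of the failing tests.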
One last thing: the kernel log also reported this:
[594196.708563] mlx4_core 0000:08:00.0: Temperature Threshold was reached! Threshold: 105 celsius degrees; Current Temperature: 107
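This warning puts the adapter at 107 °C, 2 °C above its 105 °C threshold. To check how often it recurs, a small Python sketch can scan a kernel log for the message and extract the numbers. It assumes the message keeps the exact wording quoted above; other mlx4 driver versions may phrase it differently.

```python
import re

# Pattern built from the exact message above; other mlx4 versions may differ.
TEMP_RE = re.compile(
    r"\[\s*([\d.]+)\]\s+mlx4_core (\S+): Temperature Threshold was reached! "
    r"Threshold: (\d+) celsius degrees; Current Temperature: (\d+)"
)

def temperature_events(log_text):
    """Yield (timestamp in s since boot, PCI address, threshold C, current C)."""
    for m in TEMP_RE.finditer(log_text):
        yield float(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))

LOG = ("[594196.708563] mlx4_core 0000:08:00.0: Temperature Threshold was reached! "
       "Threshold: 105 celsius degrees; Current Temperature: 107\n")

for ts, dev, limit, temp in temperature_events(LOG):
    print(f"{dev} at {ts:.0f}s: {temp} C, {temp - limit} C over the {limit} C threshold")
```

In practice the same function could be run on the text of `dmesg` or a saved kernel log instead of the single hard-coded line.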
Is this a bad card, or is something else going on?
Thanks,
Chet