Hi everyone,
We are getting the error below on one of our physical servers. It makes the server inaccessible remotely because the connection drops frequently. We raised an SR with Oracle Support, and they confirmed that there is no issue at the operating system level. The operating system is up to date with the latest patches. Some configuration output is shown below. While troubleshooting we reinstalled the driver modules, but the issue persists.
While the server was up and running we pinged it and got the replies below, with latency varying from 7 ms to 40 ms:
64 bytes from 10.xxx.xxx.xxx: icmp_seq=104 ttl=60 time=11.9 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=105 ttl=60 time=12.2 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=106 ttl=60 time=7.76 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=107 ttl=60 time=7.45 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=108 ttl=60 time=11.1 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=109 ttl=60 time=7.73 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=110 ttl=60 time=7.71 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=111 ttl=60 time=7.51 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=112 ttl=60 time=12.0 ms
64 bytes from 10.xxx.xxx.xxx: icmp_seq=113 ttl=60 time=7.50 ms
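To make the latency spread easier to compare between runs, the ping replies can be reduced to a count, minimum, average and maximum. This is only a rough sketch: the function name ping_summary is made up for illustration, and it assumes the usual Linux ping reply format with a "time=... ms" field, as in the output above.

```shell
#!/bin/sh
# Summarize round-trip times from ping output read on stdin.
# ping_summary is a hypothetical helper name, not part of any package.
ping_summary() {
    awk -F'time=' '/time=/ {
        split($2, a, " ")          # a[1] holds the numeric value before " ms"
        t = a[1] + 0
        if (n == 0 || t < min) min = t
        if (n == 0 || t > max) max = t
        sum += t
        n++
    }
    END {
        if (n > 0)
            printf "n=%d min=%.2f avg=%.2f max=%.2f\n", n, min, sum / n, max
        else
            print "no samples"
    }'
}
```

It could be used as "ping -c 100 <server address> | ping_summary", or fed a saved capture of the replies.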
Has anyone faced a similar issue? If so, please share your experience and how you fixed it. We would be grateful. Thanks.
The error logged in /var/log/messages:
May 31 07:13:48 hostname_01 kernel: mlx4_en: eth2: frag:0 - size:1518 prefix:0 stride:1536
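To check whether these messages line up with the times the connection drops, the mlx4_en kernel lines can be counted per hour. This is a minimal sketch, assuming the traditional syslog timestamp layout shown above (month, day, HH:MM:SS); the function name mlx4_events_per_hour is hypothetical.

```shell
#!/bin/sh
# Count "kernel: mlx4_en" log lines per hour, reading syslog text on stdin.
# mlx4_events_per_hour is a hypothetical helper name.
mlx4_events_per_hour() {
    awk '/kernel: mlx4_en/ {
        split($3, t, ":")                    # $3 is HH:MM:SS
        key = $1 " " $2 " " t[1] ":00"       # e.g. "May 31 07:00"
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END {
        # Print buckets in the order they first appeared in the log.
        for (i = 1; i <= n; i++) print count[order[i]], order[i]
    }'
}
```

For example: "mlx4_events_per_hour < /var/log/messages". Hours with many entries can then be compared against the times the remote sessions were lost.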
# lspci | grep Mellanox
02:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
[root@hostname_01 ~]# lspci -vv -s 02:00.0
02:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
Subsystem: Oracle/SUN Device 4349
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 256 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f5c00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at bf800000 (64-bit, prefetchable) [size=8M]
Expansion ROM at f5b00000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Connection timed out
Not readable
Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
[root@hostname_01 ~]# lsmod | grep _en
mlx4_en 91587 0
mlx4_core 268449 1 mlx4_en
ptp 9614 2 mlx4_en,e1000e
[root@hostname_01 ~]# modinfo mlx4_en
filename: /lib/modules/2.6.32-642.el6.x86_64/kernel/drivers/net/mlx4/mlx4_en.ko
version: 2.2-1 (Feb 2014)
license: Dual BSD/GPL
description: Mellanox ConnectX HCA Ethernet driver
author: Liran Liss, Yevgeny Petrilin
srcversion: 878D50A3AC159B9F83DA6E3
depends: mlx4_core,ptp
vermagic: 2.6.32-642.el6.x86_64 SMP mod_unload modversions
parm: udp_rss:Enable RSS for incomming UDP traffic or disabled (0) (uint)
parm: pfctx:Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm: pfcrx:Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)
parm: num_lro:Dummy parameter for backward compatibility (uint)
parm: rss_mask:Dummy parameter for backward compatibility (uint)
parm: rss_xor:Dummy parameter for backward compatibility (uint)
parm: enable_tc:Enable separate queues for traffic classes (uint)
parm: inline_thold:Threshold for using inline data (range: 17-104, default: 104) (uint)
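The interface counters may also be worth a look alongside the driver details above. The sketch below filters the output of "ethtool -S" down to error, drop and discard counters that are not zero. Treat it as an assumption-laden helper: the function name nonzero_error_counters is made up, counter names differ between drivers and driver versions, and eth2 is simply the interface named in the log line.

```shell
#!/bin/sh
# Keep only non-zero error/drop/discard counters from "ethtool -S" output (stdin).
# nonzero_error_counters is a hypothetical helper name.
nonzero_error_counters() {
    awk 'tolower($1) ~ /err|drop|discard/ && $2 + 0 != 0 { print $1, $2 }'
}
```

For example: "ethtool -S eth2 | nonzero_error_counters". Running it twice a few minutes apart shows which counters are still increasing.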