No worries. Which OS are you using?
Is there any chance you could do stuff on CentOS/RHEL 6.4?
Asking that because it's what I'm super familiar with.
If you're OK with that, please install the IB software provided by CentOS/RHEL, and also pciutils:
$ sudo yum groupinstall "Infiniband Support"
$ sudo yum install mstflint pciutils
$ sudo chkconfig rdma on
$ sudo service rdma start
Then let's do some basic info gathering so we know what we're dealing with.
- Run lspci -Qvvs on the ConnectX card and on at least one of the InfiniHost III cards, then post the results here
- Also query the firmware of both using mstflint
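If there are several cards in the box, the two steps above can be looped over every Mellanox device instead of doing them one at a time. A rough sketch only, assuming the cards show up in lspci with "Mellanox" in their description; the function names mellanox_addrs and gather_ib_info are made up for illustration and aren't part of any package:

```shell
# Print the PCI address (first column) of every Mellanox device.
# Reads plain `lspci` output on stdin.
mellanox_addrs() {
    awk '/Mellanox/ { print $1 }'
}

# Dump the lspci details and the firmware info for each Mellanox card found.
# Call this from a root shell so lspci can read the full capability list.
gather_ib_info() {
    for addr in $(lspci | mellanox_addrs); do
        echo "=== $addr ==="
        lspci -Qvvs "$addr"
        mstflint -d "$addr" q
    done
}
```

After pasting those definitions into a root shell, something like gather_ib_info > ib-info.txt 2>&1 should collect everything into one file that's easy to post.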
Example from a ConnectX card here. First I find out its PCI address in the box:
$ sudo lspci |grep Mell
01:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)
Then use lspci -Qvvs on that address, to retrieve all of the potentially useful info:
$ sudo lspci -Qvvs 01:00.0
01:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)
    Subsystem: Mellanox Technologies Device 0006
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at f7c00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at f0000000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Product Name: Eagle DDR
        Read-only fields:
            [PN] Part number: 375-3549-01
            [EC] Engineering changes: 51
            [SN] Serial number: 1388FMH-0905400010
            [V0] Vendor specific: PCIe x8
            [RV] Reserved: checksum good, 0 byte(s) reserved
        Read/write fields:
            [V1] Vendor specific: N/A
            [YA] Asset tag: N/A
            [RW] Read-write area: 111 byte(s) free
        End
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core
Note the part number under Vital Product Data and the width in the LnkSta line (the bits highlighted in blue). For ConnectX cards this stuff is useful. My card shows a Sun part number, as it was originally a Sun-badged card (now reflashed to stock firmware). The PCI link is also running at x8, matching the x8 the card reports in LnkCap, which is what you want (if it weren't, it would indicate a problem).
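The link width check can be done by eye, but if you'd rather have the box tell you, here's a small sketch that compares the negotiated width (LnkSta) with the maximum the card advertises (LnkCap). It assumes the output layout shown above; check_link_width is just a name I picked for illustration. Keep in mind a mismatch can also simply mean the slot has fewer lanes than the card, so treat a warning as "go look", not as proof of a fault.

```shell
# Compare the negotiated PCIe link width (LnkSta) with the card's maximum (LnkCap).
# Reads `lspci -vv` output for a single device on stdin.
# Exit status: 0 = widths match, 1 = mismatch, 2 = widths not found in the input.
check_link_width() {
    awk '
        # On each line, the token after the word "Width" is the value, e.g. "x8,"
        /LnkCap:/ { for (i = 1; i < NF; i++) if ($i == "Width") cap = $(i + 1) }
        /LnkSta:/ { for (i = 1; i < NF; i++) if ($i == "Width") sta = $(i + 1) }
        END {
            gsub(/,/, "", cap); gsub(/,/, "", sta)   # strip trailing commas
            if (cap == "" || sta == "") { print "link width not found"; exit 2 }
            if (cap == sta) { print "OK: link width " sta; exit 0 }
            print "WARNING: link width " sta ", card supports " cap
            exit 1
        }
    '
}
```

Usage would be along the lines of sudo lspci -vvs 01:00.0 | check_link_width, swapping in your own card's address.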
And the mstflint output example:
$ sudo mstflint -d 01:00.0 q
Image type:      ConnectX
FW Version:      2.9.1000
Device ID:       25418
Description:     Node             Port1            Port2            Sys image
GUIDs:           0003ba000100edb8 0003ba000100edb9 0003ba000100edba 0003ba000100edbb
MACs:                             0003ba00edb9     0003ba00edba
Board ID:         (MT_04A0120002)
VSD:
PSID:            MT_04A0120002
That tells us the firmware version on the card. Useful to know, as it might need upgrading (very easy to do).
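If you end up querying more than one card, it can help to boil each mstflint query down to the two fields that matter most for an upgrade: the firmware version and the PSID (which identifies the board variant, and so which firmware image applies). A minimal sketch, assuming the "FW Version:" and "PSID:" labels appear as in the output above; fw_summary is a made-up helper name:

```shell
# Summarise `mstflint -d <addr> q` output (read from stdin) as one line:
#   PSID=<psid> FW=<version>
fw_summary() {
    awk -F': *' '
        /^FW Version:/ { fw = $2 }     # e.g. 2.9.1000
        /^PSID:/       { psid = $2 }   # e.g. MT_04A0120002
        END { print "PSID=" psid " FW=" fw }
    '
}
```

For example: sudo mstflint -d 01:00.0 q | fw_summary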
After you've pasted that info here, we can start by checking whether anything is wrong with the basics and fix that first. Then we can move on to the next stuff.
(note - edited for typo fixes)