Channel: Mellanox Interconnect Community: Message List

Setting up FDR infiniband

Hi ,

I am a complete beginner with InfiniBand interconnects and would appreciate some help.

I want to set up FDR InfiniBand on an HPC cluster. I haven't ordered the parts yet, but I am considering the following:

  • Mellanox FDR single-port adapter for Dell C6220 servers, CX383A (56 Gbps) [I think this is actually FDR10 at 40 Gbps].
  • Mellanox FDR switch MSX6036F (56 Gbps).

Latency matters more to me than bandwidth. My first questions:

  • Should I buy QSFP FDR cables, or would QSFP QDR cables work correctly?
  • The card and switch descriptions list the ports as QSFP, but I see that there are also QSFP+ cables. Would they work too, or are they simply the same thing?
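
For reference, the FDR versus FDR10 versus QDR distinction comes down to per-lane signaling rate and line encoding. The sketch below works through that arithmetic in Python using the commonly published nominal figures for a 4x link (QDR: 10 Gb/s per lane with 8b/10b; FDR10: 10.3125 Gb/s per lane with 64b/66b; FDR: 14.0625 Gb/s per lane with 64b/66b). It says nothing about which rate the CX383A actually runs at, and the names RATES and link_rates are only illustrative.

```python
# Nominal 4x InfiniBand link figures:
# generation -> (per-lane signaling rate in Gb/s, encoding efficiency)
RATES = {
    "QDR": (10.0, 8 / 10),        # 8b/10b encoding
    "FDR10": (10.3125, 64 / 66),  # 64b/66b encoding
    "FDR": (14.0625, 64 / 66),    # 64b/66b encoding
}
LANES = 4


def link_rates(name):
    """Return (raw, effective) rate in Gb/s for a 4x link of this generation."""
    lane_rate, efficiency = RATES[name]
    raw = lane_rate * LANES
    return raw, raw * efficiency


if __name__ == "__main__":
    for name in RATES:
        raw, effective = link_rates(name)
        print(f"{name:6s} raw {raw:6.2f} Gb/s, data {effective:6.2f} Gb/s")
```

The "56 Gbps" and "40 Gbps" labels in the part descriptions are rounded marketing figures, so it is the effective data rate column that is worth comparing.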

 

On the other hand, regarding software and OS: the cluster is running very well with 10GbE SFP+ on CentOS Linux. Would setting up the InfiniBand interconnect be more complicated (things to do and things to avoid)?

I certainly have other questions, but that is all for now.

Thanks for your help.

BR


Re: Can anybody provide steps on how to run RoCE over VXLAN ?

Hi,

Before going with RoCE: are you able to run TCP/IP traffic (including UDP and ICMP) over VXLAN? RoCE v2 is based on UDP.
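
A quick way to check plain UDP reachability across the tunnel is a one-datagram echo test. The Python sketch below is only an illustration: the function names are made up, and self_test() runs both ends on loopback so it can be tried anywhere. On real hosts, echo_once would run on the far side of the VXLAN and udp_round_trip on the near side, with an address and port of your choosing. RoCE v2 itself uses UDP destination port 4791, so any filtering on that port would also need checking separately.

```python
import socket
import threading


def echo_once(sock):
    """Receive one datagram on an already-bound socket and send it back."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)


def udp_round_trip(host, port, payload=b"ping", timeout=2.0):
    """Send one datagram; return True if the same bytes come back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(2048)
        except OSError:  # includes timeouts and ICMP-triggered errors
            return False
        return data == payload


def self_test():
    """Loopback demo of both halves; not a test of any real tunnel."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()
    ok = udp_round_trip("127.0.0.1", port)
    worker.join(timeout=2.0)
    server.close()
    return ok


if __name__ == "__main__":
    print("UDP round trip ok:", self_test())
```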

Re: ConnectX®-4 EN Adapter Card support for NetApp EF570

Hi Mohammed,

Mellanox HCAs comply with the relevant specifications and protocols. When asking about compatibility with an external appliance, it is better to ask whether the adapter supports specific features. If you are looking for RoCE support, it is supported; NVMe is supported as well. If your question is whether the NetApp EF570 supports a specific piece of hardware such as a Mellanox HCA, it should be addressed to NetApp, as they use their own versions of the driver and firmware and may have a hardware compatibility matrix.

Re: MSX1012B MSX6012F

Thank you for your response. It really helped!

Concurrent INFINIBAND multicast writers

Hi everyone!

 

I am working on a project in which I have a small set of servers with ConnectX 3 HCAs connected to an IS5030 switch.

 

No IP, just IB.

 

Given either 1 or many multicast groups, with one reader and one writer on each machine with the appropriate cpu affinity,

I observe the following behavior:

 

With only 1 writer in the cluster and everything else reading, the only increments in XmitWait are on the sending HCA, which is just trying to get the multicast packets to the switch.

All of the IB counters on everything look great, even at many multiples of the message rate of the problem scenario below.

If I introduce just 1 more multicast writer into the mix and both run at 5k msg/sec, XmitWait on the transmitting switch ports for the multicast group starts growing. The more writers, the worse it gets.

 

A subnet manager is running on the switch.  I have tried segregating the traffic into different VLs and turning on congestion control.

 

There is something about two machines generating multicast traffic to the same switch at any decent frequency.

 

I'm using 4k buffers but my message size is only 512 bytes.

 

Does anyone have any insight into what would be causing the congestion?

Re: Support for "INBOX drivers?" for 18.04/connected mode?

Hi Bill,

 

It would be great if you could run the following commands to check whether the ipoib_enhanced parameter is supported on your system with the inbox driver:

1. # find /lib/modules -iname '*ib_ipoib*'

(Example output from a host running Ubuntu 16.04)

/lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
/lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko

The first line is the inbox driver and the second line is the Mellanox (mlnx) driver.

2. # modinfo /lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko

Check which parameters are supported (in the output, look for the lines that start with "parm:").

Then compare by running modinfo on the module from the mlnx driver:

# modinfo /lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko

Please share the outputs.
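
If it helps, the "parm:" lines from the two modinfo outputs can be extracted and compared automatically. This is a minimal Python sketch, assuming modinfo's usual "parm: name:description (type)" line layout. The helper names (modinfo_text, parm_names, compare_parms) are made up for illustration and are not part of any Mellanox tooling.

```python
import subprocess


def modinfo_text(module_path):
    """Run modinfo on a .ko file and return its text output."""
    result = subprocess.run(
        ["modinfo", module_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parm_names(modinfo_output):
    """Return the set of parameter names found on 'parm:' lines."""
    names = set()
    for line in modinfo_output.splitlines():
        if line.startswith("parm:"):
            # Expected shape: "parm:           name:description (type)"
            rest = line[len("parm:"):].strip()
            names.add(rest.split(":", 1)[0].strip())
    return names


def compare_parms(inbox_output, mlnx_output):
    """Split parameter names into inbox-only, mlnx-only, and common."""
    inbox = parm_names(inbox_output)
    mlnx = parm_names(mlnx_output)
    return {
        "only_inbox": sorted(inbox - mlnx),
        "only_mlnx": sorted(mlnx - inbox),
        "common": sorted(inbox & mlnx),
    }


if __name__ == "__main__":
    # Paths taken from the example output above; adjust to your kernel.
    inbox = modinfo_text(
        "/lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko"
    )
    mlnx = modinfo_text("/lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko")
    print(compare_parms(inbox, mlnx))
```

If ipoib_enhanced shows up only under "only_mlnx", that would suggest the inbox module does not expose the parameter.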

Re: ConnectX-5 error: Failed to write to /dev/nvme-fabrics: Invalid cross-device link

Please run # dmesg | grep "enabling port" and check whether the output contains "....nvmet_rdma: enabling port....".


Re: Login directly to Enable Mode

Hello Donny Hariady,

 

Thank you for contacting Mellanox Global Support. This is Paul, and your case is currently under my care. The answer to your question is no: there is no knob on the switch to skip straight to Enable mode.

However, you can rely on the auto-login feature of SecureCRT or Xshell to achieve it.

Mellanox on vSphere - Error Extracting File

Brand-new card, newly installed in the server. It likely has a back-rev firmware issue, but the server runs vSphere 6.7, so I am working through the process to get it up and running for iSER.

I am following the installation guide to deploy the driver (it is basically working, since the driver goes live), but I get this error:

 

 

############

[root@x395001:/tmp] esxcli software acceptance set --level=PartnerSupported

Host acceptance level changed to 'PartnerSupported'.

[root@x395001:/tmp] esxcli software sources profile list -d /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip

[MetadataDownloadError]

Could not download from depot at zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml, skipping (('zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml', '', 'Error extracting index.xml from /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip: "There is no item named \'index.xml\' in the archive"'))

        url = zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml

Please refer to the log file for more details.

[root@x395001:/tmp] lspci | grep Mellanox

0000:18:00.0 Network controller: Mellanox Technologies MT27800 Family [ConnectX-5] [vmnic8]

0000:18:00.1 Network controller: Mellanox Technologies MT27800 Family [ConnectX-5] [vmnic9]

[root@x395001:/tmp] esxcli software vib list | grep nmlx

nmlx4-core                     3.16.11.6-1OEM.650.0.0.4598673         MEL       VMwareCertified   2018-07-31

nmlx4-en                       3.16.11.6-1OEM.650.0.0.4598673         MEL       VMwareCertified   2018-07-31

nmlx4-rdma                     3.16.11.6-1OEM.650.0.0.4598673         MEL       VMwareCertified   2018-07-31

nmlx5-core                     4.16.12.12-1OEM.650.0.0.4598673        MEL       VMwareCertified   2018-07-31

nmlx5-rdma                     4.16.12.12-1OEM.650.0.0.4598673        MEL       VMwareCertified   2018-07-31

[root@x395001:/tmp]

[root@x395001:/tmp] esxcli software sources profile list -d /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip

[MetadataDownloadError]

Could not download from depot at zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml, skipping (('zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml', '', 'Error extracting index.xml from /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip: "There is no item named \'index.xml\' in the archive"'))

        url = zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml

Please refer to the log file for more details.

[root@x395001:/tmp]

########

 

 

I downloaded it twice to confirm it was not a download issue. I tried two separate servers and got the same issue.

Windows Firmware - flint "no command found"

Fresh install of Windows 2016 with a Mellanox MCX512A-ACAT.

I want to make sure the firmware is up to date.

Firmware flashing tools for the card:

http://www.mellanox.com/page/management_tools

 

 

PS C:\Windows\system32> cd 'C:\Program Files\Mellanox\WinMFT\'

PS C:\Program Files\Mellanox\WinMFT> mst status

MST devices:

------------

 

mt4119_pciconf0

PS C:\Program Files\Mellanox\WinMFT> mst status -v

MST devices:

------------

 

  mt4119_pciconf0        bus:dev.fn=1a:00.0

 

  mt4119_pciconf0.1      bus:dev.fn=1a:00.1

PS C:\Program Files\Mellanox\WinMFT>  mlxfwmanager -d mt4119_pciconf0 --query

Querying Mellanox devices firmware ...

 

Device #1:

----------

 

  Device Type:      ConnectX5

  Part Number:      MCX512A-ACA_Ax

  Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6

  PSID:             MT_0000000080

  PCI Device Name:  mt4119_pciconf0

  Base GUID:        98039b0300325eba

  Base MAC:         98039b325eba

  Versions:         Current        Available

     FW             16.22.1002     N/A

     PXE            3.5.0403       N/A

     UEFI           14.15.0019     N/A

 

  Status:           No matching image found

 

PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> dir

 

 

    Directory: C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin

 

 

Mode                LastWriteTime         Length Name

----                -------------         ------ ----

------        7/12/2018   8:57 AM       16777216 fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin

 

 

PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> flint.bat -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin

No command found.

PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> flint

No options found.

copy C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin C:\Program Files\Mellanox\WinMFT\

PS C:\Program Files\Mellanox\WinMFT> .\flint_ext.exe -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin

No command found.

PS C:\Program Files\Mellanox\WinMFT> .\flint.bat -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin

No command found.

PS C:\Program Files\Mellanox\WinMFT>

 

 

I know the firmware is back rev. I first ran the flint command as documented, then tried ".exe" versus ".bat", then moved the firmware bin file into the same directory as the flint binary. No change. When I run "flint" by itself, it gives a different response ("No options found"), so "No command found" is not helpful at all as to what is not being found.
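
One thing worth checking: as far as I know, flint expects a command word (for example query or burn) after the options, and none of the invocations above include one, which may be what "No command found" refers to. The Python sketch below only builds and prints such a command line so it can be reviewed before running anything. The helper name flint_argv is made up, and the exact tool name on Windows (flint, flint.bat, flint_ext.exe) is an assumption to verify against the WinMFT documentation.

```python
import shlex


def flint_argv(device, image=None, command="query", tool="flint"):
    """Build a flint argument list: options first, command word last."""
    argv = [tool, "-d", device]
    if image is not None:
        argv += ["-i", image]
    argv.append(command)
    return argv


if __name__ == "__main__":
    # Device name and image file taken from the session above.
    image = (
        "fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-"
        "UEFI-14.16.17-FlexBoot-3.5.504.bin"
    )
    print(shlex.join(flint_argv("mt4119_pciconf0")))
    print(shlex.join(flint_argv("mt4119_pciconf0", image, command="burn")))
```

Running the query form first is the safer check, since it only reads from the device.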

 

thanks,

Re: Mellanox on vSphere - Error Extracting File

I think I see the issue here: it is a zip within a zip.

 

[root@x385004:/tmp] unzip MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip

Archive: MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip

  inflating: MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-offline_bundle-8873266.zip

  inflating: doc/README.txt

  inflating: doc/open_source_licenses_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.txt

  inflating: doc/release_note_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.pdf

  inflating: doc/open_source_licenses_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.txt

  inflating: doc/release_note_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.pdf

  inflating: source/driver_source_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.tgz

  inflating: source/driver_source_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.tgz

[root@x385004:/tmp] esxcli software vib install -d /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-offline_bundle-8873266.zip

Installation Result

   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.

   Reboot Required: true

   VIBs Installed: MEL_bootbank_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922, MEL_bootbank_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922

   VIBs Removed: MEL_bootbank_nmlx5-core_4.16.12.12-1OEM.650.0.0.4598673, MEL_bootbank_nmlx5-rdma_4.16.12.12-1OEM.650.0.0.4598673

   VIBs Skipped:

[root@x385004:/tmp]

<sigh> It would be nice if this were noted anywhere in the readme, or if the zip were named to show that it needs to be unzipped before being copied (scp) to the vSphere host.
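
For anyone hitting the same thing, the wrapper-versus-bundle question can be checked before copying anything to the host. The Python sketch below looks for a top-level index.xml (the file the esxcli error above says is missing) and lists any nested .zip members. The function names are made up for illustration, and treating index.xml as the marker of an offline bundle is an assumption based only on that error message.

```python
import io
import zipfile


def inspect_bundle(zip_source):
    """Report whether a zip has a top-level index.xml and list nested zips.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        names = zf.namelist()
    return {
        "has_index": "index.xml" in names,
        "nested_zips": [n for n in names if n.lower().endswith(".zip")],
    }


def make_zip(members):
    """Build an in-memory zip from {name: bytes}; used only for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Fake archives that mimic the layout seen in the unzip listing above.
    inner = make_zip({"index.xml": b"<index/>"})
    outer = make_zip(
        {"offline_bundle.zip": inner.getvalue(), "doc/README.txt": b"readme"}
    )
    print(inspect_bundle(outer))
```

If has_index is False and nested_zips is not empty, the download is a wrapper and the inner zip is the one to pass to esxcli.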

Re: Windows Firmware - flint "no command found"

Flashing firmware on cards... so how many tools and paths do we have here?

We have flint (noted above).

We have "WinMFT64", and that did not work.

Then I found "mlxup", which I downloaded for Windows, and it ran just fine: it noted the firmware level, found new firmware on the internet, and ran the installation.

 

Windows

#########

PS C:\ftp\Mellanox_iSER> .\mlxup

Querying Mellanox devices firmware ...

 

 

Device #1:

----------

 

 

  Device Type:      ConnectX5

  Part Number:      MCX512A-ACA_Ax

  Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6

  PSID:             MT_0000000080

  PCI Device Name:  mt4119_pciconf0

  Base GUID:        98039b0300325eba

  Base MAC:         98039b325eba

  Versions:         Current        Available

     FW             16.22.1002     16.23.1020

     PXE            3.5.0403       3.5.0504

     UEFI           14.15.0019     14.16.0017

 

 

  Status:           Update required

 

 

---------

Found 1 device(s) requiring firmware update...

 

 

Perform FW update? [y/N]: y

Device #1: Updating FW ...                                                                                                                                                                        Done

 

 

Restart needed for updates to take effect.

Log File: C:\Users\TEMP\AppData\Local\Temp\mlxup-20181015_141528_6900.log

PS C:\ftp\Mellanox_iSER>

#########

 

And on Linux, ppcle and x86 work just fine.

#######

[root@l82471 ~]# ./mlxup

Querying Mellanox devices firmware ...

 

 

Device #1:

----------

 

 

  Device Type:      ConnectX5

  Part Number:      MCX512A-ACA_Ax

  Description:      ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6

  PSID:             MT_0000000080

  PCI Device Name:  0002:01:00.0

  Base GUID:        98039b0300325f82

  Base MAC:         98039b325f82

  Versions:         Current        Available

     FW             16.22.1002     16.23.1020

     PXE            3.5.0403       3.5.0504

     UEFI           14.15.0019     14.16.0017

 

 

  Status:           Update required

 

 

---------

Found 1 device(s) requiring firmware update...

 

 

Perform FW update? [y/N]: y

 

#########

 

 

But it failed miserably on VMware vSphere 6.7, so I am still in firmware purgatory for that OS.

#######

[root@x395001:/tmp] ls

LenovoIMMLog.log                                                               pciList.txt

cimple_log_err_messages                                                        probe.session

fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin  scratch

ipp                                                                            sfcb

libmofc.log                                                                    vmware-root

lspci.txt                                                                      vmware_version.txt

mlxup                                                                          wbem-vm-report.xml

[root@x395001:/tmp] mlxup

-sh: mlxup: not found

[root@x395001:/tmp] ./mlxup --query

-sh: ./mlxup: Permission denied

[root@x395001:/tmp] chmod +x mlxup

[root@x395001:/tmp] ls

LenovoIMMLog.log                                                               pciList.txt

cimple_log_err_messages                                                        probe.session

fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin  scratch

ipp                                                                            sfcb

libmofc.log                                                                    vmware-root

lspci.txt                                                                      vmware_version.txt

mlxup                                                                          wbem-vm-report.xml

[root@x395001:/tmp] ./mlxup

-E- cannot use a string pattern on a bytes-like object

[root@x395001:/tmp]

#######

Re: Symbol Errors

Hi chandrahas amradkar,

 

There is no need to calculate symbol errors - the symbol error counter is available to query on each physical IB port.
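
On a Linux host, one place the per-port counter is typically exposed is sysfs, under /sys/class/infiniband/<device>/ports/<port>/counters/symbol_error. The Python sketch below reads it. The function names are made up for illustration, and the sysfs layout is an assumption to verify on your own system (perfquery from infiniband-diags is another way to query the same counter).

```python
from pathlib import Path

SYSFS_ROOT = "/sys/class/infiniband"


def read_port_counter(device, port, counter="symbol_error", root=SYSFS_ROOT):
    """Read one counter for one port, e.g. device 'mlx4_0', port 1."""
    path = Path(root) / device / "ports" / str(port) / "counters" / counter
    return int(path.read_text().strip())


def all_symbol_errors(root=SYSFS_ROOT):
    """Return {(device, port): symbol_error} for every port found."""
    results = {}
    pattern = "*/ports/*/counters/symbol_error"
    for counter_file in sorted(Path(root).glob(pattern)):
        # Layout: <root>/<device>/ports/<port>/counters/symbol_error
        port = int(counter_file.parents[1].name)
        device = counter_file.parents[3].name
        results[(device, port)] = int(counter_file.read_text().strip())
    return results


if __name__ == "__main__":
    for (device, port), value in all_symbol_errors().items():
        print(f"{device} port {port}: symbol_error={value}")
```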

Re: rx-out-of-buffer

Were you able to find answers to your query? I have the same question.


Re: rx-out-of-buffer

No, even though Mellanox force-accepted two answers. They apparently like good stats.

If you have any clue, don't hesitate to share.

My fear is that they will not disclose this information, as with any internals of the NICs. There are some internal buffers somewhere that we run out of. The queues' "imissed" counters queried via DPDK clearly report 0 loss, so it is not the queues that lack buffers, or if it is, there is a bug in the reporting. As with those infinite flow tables that work like magic, they probably won't say anything...

Re: rx-out-of-buffer

Hmm. That's not fair.

On a lighter note, it looks like the question-accepted stats and the mlx5 ethtool counter stats are both misleading.
