Channel: Mellanox Interconnect Community: Message List

Re: send_bw test between QSFP ports on Dual Port Adapter


Hi Dmitri,

 

For testing TCP performance on Windows we recommend the NTttcp tool, which is provided by Microsoft to test network performance. From the command line, kindly run the NTttcp test and provide the output.

 

For example:

Server side: ntttcp.exe -s -m 8,*,<client ip> -l 128k -a 2 -t 30

Client side: ntttcp.exe -r -m 8,*,<client ip> -l 128k -a 2 -t 30

 

For your reference, kindly see the download and documentation link:

https://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769

 

Please let me know about the results.

Karen.


Re: How to configure host chaining for ConnectX-5 VPI


Hi Daniel,

 

I wanted to thank you for these directions; they were very helpful. I was able to link three nodes together, all running Ubuntu 18.04, and got ~96 Gb/s between all the hosts using iperf2. I then took one of the boxes, loaded ESXi 6.7, and configured the same IP addresses on the two interfaces I had before. The VMware box cannot communicate with the others now, although the other Ubuntu boxes can still communicate through the NIC. When I run a tcpdump on the ESXi host I see the ARP requests being generated, but there is no response. Do you have any idea why the host chaining feature does not seem to work with ESXi?

 

Thanks

Shawn

Can't ibping Lid or GUID but can ping by ip


We are using an SB7790 unmanaged switch connected to:

  1. VMware (6.5) server with opensm on a guest CentOS VM (7.5) - Mellanox ConnectX-4
  2. Server with Ubuntu (16.04.5 LTS) - Mellanox ConnectX-4
  3. Everything on these has been updated

 

Successful items:

  • opensm is running (active) on the CentOS VM
  • ibstat shows all interfaces as Active and LinkUp
  • ibnetdiscover finds all connected interfaces
  • We can ping by IP to and from each server

 

Unsuccessful item:

  • Not able to ibping across switch

 

We're not sure what we might be missing.

 

We can't find many resources for further troubleshooting. Any help would be greatly appreciated!
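
In case it helps, this is the basic flow we are attempting. It is only a sketch; the LID and GUID below are placeholder values rather than our real ones:

# On the remote node: start the ibping responder (ibping needs a server running on the target)
ibping -S

# On the local node: ping the remote port by LID (placeholder LID taken from ibstat)
ibping -L 4

# Or ping by port GUID (placeholder GUID)
ibping -G 0xf452140300a1b2c3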

 

Thanks

Brian

Re: SN2100B v3.6.8004


Hi Reginald,

 

This is because of an enhanced security feature added in all versions starting from Mellanox Onyx/MLNX-OS 3.6.8004: HTTP is disabled by default, so the GUI cannot be reached over HTTP after upgrading to 3.6.8004 or above.

There are 2 possible solutions:

1.  Use HTTPS instead of HTTP to log into the GUI

2.  You can enable HTTP by using the following commands:

      switch(config)# no web https ssl secure-cookie enable

      switch(config)# web http enable

      switch(config)# write memory

Now you can use both HTTP and HTTPS connections to log into the GUI.
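
To double-check that HTTP is now enabled, the current web configuration can be displayed afterwards (this assumes the standard Onyx show command):

      switch(config)# show web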

 

Hope this helps

 

Thanks,

Pratik Pande

Re: Assign a MAC to a VLAN


Hi, that is not supported. The VLANs are separated only by port # and VLAN ID.

Soft-RoCE on mininet topology


Hi Team,

 

On two mininet VMs (on VirtualBox), I am able to run an RDMA client and server and can also send traffic using the rping tool (following the link: HowTo Configure Soft-RoCE).

 

Issue -

 

I have created a 1-switch, 1-host topology on each VM and connected the two switches using a GRE tunnel. (Host1 can ping Host2 and Host2 can ping Host1.)

 

When I tried to couple the veth with an rxe device, I got the error "sh: echo: I/O error".

 

Can you please advise on getting Soft-RoCE working on a mininet topology? The commands I am running are sketched below.
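
For reference, this is roughly how I am trying to create the rxe device on top of the mininet veth. It is only a sketch, and the interface name h1-eth0 is specific to my topology:

# Legacy rxe_cfg path (the echo into /sys that this performs is where I get the I/O error)
sudo rxe_cfg start
sudo rxe_cfg add h1-eth0

# Equivalent on newer kernels via the rdma tool from iproute2
sudo rdma link add rxe0 type rxe netdev h1-eth0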

 

Thanks

Re: How to configure host chaining for ConnectX-5 VPI


You're welcome!

I'm glad I helped someone after all the headache I went through for it.

 

I have no hands-on experience with VMware, so take all of this with a grain of salt.

 

My first thought is VLAN tags. I was told that VMware tags by default.

 

From my (limited) understanding and thoughts, host chaining inside VMware is not a good idea.

If you set up a virtual switch (on the VMware side), put both ports of the card on that switch, and give the switch an IP, that would allow vMotion and the like over the link at close to line speed, letting the switch (analogous to Open vSwitch) do all of the routing and fast-pathing.

 

Thoughts, if host chaining were in place:

VMware still sees both ports (and we can't assign IPs to raw port interfaces to start with).

It doesn't really know which port to send out of, so traffic could take the extra hop before it gets to the destination.

With three nodes, traffic intended to go A -> B might take the path A -> C -> B.

 

Where I can speak from experience is non-chaining speed.

We did try Open vSwitch with the cards and chaining off. So long as STP is turned on, we got nearly line speed; a rough sketch of that setup follows.
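
Roughly, the non-chaining setup looked like the following. This is only a sketch from memory: the MST device name, bridge name, interface names and IP are placeholders, and HOST_CHAINING_MODE is the same mlxconfig knob used to turn chaining on in the first place.

# Turn host chaining off and reset the firmware (placeholder device name)
mlxconfig -d /dev/mst/mt4119_pciconf0 set HOST_CHAINING_MODE=0
mlxfwreset -d /dev/mst/mt4119_pciconf0 reset

# Put both ports on an Open vSwitch bridge with STP enabled, and give the bridge the IP
ovs-vsctl add-br br-ib
ovs-vsctl add-port br-ib enp59s0f0
ovs-vsctl add-port br-ib enp59s0f1
ovs-vsctl set Bridge br-ib stp_enable=true
ip addr add 192.168.10.10/24 dev br-ib
ip link set br-ib up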

 

We opened a support ticket for our problems with MTU. It took a while, but we found the problem.

They have a nice little utility (sysinfo-snapshot) for dumping the card internals and OS config options, which helped us once we looked through its output.

Can the cable of an AOC be replaced?


Hi all,

 

I've got some FDR AOCs with damaged cables. I'm hoping to reuse the transceivers instead of scrapping them. I opened up the top panel on one of the transceivers and saw that the cable does disconnect internally. Are there replacement cables that have those little ferrules on the end, or an adapter that converts the transceiver into a standalone part?

 

Thank you


mlx5: ethtool -m not working


I have a ConnectX-4 2x100G card. I'm running Linux 4.16.16 (Fedora) with the mlx5_core kernel module installed. ethtool -m does not appear to work with this setup; other ethtool commands work fine, such as ethtool -S, ethtool -i and plain ethtool. I have an official Mellanox active optical cable transceiver plugged into the port. What is required to get the transceiver module info from the card? I've checked that the firmware is the latest version (MT_2150110033); this is part number MCX416A-CCAT.

 

$ ethtool -m enp9s0f0

Cannot get module EEPROM information: Input/output error

 

$ ethtool -i enp9s0f0

driver: mlx5_core

version: 5.0-0

firmware-version: 12.12.1100 (MT_2150110033)

expansion-rom-version:

bus-info: 0000:09:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes

 

$ lspci | grep Mel

09:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

09:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

 

$ ethtool enp9s0f0

Settings for enp9s0f0:

    Supported ports: [ FIBRE ]

    Supported link modes:   10000baseKR/Full

                            40000baseCR4/Full

                            40000baseSR4/Full

                            40000baseLR4/Full

                            25000baseCR/Full

                            25000baseSR/Full

                            50000baseCR2/Full

                            100000baseSR4/Full

                            100000baseCR4/Full

                            100000baseLR4_ER4/Full

    Supported pause frame use: Symmetric

    Supports auto-negotiation: Yes

    Supported FEC modes: Not reported

    Advertised link modes:  10000baseKR/Full

                            40000baseCR4/Full

                            40000baseSR4/Full

                            40000baseLR4/Full

                            25000baseCR/Full

                            25000baseSR/Full

                            50000baseCR2/Full

                            100000baseSR4/Full

                            100000baseCR4/Full

                            100000baseLR4_ER4/Full

    Advertised pause frame use: Symmetric

    Advertised auto-negotiation: Yes

    Advertised FEC modes: Not reported

    Speed: 100000Mb/s

    Duplex: Full

    Port: FIBRE

    PHYAD: 0

    Transceiver: internal

    Auto-negotiation: on

    Supports Wake-on: d

    Wake-on: d

    Current message level: 0x00000004 (4)

                   link

    Link detected: yes


CX5 - bad system state


I'm working with Xilinx Petalinux on a Xilinx PG213 core as root complex, so in general, there is no confidence in the HW or SW.

CX5 gets pretty far along before it fails with:

 

[    4.447417] pci 0000:01:00.0: calling mellanox_check_broken_intx_masking+0x0/0x168                                                                                

[    4.454965] mlx5_core 0000:01:00.0: runtime IRQ mapping not provided by arch                                                                                      

[    4.462017] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)                                                                                                

[    4.468151] mlx5_core 0000:01:00.0: enabling bus mastering                                                                                                        

[    4.473941] mlx5_core 0000:01:00.0: firmware version: 16.22.1002                                                                                                  

[    4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x4e2106)      

[    4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): func_id 0x0, npages 14972, err -5                                                                  

[    4.742890] mlx5_core 0000:01:00.0: failed to allocate init pages                                

 

Any clues as to whether this points to a HW problem or a SW problem?


Keeping two versions driver for two kernels


Hi,

 

How can I tell the installation script not to remove the old driver? I have two kernels (both needed):
1. CentOS 7.5 (./install --eth-only);
2. CentOS 7.5 + RT patch (compiled with ./install --eth-only --add-kernel-support).

Unfortunately, installing the driver for one kernel uninstalls the driver for the other. This effectively blocks using the latest drivers on both kernels.

 

Please help.

 

Best Regards,

Robert

Slow File Transfer On 20Gbps IB


Dear All,

 

I am new to InfiniBand devices. I bought two Mellanox ConnectX-2 (20 Gb/s) cards from eBay and installed them in two Debian servers (PCIe 3.0 x8) with no problem. I measured about 15 Gb/s with iperf3, as follows:

iperf3 -c 10.20.0.34

Connecting to host 10.20.0.34, port 5201

[  4] local 10.20.0.35 port 58208 connected to 10.20.0.34 port 5201

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[  4]   0.00-1.00   sec  1.85 GBytes  15.9 Gbits/sec    0   11.9 MBytes      

[  4]   1.00-2.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   2.00-3.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   3.00-4.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   4.00-5.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   5.00-6.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   6.00-7.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   7.00-8.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   8.00-9.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   9.00-10.00  sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Retr

[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec    0             sender

[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec                  receiver

 

 

But why do I only get about 150 MB/s (about 1.2 Gb/s) when transferring a large file (3.5 GB) via SCP and rsync?

I don't think disk I/O is the problem, because I transfer from and to a ramdisk.
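
For completeness, the transfers I am timing look roughly like this (only a sketch; the file name and ramdisk paths are examples, not my exact ones):

# Single 3.5 GB file, ramdisk to ramdisk, over the 20 Gb/s link
scp /mnt/ramdisk/test.img 10.20.0.34:/mnt/ramdisk/
rsync -av --progress /mnt/ramdisk/test.img 10.20.0.34:/mnt/ramdisk/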

 

I appreciate your help. Thank you very much.

Windows 2016 Storage Spaces Direct over IPoIB


Hello,

 

I am in need of some assistance regarding Ethernet vs. InfiniBand IPoIB and lossless networks.

 

We have a 3-node Windows 2016 Storage Spaces Direct cluster that was set up early last year, when documentation on S2D was still fairly sparse. We used InfiniBand IPoIB instead of Ethernet because we have been using it for years to connect our Hyper-V clusters to our Windows Storage SANs. The S2D setup is hyper-converged, with the storage network kept separate, so the storage traffic and the VM/Ethernet traffic are not on the same network.

 

We currently have a case open with Microsoft related to the Windows Server May rollup, which caused a problem with a virtual disk after a server restart. The MS engineers have stressed that the networking must be perfect and form a lossless network, including the RoCE and QoS setup.

 

Since we are using IPoIB, this has raised the question of whether our configuration is correct. Does InfiniBand IPoIB provide the resiliency needed for S2D traffic?

 

Please excuse me if the question seems too simple. I have been reading about RoCE and IPoIB for a couple of days and I think all the info is confusing me.

 

One added factor: since S2D was new at the time and there were a variety of unknowns, we included 4x 56 Gb ports (2x MCX354A-FCBT) in each node, the intent being to over-spec the network to reduce the possibility of congestion.
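
For what it's worth, these are the kinds of checks I have been running on each node while reading up on this. It is only a sketch using the in-box PowerShell cmdlets; the QoS ones mainly matter for RoCE/Ethernet rather than native InfiniBand:

# Is RDMA enabled on the IPoIB adapters?
Get-NetAdapterRdma

# Is SMB actually using the RDMA-capable interfaces / multichannel?
Get-SmbClientNetworkInterface
Get-SmbMultichannelConnection

# Any DCB/QoS policies present?
Get-NetQosPolicy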

 

Thanks,

 

Todd


Re: Web interface error on SX6036


Hi Andrew,

Can you provide the version of Mellanox OS running on the switch?

 

Thanks,

Pratik

Remote VTEP mac learning is not working


[Topology diagram (pastedImage_0.png): two Mellanox leaf switches connected to a Cisco N9k spine, with one host attached to each leaf]

 

I'm trying a VXLAN configuration with the above topology, with Mellanox switches (running Mellanox OS) as leaves and a Cisco N9k as the spine. Both hosts are configured with VLAN 10 tagging. The loopbacks on the leaf switches are reachable via the spine. swp16 is configured as the nve port, and VLAN 10 is bridged to VNI 10000 on both leaves. This is a controller-less configuration: the remote VTEPs are added manually using the CLI, and remote learning is enabled using the commands below.

protocol nve

interface nve 1

interface nve 1 vxlan source interface loopback 1

interface ethernet 1/16 nve mode only force

interface nve 1 nve bridge 10000

interface ethernet 1/16 nve vlan 10 bridge 10000

no interface nve 1 nve fdb flood load-balance

interface nve 1 nve fdb flood bridge 10000 address 3.3.3.3

interface nve 1 nve fdb learning remote

 

But the hosts are not able to ping each other. What could be the problem here?

I can see that the VTEP on each switch has learned the MAC address of its directly connected host, but it is unable to learn the MACs of the hosts behind the remote VTEP. I used the command below to check the learned MACs.

show interface nve 1 mac-address-table

Also, the nve counters increase when Host2 is pinged from Host1, but no packets go out of swp2.

show interface nve 1 counters

Re: Problem installing MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-x86_64


Hi Sebastian,

 

1) Have you validated, based on the Release Notes of the driver, that the following packages are installed:

 

apt-get install perl dpkg autotools-dev autoconf libtool automake1.10 \
  automake m4 dkms debhelper tcl tcl8.4 chrpath swig \
  graphviz tcl-dev tcl8.4-dev tk-dev tk8.4-dev bison flex dpatch \
  zlib1g-dev curl libcurl4-gnutls-dev python-libxml2 libvirt-bin \
  libvirt0 libnl-dev libglib2.0-dev libgfortran3 automake m4 \
  pkg-config libnuma logrotate ethtool lsof

 

2) Did you try to install the latest driver version, 4.4-2.0.7.0?

 

3) Can you run it with the following options:

 

./mlnx_add_kernel_support.sh --make-tgz -t /var/tmp/MOFED -k `uname -r` -s /usr/src/kernels/`uname -r` -m . -n MLNX_OFED_LINUX-4.4-2.0.7.0-ubuntu18.04-x86_64-`uname -r` -v

 

Possibly add: --distro ubuntu18.04

 

Sophie.

Re: when using write op with more than 1024B(MTU) in softroce mode,the operation fail


Hi Tianyu,

 

Have you properly configured Soft-RoCE, whether with the upstream driver or the Mellanox OFED driver?

See reference links below:

 

HowTo Configure Soft-RoCE

How to configure Soft-RoCE with Mellanox OFED 4.x

 

Also, your original statement is confusing or contradicts itself:

 

when my write opcode with length=1024, it is ok. but when length=1025 in the same code, it will fail.

when the same code with length=1024 or 1025 run using mellanox CX4 card, it is ok >>> Apparently working.

 

Sophie.

Re: How to configure host chaining for ConnectX-5 VPI


Hi,

 

I have a problem pinging between the NICs; this is my configuration:

 

SERVER 1: PORT1:192.168.10.10 PORT2: 192.168.10.11

SERVER 2: PORT1:192.168.10.12 PORT2: 192.168.10.13

SERVER 3: PORT1: 192.168.10.14 PORT2: 192.168.10.15

 

mlxconfig -d mt4119-pciconf0 set LINK_TYPE_P1=2  LINK_TYPE_P2=2

mlxconfig -d mt4119-pciconf0 set HOST_CHAINING_MODE=1

mlxfwreset --device mt4119_pciconf0 reset

 

All the commands work perfectly, but I can only ping between directly interconnected ports; I need to be able to ping all ports.

 

Is my configuration correct?
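
For reference, this is how I verify the settings after the reset (only a sketch; the device name matches the one in the mlxconfig commands above):

# Confirm both ports are Ethernet and host chaining is enabled
mlxconfig -d mt4119-pciconf0 query | grep -iE "LINK_TYPE|HOST_CHAINING"

# Check that the interfaces and their addresses are up
ip -br addr show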


