Mellanox Interconnect Community: Message List

Re: kworker has a high CPU usage


Hello,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, we need some additional information to debug the issue:
MLNX_OFED version used
OS release and kernel version used
How to reproduce the issue
One quick way to collect the version details is shown below.
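For reference, a minimal sketch of commands that collects this information on a host with MLNX_OFED installed (ofed_info ships with MLNX_OFED; adjust for your distribution):

# MLNX_OFED version
ofed_info -s

# OS release and kernel version
cat /etc/os-release
uname -r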

Many thanks.
~Mellanox Technical Support 


Re: Help with Mellanox MHGA28-XTC InfiniHost III Ex Installation on Windows


Hi Graham,

Thank you for posting your question on the Mellanox Community.

Unfortunately, the Mellanox InfiniHost III adapter MHGA28-XTC has been EOL for a while. Our latest drivers do not provide any support for this HCA.

I found the following post online regarding the bring-up of these HCAs under Windows 7:
http://andy-malakov.blogspot.com/2015/03/connecting-two-windows-7-computers-with.html

Hopefully the post's instructions will resolve your issue.

Thanks and regards,
~Mellanox Technical Support

Hang occurred when mounting GlusterFS using driver 4.4.2 for MT27500 Family [ConnectX-3] on CentOS 7.1


Hello,

After I changed the driver from 4.1 to 4.4.2 on CentOS 7.1, the system sometimes hangs when I mount and unmount GlusterFS.

Below is a screenshot of the console output and the dmesg output during the mount.

Everything is fine with driver 4.1. I don't know how to debug this problem, and it has blocked me for a long time.

I need some advice on how to work around this problem. Thanks.

 

[Attached console screenshot: rdma_20181108135914.png]

 

[Fri Nov 2 15:07:38 2018] WARNING: at /var/tmp/OFED_topdir/BUILD/mlnx-ofa_kernel-4.4/obj/default/drivers/infiniband/core/cma.c:666 cma_acquire_dev+0x268/0x280 [rdma_cm]()

[Fri Nov 2 15:07:38 2018] Modules linked in: fuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop bonding rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) iTCO_wdt dcdbas iTCO_vendor_support mxm_wmi intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses ipmi_devintf enclosure pcspkr ipmi_si ipmi_msghandler wmi acpi_power_meter shpchp lpc_ich mei_me sb_edac edac_core mei ip_tables xfs libcrc32c mlx4_ib(OE) mlx4_en(OE)

[Fri Nov 2 15:07:38 2018] ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 crct10dif_pclmul drm_kms_helper crct10dif_common crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mpt3sas raid_class drm scsi_transport_sas nvme ahci ixgbe libahci igb mdio libata ptp i2c_algo_bit mlx4_core(OE) i2c_core pps_core megaraid_sas devlink dca mlx_compat(OE) fjes dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate sg

[Fri Nov 2 15:07:38 2018] CPU: 10 PID: 18958 Comm: glusterfs Tainted: P W OE ------------ 3.10.0-514.26.2.el7.x86_64 #1

[Fri Nov 2 15:07:38 2018] Hardware name: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS 2.4.3 01/17/2017

[Fri Nov 2 15:07:38 2018] 0000000000000000 000000009310e9fa ffff8807f176bcf0 ffffffff81687133

[Fri Nov 2 15:07:38 2018] ffff8807f176bd28 ffffffff81085cb0 ffff8810498cec00 0000000000000000

[Fri Nov 2 15:07:38 2018] 0000000000000001 ffff88104f5a71e0 ffff8807f176bd60 ffff8807f176bd38

[Fri Nov 2 15:07:38 2018] Call Trace:

[Fri Nov 2 15:07:38 2018] [<ffffffff81687133>] dump_stack+0x19/0x1b

[Fri Nov 2 15:07:38 2018] [<ffffffff81085cb0>] warn_slowpath_common+0x70/0xb0

[Fri Nov 2 15:07:38 2018] [<ffffffff81085dfa>] warn_slowpath_null+0x1a/0x20

[Fri Nov 2 15:07:38 2018] [<ffffffffa0aed1c8>] cma_acquire_dev+0x268/0x280 [rdma_cm]

[Fri Nov 2 15:07:38 2018] [<ffffffffa0af214a>] rdma_bind_addr+0x85a/0x910 [rdma_cm]

[Fri Nov 2 15:07:38 2018] [<ffffffff8120e5e6>] ? path_openat+0x166/0x490

[Fri Nov 2 15:07:38 2018] [<ffffffff8168a982>] ? mutex_lock+0x12/0x2f

[Fri Nov 2 15:07:38 2018] [<ffffffffa082c104>] ucma_bind+0x84/0xd0 [rdma_ucm]

[Fri Nov 2 15:07:38 2018] [<ffffffffa082b71b>] ucma_write+0xcb/0x150 [rdma_ucm]

[Fri Nov 2 15:07:38 2018] [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0

[Fri Nov 2 15:07:38 2018] [<ffffffff810ad1ec>] ? task_work_run+0xac/0xe0

[Fri Nov 2 15:07:38 2018] [<ffffffff811ff51f>] SyS_write+0x7f/0xe0

[Fri Nov 2 15:07:38 2018] [<ffffffff81697809>] system_call_fastpath+0x16/0x1b

[Fri Nov 2 15:07:38 2018] ---[ end trace c97345452e609a78 ]---

[Fri Nov 2 15:07:38 2018] ------------[ cut here ]------------

[Fri Nov 2 15:07:38 2018] WARNING: at /var/tmp/OFED_topdir/BUILD/mlnx-ofa_kernel-4.4/obj/default/drivers/infiniband/core/cma.c:666 cma_acquire_dev+0x268/0x280 [rdma_cm]()

[Fri Nov 2 15:07:38 2018] Modules linked in: fuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop bonding rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) iTCO_wdt dcdbas iTCO_vendor_support mxm_wmi intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses ipmi_devintf enclosure pcspkr ipmi_si ipmi_msghandler wmi acpi_power_meter shpchp lpc_ich mei_me sb_edac edac_core mei ip_tables xfs libcrc32c mlx4_ib(OE) mlx4_en(OE)

[Fri Nov 2 15:07:38 2018] ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 crct10dif_pclmul drm_kms_helper crct10dif_common crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mpt3sas raid_class drm scsi_transport_sas nvme ahci ixgbe libahci igb mdio libata ptp i2c_algo_bit mlx4_core(OE) i2c_core pps_core megaraid_sas devlink dca mlx_compat(OE) fjes dm_mirror dm_region_hash dm_log dm_mod zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate sg

Re: rx-out-of-buffer


So Arvind, I found at least one "lying" counter: imissed is not implemented in the mlx5 driver.

Re: rx-out-of-buffer


So I could finally answer this specific question with help from support:

 

The "imissed" counter, i.e. the number of packets that could not be delivered to a queue, is not implemented in the DPDK Mellanox driver. So the only way to know whether a specific queue dropped packets is to track it with eth_queue_count, for which I added support in the DPDK driver; it is coming in the next release.

rx_out_of_buffer is actually what imissed should have been, aggregated over all queues: the number of packets dropped because your CPU does not consume them fast enough.

 

In our case, rx_out_of_buffer did not explain all the drops.

 

So we observed that rx_packets_phy was higher than rx_good_packets. Actually, if you look at ethtool -S (which contains more counters than the DPDK xstats), you will also see rx_discards_phy.

If there is no intrinsic error in the packets (checksums, etc.), you'll have rx_packets_phy = rx_good_packets + rx_discards_phy + rx_out_of_buffer.

So rx_discards_phy is actually (as stated in the doc mentioned above) the number of packets dropped by the NIC, not because there were not enough buffers in the queue, but because there is some congestion in the NIC or the bus.
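For anyone who wants to check the same relationship outside of DPDK, here is a quick sketch with ethtool (the interface name ens1f0 is just an example, and some counter names differ slightly between ethtool -S and the DPDK xstats):

# Dump the mlx5 counters discussed above
ethtool -S ens1f0 | grep -E 'packets_phy|good_packets|discards_phy|out_of_buffer'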

We're now investigating why that happens, but this question is resolved.

 

Tom

VXLAN offload/RSS not working under OVS switch scenario


Hi folks,

We are using two Mellanox ConnectX-5 EN (2-port 100GbE) cards to validate performance under an OpenStack VXLAN tenant scenario using OVS bridges. The Mellanox cards are NOT in SR-IOV mode and are currently connected back-to-back to create the VXLAN tenant network, with core IP addresses applied directly to the ports. It is our understanding from reading Mellanox literature that under this scenario the card should be offloading the VXLAN UDP encap/decap more or less automatically (as per the OVS 'vxlan' interface configuration); however, at the moment this does not seem to be occurring. Traffic is being generated across multiple VNIDs, yet all ingress traffic from the vxlan/tenant network is processed by a single-core softirq process. We are not able to verify whether this is because something is not set up properly with regard to the Mellanox card/drivers, or because we do not have RSS tuned as needed. Further details below; any assistance with configuration/debug suggestions or updated documentation for this scenario under ConnectX-5 would be appreciated!

 

Further details:

We have been reading the following document, although it targets ConnectX-3: "HowTo Configure VXLAN for ConnectX-3 Pro (Linux OVS)". This document shows several items to be configured (DMFS and the VXLAN port number), but it seems both of these should already be enabled under ConnectX-5 and the 'mlx5_core' driver. Also, the recommended debug/log steps in this document are no longer supported under ConnectX-5. In our setup we are using the apt package install for OVS.
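For context, a few generic checks that can be run on this kind of setup (the interface name ens1f0 is a placeholder; this is only a sketch, not an authoritative Mellanox procedure):

# Confirm the NIC advertises the UDP tunnel (VXLAN) segmentation offloads
ethtool -k ens1f0 | grep udp_tnl

# Inspect the RSS indirection table and the RX hash fields used for UDP traffic
ethtool -x ens1f0
ethtool -n ens1f0 rx-flow-hash udp4

# See which CPU cores are servicing the mlx5 interrupts
grep mlx5 /proc/interrupts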

 

 

--------------------------------------------------------------------

Mellanox card:  Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Server version:   Ubuntu 18.04  (kernel 4.15.0-36-generic)

OVS version:   ovs-vsctl (Open vSwitch) 2.9.0

 

 

root@us-ral-1a-e1:~# mlxfwmanager --query

Querying Mellanox devices firmware ...

 

 

Device #1:

----------

  Device Type:      ConnectX5

  Part Number:      MCX516A-CCA_Ax

  Description:      ConnectX-5 EN network interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket; ROHS R6

  PSID:             MT_0000000012

  PCI Device Name:  /dev/mst/mt4119_pciconf0

  Base GUID:        98039b03004dd018

  Base MAC:         98039b4dd018

  Versions:         Current        Available

     FW             16.23.1020     N/A

     PXE            3.5.0504       N/A

     UEFI           14.16.0017     N/A

 

  Status:           No matching image found

root@us-ral-1a-e1:~#

In-Cast or Micro Burst on SN2000 Series Switch


Hi ...,

 

I would like to know whether a 100Gbps NIC is able to achieve full 100Gbps speeds, without RoCE / RDMA / VMA.

 

Consider a compute farm like scenario.

Say there is a NAS with a 100Gbps NIC, connected to an SN2100 switch.

The 100Gbps switch is in turn connected to 4 x 48-port 1Gbps switches using a 40G link to each switch.

i.e. 192 x 1Gbps client computers

 

|---------|                   |--------|  ---40G--->  | 48 Port 1G Switch |
|  NAS    |  ---100G NIC--->  | SN2100 |  ---40G--->  | 48 Port 1G Switch |
| Storage |                   | Switch |  ---40G--->  | 48 Port 1G Switch |
|---------|                   |--------|  ---40G--->  | 48 Port 1G Switch |

 

Now, if all the 192 x 1Gbps clients were to read files from the storage at the same time, will the NAS NIC be able to serve at 100Gbps (assuming that there are no bottlenecks in the storage system itself)?

 

Regards,

 

Indivar Nair

Status of RDMA over Resilient Ethernet


Dear All,

 

At SIGCOMM 2018, Mellanox announced support for RoCEv2 over Resilient Ethernet, with HCA packet-drop detection (and out-of-order handling), correct?

There is also this FAQ, but no coding procedure is provided:

Introduction to Resilient RoCE - FAQ

 

How is it possible to use this feature with a ConnectX-5 using IBV_SEND over a UD (Unreliable Datagram) queue pair?

 

Thanks for your attention.


SX6025 and QSFP-LR4-40G


Hi,

 

we have multiple SX6025 switches and want to connect them within 2 different rooms.

We bought 4 QSFP-LR4-40G modules LC/LC 1310.

If we connect them to the switches, we don't get the link up and running.

Is this possible, or are we using the wrong switches/optics?

Do we have to configure opensm to accept 40Gbit interfaces?
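For reference, a couple of standard fabric-diagnostic commands (from the infiniband-diags package, run on any host attached to the fabric) that can show what opensm and the switches currently report; this is only a sketch of possible checks, not a known fix:

# List the switches the subnet manager currently sees
ibswitches

# Show state, width and speed of every link in the fabric
iblinkinfo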

Best regards,

Volker

ESXi host on NEO


Hi all!

 

The managed hosts supported by Mellanox NEO are Linux and Windows. Is there any kind of roadmap to add ESXi hosts?

 

Regards!

Unable to set Mellanox ConnectX-3 to Ethernet (Failed to query device current configuration)


I have three Mellanox ConnectX-3 cards that I'm trying to set up with Proxmox (see: Proxmox Installer does not see Mellanox ConnectX-3 card at all? | Proxmox Support Forum).

I need to change them from Infiniband mode to Ethernet mode.

I was able to install the Mellanox Management Tools, and they can see my card:

root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pciconf0 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:41:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 01
/dev/mst/mt4099_pci_cr0  - PCI direct access.
                           domain:bus:dev.fn=0000:41:00.0 bar=0xd4f00000 size=0x100000
                           Chip revision is: 01

However, when I tried to query the current config, it complained that the firmware version was too old:

root@gcc-proxmox:~/mft-4.10.0-104-x86_64-deb# mlxconfig -d /dev/mst/mt4099_pciconf0 q
-E- Failed to open device: /dev/mst/mt4099_pciconf0. Unsupported FW (version 2.31.5000 or above required for CX3/PRO)

So I updated the firmware:

root@gcc-proxmox:~# flint -d /dev/mst/mt4099_pci_cr0 -i fw-ConnectX3-rel-2_42_5000-MCX311A-XCA_Ax-FlexBoot-3.4.752.bin burn

    Current FW version on flash:  2.10.4290
    New FW version:               2.42.5000


Burn process will not be failsafe. No checks will be performed.
ALL flash, including the Invariant Sector will be overwritten.
If this process fails, computer may remain in an inoperable state.


 Do you want to continue ? (y/n) [n] : y
Burning FS2 FW image without signatures - OK
Restoring signature                     - OK
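For what it's worth, the freshly burned image can be double-checked with flint before retrying mlxconfig (a quick sketch, using the same device path as above):

# Query the firmware currently on flash
flint -d /dev/mst/mt4099_pci_cr0 query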

But now, when I try to read the config, I get a new error:

root@gcc-proxmox:~# mlxconfig -d /dev/mst/mt4099_pciconf0 q


Device #1:
----------


Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0


Configurations:                              Next Boot
-E- Failed to query device current configuration

Any ideas what's going on, or how to get these cards working in permanent Ethernet mode?
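For reference, once the query works, the command I expect to need for switching the ports to Ethernet persistently is something like the following (LINK_TYPE values per the mlxconfig documentation, where 2 = ETH; shown only to illustrate the goal, not as a fix for the query error):

# Set port 1 to Ethernet mode (add LINK_TYPE_P2=2 on dual-port cards), then reboot
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2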

Re: Concurrent INFINIBAND multicast writers


Hi Wayne,

 

Also, this is expected behaviour: if the number of writers increases and exceeds the number of readers, congestion is expected.

Re: Concurrent INFINIBAND multicast writers


Thank you for the input. I will pull the data for the previous comment.

Re: Hang occurred when mounting GlusterFS using driver 4.4.2 for MT27500 Family [ConnectX-3] on CentOS 7.1


Hello Malfe,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, we are not able to debug the issue you are experiencing. Currently, we have no reported issues with GlusterFS and MLNX_OFED 4.4.

We recommend opening a Mellanox Support case via support@mellanox.com to investigate this issue further.

Thanks and regards,
~Mellanox Technical Support

Re: ESXi host on NEO


Hi Diego

 

I have checked internally, and the input I have is that Mellanox does not support ESXi on NEO, nor is it on our roadmap to support it in the future.


Re: In-Cast or Micro Burst on SN2000 Series Switch


With no RoCE / RDMA / VMA, and as per the configuration you've presented: theoretically, the NAS NIC of the storage node is able to serve 100Gb/s, but in practice the switch is likely to cause traffic congestion. In more detail, when at the same time a total of 192Gb/s of traffic is funneled through 160Gb/s of switch uplinks (4 x 40Gb) and hits an adapter with "only" 100Gb/s capability, traffic will go into a congestion state and throughput will drop drastically because the switch runs into buffer overflow.

A workaround would be to enable flow control (pause frames) on the adapter interface and the switch ports.
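On the adapter side, a minimal sketch of enabling global pause frames with ethtool (the interface name is a placeholder; the matching switch-port configuration depends on the switch OS):

# Enable RX/TX pause (flow control) on the NIC
ethtool -A ens1f0 rx on tx on

# Verify the current pause settings
ethtool -a ens1f0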

Re: sending order of 'segmented' UDP packets


Hi Sofia,

 

Could you be more specific about what is not working? In the code you mentioned "without 0x05 order it ok", at the beginning of the comment "order seems to be random", and later, "sending order gets ok".

By the way, how do you capture the packets to see the order?

Re: Unable to set Mellanox ConnectX-3 to Ethernet (Failed to query device current configuration)


I have the same issue: mlxconfig cannot read the current device configuration (I want to enable SR-IOV; my cards are already in Ethernet mode). I also noticed that the BIOS screen cannot save the SR-IOV configuration, claiming "access denied". Any help would be much appreciated.

/Louis

Re: SX6025 and QSFP-LR4-40G


Any ideas? Do we have to configure link speed in opensm.conf or partitions.conf?

 

opensm.log didn't recognize that we plugged in a new interface, and we got no link signal.

 

Regards,

Volker

Inconsistent hardware timestamping? ConnectX-5 EN & tcpdump


Hi all,

 

We recently purchased an MCX516A-CCAT from the Mellanox webstore, but encountered the following issue when trying to do a simple latency measurement using hardware timestamping.
We use the following command to retrieve system timestamps:

ip netns exec ns_m0 tcpdump --time-stamp-type=host --time-stamp-precision=nano

Which gives the following results (for example):

 

master/server:
19:36:03.883442258 IP15
19:36:03.883524725 IP15
19:36:03.883678497 IP15
19:36:03.883703809 IP15
19:36:03.883924377 IP15
19:36:03.883939231 IP15
19:36:03.883971437 IP15
19:36:03.883985143 IP15
19:36:03.884010765 IP15
19:36:03.884021139 IP15
19:36:03.884051422 IP15
19:36:03.884062029 IP15
19:36:03.884083780 IP15
19:36:03.884091661 IP15
19:36:03.884127283 IP15
19:36:03.884135654 IP15
19:36:03.884159177 IP15
19:36:03.884167900 IP15
19:36:03.884187810 IP15
19:36:03.884197308 IP15

slave/client:
19:36:03.883379688 IP15
19:36:03.883590507 IP15
19:36:03.883659403 IP15
19:36:03.883716669 IP15
19:36:03.883914510 IP15
19:36:03.883947770 IP15
19:36:03.883961851 IP15
19:36:03.883994953 IP15
19:36:03.884005137 IP15
19:36:03.884030823 IP15
19:36:03.884046094 IP15
19:36:03.884068390 IP15
19:36:03.884078674 IP15
19:36:03.884100314 IP15
19:36:03.884119333 IP15
19:36:03.884141135 IP15
19:36:03.884152060 IP15
19:36:03.884173955 IP15
19:36:03.884182438 IP15
19:36:03.884203057 IP15

This is expected: the timestamps are in chronological order. About the traffic: small, equal-sized packets are bounced back and forth, and the client initiates the traffic generation. So for the client the odd-numbered timestamps are outgoing, and vice versa for the server.

But now, when using hardware timestamping, we get the following (for example):

ip netns exec ns_m0 tcpdump --time-stamp-type=adapter_unsynced --time-stamp-precision=nano

 

master/server:
14:44:04.710315788 IP15
14:44:04.758545873 IP15
14:44:04.710567282 IP15
14:44:04.758799830 IP15
14:44:04.710849394 IP15
14:44:04.759069396 IP15
14:44:04.711042879 IP15
14:44:04.759236686 IP15
14:44:04.711141554 IP15
14:44:04.759281897 IP15
14:44:04.711184281 IP15
14:44:04.759324535 IP15
14:44:04.711224345 IP15
14:44:04.759364437 IP15
14:44:04.711266610 IP15
14:44:04.759406555 IP15
14:44:04.711310310 IP15
14:44:04.759449711 IP15
14:44:04.711349465 IP15
14:44:04.759488431 IP15

slave/client:
14:44:04.758411898 IP15
14:44:04.710425435 IP15
14:44:04.758680982 IP15
14:44:04.710662581 IP15
14:44:04.758963612 IP15
14:44:04.710928565 IP15
14:44:04.759157087 IP15
14:44:04.711098779 IP15
14:44:04.759261251 IP15
14:44:04.711140994 IP15
14:44:04.759302503 IP15
14:44:04.711182978 IP15
14:44:04.759344893 IP15
14:44:04.711223669 IP15
14:44:04.759384802 IP15
14:44:04.711267547 IP15
14:44:04.759428520 IP15
14:44:04.711308661 IP15
14:44:04.759469128 IP15
14:44:04.711351810 IP15

Now we can see that the timestamps are not chronological (see the nanosecond portions). This is unexpected and makes the latency measurement impossible (as far as I can see). I expected both ports to be on their own clocks, but this does not appear to be the case; there seems to be one clock for RX and one clock for TX, instead of a clock per port. Is there a solution to this? Must I use socket ancillary data in a custom C application to receive the correct timestamps? I'll put information on the setup below; please let me know if more information is needed. Note: applications like linuxptp do seem to work fine with hardware timestamping and give a path delay in the sub-microsecond range.
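As a first sanity check, the timestamping capabilities and PTP clocks that the driver exposes can be inspected like this (interface names taken from the setup below; a sketch of a check, not a resolution):

# Show hardware timestamping capabilities and the PTP hardware clock index per port
ethtool -T ens6f0
ethtool -T ens6f1

# Each PTP hardware clock exposed by the card appears here
ls /sys/class/ptp/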

The setup:

[Setup diagram: meas_E2E_exp (1).png]

CentOS Linux 7.

Kernel 3.10.0-862.14.4.el7.x86_64 (default kernel for CentOS 7.5 installation).

Mellanox OFED, latest firmware & drivers.

Using network namespaces ns_m0 with ens6f0 and ns_m1 with ens6f1 to prevent kernel loopback.
