Channel: Mellanox Interconnect Community: Message List

sending order of 'segmented' UDP packets


Hi,

When creating a UDP packet, I use two mbufs: one containing the Ethernet/IP/UDP headers (hdr) and another holding the payload (pay):

 

struct rte_mbuf* hdr = rte_pktmbuf_alloc(hdrmp);  // first segment: Ethernet/IPv4/UDP headers
struct rte_mbuf* pay = rte_pktmbuf_alloc(paymp);  // second segment: payload

// fill Ethernet, IPv4 and UDP headers
...
ip_hdr->version_ihl = 0x40 | 0x05; // (*) without the 0x05 (IHL) the sending order is ok
...

// set segment sizes and chain the two mbufs
hdr->data_len = sizeof(struct ether_hdr) + sizeof(struct ipv4_hdr) + sizeof(struct udp_hdr);
pay->data_len = payloadSize;

hdr->pkt_len  = hdr->data_len + pay->data_len;  // total packet length, stored in the first segment
pay->pkt_len  = hdr->pkt_len;

hdr->next     = pay;
hdr->nb_segs  = 2;

 

When sending many such UDP packets using rte_eth_tx_burst(), all of them are sent correctly, but the sending order seems to be random. When using just a single mbuf per UDP packet, the sending order always matches the order of the packets in the tx array, which is what I expect. With the header/payload separation approach, if I omit the IP header length information (the 0x05 IHL marked (*) above), which produces an invalid IP packet, the sending order is correct again.

I'm using the mlx5 PMD; the NIC is a ConnectX-5. Could some offload mechanism be influencing the sending order? Maybe someone can help.

 

Thanks and best regards

Sofia Baran


Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!


Check the Mellanox OFED user manual, section 3.2.2, for additional details about the Subnet Manager. It is a service that can be enabled/disabled, and by default it uses the /etc/opensm/opensm.conf file.

Regarding the GUID (it should be the Port GUID): if you have only one connected port, the application will detect it and start the SM on it, so keep it simple.
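
For example, a minimal sketch (assuming the MLNX_OFED packaging where the service is named opensmd; the service name may differ on your distribution):

# show the Port GUIDs of the local HCA ports
ibstat | grep -i "port guid"

# enable and start the subnet manager service
systemctl enable opensmd
systemctl start opensmd

# optionally pin the SM to a specific Port GUID via the config file
grep -i guid /etc/opensm/opensm.conf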

Re: tgtd disconnected frequently


To begin with, what is the hardware? Are you using Mellanox OFED or the Inbox driver? I would suggest installing Mellanox OFED - http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers .

Next, simplify and debug the issue using only one connection (you have six at the moment).

Use ibdump/tcpdump (with the sniffer flag enabled) to collect the data on the sender/receiver, load it in Wireshark, and follow the packets to see who terminates the connection.

For additional details on how to use ibdump/tcpdump (sniffer), check the Mellanox OFED manual: http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v4_4.pdf
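
For example (a minimal sketch; the HCA/port/interface names and the output path are placeholders):

# capture IB/RoCE traffic on port 1 of the first HCA into a pcap file for Wireshark
ibdump -d mlx4_0 -i 1 -w /tmp/capture.pcap

# or capture plain Ethernet/iSCSI traffic on the corresponding netdev
tcpdump -i eth1 -w /tmp/capture.pcap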

Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!


Thank you, the Mellanox user manual has a wealth of information on OpenSM. I'll check the settings and create/check the log files, and will report back with the active Port GUID.

Re: Not able to program the tc flows into the ConnectX-4 card.


Hello Suresh,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, we cannot determine the root cause of the issue you are experiencing.

As we only support ASAP2 on a limited set of operating systems, please refer to the ASAP2 Release Notes (https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_Release_Notes_v4.4.pdf), sections 1.1 and 1.2, for the Software Requirements and Supported Operating Systems.

Also please note:
"In case of offloading VXLAN, the PF should not be added as a port in the OVS data-path but rather be assigned with the IP address to be used for encapsulation."

If you want us to debug this issue further, we recommend opening a Mellanox Support case by sending an email to support@mellanox.com

Thanks and regards,
~Mellanox Technical Support

Re: Add iPXE support for Connectx-3-Pro MT27520


Hi Bal,

 

Based on the 'flint' output, you are using a Mellanox HP OEM card (HP Ethernet 10Gb 2-port 546FLR-SFP+ Adapter, 779799-B21). Given this, you should contact HP for support, as HP is the party able to change the capabilities of this adapter.

 

Thanks and regards,

~Mellanox Technical Support

Re: rx-out-of-buffer


Hi Tom,

 

My apologies for not providing the link to the Mellanox Community document. The link is -> Understanding mlx5 ethtool Counter

 

The "rx_out_of_buffer" counter from 'ethtool -S' indicates RX packet drops due to lack of receive buffers. The lack of receive buffers can be related to a system tuning issue or system capability.

 

What happens when you turn off 'interrupt coalescence' on the NIC with the following command? -> # ethtool -C <int> adaptive-rx off rx-usecs 0 rx-frames 0

 

Also make sure you disable flow control on the NIC and set the PCI Max Read Request to '4096'. Link to document -> Understanding PCIe Configuration for Maximum Performance
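
For example (a rough sketch; replace <int> with the interface name and 81:00.0 with your adapter's PCI address; the exact setpci value to write depends on the current register contents, as explained in the linked document):

# disable flow control on the interface
ethtool -A <int> rx off tx off

# verify the current Max Read Request size
lspci -s 81:00.0 -vv | grep MaxReadReq

# on ConnectX adapters the Device Control register is commonly read at offset 0x68;
# a leading nibble of 5 corresponds to MaxReadReq = 4096 bytes
setpci -s 81:00.0 68.w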

 

Thanks and regards,
~Mellanox Technical Support

Re: Disable offloading of tag matching on ConnectX-5?


Hello Karlo,

Thank you for posting your question on the Mellanox Community.

Unfortunately, you cannot disable the capability on the adapter itself.

However, TM offload is disabled by default and can be enabled when you are using UCX.
For example:

  • UCX_RC_TM_ENABLE=y -> h/w Tag Matching Offload enabled
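
To illustrate, a hedged sketch of enabling it for a UCX run (the ucx_perftest invocation and the transport selection are only illustrative):

# run a UCX tag-matching benchmark with hardware Tag Matching offload enabled
UCX_TLS=rc UCX_RC_TM_ENABLE=y ucx_perftest -t tag_bw <server_hostname>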


For more information, please see the following link -> https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/141230081/Understanding+Tag+Matching+Offload+on+ConnectX-5+Adapters

Thanks and regards,
~Mellanox Technical Support


Re: Line rate using Connect_X5 100G EN in Ubuntu; PCIe speed difference;


The OP should be able to reach the full 100G even with PCIe 3.0 (x16).

Login directly to Enable Mode


Hi,

My customer is using ONYX and they want to go directly to Enable mode after login. Is that possible? Thank you.

 

BR,

 

Donny Hariady

Copper SFP vs Optical SFP: Which One Is the Best to Use?


The conflict of copper vs fiber has raged for years. Until now, fiber has operated more as a rival to copper than as a substitute, and it has already established its place within the enterprise. However, with recent advances in copper technology, copper offers the same step-ladder upgrade path, and the speed difference between the two media is considerably smaller. In some ways copper matters most to IT professionals and data center decision makers, but many end-user groups still face a hard choice about which type is the best overall value for their current and projected future needs. This conflict is also being waged in SFP transceivers, where there are measurable differences between copper SFP and optical SFP. This article explores their respective strengths and weaknesses and offers insights into how IT specialists should proceed.

Copper SFP vs Optical SFP: Copper SFP Is a Balanced Choice under Environmental Restrictions

The Gigabit RJ45 copper SFP transceiver supports 1000Mbps over Cat5 cables with an RJ45 connector interface, operating on standard Cat5 unshielded twisted-pair copper cabling with link lengths of up to 100 m (328 ft). GLC-T is a typical Cisco 1000BASE-T SFP copper RJ-45 transceiver. For short-distance links on a Gigabit switch, it makes no difference whether you use SFP ports or RJ45 ports to interconnect the switches. Copper SFPs are popular for short-range uplinks, as it is simpler and cheaper to use 1G copper SFPs and patch cables, while SFP ports are mostly intended to allow fiber connections over longer distances. A copper SFP makes particular sense when the switch on one side has no copper ports but only SFP slots, while the switch on the other side has only copper ports and cannot be fitted with fiber ports. Or, if you don't need the reach of fiber, you can consider converting SFP to RJ45, relying on the switch to determine which copper speeds (10/100/1000) are supported on a copper SFP. Moreover, using copper SFPs to connect regular copper Gigabit ports is a smart way to make the best use of the available SFP slots on existing switches.

Copper SFP vs Optical SFP: Fiber SFP Is More Flexible over Long Distances

Optical fiber SFP modules with LC or SC optical connectors are available for Fast Ethernet and Gigabit Ethernet, and these SFP modules are industrially rated to perform in the most demanding operating environments. Fiber SFP modules offer different wavelengths and optical power budgets to enable distances from 550 m to 120 km. A range of 1Gbps SFP modules for different distances can be found at FS.COM. Some statistics also show that legacy SFP can hit 4.25 Gb/s at 150 m, or up to 1.25 Gb/s for 160 km runs, with a spread of ranges and speeds in between depending on the type of fiber. Generally, when the run is longer than 328 ft/100 m, a fiber SFP module should be considered instead of a copper SFP RJ45 module, since 1000Mbps can only go as far as 100 m over copper cabling. In that sense, optical fiber SFP offers a substantial advantage over copper SFP.


Copper SFP vs Optical SFP

Operating Temperature

For standard fiber SFPs and copper SFPs there is no difference in rated operating temperature: they support 0 to 70°C (32 to 158°F) case temperature by default. In practice, though, more heat is dissipated by optical or electrical transmission in particular applications. Generally, copper SFPs run much warmer than fiber SFPs. Two factors affect the temperature: power consumption and the case surface. The typical power consumption of a fiber SFP is 0.8 W, while a copper SFP draws 1.05 W, which is why copper SFPs have a higher case temperature. In the same environment, a fiber SFP runs at about 40°C (104°F) while a copper SFP runs around 52°C (126°F).

 

 

Distance

As stated above, copper SFP supports a maximum cable distance of 100 m, so it is usually used to interconnect switches and servers in horizontal and shorter backbone applications. Fiber SFP, by contrast, allows transmission distances of up to 120 km, giving much better performance over longer distances.

 

 

Security

When security is a concern for the connection, a fiber SFP module is a better choice than an RJ45 copper SFP module, because fiber does not conduct electricity, which makes it immune to lightning strikes.

 


Cost

A copper SFP transceiver may be more expensive than a fiber SFP module over the same short distance. In Gigabit Ethernet applications, however, when a copper SFP is used in combination with copper cables over short runs, it is more cost effective, as copper cables are cheaper than fiber cables. Besides, with the growth of third-party vendors, fully compatible and reliable fiber SFP modules have been developed to support lower-cost fiber runs. The price gap between a 100 m copper transceiver and a 40 km 1000BASE-EX SFP fiber transceiver has shrunk, and more choices are available for customers to meet their particular demands.

Conclusion

Comparing copper SFP and optical SFP, we can see that each technology has its own set of advantages and drawbacks. Optical fiber SFP is not always better than copper SFP. In fact, mixing copper and fiber solutions is the best practice for a data center, as a flexible solution is critical to keeping the data center both manageable and scalable as performance demands skyrocket. The networking business is unpredictable, and tomorrow's demands may require facilities to investigate solutions they might have scoffed at a year ago.

 

 

Article posted by 10Gtek. If you are looking for a lower-priced interconnect solution, check www.sfpcables.com

 


Re: rx-out-of-buffer


Thanks, but how is it possible to have rx_out_of_buffer increasing while none of the queues show a single "imissed" (the DPDK counter that reports how many packets could not be received because of a lack of buffers)? Something does not add up here.

 

We use DPDK, so ethtool -C will not affect things, as those settings are overridden by DPDK. We did disable flow control and set the max read request. But my question here is not about performance (we have a ticket with support for that); it is specifically about rx_out_of_buffer. I do not understand how that number can increase while the rings themselves do not report a single miss.
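
A hedged way to watch both counters side by side (assuming the mlx5 netdev is still visible to ethtool while DPDK runs, and using testpmd purely as an illustration of the DPDK-side counters):

# device-level counter reported by the mlx5 driver
watch -n1 'ethtool -S <netdev> | grep rx_out_of_buffer'

# DPDK-side counters, e.g. from the testpmd prompt
testpmd> show port stats all     # per-port stats, including RX-missed (imissed)
testpmd> show port xstats all    # extended stats; the mlx5 PMD also reports rx_out_of_buffer here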

Warning about EXT_QP_MAX_RETRY_LIMIT/EXT_QP_MAX_RETRY_PERIOD in sys log on ConnectX-3


Hi all.

I am using a ConnectX-3 Pro card on Windows Server Essentials 2016.

On every boot I get this warning in the system log:

 

Native_33_0_0: EXT_QP_MAX_RETRY_LIMIT/EXT_QP_MAX_RETRY_PERIOD registry keys were requested by user but FW does not support this feature. Please upgrade your firmware to support it.
For more details, please refer to WinOF User Manual.

 

This is very strange, because I have no EXT_QP_MAX_RETRY_LIMIT or EXT_QP_MAX_RETRY_PERIOD parameters in the registry. Moreover, these are WinOF-2 related parameters, but my card uses WinOF.

 

I searched everywhere and found nothing. Is this normal behavior?

 

I have the latest drivers and firmware (see the screenshot).

 

[Screenshot: image.png]

Re: How to Configure Docker in SR-IOV or Passthrough Mode with Mellanox Infiniband Adapters ?


Hello Pharthiphan,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, please follow Section 5.6 of the MLNX_OFED User Manual (http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v4_4.pdf) on how to configure Docker in SR-IOV or passthrough mode with our adapters.
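
As a rough illustration only (not the exact procedure from Section 5.6; device paths and image are placeholders), an RDMA device can be exposed to a container along these lines:

docker run -it --net=host \
  --device=/dev/infiniband/uverbs0 \
  --device=/dev/infiniband/rdma_cm \
  --cap-add=IPC_LOCK \
  ubuntu:18.04 bash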

If you run into any issue during configuring or afterwards, please open a Mellanox Support case by sending an email to support@mellanox.com

Thanks and regards,
~Mellanox Technical Support

Re: RoCEv2 PFC/ECN Issues


What happens when you run ib_read_bw test?
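
For example (a minimal sketch; the device name is a placeholder):

# on the server
ib_read_bw -d mlx5_0

# on the client
ib_read_bw -d mlx5_0 <server_ip>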


Re: When will an ACK be generated in RDMA write?

Re: Line rate using Connect_X5 100G EN in Ubuntu; PCIe speed difference;


Hi Arvind

 

Many thanks for posting your question on the Mellanox Community.

 

Based on the information provided, and on some of the already correct answers: PCIe 3.0 has a speed of 8 GT/s per lane and, depending on the system board configuration, x16 or x8 link widths. Some of our adapters already support PCIe 4.0, which runs at 16 GT/s per lane.
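
As a rough back-of-the-envelope check (standard PCIe figures, not measured on this particular system):

PCIe 3.0: 8 GT/s per lane with 128b/130b encoding ≈ 7.88 Gb/s usable per lane
x16 slot: 16 × 7.88 Gb/s ≈ 126 Gb/s (≈ 15.75 GB/s) -> enough raw bandwidth for one 100GbE port
x8 slot:   8 × 7.88 Gb/s ≈ 63 Gb/s -> not enough for 100GbE line rate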

 

For more information regarding the PCI specification, please see the following link -> PCI Express - Wikipedia

 

For more information regarding performance tuning of mlx5 for DPDK, please see the following link -> PCI Express - Wikipedia

 

The next link contains performance tuning recommendations for a Dell PowerEdge R730, but some of them are also applicable to the R620.

 

 

Many thanks.

~Mellanox Technical Support

 

 

Re: Line rate using Connect_X5 100G EN in Ubuntu; PCIe speed difference;


Thanks for the response Martin.

 

For more information regarding the PCI specification, please see the following link -> PCI Express - Wikipedia

 

For more information regarding performance tuning of mlx5 for DPDK, please see the following link -> PCI Express - Wikipedia

Both links go to PCI Express on Wikipedia. Did you mean this link?

 

The next link contains performance tuning recommendations for a Dell PowerEdge R730, but some of them are also applicable to the R620.

There was no next link. Did you forget to include it, or did you mean this one?

 

Thanks,

Arvind

Not able to offload tc flows to hardware with Linux tc


I have a setup where I launched two SR-IOV VMs with OpenStack (OpenStack + OpenContrail) and installed OVS manually on the host (Ubuntu 18.04). I followed the instructions in https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf to have tc flows installed on the ConnectX-4 Lx card. I enabled switchdev mode to get the representor netdevs for the corresponding VFs, created an OVS bridge, and attached the PF and the representor netdevs to that bridge.

When I ping from one VM to the other, the traffic always arrives on the representor netdevs, which it should not: only the ARP and the first packet should hit the representors, and the rest should flow through the NIC without being visible on them. After some debugging, I looked at the status of ovs-vswitchd and saw the following error: ovs|00002|dpif_netlink(handler20)|ERR|failed to offload flow: Invalid argument. dmesg shows: [ 2117.205547] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Despite these errors, traffic still works, but in software, which is not what I want. Please tell me whether I am on the right path or whether anything else needs to be done.
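
For reference, here is a rough sketch of the offload-related configuration I am working from (PCI address, PF and representor netdev names are placeholders; the steps reflect my reading of the ASAP2 manual):

# put the PF eswitch into switchdev mode
devlink dev eswitch set pci/0000:03:00.0 mode switchdev

# enable TC hardware offload on the PF and the representor netdevs
ethtool -K enp3s0f0 hw-tc-offload on
ethtool -K eth_rep0 hw-tc-offload on

# enable OVS hardware offload and restart OVS
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch

# bridge with the PF and representors attached
ovs-vsctl add-br br-offload
ovs-vsctl add-port br-offload enp3s0f0
ovs-vsctl add-port br-offload eth_rep0

# check which datapath flows were actually offloaded (recent OVS versions)
ovs-dpctl dump-flows type=offloaded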

 

Setup details:

------------------

     OS: Ubuntu 18.04

     Kernel version: 4.15.0-20-generic

     OVS version: 2.9.0

     MLNX OFED version: 4.4-2.0.7

     firmware-version: 14.23.1020 (MT_2410110034)

     Mellanox card: ConnectX-4 Lx

 

    Please also note that the ASAP2 document says the firmware version should be 16.21.0338 for the same ConnectX-4 card. Should I update my firmware, or does any extra configuration need to be done? If I have to update the firmware, what extra measures do I need to take to make it successful? Thanks.

 

Message was edited by: Suresh Dharavath

Re: Best IPoIB settings for a mixed environment of ConnectX-2,3,4 (MT27500, MT26428, MT27700)?


Hi,

There are multiple questions; let me answer them one by one.

1. ib0 is missing

When using IPoIB, the stack that needs to be installed is Mellanox OFED, as Mellanox EN has no IB-related packages (a quick check is sketched after this list).

 

2. Using a mixed environment

It is possible, but if you are going to mix different HCAs for an MPI job, don't expect good performance: the system will be as slow as its slowest component. In addition, the ConnectX-2 HCA is not supported by Mellanox OFED; it might work, but no troubleshooting/code changes will be done. ConnectX-2 vs ConnectX-4 also means different line encodings: QDR uses 8b/10b (20% overhead) while EDR uses 64b/66b (~3% overhead). Check this link - InfiniBand Types and Speeds - Advanced Clustering Technologies

 

3. For any Inbox-driver-related questions, work with the OS vendor. If there is an issue, the vendor will open a case with Mellanox if necessary.

 

4. Enhanced mode - you seem to have opened another case - Support for "INBOX drivers?" for 18.04/connected mode - and it might be better to keep the discussion there.

 

5. Best IPoIB settings - everything depends on your setup and traffic pattern, and only benchmarks or real application results can tell what is best. What works for one cluster doesn't work for another. The only way is to start with the defaults, establish a baseline, then change parameters one by one, measure, and compare.
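
As a quick check for point 1 (a hedged sketch; these utilities ship with MLNX_OFED):

# prints the installed MLNX_OFED version; absent if only MLNX_EN or the inbox driver is present
ofed_info -s

# map IB devices to net interfaces and check port state
ibdev2netdev
ibstat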


