#!/bin/bash
set -x
DEV=mlx4
NET=21                                  # set to a different value on each machine

# Start from a clean slate; the vxlan0 delete may fail harmlessly on first run.
ip addr flush dev $DEV
ip link set dev $DEV down
ip link del vxlan0

# Bring up the underlay with jumbo frames.
ip link set dev $DEV mtu 9000
ip addr add 10.224.$NET.27/24 brd + dev $DEV
ip link set dev $DEV up
ip route add 10.224.0.0/12 via 10.224.$NET.1

# VXLAN overlay: VNI 17, flooded via multicast group 239.1.1.17 on the underlay.
ip link add vxlan0 type vxlan id 17 group 239.1.1.17 dev $DEV
ip addr add 172.18.1.$NET/24 brd + dev vxlan0
ip link set dev vxlan0 up
This script is run on both machines (with a different NET value on each); both hosts are bare metal, no VMs involved. mlx4 is the ethX device, renamed.
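For anyone reproducing this, the resulting tunnel parameters can be sanity-checked with standard iproute2 commands (nothing here is specific to my setup):

# Show the VXLAN-specific attributes (VNI, group, underlay device, UDP port).
ip -d link show vxlan0

# Confirm the addresses and that both interfaces are up.
ip addr show dev $DEV
ip addr show dev vxlan0

# Watch the multicast-learned forwarding entries appear once traffic flows.
bridge fdb show dev vxlan0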
The MTU of 9000 is a new addition; with it I get ~38 Gbit/s in single-stream TCP tests directly on the mlx4 device, but VXLAN-encapsulated traffic stays at ~24 Gbit/s, CPU-bound on a single core.
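My working theory (an assumption on my part, not something I have confirmed) is that the encapsulated path loses the NIC's segmentation and checksum offloads unless the hardware supports VXLAN offload, which would explain the single-core bottleneck. The relevant ethtool feature flags can be inspected like this:

# Which offloads are active on the underlay device? The tx-udp_tnl-* flags
# cover UDP tunnels such as VXLAN; "off [fixed]" means the hardware can't do it.
ethtool -k $DEV | grep -E 'udp_tnl|tcp-segmentation|scatter-gather|generic-(receive|segmentation)'

# If a flag is supported but disabled, it can be switched on:
ethtool -K $DEV tx-udp_tnl-segmentation on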
The performance I am seeing is close to what you show in DOC-1456 for a single VM pair. While I can get high aggregate throughput by running multiple streams, I could get a similar aggregate by bonding four 10 Gbit/s links; what I am really hoping to improve is single-stream speed.
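For concreteness, by "multiple streams" I mean something along these lines (iperf3 is my assumption here; substitute your tool of choice, and note that 172.18.1.21 is simply the overlay address from the script above with NET=21):

# On the machine with NET=21:
iperf3 -s

# On the other machine, testing over the VXLAN overlay:
iperf3 -c 172.18.1.21          # single stream, ~24 Gbit/s in my case
iperf3 -c 172.18.1.21 -P 4     # four parallel streams; aggregate scales, each stream stays slow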