Hello,
Have you checked whether you actually cannot allocate DMA memory of the required size with the default settings? The parameters in mlx4 (ConnectX-3 Pro and below) were made configurable because the default values were, on some occasions, too low for some users. The mlx5 driver should have a much larger default pool.
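
If it helps, here is a minimal sketch for checking which parameters the loaded driver exposes and what they are currently set to. It only assumes the standard Linux sysfs layout (/sys/module/<module>/parameters) and that the modules are named mlx4_core and mlx5_core on your system; the helper name read_module_params is mine, not part of any driver tooling, and I am not suggesting specific values here.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter_name: value} for a loaded kernel module.

    Returns an empty dict when the module is not loaded or exposes
    no parameters through sysfs.
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by unprivileged users.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    # Module names are an assumption; adjust to what lsmod shows.
    for module in ("mlx4_core", "mlx5_core"):
        print(module, read_module_params(module))
```

Comparing that output against the size you are trying to allocate should tell you whether the defaults are really the limiting factor.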