So I now have two servers running Windows Server 2012, with the latest firmware on each HCA and the latest 4.2 drivers on each machine.
I confirmed that RDMA is enabled on each machine, and I still can't break past the 1350 MB/s mark using Windows file sharing.
I ran NTttcp as per my Mellanox contact's advice and managed to get 23Gbps by throwing several threads at the job.
I used the following settings, which I found by googling (someone was able to get 9.9 Gbps on his Intel 10GbE card with these):
ntttcps -m 8,0,10.10.10.111 -l 1048576 -n 100000 -w -a 16 -t 10
ntttcpr -m 8,0,10.10.10.111 -l 1048576 -rb 2097152 -n 100000 -w -a 16 -fr -t 10
On a single thread I get 15Gbps. On two threads I get 20Gbps.
This is still a far cry from 40Gbps.
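
To put the numbers above on the same scale, here is a rough Python sketch that converts the file sharing figure to Gbps and shows each result as a fraction of the link. It assumes MB means decimal megabytes (10^6 bytes) and takes 40 Gbps as the nominal link rate. The function and variable names are just made up for illustration.

```python
def mbytes_to_gbits(mb_per_s):
    """Convert MB/s (assumed decimal megabytes) to Gbit/s."""
    return mb_per_s * 8 / 1000


LINK_GBPS = 40.0  # nominal link rate quoted in the post

# Figures taken from the post; only the file sharing one needs converting.
results = {
    "file sharing (1350 MB/s)": mbytes_to_gbits(1350),
    "NTttcp, 1 thread": 15.0,
    "NTttcp, 2 threads": 20.0,
    "NTttcp, several threads": 23.0,
}

for name, gbps in results.items():
    share = gbps / LINK_GBPS
    print(f"{name}: {gbps:.1f} Gbps ({share:.0%} of {LINK_GBPS:.0f} Gbps)")
```

On those assumptions the 1350 MB/s file sharing result works out to about 10.8 Gbps, so it sits well below even the single-thread NTttcp run.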