Hello Ali and welcome to our community!
Although this community deals with some really complicated matters, I have to admit I enjoyed reading your post. We will try to get you going.
So far it sounds like you have a good grasp of the basics and beyond. You are almost there!
Practically speaking, you now have a DDR network, which in theory is capable of 20Gb/s. Your servers and HCA cards will probably deliver less than that because of PCI bus encoding and efficiency overheads.
Pointers for getting the max performance for HPC applications:
- Make sure your HCAs and switch run the latest firmware and software available for the model.
- Make sure that your HCAs are all connected to a PCI-E Gen2 or Gen3 bus (and that the Gen mode is enabled in the BIOS!)
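One way to verify what the HCA's PCI-E link actually trained at is the LnkSta line that `lspci -vv` prints for the slot. Here is a minimal sketch; the LnkSta string below is a hypothetical sample, so on a real server you would substitute the actual lspci output for your HCA's slot:

```shell
# Hypothetical sample of the LnkSta line from `lspci -vv` for an HCA slot.
# On a real host, capture it with something like:
#   lspci -vv -s <slot> | grep LnkSta
lnksta="LnkSta: Speed 5GT/s, Width x8"

# PCI-E generations by link speed: Gen1 = 2.5GT/s, Gen2 = 5GT/s, Gen3 = 8GT/s.
# Check 2.5GT/s first, since the string "5GT/s" is a substring of it.
case "$lnksta" in
  *"2.5GT/s"*)         echo "Link trained at Gen1 - check the BIOS Gen mode setting" ;;
  *"5GT/s"*|*"8GT/s"*) echo "PCIe Gen2 or Gen3: OK" ;;
  *)                   echo "Unrecognized link speed" ;;
esac
```

Also keep an eye on the Width field: an x8 card that trained at x4 will cut your bandwidth in half.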
- Make sure that your network is healthy. You could have degraded links or links with errors.
Since I am not sure what diagnostic tools are available with the server driver version you have, I suggest you add a 5th machine to the network, running Linux and a recent MOFED version. Windows and Linux can share the same fabric, no problem.
From that machine run:
* ibnetdiscover - make sure everything is discoverable, and that all links came up at 4X width and DDR speed
* ibdiagnet - see if there are any errors in the PM (Performance Monitoring) section.
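To spot degraded links quickly, you can filter the ibnetdiscover topology output for anything that is not 4xDDR. The sample below is a hypothetical, simplified version of ibnetdiscover's link lines (real GUIDs and node names will differ); on the Linux diagnostics box you would pipe the real command instead of the here-doc:

```shell
# Hypothetical sample of ibnetdiscover link lines, saved to a file.
# On a real fabric you would instead run:  ibnetdiscover > sample_topology.txt
cat <<'EOF' > sample_topology.txt
[1] "node01 HCA-1" lid 4 4xDDR
[2] "node02 HCA-1" lid 5 4xSDR
[3] "node03 HCA-1" lid 6 1xDDR
EOF

# Flag anything that did not come up at full 4X width and DDR speed.
grep -v '4xDDR' sample_topology.txt
```

In this sample, node02 negotiated SDR speed and node03 negotiated 1X width; either one would drag down the whole job, so those cables/ports are the first thing to reseat or replace.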
Next, look into tuning the server side. Here is a good document:
After that, benchmark your network:
* run basic RDMA bandwidth tests: point A to B, A to C, and A to D - they should all return similar numbers
* run your application on each server individually - each server should produce a similar result. If not, it can impact the overall result of the pack.
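Once you have the pairwise numbers (for example the average MB/sec that ib_write_bw reports), a tiny script can flag the odd one out. This is just a sketch; the host pairs and bandwidth figures below are hypothetical placeholders for your own measurements:

```shell
# Hypothetical pairwise RDMA bandwidth results (MB/sec), one pair per line.
# Replace these with your own ib_write_bw averages.
cat <<'EOF' > bw_results.txt
A-B 1450
A-C 1460
A-D 980
EOF

# Flag any pair that is more than 10% below the best result.
awk '{ if ($2 > max) max = $2; pair[NR] = $1; bw[NR] = $2 }
     END { for (i = 1; i <= NR; i++)
             if (bw[i] < 0.9 * max)
               print pair[i], "is", int(100 - 100 * bw[i] / max) "% below the best link" }' bw_results.txt
# prints: A-D is 32% below the best link
```

A pair standing out like A-D here usually points back to the checks above: a degraded link, a Gen1 PCI-E slot, or outdated firmware on that one node.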
Have a look at other performance-related posts on this site; you may find more tips.
Good luck!