Good afternoon,
I am trying to use IPoIB with an InfiniBand card (MT26428). The idea is to run a small cluster with MPI support over InfiniBand, using the Slurm scheduler.
I was previously using SGE as the scheduler, and it required IPoIB. I guess the same applies to Slurm.
My actual problem is that I cannot get IPoIB working with my cards. I installed the drivers using:
mlnxofedinstall --all -n /root/myConfig.cfg
and the contents of myConfig.cfg are:
IPADDR_ib0=10.1.2.101
NETMASK_ib0=255.255.255.0
NETWORK_ib0=10.1.2.0
BROADCAST_ib0=10.1.2.255
ONBOOT_ib0=1
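As a sanity check on the values above, here is a minimal sketch that recomputes the network and broadcast addresses from IPADDR_ib0 and NETMASK_ib0. The helper names (ip_to_int, int_to_ip, network_of, broadcast_of) are made up for illustration and are not part of the Mellanox installer.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Network address = address AND netmask.
network_of() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( $ip & $mask ))
}

# Broadcast address = address OR inverted netmask (masked to 32 bits).
broadcast_of() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( $ip | (~$mask & 4294967295) ))
}

network_of 10.1.2.101 255.255.255.0     # prints 10.1.2.0
broadcast_of 10.1.2.101 255.255.255.0   # prints 10.1.2.255
```

The two printed values match NETWORK_ib0 and BROADCAST_ib0, so the addressing in the config file is at least self-consistent.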
The install appears to be successful; after a reboot I get:
root@node01:~# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.1.2.101  Bcast:10.1.2.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:4092  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
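One detail in this output may be worth checking: the flags line shows UP but not RUNNING, which on Linux generally means the interface is administratively up but the link itself is not operational. The sketch below tests for that flag; the helper name has_running_flag is hypothetical, and the sample string is the flags line from the output above.

```shell
# Hypothetical helper: reads `ifconfig <iface>` output on stdin and
# succeeds only if RUNNING appears as a separate word.
has_running_flag() {
    grep -Eq '(^|[[:space:]])RUNNING([[:space:]]|$)'
}

# Flags line from the ib0 output above.
sample='UP BROADCAST MULTICAST MTU:4092 Metric:1'

if printf '%s\n' "$sample" | has_running_flag; then
    echo "ib0: RUNNING"
else
    echo "ib0: not RUNNING"
fi
```

On a live node the same check would be `ifconfig ib0 | has_running_flag`. With the sample above it prints "ib0: not RUNNING".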
I did the same on a second machine, which has the IP 10.1.2.102.
But I cannot ping the second machine from the first one, nor the first from the second.
I don't understand what I am missing in my configuration. Could it be something related to routing?
root@node01:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
I am not an expert in routing. I tried adding the current machine as the gateway for the ib0 network:
route add -net 10.1.2.0 gw 10.1.2.101 netmask 255.255.255.0 dev ib0
resulting in:
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.1.2.0        10.1.2.101      255.255.255.0   UG    0      0        0 ib0
10.1.2.0        0.0.0.0         255.255.255.0   U     0      0        0 ib0
But this does not help.
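For what it is worth, both nodes sit in the same /24, so the directly connected 10.1.2.0 route on ib0 should already cover traffic between them and a gateway entry should not be needed. A minimal sketch of that check, with hypothetical helper names to_int and same_subnet:

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds when both addresses fall in the same network under the netmask.
same_subnet() {
    a=$(to_int "$1")
    b=$(to_int "$2")
    m=$(to_int "$3")
    [ $(( $a & $m )) -eq $(( $b & $m )) ]
}

if same_subnet 10.1.2.101 10.1.2.102 255.255.255.0; then
    echo "same subnet: the on-link route is sufficient"
else
    echo "different subnets: a gateway would be needed"
fi
```

This only confirms the addressing; it says nothing about whether the InfiniBand link itself is passing traffic.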
I would be really grateful for any help, thanks in advance,
Best regards,
Andrea