I have developed a test client/server application that uses the verbs library, and it seems to work well when my ConnectX-3 Pro cards are configured for InfiniBand.
However, if I reconfigure the ports for Ethernet and use RoCE v1, my client always fails with the same error when it calls rdma_resolve_addr(...): it produces an RDMA_CM_EVENT_ADDR_ERROR event with error -2 (ENOENT).
If I use udaddy instead of my own application, I see exactly the same error:
>strace -f -s 32 -x udaddy -s 192.168.0.100
...
open("/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3
...
write(1, "udaddy: connecting\n", 19udaddy: connecting) = 19
write(3,"\x15\x00\x00\x00\x10\x01\x00\x00\x00\x00\x00\x00\xd0\x07\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 280) = 280
write(3, "\x0c\x00\x00\x00\x08\x00\x48\x01\xa0\xbc\x4e\x2a\xff\x7f\x00\x00", 16) = 16
write(1, "udaddy: event: RDMA_CM_EVENT_ADD"..., 51udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -2) = 51
write(1, "test complete\n", 14test complete) = 14
write(3, "\x01\x00\x00\x00\x10\x00\x04\x00\x30\xc1\x4e\x2a\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 24) = 24
close(3) = 0
write(1, "return status -2\n", 17return status -2) = 17
shutdown(4, 2 /* send and receive */) = 0
close(4) = 0
exit_group(-2) = ?
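The ENOENT error seems to come from the rdma_cm kernel module in response to the RDMA_USER_CM_CMD_RESOLVE_ADDR command written to /dev/infiniband/rdma_cm (see write(3,"\x15... above). That write itself succeeds (it returns 280); the failure is only reported in the event read back by the following RDMA_USER_CM_CMD_GET_EVENT write (write(3, "\x0c...).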
From a brief look at the rdma_cm code, ENOENT typically seems to be returned when no matching entry is found in the GID cache.
Is there something I should be doing on my system to ensure that the GID cache is populated?
The system is running RHEL6.6 with MLNX_OFED_LINUX-3.1-1.0.3-rhel6.6-x86_64 installed.
Thanks.
-Ronnie