Hi justinclift,
For sample code at this stage, you'll need early access to the alpha drop.
Please contact hpc@mellanox.com.
Once the GA release is ready, we'll provide the code along with samples showing how to use this feature.
Basically, this feature lets an RDMA application allocate memory on the GPU (using cudaMalloc) and then register a Memory Region (MR) directly on the GPU memory virtual pointer (using ibv_reg_mr). From that point on, the Mellanox HCA can RDMA read/write directly from/to GPU memory.
Before this feature, the HCA could not access GPU memory directly, so users had to either memcpy the buffer back to host memory or avoid using GPU memory altogether. GPUDirect-RDMA removes that limitation.