Hi,
I implemented a program with two threads: one polls the CQ, and the other monitors the IBV_EVENT_SRQ_LIMIT_REACHED event and posts receives to the SRQ.
The program works fine when I pin one thread to core 0 and the other to core 1. However, as soon as I move the thread originally running on core 1 to another core,
e.g. core 2, there is a noticeable performance degradation: it runs as much as 5x slower.
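For context, pinning a thread to a core as described above is commonly done on Linux with pthread_setaffinity_np. The following is only a minimal sketch under that assumption, and not necessarily the code used in the program in question. The helper names pin_thread_to_core and first_allowed_core are hypothetical.

```c
#define _GNU_SOURCE   /* needed for pthread_setaffinity_np and the CPU_* macros */
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict `thread` to the single core `core`.
 * Returns 0 on success, or an error number on failure. */
static int pin_thread_to_core(pthread_t thread, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Hypothetical helper: return the lowest-numbered core that `thread`
 * is currently allowed to run on, or -1 if it cannot be determined. */
static int first_allowed_core(pthread_t thread)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(thread, sizeof(set), &set) != 0)
        return -1;
    for (int core = 0; core < CPU_SETSIZE; core++)
        if (CPU_ISSET(core, &set))
            return core;
    return -1;
}
```

With helpers like these, the polling thread could call pin_thread_to_core(pthread_self(), 0) at startup and the event thread pin_thread_to_core(pthread_self(), 1) or 2, compiled with -pthread. Reading the mask back with first_allowed_core is one way to confirm that the pinning took effect before comparing the two placements.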
I am running the application on an AMD Opteron Processor 6320, in which each pair of cores shares an L2 cache. My guess is that moving the thread to a core that does not
share an L2 cache with the other thread introduces some coherence overhead. However, as far as I can tell, ibv_poll_cq and ibv_post_srq_recv do not share any
common resource. I am wondering what could cause such a performance degradation.
Thank you in advance for any advice,