Hi Mikyung,
Your figures suggest there might be a problem with hostB's VM. Do you get the same result as for hostA<->hostB's VM (3521.03 MB/s) when you reverse the setup to hostA's VM<->hostB?
It would also help if you posted more of your config here, e.g., the output of lscpu and numactl -H on the hosts and inside the VMs, the libvirt XML, etc.
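
Something along these lines would collect it in one go. This is only a rough sketch: it assumes lscpu, numactl and virsh are installed (missing tools are just noted in the output), it uses numactl -H for the hardware/NUMA listing, and VM_NAME is a placeholder I made up for your libvirt domain name. The run and collect_info helpers are likewise just names for this example.

```sh
#!/bin/sh
# Rough sketch: gather CPU/NUMA topology and, on the hosts, the libvirt XML.
# VM_NAME is a hypothetical placeholder for the libvirt domain name;
# leave it unset when running inside a VM.

run() {
    # Print a header, then the command's output, or a note if it is missing/fails.
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with status $?)"
    else
        echo "($1 not installed)"
    fi
    echo
}

collect_info() {
    run lscpu
    run numactl -H
    # Only dump the domain XML where a domain name was given (i.e. on the hosts).
    if [ -n "${VM_NAME:-}" ]; then
        run virsh dumpxml "$VM_NAME"
    fi
}

collect_info
```

Run it on each host with VM_NAME set to the guest's domain name, and inside each VM without it, redirecting the output to a file per machine so the four dumps can be compared side by side.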