
Management command failed in KVM for SR-IOV


Hi,

 

This is my fourth day of fighting with SR-IOV and KVM.

 

I can ping from the VM to another IPoIB host, but when I try to use the ibnetdiscover command I get a SIGSEGV:

 

ibnetdiscover

src/query_smp.c:98; send failed; -5

#

# Topology file: generated on Fri Jul 19 19:28:24 2013

#

Segmentation fault (core dumped)

 

Almost all of the ib* commands fail in a similar way; dmesg shows:

 

mlx4_core 0000:04:00.0: vhcr command MAD_IFC (0x24) slave:3 in_param 0x29f3a000 in_mod=0xffff0001, op_mod=0xc failed with error:0, status -1

mlx4_core 0000:04:00.0: vhcr command SET_PORT (0xc) slave:3 in_param 0x29f3a000 in_mod=0x1, op_mod=0x0 failed with error:0, status -22

mlx4_core 0000:04:00.0: slave 3 is trying to execute a Subnet MGMT MAD, class 0x1, method 0x81 for attr 0x11. Rejecting

 

It looks like the MAD_IFC firmware command is failing in the device for some reason, but I have no idea about the cause. Possibly this part of the driver code is related:

 

+	if (slave != dev->caps.function &&
+	    ((smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) ||
+	     (smp->mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED &&
+	      smp->method == IB_MGMT_METHOD_SET))) {
+		mlx4_err(dev, "slave %d is trying to execute a Subnet MGMT MAD, "
+			 "class 0x%x, method 0x%x for attr 0x%x. Rejecting\n",
+			 slave, smp->method, smp->mgmt_class,
+			 be16_to_cpu(smp->attr_id));
+		return -EPERM;
+	}

 

from

 

+static int mlx4_MAD_IFC_wrapper(struct mlx4_dev *dev, int slave,
+				struct mlx4_vhcr *vhcr,
+				struct mlx4_cmd_mailbox *inbox,
+				struct mlx4_cmd_mailbox *outbox,
+				struct mlx4_cmd_info *cmd)
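
One more thing I noticed while staring at this: the arguments passed to mlx4_err() here (slave, smp->method, smp->mgmt_class) are in the opposite order to the format string ("class 0x%x, method 0x%x"), so the dmesg line "class 0x1, method 0x81" presumably really means mgmt_class 0x81 and method 0x1. If I read include/rdma/ib_mad.h correctly, 0x81 is IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE, 0x01 is IB_MGMT_METHOD_GET, and attr 0x11 is the NodeInfo attribute, i.e. exactly the directed-route SubnGet(NodeInfo) SMPs that ibnetdiscover (and OpenSM, log below) send. If that is right, every directed-route SMP coming from a slave hits the first branch of the check above and is rejected with -EPERM.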

 

 

Please find below some details about my build.

 

 

I'd really appreciate it if anybody could point me in the right direction, or even better, help me fix the issue.

 

 

Thanks in advance

Marcin

 

 

Host:

-----

Motherboard: Supermicro X9DRI-F

CPUs: 2x E5-2640

 

 

System: CentOS 6.3 (2.6.32-279.el6.x86_64) and CentOS 6.4 (2.6.32-358.el6.x86_64)

 

 

InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3], MCX354A-QCB

Mellanox OFED: MLNX_OFED_LINUX-2.0-2.0.5-rhel6.3-x86_64

 

 

qemu-kvm.x86_64                         2:0.12.1.2-2.355.el6

#lspci | grep Mel

04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

04:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.2 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.3 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.4 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.5 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.6 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:00.7 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

04:01.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

 

 

#dmesg | grep mlx4

mlx4_core: Mellanox ConnectX core driver v1.1 (Apr 23 2013)

mlx4_core: Initializing 0000:04:00.0

mlx4_core 0000:04:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32

mlx4_core 0000:04:00.0: setting latency timer to 64

mlx4_core 0000:04:00.0: Enabling SR-IOV with 5 VFs

mlx4_core 0000:04:00.0: Running in master mode

mlx4_core 0000:04:00.0: irq 109 for MSI/MSI-X

mlx4_core 0000:04:00.0: irq 110 for MSI/MSI-X

mlx4_core 0000:04:00.0: irq 111 for MSI/MSI-X

mlx4_core 0000:04:00.0: irq 112 for MSI/MSI-X

mlx4_core: Initializing 0000:04:00.1

mlx4_core 0000:04:00.1: enabling device (0000 -> 0002)

mlx4_core 0000:04:00.1: setting latency timer to 64

mlx4_core 0000:04:00.1: Detected virtual function - running in slave mode

mlx4_core 0000:04:00.1: Sending reset

mlx4_core 0000:04:00.0: Received reset from slave:1

mlx4_core 0000:04:00.1: Sending vhcr0

mlx4_core 0000:04:00.1: HCA minimum page size:512

mlx4_core 0000:04:00.1: irq 113 for MSI/MSI-X

mlx4_core 0000:04:00.1: irq 114 for MSI/MSI-X

mlx4_core 0000:04:00.1: irq 115 for MSI/MSI-X

mlx4_core 0000:04:00.1: irq 116 for MSI/MSI-X

mlx4_core: Initializing 0000:04:00.2

mlx4_core 0000:04:00.2: enabling device (0000 -> 0002)

mlx4_core 0000:04:00.2: setting latency timer to 64

mlx4_core 0000:04:00.2: Skipping virtual function:2

mlx4_core: Initializing 0000:04:00.3

mlx4_core 0000:04:00.3: enabling device (0000 -> 0002)

mlx4_core 0000:04:00.3: setting latency timer to 64

mlx4_core 0000:04:00.3: Skipping virtual function:3

mlx4_core: Initializing 0000:04:00.4

mlx4_core 0000:04:00.4: enabling device (0000 -> 0002)

mlx4_core 0000:04:00.4: setting latency timer to 64

mlx4_core 0000:04:00.4: Skipping virtual function:4

mlx4_core: Initializing 0000:04:00.5

mlx4_core 0000:04:00.5: enabling device (0000 -> 0002)

mlx4_core 0000:04:00.5: setting latency timer to 64

mlx4_core 0000:04:00.5: Skipping virtual function:5

<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (Apr 23 2013)

mlx4_core 0000:04:00.0: mlx4_ib: multi-function enabled

mlx4_core 0000:04:00.0: mlx4_ib: initializing demux service for 80 qp1 clients

mlx4_core 0000:04:00.1: mlx4_ib: multi-function enabled

mlx4_core 0000:04:00.1: mlx4_ib: operating in qp1 tunnel mode

mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.1 (Apr 23 2013)

mlx4_en 0000:04:00.0: Activating port:2

mlx4_en: eth2: Using 216 TX rings

mlx4_en: eth2: Using 4 RX rings

mlx4_en: eth2: Initializing port

mlx4_en 0000:04:00.1: Activating port:2

mlx4_en: eth3: Using 216 TX rings

mlx4_en: eth3: Using 4 RX rings

mlx4_en: eth3: Initializing port

mlx4_core 0000:04:00.0: mlx4_ib: Port 1 logical link is up

mlx4_core 0000:04:00.0: Received reset from slave:2

mlx4_core 0000:04:00.0: slave 2 is trying to execute a Subnet MGMT MAD, class 0x1, method 0x81 for attr 0x11. Rejecting

mlx4_core 0000:04:00.0: vhcr command MAD_IFC (0x24) slave:2 in_param 0x106a10000 in_mod=0xffff0001, op_mod=0xc failed with error:0, status -1

mlx4_core 0000:04:00.1: mlx4_ib: Port 1 logical link is up

mlx4_core 0000:04:00.0: slave 2 is trying to execute a Subnet MGMT MAD, class 0x1, method 0x81 for attr 0x11. Rejecting

mlx4_core 0000:04:00.0: vhcr command MAD_IFC (0x24) slave:2 in_param 0x119079000 in_mod=0xffff0001, op_mod=0xc failed with error:0, status -1

mlx4_core 0000:04:00.0: mlx4_ib: Port 1 logical link is down

mlx4_core 0000:04:00.1: mlx4_ib: Port 1 logical link is down

mlx4_core 0000:04:00.0: mlx4_ib: Port 1 logical link is up

mlx4_core 0000:04:00.1: mlx4_ib: Port 1 logical link is up

# ibv_devinfo

hca_id: mlx4_0

  transport: InfiniBand (0)

  fw_ver: 2.11.500

  node_guid: 0002:c903:00a2:8fb0

  sys_image_guid: 0002:c903:00a2:8fb3

  vendor_id: 0x02c9

  vendor_part_id: 4099

  hw_ver: 0x0

  board_id: MT_1090110018

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 2048 (4)

  active_mtu: 2048 (4)

  sm_lid: 1

  port_lid: 1

  port_lmc: 0x00

  link_layer: InfiniBand

 

 

  port: 2

  state: PORT_DOWN (1)

  max_mtu: 2048 (4)

  active_mtu: 2048 (4)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: InfiniBand

 

 

#cat /etc/modprobe.d/mlx4_core.conf

options mlx4_core num_vfs=8 port_type_array=1,1 probe_vf=1
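
In case it helps the diagnosis, I assume the values the driver actually picked up can be cross-checked via the standard module-parameter files in sysfs (assuming this mlx4_core build exports them as readable; I have not verified that):

#cat /sys/module/mlx4_core/parameters/num_vfs
#cat /sys/module/mlx4_core/parameters/probe_vf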

 

 

KVM Guest: CentOS 6.4 and CentOS 6.3

----------------------

 

 

Mellanox OFED: MLNX_OFED_LINUX-2.0-2.0.5-rhel6.3-x86_64

Kernel: 2.6.32-279.el6.x86_64

 

 

#lspci | grep Mel

00:07.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

#ibv_devinfo

hca_id: mlx4_0

  transport: InfiniBand (0)

  fw_ver: 2.11.500

  node_guid: 0014:0500:c0bb:4473

  sys_image_guid: 0002:c903:00a2:8fb3

  vendor_id: 0x02c9

  vendor_part_id: 4100

  hw_ver: 0x0

  board_id: MT_1090110018

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 2048 (4)

  active_mtu: 2048 (4)

  sm_lid: 1

  port_lid: 1

  port_lmc: 0x00

  link_layer: InfiniBand

 

 

  port: 2

  state: PORT_DOWN (1)

  max_mtu: 2048 (4)

  active_mtu: 2048 (4)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: InfiniBand

 

 

# sminfo

ibwarn: [3673] _do_madrpc: send failed; Function not implemented

ibwarn: [3673] mad_rpc: _do_madrpc failed; dport (Lid 1)

sminfo: iberror: failed: query

OpenSM log:

Jul 19 09:57:54 001056 [C520D700] 0x02 -> osm_vendor_init: 1000 pending umads specified

Jul 19 09:57:54 002074 [C520D700] 0x80 -> Entering DISCOVERING state

Using default GUID 0x14050000000002

Jul 19 09:57:54 191924 [C520D700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x14050000000002

Jul 19 09:57:54 671075 [C520D700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x14050000000002

Jul 19 09:57:54 671503 [C520D700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0x14050000000002

Jul 19 09:57:54 672363 [C520D700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0x14050000000002

Jul 19 09:57:54 672774 [C520D700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0014050000000002

Jul 19 09:57:54 673345 [C520D700] 0x01 -> osm_vendor_set_sm: ERR 5431: setting IS_SM capmask: cannot open file '/dev/infiniband/issm0': Invalid argument

Jul 19 09:57:54 674233 [C1605700] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x7f11b00008c0 of size 256 TID 0x1234 failed -5 (Invalid argument)

Jul 19 09:57:54 674278 [C1605700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1234

Jul 19 09:57:54 674311 [C1605700] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)

Jul 19 09:57:54 674336 [C0C04700] 0x01 -> state_mgr_is_sm_port_down: ERR 3308: SM port GUID unknown
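
Putting the two sides together: inside the guest, OpenSM (a) cannot open /dev/infiniband/issm0 to set the IS_SM capability bit, and (b) has its very first SubnGet(NodeInfo) send fail with -5, while on the host dmesg logs the matching "slave ... Rejecting" and MAD_IFC failures for attr 0x11 (NodeInfo). So it looks to me as if the VF is never allowed to do any SMI-level work at all. Is that expected behaviour with SR-IOV, meaning the SM and fabric diagnostics have to run on the host or another physical node, or is something wrong in my configuration?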

