When you receive a new cluster, you’ll want to test the various components to make sure everything is working. Here we’ll take a look at how to do some very basic InfiniBand connectivity tests to ensure your links are up and running at the correct speed. There are several different tools and methods you can use, but we’ll just cover a few.

My environment

Hardware
· Frontend: Dell PER710
· Compute Nodes: PER410 x 64
· InfiniBand ConnectX-2 IB Cards
· QLogic 12800-040 IB Switch

Cluster Middleware
· PCM 1.2a
· Mellanox OFED 1.4

First, make sure IB hardware is discovered on a single compute node

Do the following:
1. ssh to a compute node as root and check the available IB tools in your path
i. Type ib<tab><tab> to see all the tools
ii. Experiment with a few
1. # ibstat
    # ibstat
    CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.0
        Hardware version: a0
        Node GUID: 0x0002c903000442f4
        System image GUID: 0x0002c903000442f7
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 37
            LMC: 0
            SM lid: 1
            Capability mask: 0x02510868
            Port GUID: 0x0002c903000442f5
For this one, look at the “Rate” field. In our case, we expect 40 (Gb/s) for QDR.
2. # ibhosts
    # ibhosts
    Ca : 0x0002c90300077f86 ports 1 "compute-00-08 HCA-1"
    Ca : 0x0002c90300077f92 ports 1 "compute-00-06 HCA-1"
    Ca : 0x0002c90300077fb2 ports 1 "compute-00-09 HCA-1"
    Ca : 0x0002c90300077eae ports 1 "compute-00-02 HCA-1"
    Ca : 0x0002c90300077f8e ports 1 "compute-00-10 HCA-1"
    Ca : 0x0002c903000442a4 ports 1 "compute-00-26 HCA-1"
    Ca : 0x0002c90300077f06 ports 1 "compute-00-18 HCA-1"
    <snip>
3. # ibv_devinfo
    # ibv_devinfo
    hca_id: mlx4_0
        fw_ver: 2.7.000
        node_guid: 0002:c903:0004:42f4
        sys_image_guid: 0002:c903:0004:42f7
        vendor_id: 0x02c9
        vendor_part_id: 26428
        hw_ver: 0xA0
        board_id: MT_0C40110009
        phys_port_cnt: 1
            port: 1
                state: PORT_ACTIVE (4)
                max_mtu: 2048 (4)
                active_mtu: 2048 (4)
                sm_lid: 1
                port_lid: 37
                port_lmc: 0x00
For this one, pay attention to the “state”. Your port will usually be in one of three states:
    PORT_ACTIVE = good
    PORT_INIT = link but no subnet manager
    PORT_DOWN = bad, no link detected
4. # ibswitches
5. etc… There are many more tools.
iii. Exit and return to the installer node
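The single-node check above can also be scripted. What follows is a minimal sketch rather than part of the original procedure: the function name check_ib_port is hypothetical, and the default rate of 40 is an assumption that only fits a QDR fabric like the one described here. It reads ibstat output on standard input, so it can be tried against saved output as well as on a live node.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED): reads `ibstat` output on stdin and
# reports whether every port is Active at the expected rate.
# Usage on a compute node:  ibstat | check_ib_port 40
check_ib_port() {
    expected_rate="${1:-40}"   # assumed default: 40 for QDR
    awk -v want="$expected_rate" '
        # "State:" is the logical port state; "Physical state:" starts with
        # a different first field, so it is not matched here.
        $1 == "State:" { ports++; if ($2 != "Active") bad++ }
        $1 == "Rate:"  { if ($2 != want) bad++ }
        END {
            if (ports == 0) { print "FAIL: no ports found"; exit 1 }
            if (bad > 0)    { print "FAIL: " bad " problem(s) found"; exit 1 }
            print "OK: " ports " port(s) Active at rate " want
        }'
}
```

The function returns a non-zero exit status on failure, so it can be used in an if statement or chained with && in a larger check script.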
Now check all the nodes

2. Check to make sure you have an active port on each node
a. # pdsh -a ibv_devinfo | grep -i port_active | dshbak -c
3. Check to make sure you can see all the hosts on the IB network
a. # pdsh -a ibhosts
4. Explore IB card model, from the installer node
a. # pdsh -a lspci | grep -i infini
    <snip>
    compute-00-63-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0)
    compute-00-13-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
    compute-00-32-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
    compute-00-23-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0)
    compute-00-04-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
    compute-00-33-eth0: 03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev b0)
    <snip>
Here you’ll notice the “rev” field: some cards report “a0” and some “b0”. This signifies whether the card is ConnectX-1 or ConnectX-2; in this case, “a0” is ConnectX-1 and “b0” is ConnectX-2.
b. # pdsh -a dmesg | grep -i infini
5. Look at driver modules loaded
a. # lsmod | grep -i core
b. # modinfo ib_core
6. To determine what type of IB switch you have (from a node with an IB connection)
a. # ibswitches
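On a 64-node cluster, it is easier to list the nodes that fail the active-port check than to read through the ones that pass. The following is a rough sketch under a few assumptions: the function name nodes_without_active_port is hypothetical, and it assumes pdsh prefixes every output line with "hostname: ", as in the lspci output shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `pdsh -a ibv_devinfo` output on stdin
# (each line prefixed with "hostname: ") and prints the hosts that
# produced output but never reported a PORT_ACTIVE line.
# Usage from the installer node:
#   pdsh -a ibv_devinfo | nodes_without_active_port
nodes_without_active_port() {
    awk -F': ' '
        { seen[$1] = 1 }                    # every host that answered
        /PORT_ACTIVE/ { active[$1] = 1 }    # hosts with an active port
        END {
            for (h in seen)
                if (!(h in active)) print h
        }' | sort
}
```

Hosts that did not answer pdsh at all produce nothing on standard output, so they will not appear in this list; compare the result against your full node list to catch those.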
Check basic connectivity by running tests from one node to another

Do the following:
1. Open two terminals and SSH into a different compute node in each
2. Run a simple send / receive test from one node to the other. We’ll test with ib_send_lat and ib_send_bw
a. To test latency:
i. On the first compute node:
# ib_send_lat
Hit enter and you will see the following output while this node is waiting on traffic:
    # ib_send_lat
    ------------------------------------------------------------------
    Send Latency Test
    Inline data is used up to 400 bytes message
    Connection type : RC
    local address: LID 0x2b QPN 0x100049 PSN 0xcb9adf
ii. On the second compute node (replace compute-0X-00 with the hostname of the first node):
# ib_send_lat compute-0X-00
Hit enter and you will see something like the following output on a successful run:
    # ib_send_lat compute-00-11
    ------------------------------------------------------------------
    Send Latency Test
    Inline data is used up to 400 bytes message
    Connection type : RC
    local address: LID 0x33 QPN 0x40049 PSN 0x550d5e
    remote address: LID 0x2b QPN 0x100049 PSN 0xcb9adf
    Mtu : 2048
    ------------------------------------------------------------------
    #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
         2        1000           1.52          11.40             1.54
    ------------------------------------------------------------------
b. To test bandwidth:
i. On the first compute node:
# ib_send_bw
Hit enter and you will see the following output while this node is waiting on traffic:
    # ib_send_bw
    ------------------------------------------------------------------
    Send BW Test
    Connection type : RC
    Inline data is used up to 1 bytes message
    local address: LID 0x2b, QPN 0xc0049, PSN 0x30a89f
ii. On the second compute node (replace compute-0X-00 with the hostname of the first node):
# ib_send_bw compute-0X-00
Hit enter and you will see something like the following output from a successful run:
    # ib_send_bw compute-00-11
    ------------------------------------------------------------------
    Send BW Test
    Connection type : RC
    Inline data is used up to 1 bytes message
    local address: LID 0x33, QPN 0x0049, PSN 0x70aa26
    remote address: LID 0x2b, QPN 0xc0049, PSN 0x30a89f
    Mtu : 2048
    ------------------------------------------------------------------
    #bytes #iterations    BW peak[MB/sec]    BW average[MB/sec]
     65536        1000            3204.65               3202.92
    ------------------------------------------------------------------
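If you repeat these tests across many node pairs, it helps to pull the headline number out of the output instead of reading it by eye. The sketch below is an illustration only: the helper names perftest_last_column and at_least are hypothetical, the parsing assumes the output layout shown in this post (a "#bytes" header row followed by numeric rows), and any threshold you pass is a placeholder you must choose for your own hardware.

```shell
#!/bin/sh
# Hypothetical helper: reads ib_send_lat or ib_send_bw output on stdin and
# prints the last column of each result row, i.e. t_typical[usec] or
# BW average[MB/sec] in the output format shown in this post.
perftest_last_column() {
    awk '
        /^ *#bytes/ { in_results = 1; next }          # header precedes results
        in_results && $1 ~ /^[0-9]+$/ { print $NF }   # numeric rows only
    '
}

# Hypothetical helper: succeeds if measured value $1 is at least $2.
# The threshold is a placeholder; this post does not define expected values.
at_least() {
    awk -v got="$1" -v min="$2" 'BEGIN { exit !(got + 0 >= min + 0) }'
}

# Example use on the second compute node (hostname placeholder as above):
#   bw=$(ib_send_bw compute-0X-00 | perftest_last_column)
#   at_least "$bw" 3000 || echo "bandwidth lower than expected: $bw"
```

Keeping the parsing separate from the comparison means the same extraction works for both the latency and bandwidth tools, while the pass/fail rule stays easy to change.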
That concludes this tech tip. We’ve really just scratched the surface, and there are many other topics to cover with regard to InfiniBand: what kind of bandwidth and latency to expect with different cards, how to run code over the InfiniBand network, and so on. I’ll cover those topics and more in some later posts.

-- Scott Collier