Mellanox Infiniband HCA and/or Ethernet adapters
I have a number of servers with ConnectX-3 cards that connect to various network infrastructure bits and bobs. Today I installed one in a Cisco UCS C220 M5 server and realized that I have made no notes about what was done to make the cards work. Oops.
ConnectX-3 cards
Several of these are deployed around the home network. There are various models (pretty much all are ConnectX-3, though).
Finding card info
List out all Mellanox cards in a machine. "15b3" is Mellanox's PCI vendor ID.
itops@syadasti:~$ lspci -d 15b3:
d8:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
itops@syadasti:~$
The first column of the output is the card's PCI bus ID, which the next few commands use. Running lspci with elevated privileges (under sudo) lets it read the capabilities info. That isn't really necessary for this step, but causes no harm.
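If you script these steps, the bus ID can be captured instead of copied by hand. A minimal bash sketch, assuming the first Mellanox device listed is the card of interest; the helper name mlnx_bus_ids is made up for this example and simply prints the first column of the lspci output it is fed:

```shell
# Hypothetical helper: print the PCI bus ID (first column) of each
# Mellanox device, reading `lspci -d 15b3:` output on stdin.
# Blank lines are skipped.
mlnx_bus_ids() {
  awk 'NF { print $1 }'
}

# Example use (assumes the first card listed is the one you want):
#   BDF=$(lspci -d 15b3: | mlnx_bus_ids | head -n 1)
#   sudo lspci -s "$BDF" -vv
#   sudo mstflint -d "$BDF" q full
```

Reading from stdin rather than calling lspci inside the function keeps it easy to try against saved output.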
itops@syadasti:~$ sudo lspci -s d8:00.0 -vv
d8:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
	Subsystem: Mellanox Technologies Device 0003
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 83
	NUMA node: 1
	IOMMU group: 20
	Region 0: Memory at fbe00000 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at 4ffff800000 (64-bit, prefetchable) [size=8M]
	Expansion ROM at fbd00000 [disabled] [size=1M]
	Capabilities: [40] Power Management version 3
	Capabilities: [48] Vital Product Data
	Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
	Capabilities: [60] Express Endpoint, IntMsgNum 0
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [148] Device Serial Number f4-52-14-03-00-2c-5e-00
	Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [154] Advanced Error Reporting
	Capabilities: [18c] Secondary PCI Express
	Kernel driver in use: mlx4_core
	Kernel modules: mlx4_core
itops@syadasti:~$
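Single fields can be pulled out of that verbose listing, for example to confirm that mlx4_core is bound or to see which NUMA node the slot hangs off. A rough sketch, assuming bash and awk; lspci_field is a hypothetical helper that matches the label before the first ": " after dropping lspci's indentation:

```shell
# Hypothetical helper: print the value of the first "Label: value" line
# whose label equals $1, reading `lspci -vv` output on stdin.
lspci_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop lspci indentation
      n = index(line, ": ")             # label ends at the first colon-space
      if (n > 0 && substr(line, 1, n - 1) == key) {
        print substr(line, n + 2)
        exit
      }
    }'
}

# Example use:
#   sudo lspci -s d8:00.0 -vv | lspci_field 'Kernel driver in use'
#   sudo lspci -s d8:00.0 -vv | lspci_field 'NUMA node'
```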
Query the card to find out its firmware version, Ethernet MAC address(es), Infiniband GUID(s), and the Mellanox Parameter Set ID (PSID). This step does need the elevated privileges provided by sudo:
itops@syadasti:~$ sudo mstflint -d d8:00.0 q full
Image type:            FS2
FW Version:            2.36.5000
FW Release Date:       26.1.2016
MIC Version:           1.5.0
Config Sectors:        2
Product Version:       02.36.50.00
Rom Info:              type=PXE version=3.4.718
Device ID:             4103
Description:           Node             Port1            Port2            Sys image
GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03
MACs:                                   f452142c5e01     f452142c5e02
VSD:
PSID:                  MT_1090111019
itops@syadasti:~$
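The same kind of label matching works on the query output, which helps when checking the PSID and firmware version from a script. A sketch under the same assumptions (bash, awk); mstflint_field is a hypothetical helper name. It also shows that the decimal Device ID, 4103, is 0x1007, the ConnectX-3 Pro PCI device ID that lspci -nn -d 15b3: prints as [15b3:1007].

```shell
# Hypothetical helper: print the value of the first "Label:   value" line
# whose label equals $1, reading `mstflint ... q` output on stdin.
mstflint_field() {
  awk -v key="$1" '
    {
      n = index($0, ":")                # label ends at the first colon
      if (n > 0 && substr($0, 1, n - 1) == key) {
        val = substr($0, n + 1)
        sub(/^[ \t]+/, "", val)         # drop the alignment padding
        sub(/[ \t]+$/, "", val)         # and any trailing blanks
        print val
        exit
      }
    }'
}

# Example use:
#   sudo mstflint -d d8:00.0 q full | mstflint_field PSID          # MT_1090111019 here
#   sudo mstflint -d d8:00.0 q full | mstflint_field 'FW Version'
# Device ID is printed in decimal; in hex it matches lspci -nn:
#   printf '%x\n' 4103                                             # 1007
```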
Where to get firmware images
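See that PSID in the output from the mstflint query operation? It identifies the board configuration, and it is what to match when picking a firmware image. This card is running release 2.36.5000 of the firmware in its ConnectX-3 Pro chip. There are also code blobs for BIOS/UEFI boot (the Rom Info line shows PXE version 3.4.718 on this card) that can be installed.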
nVidia bought Mellanox some years ago, so firmware images for Mellanox switches and adapter cards can now be found at https://network.nvidia.com/support/firmware/firmware-downloads/. If they are not there in the future, try Duck-Duck-Going for something like "nVidia adapter card firmware download" until you find the right thing.

Today's card is a "ConnectX-3 Pro" (as seen in the output of lspci -d 15b3:). These cards came in Infiniband-only and Ethernet+Infiniband (and maybe Ethernet-only, too?) varieties. Navigating through nVidia's firmware download pages, I found a card with a matching PSID under the ConnectX-3 Pro Infiniband link. PSID MT_1090111019 appears to go with a Mellanox MCX354A-FCCT (dual port) or MCX353A-FCCT (single port) adapter.

Download the corresponding ZIP archive from nVidia's site. As this is written, that is https://www.mellanox.com/downloads/firmware/fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin.zip. Going by its file name, it carries firmware release 2.42.5000 (newer than the 2.36.5000 on the card) and FlexBoot 3.4.752 boot code.
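Before flashing anything, it seems wise to check that the downloaded image carries the same PSID as the card and is actually newer. A sketch, assuming GNU sort (for -V) and that the ZIP unpacks to a single .bin named like the archive (not verified here); fw_is_older is a hypothetical helper, and mstflint -i <file> q queries an image file rather than a device:

```shell
# Hypothetical helper: succeed (exit 0) if dotted version $1 is strictly
# older than $2. Relies on GNU sort -V (version sort).
fw_is_older() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example workflow (the .bin name is an assumption; check what unzip produces):
#   wget https://www.mellanox.com/downloads/firmware/fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin.zip
#   unzip fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin.zip
#   mstflint -i fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin q
#     (its PSID line should read MT_1090111019, same as the card)
#   fw_is_older 2.36.5000 2.42.5000 && echo "firmware update available"
#   fw_is_older 3.4.718 3.4.752 && echo "boot ROM update available"
```

A plain string comparison would get versions like 2.9 vs 2.10 wrong, which is why the sketch leans on version sort instead.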