Mellanox Infiniband HCA and/or Ethernet adapters

Revision as of 20:50, 13 April 2026

I have a number of servers with ConnectX-3 cards to connect to various network infrastructure bits and bobs. Recently (today) I installed one in a Cisco UCS C220 M5 server and realized that I have made no notes about what has been done to make the cards work. Ooops.

ConnectX-3 cards

Several of these are deployed around the home network. There are various models (pretty much all are ConnectX-3, though.)

Finding card info

List out all Mellanox cards in a machine. "15b3" is Mellanox's PCI vendor ID.

itops@syadasti:~$ lspci -d 15b3:
d8:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
itops@syadasti:~$ 

The first column of output is the PCI bus ID of the card. This is used in the next bunch of commands. Running with elevated privileges (under sudo) allows lspci to read the capabilities info. That isn't really necessary for this step, but causes no harm.

itops@syadasti:~$ sudo lspci -s d8:00.0 -vv
d8:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
        Subsystem: Mellanox Technologies Device 0003
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 83
        NUMA node: 1
        IOMMU group: 20
        Region 0: Memory at fbe00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 4ffff800000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at fbd00000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, IntMsgNum 0
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number f4-52-14-03-00-2c-5e-00
        Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [154] Advanced Error Reporting
        Capabilities: [18c] Secondary PCI Express
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_core

itops@syadasti:~$ 
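Since the bus ID gets typed into nearly every command that follows, it can be handy to capture it once. Here is a minimal sketch, assuming the bus ID is always the first whitespace-separated column of the lspci listing as it is above. The function name mlx_bus_ids is made up for these notes and is not part of any Mellanox tool.

```shell
# Print the PCI bus ID (first column) of every non-empty line of
# `lspci -d 15b3:` output read from standard input.
# The name mlx_bus_ids is hypothetical, chosen for this example.
mlx_bus_ids() {
    awk 'NF { print $1 }'
}

# Possible usage on a machine with a single card (not run here):
#   dev=$(lspci -d 15b3: | mlx_bus_ids | head -n 1)
#   sudo mstflint -d "$dev" q full
```

On a machine with more than one Mellanox card this prints one bus ID per line, so pick the right one rather than blindly taking the first.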

Query the card to find out its code revision, Ethernet MAC address(es), Infiniband GUID(s), and the Mellanox Parameter Set ID (PSID). This does need the elevated privileges provided by sudo:

itops@syadasti:~$ sudo mstflint -d d8:00.0 q full
Image type:            FS2
FW Version:            2.36.5000
FW Release Date:       26.1.2016
MIC Version:           1.5.0
Config Sectors:        2
Product Version:       02.36.50.00
Rom Info:              type=PXE version=3.4.718
Device ID:             4103
Description:           Node             Port1            Port2            Sys image
GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03 
MACs:                                       f452142c5e01     f452142c5e02
VSD:                   
PSID:                  MT_1090111019
itops@syadasti:~$ 
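The PSID and firmware version are the two values needed when hunting for a matching image, so pulling them out of the query output with a small helper saves some squinting. This is only a sketch: it assumes the "Label:   value" layout shown above, and the function name query_field is an invention for these notes.

```shell
# Print the value of one labelled field from `mstflint ... q full` output
# read from standard input.
# $1 is the label without the trailing colon, e.g. "PSID" or "FW Version".
# The name query_field is hypothetical, chosen for this example.
query_field() {
    awk -v label="$1:" '
        index($0, label) == 1 {
            # Keep everything after the label and trim surrounding blanks.
            value = substr($0, length(label) + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            print value
            exit
        }'
}

# Possible usage (not run here):
#   sudo mstflint -d d8:00.0 q full | query_field PSID
```

Because the label has to match from the start of the line including the colon, asking for "FW Version" does not accidentally pick up a "FW Version(Running)" line.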

Where to get firmware images

See that PSID in the output from the mstflint query operation? It identifies which firmware image belongs on this card. We are running release 2.36.5000 of the code in the ConnectX-3 chip on this card. There are also code blobs for BIOS/UEFI that can be installed.

nVidia bought Mellanox some years ago. With that purchase, firmware images for switches and adapter cards can be found at https://network.nvidia.com/support/firmware/firmware-downloads/. If it is not there in the future, try Duck-Duck-Going for something like "nVidia adapter card firmware download" until you find the right thing. Today's card is a "ConnectX-3 Pro" (as seen in the output of lspci -d 15b3:). These cards came in Infiniband only and Ethernet+Infiniband (and maybe just Ethernet, too?) varieties. Navigating through nVidia firmware download, I found a card with a matching PSID under the ConnectX-3 Pro Infiniband link. PSID MT_1090111019 looks to go with a Mellanox MCX354A-FCCT (dual port) or MCX353A-FCCT (single port) adapter. Download the corresponding ZIP archive from nVidia's site. As this is written, that is https://www.mellanox.com/downloads/firmware/fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin.zip.

Burning new firmwares

Unpack the just-downloaded ZIP file and find the .bin file inside. Then compare what is currently on the card with what is in the .bin file:

itops@syadasti:~/Mlnx-HCA-Firmware$ diff -u --color <(sudo mstflint -d d8:00.0 q full) <(mstflint -i fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin q full) 
--- /dev/fd/63	2026-04-13 19:15:20.421912090 +0000
+++ /dev/fd/62	2026-04-13 19:15:20.421912090 +0000
@@ -1,13 +1,14 @@
 Image type:            FS2
-FW Version:            2.36.5000
-FW Release Date:       26.1.2016
-MIC Version:           1.5.0
+FW Version:            2.42.5000
+FW Release Date:       5.9.2017
+MIC Version:           2.0.0
 Config Sectors:        2
-Product Version:       02.36.50.00
-Rom Info:              type=PXE version=3.4.718
+PRS Name:              cx3pro_MCX354A_fdr_09v.prs
+Product Version:       02.42.50.00
+Rom Info:              type=PXE version=3.4.752
 Device ID:             4103
 Description:           Node             Port1            Port2            Sys image
-GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03 
-MACs:                                       f452142c5e01     f452142c5e02
-VSD:                   
+GUIDs:                 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050 
+MACs:                                       0002c9000001     0002c9000002
+VSD:                   n/a
 PSID:                  MT_1090111019
itops@syadasti:~/Mlnx-HCA-Firmware$

Along with the firmware version numbers, we can see that the Infiniband GUIDs and Ethernet MAC addresses in the image differ from the ones on the card. It would probably be good to keep the card's own values even after the new firmware image is flashed. Note that the PSID stays the same. I believe it is possible to change an adapter card from a ConnectX-3 to a ConnectX-3 Pro by flashing different images into it. But we do not need to do that here.
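The burn command further down takes the GUIDs and MACs as comma-separated lists, which are easy to mistype by hand. A small sketch for building those lists from saved query output follows. It assumes the "GUIDs:" and "MACs:" lines look like the ones above, and the function name id_list and the file name before.txt are made up for illustration.

```shell
# Turn the whitespace-separated values on the "GUIDs:" or "MACs:" line of
# `q full` output (read from standard input) into a comma-separated list.
# $1 is the label to look for: GUIDs or MACs.
# The name id_list is hypothetical, chosen for this example.
id_list() {
    awk -v label="$1:" '
        $1 == label {
            out = $2
            for (i = 3; i <= NF; i++) out = out "," $i
            print out
            exit
        }'
}

# Possible usage (not run here; before.txt is a hypothetical file name):
#   sudo mstflint -d d8:00.0 q full > before.txt
#   id_list GUIDs < before.txt
#   id_list MACs  < before.txt
```

The results can be pasted after --guids and --macs. mstflint still shows the new and current values side by side before burning, so there is a second chance to catch a mistake.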

Now, let's make a backup of what is in the card:

itops@syadasti:~/Mlnx-HCA-Firmware$ sudo mstflint -d d8:00.0 ri backup_$(date -Iseconds)_2.36.5000.bin
itops@syadasti:~/Mlnx-HCA-Firmware$ mstflint -i backup_2026-04-13T19\:58\:45+00\:00_2.36.5000.bin q full
Image type:            FS2
FW Version:            2.36.5000
FW Release Date:       26.1.2016
MIC Version:           1.5.0
Config Sectors:        2
Product Version:       02.36.50.00
Rom Info:              type=PXE version=3.4.718
Device ID:             4103
Description:           Node             Port1            Port2            Sys image
GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03 
MACs:                                       f452142c5e01     f452142c5e02
VSD:                   
PSID:                  MT_1090111019
itops@syadasti:~/Mlnx-HCA-Firmware$

Remove drivers from the running kernel and flash the new firmware image:

itops@syadasti:~/Mlnx-HCA-Firmware$ lsmod | grep mlx
mlx4_ib               262144  0
ib_uverbs             204800  1 mlx4_ib
mlx4_en               167936  0
mlx4_core             442368  2 mlx4_ib,mlx4_en
ib_core               524288  2 mlx4_ib,ib_uverbs
itops@syadasti:~/Mlnx-HCA-Firmware$ sudo modprobe -rv mlx4_ib mlx4_en mlx4_core
rmmod mlx4_ib
rmmod ib_uverbs
rmmod mlx4_en
rmmod mlx4_core
itops@syadasti:~/Mlnx-HCA-Firmware$ sudo mstflint -d d8:00.0 --guids f4521403002c5e00,f4521403002c5e01,f4521403002c5e02,f4521403002c5e03 --macs f452142c5e01,f452142c5e02 --image fw-ConnectX3Pro-rel-2_42_5000-MCX354A-FCC_Ax-FlexBoot-3.4.752.bin burn
    You are about to change the Guids/Macs/Uids on the device:

                        New Values              Current Values
        Node  GUID:     f4521403002c5e00        f4521403002c5e00
        Port1 GUID:     f4521403002c5e01        f4521403002c5e01
        Port2 GUID:     f4521403002c5e02        f4521403002c5e02
        Sys.Image GUID: f4521403002c5e03        f4521403002c5e03
        Port1 MAC:          f452142c5e01            f452142c5e01
        Port2 MAC:          f452142c5e02            f452142c5e02

 Do you want to continue ? (y/n) [n] : y

    Current FW version on flash:  2.36.5000
    New FW version:               2.42.5000

Burning FS2 FW image without signatures - OK  
Restoring signature                     - OK
itops@syadasti:~/Mlnx-HCA-Firmware$

And query the card to check on the flashing:

itops@syadasti:~/Mlnx-HCA-Firmware$ sudo mstflint -d d8:00.0 q full
Image type:            FS2
FW Version:            2.42.5000
FW Version(Running):   2.36.5000
FW Release Date:       5.9.2017
MIC Version:           2.0.0
Config Sectors:        2
PRS Name:              cx3pro_MCX354A_fdr_09v.prs
Product Version:       02.42.50.00
Rom Info:              type=PXE version=3.4.752
Device ID:             4103
Description:           Node             Port1            Port2            Sys image
GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03 
MACs:                                       f452142c5e01     f452142c5e02
VSD:                   
PSID:                  MT_1090111019
itops@syadasti:~/Mlnx-HCA-Firmware$
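The tell-tale in that output is the extra "FW Version(Running)" line, which shows a different version from "FW Version". A rough check for that condition can be scripted. This sketch is based only on the two query outputs recorded on this page (one with the extra line, one without), so treat the logic as an assumption rather than documented mstflint behavior. The function name fw_restart_pending is made up.

```shell
# Read `q full` output on standard input.
# Exit status 0 if a "FW Version(Running):" line is present and differs
# from the "FW Version:" line, i.e. the flashed image is not yet the one
# executing. Exit status 1 otherwise.
# The name fw_restart_pending is hypothetical, chosen for this example.
fw_restart_pending() {
    awk '
        index($0, "FW Version:") == 1          { flashed = $3 }
        index($0, "FW Version(Running):") == 1 { running = $3 }
        END { exit !(running != "" && running != flashed) }'
}

# Possible usage (not run here):
#   if sudo mstflint -d d8:00.0 q full | fw_restart_pending; then
#       echo "new firmware is on flash but not running yet"
#   fi
```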

The new code isn't running yet: 2.42.5000 is on the flash, but the card is still running 2.36.5000. There is an "image reactivate" operation that could have been performed along with the flashing. Let's give that a go here:

itops@syadasti:~/Mlnx-HCA-Firmware$ sudo mstflint -d d8:00.0 ir
-E- Failed to execute image reactivation on device d8:00.0. Error: Operation not supported..
itops@syadasti:~/Mlnx-HCA-Firmware$ 

Hmmm. A different tool, maybe?

itops@syadasti:~/Mlnx-HCA-Firmware$ sudo mstfwreset -d d8:00.0 query
-E- Unsupported Device: d8:00.0 (ConnectX3Pro).
itops@syadasti:~/Mlnx-HCA-Firmware$ 

Fine, be that way. Halt the OS and power cycle the machine. Grrrr. Fast forward a bit (power cycle, OS start done) and we can check again:

itops@syadasti:~$ sudo mstflint -d d8:00.0 query full
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
MIC Version:           2.0.0
Config Sectors:        2
PRS Name:              cx3pro_MCX354A_fdr_09v.prs
Product Version:       02.42.50.00
Rom Info:              type=PXE version=3.4.752
Device ID:             4103
Description:           Node             Port1            Port2            Sys image
GUIDs:                 f4521403002c5e00 f4521403002c5e01 f4521403002c5e02 f4521403002c5e03 
MACs:                                       f452142c5e01     f452142c5e02
VSD:                   
PSID:                  MT_1090111019
itops@syadasti:~$ 

So that is looking correct. Yay!

Set card mode