EMC branded Mellanox SX60xx switch

From FnordWiki

Used enterprise IT stuff is stupid cheap in the US. Stupid cheap like 18 port FDR (56Gbits/sec) Infiniband switches shipped to your door for 6.67USD per port. Only a couple of issues with these:

  • They run a very limited EMC OS instead of the somewhat more open Linux-based Mellanox MLNX-OS
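  • No VPI (Virtual Protocol Interconnect, which lets ports run as Ethernet instead of Infiniband)
  • No subnet manager (every Infiniband fabric needs one running somewhere before the ports will pass traffic, IP over Infiniband included)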
  • Management by telnet ("SSH? That's so not going to happen!")
  • It's not Linux

The devices at FnordNet are Mellanox SX6018s wearing EMC colors. (Mellanox ships switches with blue fronts and EMC switches are black.) On the connector side are:

  • 18 QSFP 56Gbits/sec Infiniband connectors
  • 2 8P8C modular 1000baseT Ethernet connectors (Labelled MGT)
  • 1 USB connector
  • 1 8P8C modular connector for the serial console (labelled CONSOLE)
  • a small recessed reset button switch (small hole on the left side of the connector panel)

Mellanox makes a number of similar devices with the same generation of silicon:

  • SX6005 and SX6012 -- 1U, half width, 12 QSFP connectors arranged in two rows of six. SX6005 is the unmanaged version. SX6012 is the managed switch with the subnet manager and single port Ethernet connectivity
  • SX6015 and SX6018 -- 1U, full width, 18 QSFP connectors in one row of 18. SX6018 is the managed version with a subnet manager and dual Ethernet management ports
  • SX6025 and SX6036 -- 1U, full width, 36 QSFP connectors in two rows of 18. SX6025 is unmanaged. SX6036 is the managed version.

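All of these are built on the same Infiniband silicon -- Mellanox SwitchX-2. The management functions on the SX6012, SX6018, and SX6036 are done by an embedded Linux system running on an AMCC PowerPC 460EX CPU (the PPC_M460EX platform in Mellanox's image names) attached to the SwitchX-2 silicon over PCI Express. The U-Boot output below shows a device with Mellanox's PCI vendor ID (15b3) behind the PCIE1 root complex.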

Before breaking it, a diversion into EMC's SwitchOS

At power on time, a tiny bit of hardware initialization is done and then U-Boot (a free-as-in-freedom bootloader) is given control.

Here's what is dumped to the console when power is applied and the automatic boot is interrupted:


U-Boot 2009.01 SX_PPC_M460EX SX_3.2.0330-82-EMC ppc (Feb 27 2013 - 12:13:42)

CPU:   AMCC PowerPC 460EX Rev. B at 1000 MHz (PLB=166, OPB=83, EBC=83 MHz)
       Security/Kasumi support
       Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
       Internal PCI arbiter disabled
       32 kB I-Cache 32 kB D-Cache
Board: Mellanox PPC460EX Board
FDEF:  No
I2C:   ready
DRAM:   2 GB (ECC enabled, 333 MHz, CL3)
FLASH: 16 MB
NAND:  1024 MiB
PCI:   Bus Dev VenId DevId Class Int
PCIE0: link is not up.
PCIE1: successfully set as root-complex
        01  00  15b3  c738  0c06  00
Net:   ppc_4xx_eth0, ppc_4xx_eth1
Hit any key to stop autoboot:  0 
=>

Nothing too super exciting there, but it does tell us what we're looking at. Note that U-Boot is running and we have its CLI available to use. A little bit more info about the hardware is available using the bdinfo U-Boot command:

=> bdinfo
memstart    = 0x00000000
memsize     = 0x80000000
flashstart  = 0xFF000000
flashsize   = 0x01000000
flashoffset = 0x00000000
sramstart   = 0x00000000
sramsize    = 0x00000000
bootflags   = 0xFFFE0218
intfreq     =   1000 MHz
busfreq     = 166.667 MHz
ethaddr     = 00:02:C9:63:EF:EE
eth1addr    = 00:02:C9:63:EF:EF
IP addr     = 172.17.255.120
baudrate    =   9600 bps
=>
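
The hex values in that bdinfo output are easier to read once converted. A minimal Python sketch, with the three numbers copied by hand from the output above (nothing here talks to the switch):

```python
# Values copied from the bdinfo output above.
bdinfo = {
    "memsize": 0x80000000,
    "flashstart": 0xFF000000,
    "flashsize": 0x01000000,
}

MIB = 1024 * 1024


def mib(n):
    """Convert a byte count to whole MiB."""
    return n // MIB


ram_mib = mib(bdinfo["memsize"])
flash_mib = mib(bdinfo["flashsize"])
# Last byte of the NOR flash window in the CPU's address space.
flash_end = bdinfo["flashstart"] + bdinfo["flashsize"] - 1

print(f"RAM:   {ram_mib} MiB")
print(f"Flash: {flash_mib} MiB at {bdinfo['flashstart']:#010x}..{flash_end:#010x}")
```

So memsize is the 2 GB of DRAM and flashsize is the 16 MB of NOR flash from the boot banner, with the flash mapped at the very top of the 32-bit address space.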

printenv will give a feel for what EMC (before Dell bought them all up) has done pre-boot software wise:

=> printenv
bootcmd=run flash_jffs2
baudrate=9600
loads_echo=
autoload=n
hostname=mlnx460ex
netdev=eth0
nfsargs=setenv bootargs root=/dev/nfs rw nfsroot=${serverip}:${rootpath}
ramargs=setenv bootargs root=/dev/ram rw
addip=setenv bootargs ${bootargs} ip=${ipaddr}:${serverip}:${gatewayip}:${netmask}:${hostname}:${netdev}:off panic=1
addtty=setenv bootargs ${bootargs} console=ttyS0,${baudrate}
addmisc=setenv bootargs ${bootargs}
initrd_high=30000000
kernel_addr_r=400000
fdt_addr_r=800000
ramdisk_addr_r=C00000
hostname=mlnx460ex
ramdisk_file=mlnx460ex/uRamdisk
rootpath=/opt/eldk/ppc_4xxFP
flash_self=run ramargs addip addtty addmisc;bootm ${kernel_addr} ${ramdisk_addr} ${fdt_addr}
flash_nfs=run nfsargs addip addtty addmisc;bootm ${kernel_addr} - ${fdt_addr}
net_nfs=tftp ${kernel_addr_r} ${bootfile}; tftp ${fdt_addr_r} ${fdt_file}; run nfsargs addip addtty addmisc;bootm ${kernel_addr_r} - ${fdt_addr_r}
net_self_load=tftp ${kernel_addr_r} ${bootfile};tftp ${fdt_addr_r} ${fdt_file};tftp ${ramdisk_addr_r} ${ramdisk_file};
net_self=run net_self_load;run ramargs addip addtty addmisc;bootm ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r}
fdt_file=mlnx460ex/mlnx460ex.dtb
flash_self_old=run ramargs addip addtty addmisc;bootm ${kernel_addr} ${ramdisk_addr}
flash_nfs_old=run nfsargs addip addtty addmisc;bootm ${kernel_addr}
net_nfs_old=tftp ${kernel_addr_r} ${bootfile};run nfsargs addip addtty addmisc;bootm ${kernel_addr_r}
load=tftp 200000 mlnx460ex/u-boot.bin
update=protect off 0xFFFA0000 FFFFFFFF;era 0xFFFA0000 FFFFFFFF;cp.b ${fileaddr} 0xFFFA0000 ${filesize};setenv filesize;saveenv
upd=run load update
dhcp_vendor-class-identifier=bootmfg:hwname:mlnx460ex:
clear_filesize=setenv filesize
mfg_dir=mlnx460ex
mfg_args=setenv bootargs root=/dev/ram rw ramdisk_size=${mfg_ramdisk_size} ${mfg_extra_args}
mfg_common_args=run addtty addmisc
mfg_load=tftp ${kernel_addr_r} ${mfg_root}${mfg_dir}/${mfg_kernel_file};tftp ${fdt_addr_r} ${mfg_root}${mfg_dir}/${mfg_fdt_file};tftp ${ramdisk_addr_r} ${mfg_root}${mfg_dir}/${mfg_ramdisk_file}
mfg_nodhcp=echo "Manufacture will TFTP from directory ${mfg_root}${mfg_dir}, and boot";echo; run clear_filesize ; run mfg_load;if test 0${filesize} -gt 0; then echo Booting mfg ; run mfg_args mfg_common_args;bootm ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r} ; else ; echo Failed mfg load ; fi
mfg=echo "Manufacture will DHCP, TFTP from directory ${mfg_root}${mfg_dir}, and boot";echo; dhcp; run clear_filesize ; run mfg_load;if test 0${filesize} -gt 0; then echo Booting mfg ; run mfg_args mfg_common_args;bootm ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r} ; else ; echo Failed mfg load ; fi
menu_file=menu.img
menu_load=tftp ${menu_addr_r} ${mfg_root}${mfg_dir}/${menu_file}; if test $? -ne 0; then run clear_filesize ; echo Download failed ;fi
menu_usb_load_ext2=usb start; ext2load usb ${mfg_usb_dev}:${mfg_usb_part} ${menu_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${menu_file};
menu_usb_load_fat=usb start; fatload usb ${mfg_usb_dev}:${mfg_usb_part} ${menu_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${menu_file};
menu_usb_load=if test "x${mfg_usb_fstype}" = "xext2"; then run menu_usb_load_ext2 ; else ; run menu_usb_load_fat ; fi
menu_nodhcp=run clear_filesize ; run menu_load ; if test 0${filesize} -gt 0; then autoscr ${menu_addr_r}; else ; echo Failed menu load ; fi
menu=dhcp ; run clear_filesize ; run menu_load ; if test 0${filesize} -gt 0; then autoscr ${menu_addr_r}; else ; echo Failed menu load ; fi
fw_file=u-boot.bin
fw_load=tftp ${fw_addr_r} ${mfg_root}${mfg_dir}/${fw_file}; if test $? -ne 0; then run clear_filesize ; echo Download failed ;fi
fw_usb_load_ext2=usb start; ext2load usb ${mfg_usb_dev}:${mfg_usb_part} ${fw_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${fw_file};
fw_usb_load_fat=usb start; fatload usb ${mfg_usb_dev}:${mfg_usb_part} ${fw_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${fw_file};
fw_usb_load=if test "x${mfg_usb_fstype}" = "xext2"; then run fw_usb_load_ext2 ; else ; run fw_usb_load_fat ; fi
fw_update_raw=protect off 0xFFFA0000 FFFFFFFF;erase 0xFFFA0000 FFFFFFFF;cp.b ${fw_addr_r} 0xFFFA0000 ${filesize};cmp.b ${fw_addr_r} 0xFFFA0000 ${filesize};setenv filesize; saveenv
fw_usb_update=run clear_filesize ; run fw_usb_load ; if test 0${filesize} -gt 0; then run fw_update_raw ; else ; echo Failed update load ; fi
fw_update_nodhcp=run clear_filesize ; run fw_load ; if test 0${filesize} -gt 0; then run fw_update_raw ; else ; echo Failed update load ; fi
fw_update=dhcp ; run clear_filesize ; run fw_load ; if test 0${filesize} -gt 0; then run fw_update_raw ; else ; echo Failed update load ; fi
boot_common_args=run addtty addmisc
mfg_usb_dir=mlnx460ex
mfg_usb_load_ext2=usb start; echo "Loading ${mfg_kernel_file}";ext2load usb ${mfg_usb_dev}:${mfg_usb_part} ${kernel_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_kernel_file};echo "Loading ${mfg_fdt_file}"; ext2load usb ${mfg_usb_dev}:${mfg_usb_part} ${fdt_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_fdt_file};echo "Loading ${mfg_ramdisk_file}"; ext2load usb ${mfg_usb_dev}:${mfg_usb_part} ${ramdisk_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_ramdisk_file}
mfg_usb_load_fat=usb start; echo "Loading ${mfg_kernel_file}";fatload usb ${mfg_usb_dev}:${mfg_usb_part} ${kernel_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_kernel_file};echo "Loading ${mfg_fdt_file}"; fatload usb ${mfg_usb_dev}:${mfg_usb_part} ${fdt_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_fdt_file};echo "Loading ${mfg_ramdisk_file}"; fatload usb ${mfg_usb_dev}:${mfg_usb_part} ${ramdisk_addr_r} ${mfg_usb_root}${mfg_usb_dir}/${mfg_ramdisk_file}
mfg_usb_load=if test "x${mfg_usb_fstype}" = "xext2"; then run mfg_usb_load_ext2 ; else ; run mfg_usb_load_fat ; fi
mfg_usb=echo "Manufacture will load from USB directory ${mfg_usb_root}${mfg_usb_dir}, and boot"; echo; run clear_filesize ; run mfg_usb_load; if test 0${filesize} -gt 0; then echo Booting mfg ; run mfg_args mfg_common_args;bootm ${kernel_addr_r} ${ramdisk_addr_r} ${fdt_addr_r} ; else ; echo Failed mfg load ; fi
kernel_addr=ff000000
fdt_addr=ff1e0000
ramdisk_addr=ff200000
fw_addr_r=400000
menu_addr_r=B00000
pciconfighost=1
pcie_mode=RP:RP
autoload=no
rootdev=/dev/mtdblock6
boot_usb_ext2_loc_1=run usb_args_loc_1 boot_common_args;echo "Loading ${boot_kernel_file}";ext2load usb ${boot_usb_dev}:${boot_usb_part_loc_1} ${kernel_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_kernel_file};echo "Loading ${boot_fdt_file}";ext2load usb ${boot_usb_dev}:${boot_usb_part_loc_1} ${fdt_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_fdt_file};bootm ${kernel_addr_r} - ${fdt_addr_r}
boot_usb_ext2_loc_2=run usb_args_loc_2 boot_common_args;echo "Loading ${boot_kernel_file}";ext2load usb ${boot_usb_dev}:${boot_usb_part_loc_2} ${kernel_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_kernel_file};echo "Loading ${boot_fdt_file}";ext2load usb ${boot_usb_dev}:${boot_usb_part_loc_2} ${fdt_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_fdt_file};bootm ${kernel_addr_r} - ${fdt_addr_r}
boot_usb_fat_loc_1=run usb_args_loc_1 boot_common_args;echo "Loading ${boot_kernel_file}";fatload usb ${boot_usb_dev}:${boot_usb_part_loc_1} ${kernel_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_kernel_file};echo "Loading ${boot_fdt_file}";fatload usb ${boot_usb_dev}:${boot_usb_part_loc_1} ${fdt_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_fdt_file};bootm ${kernel_addr_r} - ${fdt_addr_r}
boot_usb_fat_loc_2=run usb_args_loc_2 boot_common_args;echo "Loading ${boot_kernel_file}";fatload usb ${boot_usb_dev}:${boot_usb_part_loc_2} ${kernel_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_kernel_file};echo "Loading ${boot_fdt_file}";fatload usb ${boot_usb_dev}:${boot_usb_part_loc_2} ${fdt_addr_r} ${boot_usb_root}${boot_usb_dir}/${boot_fdt_file};bootm ${kernel_addr_r} - ${fdt_addr_r}
mfg_kernel_file=vmlinuz
mfg_ramdisk_file=rootfs
mfg_ramdisk_size=180224
mfg_fdt_file=fdt
mfg_usb_dev=0
mfg_usb_part=1
mfg_usb_fstype=fat
mfg_usb_root=/
boot_kernel_file=vmlinuz
boot_fdt_file=fdt
boot_usb_dev=0
boot_usb_part_loc_1=2
boot_usb_part_loc_2=3
boot_usb_root_loc_1=/dev/sda5
boot_usb_root_loc_2=/dev/sda6
usb_args_loc_1=setenv bootargs root=${boot_usb_root_loc_1} ro reset_button=${reset_button} rootdelay=8 ${image_kernel_args} ${extra_args}
usb_args_loc_2=setenv bootargs root=${boot_usb_root_loc_2} ro reset_button=${reset_button} rootdelay=8 ${image_kernel_args} ${extra_args}
jffs2_args=setenv bootargs root=${rootdev} rootfstype=jffs2 ro reset_button=${reset_button} ${image_kernel_args} ${extra_args}
mfg_extra_args=ramdisk=65536
ethaddr=00:02:C9:63:EF:EE
eth1addr=00:02:C9:63:EF:EF
ethact=ppc_4xx_eth0
bootfile=pxelinux.0
emc_fabric=B
autoscr=no
filesize=175B
fileaddr=400000
gatewayip=10.10.4.1
netmask=255.255.252.0
ipaddr=172.17.255.120
serverip=172.17.255.252
script_rev=3.3.0
script_date=7.12.13
autostart=yes
bootdelay=5
emcram_addr=400000
emcfl_one_start=ff400000
emcfl_one_end=ff5fffff
emcfl_two_start=ff600000
emcfl_two_end=ff7fffff
emcload_addr=ff400000
load_dir=.
binary_name=ibsw.bin.load
restore_defaults=setenv load_dir .; setenv binary_name ibsw.bin.load; saveenv;
setip_176=setenv ipaddr 192.168.176.190; setenv serverip 192.168.176.253; saveenv
setip_177=setenv ipaddr 192.168.177.191; setenv serverip 192.168.177.253; saveenv
setip_linux=setenv ipaddr 192.168.10.10; setenv serverip 192.168.10.1; saveenv
setip_16=setenv ipaddr 172.16.255.120; setenv serverip 172.16.255.252; saveenv
setip_17=setenv ipaddr 172.17.255.120; setenv serverip 172.17.255.252; saveenv
set_fab_a=setenv emc_fabric A; saveenv
set_fab_b=setenv emc_fabric B; saveenv
emcpxe=tftp ${emcram_addr} ${load_dir}/${binary_name}
emcflash=ping $serverip; bootm ${emcload_addr}
mlxlinux=run jffs2_args boot_common_args;bootm ${kernel_addr} - ${fdt_addr}
flash_jffs2=run emcflash
menu_usb=run emcflash
emcburn_one=prot off ${emcfl_one_start} ${emcfl_one_end}; erase ${emcfl_one_start} ${emcfl_one_end}; setenv autostart no; tftp ${emcram_addr} ${load_dir}/${binary_name}; iminfo ${emcram_addr}; cp.b ${emcram_addr} ${emcfl_one_start} ${filesize}; iminfo ${emcfl_one_start}; setenv autostart yes; prot on ${emcfl_one_start} ${emcfl_one_end};
emcburn_two=prot off ${emcfl_two_start} ${emcfl_two_end}; erase ${emcfl_two_start} ${emcfl_two_end}; setenv autostart no; tftp ${emcram_addr} ${load_dir}/${binary_name}; iminfo ${emcram_addr}; cp.b ${emcram_addr} ${emcfl_two_start} ${filesize}; iminfo ${emcfl_two_start}; setenv autostart yes; prot on ${emcfl_two_start} ${emcfl_two_end};
emcload_one=setenv emcload_addr ${emcfl_one_start}; saveenv
emcload_two=setenv emcload_addr ${emcfl_two_start}; saveenv
script_get=setenv autostart no; tftp ${emcram_addr} uboot_start_script.img; setenv autostart yes; imi ${emcram_addr};
script_exe=autoscr ${emcram_addr}
boot_emcpxe=setenv flash_jffs2 run emcpxe; setenv menu_usb run emcpxe; saveenv
boot_emcflash=setenv flash_jffs2 run emcflash; setenv menu_usb run emcflash; saveenv
boot_mlxlinux=setenv flash_jffs2 run mlxlinux; setenv menu_usb run mlxlinux; saveenv
emchelp=echo All commands are preceeded by \"run \"; echo setip_176 ...... set ip and server ip to 192.176 subnet; echo setip_177 ...... set ip and server ip to 192.177 subnet; echo setip_16 ....... set ip and server ip to 172.16 subnet; echo setip_17 ....... set ip and server ip to 172.17 subnet;echo setip_linux .... set ip and server ip to linux default (10.10); echo emcburn_one .... burn ibsw.bin.load to the first symmk flash location;echo emcburn_two .... burn ibsw.bin.load to the second symmk flash location;echo emcload_one .... set emcflash to point to the first symmk flash location;echo emcload_two .... set emcflash to point to the second symmk flash location;echo script_get ..... Load a new version of the uboot_start_script.img;echo script_exe ..... Execute the script loaded by script_get (check CRC first!);echo script_status .. Dump revision and build date of last run script; echo boot_emcpxe .... Set PXE boot to be the default boot;echo emcpxe ......... Run PXE boot (just this time); echo boot_emcflash .. Set the Symmk in flash to be the default boot;echo emcflash ....... Boot (just this time) from flash;echo boot_mlxlinux .. Set the MLX linux version to be the default boot;echo mlxlinux ....... Run MLX linux (just this time);echo set_fab_a ...... Set switch to be fabric A; echo set_fab_b ...... Set switch to be fabric B
script_status=echo Last Script Executed; echo Script Revision: 3.3.0; echo Script Built on: 7.12.13;
stdin=serial
stdout=serial
stderr=serial
reset_button=0
ver=U-Boot 2009.01 SX_PPC_M460EX SX_3.2.0330-82-EMC ppc (Feb 27 2013 - 12:13:42)

Environment size: 12502/16380 bytes
=>
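
That is a lot of environment, but the default boot path is short once the run chains are followed. Here is a rough Python sketch that expands bootcmd, using a hand-copied subset of the variables above. It is a simplification and not how U-Boot's shell actually parses things: it only understands ";", "run NAME ..." and $var / ${var} substitution, and would mangle the if/then/else entries.

```python
import re

# A few variables copied from the printenv output above.
env = {
    "bootcmd": "run flash_jffs2",
    "flash_jffs2": "run emcflash",
    "emcflash": "ping $serverip; bootm ${emcload_addr}",
    "emcload_addr": "ff400000",
    "serverip": "172.17.255.252",
}


def expand(command, env, depth=0):
    """Expand 'run NAME' chains and $var references into plain commands."""
    if depth > 10:  # guard against variables that run themselves
        raise RecursionError("run chain too deep")
    steps = []
    for part in (p.strip() for p in command.split(";")):
        if not part:
            continue
        words = part.split()
        if words[0] == "run":
            # 'run' accepts several variable names and executes them in order.
            for name in words[1:]:
                steps.extend(expand(env[name], env, depth + 1))
        else:
            # Substitute ${name} and $name; unknown names become empty strings.
            steps.append(
                re.sub(r"\$\{?(\w+)\}?", lambda m: env.get(m.group(1), ""), part)
            )
    return steps


boot_steps = expand(env["bootcmd"], env)
print(boot_steps)
```

With the values as delivered, the default boot boils down to a ping of the server followed by bootm at ff400000, which is emcfl_one_start, the first of EMC's two flash image slots.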

All that was just the bootloader. Where's SwitchOS?

So, yeah. Last section is poorly named. Remember how the autoboot process was stopped last time? Well, here's what we get without interrupting it...

[...]
Hit any key to stop autoboot:  0 
Waiting for PHY auto negotiation to complete... done
ENET Speed is 1000 Mbps - FULL duplex connection (EMAC0)
Using ppc_4xx_eth0 device
ping failed; host 172.17.255.252 is not alive
   Loading Kernel Image ... OK
ABCDEFdrv_table_install:         syslog at 0x00033d84 with minor  0 (DRV_SETUP) (DRV_INIT)
drv_table_install:         isrlog at 0x00033d84 with minor  1 (DRV_INIT)
drv_table_install:  userinterface at 0x00004db0 with minor  0 (DRV_SETUP) (DRV_INIT)
drv_table_install:          stty0 at 0x0006e804 with minor  0 (DRV_SETUP) (DRV_INIT)
drv_table_install:           eth0 at 0x0006f194 with minor  0 (DRV_SETUP) (DRV_INIT)
drv_table_install:           i2c0 at 0x00079d94 with minor  0 (DRV_SETUP) (DRV_INIT)
drv_table_install:           i2c1 at 0x00079d94 with minor  1 (DRV_INIT)
drv_table_install:         itcpip at 0x0003c05c with minor  0tcpipInit: Starting internal TCP/IP stack.
 (DRV_SETUP) (DRV_INIT)
kernel_main: Drivers installed, installing INIT process with stack size = 8192.
Bar [ 0 ] is the first memory bar
sk_init_main: Starting process based initialization - 8590.
02:60:48:10:ff:78 UDP socket 3 created
TCP socket 4 created
sk_init_main: Process based initialization complete - 8590.
sk_init_main: Installing task table.
task_table_install:        console at 0x00015c24 stack 0x00452410/26624 : 4
task_table_install:          inetd at 0x000359a0 stack 0x00470c10/8192 : 5
task_table_install:       poll_cqs at 0x000aa44c stack 0x00458c10/8192 : 6
task_table_install:     poll_ports at 0x000aa584 stack 0x0045ac10/8192 : 7
task_table_install:        env_mon at 0x00082f94 stack 0x0045ec10/8192 : 8
task_table_install:    env_bin_api at 0x000143a0 stack 0x00460c10/8192 : 9
task_table_install: incoming_fw_files at 0x00013e84 stack 0x00462c10/8192 : 10
sk_init_main: Task table installed.  Starting tasks and exiting.
----------------------------- Board Info -----------------------------
* Chasis Type         : STINGRAY
* Number of Ports     : 18
* U-Boot Revision     : 3.2.330
* Firmware Revision   : 9.9.1030
* INI file Revision   : 0x21010009
* SwitchOS Revision   : 1.282
* SwitchOS Build Date : 2013-07-23
* SwitchOS Build Time : 15:40:30
* SwitchOS Build Path : /emc/tdowning/ppc460_release/july_23_2013
----------------------------------------------------------------------
15:33:09 12/16/2017
Switch-B(4)>

It almost looks like it's going to netboot if it can find a server to give it an image. But going by the emcflash variable above (ping $serverip; bootm ${emcload_addr}), the ping is all the network gets used for: the image in flash is booted either way, so not having a server is totally OK. Switch-B(4)> is the CLI prompt. There's a help command that lists the available commands, but it does not provide a lot of detail. Here are a few potentially useful ones:

  • env show inventory -- lists vital product data (VPD) for the replaceable components: switch chassis, power supplies, and fan boards
  • env show fans -- how fast are they spinning? 0 is probably not good
  • env set uid led blink -- blinks the blue LED on the left side of the switch's connector panel; useful for finding the switch you want to work on
  • env set port bad led on -- much the same, but activates an orange LED just above the blue UID indicator
  • baz swportinfo 1 -- port statistics and the like. Change 1 to the number of the switch port of interest
  • build -- info about the software build
  • board -- a little bit about the hardware
  • info -- a bit about the hardware and SwitchOS
  • env show alarms -- pretty much self-explanatory

Putting MLNX-OS on it

As mentioned above, I desire to have a real OS on these switches. Found some leads over on https://forums.servethehome.com/ which said conversion is possible, but "PM me for more details." The following is a translation of the nice gentleman's "Mellanox Switch Conversion Guide" document into the procedure I have used.

Pre-requisiteses

(Yes. I know I misspelled that.)

Make switch not boot automatically

Apply power, interrupt autoboot, run

=> setenv autostart off
=> setenv autoload off
=> saveenv

We'll revert that by running

=> setenv autostart yes
=> setenv autoload n
=> saveenv

at some point later on. (Those are the settings as the switch arrived.)

Get bootable bits from MLNX-OS

The image-PPC_M460EX-3.6.1002.img OS images are actually ZIP archives containing a compressed tar file image-PPC_M460EX-ppc-m460ex-20160609-202426.tgz and some package meta-data that we do not need. From image-PPC_M460EX-ppc-m460ex-20160609-202426.tgz, extract the kernel image and device tree files:

$ gzip -dc image-PPC_M460EX-ppc-m460ex-20160609-202426.tgz | tar xvvf - ./boot/vmlinuz-uni ./boot/fdt-uni

Then copy boot/vmlinuz-uni and boot/fdt-uni to a TFTP server that the switch can reach. Since my switch has an IP address of 172.17.255.120 as it was delivered, I'm simply adding a secondary IP address (172.17.255.118 in the transcripts below) to the existing TFTP server instead of changing the switch's IP address.
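
The unzip-then-untar extraction can also be done in one pass. A minimal Python sketch using only the standard library. Two things here are my assumptions rather than anything from the conversion guide: that the ZIP holds exactly one .tgz member, and that dropping the two files flat into the destination directory (without the boot/ prefix) is acceptable. The function name extract_boot_files is made up for illustration.

```python
import io
import tarfile
import zipfile
from pathlib import Path

# Member names as they appear in the tar command above.
WANTED = ("./boot/vmlinuz-uni", "./boot/fdt-uni")


def extract_boot_files(img_path, dest_dir="."):
    """Pull the kernel and device tree out of an MLNX-OS .img (a ZIP archive)."""
    extracted = []
    with zipfile.ZipFile(img_path) as outer:
        # Assumption: the first .tgz member is the root filesystem tarball.
        tgz_name = next(n for n in outer.namelist() if n.endswith(".tgz"))
        # Buffer the whole member in RAM so tarfile can seek in it.
        data = io.BytesIO(outer.read(tgz_name))
    with tarfile.open(fileobj=data, mode="r:gz") as inner:
        for name in WANTED:
            src = inner.extractfile(name)  # raises KeyError if the member is absent
            out_path = Path(dest_dir) / Path(name).name
            out_path.write_bytes(src.read())
            extracted.append(str(out_path))
    return extracted
```

Calling extract_boot_files("image-PPC_M460EX-3.6.1002.img", "/srv/tftp") would leave vmlinuz-uni and fdt-uni in that directory; /srv/tftp is only an example path.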

Let's run Linux!

"Linux" in this section is just the kernel, not an OS -- it will panic the first time through and really not do anything useful.

On these SX6018 switches, the on-board flash has areas for 2 OS images. We're going to overwrite the first one (using U-Boot to TFTP and then flash it) and then boot it. The kernel will not have a root filesystem yet and it will not do anything useful. But it's a step toward having a working MLNX-OS.

We need both the kernel and the matching device tree file from the MLNX-OS distro. The strategy is "use TFTP to transfer kernel image to RAM, save it to flash; use TFTP to transfer the device tree, save it to flash." Here we go:

=> tftp 400000 172.17.255.118:MLNX-OS_PPC_M460EX_3.6.1002/boot/vmlinuz-uni
Using ppc_4xx_eth0 device
TFTP from server 172.17.255.118; our IP address is 172.17.255.120
Filename 'MLNX-OS_PPC_M460EX_3.6.1002/boot/vmlinuz-uni'.
Load address: 0x400000
Loading: #################################################################
         #################################################################
         ##
done
Bytes transferred = 1927366 (1d68c6 hex)
=> protect off FF000000 FF1FFFFF
................ done
Un-Protected 16 sectors
=> erase FF000000 FF1FFFFF

................ done
Erased 16 sectors
=> cp.b ${fileaddr} FF000000 ${filesize}
Copy to Flash... done
=> tftp 800000 172.17.255.118:MLNX-OS_PPC_M460EX_3.6.1002/boot/fdt-uni
Using ppc_4xx_eth0 device
TFTP from server 172.17.255.118; our IP address is 172.17.255.120
Filename 'MLNX-OS_PPC_M460EX_3.6.1002/boot/fdt-uni'.
Load address: 0x800000
Loading: #
done
Bytes transferred = 10017 (2721 hex)
=> cp.b ${fileaddr} FF1E0000 ${filesize}
Copy to Flash... done
=> protect on FF000000 FF1FFFFF
................ done
Protected 16 sectors
=> imls
Legacy Image at FF000000:
   Verifying Checksum ... OK
Legacy Image at FF400000:
   Verifying Checksum ... OK
Legacy Image at FF600000:
   Verifying Checksum ... OK
=>
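
Before trusting a flash write like this, it's worth checking that both files fit where they were put. A small Python sketch of that arithmetic, using kernel_addr and fdt_addr from the printenv output and the byte counts from the TFTP transfers above. It assumes the kernel slot runs from kernel_addr up to fdt_addr and the device tree slot from fdt_addr to the end of the erased range.

```python
# Addresses from the printenv output; transfer sizes from the TFTP log above.
KERNEL_ADDR = 0xFF000000   # kernel_addr
FDT_ADDR = 0xFF1E0000      # fdt_addr
REGION_END = 0xFF1FFFFF    # last byte passed to protect/erase
KERNEL_BYTES = 0x1D68C6    # vmlinuz-uni, 1927366 bytes
FDT_BYTES = 0x2721         # fdt-uni, 10017 bytes
SECTORS = 16               # "Erased 16 sectors"

# Assumed layout: kernel slot ends where the device tree slot begins.
kernel_room = FDT_ADDR - KERNEL_ADDR
fdt_room = REGION_END + 1 - FDT_ADDR
sector_size = (REGION_END + 1 - KERNEL_ADDR) // SECTORS

print(f"kernel: {KERNEL_BYTES} of {kernel_room} bytes used")
print(f"fdt:    {FDT_BYTES} of {fdt_room} bytes used")
print(f"sector: {sector_size // 1024} KiB")
```

The kernel has 1966080 bytes available and uses 1927366, so it fits with under 40 KB to spare. A noticeably larger kernel would run into the device tree at FF1E0000.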

File transfers have been finished. The Linux kernel and device tree files have been saved to flash. U-Boot has found the newly transferred image and thinks it's valid (this is what the imls was about.) Let us see what happens when it is run...

=> run mlxlinux
Using mlnx460ex machine description
Linux version 3.10.27-MELLANOXuni-m460ex (@) (gcc version 4.7.2 (GCC) ) PPC_M460EX 3.6.1002 #1 2016-06-09 20:24:26
Zone ranges:
  DMA      [mem 0x00000000-0x2fffffff]
  Normal   empty
  HighMem  [mem 0x30000000-0x7fffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x00000000-0x7fffffff]
MMU: Allocated 1088 bytes of context maps for 255 contexts
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 522752
Kernel command line: root=/dev/mtdblock6 rootfstype=jffs2 ro reset_button=0 console=ttyS0,9600
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Sorting __ex_table...
Memory: 2075976k/2097152k available (3556k kernel code, 21176k reserved, 196k data, 109k bss, 148k init)
Kernel virtual memory layout:
  * 0xfffcf000..0xfffff000  : fixmap
  * 0xffc00000..0xffe00000  : highmem PTEs
  * 0xffa00000..0xffc00000  : consistent mem
  * 0xffa00000..0xffa00000  : early ioremap
  * 0xf1000000..0xffa00000  : vmalloc & ioremap
SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Preemptible hierarchical RCU implementation.
NR_IRQS:512 nr_irqs:512 16
UIC0 (32 IRQ sources) at DCR 0xc0
UIC1 (32 IRQ sources) at DCR 0xd0
UIC2 (32 IRQ sources) at DCR 0xe0
UIC3 (32 IRQ sources) at DCR 0xf0
clocksource: timebase mult[800000] shift[23] registered
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
devtmpfs: initialized
NET: Registered protocol family 16
256k L2-cache enabled
PCIE0: Checking link...
PCIE0: No device detected.
PCI host bridge /plb/pciex@d00000000 (primary) ranges:
 MEM 0x0000000e00000000..0x0000000e7fffffff -> 0x0000000080000000 
 MEM 0x0000000f00000000..0x0000000f000fffff -> 0x0000000000000000 
  IO 0x0000000f80000000..0x0000000f8000ffff -> 0x0000000000000000
4xx PCI DMA offset set to 0x00000000
4xx PCI DMA window base to 0x0000000000000000
DMA window size 0x0000000080000000
PCIE0: successfully set as root-complex
PCIE1: Checking link...
PCIE1: Device detected, waiting for link...
PCIE1: link is up !
PCI host bridge /plb/pciex@d20000000 (primary) ranges:
 MEM 0x0000000e80000000..0x0000000effffffff -> 0x0000000080000000 
 MEM 0x0000000f00100000..0x0000000f001fffff -> 0x0000000000000000 
  IO 0x0000000f80010000..0x0000000f8001ffff -> 0x0000000000000000
4xx PCI DMA offset set to 0x00000000
4xx PCI DMA window base to 0x0000000000000000
DMA window size 0x0000000080000000
PCIE1: successfully set as root-complex
PCI: Probing PCI hardware
PCI host bridge to bus 0000:40
pci_bus 0000:40: root bus resource [io  0xfffe0000-0xfffeffff] (bus address [0x0000-0xffff])
pci_bus 0000:40: root bus resource [mem 0xe00000000-0xe7fffffff] (bus address [0x80000000-0xffffffff])
pci_bus 0000:40: root bus resource [mem 0xf00000000-0xf000fffff] (bus address [0x00000000-0x000fffff])
pci_bus 0000:40: root bus resource [bus 40-ff]
PCI: Hiding 4xx host bridge resources 0000:40:00.0
pci 0000:40:00.0: PCI bridge to [bus 41-7f]
PCI host bridge to bus 0001:80
pci_bus 0001:80: root bus resource [io  0x0000-0xffff]
pci_bus 0001:80: root bus resource [mem 0xe80000000-0xeffffffff] (bus address [0x80000000-0xffffffff])
pci_bus 0001:80: root bus resource [mem 0xf00100000-0xf001fffff] (bus address [0x00000000-0x000fffff])
pci_bus 0001:80: root bus resource [bus 80-ff]
PCI: Hiding 4xx host bridge resources 0001:80:00.0
pci 0001:80:00.0: PCI bridge to [bus 81-bf]
pci 0000:40:00.0: disabling bridge window [io  0x0000-0xffffffffffffffff] to [bus 41-7f] (unused)
pci 0000:40:00.0: disabling bridge window [mem 0x00000000-0xffffffffffffffff pref] to [bus 41-7f] (unused)
pci 0000:40:00.0: disabling bridge window [mem 0x00000000-0xffffffffffffffff] to [bus 41-7f] (unused)
pci 0001:80:00.0: disabling bridge window [io  0x0000-0xffffffffffffffff] to [bus 81-bf] (unused)
pci 0000:40:00.0: PCI bridge to [bus 41-7f]
pci 0001:80:00.0: BAR 8: assigned [mem 0xe80000000-0xe800fffff]
pci 0001:81:00.0: BAR 0: assigned [mem 0xe80000000-0xe800fffff 64bit]
pci 0001:80:00.0: PCI bridge to [bus 81-bf]
pci 0001:80:00.0:   bridge window [mem 0xe80000000-0xe800fffff]
bio: create slab <bio-0> at 0
Switching to clocksource timebase
NET: Registered protocol family 2
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP: reno registered
UDP hash table entries: 512 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
Configuring USB GPIO 16 + 19
bounce pool size: 64 pages
jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
msgmni has been set to 1494
alg: No test for stdrng (krng)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0x4ef600300 (irq = 20) is a U6_16550A
console [ttyS0] enabled
4ef600300.serial: ttyS0 at MMIO 0x4ef600300 (irq = 20) is a 16550
brd: module loaded
4ff000000.nor_flash: Found 1 x16 devices at 0x0 in 16-bit bank. Manufacturer ID 0x000089 Chip ID 0x00881e
Intel/Sharp Extended Query Table at 0x010A
Intel/Sharp Extended Query Table at 0x010A
Intel/Sharp Extended Query Table at 0x010A
Intel/Sharp Extended Query Table at 0x010A
Intel/Sharp Extended Query Table at 0x010A
Using buffer write method
Using auto-unlock on power-up/resume
cfi_cmdset_0001: Erase suspend on write enabled
6 ofpart partitions found on MTD device 4ff000000.nor_flash
Creating 6 MTD partitions on "4ff000000.nor_flash":
0x000000000000-0x0000001e0000 : "KERNEL_1"
0x0000001e0000-0x000000200000 : "FDT_1"
0x000000200000-0x0000003e0000 : "KERNEL_2"
0x0000003e0000-0x000000400000 : "FDT_2"
0x000000f80000-0x000000fa0000 : "UBOOTENV"
0x000000fa0000-0x000001000000 : "UBOOT"
NAND device: Manufacturer ID: 0xec, Chip ID: 0xd3 (Samsung NAND 1GiB 3,3V 8-bit), 1024MiB, page size: 2048, OOB size: 64
2 NAND chips detected
Scanning device for bad blocks
Bad eraseblock 898 at 0x000007040000
Bad eraseblock 11548 at 0x00005a380000
Bad eraseblock 14499 at 0x000071460000
4 ofpart partitions found on MTD device 4e0000000.ndfc.nand
Creating 4 MTD partitions on "4e0000000.ndfc.nand":
0x000000000000-0x000020000000 : "ROOT_1"
0x000020000000-0x000040000000 : "ROOT_2"
0x000040000000-0x000046400000 : "CONFIG"
0x000046400000-0x00007c000000 : "VAR"
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
PPC 4xx OCP EMAC driver, version 3.54
MAL v2 /plb/mcmal, 2 TX channels, 16 RX channels
ZMII /plb/opb/emac-zmii@ef600d00 initialized
RGMII /plb/opb/emac-rgmii@ef601500 initialized with MDIO support
TAH /plb/opb/emac-tah@ef601350 initialized
TAH /plb/opb/emac-tah@ef601450 initialized
/plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
eth0: EMAC-0 /plb/opb/ethernet@ef600e00, MAC 00:02:c9:63:ef:ee
eth0: found Generic MII PHY (0x00)
/plb/opb/emac-rgmii@ef601500: input 1 in RGMII mode
eth1: EMAC-1 /plb/opb/ethernet@ef600f00, MAC 00:02:c9:63:ef:ef
eth1: found Generic MII PHY (0x01)
i2c /dev entries driver
ibm-iic 4ef600700.i2c: using standard (100 kHz) mode
rtc-ds1307 0-0068: SET TIME!
rtc-ds1307 0-0068: rtc core: registered ds1338 as rtc0
rtc-ds1307 0-0068: 56 bytes nvram
ibm-iic 4ef600800.i2c: using standard (100 kHz) mode
TCP: cubic registered
rtc-ds1307 0-0068: setting system clock to 2017-12-17 15:10:43 UTC (1513523443)
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000000: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000004: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000008: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0000000c: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000010: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000014: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000018: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0000001c: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000020: 0xaa55 instead
jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000024: 0xaa55 instead
jffs2: Further such events for this erase block will not be printed
jffs2: Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes
jffs2: empty_blocks 4094, bad_blocks 1, c->nr_blocks 4096
VFS: Cannot open root device "mtdblock6" or unknown-block(31,6): error -5
Please append a correct "root=" boot option; here are the available partitions:
1f00            1920 mtdblock0  (driver?)
1f01             128 mtdblock1  (driver?)
1f02            1920 mtdblock2  (driver?)
1f03             128 mtdblock3  (driver?)
1f04             128 mtdblock4  (driver?)
1f05             384 mtdblock5  (driver?)
1f06          524288 mtdblock6  (driver?)
1f07          524288 mtdblock7  (driver?)
1f08          102400 mtdblock8  (driver?)
1f09          880640 mtdblock9  (driver?)
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(31,6)
CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.27-MELLANOXuni-m460ex PPC_M460EX
Call Trace(ef840000): name=swapper, state=0
[ef845e00] [c0005614] show_stack+0x54/0x150 (unreliable)
[ef845e40] [c02758d8] panic+0xcc/0x204
[ef845e90] [c0354d00] mount_block_root+0x274/0x284
[ef845ee0] [c0355048] prepare_namespace+0x1a0/0x1e8
[ef845ef0] [c0354930] kernel_init_freeable+0x1b0/0x1c4
[ef845f30] [c0001ac0] kernel_init+0x18/0xf8
[ef845f40] [c000afcc] ret_from_kernel_thread+0x5c/0x64
Rebooting in 180 seconds..

After a 3 minute delay, the switch resets (very noticeably -- the fans spin up to max RPM briefly).

Some interesting tidbits gleaned from this:

  • It's Linux 3.10.27 plus Mellanox modifications
  • 2Gibytes of RAM on board
  • U-Boot eventually passed root=/dev/mtdblock6 rootfstype=jffs2 ro reset_button=0 console=ttyS0,9600 as kernel command line arguments
  • There's a real time clock which seems to be functioning
  • And finally, Linux doesn't do much if it doesn't have a root filesystem -- if it can't start the first user-land process (init/upstart/systemd/whatever) it will panic.
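
The partition list in the panic message is also worth a second look. The kernel prints those sizes in 1 KiB blocks, so converting them makes the flash layout easier to read. A minimal sketch, assuming the lines are pasted in exactly as the kernel printed them (mtd_sizes is a made-up helper name, not part of any tool):

```shell
# Convert the kernel's "major/minor  #blocks  name" partition list into
# friendlier sizes. The kernel reports sizes in 1 KiB blocks.
# mtd_sizes is a hypothetical helper name used only for this illustration.
mtd_sizes() {
    awk '$3 ~ /^mtdblock/ {
        kib = $2
        if (kib % 1024 == 0) printf "%s %d MiB\n", $3, kib / 1024
        else                 printf "%s %d KiB\n", $3, kib
    }'
}

mtd_sizes <<'EOF'
1f05             384 mtdblock5  (driver?)
1f06          524288 mtdblock6  (driver?)
1f07          524288 mtdblock7  (driver?)
1f08          102400 mtdblock8  (driver?)
1f09          880640 mtdblock9  (driver?)
EOF
```

The last four partitions come out as 512, 512, 100 and 860 MiB.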

User-mode stuffs (making a switch bootstrap OS taking bits from MLNX-OS)

Above we've noted that the kernel is being told to look for a JFFS2 image. We'll need to build one now. The JFFS2 creation utility, mkfs.jffs2, takes a directory containing the desired files as input and creates a JFFS2 image as output. (Much like creating an ISO9660 image.) On Debian, mkfs.jffs2 is contained in the mtd-utils package.

Remember that MLNX-OS image we downloaded from Mellanox's web site? And how we pulled a kernel and flattened device tree from it? Well now we're going to build something for the kernel to run.

Bits of shell script here documenting the contents of the JFFS2 directory

# TAKE NOTE:
#   $jffs2root is the destination directory -- the jffs2 image file will be created from its contents
#   $mlnxosroot is the unpacked MLNX-OS distribution -- some files from here will be copied into ${jffs2root}
#
# Some of these items will need to be run with elevated privileges -- make use of sudo as needed.

cd ${jffs2root}
mkdir -p -m 775 bin dev etc/init.d etc/rc.d lib sbin usr var/run var/lib
mkdir -p -m 0 proc sys
mkdir -p -m 1777 tmp var/tmp
mkdir -p -m 700 root
cd usr
ln -s ../bin ./bin    # /usr/bin -> /bin; makes scp and sftp work a little easier
cd ..
cp -p ${mlnxosroot}/sbin/busybox bin/

# busybox decides what it should act as based on its argv[0], so all these symlinks to busybox let us 
# use a single binary to do many different things
cd ${jffs2root}/bin
for bb_applet in \
    [ [[ arp arping ash awk base64 basename bbconfig blkid blockdev brctl \
    cal cat chgrp chmod chown chroot chrt chvt cksum clear cmp comm cp cttyhack cut \
    date dc dd deallocvt depmod devmem df diff dirname dmesg dnsdomainname dos2unix du dumpkmap \
    echo ed egrep eject env expand expr \
    false fbsplash fdformat fdisk fgconsole fgrep find findfs flock fold free freeramdisk fsync fuser \
    getopt getty grep groups gunzip gzip halt hd hdparm head hexdump hostid hostname hwclock \
    id ifconfig ifdown ifenslave ifup init insmod install ionice iostat ip ipaddr ipcalc ipcrm ipcs iplink iproute iprule \
    kbd_mode kill killall killall5 klogd \
    last less linux32 linux64 linuxrc ln loadfont loadkmap logger login logname logread losetup ls lsmod lsof lspci lsusb \
    makedevs md5sum mdev mesg mkdir mkfifo mknod mkswap mktemp modinfo modprobe more mount mountpoint mpstat mv \
    nameif netstat nice nohup nslookup ntpd od openvt \
    patch pgrep pidof ping ping6 pivot_root pkill pmap poweroff printenv printf ps pstree pwd pwdx \
    rdate rdev readahead readlink realpath reboot renice reset resize rev rm rmdir rmmod route run-parts runlevel \
    script sed seq setarch setconsole setfont setkeycodes setlogcons setserial setsid sh \
    sha1sum sha256sum sha512sum showkey sleep sort split stat strings stty su sulogin sum \
    swapoff swapon switch_root sync sysctl syslogd \
    tac tail tee telnet test tftp time timeout top touch tr traceroute traceroute6 true tty \
    udhcpc udhcpc6 umount uname unexpand uniq unix2dos uptime users usleep uudecode uuencode \
    vconfig vi volname wall watch wc wget which who whoami xargs yes zcat zcip; do
        rm -f ./${bb_applet}    # -f: don't complain when the link doesn't exist yet
        ln -s busybox ${bb_applet}
done

# Linux kernel execs /sbin/init as process 1
cd ${jffs2root}/sbin
ln -s ../bin/busybox getty
ln -s ../bin/busybox init

# I prefer GNU bash to the Busybox /bin/sh
cd ${jffs2root}/bin
cp -p ${mlnxosroot}/bin/bash .
# bash also needs libtinfo and libdl
cd ${jffs2root}/lib
cp -p ${mlnxosroot}/lib/libdl-2.17.so .
cp -p ${mlnxosroot}/lib/libtinfo.so.5.9 .
ln -s libdl-2.17.so libdl.so.2
ln -s libtinfo.so.5.9 libtinfo.so.5
# and a very minimal set of terminfo files
cd ../etc
cp -pr ${mlnxosroot}/etc/terminfo .
# These utilities will be useful in putting a JFFS2 image directly onto the flash storage
cp -p ${mlnxosroot}/usr/sbin/flash* ${jffs2root}/bin
cp -p ${mlnxosroot}/usr/sbin/nand* ${jffs2root}/bin

# And some SSH client goodness would be nice, too
cd ${jffs2root}
mkdir -p -m 755 etc/ssh    # etc already exists from the first mkdir above
cd ${jffs2root}/bin
cp -p ${mlnxosroot}/usr/bin/ssh .
cp -p ${mlnxosroot}/usr/bin/scp .
cp -p ${mlnxosroot}/usr/bin/sftp .
cd ${jffs2root}/lib
cp -p ${mlnxosroot}/usr/lib/libcrypto.so.1.0.1e .
cp -p ${mlnxosroot}/usr/lib/libcrypt-2.17.so .
cp -p ${mlnxosroot}/lib/libdl-2.17.so .
cp -p ${mlnxosroot}/usr/lib/libz.so.1.2.7 .
cp -p ${mlnxosroot}/lib/libresolv-2.17.so .
cp -p ${mlnxosroot}/lib/librt-2.17.so .
ln -s libcrypto.so.1.0.1e libcrypto.so.10
ln -s libcrypt-2.17.so libcrypt.so.1
ln -s libz.so.1.2.7 libz.so.1
ln -s libresolv-2.17.so libresolv.so.2
ln -s librt-2.17.so librt.so.1
ln -sf libdl-2.17.so libdl.so.2    # already linked for bash; -f avoids a "File exists" error
# SSH also needs working NSS libraries
cp -p ${mlnxosroot}/lib/libnss_compat-2.17.so .
cp -p ${mlnxosroot}/lib/libnss_dns-2.17.so .
cp -p ${mlnxosroot}/lib/libnss_files-2.17.so .
ln -s libnss_compat-2.17.so libnss_compat.so.2
ln -s libnss_dns-2.17.so libnss_dns.so.2
ln -s libnss_files-2.17.so libnss_files.so.2

# kernel modules so that we can DHCP and other things...
cd ${mlnxosroot}/lib/modules/3.10.27-MELLANOXuni-m460ex
sudo find kernel -depth -print0 | sudo cpio -pdmv0a ${jffs2root}/lib/modules/3.10.27-MELLANOXuni-m460ex/
sudo cp -p modules.* ${jffs2root}/lib/modules/3.10.27-MELLANOXuni-m460ex/

# GNU tar is here instead of the busybox version
cd ${jffs2root}
cp -p ${mlnxosroot}/bin/tar bin/
cp -p ${mlnxosroot}/usr/sbin/flash_eraseall sbin/

# make an /etc/init.d/rcS that gets /proc and /sys mounted
cp /dev/null etc/init.d/rcS
(echo '#!/bin/sh'; echo; echo 'mount -t proc none /proc'; echo 'mount -t sysfs none /sys') | tee etc/init.d/rcS
chmod 755 etc/init.d/rcS
ln -s ../init.d/rcS etc/rc.d/rc.sysinit

# make an inittab file -- this will get us a working terminal
cp /dev/null etc/inittab
echo '::sysinit:/etc/init.d/rcS' >> etc/inittab
echo '#::askfirst:/bin/sh' >> etc/inittab
echo 'ttyS0:2345:respawn:/sbin/getty -L 9600 ttyS0 vt102' >> etc/inittab
echo '#::respawn:cttyhack sh -l' >> etc/inittab
chmod 644 etc/inittab

# /etc/passwd entry for 'root'. The password field is a classic UNIX (DES-based) hash of "changeme", a common default root password on Sun gear.
cp /dev/null etc/passwd
echo 'root:Xxxgg2TS4senE:0:0::/root:/bin/sh' > etc/passwd
chmod 644 etc/passwd

# /etc/nsswitch.conf tells libc where to look up things like users
cp /dev/null etc/nsswitch.conf
echo passwd: files >> etc/nsswitch.conf
echo group: files >> etc/nsswitch.conf
chmod 644 etc/nsswitch.conf

# Set up the shell's environment
echo "PS1='switch-bootstrap:\w\\\$\ '" > etc/profile

# Copy and make symlinks for shared libraries (ldconfig doesn't want to help here.  :(  )
cd ${mlnxosroot}/lib
cp -p ld-2.17.so libc-2.17.so libcrypt-2.17.so libm-2.17.so libpthread-2.17.so librt-2.17.so ${jffs2root}/lib/
cd ${jffs2root}/lib
ln -s ld-2.17.so ld.so.1
ln -s ld-2.17.so ld-linux.so.2
ln -s libc-2.17.so libc.so.6
ln -sf libcrypt-2.17.so libcrypt.so.1    # already linked in the SSH step; -f avoids "File exists"
ln -s libm-2.17.so libm.so.6
ln -s libpthread-2.17.so libpthread.so.0
ln -sf librt-2.17.so librt.so.1    # already linked in the SSH step; -f avoids "File exists"

# device files (we don't need no steeenking udev)  These will definitely need root privileges to create.
cd ${jffs2root}/dev
mknod -m 622 console c 5 1
mknod -m 666 null c 1 3
mknod -m 444 random c 1 8
mknod -m 644 ttyS0 c 4 64
mknod -m 444 urandom c 1 9
mknod -m 666 zero c 1 5
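
The busybox symlink trick in the script above is easy to try out on any machine. Here is a toy sketch of the same idea, a single script that picks its behaviour from the name it was invoked as. Everything in it (multicall, hello, bye) is made up for illustration and has nothing to do with the real busybox source:

```shell
# A toy "multicall" script in the spirit of busybox: one file, several names.
demo_dir=$(mktemp -d)

cat > "${demo_dir}/multicall" <<'EOF'
#!/bin/sh
# Dispatch on the name we were invoked as (argv[0]).
case "$(basename "$0")" in
    hello) echo "hello applet" ;;
    bye)   echo "bye applet" ;;
    *)     echo "unknown applet: $(basename "$0")" ;;
esac
EOF
chmod 755 "${demo_dir}/multicall"

# Symlinks give the single script its different personalities.
ln -s multicall "${demo_dir}/hello"
ln -s multicall "${demo_dir}/bye"

"${demo_dir}/hello"
"${demo_dir}/bye"
```

Running the two symlinks prints a different line for each, even though there is only one real file on disk.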

And now, let's create the actual JFFS2 image file.

cd ${jffs2root}/..
sudo /usr/sbin/mkfs.jffs2 --verbose --eraseblock=0x20000 --pagesize=2048 --pad \
    --no-cleanmarkers --big-endian --squash-uids --root=jffs2-root \
    --output=bootstrap-mlnx-os-image-ppc-m460ex-20160609-202426.jffs2
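
Before flashing anything, a quick sanity check of the image doesn't hurt. The sketch below rests on two assumptions that are not taken from the mkfs.jffs2 documentation: that a big-endian image starts with the JFFS2 magic 0x1985 (the same magic the kernel complained about earlier), and that --pad leaves the file size a multiple of the erase block. check_jffs2_image is a hypothetical helper name:

```shell
# Sanity-check a JFFS2 image before flashing it.
# Assumptions: the first JFFS2 node sits at offset 0 and begins with the
# magic 0x1985 (bytes 19 85 in a big-endian image), and --pad made the
# size a multiple of the 0x20000 erase block.
check_jffs2_image() {
    image=$1
    eraseblock=$((0x20000))

    size=$(wc -c < "$image" | tr -d ' ')
    magic=$(od -An -tx1 -N2 "$image" | tr -d ' \n')

    [ "$magic" = "1985" ] || { echo "bad magic: $magic"; return 1; }
    [ $((size % eraseblock)) -eq 0 ] || { echo "size $size not padded"; return 1; }
    echo "ok: $size bytes, $((size / eraseblock)) erase blocks"
}
```

Usage would be something like check_jffs2_image bootstrap-mlnx-os-image-ppc-m460ex-20160609-202426.jffs2, run from the directory holding the image.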

Copy it to the TFTP server (172.17.255.118, remember?). Then, on the switch, pull it down, flash it, and try to get the MLNX-OS kernel to run it. The U-Boot conversation with the switch follows:

U-Boot 2009.01 SX_PPC_M460EX SX_3.2.0330-82-EMC ppc (Feb 27 2013 - 12:13:42)

CPU:   AMCC PowerPC 460EX Rev. B at 1000 MHz (PLB=166, OPB=83, EBC=83 MHz)
       Security/Kasumi support
       Bootstrap Option H - Boot ROM Location I2C (Addr 0x52)
       Internal PCI arbiter disabled
       32 kB I-Cache 32 kB D-Cache
Board: Mellanox PPC460EX Board
FDEF:  No
I2C:   ready
DRAM:   2 GB (ECC enabled, 333 MHz, CL3)
FLASH: 16 MB
NAND:  1024 MiB
PCI:   Bus Dev VenId DevId Class Int
PCIE0: link is not up.
PCIE1: successfully set as root-complex
        01  00  15b3  c738  0c06  00
Net:   ppc_4xx_eth0, ppc_4xx_eth1
Hit any key to stop autoboot:  0 
=> setenv ipaddr 172.16.10.80
=> setenv gatewayip 172.16.10.3
=> tftp 400000 172.17.0.16:mellanox-sx6018/bootstrap-mlnx-os-image-ppc-m460ex-20160609-202426.jffs2
Using ppc_4xx_eth0 device
TFTP from server 172.17.255.118; our IP address is 172.17.255.120
Filename 'MLNX-OS_PPC_M460EX_3.6.1002/bootstrap-mlnx-os-image-ppc-m460ex-20160609-202426.jffs2'.
Load address: 0x400000
Loading: ################################################################# 
         #################################################################
         ###############################
done
Bytes transferred = 2359296 (240000 hex)
=> nand erase clean 0 1FFFFFFF

NAND erase: device 0 offset 0x0, size 0x1fffffff
Skipping bad block at  0x07040000                                            
Erasing at 0x1ffe0000 -- 100% complete. Cleanmarker written at 0x1ffe0000.
OK
=> nand write ${fileaddr} 0 ${filesize}

NAND write: device 0 offset 0x0, size 0x240000
 2359296 bytes written: OK
=>
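
The hex numbers in the session above are easier to judge after a little shell arithmetic. This is only a sketch: the 128 KiB erase block size is taken from the mkfs.jffs2 --eraseblock=0x20000 option, and the "+ 1" assumes U-Boot rounds the erase length up to whole blocks.

```shell
# Sizes taken from the U-Boot session above.
erase_len=$((0x1FFFFFFF + 1))   # nand erase length, rounded up (assumption)
image_len=$((0x240000))         # "Bytes transferred" reported by tftp
eraseblock=$((0x20000))         # 128 KiB NAND erase block

echo "erased:  $((erase_len / 1024 / 1024)) MiB"
echo "image:   $((image_len / 1024)) KiB in $((image_len / eraseblock)) erase blocks"
```

That works out to 512 MiB erased and a 2304 KiB image occupying 18 erase blocks.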

There was a single bad block of NAND reported on this switch. Small numbers of bad blocks in the NAND area are OK. The U-Boot tftp command filled out the fileaddr and filesize variables for us. Note that we cleared 512Mibytes of the flash storage with the nand erase ... command, but the file we transferred was only about 2.25Mibytes in size. Next up are 2 minor changes to the kernel command lines:

=> printenv image_kernel_args jffs2_args
## Error: "image_kernel_args" not defined
jffs2_args=setenv bootargs root=${rootdev} rootfstype=jffs2 ro reset_button=${reset_button} ${image_kernel_args} ${extra_args}
=> setenv rootdev /dev/mtdblock6
=> setenv image_kernel_args loglevel=2
=> setenv jffs2_args "setenv bootargs root=${rootdev} rootfstype=jffs2 rw reset_button=${reset_button} ${image_kernel_args} ${extra_args}"
=>

And, to boot it up:

=> run mlxlinux

(none) login: root
Password: changeme
login[859]: root login on 'ttyS0'
switch-bootstrap:~# 

And there's a root shell prompt in a very basic Linux OS running on the switch. The jiggering with getty in the inittab lets us have a fully configured terminal to play around with. Job control and having Ctrl-C are both very nice things.

Putting MLNX-OS on the switch

There is now a very basic Linux OS installed and bootable (with a little manual jiggering). Next up is getting MLNX-OS copied to one of the flash areas and some tweaking done.

We're going to put the MLNX-OS root filesystem in the "ROOT_2" flash partition -- cat /proc/mtd will list these out for us. We'll also be taking over the CONFIG and VAR partitions. Note that the flash layout information comes from the flattened device tree structure we pulled from the MLNX-OS distribution a long time ago.

We're going to need some network

There are (at least) two options here. The first is to set things by hand:

switch-bootstrap:/# ip link set up dev eth0
switch-bootstrap:/# ip addr add 172.16.10.80/24 dev eth0
switch-bootstrap:/# ip route add default via 172.16.10.3

Alternatively, you can do some DHCP:

switch-bootstrap:/# modprobe af_packet
switch-bootstrap:/# ip link set up dev eth0
switch-bootstrap:/# udhcpc -i eth0
DHCP client started on eth0
Sending discover...
Sending select for 172.16.10.80...
Lease of 172.16.10.80 obtained, lease time 3600
switch-bootstrap:/#  

Cleaning out the flash

Erase the MTD devices:

switch-bootstrap:/# flash_eraseall -j /dev/mtd7
Erasing 128 Kibyte @ 1ffe0000 -- 99 % complete. Cleanmarker written at 1ffe0000.
switch-bootstrap:/# flash_eraseall -j /dev/mtd8
Erasing 128 Kibyte @ 63e0000 -- 99 % complete. Cleanmarker written at 63e0000.
switch-bootstrap:/# flash_eraseall -j /dev/mtd9
Erasing 128 Kibyte @ 13f60000 -- 37 % complete. Cleanmarker written at 13f60000.
Skipping bad block at 0x13f80000
Erasing 128 Kibyte @ 2b040000 -- 80 % complete. Cleanmarker written at 2b040000.
Skipping bad block at 0x2b060000
Erasing 128 Kibyte @ 35be0000 -- 99 % complete. Cleanmarker written at 35be0000.
switch-bootstrap:/#

Small numbers of bad blocks are expected and nothing to worry about.
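
To see where a bad block sits, its byte offset can be turned into an erase block number. A small sketch, assuming 128 KiB erase blocks (as in the "Erasing 128 Kibyte" messages) and the partition size in KiB from the kernel's partition list; bad_block_info is a made-up helper name:

```shell
# Translate a "Skipping bad block at 0x..." offset into an erase block
# number and a rough position within the partition.
bad_block_info() {
    offset=$(($1))              # byte offset, e.g. 0x13f80000
    part_kib=$2                 # partition size in KiB
    eraseblock=$((128 * 1024))  # assumed erase block size
    total=$((part_kib * 1024 / eraseblock))
    index=$((offset / eraseblock))
    echo "block $index of $total ($((index * 100 / total))%)"
}

bad_block_info 0x13f80000 880640
bad_block_info 0x2b060000 880640
```

For the two bad blocks on /dev/mtd9 this gives blocks 2556 and 5507 out of 6880, which lines up with the 37 % and 80 % progress figures flash_eraseall printed just before skipping them.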

Make some directories where we can mount the soon-to-be-created filesystems:

switch-bootstrap:/# mkdir -m 755 /mnt
switch-bootstrap:/# mkdir -m 0 /mnt/root2

And mount the flash there. (JFFS2 magic: No mkfs step is needed. Which is a lie. flash_eraseall took care of that for us.)

switch-bootstrap:/# mount -t jffs2 /dev/mtdblock7 /mnt/root2
switch-bootstrap:/# mkdir -m 0 /mnt/root2/config /mnt/root2/var
switch-bootstrap:/# mount -t jffs2 /dev/mtdblock8 /mnt/root2/config
switch-bootstrap:/# mount -t jffs2 /dev/mtdblock9 /mnt/root2/var
switch-bootstrap:/# busybox df -m /dev/mtdblock[789]
Filesystem           1M-blocks      Used Available Use% Mounted on
/dev/mtdblock7             512        11       501   2% /mnt/root2
/dev/mtdblock8             100         2        98   2% /mnt/root2/config
/dev/mtdblock9             860        18       842   2% /mnt/root2/var
switch-bootstrap:/#

MLNX-OS preparation

Way up above, we downloaded a MLNX-OS distribution from Mellanox's web server. The kernel and device tree and busybox were all extracted from it. We want to put the whole shebang on the switch so we can do all the cool things.

Extract a working version of the OS image like so:

$ wget http://www.mellanox.com/downloads/Software/image-PPC_M460EX-3.6.1002.img
[...]
$ mkdir Mlnx-OS-3.6.1002
$ cd Mlnx-OS-3.6.1002
$ unzip -p ../image-PPC_M460EX-3.6.1002.img image-PPC_M460EX-ppc-m460ex-20160609-202426.tgz | gzip -dc | sudo tar xvvpf - --numeric-owner
[...]
$ 

tar is run as root so that device nodes and permissions are restored correctly at extract time. --numeric-owner tells tar not to translate the Mlnx-OS user and group names to the host OS's equivalents.

Let's save some space!

Skip this for now

Stripping the executables saves a little space: about 5%, going by the numbers below. Observe:

$ du -sm .
886     .
$ sudo find . -depth -type f -perm -111 -print0 | xargs -0 -n1 sudo strip --strip-unneeded --preserve-dates --strip-debug
[...]
$ sudo cp -pr ../Mlnx-OS-3.6.1002.orig/lib/modules/* lib/modules/
$ du -sm .
840     .
$

It's too big for the 512Mibyte flash partition, but we'll take care of that later on when we create the JFFS2 image for it.

MLNX-OS hacks

Wherein we'll change a few things in the MLNX-OS image before putting it onto the switch...

/etc/fstab change

Fix up the fstab entries for /, /var, and /config to just use the /dev/mtdblockN device names instead of filesystem labels. Additionally, we don't have swap and don't care about the bootloader files... Also also, we're using JFFS2 instead of ext3 for the filesystem type:

/dev/mtdblock7        /                   jffs2     defaults,noatime     1 1
# LABEL=BOOT_1        /boot               ext3      defaults,noatime     1 2
# LABEL=BOOTMGR       /bootmgr            ext3      defaults,noatime     1 2
/dev/mtdblock8        /config             jffs2     defaults,noatime     1 2
/dev/mtdblock9        /var                jffs2     defaults,noatime     1 2
# LABEL=SWAP_1        swap                swap      defaults,noatime     0 0
tmpfs                 /dev/shm            tmpfs     defaults             0 0
devpts                /dev/pts            devpts    gid=5,mode=620       0 0
sysfs                 /sys                sysfs     defaults             0 0
proc                  /proc               proc      defaults             0 0
/dev/cdrom            /mnt/cdrom          iso9660   noauto,ro            0 0
/dev/fd0              /mnt/floppy         auto      noauto               0 0

VPI enablement

The VPI functionality is governed by a trio of programs that live in /opt/tms/bin/. Installing the patched versions of chad, hwd, and ibd there will make sure it works the way we want it to.

$ sudo install -o root -g root -m 755 --backup --verbose ../customized-binaries/{chad,hwd,ibd} opt/tms/bin/
'../customized-binaries/chad' -> 'opt/tms/bin/chad' (backup: 'opt/tms/bin/chad~')
'../customized-binaries/hwd' -> 'opt/tms/bin/hwd' (backup: 'opt/tms/bin/hwd~')
'../customized-binaries/ibd' -> 'opt/tms/bin/ibd' (backup: 'opt/tms/bin/ibd~')
$

Ethernet name mangling

Small tweaks here to get the eth0 and eth1 interfaces named mgmt0 and mgmt1 as MLNX-OS sees them. This could be done with the switch's configuration database somehow, but we haven't got that reverse-engineered yet. So for now, let us be expedient instead of perfect.

Create an /etc/mactab similar to the following. Your switch's Ethernet interfaces' MAC addresses are available by running ip link list from the bootstrap OS.

mgmt0   00:02:c9:64:16:fc
mgmt1   00:02:c9:64:16:fd
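
If you'd rather not copy MAC addresses by hand, they can also be read out of sysfs. A minimal sketch, assuming /sys is mounted (the bootstrap rcS does that) and that eth0/eth1 should become mgmt0/mgmt1 as in the example; make_mactab is a hypothetical helper, and its optional argument only exists so the sysfs root can be pointed somewhere else for testing:

```shell
# Build mactab lines (new-name MAC) for eth0 and eth1 from sysfs.
make_mactab() {
    sysroot=${1:-/sys}
    for n in 0 1; do
        addr_file="${sysroot}/class/net/eth${n}/address"
        [ -r "$addr_file" ] || continue    # skip interfaces that aren't there
        printf 'mgmt%s   %s\n' "$n" "$(cat "$addr_file")"
    done
}
```

Redirect the output to wherever the MLNX-OS tree's etc/mactab lives on your setup.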

And some updates to the rename_ifs init script are needed, too. Apply the following patch to it:

--- etc/rc.d/init.d/rename_ifs.mlnx-os.orig     2016-06-09 11:43:05.000000000 -0600
+++ etc/rc.d/init.d/rename_ifs  2019-01-28 10:30:01.228780822 -0700
@@ -379,7 +379,8 @@
 start() {
         echo $"Running renaming interfaces"
 
-        do_rename_ifs
+        # do_rename_ifs
+        ${NAMEIF}
 
         [ $RETVAL -eq 0 ] && touch /var/lock/subsys/rename_ifs
         return $RETVAL

customer_rootflop.sh needs changes, too

Apply this diff to it:

--- etc/customer_rootflop.sh.mlnx-os.orig       2016-09-28 12:16:17.000000000 -0600
+++ etc/customer_rootflop.sh    2019-02-15 14:24:55.097140577 -0700
@@ -1561,7 +1561,8 @@
 
         oldifs=$IFS
         IFS=`echo \n`
-        system_profile=`echo $out | grep FEATURE_EN_1: | awk {'print $NF'}`
+        # system_profile=`echo $out | grep FEATURE_EN_1: | awk {'print $NF'}`
+        system_profile="3"
         IFS=$oldifs
         mlx_check_err 0 "$system_profile" "cant parse system profile"
 
@@ -1569,7 +1570,8 @@
 
         oldifs=$IFS
         IFS=`echo \n`
-        num_swids=`echo $out | grep FEATURE_EN_10: | awk {'print $NF'}`
+        # num_swids=`echo $out | grep FEATURE_EN_10: | awk {'print $NF'}`
+        num_swids="0"
         IFS=$oldifs
         mlx_check_err 0 "$num_swids" "cant parse number of swids"

Root logins are good!

The distribution's /etc/shadow file is a symlink to one maintained by a proprietary tool. Break the symlink and set root's password to the same as admin's (which is "admin").

$ sudo rm etc/shadow
# it's a symlink
$ sudo tee etc/shadow << '__EOF__'
root:$6$NFLgdAQn$eZXt2gnpaJsxf3Hy5OwUoX.0yAw6QBVtyvZL48YmEDGtNI6zijqqyBnqeC10DmWb.WghOBjgAOtbivx9C5ZUL/:10000:0:99999:7:::
admin:$6$NFLgdAQn$eZXt2gnpaJsxf3Hy5OwUoX.0yAw6QBVtyvZL48YmEDGtNI6zijqqyBnqeC10DmWb.WghOBjgAOtbivx9C5ZUL/:10000:0:99999:7:::
apache:*:10000:0:99999:7:::
avahi:!!:10000:0:99999:7:::
dbus:!!:10000:0:99999:7:::
haldaemon:!!:10000:0:99999:7:::
monitor:$6$NlsGJ1Mh$ZAJmr0o4/8ZsG5r9L/W0PA9u3dPC6WL4/DDkkpDPnyQbqeCLoRqyH4X35HHD2AxGQORKCs58bB/FjnPhunril0:10000:0:99999:7:::
nfsnobody:!!:10000:0:99999:7:::
nobody:*:10000:0:99999:7:::
ntp:*:10000:0:99999:7:::
pcap:*:10000:0:99999:7:::
qemu:!!:10000:0:99999:7:::
rpc:!!:10000:0:99999:7:::
rpcuser:!!:10000:0:99999:7:::
sshd:*:10000:0:99999:7:::
statsd:*:10000:0:99999:7:::
tcpdump:!!:10000:0:99999:7:::
vcsa:!!:10000:0:99999:7:::
xmladmin:$6$nHVLuh/.$nkTB77KylkvyyjnHlfKiLzEJvzOCM2PWYLHuyV/grWi417KfCmZ0C2maEua8amzfe8P/Np3M32dbSEnrVmlsD0:10000:0:99999:7:::
xmluser:$6$Z9Nazq9n$fPXUf.qAIDvisF6cAyXYje1OueJwtJMTjYcnhVxYASxL8jpcOZG3G4dXBxfON3BNB.8lDWaNtSqAKN23RRX6z1:10000:0:99999:7:::
__EOF__
$ sudo chmod 600 etc/shadow
$ sudo chown 0:0 etc/shadow
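
A quick way to double-check the result is to list which accounts actually have a usable password. In a shadow file the second colon-separated field is the hash: a "$6$" prefix marks SHA-512 crypt, while "*" or "!!" means no password login. The sketch below is only an illustration, and shadow_summary is a made-up helper name:

```shell
# Summarize which accounts in a shadow file can log in with a password.
# Reads the named files, or stdin when none are given.
shadow_summary() {
    awk -F: '{
        if ($2 ~ /^\$6\$/)                 kind = "sha512-crypt"
        else if ($2 == "*" || $2 == "!!")  kind = "locked"
        else                               kind = "other"
        printf "%s %s\n", $1, kind
    }' "$@"
}
```

Since the file is mode 600, something like sudo cat etc/shadow | shadow_summary is needed to feed it. An account showing up as "other" where a hash was expected suggests the "$" characters got mangled on the way in.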

Ensure our nice, unlocked bootloader doesn't get replaced

Apply this patch to sbin/aiset.sh:

--- sbin/aiset.sh~      2016-09-25 07:38:19.000000000 -0600
+++ sbin/aiset.sh       2019-02-23 05:43:06.339255148 -0700
@@ -1,4 +1,5 @@
 #!/bin/sh
+exit 0
 
 #
 #  Filename:  $Source: /windy/home/scm/CVS_TMS/src/base_os/common/script_files/aiset.sh,v $

Make layout_settings.sh empty

$ sudo cp /dev/null etc/layout_settings.sh
$

Customize your image_layout.sh

This file was provided by a nice person across the ocean.

$ sudo install -o root -g root -m 755 --backup --verbose ../customized-binaries/image_layout.sh etc/

Two more empty files

$ sudo cp /dev/null etc/.firstboot
$ sudo cp /dev/null var/opt/tms/.firstmfg
$

Break out config and var filesystems

$ sudo mv config ../Mlnx-OS-3.6.1002-config
$ sudo mv var ../Mlnx-OS-3.6.1002-var
$ sudo mkdir -m 0 config var

These will each be going onto their own JFFS2 filesystems to be flashed individually to the switch.

Put MLNX-OS on the switch

Create JFFS2 images

One for each of the flash partitions we'll be using: /, /config, and /var.

$ sudo /usr/sbin/mkfs.jffs2 --verbose --eraseblock=0x20000 --pagesize=2048 --pad --no-cleanmarkers --big-endian --root=Mlnx-OS-3.6.1002 -m size -X zlib --output=mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-root.jffs2
$ sudo /usr/sbin/mkfs.jffs2 --verbose --eraseblock=0x20000 --pagesize=2048 --pad --no-cleanmarkers --big-endian --root=Mlnx-OS-3.6.1002-config -m size -X zlib --output=mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-config.jffs2
$ sudo /usr/sbin/mkfs.jffs2 --verbose --eraseblock=0x20000 --pagesize=2048 --pad --no-cleanmarkers --big-endian --root=Mlnx-OS-3.6.1002-var -m size -X zlib --output=mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-var.jffs2

Remember how the unpacked image was ~850Mibytes? Well, our JFFS2 image for / is now down to ~470Mibytes, which fits in the 512Mibyte partition:

$ ls -larth *1002*jffs2
-rw-r--r-- 1 root root 467M Feb 25 10:48 mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-root.jffs2
-rw-r--r-- 1 root root 128K Feb 25 10:48 mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-config.jffs2
-rw-r--r-- 1 root root 128K Feb 25 10:48 mlnx-os-image-ppc-m460ex-3.6.1002-20160609-202426-var.jffs2
$
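
Whether an image fits its target partition can be checked with a bit of arithmetic before anything is copied to the switch. A sketch under the assumption that partition sizes are given in KiB, as in the kernel's partition list; fits_partition is a hypothetical helper name:

```shell
# Check that an image is no larger than its target MTD partition.
# $1 = image size in bytes, $2 = partition size in KiB.
fits_partition() {
    image_bytes=$1
    part_bytes=$(($2 * 1024))
    if [ "$image_bytes" -le "$part_bytes" ]; then
        echo "fits, $(( (part_bytes - image_bytes) / 1024 / 1024 )) MiB to spare"
    else
        echo "too big by $(( (image_bytes - part_bytes) / 1024 / 1024 )) MiB"
        return 1
    fi
}

# A 467 MiB root image going into the 512 MiB (524288 KiB) root partition
fits_partition $((467 * 1024 * 1024)) 524288
```

Bad blocks on the flash eat into the usable space a little, so treat the "to spare" figure as an upper bound.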

Copy JFFS2 images to the switch

Back on the switch now...

switch-bootstrap:/# cd /tmp
switch-bootstrap:/# scp -S /bin/ssh user@172.17.0.16:proj/mellanox-sx6018/mlnx-os-image-ppc-m460ex-3.6.8010*jffs2 .
user@172.17.0.16's password: ****************
mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-config.jffs2                                                                                                        100%  128KB 128.0KB/s   00:00    
mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-root.jffs2                                                                                                          100%  370MB 755.2KB/s   08:21    
mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-var.jffs2                                                                                                           100%  128KB 128.0KB/s   00:00    
switch-bootstrap:/tmp# 

And write them to the flash

switch-bootstrap:/tmp# nandwrite -q -p /dev/mtd7 /tmp/mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-root.jffs2
switch-bootstrap:/tmp# nandwrite -q -p /dev/mtd9 /tmp/mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-var.jffs2

Also write the config image if this is your first time...

switch-bootstrap:/tmp# nandwrite -q -p /dev/mtd8 /tmp/mlnx-os-image-ppc-m460ex-3.6.8010.20180820-180416-config.jffs2

Use switch's TFTP client to suck it down

First we gotta get our switch's IP layer going.

# ip link set up dev eth0
# ip addr add 172.16.10.81/24 dev eth0
# ip route add default via 172.16.10.3
# 

Suck down and unpack the tarball

# cd /mnt/root2/var
# tftp -g -r mellanox-sx6018/Mlnx-OS-3.6.1002.tar.gz 172.17.0.16
# cd ..
# gzip -dc var/Mlnx-OS-3.6.1002.tar.gz | tar xvvpf -
[...]
#

Switch configuration database entries

Run these on the switch before rebooting.

# cd /mnt/root2/opt/tms/bin/
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/cluster/config/enable bool false
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/cluster/config/expected_nodes uint16 0
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/cluster/config/id string ""
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/cluster/config/interface string mgmt0
# LD_LIBRARY_PATH=/mnt/root2/lib:/mnt/root2/usr/lib ../../../usr/bin/openssl rand -hex 24
WARNING: can't open config file: /etc/pki/tls/openssl.cnf
aa1115b42438bc122ffc4a3c346abd2c41889ade78461c2b
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/cluster/config/shared-secret string aa1115b42438bc122ffc4a3c346abd2c41889ade78461c2b
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/net/interface/config/mgmt0/addr/ipv4/dhcp bool true
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/system/hostid string MT1311X05279
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/system/hostname string mellanox-sx6018-1
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/system/hwname string M460EX
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/system/layout string MFL1
# ./mddbreq -c /mnt/root2/config/mfg/mfdb set modify - /mfg/mfdb/system/model string ppc

Reboot into MLNX-OS

Unmount the JFFS2 filesystems, reboot, tell the kernel to run the stuff in /dev/mtdblock7 instead of /dev/mtdblock6. Profit! If things are correct here, the switch will DHCP an address on its mgmt0 interface. Still running on the switch...

# cd /
# umount /mnt/root2/var
# umount /mnt/root2/config
# umount /mnt/root2
# sync
# reboot -f
[...]
U-Boot 2009.01 SX_PPC_M460EX SX_3.2.0330-82-EMC ppc (Feb 27 2013 - 12:13:42)
[...]
Hit any key to stop autoboot:  0 
=> setenv rootdev /dev/mtdblock7
=> setenv image_kernel_args loglevel=2
=> setenv jffs2_args "setenv bootargs root=${rootdev} rootfstype=jffs2 rw reset_button=${reset_button} ${image_kernel_args} ${extra_args}"
=> run mlxlinux

Update switch ASIC firmware

The EMC switch is an unmanaged switch, meaning it has no subnet manager functionality. MLNX-OS needs more smarts in the SwitchX ASIC's brain.

Boot the switch into the EMC OS

We need this to get the SwitchX ASIC up and running. So that a client system can connect to it. So that we can put new firmware onto the ASIC.

Pre-reqs

  • Get a machine with mstflint installed. For Debian systems, do sudo apt-get install mstflint. We also need some more low-level Infiniband tools to make the flashing process work, so do sudo apt-get install ibutils infiniband-diags opensm while we're at it.

Back up the existing SwitchX ASIC firmware

Backups are good!

Let's see how our connection to the switch is looking...

server-with-IB-card$ sudo ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.40.5030
	Hardware version: 0
	Node GUID: 0x0002c903003e2180
	System image GUID: 0x0002c903003e2183
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 1
		LMC: 0
		SM lid: 1
		Capability mask: 0x0251486a
		Port GUID: 0x0002c903003e2181
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x0251486a
		Port GUID: 0x0002c903003e2182
		Link layer: InfiniBand
server-with-IB-card$ 

Port 1 of our card has a 56Gbps connection to the switch. That's promising. :)

The opensm subnet manager was started as soon as it was installed (that's a Debian thing). So let's see if we can see the switch on the network:

server-with-IB-card$ sudo ibswitches 
Switch	: 0x0002c90300e404e0 ports 18 "SwitchX -  Mellanox Technologies" base port 0 lid 2 lmc 0
server-with-IB-card$

Note that the switch has been given Lid 2 here.
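
mstflint wants the Lid as a device name of the form lid-0x followed by the Lid in hex. With more than one switch on the fabric, pulling these out of the ibswitches output by script saves some squinting. A sketch, assuming the "... lid <n> lmc <n>" layout shown above; switch_lids is a made-up helper name:

```shell
# Turn ibswitches output (on stdin) into mstflint in-band device names.
switch_lids() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "lid") { printf "lid-0x%x\n", $(i + 1); break }
    }'
}
```

Usage would be sudo ibswitches | switch_lids, which for the single switch here prints lid-0x2.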

Query firmware on the switch ASIC. We address it by Lid, which isn't assigned without a subnet manager:

server-with-IB-card$ sudo mstflint -d lid-0x2 q full
Image type:          FS2
FW Version:          9.9.1260
FW Release Date:     5.6.2014
MIC Version:         1.5.0
Device ID:           51000
Description:         Node             Sys image
GUIDs:               0002c90300e404e0 0002c90300e404e0 
Description:         Base             Switch
MACs:                    0002c9e404e0     0002c9e40540
VSD:                 n/a
PSID:                EMC1240110020
server-with-IB-card$ 

Back up the SwitchX ASIC firmware image to somewhere safe:

server-with-IB-card$ sudo mstflint -d lid-0x2 ri EMC1240110020.fw
server-with-IB-card$ mstflint -i EMC1240110020.fw q full
Image type:          FS2
FW Version:          9.9.1260
FW Release Date:     5.6.2014
MIC Version:         1.5.0
Device ID:           51000
Description:         Node             Sys image
GUIDs:               0002c90300e404e0 0002c90300e404e0 
Description:         Base             Switch
MACs:                    0002c9e404e0     0002c9e40540
VSD:                 n/a
PSID:                EMC1240110020
server-with-IB-card$

Burn the SwitchX firmware that MLNX-OS needs:

server-with-IB-card$ sudo mstflint --allow_psid_change -i MT_1240212020.fw -d lid-0x2 b

    Current FW version on flash:  9.9.1260
    New FW version:               9.3.8170

    Note: The new FW version is older than the current FW version on flash.

 Do you want to continue ? (y/n) [n] : y


    You are about to replace current PSID on flash - "EMC1240110020" with a different PSID - "MT_1240212020".
    Note: It is highly recommended not to change the PSID.

 Do you want to continue ? (y/n) [n] : y
Burning FS2 FW image without signatures - OK  
Restoring signature                     - OK
server-with-IB-card$
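
mstflint's "older than the current FW version" note comes from comparing the dotted version numbers field by field. The same comparison is easy to sketch in shell, for instance to sanity-check a batch of switches before burning. This is an illustration only, not how mstflint does it internally; fw_version_cmp is a hypothetical helper and assumes exactly three numeric fields per version:

```shell
# Compare two dotted firmware versions (three numeric fields each).
# Prints "older", "same" or "newer" for the first relative to the second.
fw_version_cmp() {
    old_ifs=$IFS; IFS=.
    set -- $1 $2        # split both versions into six positional fields
    IFS=$old_ifs
    if   [ "$1" -ne "$4" ]; then a=$1 b=$4
    elif [ "$2" -ne "$5" ]; then a=$2 b=$5
    else                         a=$3 b=$6
    fi
    if   [ "$a" -lt "$b" ]; then echo older
    elif [ "$a" -gt "$b" ]; then echo newer
    else                         echo same
    fi
}

fw_version_cmp 9.3.8170 9.9.1260
```

For the image being burned here, 9.3.8170 against the 9.9.1260 on flash, it prints "older", matching what mstflint reported.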

One last reboot (I promise!)

Now tell the switch to start MLNX-OS and watch it go! Interrupt the autoboot sequence and tell it to run mlxlinux. MLNX-OS should start. The Infiniband port should come to life. The bits will flow!

Switch configuration

That's a whole other subject. Go see Mlnx-OS switch configuration.