Another I.T. blog: IOS Upgrade on Cisco WS-C4507R Chassis with Dual Supervisor V Engines

Today we will upgrade the IOS version on both WS-X4516 Supervisor Engine V modules in a WS-C4507R chassis. This blog post assumes that the active supervisor engine is already reachable over the network so that you can SSH into it.

First, go to the Cisco support site and download the latest IOS version (you need a Cisco support contract to have access to new IOS images). Place this image on your TFTP server. In this example, the TFTP server is a CentOS Linux machine called alice.company.com.

scp ~/Downloads/cat4500-entservicesk9-mz.150-2.SG5.bin alice.company.com:/tmp
ssh alice.company.com

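If you don't have a TFTP server installed, set one up. In the xinetd service file, set disable = no and make server_args point at the directory you want to serve (for example, -s /tftpboot), then start xinetd.

sudo yum -y install tftp-server
sudo vi /etc/xinetd.d/tftp
sudo chkconfig xinetd on
chkconfig --list xinetd
sudo mkdir -p /tftpboot/ios
sudo service xinetd start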

Now move the IOS image into the /tftpboot/ios directory.

sudo mv /tmp/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios

(Optional) Create a symbolic link to the image so that you can remember which hardware it's for. When your site has several different Cisco models, those symbolic links can be handy!

sudo ln -s /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios/ws-c4507r

Check the file size of the new IOS image.

du -sb /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin

19458176 /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin

That means we need at least 19458176 bytes of free space on the internal flash storage of each supervisor engine. To check this, we connect to the chassis's IP address.

ssh 172.16.1.1
switch> enable

switch# show version | inc bytes of memory

cisco WS-C4507R (MPC8245) processor (revision 4) with 524288K bytes of memory.

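Note that this line reports the supervisor's DRAM, not its flash. The free flash space is shown on the last line of dir bootflash: (active supervisor) and dir slavebootflash: (standby supervisor). In our case there is enough room. If there weren't, we could list the flash content and erase some old IOS images. For example, let's pretend the old cat4000-i9s-mz.122-20.EW3.bin IOS file is on the bootflash. We could remove it like this :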

switch# delete bootflash:cat4000-i9s-mz.122-20.EW3.bin
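To make the space check concrete: the last line of a dir listing on an IOS flash device reports the total and free bytes, as in the dir bootflash: and dir slavebootflash: listings further down in this post. Here is a minimal Python sketch that pulls the free byte count out of that summary line and compares it with the image size. The function names are my own, and the regular expression assumes the summary line looks exactly like the ones shown in this post.

```python
import re


def free_flash_bytes(dir_output):
    """Extract the free byte count from the summary line of a `dir` listing."""
    match = re.search(r"\((\d+) bytes free\)", dir_output)
    if match is None:
        raise ValueError("no '(N bytes free)' summary line found")
    return int(match.group(1))


def image_fits(image_size, dir_output):
    """Return True when the image is no larger than the reported free space."""
    return image_size <= free_flash_bytes(dir_output)


IMAGE_SIZE = 19458176  # from `du -sb` on the TFTP server

# Summary line from the `dir slavebootflash:` listing in this post, which was
# taken AFTER the new image had already been copied to the standby.
print(image_fits(IMAGE_SIZE, "59244544 bytes total (6849736 bytes free)"))
```

With the numbers from this post, the call prints False: once the new image is on the standby's flash, there is no room left for a second copy of it, which is a good reminder to clean up old images before the next upgrade.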

Check the redundancy status of both supervisor engines.

switch# show redundancy states
my state = 13 -ACTIVE
peer state = 4 -STANDBY COLD
Mode = Duplex
Unit = Primary
Unit ID = 1

Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured) = RPR
Redundancy State = RPR
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 35
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0

As we can see, we are running in RPR (Route Processor Redundancy) mode. This means that if the active supervisor engine fails or reloads, all ports will lose connectivity for several minutes while the other supervisor engine takes over. That is not super duper.

Fortunately, there is another redundancy mode that offers a much faster switchover: Stateful Switchover, or SSO for short. In SSO mode the two supervisors are kept synchronized while they are running, so the switch keeps forwarding at layer 2 during a supervisor switchover. Layer 3 connections still lose their neighbor relationships, but only for about 50 milliseconds.

So let's change our redundancy mode from RPR to SSO.

switch# conf terminal

switch(config)# redundancy

switch(config-red)# mode sso

Changing to sso mode will reset the standby. Do you want to continue?[confirm]

As you can see, when we do this, the standby supervisor engine will reload. So be sure to check this supervisor's console output. Once the standby engine is back online, double-check the redundancy mode again.

switch# sh redundancy states
my state = 13 -ACTIVE
peer state = 4 -STANDBY COLD
Mode = Duplex
Unit = Primary
Unit ID = 1

Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured) = Stateful Switchover
Redundancy State = RPR
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 35
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0

Ok, so SSO is now the configured mode, but the operational mode is still RPR. That's because both supervisor engines can't run in SSO mode until the active one has been reloaded, which means that the standby has to take over. During that supervisor switchover, there will be layer 2 and layer 3 downtime of about 1 to 3 minutes, depending on the number of configured ports.

So, in order to continue with the IOS upgrade, you need to schedule a network maintenance!
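If you capture the output of show redundancy states (for example, from an SSH session log), the configured-versus-operational comparison can be scripted. The following is only a sketch: the function names are mine, and it assumes the "key = value" layout shown in this post. It treats SSO as fully working only when the operational mode is Stateful Switchover and the peer is STANDBY HOT, which is what the final output in this post shows.

```python
def parse_redundancy_states(output):
    """Turn `show redundancy states` output into a dict of key -> value."""
    fields = {}
    for line in output.splitlines():
        if "=" not in line:
            continue  # skip blank lines and the command echo
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def sso_is_operational(output):
    """True when SSO is the operational mode and the standby is hot."""
    fields = parse_redundancy_states(output)
    mode_ok = fields.get("Redundancy Mode (Operational)") == "Stateful Switchover"
    peer_ok = fields.get("peer state", "").endswith("STANDBY HOT")
    return mode_ok and peer_ok


# Relevant lines from the output right after `mode sso` was configured.
SAMPLE = """\
my state = 13 -ACTIVE
peer state = 4 -STANDBY COLD
Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured) = Stateful Switchover
"""

print(sso_is_operational(SAMPLE))  # prints False
```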

Download the new IOS to both supervisor engines.

switch# copy tftp:/ios/cat4500-entservicesk9-mz.150-2.SG5.bin bootflash:

Address or name of remote host [172.16.1.33]?
Source filename [/ios/cat4500-entservicesk9-mz.150-2.SG5.bin]?
Destination filename [cat4500-entservicesk9-mz.150-2.SG5.bin]?

The file will be transferred from the TFTP server into the active supervisor engine's internal flash storage. You will see a series of exclamation points (one for each packet) and then a few capital C letters. The C letters are shown when IOS verifies the new IOS image. You can do it manually with the verify command if you prefer.

While we're working with our TFTP server, we should make a backup of our configuration.

switch# copy run start
switch# copy start tftp

Next, copy the IOS to the standby supervisor engine.

switch# copy bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin slavebootflash:

Now check that both supervisor engines have the new IOS image.

switch# dir bootflash:

Directory of bootflash:/

2 -rwx 19458176 Sep 24 2012 13:08:32 -04:00 cat4500-entservicesk9-mz.150-2.SG5.bin

59244544 bytes total (26308040 bytes free)

switch# dir slavebootflash:
Directory of slavebootflash:/

1 -rwx 13478072 Jun 29 2006 11:02:01 -04:00 cat4500-entservicesk9-mz.122-31.SG.bin
3 -rwx 19458176 Sep 24 2012 13:52:19 -04:00 cat4500-entservicesk9-mz.150-2.SG5.bin

59244544 bytes total (6849736 bytes free)

We must make sure that configuration changes from the active supervisor engine are properly transferred to the standby one.

switch# conf terminal
switch(config)# redundancy
switch(config-red)# main-cpu
switch(config-r-mc)# auto-sync standard
switch(config-r-mc)# end

Now manually force a resynchronization of the supervisor engines.

switch# copy run start

Check your syslog output. When the synchronization occurs, you should see lines like these :

Sep 24 14:13:04 c4507r 230: 000258: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The bootvar has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 231: 000259: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The config-reg has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 232: 000260: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 233: 000261: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 234: 000262: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC_RATELIMIT: The vlan database has been successfully synchronized to the standby supervisor
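If your syslog server is busy, these lines are easy to miss. Here is a small Python sketch that scans log text for the CONFIGSYNC messages and reports which items have not been seen yet. The list of expected items is just the five shown above, and the function names are my own, so adjust both to your environment.

```python
import re

# Matches both the CONFIGSYNC and CONFIGSYNC_RATELIMIT message variants.
SYNC_PATTERN = re.compile(
    r"%C4K_REDUNDANCY-5-CONFIGSYNC(?:_RATELIMIT)?: "
    r"The (.+?) has been successfully synchronized to the standby supervisor"
)

# The five items that appear in the syslog excerpt in this post.
EXPECTED = {"bootvar", "config-reg", "startup-config",
            "private-config", "vlan database"}


def synced_items(log_text):
    """Return the set of items reported as synchronized to the standby."""
    return {match.group(1) for match in SYNC_PATTERN.finditer(log_text)}


def missing_items(log_text, expected=EXPECTED):
    """Return the expected items that have no sync message yet, sorted."""
    return sorted(set(expected) - synced_items(log_text))


line = ("Sep 24 14:13:04 c4507r 230: 000258: Sep 24 14:13:03: "
        "%C4K_REDUNDANCY-5-CONFIGSYNC: The bootvar has been successfully "
        "synchronized to the standby supervisor")
print(missing_items(line))
```

An empty list from missing_items means every expected item was reported as synchronized.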

Prepare the switch to boot the new IOS image.

switch# config terminal
switch(config)# config-register 0x2
switch(config)# boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin

switch(config)# end

switch# copy running-config startup-config

We can now update the IOS image on the standby supervisor engine. This can be done anytime because there is no traffic disruption. To see the standby supervisor upgrade messages, make sure to connect to the standby supervisor's console port.

switch# redundancy reload peer

This will reload the standby supervisor engine. You will see these syslog messages from the active one :

Sep 24 14:11:07 c4507r 224: 000249: Sep 24 14:11:06: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor

Sep 24 14:11:07 c4507r 225: 000250: Sep 24 14:11:07: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor

Sep 24 14:11:17 c4507r 226: 000251: Sep 24 14:11:16: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been lost

Sep 24 14:11:17 c4507r 227: 000252: Sep 24 14:11:16: %C4K_REDUNDANCY-3-SIMPLEX_MODE: The peer Supervisor has been lost

At the standby supervisor's console, you will see the upgrade messages. Once the new IOS is up and running on the standby supervisor, you should see this in your syslog server :

Sep 24 14:13:03 c4507r 228: 000256: Sep 24 14:13:02: %C4K_REDUNDANCY-2-IOS_VERSION_CHECK_FAIL: IOS version mismatch. Active supervisor version is 12.2(31)SG,. Standby supervisor version is 15.0(2)SG5,. Redundancy feature may not work as expected.

Sep 24 14:13:03 c4507r 229: 000257: Sep 24 14:13:03: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been established

As the messages tell you, the IOS version is not the same on both supervisor engines. That's normal because we haven't updated the active one yet. We can check the version of both engines with this :

switch# show module

 M MAC addresses                    Hw  Fw           Sw               Status
--+--------------------------------+---+------------+----------------+---------
 1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 12.2(31)SG       Ok
 2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5       Ok

We can see that module in slot 2 (the standby supervisor) is running version 15.0(2)SG5 while the active module in slot 1 is running version 12.2(31)SG.
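The same comparison can be scripted from captured show module output. This Python sketch assumes rows shaped like the two above (slot, MAC range, Hw, Fw, Sw, Status) and simply skips any line that does not have those eight whitespace-separated fields. The function names are hypothetical.

```python
def supervisor_versions(show_module_output):
    """Map slot number -> software version from the MAC/Hw/Fw/Sw/Status rows."""
    versions = {}
    for line in show_module_output.splitlines():
        parts = line.split()
        # Expected row shape: slot, mac, "to", mac, hw, fw, sw, status
        if len(parts) == 8 and parts[0].isdigit() and parts[2] == "to":
            versions[int(parts[0])] = parts[6]
    return versions


def versions_match(show_module_output):
    """True when at least one row was found and all rows share one version."""
    found = set(supervisor_versions(show_module_output).values())
    return len(found) == 1


MID_UPGRADE = """\
1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 12.2(31)SG Ok
2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5 Ok
"""

print(supervisor_versions(MID_UPGRADE))
print(versions_match(MID_UPGRADE))  # prints False
```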

To upgrade the active supervisor engine, we must issue the following. Make sure you're connected to the active supervisor's console port when you issue this command. THIS COMMAND WILL CAUSE A LAYER 2 AND LAYER 3 NETWORK OUTAGE! So make sure you do this on a scheduled maintenance.

switch# redundancy force-switchover

The active supervisor will reload, forcing a supervisor engine switchover. This will make the standby supervisor engine take control of the chassis. During the switchover, the supervisor that used to be the active one reloads into the new IOS version. After a few minutes, we can see that they are both running the same IOS version :

ssh 172.16.1.1
switch> enable

switch# sh mod | inc 15.0
1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 15.0(2)SG5 Ok
2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5 Ok

We can also see that our redundancy has changed for both engines from RPR to SSO.

switch# sh redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit = Secondary
Unit ID = 2

Redundancy Mode (Operational) = Stateful Switchover
Redundancy Mode (Configured) = Stateful Switchover
Redundancy State = Stateful Switchover
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 60
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0

That's it! We now have both supervisor engines running the new IOS version and the stateful switchover redundancy mode.

Troubleshooting

Standby Supervisor Engine Reload Loop

Sometimes the standby supervisor might go into a reload loop because of this message :

Current BOOT file is --- flash:cat4500-entservicesk9-mz.150-2.SG5.bin

Invalid filename flash:cat4500-entservicesk9-mz.150-2.SG5.bin. It must begin with device name.

You are then presented with a ROMMON prompt. Hit Ctrl-C to prevent a reload. Then, at the prompt, issue a boot command to load the IOS.

rommon 1 > boot bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin

The standby supervisor engine will boot the IOS. You can then check your configuration and fix the problem.

switch# show running-config | inc boot system

boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin

boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin

The first line is our problem. It says to boot from flash: instead of bootflash: like the second line. So to fix our problem, we only need to remove that first line.

switch# conf terminal

switch(config)# no boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin

switch(config)# end

switch# copy run start

switch# sh run | inc boot system

boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
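To catch this kind of mistake before a reload, the boot system lines can be checked from a saved copy of the running configuration. The sketch below is in Python and rests on one assumption of mine: that every image path should start with bootflash:. If you boot from another device, pass a different prefix. The function name is hypothetical.

```python
def bad_boot_lines(running_config, device="bootflash:"):
    """Return `boot system` lines whose image path does not start with `device`."""
    bad = []
    for line in running_config.splitlines():
        words = line.split()
        if words[:2] != ["boot", "system"]:
            continue  # not a boot system statement
        image_path = words[-1]  # the last word is the device:filename part
        if not image_path.startswith(device):
            bad.append(line.strip())
    return bad


CONFIG = """\
boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin
boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
"""

for line in bad_boot_lines(CONFIG):
    print("suspicious:", line)
```

With the two lines from this post, only the flash: entry is reported.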

There, that should do it. Now issue the standby supervisor reload again and see how it goes.

switch# redundancy reload peer

TFTP Configuration Backup Error

When we back up the switch's configuration to our TFTP server, we might get an access-denied error. That's because the TFTP server doesn't have permission to write into the /tftpboot directory. A simple, yet not very secure, way to fix this is to change the permissions right before we do the config backup, then put them back the way they were.

sudo chmod o+rwx /tftpboot

switch# copy start tftp

sudo chmod o-w /tftpboot

References

Configuring Supervisor Engine Redundancy on the Catalyst 4507R and Catalyst 4510R Switches

Configuring Supervisor Engine Redundancy using RPR and SSO
Catalyst 4500 Series Switch Software Configuration Guide, 15.0(2)SG

13 comments:

Anonymous, 12 March, 2013 05:17
Thank you so much David.

Following your tutorial, I was able to successfully replace a SUP and everything worked as a charm.

Keep up the good work :)

Cheers,
Chris
Anonymous, 17 May, 2013 12:31
Thanks for this guide. I found it very useful and it really helped me upgrade of two (old) 4500's

Brian

Anonymous, 09 September, 2013 11:11
Hello David,
I have a question. My 4507 currently is running version 12.2(50)SG8. I see version 15.0.2SG3 with the feature that I need. I was wondering if I could go and install this version. Since it is 15.X do I have to have a licence to activate it? Your comments and help will be greatly appreciated.

Best Regards,
Robert
Anonymous, 12 May, 2015 07:50
Thanks you for documentation !

Best Regards,
Unknown, 26 August, 2016 13:48
Thanks it's helpful..
Sandeep, 18 October, 2016 03:07
Hi David. Many thanks for this article. Very well written. I had a query regarding upgrading the ROMMON and IOS at the same time for a 4510 with SSO. Have you come across that scenario before, & is it possible to reload the peer and have it come back up with the new ROMMON and IOS and then do a redundancy switch-over? And is this truly a "hitless" change? I am quite sure that there has to be a layer 2 and or layer 3 disconnection. But I keep reading "zero downtime" upgrades which confuses me.


Thursday, September 27, 2012