First, go to the Cisco support site and download the latest IOS version (you need a Cisco support contract to have access to new IOS images). Place this image on your TFTP server. In this example, the TFTP server is a CentOS Linux machine called alice.company.com.
scp ~/Downloads/cat4500-entservicesk9-mz.150-2.SG5.bin alice.company.com:/tmp
ssh alice.company.com
If you don't have a TFTP server installed, then set one up.
sudo yum -y install tftp-server
sudo vi /etc/xinetd.d/tftp
sudo chkconfig xinetd on
chkconfig --list xinetd
sudo mkdir -p /tftpboot/ios
Now move the IOS image into the /tftpboot/ios directory.
sudo mv /tmp/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios
(Optional) Create a symbolic link to the image so that you can remember which hardware it's for. When your site has several different Cisco models, those symbolic links can be handy!
sudo ln -s /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios/ws-c4507r
Check the file size of the new IOS image.
du -sb /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin
19458176 /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin
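A matching file size won't catch a corrupted download. Cisco's download page normally lists an MD5 checksum next to each image, so it is worth comparing it with a locally computed one. Here is a minimal Python sketch of that check. The function names are made up for this example, and the script is an assumption about how you might do it on the TFTP server, not part of the original procedure.

```python
import hashlib

def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB pieces so a large image never sits fully in memory."""
    with open(path, "rb") as image:
        # iter(callable, sentinel) keeps reading until read() returns b""
        return md5_of_chunks(iter(lambda: image.read(chunk_size), b""))

# Hypothetical usage on the TFTP server:
# print(md5_of_file("/tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin"))
```

Compare the printed value by eye with the checksum shown on the download page. If they differ, download the image again before going any further.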
Now log in to the switch and check how much memory it has.
ssh 172.16.1.1
switch> enable
switch# show version | inc bytes of memory
cisco WS-C4507R (MPC8245) processor (revision 4) with 524288K bytes of memory.
We appear to have enough memory. Flash space is a separate matter: run dir bootflash: to see how many bytes are free. If there isn't enough room, we can check the flash content and erase some old IOS images. For example, let's pretend the old cat4000-i9s-mz.122-20.EW3.bin IOS file is on the bootflash. We could remove it like this:
switch# delete bootflash:cat4000-i9s-mz.122-20.EW3.bin
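If you would rather not do the arithmetic by eye, the free-space check is easy to script. The sketch below is only an illustration, with hypothetical function names. It assumes you have pasted the output of dir bootflash: into a string, and that the summary line looks like the one shown later in this article ("59244544 bytes total (26308040 bytes free)").

```python
import re

def free_flash_bytes(dir_output):
    """Extract the free byte count from the summary line of 'dir' output."""
    match = re.search(r"\((\d+) bytes free\)", dir_output)
    if match is None:
        raise ValueError("no '(N bytes free)' summary line found")
    return int(match.group(1))

def image_fits(dir_output, image_size):
    """True when the flash has at least image_size bytes free."""
    return free_flash_bytes(dir_output) >= image_size

# Size reported by du -sb for the new image in this article
NEW_IMAGE_SIZE = 19458176
```

Calling image_fits() with the pasted output and NEW_IMAGE_SIZE tells you whether an old image has to be deleted first. Remember to run the check for slavebootflash: as well, since the standby supervisor needs its own copy.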
Check the redundancy status of both supervisor engines.
switch# show redundancy states
my state = 13 -ACTIVE
peer state = 4 -STANDBY COLD
Mode = Duplex
Unit = Primary
Unit ID = 1
Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured) = RPR
Redundancy State = RPR
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up
client count = 35
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0
As we can see, we are running in RPR (Route Processor Redundancy) mode. This means that if the active supervisor engine fails or reloads, all ports will lose connectivity for several minutes while the other supervisor engine takes over. That is not super duper.
Fortunately, there is another redundancy mode which offers a faster switchover. This mode is called Stateful SwitchOver, or SSO for short. When the supervisors are running in SSO mode, the switch keeps working fine at layer 2 during a supervisor switchover, because the two engines are kept synchronized while they are running. Layer 3 connections will still lose their neighbor relationships, but only for about 50 milliseconds.
So let's change our redundancy mode from RPR to SSO.
switch# conf terminal
switch(config)# redundancy
switch(config-red)# mode sso
Changing to sso mode will reset the standby. Do you want to continue?[confirm]
As you can see, when we do this, the standby supervisor engine will reload. So be sure to check this supervisor's console output. Once the standby engine is back online, double-check the redundancy mode again.
switch# sh redundancy states
my state = 13 -ACTIVE
peer state = 4 -STANDBY COLD
Mode = Duplex
Unit = Primary
Unit ID = 1
Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured) = Stateful Switchover
Redundancy State = RPR
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up
client count = 35
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0
Ok, so now SSO is configured, but the chassis is still operating in RPR mode. That's because we can't be in SSO mode on both supervisor engines without a reload of the active one, which means that the standby has to take over. During that supervisor switchover, there will be layer 2 and layer 3 downtime of about 1 to 3 minutes, depending on the number of configured ports.
So, in order to continue with the IOS upgrade, you need to schedule a network maintenance!
switch# copy tftp:/ios/cat4500-entservicesk9-mz.150-2.SG5.bin bootflash:
Address or name of remote host [172.16.1.33]?
Source filename [/ios/cat4500-entservicesk9-mz.150-2.SG5.bin]?
Destination filename [cat4500-entservicesk9-mz.150-2.SG5.bin]?
The file will be transferred from the TFTP server into the active supervisor engine's internal flash storage. You will see a series of exclamation points (one for each packet) and then a few capital C letters. The C letters are shown while IOS verifies the new image. You can also run this check manually with the verify command if you prefer.
While we're working with our TFTP server, we should make a backup of our configuration.
switch# copy run start
switch# copy start tftp
Next, copy the IOS to the standby supervisor engine.
switch# copy bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin slavebootflash:
Now check that both supervisor engines have the new IOS image.
switch# dir bootflash:
Directory of bootflash:/
2 -rwx 19458176 Sep 24 2012 13:08:32 -04:00 cat4500-entservicesk9-mz.150-2.SG5.bin
59244544 bytes total (26308040 bytes free)
switch# dir slavebootflash:
Directory of slavebootflash:/
1 -rwx 13478072 Jun 29 2006 11:02:01 -04:00 cat4500-entservicesk9-mz.122-31.SG.bin
3 -rwx 19458176 Sep 24 2012 13:52:19 -04:00 cat4500-entservicesk9-mz.150-2.SG5.bin
59244544 bytes total (6849736 bytes free)
We must make sure that configuration changes on the active supervisor engine are properly transferred to the standby one.
switch# conf terminal
switch(config)# redundancy
switch(config-red)# main-cpu
switch(config-r-mc)# auto-sync standard
switch(config-r-mc)# end
Now manually force a resynchronization of the supervisor engines.
switch# copy run start
Check your syslog output. When the synchronization occurs, you should see lines like these:
Sep 24 14:13:04 c4507r 230: 000258: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The bootvar has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 231: 000259: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The config-reg has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 232: 000260: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 233: 000261: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 234: 000262: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC_RATELIMIT: The vlan database has been successfully synchronized to the standby supervisor
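Five separate items get synchronized here, and it is easy to miss one in a busy syslog. As a rough sketch, the Python below pulls the item names out of the CONFIGSYNC messages and reports any that never showed up. The list of expected items is taken from the messages above. The function names are hypothetical, and the message format on other IOS versions may differ.

```python
import re

# Items named in the CONFIGSYNC messages shown in this article
EXPECTED_ITEMS = {
    "bootvar", "config-reg", "startup-config", "private-config", "vlan database",
}

SYNC_PATTERN = re.compile(
    r"%C4K_REDUNDANCY-5-CONFIGSYNC(?:_RATELIMIT)?: "
    r"The (.+?) has been successfully synchronized"
)

def synced_items(syslog_text):
    """Return the set of item names reported as synchronized."""
    return set(SYNC_PATTERN.findall(syslog_text))

def missing_items(syslog_text, expected=EXPECTED_ITEMS):
    """Return the expected items that have no success message in the log."""
    return set(expected) - synced_items(syslog_text)
```

An empty set from missing_items() means every item was reported as synchronized. Anything else is worth investigating before you reload a supervisor.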
Prepare the switch to boot the new IOS image.
switch# conf terminal
switch(config)# config-register 0x2
switch(config)# boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
switch(config)# end
switch# copy running-config startup-config
We can now update the IOS image on the standby supervisor engine. This can be done at any time because it causes no traffic disruption. To see the standby supervisor's upgrade messages, make sure to connect to its console port.
switch# redundancy reload peer
This will reload the standby supervisor engine. You will see these syslog messages from the active one:
Sep 24 14:11:07 c4507r 224: 000249: Sep 24 14:11:06: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Sep 24 14:11:07 c4507r 225: 000250: Sep 24 14:11:07: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Sep 24 14:11:17 c4507r 226: 000251: Sep 24 14:11:16: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been lost
Sep 24 14:11:17 c4507r 227: 000252: Sep 24 14:11:16: %C4K_REDUNDANCY-3-SIMPLEX_MODE: The peer Supervisor has been lost
At the standby supervisor's console, you will see the upgrade messages. Once the new IOS is up and running on the standby supervisor, you should see this on your syslog server:
Sep 24 14:13:03 c4507r 228: 000256: Sep 24 14:13:02: %C4K_REDUNDANCY-2-IOS_VERSION_CHECK_FAIL: IOS version mismatch. Active supervisor version is 12.2(31)SG,. Standby supervisor version is 15.0(2)SG5,. Redundancy feature may not work as expected.
Sep 24 14:13:03 c4507r 229: 000257: Sep 24 14:13:03: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been established
As the messages tell you, the IOS version is not the same on both supervisor engines. That's normal because we haven't updated the active one yet. We can check the version of both engines with this:
switch# show module
M MAC addresses Hw Fw Sw Status
--+--------------------------------+---+------------+----------------+---------
1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 12.2(31)SG Ok
2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5 Ok
We can see that module in slot 2 (the standby supervisor) is running version 15.0(2)SG5 while the active module in slot 1 is running version 12.2(31)SG.
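The same comparison can be scripted, which is handy if you upgrade several chassis. This is a minimal sketch with hypothetical function names. It assumes the show module rows look exactly like the two supervisor rows above, with all of the Hw, Fw, Sw and Status columns filled in. Rows with empty columns, such as some line cards, are simply skipped.

```python
import re

ROW = re.compile(
    r"^\s*(\d+)\s+[0-9a-f.]+\s+to\s+[0-9a-f.]+"   # slot number and MAC range
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$",       # Hw, Fw, Sw, Status
    re.MULTILINE,
)

def software_versions(show_module_output):
    """Map slot number to the Sw column for every fully populated row."""
    return {
        int(slot): sw
        for slot, _hw, _fw, sw, _status in ROW.findall(show_module_output)
    }

def versions_match(show_module_output):
    """True when at least one row was found and all rows share one Sw version."""
    return len(set(software_versions(show_module_output).values())) == 1
```

At this point in the upgrade, versions_match() is expected to be false. It should only become true once the active supervisor has been upgraded as well.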
To upgrade the active supervisor engine, we must issue the following command. Make sure you're connected to the active supervisor's console port when you issue it. THIS COMMAND WILL CAUSE A LAYER 2 AND LAYER 3 NETWORK OUTAGE! So make sure you do this during a scheduled maintenance window.
switch# redundancy force-switchover
The active supervisor will reload, forcing a supervisor engine switchover. This will make the standby supervisor engine take control of the chassis. During the switchover, the supervisor that used to be the active one reloads into the new IOS version. After a few minutes, we can see that they are both running the same IOS version:
ssh 172.16.1.1
switch> enable
switch# sh mod | inc 15.0
1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 15.0(2)SG5 Ok
2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5 Ok
We can also see that the redundancy mode has changed from RPR to SSO on both engines.
switch# sh redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit = Secondary
Unit ID = 2
Redundancy Mode (Operational) = Stateful Switchover
Redundancy Mode (Configured) = Stateful Switchover
Redundancy State = Stateful Switchover
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up
client count = 60
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0
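The fields that matter in this output are the operational redundancy mode and the peer state. As an illustration only, here is a short Python sketch that turns the output of show redundancy states into a dictionary and checks those two fields. The function names are hypothetical, and the field labels are copied from the output above, so other IOS versions may label them differently.

```python
def parse_redundancy_states(output):
    """Turn 'key = value' lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields

def sso_is_operational(fields):
    """True when SSO is the operational mode and the peer is STANDBY HOT."""
    return (
        fields.get("Redundancy Mode (Operational)") == "Stateful Switchover"
        and fields.get("peer state", "").endswith("STANDBY HOT")
    )
```

Run against the earlier output, where the operational mode was still RPR and the peer was STANDBY COLD, sso_is_operational() returns False. That makes it a quick way to confirm the switchover really finished.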
That's it! We now have both supervisor engines running the new IOS version and the stateful switchover redundancy mode.
Troubleshooting
Standby Supervisor Engine Reload Loop
Sometimes the standby supervisor might go into a reload loop because of this message:
Current BOOT file is --- flash:cat4500-entservicesk9-mz.150-2.SG5.bin
Invalid filename flash:cat4500-entservicesk9-mz.150-2.SG5.bin. It must begin with device name.
You are then presented with a ROMMON prompt. Hit Ctrl-C to prevent a reload. Then, at the prompt, issue a boot command to load the IOS.
rommon 1 > boot bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
The standby supervisor engine will boot the IOS. You can then check your configuration and fix the problem.
switch# show running-config | inc boot system
boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin
boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
The first line is our problem. It says to boot from flash: instead of bootflash: like the second line. So to fix our problem, we only need to remove that first line.
switch# conf terminal
switch(config)# no boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin
switch(config)# end
switch# copy run start
switch# sh run | inc boot system
boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
There, that should do it. Now issue the standby supervisor reload again and see how it goes.
switch# redundancy reload peer
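To catch this kind of stray entry before the standby reloads, you can scan a saved copy of the configuration for boot system lines that point at the wrong device. The sketch below is a hypothetical helper, not a Cisco tool. It assumes the image path is the last word on each boot system line and that bootflash: is the only device you want to boot from.

```python
def bad_boot_lines(config_text, device="bootflash:"):
    """Return 'boot system' lines whose image path does not start with device."""
    bad = []
    for line in config_text.splitlines():
        words = line.split()
        # A usable entry needs at least one word after 'boot system'
        if words[:2] == ["boot", "system"] and len(words) > 2:
            if not words[-1].startswith(device):
                bad.append(line.strip())
    return bad
```

Fed the two boot system lines from the broken configuration above, it returns only the flash: entry, which is the one to remove with the no form of the command.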
TFTP Configuration Backup Error
When we back up the switch's configuration to our TFTP server, we might get an access denied error. That's because the TFTP server doesn't have permission to write into the /tftpboot directory. A simple, yet not very secure, way to fix this is to open up the permissions right before we do the config backup, then put them back the way they were.
sudo chmod o+rwx /tftpboot
switch# copy start tftp
sudo chmod o-w /tftpboot
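Since leaving the directory world-writable is the risky part, it does no harm to confirm that the write bit really is gone afterwards. Here is a minimal Python sketch of that check, with hypothetical function names. It only inspects the permission bits for "others", which is what the chmod commands above change.

```python
import os
import stat

def others_can_write(mode):
    """True when the 'others' write bit is set in a file mode."""
    return bool(mode & stat.S_IWOTH)

def directory_is_world_writable(path):
    """Check the permission bits of an existing path such as /tftpboot."""
    return others_can_write(os.stat(path).st_mode)

# Hypothetical usage on the TFTP server:
# print(directory_is_world_writable("/tftpboot"))
```

It should report True while the backup is running and False once the permissions have been restored.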
Thank you so much David.
Following your tutorial, I was able to successfully replace a SUP and everything worked as a charm.
Keep up the good work :)
Cheers,
Chris
Hi Chris,
My pleasure! I'm glad I could help :)
DA+
Thanks for this guide. I found it very useful and it really helped me upgrade of two (old) 4500's
Brian
Hi Brian, glad I could help! :)
Hello David,
I have a question. My 4507 currently is running version 12.2(50)SG8. I see version 15.0.2SG3 with the feature that I need. I was wondering if I could go and install this version. Since it is 15.X do I have to have a licence to activate it? Your comments and help will be greatly appreciated.
Best Regards,
Robert
Hello Robert,
I'm not a Cisco IOS License expert. I think you pay by features rather than by specific versions. But don't quote me on that!
One way to find out if you can have version 15.x is simply to try and download it from your Cisco account. If you can't, then it's probably that your Cisco contract doesn't cover it. If you can, then you're in luck.
In the end, I've always found it was easier to say « I'm sorry » than to ask for permission ;)
HTH,
DA+
P.S. Of course, a good way to get a better answer than mine is to call your friendly Cisco account manager.
Thanks for answering David.
Best Regards,
Robert
Thanks you for documentation !
Best Regards,
Thanks it's helpful..
Glad you liked it!
So i have question plz:
when i am done with the upgrading and the configuration shouldn't i put the config-register back to be 0x2102 or leave it as 0x2 before shipping it to the customer?
Thanks
Hey Ahmed,
I would think so, yes. But please double-check with your Cisco support team to be 100 % positive on this one.
HTH,
David
Hi David. Many thanks for this article. Very well written. I had a query regarding upgrading the ROMMON and IOS at the same time for a 4510 with SSO. Have you come across that scenario before, & is it possible to reload the peer and have it come back up with the new ROMMON and IOS and then do a redundancy switch-over? And is this truly a "hitless" change? I am quite sure that there has to be a layer 2 and or layer 3 disconnection. But I keep reading "zero downtime" upgrades which confuses me.