Grub Fallback: Boot good kernel if new one crashes
t’s hard to believe but I didn’t know about Grub fallbackfeature. So every time when I needed to reboot remote server into a new kernel I had to test it on local server to make sure it won’t panic on remote unit. And if kernel panic still happened I had to ask somebody who has physical access to the server to reboot the hardware choose proper kernel in Grub. It’s all boring and not healthful – it’s much better to use Grub’s native fallback feature.
Grub is default boot loader in most Linux distributions today, at least major distros like Centos/Fedora/RedHat, Debian/Ubuntu/Mint, Arch use Grub. This makes it possible to use Grub fallback feature just out of the box. Here is example scenario.
There is remote server hosted in New Zealand and you (sitting in Denmark) have access to it over the network only (no console server). In this case you cannot afford that the new kernel makes server unreachable, e.g. if new kernel crash during boot it won’t load network interface drivers so your Linux box won’t appear online until somebody reboots it into workable kernel. Thankfully Grub can be configured to try loading new kernel once and if it fails Grub will load another kernel according to configuration. You can see my example grub.conf below:
default=saved timeout=5 splashimage=(hd0,1)/boot/grub/splash.xpm.gz hiddenmenu fallback 0 1 title Fedora OpenVZ (2.6.32-042stab053.5) root (hd0,1) kernel /boot/vmlinuz-2.6.32-042stab053.5 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet crashkernel=auto initrd /boot/initramfs-2.6.32-042stab053.5.img savedefault fallback title Fedora (22.214.171.124-88.fc14.i686) root (hd0,1) kernel /boot/vmlinuz-126.96.36.199-88.fc14.i686 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet initrd /boot/initramfs-188.8.131.52-88.fc14.i686.img savedefault fallback
According to this configuration Grub will try to load ‘Fedora OpenVZ’ kernel once and if it fails system will be loaded into good ‘Fedora’ kernel. If ‘Fedora OpenVZ’ loads well you’ll be able to reach the server over the network after reboot. Notice lines ‘default=saved’ and ‘savedefault fallback’ which are mandatory to make fallback feature working.
I’ve heard that official Grub fallback feature may work incorrectly on RHEL5 (and Centos 5) so there is elegant workaround (found here):
1. Add param ‘panic=5? to your new kernel line so it looks like below:
title Fedora OpenVZ (2.6.32-042stab053.5) root (hd0,1) kernel /boot/vmlinuz-2.6.32-042stab053.5 ro root=UUID=6fbdddf9-307c-49eb-83f5-ca1a4a63f584 rd_MD_UUID=1b9dc11a:d5a084b5:83f6d993:3366bbe4 rd_NO_LUKS rd_NO_LVM rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1 rhgb quiet crashkernel=auto panic=5 initrd /boot/initramfs-2.6.32-042stab053.5.img
This param will make crashed kernel to reboot itself in 5 seconds.
2. Point default Grub param to good kernel, e.g. ‘default=0?.
3. Type in the following commands (good kernel appears in grub.conf first and new kernel is second one):
# grub grub> savedefault --default=1 --once savedefault --default=1 --once grub> quit
This will make Grub to boot into new kernel once and if it fails it will load good kernel. Now you can reboot the server and make sure it will 100% appear online in a few minutes. I usually prefer native Grub fallback feature but if you see it doesn’t work for you it makes sense to try above mentioned workaround.