Wednesday, June 16, 2010

FIXING A SOLARIS BOOT ARCHIVE

Since the Solaris 10 1/06 release, Solaris 10 x86 machines use Grub for booting. To boot the machine, Grub loads a boot archive: a ramdisk image containing the kernel and the key modules and configuration files it needs to start. This allows the kernel to boot without first performing I/O on the root filesystem. If you're familiar with Linux, this is very similar to the way initrd works.
The boot archive is /platform/i86pc/boot_archive, and the list of files included in the archive is kept in /boot/solaris/filelist.ramdisk.
When a Solaris 10 x86 machine is shut down, the system checks whether the boot archive needs updating. If it does, you will see a message of the form:
updating /platform/i86pc/boot_archive...this may take a minute
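The following rough sketch makes the idea of an out-of-date archive concrete. It lists files named in filelist.ramdisk that have been modified since boot_archive was last written. It is a hypothetical helper, not part of Solaris, and it is not how bootadm itself decides. It assumes that the entries in filelist.ramdisk are paths relative to the root directory. It is written for ksh or another POSIX shell, because Solaris 10's /bin/sh is the older Bourne shell.

```shell
# Hypothetical helper (not part of Solaris): print every file listed in
# filelist.ramdisk that was modified after the boot archive was written.
# This only approximates the idea; bootadm uses its own consistency checks.
# Usage: archive_stale_files [ROOT]   (ROOT defaults to /, e.g. /a in failsafe)
archive_stale_files() {
    root=${1:-/}
    root=${root%/}                      # "/" becomes "", so paths stay clean
    archive="$root/platform/i86pc/boot_archive"
    filelist="$root/boot/solaris/filelist.ramdisk"

    if [ ! -f "$archive" ] || [ ! -f "$filelist" ]; then
        echo "archive or file list missing under ${root:-/}" >&2
        return 2
    fi

    # Assumption: each entry is a file or directory relative to ROOT.
    while read -r entry; do
        case $entry in
            ''|\#*) continue ;;         # skip blank lines and comments
        esac
        [ -e "$root/$entry" ] || continue
        # -newer: modified more recently than the archive itself
        find "$root/$entry" -newer "$archive" -print
    done < "$filelist"
}
```

For example, archive_stale_files /a would check an installation mounted on /a from failsafe mode. Output suggests that the archive is older than some of its contents. It says nothing about whether the archive is corrupt.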
Solaris generally keeps the archive up to date, but the archive can occasionally become corrupt (for example, after a crash or a bad patch update).
If the archive is corrupt, the system will hang when Grub tries to boot Solaris. This is accompanied by a message about a corrupt ramdisk, although on a serial terminal you may just be left with a blank screen. To make the system bootable again, you need to rebuild the boot archive.
Special Case: mdi_ib_cache
If on booting you get an error message of the form:
WARNING - The following files in / differ from the boot archive:
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
The recommended action is to reboot and select "Solaris failsafe"
option from the boot menu. Then follow prompts to update the boot archive.
You do not need to use failsafe mode. This error is caused by the boot archive check tripping over changes in the devid cache (see bug 6256649).
Fixing this issue is as simple as clearing the maintenance state of the boot-archive service, rebuilding the boot archive and rebooting. Log in at the maintenance prompt, then run:
svcadm clear system/boot-archive
bootadm update-archive
shutdown -i 6
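If you script this recovery, chain the steps so that the machine reboots only when the archive has been rebuilt successfully. The following is a minimal sketch for ksh or another POSIX shell. The function names and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical wrapper around the three recovery commands above.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fix_mdi_ib_cache() {
    # Each step runs only if the previous one succeeded, so the machine is
    # not rebooted after a failed archive rebuild.
    run svcadm clear system/boot-archive &&
        run bootadm update-archive &&
        run shutdown -i 6
}
```

Running DRY_RUN=1 fix_mdi_ib_cache first prints the three commands without executing them.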
Rebuilding the Boot Archive
Boot into Failsafe Mode
Reset the machine and select the 'Solaris failsafe' option from the Grub menu. The system will boot from a standalone image of Solaris (kept at /boot/x86.miniroot-safe), bypassing the broken boot archive.
Simple Root Partition
If your root filesystem is on a simple partition (not mirrored by Solaris Volume Manager), failsafe will offer to mount your Solaris installation on /a; accept this. You may also be prompted to rebuild your boot archive. If so, follow the instructions and reboot.
If you need or want to manually update the boot archive, run the following command, then reboot:
bootadm update-archive -R /a
You should now have a working system.
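A small wrapper for the manual update can stop you from running bootadm against an empty /a, for example when the mount was skipped. The check for etc/vfstab and platform/i86pc is an assumption about what a mounted Solaris x86 root looks like. bootadm does not require it. The function name and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical helper: rebuild the archive of the install mounted on ALTROOT
# (default /a). The sanity check is an extra precaution, not a bootadm rule.
# Set DRY_RUN=1 to print the bootadm command instead of running it.
update_altroot_archive() {
    altroot=${1:-/a}
    if [ ! -f "$altroot/etc/vfstab" ] || [ ! -d "$altroot/platform/i86pc" ]; then
        echo "$altroot does not look like a mounted Solaris x86 root" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "bootadm update-archive -R $altroot"
    else
        bootadm update-archive -R "$altroot"
    fi
}
```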
Mirrored Root Partition
If your system uses a Solaris Volume Manager metadevice mirror for the root partition, you will instead see messages saying that the partitions are being skipped because they are under md control, for example:
Searching for installed OS instances...
/dev/dsk/c3t0d0s0 is under md control, skipping.
/dev/dsk/c3t1d0s0 is under md control, skipping.
No installed OS instance found.
The process to fix the boot archive in this case is longer, but provided you follow all the steps, still straightforward.
Mount Filesystem
Start by mounting the first half of the mirror onto /a. The name of the first half of the mirror is given in the 'md control' message, in this case '/dev/dsk/c3t0d0s0':
mount /dev/dsk/c3t0d0s0 /a
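If you have a copy of the failsafe messages, such as a serial console log, you can extract the slice names with sed instead of copying them by hand. This hypothetical helper assumes that the messages have exactly the form shown above.

```shell
# Hypothetical helper: read failsafe's OS-search messages on stdin and print
# each slice reported as "under md control", in the order it was listed.
md_controlled_slices() {
    sed -n 's|^\(/dev/dsk/[^ ]*\) is under md control, skipping\.$|\1|p'
}
```

Following this article, the first slice printed (md_controlled_slices < messages | head -n 1) is the one to mount on /a.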
Update vfstab
You must now update /a/etc/vfstab so that this single partition, rather than the mirror metadevice, is used as the root filesystem. Any changes you make now, including the rebuilt boot archive, are written to this half of the mirror only. If the system then boots with the metadevice as root, the two halves will be out of sync and you'll have serious problems. Note: you only need to change the entry for / in vfstab.
Take a backup of the vfstab:
cp /a/etc/vfstab /a/etc/vfstab_backup
Open the vfstab file in an editor, comment out the existing line for the root filesystem, then add a new line for the single partition. For example, you might have the following line in your vfstab:
#device            device              mount   FS      fsck    mount     mount
#to mount          to fsck             point   type    pass    at boot   options

/dev/md/dsk/d10    /dev/md/rdsk/d10    /       ufs     1       no        -
In our example the partition is /dev/dsk/c3t0d0s0, so you would update vfstab as follows (note the rdsk in the second column of the new entry):
#device            device              mount   FS      fsck    mount     mount
#to mount          to fsck             point   type    pass    at boot   options

# /dev/md/dsk/d10  /dev/md/rdsk/d10    /       ufs     1       no        -
/dev/dsk/c3t0d0s0  /dev/rdsk/c3t0d0s0  /       ufs     1       no        -
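You can also make the same edit non-interactively. This sketch comments out the active vfstab line whose mount point is / and follows it with a replacement that uses the given slice for the block and raw devices. Every other line is left alone. The function name is hypothetical. The sketch uses awk -v, which the old default awk on Solaris 10 may not support, so it reads the awk command from the AWK variable. Setting AWK=nawk there is an assumption worth checking on your system.

```shell
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris 10 if awk lacks -v

# Hypothetical helper: read vfstab on stdin, write the edited copy to stdout.
# The active root line is commented out and a single-slice line follows it.
single_slice_root() {
    slice=$1   # e.g. c3t0d0s0
    $AWK -v slice="$slice" '
        # Uncommented entry whose mount point (third field) is /
        $1 !~ /^#/ && $3 == "/" {
            print "# " $0
            # Swap in the block and raw device; awk rejoins fields with spaces
            $1 = "/dev/dsk/" slice
            $2 = "/dev/rdsk/" slice
            print
            next
        }
        { print }
    '
}
```

For example, run single_slice_root c3t0d0s0 < /a/etc/vfstab > /tmp/vfstab.new, review /tmp/vfstab.new, then copy it over /a/etc/vfstab, keeping the backup taken above. The rewritten line is separated by single spaces. vfstab accepts any whitespace between fields.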
Update Boot Archive
We can now safely update the boot archive and reboot:
bootadm update-archive -R /a
shutdown -i 6
At the Grub menu, select the normal Solaris option (not failsafe). If the boot fails because of mount problems, reboot into failsafe mode and check that your vfstab file is correct.
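When you check vfstab from failsafe mode, mount the partition on /a as before if failsafe does not do it for you. It then helps to print only the active root entry. A hypothetical one-line helper:

```shell
# Hypothetical helper: print the uncommented vfstab entry (or entries) for /.
# Defaults to /a/etc/vfstab, i.e. the partition mounted on /a in failsafe.
root_vfstab_entry() {
    awk '$1 !~ /^#/ && $3 == "/"' "${1:-/a/etc/vfstab}"
}
```

It should print exactly one line. In this example, that line names /dev/dsk/c3t0d0s0 and /dev/rdsk/c3t0d0s0.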
Fixing the Root Mirror
You should now have a working Solaris system, but your root filesystem is no longer mirrored. To fix this, you need to rebuild the metadevice.
Identify the name of the root filesystem metadevice from vfstab (it's the line you commented out earlier: /dev/md/dsk/d10 in this example). Then use metastat to determine the components of the mirror:
# metastat d10
d10: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d12
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 20482875 blocks (9.8 GB)

d11: Submirror of d10
    State: Okay
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c3t0d0s0          0     No            Okay   Yes


d12: Submirror of d10
    State: Okay
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c3t1d0s0          0     No            Okay   Yes
In our example we can see that the mirror is composed of two submirrors, d11 (Device: c3t0d0s0) and d12 (Device: c3t1d0s0).
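With two disks it is easy to match slices to submirrors by eye, but you can also script the lookup. This hypothetical helper reads metastat output and prints the submirror whose stripe lists the given slice. It assumes the layout shown above: a "dNN: Submirror of dMM" header followed by device rows. As in the vfstab sketch, it reads the awk command from AWK. Using nawk on Solaris 10 is an assumption.

```shell
AWK=${AWK:-awk}   # assumption: set AWK=nawk on Solaris 10 if awk lacks -v

# Hypothetical helper: read metastat output on stdin and print the name of
# the submirror whose stripe lists the given slice.
submirror_for_slice() {
    slice=$1   # e.g. c3t1d0s0
    $AWK -v slice="$slice" '
        # Remember the latest "dNN: Submirror of dMM" header
        $2 == "Submirror" && $3 == "of" { name = $1; sub(/:$/, "", name); current = name }
        # Device rows start with the slice name
        $1 == slice && current != "" { print current; exit }
    '
}
```

For example, metastat d10 | submirror_for_slice c3t1d0s0 would print d12, the submirror to detach below.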
The boot archive was rebuilt on c3t0d0s0, so d11 is the good half and d12 is the stale half. You should therefore detach d12:
metadetach d10 d12
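We can now switch back to using the metadevice as the root filesystem by restoring the vfstab backup you created earlier, then reboot. The system is no longer running from failsafe, so the root filesystem is mounted on / and the files are in /etc rather than /a/etc:
cp /etc/vfstab_backup /etc/vfstab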
shutdown -i 6
Once your system has rebooted, the root filesystem is back on the metadevice but is not yet mirrored. To re-enable the mirror, reattach the submirror you detached above (in this case d12):
metattach d10 d12
The mirror will now resync. You can check on its progress with metastat:
metastat d12
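To wait for the resync from a script, you can use a polling loop like this one. It is only a sketch. It assumes that metastat prints a line containing "Resync in progress" while the mirror is syncing, and that this line appears when you query the mirror itself (d10 here) rather than the submirror. Check both against your own output first. The function name and the polling interval are hypothetical.

```shell
# Hypothetical helper: poll metastat every INTERVAL seconds (default 60)
# until its output no longer contains "Resync in progress", printing the
# progress line on each pass. The matched wording is an assumption.
wait_for_resync() {
    mirror=$1
    interval=${2:-60}
    while :; do
        out=$(metastat "$mirror") || return 1   # stop if metastat itself fails
        status=$(printf '%s\n' "$out" | grep 'Resync in progress' || true)
        [ -n "$status" ] || break
        echo "$status"
        sleep "$interval"
    done
    echo "$mirror: no resync in progress"
}
```

For example, wait_for_resync d10 300 checks every five minutes.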
Once syncing is complete you have a working mirrored root filesystem.
