Friday, February 27, 2009

Activating an Inactive Raid Element

You know it's a tough problem when even the title needs an explanation. Consider a two-disk, mirrored, software RAID. If one disk fails, our data is safe. Imagine that disk A holds the OS and a data partition, and disk B mirrors disk A's data partition. Now disk A fails. How do we access the data on disk B (the mirror) from another machine... without damaging the data?
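
For context, a mirror like this would typically have been created with something like the following (the device names and partition layout here are my assumptions, not taken from the failed box):
# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb1
The create step writes an md superblock onto each member partition, and that superblock is exactly what mdadm reads back when we try to assemble the array on another machine.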

Obvious first step: get the disk to a second machine. Boot the box and ensure the disk appears in fdisk. All good from a hardware standpoint. Let's look at the software:
# cat /proc/mdstat
Personalities :
md_d0 : inactive sdb1[0](S)
      5237056 blocks

unused devices:
Since our partition is type fd (Linux raid autodetect), it was recognized, but the kernel didn't have enough information to reactivate the array.
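
Before forcing anything, it's worth reading that superblock directly; mdadm can examine a member partition on its own (output omitted here, but it reports the array UUID, the RAID level, and which slot this disk held):
# mdadm --examine /dev/sdb1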

Without taking you through the failures... well, just one:
# mdadm --assemble -v /dev/md0 /dev/sdb1
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has no superblock - assembly aborted
(That's to seed the errors into the search engines.)

As I was saying: without taking you through all the failures, here's how we get to our data. The assemble above fails because the half-recognized md_d0 still holds /dev/sdb1 (hence "busy"), so we stop that inactive array first, then assemble with --run to start it degraded.
# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0
# cat /proc/mdstat
Personalities :
unused devices:
# mdadm --assemble -v /dev/md0 /dev/sdb1 --run
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
    5237056 blocks [2/1] [U_]

unused devices:
# mount /dev/md0 /mnt
# ls -l /mnt
total 16
-rw-r--r-- 1 root root     0 2009-02-26 23:38 keep.me
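
Our data survived. (In the mdstat output above, [2/1] [U_] means two slots with one active: the first member is up, the second is missing.) Two cautious follow-ups: if the goal is only to copy the data off, mount read-only instead; and if you want redundancy back, add a replacement partition and let it resync. The /dev/sdc1 below is a made-up name for whichever replacement disk you actually have:
# mount -o ro /dev/md0 /mnt
# mdadm /dev/md0 --add /dev/sdc1
# watch cat /proc/mdstat
The resync runs in the background; as a commenter notes below, using the machine heavily while it rebuilds throttles the resync and stretches it out.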

14 comments:

  1. Just the information I was looking for. Thank you.

  2. Fantastic ... you saved my life!

  3. Much thanks. That did the trick quickly.

  4. This is just what I needed, thanks!

  5. Thank you very much, that helped me a lot!

  6. Thanks!!!!! :) :) :)

  8. Thank you, you saved my bacon! After a disk failure in a 4-disk RAID 5 system, I just could not figure out how to activate the array sans the failed disk; this worked wonders. Thank you!

  9. Thank you so much. I tried to add the replacement disk to my broken RAID5 array and kept getting "mdadm: cannot get array info for /dev/md1". Your instructions guided me in the right direction, but I would suggest changing your instructions to this instead:

    I first stopped /dev/md1:
    mdadm --stop /dev/md1

    I then told mdadm to find all arrays and force-start them:
    mdadm --assemble --scan --run

    They then started running in a degraded (1 disk missing) state, and that allowed me to add the replacement disks:
    mdadm --manage /dev/md0 --add /dev/sdd1
    mdadm --manage /dev/md1 --add /dev/sdd2

    I watched the process with:
    watch cat /proc/mdstat

    and let the arrays rebuild to completion, since using the computer while they are rebuilding causes the rebuild process to throttle itself and take way longer.

  10. Oh yeah, sorry, BEFORE you decide which array you must stop, you need to run "cat /proc/mdstat" to see which arrays are listed as inactive, then issue the stop command to ALL those arrays.

  11. GREAT!
    Thank you very much - it saved me a lot of trial & error ;-)

  12. Saved my day.

    The /proc/mdstat and mdadm --assemble -v /dev/md0 /dev/sdb1 section brought me here :)

    Thanks

  13. Great! Thank you, made my day!
