Friday, February 27, 2009

Activating an Inactive Raid Element

You know it's a tough problem when even the title needs an explanation. Consider a two-disk, mirrored, software raid. If one disk fails, our data is safe. Imagine that disk A has the OS and a data partition. Disk B is the mirror for disk A's data partition. Now disk A fails. How do we access the data on disk B (the mirror) on another machine... without damaging the data?

The obvious first step: move the disk to a second machine. Boot the box and ensure the disk appears in fdisk. All good from a hardware standpoint. Let's look at the software:
# cat /proc/mdstat
Personalities :
md_d0 : inactive sdb1[0](S)
5237056 blocks

unused devices:
Since our partition is type fd (Linux raid autodetect), it was recognized, but the kernel didn't have enough info to reactivate the raid.

Without taking you through the failures... well, just one:
# mdadm --assemble -v /dev/md0 /dev/sdb1
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdb1: Device or resource busy
mdadm: /dev/sdb1 has no superblock - assembly aborted
(That's to seed the errors into the search engines.)

As I was saying: Without taking you through all the failures, here's how we get to our data.
# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0
# cat /proc/mdstat
Personalities :
unused devices:
# mdadm --assemble -v /dev/md0 /dev/sdb1 --run
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
    5237056 blocks [2/1] [U_]

unused devices:
# mount /dev/md0 /mnt
# ls -l /mnt
total 16
-rw-r--r-- 1 root root     0 2009-02-26 23:38 keep.me
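The [2/1] [U_] in that mdstat output is the tell: two members expected, one active. Here's a throwaway parser for that status line, a sketch that works on a canned copy of the output above rather than the live /proc/mdstat:

```shell
# Flag degraded md arrays from mdstat-style text. Sketch only:
# the sample is canned; swap in the real /proc/mdstat for actual use.
mdstat='md0 : active raid1 sdb1[0]
    5237056 blocks [2/1] [U_]'

degraded=$(echo "$mdstat" | awk '
/^md/ { name = $1 }                        # remember the array name
match($0, /\[[0-9]+\/[0-9]+\]/) {          # find the [total/active] field
    split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
    if (n[2] < n[1])
        printf "%s: %d of %d members active\n", name, n[2], n[1]
}')
echo "$degraded"
```

Pointed at the real /proc/mdstat, this would also catch a [U_] on any array you forgot you were running degraded.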

Monday, February 23, 2009

Making GFS available to Nodes

A big misunderstanding about GFS is that it is a network-aware filesystem. Actually, you still need something to get the data from the server to the clients. For RHEL 5, that something is GNBD... For anything newer than FC4... it's something else?

Assuming you have a two node RHEL 5 cluster, with a GFS file system presented to one node, we will now make it visible to a second node. First, load the needed packages:
yum install gnbd_utils
Turns out this process is not as well automated as most services. My guess is that this is a temporary solution, most likely to be replaced with iSCSI.

On the server side (side with the GFS volume), append to /etc/rc.d/rc.local:
gnbd_serv
gnbd_export -d /dev/gfs/test -e shared

Make sure the volume group name on the server is unique and not VolGroup00 or vg0.

On the client side, append to /etc/rc.d/rc.local:
modprobe gnbd
gnbd_import -i gfs1.terran.lan

In this example, we exported an LVM logical volume, so we'll do a vgscan to see it.
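Putting the whole client side together, the rc.local additions end up looking something like this (a sketch: the vgchange line and the /gfs mount point are my assumptions, since the logical volume has to be activated before it can be mounted):

```shell
# /etc/rc.d/rc.local (client side) -- illustrative fragment, not a tested boot script
modprobe gnbd                    # GNBD client kernel module
gnbd_import -i gfs1.terran.lan   # import everything the server exports
vgscan                           # rescan so LVM notices the imported device
vgchange -ay gfs                 # assumption: activate the "gfs" volume group
mount /dev/gfs/test /gfs         # assumption: /gfs mount point
```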

Single Node GFS

Pretty much just for testing, here's a single node cluster.conf:

<cluster name="clust" config_version="1">
  <clusternodes>
    <clusternode name="gfs1.terran.lan" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="fence_dev" nodeid="gfs1.terran.lan"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_xvm" name="fence_dev"/>
  </fencedevices>
</cluster>

Saturday, February 21, 2009

Gnome Application Menus

I really don't like the way the Gnome applications menu works. Oh... I know KDE is better, but Gnome is what Red Hat uses, so I've got to deal with it. The issue is that in Red Hat and Fedora, the menu is locked down to prevent you from adding new entries. Why, you ask... Good question:

To add menu items, we first have to be root. Next we go to a directory that holds a group of files that describe the menu entries. Lastly, we hack a new entry, which is the tricky part.
$ su -
# cd /usr/share/applications
# ls
The files ending in .desktop are descriptor files (the freedesktop.org Desktop Entry format). The easiest thing is to copy one that is already there. As an example we'll add /usr/bin/xclock using gnome-about.desktop as our template. Unfortunately, gnome-about.desktop has a bunch of garbage in it for internationalization. Who'd want that?
# grep -v "[-=]GN\|.\[" gnome-about.desktop > ourName.desktop
# vi ourName.desktop
We need to change some things, most of which should be obvious.
[Desktop Entry]
Encoding=UTF-8
Name=Xclock
Comment=Menu test using Xclock
Exec=/usr/bin/xclock
Icon=gnome-xterm.png
Terminal=false
Type=Application
StartupNotify=true
Save the file. Click the applications menu and select "Other". (If you don't see "Other", right-click on "Applications", select "Edit Menus", check the box next to "Other", and click "Close".)

We're not done, but do notice the icon. In the file we did not specify a path, so it looks under either /usr/share/pixmaps or /usr/share/icons. Edit the file again and make this change:
Icon=/usr/share/icons/gnome/32x32/actions/add.png
Save the file, and check the menu item again. The icon should have changed.

Now let's get it into a menu group other than "Other". This is the hacky part. Edit the file and add the following:
Categories=Utility;
Look through the menu again. In Red Hat and Fedora, the item should have moved to the "Accessories" folder. How did it end up in "Accessories" when we told it to go in "Utility"? Unfortunately, the names we see are not the names of the groups. To see the mapping of groups to names, try this:
grep "Name=" /usr/share/desktop-directories/* | \
grep -iv more | cut -d/ -f5 | sort -t: -k2 | sed "s~\..*:Name~~"
Use the name on the left to get the item into the group on the right.
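To see what that pipeline is getting at, here's a self-contained rerun of the trick with two canned .directory files standing in for /usr/share/desktop-directories (the file names and menu names here are just illustrative; the real set varies by distro):

```shell
# Build a fake desktop-directories tree and map group -> menu name.
dir=$(mktemp -d)
printf 'Name=Accessories\n' > "$dir/Utility.directory"
printf 'Name=Sound & Video\n' > "$dir/AudioVideo.directory"

# Left side: the value you put in Categories=. Right side: what the menu shows.
mapping=$(grep "Name=" "$dir"/* | sed 's~.*/~~; s~\.directory:Name=~ -> ~' | sort)
echo "$mapping"
rm -rf "$dir"
```

So Categories=Utility; lands the entry in whatever Utility.directory calls itself, "Accessories" here.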

See, I told you this was a nasty hack.

Friday, February 20, 2009

Interesting Clustering Option

I've been playing with clustering, and found that there was a small issue with my original cluster.conf. The issue of concern was the use of manual fencing, as there are "some" distributions for which this causes problems. The trick is to make it work without the addition of outside hardware.

Yet there is an option for VM fencing, which is fun, because you can't hook a VM to a remote power switch. What you could do is link the fencing daemon to the hypervisor and accomplish the same thing. That's not what VM fencing does.

It does nothing. Just like manual... but it's uniformly integrated. Weird, but then that's the nature of all these hacks.
<fencedevices>
  <fencedevice agent="fence_xvm" name="fence_dev"/>
</fencedevices>

Wednesday, February 18, 2009

GFS vs Multi VM, Pt. 2

Got it. First, read GFS vs Multi VM to get the back story. Several attempts at a GFS cluster failed. It works now... with a few sizeable caveats.

1. Clustering in Fedora 8 is broken, and cannot be fixed.
2. Clustering in Fedora 10 will not run in less than 312 MB of RAM.
3. The filesystem must be on an LVM logical volume.

On two up-to-date Fedora 10 VMs with a shared partition, load the three packages and the cluster.conf as previously specified. Do not use Conga or system-config-cluster! The file must be hand-edited. Ensure all host names and IPs are in /etc/hosts.

On the shared partition, pvcreate, vgcreate, lvcreate. On both nodes, vgscan, then execute:
# lvdisplay | grep -i "lv name"
Both nodes should see the LVM.

Start the cluster services:
# service cman start
# service clvmd start
If your connection to the node drops, you've exhausted the VM's memory.

From only one node, create the GFS share:
# mkfs.gfs2 -t gfs-test:gfsx -j 2 -p lock_dlm /dev/gfsx/test
For -t, the prefix must match the cluster name in cluster.conf. The -j specifies the maximum number of nodes (journals).
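Since a mismatched -t prefix is an easy way to waste an afternoon, here's a self-check you could script before running mkfs.gfs2 (a sketch: the cluster.conf is canned and the names come from this example, so point it at the real /etc/cluster/cluster.conf in practice):

```shell
# Verify the lock table prefix matches cluster.conf's name attribute.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<cluster name="gfs-test" config_version="1">
</cluster>
EOF

locktable="gfs-test:gfsx"
cluster=$(sed -n 's/.*<cluster name="\([^"]*\)".*/\1/p' "$conf")
prefix=${locktable%%:*}   # everything before the first colon

if [ "$prefix" = "$cluster" ]; then
    result="ok: prefix $prefix matches cluster name"
else
    result="mismatch: $prefix vs $cluster"
fi
echo "$result"
rm -f "$conf"
```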

Now mount:
# mount /dev/gfsx/test /gfs
# gfs2_tool list
253:2 gfs-test:gfsx
All commands except mkfs.gfs2 are executed on both nodes.

Needless to say, I'm glad we're finally done with this one.

Wednesday, February 11, 2009

GFS -vs- Multi VM

Trying to connect two VMs to the same file system using GFS. The documentation is okay up to the point where we need to mount the GFS filesystem. That's when we get:
/sbin/mount.gfs2: waiting for gfs_controld to start
/sbin/mount.gfs2: gfs_controld not running
/sbin/mount.gfs2: error mounting lockproto lock_dlm
Apparently we have to cluster. Gee, it would have been nice if they had included those instructions in the "Prerequisites" section. So... What does it take?

I tried the system-config-cluster and didn't get the results I wanted. It wanted one node to be a primary, the other a secondary. I tried hacking cluster.conf, but too many variables (literal and metaphorical). So I built two VMs, attached them to the same LVM, and tried to get GFS to work.

Load packages:
# yum install cman gfs2-utils lvm2-cluster
These seem to be the minimum. There is a bug with Fedora 8, where it won't load unless you do two yum updates first. If you have an x86_64 system, make sure it only loads the 64-bit versions. Easy enough.

The heart of the cluster is the initially non-existent /etc/cluster/cluster.conf, which can be generated by several automated tools, all of which are broken. In other words, you absolutely must enter it by hand. Here's what we need:
<?xml version="1.0"?>
<cluster name="clust" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="gfs1.terran.lan" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs1.terran.lan"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gfs2.terran.lan" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="human" ipaddr="gfs2.terran.lan"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
Again, this is a bare minimum config.

Now try it:
# service cman start
Glory! It started.

Time to mount:
# mount /dev/sdb1 /gfs
/sbin/mount.gfs2: can't connect to gfs_controld: Connection refused
Arrrrrgh! The error message:
No entry for lock_dlm_plock found
Is dlm missing from kernel?
No, it is not missing:
# lsmod | grep dlm
lock_dlm     25449 0
gfs2           489593 1 lock_dlm
dlm             123049 4 lock_dlm
configfs       32617 2 dlm
Okay, back to the drawing board.

More virsh Documentation Oversights

Tried to add a disk to a VM; the documentation was lacking. Here's the virsh command:
virsh attach-disk vmname /dev/disk sdb
In this example, the disk will appear as /dev/sdb to the VM. Unfortunately, partprobe did not work; we had to reboot.
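For reference, attach-disk is shorthand for attach-device with an XML fragment along these lines (a sketch; the driver and target details are guesses and vary by hypervisor, so dumpxml the guest to see what was actually generated):

```xml
<disk type='block' device='disk'>
  <driver name='phy'/>
  <source dev='/dev/disk'/>
  <target dev='sdb'/>
</disk>
```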

In this case, I've added it to two VMs in anticipation of granting multiple-writer access via GFS. Stay tuned.

Tuesday, February 03, 2009

Super Bowl 2009

More correctly, the commercials. The game-- didn't care, didn't even watch. Here are my favorite commercials in no particular order:
Audi: The Chase
Pepsi: I'm Good
Monster.com: Moose Head
Hulu.com: Al(i)e(n)c Baldwin

The good folks at Time Magazine will let you browse others.