Monday, May 27, 2013

PVM, aka Beowulf Cluster

I stumbled upon a little project this weekend and found myself without a Beowulf cluster to help out.  It had been several years since I'd built a computational cluster, so I noticed a few "new" gotchas.  But... before we get to the fun stuff, let's review:
  • No, Beowulf is not "dead technology"
  • No, Hadoop is not the perfect tool for every job
That which was once called a Beowulf cluster is actually the use of a Message Passing Interface (MPI) that, when deployed across nodes, creates a Parallel Virtual Machine (PVM).  Ready for this?  The current generation of Hadoop (et al.) clusters are actually MPIs, and thus PVMs.  Given network attached storage, a map-reduce cluster is, in theory, Beowulf compliant.

To set up the absolute simplest PVM, we need two nodes, with an NFS share, and a user account.  The user needs an SSH key pair distributed to all nodes such that the user can log in to any machine, from any machine.  Each node's hostname must resolve via DNS or /etc/hosts.  Each node's hostname and address must be statically configured, and cannot be "localhost".
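
Those prerequisites can be checked from a script before anything is installed.  What follows is a minimal sketch, not part of the original setup: the helper names are hypothetical, and pvm1 and pvm2 are simply the node names used later in this post.

```shell
# Sketch only: helper names are hypothetical, not part of PVM.

# Succeeds only if the name resolves to something other than loopback.
resolves_off_loopback() {
    addr=$(getent hosts "$1" 2>/dev/null | awk '{ print $1; exit }')
    case "$addr" in
        ""|127.*|::1) return 1 ;;
        *)            return 0 ;;
    esac
}

# Succeeds only if key-based login works without any prompt.
passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Report on every node named on the command line.
check_nodes() {
    for node in "$@"; do
        resolves_off_loopback "$node" || echo "$node: does not resolve off loopback"
        passwordless_ssh "$node"      || echo "$node: key-based SSH login failed"
    done
}

# Example: check_nodes pvm1 pvm2
```

Run it as the PVM user from each node in turn; no output means both checks passed for every name given.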

The first step is to install the base package from an EPEL repo.  (I'm using Scientific Linux 6.)  The package is delivered as source and must be compiled with a minimal set of options:
yum install -y pvm --nogpgcheck
rpm -ql pvm | grep -m1 pvm3
/usr/share/pvm3
This shows us where the RPM installed the source.  The issue with this incarnation is that it is still configured for RSH rather than SSH:
export PVM_ROOT=/usr/share/pvm3/
cd $PVM_ROOT
find . -type f -exec sed -i "s~bin/rsh~bin/ssh~g" {} \;
make; make install
Unfortunately, there are still hard-coded references to RSH in some of the binary libraries, so we spoof the references with a symlink:
ln -s /usr/bin/ssh /usr/bin/rsh
Repeat these steps on all (both) nodes.

On only one of the nodes (it doesn't matter which one) validate that PVM is not running, configure the PVM_ROOT variable, and start the first instance as the non-root user:
ps -ef | awk '!/awk/ && /pvm/'
echo "export PVM_ROOT=/usr/share/pvm3" >> ~/.bashrc
echo id | pvm
pvm> id
t40001
Console: exit handler called
pvmd still running.
ps -ef | awk '!/awk/ && /pvm/'
compute <snip> /usr/share/pvm3/lib/LINUXX86_64/pvmd3
Notice that the PVM daemon launched and remained resident.  Individual commands can be piped to PVM, or an interactive console can be used.  From the same node, remotely configure the next node:
ssh pvm2 'echo "export PVM_ROOT=/usr/share/pvm3" \
                >> ~/.bashrc'
# should not prompt for a password
ssh pvm2 'echo $PVM_ROOT'
/usr/share/pvm3
ssh pvm2 'rm -f /tmp/pvm*'
The last line is very, very important.  From the first node, remotely start the second node:
pvm
pvmd already running.
pvm> conf
conf
1 host, 1 data format
   HOST     DTID     ARCH   SPEED       DSIG
   pvm1    40000 LINUXX86_64    1000 0x00408c41
pvm> add pvm2
add pvm2
1 successful
   HOST     DTID
   pvm2    80000
pvm> conf
conf
2 hosts, 1 data format
   HOST     DTID     ARCH   SPEED       DSIG
   pvm1    40000 LINUXX86_64    1000 0x00408c41
   pvm2    80000 LINUXX86_64    1000 0x00408c41
In this sequence, we have accessed the console on pvm1 to view the cluster's configuration (conf).  Next, we started the second node.  It is now displayed in the cluster's conf.
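
Because commands can be piped to the console, the same steps can be scripted for more than one node.  A minimal sketch, assuming the console accepts several commands on standard input the same way it accepts a single piped one; the function name and the pvm3 host are hypothetical:

```shell
# Sketch only: emit one console command per line for the pvm console.
pvm_add_cmds() {
    for node in "$@"; do
        printf 'add %s\n' "$node"
    done
    printf 'conf\n'    # finish by showing the resulting configuration
}

# Example (run as the non-root PVM user on a node where pvmd is up):
#   pvm_add_cmds pvm2 pvm3 | pvm
```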

Just for fun, let's throw it the simplest of compute jobs:
pvm> spawn -4 -> /bin/hostname
4 successful
t8000b
t8000c
t4000c
t4000d
pvm>
[3:t4000d] pvm1
[3:t4000c] pvm1
[3:t4000c] EOF
[3:t4000d] EOF
[3:t8000b] pvm2
[3:t8000b] EOF
[3:t8000c] pvm2
[3:t8000c] EOF
[3] finished
There are a few things to notice about the output:
  1. The command asked the cluster to spawn the command "/bin/hostname" four times.
  2. The "->" option indicates we wanted the output returned to the console, which is completely abnormal... we only do this for testing.
  3. The prompt returned before the output.  The assumption is that our compute jobs will take an extended period of time.
  4. The responses were not displayed in order.  They were displayed as they returned, because all this magic is happening asynchronously.
  5. Each job's responses, from each node, could be grep'ed from the output using a unique serial number, automatically assigned to the job.
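
As a rough illustration of point 5, console output saved to a file can be filtered by task ID.  This is a sketch under the assumption that the lines keep the "[job:task] text" shape shown above; the function name and the spawn.log file are hypothetical.

```shell
# Sketch only: print the lines one task wrote, minus the EOF marker.
# $1 is a task ID such as t8000b; console output arrives on stdin.
task_output() {
    awk -v id="$1" '
        index($1, ":" id "]") && $2 != "EOF" {
            sub(/^[^ ]+ /, "")    # drop the "[3:t8000b] " prefix
            print
        }
    '
}

# Example: task_output t8000b < spawn.log
```
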
To leave the console and return to the command prompt issue the quit command.  All started nodes will continue to run.  To shutdown the compute cluster, execute:
echo halt | pvm
Finally, remember this one last thing: The cluster is a peer-to-peer grid.  Any node can manage any other, any node can schedule jobs, and any node can issue a halt.

Monday, March 11, 2013

Fun with Unicode Characters

Whenever I am tasked with creating a web page, it ends up being the absolute bare minimum.  (If you don't believe me, just visit dougbunger.com!)  Of course I do it in the interest of fast rendering and bandwidth conservation... because I am a good Internet citizen.  So here are some fun unicode graphics that can be used as web page icons.  There are thousands of characters, but these seem to be a good cross platform sub-set.

And by the way: Excuse the font.
This -->  &
...is an ampersand.
&#8592;←     &#8593;↑     &#8594;→     &#8595;↓    
&#8596;↔     &#8597;↕    
&#8656;⇐     &#8657;⇑     &#8658;⇒     &#8659;⇓    
&#8660;⇔     &#8962;⌂    
&#9632;■     &#9633;□     &#9642;▪     &#9643;▫    
&#9650;▲     &#9658;►     &#9660;▼     &#9668;◄    
&#9675;○     &#9679;●     &#9686;◖     &#9688;◘    
&#9991;✇     &#9992;✈     &#10003;✓     &#10085;❥    
&#10162;➲     &#10163;➳     &#10168;➸     &#10172;➼    
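
The number in each reference is simply the character's Unicode code point written in decimal, so references can be generated from the hex values found in most character charts.  A small sketch; the function name is my own:

```shell
# Sketch only: turn a hex code point into a decimal character reference.
ncr() {
    printf '&#%d;\n' "0x$1"
}

ncr 2190    # prints &#8592;  (leftwards arrow)
ncr 21D2    # prints &#8658;  (rightwards double arrow)
```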

Wednesday, March 06, 2013

Removing Old Linux Kernels

Today, I had trouble removing an obsolete kernel from my workstation. It should have been simple enough, but I tried to use yum erase rather than rpm -e, and kept running into errors. That is obviously the bad news, so let's make sure to report the good news: YUM is such an improvement over RPM alone, that it is smart enough to know which kernels are obsolete. For instance:
# rpm -qa kernel
kernel-2.6.32-279.el6.x86_64
kernel-2.6.32-279.19.1.el6.x86_64
kernel-2.6.32-279.22.1.el6.x86_64
# uname -r
2.6.32-279.22.1.el6.x86_64
# yum erase kernel
<snip>
Removing:
kernel   x86_64   2.6.32-279.el6
kernel   x86_64   2.6.32-279.19.1.el6
Is this ok [y/N]:
First, we determine the machine has three kernels. Second, we see that it is running the most recent version, dot-22. Finally, YUM demonstrates that it is smart enough to erase the two old kernels, but not the current kernel.
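
The same comparison can be made by hand before letting YUM act.  A sketch that lists every installed kernel package except the running one, assuming package names follow the kernel-<release> pattern shown above; the function name is hypothetical:

```shell
# Sketch only: stdin is `rpm -qa kernel` output, $1 is `uname -r` output.
old_kernels() {
    grep -v -x -F "kernel-$1"
}

# Example: rpm -qa kernel | old_kernels "$(uname -r)"
```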

One small problem: I don't want to remove dot-19 because I have a driver problem with dot-22. I only want to remove dot-null. Here's the trick:
# yum list kernel
Loaded plugins: refresh-packagekit, security
Installed Packages
kernel.x86_64   2.6.32-279.el6
kernel.x86_64   2.6.32-279.19.1.el6
kernel.x86_64   2.6.32-279.22.1.el6
# yum erase kernel-2.6.32-279.el6
The critical success factors are to drop the arch and to add a dash (-) between the package name and the version number.
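
That transformation is mechanical enough to script.  A sketch that turns yum list rows into arguments for yum erase; the function name is my own, and it assumes the layout shown above (package.arch in the first column, version in the second):

```shell
# Sketch only: stdin is `yum list <pkg>` output, $1 is the package name.
erase_spec() {
    awk -v pkg="$1" '
        index($1, pkg ".") == 1 {       # skip headers; keep "kernel.x86_64" rows
            print pkg "-" $2            # name, dash, version; arch dropped
        }
    '
}

# Example: yum list kernel | erase_spec kernel
```

Review the printed names first, then pass only the one you actually want gone to yum erase.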

Sunday, February 10, 2013

RHEL6 Udev Rules

I recently moved my home workstation from Fedora to Scientific Linux 6, on the grounds that Fedora has diverged too far from the current RedHat distribution.  Sure, bleeding edge is cool, but as a self professed Linux mercenary, I need to be in sync with what the real world is doing... not what it might be doing.

After the move, I've found myself annoyed by the way the Gnome desktop handles removable media, in particular media cards such as Flash and Secure Digital (SD).  One trick I learned a while back, was to make sure to assign an e2label to cards formatted with an ext filesystem.  This way, when Gnome automounts the media and places an icon on the desktop, the name is the e2label.  Without an e2label, the icon's text is the device size.  This is also true of FAT devices.

The real problem, however, is the fact that the device is owned by root.  Since the desktop is running as an unprivileged user (because we never log in to the GUI as root... right?) we are faced with an icon for a device that we can't drag-and-drop to.  Doh! Here's how I used Udev to trick the system into allowing my GUI account to use these devices.

First, insert the device, allow it to automount, and appear on the desktop.  (We won't worry about how the kernel, udev, fuse, and the desktop are accomplishing this.)  Assuming an ext device, it was probably mounted to a dynamic mountpoint under /media; in this case, we ended up with:
# mount | grep media
/dev/sdb1 on /media/Flash_1GB type ext3
   (rw,nosuid,nodev,uhelper=udisks)
# ls -ld /media/Flash_1GB
drwxr-xr-x. 4 root root 4096 Feb  9 15:55 /media/Flash_1GB/
The goal is to modify a few mount options and change the ownership of the device.  To accomplish this, we need to tell Udev to watch for a given device and respond in a specific manner.  This requires isolating a unique aspect of the device that can be used as a trigger.  The command to manage Udev changed with RHEL6:
# udevadm info --query=all --attribute-walk --name=/dev/sdb
<snip>
looking at device '/devices/<snip>/6:0:0:0/block/sdb':
    KERNEL=="sdb"
    SUBSYSTEM=="block"
    DRIVER==""
    ATTR{range}=="16"
    ATTR{ext_range}=="256"
    ATTR{removable}=="1"
    ATTR{ro}=="0"
    ATTR{size}=="2001888"
    <snip>
  looking at parent device '/devices/<snip>/6:0:0:0':
    KERNELS=="6:0:0:0"
    SUBSYSTEMS=="scsi"
    DRIVERS=="sd"
    <snip>
    ATTRS{vendor}=="Generic-"
    ATTRS{model}=="Compact Flash   "
    <snip>
There are a few things to notice about this output.  On the command line, the name is the disk, not the mounted partition.  The topmost block is the device; the blocks that follow are upstream devices.  We are most interested in the ATTR fields.  Don't be seduced by the first directive, KERNEL=="sdb"... we all know that Linux is notorious for changing device letters on reboot.
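
To cut the walk down to the attributes worth matching on, the output can be filtered.  A sketch only: picking vendor, model, and size is my own choice from the listing above, and the function name is hypothetical.

```shell
# Sketch only: keep the vendor, model, and size lines from udevadm output.
match_keys() {
    sed -n -E 's/^[[:space:]]*(ATTRS?\{(vendor|model|size)\}==.*)$/\1/p'
}

# Example:
#   udevadm info --query=all --attribute-walk --name=/dev/sdb | match_keys
```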

Second, Udev rules are created as code snippets in the /etc/udev/rules.d dir.  For simplicity's sake, create a file called 99-local.rules and add all machine specific rules to this one file.  Each rule is one line (the example here is wrapped only for readability).  There are many sophisticated and elegant things that can be done by Udev, but my example is a simple sledgehammer:
SUBSYSTEMS=="scsi", 
ATTRS{model}=="Compact Flash   ",
ATTR{size}=="2001888",
RUN+="/bin/sh -c 'mount /media/Flash_1GB
   -o remount,noatime,nodiratime,async;
   chown doug:doug /media/Flash_1GB' "
The first directive tells the machine that we're dealing with a disk (we could have used "block").  The second directive is an attribute that was listed for the device (notice the spaces: it has to exactly match the output from udevadm).  The third attribute is the device size, so this rule applies just to this card, or at least to cards with exactly this number of sectors.  The last part of the rule is the RUN command, which executes a set of shell commands.  In this case, I'm changing the default mount options, then I'm changing the mount point ownership.  Using the RUN feature provides infinite flexibility.
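
Since the rule has to end up on a single line in the rules file, one way to avoid wrapping mistakes is to generate it.  A sketch only: the function name is hypothetical, and the values in the example are the ones from my card.

```shell
# Sketch only: print a one-line rule from a model string, a size, and a command.
make_rule() {
    printf 'SUBSYSTEMS=="scsi", ATTRS{model}=="%s", ATTR{size}=="%s", RUN+="%s"\n' \
        "$1" "$2" "$3"
}

# Example (as root); note the model string keeps its trailing spaces:
#   make_rule "Compact Flash   " 2001888 \
#       "/bin/sh -c 'chown doug:doug /media/Flash_1GB'" \
#       >> /etc/udev/rules.d/99-local.rules
```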

Tuesday, January 15, 2013

Eclipse Plugins For RedHat

You know how they say you shouldn't look at the sun during an eclipse or you'll go blind?  If there were any truth to that, why aren't there villages full of blind people in third world nations?  Why aren't there myths about the time that everyone on Earth went blind?  Think about it...  There had to be a first eclipse.  Who told the first dude not to look at it or he'd go blind?  Read on for the answer.

In the meantime, I've been beating myself up for a few days trying to get the Epic plugin for the Eclipse IDE installed.  I've got it on my Fedora desktop at home, and wanted it for a Linux machine at work, but the eclipse-epic RPM wasn't on Satellite. Simple enough: download it and install it from Epic-IDE.org... and spend the next several days wondering why it doesn't work.

The first issue is that the documentation has not been updated in several revisions, so the instructions for Epic are completely out of line with Eclipse 3.x.  Every time I would try to follow a path through the point-and-click menus that seemed reasonable, I would get a message such as "could not find jar" or "no software site found at jar".  The obvious problem would be permissions, but all files and paths were world readable.

The next obvious choice was to start hacking under the hood.  I looked at the Fedora eclipse-epic RPM and compared it to the Eclipse install tree.  I thought I had a promising option when I found a plugins directory, but I could never get the machine to pick up the files.

Then, I tried something soooo stupid, it had to work.  I entered some arbitrary text into an unrelated field.  Of course it immediately launched exactly as expected.  The trick is to understand which undocumented fields are required, but not checked by the application.  So, on Eclipse 3.x on RedHat Linux, to add a plugin:
Extract (unzip) the plugin
Launch Eclipse
Help / Install New Software...
Click Add... and Local...
Browse to (not into) the extracted directory
Click OK
*** In the Name field, provide some text ***
Click OK
The plugin options should appear in the pop-up window.  From this point, it should just be a case of checking boxes and accepting defaults.  Right?  Wrong!  Now we get dozens of lines of Java-esque errors, which for those of you who have ever worked with Java is line after line of completely useless garbage.  Take for instance the line:
No repository found containing: osgi.bundle,
A reasonable person might think that this and the fifty lines that follow it are telling you that there is a missing dependency.  Obviously, what that means is that the only way to install the plugin is to be root.

Remember those permission problems from earlier?  It's not that we didn't have permission to the files we just installed, it's that we don't have permission to the Eclipse installation tree.  So...
Exit Eclipse
Open a terminal window and su to root
Launch Eclipse from the command line
Help / Install New Software...
Select "Available Software Sites"
Highlight the failed plugin
Click Remove and OK
Continue from "Click Add..." in the step above.

As for why the first human didn't go blind during the first eclipse?  He was too busy trying to figure out why his wheel wouldn't roll, because the instructions didn't mention that it had to be upright.