Thursday, March 13, 2014

KVM Network Bridge

Finally, a simple way to configure a bridged network for a KVM server:
virsh iface-bridge eth2 br0
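To double-check the result, something like the following should work (eth2 and br0 come from the example above; adjust to your hardware):
virsh iface-list --all
brctl show br0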

Monday, May 27, 2013

PVM, aka Beowulf Cluster

I stumbled upon a little project this weekend and found myself without a Beowulf cluster to help out.  It had been several years since I'd built a computational cluster, so I noticed a few "new" gotchas.  But... before we get to the fun stuff, let's review:
  • No, Beowulf is not "dead technology"
  • No, Hadoop is not the perfect tool for every job
That which was once called a Beowulf cluster is actually the use of a Message Passing Interface (MPI) that, when deployed across nodes, creates a Parallel Virtual Machine (PVM).  Ready for this-- the current generation of Hadoop (et al) clusters are actually MPIs, and thus PVMs.  Given network attached storage, a map-reduce cluster is, in theory, Beowulf compliant.

To set up the absolute simplest PVM, we need two nodes, with an NFS share, and a user account.  The user needs an SSH key pair distributed to all nodes such that the user can login to any machine, from any machine.  Each node's hostname must be able to resolve via DNS or /etc/hosts.  Each node's hostname and address must be statically configured, and cannot be "localhost".
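A minimal sketch of that prerequisite setup, assuming two nodes named pvm1 and pvm2 (as used below), a user named compute, and example addresses:
# /etc/hosts on both nodes (addresses are examples)
192.168.1.11   pvm1
192.168.1.12   pvm2
# as the compute user on pvm1, then repeat from pvm2
ssh-keygen -t rsa
ssh-copy-id compute@pvm1
ssh-copy-id compute@pvm2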

The first step is to install the base package from an EPEL repo.  (I'm using Scientific Linux 6.)  The package is delivered as source and must be compiled with a minimal set of options:
yum install -y pvm --nogpgcheck
rpm -ql pvm | grep -m1 pvm3
/usr/share/pvm3
This shows us where the RPM installed the source.  The issue with this incarnation is that it is still configured for RSH rather than SSH:
export PVM_ROOT=/usr/share/pvm3/
cd $PVM_ROOT
find . -type f -exec sed -i "s~bin/rsh~bin/ssh~g" {} \;
make; make install
Unfortunately, there are still hard-coded references to RSH in some of the binary libraries, so we spoof the references with a symlink:
ln -s /usr/bin/ssh /usr/bin/rsh
Repeat these steps on all (both) nodes.

On only one of the nodes (it doesn't matter which one) validate that PVM is not running, configure the PVM_ROOT variable, and start the first instance as the non-root user:
ps -ef | awk '!/awk/ && /pvm/'
echo "export PVM_ROOT=/usr/share/pvm3" >> ~/.bashrc
echo id | pvm
pvm> id
t40001
Console: exit handler called
pvmd still running.
ps -ef | awk '!/awk/ && /pvm/'
compute <snip> /usr/share/pvm3/lib/LINUXX86_64/pvmd3
Notice that the PVM daemon launched and remained resident.  Individual commands can be piped to PVM, or an interactive console can be used.  From the same node, remotely configure the next node:
ssh pvm2 'echo "export PVM_ROOT=/usr/share/pvm3" \
                >> ~/.bashrc'
# should not prompt for a password
ssh pvm2 'echo $PVM_ROOT'
/usr/share/pvm3
ssh pvm2 'rm -f /tmp/pvm*'
The last line is very, very important.  From the first node, remotely start the second node:
pvm
pvmd already running.
pvm> conf
conf
1 host, 1 data format
   HOST     DTID     ARCH   SPEED       DSIG
   pvm1    40000 LINUXX86_64    1000 0x00408c41
pvm> add pvm2
add pvm2
1 successful
   HOST     DTID
   pvm2    80000
pvm> conf
conf
2 hosts, 1 data format
   HOST     DTID     ARCH   SPEED       DSIG
   pvm1    40000 LINUXX86_64    1000 0x00408c41
   pvm2    80000 LINUXX86_64    1000 0x00408c41
In this sequence, we have accessed the console on pvm1 to view the cluster's configuration (conf).  Next, we started the second node.  It is now displayed in the cluster's conf.

Just for fun, let's throw it the simplest of compute jobs:
pvm> spawn -4 -> /bin/hostname
4 successful
t8000b
t8000c
t4000c
t4000d
pvm>
[3:t4000d] pvm1
[3:t4000c] pvm1
[3:t4000c] EOF
[3:t4000d] EOF
[3:t8000b] pvm2
[3:t8000b] EOF
[3:t8000c] pvm2
[3:t8000c] EOF
[3] finished
There are a few things to notice about the output:
  1. The command asked the cluster to spawn the command "/bin/hostname" four times.
  2. The "->" option indicates we wanted the output returned to the console, which is completely abnormal... we only do this for testing.
  3. The prompt returned before the output.  The assumption is that our compute jobs will take an extended period of time.
  4. The responses were not displayed in order.  They were displayed as they returned, because all this magic is happening asynchronously.
  5. Each job's responses, from each node, could be grep'ed from the output using a unique serial number, automatically assigned to the job.
To leave the console and return to the command prompt issue the quit command.  All started nodes will continue to run.  To shutdown the compute cluster, execute:
echo halt | pvm
Finally, remember this one last thing: The cluster is a peer-to-peer grid.  Any node can manage any other, any node can schedule jobs, and any node can issue a halt.

Monday, March 11, 2013

Fun with Unicode Characters

Whenever I am tasked with creating a web page, it ends up being the absolute bare minimum.  (If you don't believe me, just visit dougbunger.com!)  Of course I do it in the interest of fast rendering and bandwidth conservation... because I am a good Internet citizen.  So here are some fun Unicode graphics that can be used as web page icons.  There are thousands of characters, but these seem to be a good cross-platform subset.

And by the way: Excuse the font.
This -->  &
...is an ampersand.
&#8592;←     &#8593;↑     &#8594;→     &#8595;↓    
&#8596;↔     &#8597;↕    
&#8656;⇐     &#8657;⇑     &#8658;⇒     &#8659;⇓    
&#8660;⇔     &#8962;⌂    
&#9632;■     &#9633;□     &#9642;▪     &#9643;▫    
&#9650;▲     &#9658;►     &#9660;▼     &#9668;◄    
&#9675;○     &#9679;●     &#9686;◖     &#9688;◘    
&#9991;✇     &#9992;✈     &#10003;✓     &#10085;❥    
&#10162;➲     &#10163;➳     &#10168;➸     &#10172;➼    
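As a quick usage example, any of these entities can be dropped straight into markup as a lightweight icon; a minimal sketch (the href is just a placeholder):
<a href="#top" title="back to top">&#8593;</a>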

Wednesday, March 06, 2013

Removing Old Linux Kernels

Today, I had trouble removing an obsolete kernel from my workstation. It should have been simple enough, but I tried to use yum erase rather than rpm -e, and kept running into errors. That is obviously the bad news, so let's make sure to report the good news: YUM is such an improvement over RPM alone, that it is smart enough to know which kernels are obsolete. For instance:
# rpm -qa kernel
kernel-2.6.32-279.el6.x86_64
kernel-2.6.32-279.19.1.el6.x86_64
kernel-2.6.32-279.22.1.el6.x86_64
# uname -r
2.6.32-279.22.1.el6.x86_64
# yum erase kernel
<snip>
Removing:
kernel   x86_64   2.6.32-279.el6
kernel   x86_64   2.6.32-279.19.1.el6
Is this ok [y/N]:
First, we determine the machine has three kernels. Second, we see that it is running the most recent version, dot-22. Finally, YUM demonstrates that it is smart enough to erase the two old kernels, but not the current kernel.

One small problem: I don't want to remove dot-19 because I have a driver problem with dot-22. I only want to remove dot-null. Here's the trick:
# yum list kernel
Loaded plugins: refresh-packagekit, security
Installed Packages
kernel.x86_64   2.6.32-279.el6
kernel.x86_64   2.6.32-279.19.1.el6
kernel.x86_64   2.6.32-279.22.1.el6
# yum erase kernel-2.6.32-279.el6
The critical success factors are to drop the arch and to add a dash (-) between the package name and the version number.
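Assuming the erase completes cleanly, a quick sanity check should now show only the two remaining kernels:
# rpm -qa kernel
kernel-2.6.32-279.19.1.el6.x86_64
kernel-2.6.32-279.22.1.el6.x86_64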

Sunday, February 10, 2013

RHEL6 Udev Rules

I recently moved my home workstation from Fedora to Scientific Linux 6, on the grounds that Fedora has diverged too far from the current RedHat distribution.  Sure, bleeding edge is cool, but as a self-professed Linux mercenary, I need to be in sync with what the real world is doing... not what it might be doing.

After the move, I've found myself annoyed by the way the Gnome desktop handles removable media, in particular media cards such as Flash and Secure Digital (SD).  One trick I learned a while back, was to make sure to assign an e2label to cards formatted with an ext filesystem.  This way, when Gnome automounts the media and places an icon on the desktop, the name is the e2label.  Without an e2label, the icon's text is the device size.  This is also true of FAT devices.

The real problem, however, is the fact that the device is owned by root.  Since the desktop is running as an unprivileged user (because we never log in to the GUI as root... right?) we are faced with an icon for a device that we can't drag-and-drop to.  Doh! Here's how I used Udev to trick the system into allowing my GUI account to use these devices.

First, insert the device, allow it to automount, and appear on the desktop.  (We won't worry about how the kernel, udev, fuse, and the desktop are accomplishing this.)  Assuming an ext device, it was probably mounted to a dynamic mountpoint under /media; in this case, we ended up with:
# mount | grep media
/dev/sdb1 on /media/Flash_1GB type ext3
   (rw,nosuid,nodev,uhelper=udisks)
# ls -ld /media/Flash_1GB
drwxr-xr-x. 4 root root 4096 Feb  9 15:55 /media/Flash_1GB/
The goal is to modify a few mount options and change the ownership of the device.  To accomplish this, we need to tell Udev to watch for a given device and respond in a specific manner.  This requires isolating a unique aspect of the device that can be used as a trigger.  The command to manage Udev changed with RHEL6:
# udevadm info --query=all --attribute-walk --name=/dev/sdb
<snip>
looking at device '/devices/<snip>/6:0:0:0/block/sdb':
    KERNEL=="sdb"
    SUBSYSTEM=="block"
    DRIVER==""
    ATTR{range}=="16"
    ATTR{ext_range}=="256"
    ATTR{removable}=="1"
    ATTR{ro}=="0"
    ATTR{size}=="2001888"
    <snip>
  looking at parent device '/devices/<snip>/6:0:0:0':
    KERNELS=="6:0:0:0"
    SUBSYSTEMS=="scsi"
    DRIVERS=="sd"
    <snip>
    ATTRS{vendor}=="Generic-"
    ATTRS{model}=="Compact Flash   "
    <snip>
There are a few things to notice about this output.  On the command line, the name is the disk, not the mounted partition.  The topmost block is the device; blocks that follow are upstream devices.  We are most interested in the ATTR fields.  Don't be seduced by the first directive, KERNEL=="sdb"... we all know that Linux is notorious for changing device letters on reboot.

Second, Udev rules are created as code snippets in the /etc/udev/rules.d dir.  For simplicity's sake, create a file called 99-local.rules and add all machine specific rules to this one file.  Each rule is one line.  There are many sophisticated and elegant things that can be done by Udev, but my example is a simple sledgehammer:
SUBSYSTEMS=="scsi", 
ATTRS{model}=="Compact Flash   ",
ATTR{size}=="2001888",
RUN+="/bin/sh -c 'mount /media/Flash_1GB
   -o remount,noatime,nodiratime,async;
   chown doug:doug /media/Flash_1GB' "
The first directive tells the machine that we're dealing with a disk (we could have used "block".)  The second directive is an attribute that was listed for the device (notice the spaces: it has to exactly match the output from udevadm.)  The third attribute is the device size, so this rule applies just to this card, or at least to cards with exactly this number of sectors.  The last part of the rule is the RUN command, which executes a set of bash commands.  In this case, I'm changing the default mount options, then I'm changing the mount point ownership.  Using the RUN feature provides infinite flexibility.
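To load the new rule and confirm it matches, something along these lines should work (the device name is from the earlier example); re-inserting the card then fires the rule:
# udevadm control --reload-rules
# udevadm test $(udevadm info --query=path --name=/dev/sdb)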

Tuesday, January 15, 2013

Eclipse Plugins For RedHat

You know how they say you shouldn't look at the sun during an eclipse or you'll go blind?  If there was any truth to that, why aren't there villages full of blind people in third world nations?  Why aren't there myths about the time that everyone on Earth went blind?  Think about it...  There had to be a first eclipse.  Who told the first dude not to look at it or he'd go blind?  Read on for the answer.

In the meantime, I've been beating myself up for a few days trying to get the Epic plugin for the Eclipse IDE installed.  I've got it on my Fedora desktop at home, and wanted it for a Linux machine at work, but the eclipse-epic RPM wasn't on Satellite. Simple enough: download it and install it from Epic-IDE.org... and spend the next several days wondering why it doesn't work.

The first issue is that the documentation has not been updated in several revisions, so the instructions for Epic are completely out of line with Eclipse 3.x.  Every time I tried to follow a path through the point-and-click menus that seemed reasonable, I would get a message such as "could not find jar" or "no software site found at jar".  The obvious problem would be permissions, but all files and paths were world-readable.

The next obvious choice was to start hacking under the hood.  I looked at the Fedora eclipse-epic RPM and compared it to the Eclipse install tree.  I found what seemed like a promising option when I discovered a plugins directory, but I could never get the machine to pick up the files.

Then, I tried something soooo stupid, it had to work.  I entered some arbitrary text into an unrelated field.  Of course it immediately launched exactly as expected.  The trick is to understand which undocumented fields are required, but not checked by the application.  So, on Eclipse 3.x on RedHat Linux, to add a plugin:
Extract (unzip) the plugin
Launch Eclipse
Help / Install New Software...
Click Add... and Local...
Browse to (not into) the extracted directory
Click OK
*** In the Name field, provide some text ***
Click OK
The plugin options should appear in the pop-up window.  From this point, it should just be a case of checking boxes and accepting defaults.  Right?  Wrong!  Now we get dozens of lines of Java-esque errors, which, for those of you who have ever worked with Java, means line after line of completely useless garbage.  Take for instance the line:
No repository found containing: osgi.bundle,
A reasonable person might think that this and the fifty lines that follow it are telling you that there is a missing dependency.  Obviously, what that means is that the only way to install the plugin is to be root.

Remember those permission problems from earlier?  It's not that we didn't have permission to the files we just installed, it's that we don't have permission to the Eclipse installation tree.  So...
Exit Eclipse
Open a terminal window and su to root
Launch Eclipse from the command line
Help / Install New Software...
Select "Available Software Sites"
Highlight the failed plugin
Click Remove and OK
Continue from "Click Add..." in the step above.

As for why the first human didn't go blind during the first eclipse?  He was too busy trying to figure out why his wheel wouldn't roll, because the instructions didn't mention that it had to be upright.

Sunday, December 30, 2012

Defeating Facial Recognition

I saw a web advertisement from Merrill Edge Investments (not to be confused with Merrill Lynch... a risk-laden, greedy, delusional Wall Street investment firm that helped cheat millions of people out of their retirement earnings) for a new investment tool called Face Retirement.  The concept of the ad campaign is based on the work of Daniel Goldstein, PhD, who says people fail to invest for retirement because they "can't see themselves as old."  His research indicated that if you showed someone a picture of what they would look like at age 65, it would motivate them to spend the next 15, 25, 35 years preparing for reaching that age.

I didn't believe it when I heard him say it the first time, but I salute him for managing to bilk the Wall Street suits out of the money to implement this as a web app.  I'll tell you why it's a stupid idea in a few minutes, but first the fun stuff.

Merrill Edge implemented a facial recognition system with an aging feature.  You are supposed to use your web cam to center your face in an oval and snap a picture of yourself.  I decided to use this to demonstrate why facial recognition is not implemented in the wild.  You see, the basic concept that most people fail to understand about facial recognition is the question: What is a face?  Consider our first example.
As is obvious, this is a face-- it just happens to be the face of a dog (and not even a real dog, at that.)  The software successfully denied access, on the grounds that the face was not properly formatted.  In the second example, the application was presented with a "more" human face.

Again the software successfully denied access.  It did specifically recommend I remove my hat. 

Option number three was also a failure, though this is a properly formatted, obviously human face.  Unfortunately, it is not a "real" face, but a picture from an AllState Insurance brochure.  If I owned stock in a facial recognition company, I might stop here, trumpeting how well the product had discarded this obvious attempt at trickery.  But I don't own any stock.  I can't remember why not... Oh, yes, thieving Wall Street scum bags.

But I digress.  Attempt number four:

Success!  I was granted access, using a photo of a face.  Why did face three fail, but face four succeed?  Notice that face number three is not looking into the camera, but face four is a full frontal view.  This allowed the software to properly align the eyes, nose, and mouth.  The senorita from the cover of the "Instant Immersion Spanish" box does not have ears, but the male model did not have eyes (they are closed.)

The fun thing about facial recognition is that most can be tricked by a photograph, and the few that cannot are usually tricked by a mask.  In all production quality systems, additional safeguards (heat sensors, echo location) have to be implemented to override these simple hacks.

As for why this idea of showing someone an aged image of themselves is stupid...  It's a short run fix.  Goldstein researches a favorite subject of mine, decision theory.  He indicates that we postpone long range decisions because we do not see them as relevant.  By demonstrating aging, he hopes to bring a sense of reality to the abstract concept of time.  This works only until the car needs new tires or the muffler replaced.  No one is going to sit on the side of the road, replacing a bald, dry-rotted, flat tire, and say to themselves:
I've got one spare tire.  If this tire blew, the others are bound to go at any minute.  I can spend $500 on four new tires, or I can put the $500 toward retirement.  Would I rather have a safe car that I can drive to work, or make life better for "future me"?
No.  The decision is simple: Replace the tires, screw "future me".  Short run always wins over long run, because "In the long run, we are all dead."  Which raises the question as to why Merrill Edge would spend the money on such a tool?  Are they altruistically trying to change human nature?

No.  Like I said... It's a short run fix.  The purpose of the tool is to "help you make a long run decision", knowing full well that short run realities will overcome the tool's effectiveness.  But by then, they've got what long run money you had available at that short run moment.

And isn't that what really matters?  That... and using stuffed dog puppets to validate new technology.

Saturday, December 29, 2012

No Reserved Words in XML

Or so goes the mantra, but if that's true then why can't I use this syntax:
<?xml version="1.0"?>
<parse>
    <rule id="1">
      <type>pattern</type>
      <description>might be time value</description>
    </rule>
</parse>
It passes all my tests for well formed XML.  Why can't I use it?

Well, obviously, I can... so here's the story.  I've got some Perl code that I've used dozens of times before, but all of a sudden it didn't work with this data.  After scouring the code and the internet for a reason that this data structure wouldn't work, I finally changed the "id" tag to read "item", and everything worked fine.  This would seem to be a happy ending except for one small detail:  It's not my data.

To better understand what's going on, let me explain "doesn't work."  There is a Perl module called XML::Simple.  It combines dozens of steps into three lines:
use XML::Simple;
my $xml = new XML::Simple;
my $data = $xml->XMLin("file.xml");
These lines open the file, read the lines, parse out the tags, and assign the values into a dynamically allocated hash.  By adding a call to Data::Dumper, we can look at the hash structure of the XML data:
$VAR1 = {
  'rule' => [
    {
      'id' => '1',
      'type' => 'pattern',
      'description' => 'might be time value',
    },
Or that's how it should break out.  Instead, it breaks out like this:
$VAR1 = {
  'rule' => {
       '1' => {
           'type' => 'pattern',
           'description' => 'might be time value',
        },
Which broke my normal subroutines.  Yet, if I change "id" to "item", everything works as expected.  So if it's true that there are no reserved words in XML, why doesn't it work?

It turns out, for some bizarre, undocumented reason, the Perl XML::Simple module has decided that there are reserved words in XML.  And those words are name, key, and id.  If those words are found in an XML structure, they are promoted to hash keys. Though CPAN does not explain why this is the case, they do provide a solution:
my $data = $xml->XMLin("file.xml",KeyAttr=>[]);
By setting the KeyAttr option to an empty list, the parser behaves as it should.
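Putting it all together, a minimal test script looks something like this (the file name matches the example above):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# an empty KeyAttr list disables the name/key/id folding
my $xml  = XML::Simple->new();
my $data = $xml->XMLin("file.xml", KeyAttr => []);
print Dumper($data);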

Monday, December 17, 2012

dracut: FATAL: initial SELinux policy load failed

Here's an obnoxious install failure: Using the RHEL/Scientific Linux 6.3 DVD, it is possible for an install to crash on first boot with an SELinux error.  The problem is a bug in the %post script of the targeted policy RPM.  The bug is immediately fixed by running yum update... assuming you can figure out how to get the machine booted.  Luckily, the error is nice enough to tell you how to move forward.


Reboot, and at the GRUB menu, append "selinux=0" to the kernel line to boot with SELinux disabled.  From the root prompt:
ls /etc/selinux/targeted/policy/policy.24
ls: cannot access /etc/selinux/targeted/policy/policy.24:
No such file or directory
If possible, issue: yum update selinux-policy-*

If the machine is not network connected, the problem can be resolved by restoring the file policy.24 from install media.  And if it were that simple, you wouldn't need me.  You will have to force install two RPMs:
rpm -ivh --force selinux-policy-3.7.19-*
rpm -ivh --force selinux-policy-targeted-3.7.19-*
The second force install will take several minutes to complete.
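If the RPMs have to come off the install DVD, a rough sketch (device node and path are assumptions) would be to mount the disc and run the force installs from its package directory:
mkdir -p /mnt/dvd
mount /dev/cdrom /mnt/dvd
cd /mnt/dvd/Packages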

Regardless of how the issue is resolved, it is best to relabel the filesystem:
touch /.autorelabel


Thursday, November 22, 2012

RHEL Cluster Anti-Affinity Configuration

I'm often amused by how vendors define "High Availability", aka HA.  Customers always talk about Five Nines, but "off the shelf" HA solutions seldom achieve 99% availability.  In the case of RedHat's HA cluster service, the default configuration might provide unattended failover within 60 seconds.  Given that Five Nines only allows 25.9 seconds per month, a single failure can blow away a service level agreement.

To counter the real-world lag of application failover, a system must be load balanced across a cluster.  A real HA environment would be at least three nodes, running two instances of the service.  If one node fails, the load balancer will redirect everything to the second instance, while the service is recovered on the third node.

There is an HA problem that VMware has addressed in their HA solution and RedHat has not: the anti-affinity rule.  Affinity is when two "processes" favor the same resource.  An example would be running a web and database instance on the same machine to improve performance.  In the case of redundant services, running them on the same machine is pointless-- if the machine fails, both instances are lost.  To prevent this, we need an anti-affinity rule that requires the two processes to never be on the same machine.

RedHat cluster suite provides affinity in the form of child services.  If the cluster moves the web service to another node, the database has to follow.  What they don't provide is an anti-affinity rule to prevent the load balanced services from trying to run on a single node.  As a matter of fact, by default, all services will start on the same cluster node.  (It will be the node with the lowest number.)

I found I could implement anti-affinity from within the service's init.d script.  First, we add an /etc/sysconfig/ file for the process, with the following variables:
CLUST_ENABLED="true"
CLUST_MYSERVICE="service:bark"
CLUST_COLLISION="service:meow service:moo"
A collision is when the presence of a service prevents this service from starting on this node.  The names should be listed exactly as they appear in clustat.  Make sure the script sources the config file:
# source sysconfig file
[ -f /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog
Next, add a new subroutine to the existing init.d script:
cluster(){
  # look for other services on this host
  K=$(for J in $CLUST_COLLISION; do \
          clustat | grep "$J.*$HOSTNAME.*started" \
          >/dev/null; \
          [ $? == 0 ] && echo "$J "; \
          done)
  if [ -n "$K" ]; then
    # show service names
    echo -n "Cluster, collision $prog: $K"
    # fail, but with a success return code
    failure; echo; exit 0
  fi
  # look for this service running on other nodes
  K=$(clustat | grep "$CLUST_MYSERVICE.*started" | \
          awk '{print $2}')
  if [ -n "$K" ]; then
    # show hostname of other instance
    echo -n "Cluster, $prog exists: `echo $K | cut -d. -f1`"
    # fail but with a success return code
    failure; echo; exit 0
  fi
}
Finally, add a reference to the cluster sub in the start sub:
start(){
  if [ $(ps -C cluster-$prog.sh | grep -c $prog) == 0 ]; then
    # only check cluster status if enabled
    [ "$CLUST_ENABLED" == "true" ] && cluster
    echo -n "Starting $prog"
Here's what happens in the case of a collision:
  • rgmanager issues a start
  • the cluster sub recognizes the collision, but tells rgmanager that it started successfully (exit 0)
  • rgmanager shows the service as running
  • 30 seconds pass
  • rgmanager issues a status against the service, which fails, since the init.d script lied about the service running
  • the cluster orders a relocation of the service
  • rgmanager issues a start... on a different node
  • there is no collision this time, so the init.d runs as expected
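Once the relocation settles, clustat on any node should show the redundant services spread across the cluster; an illustrative example (node names are placeholders, service names from the sysconfig example above):
# clustat | grep service:
 service:bark                  node1          started
 service:meow                  node2          started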