I set aside three matched systems to act as nodes. A fourth system is configured as the head. The cluster will use my standalone MySQL server and file server. I thought (incorrectly) that the head node would only be used as a control point, so it is also my application server. This machine runs NIS, BIND, Apache, and Squid.
My starting point was the Beowulf HowTo. It's rather old and out of date, but close enough to get things started. The single biggest factor is understanding that there is no beowulf-*.rpm; the package is lam-*.rpm. (Get it: wolf and lamb.) I didn't need to concern myself with NFS or NIS, as I already had those running.
Create a user, then generate and distribute SSH keys. Here's a trick for that one. Since the user is going to be an NIS user, we put their home directory on the NFS server, and configure the nodes to automount the home directory.
# ln -s /net/scully/home/cluso /home/cluso
# chown cluso:cluso /home/cluso
(Yes, I know I spelled Clouseau wrong.)
Now generate the keys, and add the user's public key to the authorized_keys file.
$ ssh-keygen -t rsa
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
Since our own key is trusted, and we get the same home directory no matter where we log in, our keys always let us in.
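Here is the same key setup as a slightly more defensive sketch. It assumes an empty passphrase (the steps above don't say, but unattended logins need either that or an agent), and setup_cluster_keys is a hypothetical helper name. It appends to authorized_keys rather than overwriting it, and tightens permissions, since sshd by default refuses keys when the files are writable by others.

```shell
# setup_cluster_keys [DIR]: make the user's own RSA public key trusted.
# DIR defaults to ~/.ssh; the empty passphrase (-N "") is an assumption.
setup_cluster_keys() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1

    # Generate a key pair only if one does not exist yet.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi

    # Append the public key unless that exact line is already present.
    key=$(cat "$ssh_dir/id_rsa.pub") || return 1
    touch "$ssh_dir/authorized_keys"
    grep -qxF "$key" "$ssh_dir/authorized_keys" ||
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage: run once as the cluster user on any machine that mounts the home directory.
# setup_cluster_keys
```

Because the home directory is shared, running this once should be enough for every node.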
I ignored all the MPI stuff, since I'm not doing C++ parallel computing. My goal is to be able to schedule Perl scripts across several machines. The benefit of using Beowulf for this, rather than a bunch of crontabs, is that Beowulf will load balance the executions. Thus, the fastest node will be asked to execute the greatest number of incarnations of the script.
For the node configuration, two items are important. First, disable iptables! Swendson emphasizes this fact, and yes, you have to do it or waste a lot of time on rules. Besides, we're in a trusted environment. Second, the lamhosts file has moved. It is now /etc/lam/lamhosts.
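As a rough sketch of what goes into the lamhosts file, and of a pre-flight check worth running before booting: the host names are the ones used in this post, with the head node listed first so that it becomes n0. The helper names write_lamhosts and check_hosts, and the CHECK_CMD override, are hypothetical. The check simply tries a non-interactive SSH login to every host in the file.

```shell
# write_lamhosts FILE: write a boot schema with the head node and three nodes.
write_lamhosts() {
    cat > "$1" <<'EOF'
# LAM boot schema: one host per line, head node first.
baltar.terran.lan
c17.terran.lan
c18.terran.lan
c19.terran.lan
EOF
}

# check_hosts FILE: try a passwordless SSH login to each host in FILE.
# CHECK_CMD is a hypothetical override for the login command.
check_hosts() {
    schema=$1
    failed=0
    # Drop comments, then take the first field of every non-empty line.
    for host in $(sed 's/#.*//' "$schema" | awk 'NF { print $1 }'); do
        if ${CHECK_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} \
            "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
# write_lamhosts lamhosts && check_hosts lamhosts
```

Any host reported as FAIL is likely to be a host that will not respond at boot time, so it is worth fixing keys or firewall rules there first.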
Initialize the cluster with:
$ lamboot -v lamhosts
Read the output. It took several tries to get all the nodes to respond. Since we are not doing MPI, I could not use mpirun. Instead, the command is:
$ lamexec hostname
baltar.terran.lan
c17.terran.lan
c18.terran.lan
c19.terran.lan
Yeah! Oh wait... It ran on the head node also. Not what I wanted.
$ lamexec n1-3 hostname
c17.terran.lan
c18.terran.lan
c19.terran.lan
Much better. One other small problem is an annoying message about MPI. Let's get rid of that.
$ lamexec n1-3 hostname 2> /dev/null
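One caveat: sending stderr to /dev/null also hides genuine errors from the command being run. A minimal alternative sketch keeps the noise out of the terminal but saves it to a file; run_logged and the log file name are hypothetical.

```shell
# run_logged LOG CMD...: run CMD, appending its stderr to LOG
# instead of throwing it away. stdout is left untouched.
run_logged() {
    log=$1
    shift
    "$@" 2>> "$log"
}

# Usage:
# run_logged lamexec.err lamexec n1-3 hostname
```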
Now for the real fun. Let's execute 10 runs, without concern for the target nodes.
$ lamexec -np 10 hostname 2> /dev/null
baltar.terran.lan
c17.terran.lan
c18.terran.lan
c19.terran.lan
baltar.terran.lan
c17.terran.lan
c18.terran.lan
baltar.terran.lan
c19.terran.lan
c17.terran.lan
Since we can specify nodes as either a range (n1-3) or a list (n1,3), we have significant control over the processing capacity.
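To see how a batch of runs was spread across the machines, the output can be tallied per host. A minimal sketch using only sort and uniq; tally_hosts is a hypothetical helper name.

```shell
# tally_hosts: read one host name per line on stdin and print
# a count per host, busiest first.
tally_hosts() {
    sort | uniq -c | sort -rn
}

# Usage:
# lamexec -np 10 hostname 2> /dev/null | tally_hosts
```

Fed the ten lines shown above, this reports three runs each for baltar and c17, and two each for c18 and c19.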
Man, that was easy.