- No, Beowulf is not "dead technology"
- No, Hadoop is not the perfect tool for every job
To set up the absolute simplest PVM, we need two nodes with an NFS share and a user account. The user needs an SSH key pair distributed to all nodes such that the user can log in to any machine from any machine. Each node's hostname must resolve via DNS or /etc/hosts. Each node's hostname and address must be statically configured and cannot be "localhost".
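None of the later steps depend on exactly how that groundwork is laid, but here is a minimal sketch of it for reference. The host names pvm1 and pvm2 match the rest of this walkthrough; the addresses are invented for the example, so substitute your own:

# as root, on every node: static name resolution (example addresses)
cat >> /etc/hosts <<'EOF'
192.168.1.11   pvm1
192.168.1.12   pvm2
EOF

# as the cluster user, on one node: create one key pair and push it to every node
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id pvm1
ssh-copy-id pvm2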
The first step is to install the base package from an EPEL repo. (I'm using Scientific Linux 6.) The package is delivered as source and must be compiled with a minimal set of options:
yum install -y pvm --nogpgcheck
rpm -ql pvm | grep -m1 pvm3
/usr/share/pvm3

This shows us where the RPM installed the source. The issue with this incarnation is that it is still configured for RSH rather than SSH:
export PVM_ROOT=/usr/share/pvm3/
cd $PVM_ROOT
find . -type f -exec sed -i "s~bin/rsh~bin/ssh~g" {} \;
make; make install

Unfortunately, there are still hard-coded references to RSH in some of the binary libraries, so we spoof the references with a symlink:
ln -s /usr/bin/ssh /usr/bin/rsh

Repeat these steps on all (both) nodes.
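With only two nodes, repeating by hand is no burden, but the same preparation can also be pushed out over SSH. A rough sketch, assuming root can SSH to every node and that the pvm package is reachable from each one:

for node in pvm1 pvm2; do
  ssh root@$node 'export PVM_ROOT=/usr/share/pvm3 &&
    yum install -y pvm --nogpgcheck &&
    cd $PVM_ROOT &&
    find . -type f -exec sed -i "s~bin/rsh~bin/ssh~g" {} \; &&
    make && make install &&
    ln -sf /usr/bin/ssh /usr/bin/rsh'
done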
On only one of the nodes (it doesn't matter which one), validate that PVM is not running, configure the PVM_ROOT variable, and start the first instance as the non-root user:
ps -ef | awk '!/awk/ && /pvm/'
echo "export PVM_ROOT=/usr/share/pvm3" >> ~/.bashrc
echo id | pvm
pvm> id
t40001
Console: exit handler called
pvmd still running.
ps -ef | awk '!/awk/ && /pvm/'
compute <snip> /usr/share/pvm3/lib/LINUXX86_64/pvmd3

Notice that the PVM daemon launched and remained resident. Individual commands can be piped to PVM, or an interactive console can be used.
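The interactive form works just as well; the console attaches to the resident daemon, and quit detaches from it without stopping anything (halt, which we will use at the very end, is what actually shuts the machine down). Console output is omitted in this sketch:

pvm
pvm> conf
pvm> quit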
From the same node, remotely configure the next node:

ssh pvm2 'echo "export PVM_ROOT=/usr/share/pvm3" \
>> ~/.bashrc'
# should not prompt for a password
ssh pvm2 'echo $PVM_ROOT'
/usr/share/pvm3
ssh pvm2 'rm -f /tmp/pvm*'

That last line is very, very important.
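Why so important? While it runs, each pvmd3 keeps state files in /tmp (typically named /tmp/pvmd.<uid> and /tmp/pvml.<uid>), and a stale file left behind by a crash or an unclean shutdown will prevent a fresh daemon from starting on that node, which makes the upcoming add fail. A quick way to check for leftovers:

ssh pvm2 'ls -l /tmp/pvm* 2>/dev/null'
# anything listed without a matching pvmd3 process is stale and should be
# removed, which is exactly what the rm above does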
From the first node, remotely start the second node:

pvm
pvmd already running.
pvm> conf
conf
1 host, 1 data format
HOST     DTID     ARCH          SPEED   DSIG
pvm1     40000    LINUXX86_64   1000    0x00408c41
pvm> add pvm2
add pvm2
1 successful
HOST     DTID
pvm2     80000
pvm> conf
conf
2 hosts, 1 data format
HOST     DTID     ARCH          SPEED   DSIG
pvm1     40000    LINUXX86_64   1000    0x00408c41
pvm2     80000    LINUXX86_64   1000    0x00408c41

In this sequence, we have accessed the console on pvm1 to view the cluster's configuration (conf). Next, we started the second node. It is now displayed in the cluster's conf.
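Adding members one at a time at the console is fine for two nodes, but the console will also accept a hostfile on startup, one hostname per line, and bring up the whole virtual machine in a single pass. A minimal sketch using the two hosts from this walkthrough (the file name is arbitrary):

cat > ~/pvm.hosts <<'EOF'
pvm1
pvm2
EOF

pvm ~/pvm.hosts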
Just for fun, let's throw it the simplest of compute jobs:
pvm> spawn -4 -> /bin/hostname
4 successful
t8000b
t8000c
t4000c
t4000d
pvm>
[3:t4000d] pvm1
[3:t4000c] pvm1
[3:t4000c] EOF
[3:t4000d] EOF
[3:t8000b] pvm2
[3:t8000b] EOF
[3:t8000c] pvm2
[3:t8000c] EOF
[3] finished

There are a few things to notice about the output:
- The command asked the cluster to spawn the command "/bin/hostname" four times.
- The "->" option indicates we wanted the output returned to the console, which is completely abnormal... we only do this for testing.
- The prompt returned before the output. The assumption is that our compute jobs will take an extended period of time.
- The responses were not displayed in any particular order. They were displayed as they returned, because all this magic is happening asynchronously.
- Each job's responses, from each node, could be grep'ed from the output using a unique serial number, automatically assigned to the job (see the sketch below).
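For example, if the console session above had been captured to a file (call it spawn.log, a name invented for this sketch), the responses can be pulled back out by that serial number, or narrowed to a single task:

grep '\[3:' spawn.log           # every response from spawn job 3
grep '\[3:t8000b\]' spawn.log   # only the responses from task t8000b on pvm2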
When you are finished, shut the entire virtual machine down:

echo halt | pvm

Finally, remember this one last thing: the cluster is a peer-to-peer grid. Any node can manage any other, any node can schedule jobs, and any node can issue a halt.
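To make that last point concrete, the same shutdown could have been driven from the second node; nothing about pvm1 is special. A sketch, assuming the PVM_ROOT export landed in pvm2's ~/.bashrc as configured earlier:

ssh pvm2 'echo halt | pvm'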