Saturday, December 29, 2012

No Reserved Words in XML

Or so goes the mantra, but if that's true then why can't I use this syntax:
<?xml version="1.0"?>
    <rule id="1">
      <description>might be time value</description>
It passes all my tests for well formed XML.  Why can't I use it?

Well, obviously, I can... so here's the story.  I've got some Perl code that I've used dozens of times before, but all of a sudden didn't work with this data.  After scoring the code and the the internet for a reason that this data structure wouldn't work, I finally changed the data to read , and everything worked fine.  This would seem to be a happy ending except for on small detail:  Its not my data.

To better understand what's going on, let me explain "doesn't work."  There is a Perl module called XML::Simple.  It combines dozens of steps into three lines:
use XML::Simple;
my $xml = new XML::Simple;
my $data = $xml->XMLin("file.xml");
These lines open the file, read the lines, parse out the tags, and assign the values into a dynamically allocated hash.  By adding a call to a Data::Dumper, we can look at the hash structure of the XML data:
$VAR1 = {
  'rule' => [
      'id' => '1',
      'type' => 'pattern',
      'description' => 'might be time value',
Or that's how it should break out.  Instead, it breaks out like this:
$VAR1 = {
  'rule' => [
       '1' => {
           'type' => 'pattern',
           'description' => 'might be time value',
Which broke my normal subroutines.  Yet, if I change "id" to "item", everything works as expected.  So if its true that there are no reserved words in XML, why doesn't it work?

It turns out, for some bizarre, undocumented reason, the Perl XML::Simple module has decided that there are reserved words in XML.  And those words are name, key, and id.  If those words are found as tags in an XML structure, they are promoted to elements. Though CPAN does not explain why this is the case, they do provide a solution:
my $data = $xml->XMLin("file.xml",KeyAttr=>[]);
By setting the option KeyAttr to "none", the parser behaves as it should.

No comments:

Post a Comment