Extract Element Names from XML Data in SQL Server 2005

Edit

Extract Element Names

One audience member asked me if there was a way to be able to determine the element names that exist within the XML stored in a table. His business problem was that he archives XML in a field of a table, and needed to identify whether one row's XML contained a different set of elements than another row's (i.e., if an untyped XML's "schema" changed between versions).

To demonstrate one possible solution, suppose that I have a table (Sample) with an identity column (ID) and an XML column (Info). Sample XML in this column might resemble:

<employee id="12">
  <name>Milton Waddams</name>
  <title>Accountant</title>
  <notes>
    <note>No record of him being employed here</note>
    <note>We fixed payroll glitch with this employee</note>
  </notes>
  <assignedAssets>
    <asset name="swingarm stapler">
      <note>Bill Lumbergh wants this reassigned</note>
    </asset>
    <asset name="calculator" />
    <asset name="telephone" />
  </assignedAssets>
</employee>

To get a raw listing of namespaces, element names, and parent element names, I might use the following:

SELECT    id,
          info.query('
             for $node in /descendant::node()[local-name() != ""] 
             return <node>
                    <namespace>{ namespace-uri($node) }</namespace>
                    <localname>{ local-name($node) }</localname>
                    <parent>{ local-name($node/..) }</parent>
                    </node>') AS nodes
FROM      sample

Note that the iterator uses an Axis to walk all nodes in the tree and a predicate to filter out Text Nodes (which will not have a local-name value).

Resulting contents:

<node>
  <namespace />
  <localname>employee</localname>
  <parent />
</node>
<node>
  <namespace />
  <localname>name</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>title</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>notes</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>notes</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>notes</parent>
</node>
<node>
  <namespace />
  <localname>assignedAssets</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>asset</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>

By wrapping this into another query, I can shred the XML into a table structure and then combine it with the rest of the output to form one common tablespace:

SELECT  DISTINCT
        q1.id,
        T.n.value('namespace[1]', 'varchar(100)') AS Namespace,
        T.n.value('localname[1]', 'varchar(100)') AS Localname,
        T.n.value('parent[1]', 'VARCHAR(100)') AS parent
FROM    ( SELECT    id,
                    info.query('
                       for $node in /descendant::node()[local-name() != ""] 
                       return <node>
                              <namespace>{ namespace-uri($node) }</namespace>
                              <localname>{ local-name($node) }</localname>
                              <parent>{ local-name($node/..) }</parent>
                              </node>') AS nodes
          FROM      sample
        ) q1
        CROSS APPLY q1.nodes.nodes('/node') AS T ( n )

Results:


ID  Namespace  Localname       Parent
1              asset           assignedAssets
1              assignedAssets  employee
1              employee	
1              name            employee
1              note            asset
1              note            notes
1              notes           employee
1              title           employee

If there were multiple rows, each containing XML, then there would be more values in the ID column. Using traditional SQL querying, it would then be possible to compare different sets of rows to identify where elements exist in one set that do not exist in another.