Jason Follas.com

Page History: Extract Element Names from XML Data in SQL Server 2005


Compare Page Revisions



« Older Revision - Back to Page History - Newer Revision »


Page Revision: 2007/08/16 11:57


Extract Element Names

One audience member asked me if there was a way to be able to determine the element names that exist within the XML stored in a table. His business problem was that he archives XML in a field of a table, and needed to identify whether one row's XML contained a different set of elements than another row's (i.e., if an untyped XML's "schema" changed between versions).

To demonstrate one possible solution, suppose that I have a table (Sample) with an identity column (ID) and an XML column (Info). Sample XML in this column might resemble:

<employee id="12">
  <name>Milton Waddams</name>
  <title>Accountant</title>
  <notes>
    <note>No record of him being employed here</note>
    <note>We fixed payroll glitch with this employee</note>
  </notes>
  <assignedAssets>
    <asset name="swingarm stapler">
      <note>Bill Lumbergh wants this reassigned</note>
    </asset>
    <asset name="calculator" />
    <asset name="telephone" />
  </assignedAssets>
</employee>

To get a raw listing of namespaces, element names, and parent element names, I might use the following:

SELECT    id,
          info.query('
             for $node in /descendant::node()[local-name() != ""] 
             return <node>
                    <namespace>{ namespace-uri($node) }</namespace>
                    <localname>{ local-name($node) }</localname>
                    <parent>{ local-name($node/..) }</parent>
                    </node>') AS nodes
FROM      sample

Resulting contents:

<node>
  <namespace />
  <localname>employee</localname>
  <parent />
</node>
<node>
  <namespace />
  <localname>name</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>title</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>notes</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>notes</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>notes</parent>
</node>
<node>
  <namespace />
  <localname>assignedAssets</localname>
  <parent>employee</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>
<node>
  <namespace />
  <localname>note</localname>
  <parent>asset</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>
<node>
  <namespace />
  <localname>asset</localname>
  <parent>assignedAssets</parent>
</node>

By wrapping this into another query, I can shred the XML into a table structure and then combine it with the rest of the output to form one common tablespace:

SELECT  DISTINCT
        q1.id,
        T.n.value('namespace[1]', 'varchar(100)') AS Namespace,
        T.n.value('localname[1]', 'varchar(100)') AS Localname,
        T.n.value('parent[1]', 'VARCHAR(100)') AS parent
FROM    ( SELECT    id,
                    info.query('
                       for $node in /descendant::node()[local-name() != ""] 
                       return <node>
                              <namespace>{ namespace-uri($node) }</namespace>
                              <localname>{ local-name($node) }</localname>
                              <parent>{ local-name($node/..) }</parent>
                              </node>') AS nodes
          FROM      sample
        ) q1
        CROSS APPLY q1.nodes.nodes('/node') AS T ( n )

Results:


ID	Namespace	Localname	Parent
1			asset		assignedAssets
1			assignedAssets	employee
1			employee	
1			name		employee
1			note		asset
1			note		notes
1			notes		employee
1			title		employee

ScrewTurn Wiki version 2.0.13. Some of the icons created by FamFamFam.