A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

SQL Server 2008: Spatial Data, Part 8

 

In this, the eighth part in a series on the new Spatial Data types in SQL Server 2008, I'll step away from the database and do a little spatial coding using .NET.

Redistributable .NET Library

Up to this point in the series, I have demonstrated a lot of interesting (?) things that you can do with the new Spatial data types (Geometry and Geography) in SQL Server 2008.  You might be thinking, "That's swell, and all, but I wish I could do some of that stuff without needing to be tethered to a database."  Well, you know what?  You can!

I mentioned in a previous post that the Spatial data types were implemented as SQLCLR User-Defined Types.  I've since been corrected by Isaac Kunen, who stated that they are more accurately described as System-Defined Types, with the difference being that these are automatically installed and available for use as part of SQL Server 2008, regardless of whether the ENABLE CLR bit has been activated.  Semantics aside, these types are merely classes within a .NET assembly, and Microsoft is making this freely available as part of a Feature Pack for SQL Server (which will be redistributable as part of your stand-alone application, according to Isaac):

SQL Server 2008 RC0 Feature Pack Download Page

(Look for "Microsoft SQL Server System CLR Types," which includes the two Spatial types plus the HierarchyID type.  This link is for RC0, and may not be applicable to future versions as the product is finalized and released.)

Builder API

A new feature that was included with the first Release Candidate (RC0) is the Builder API.  This is a collection of interfaces and classes that helps you to construct spatial types by specifying one point at a time until all points have been added.

The Builder API is not only useful for creating new instances of spatial data, but also for consuming existing instances one point at a time (maybe to convert an instance into another format).  Documentation is light at the moment, so I'm still trying to grok exactly how to best utilize it.

For my first experiment with the API, I obtained some Zip Code Boundary data in ASCII format from the U.S. Census Bureau:

http://www.census.gov/geo/www/cob/z52000.html#ascii

My goal was to parse the data, and then create a new SqlGeography instance for each zip code.  (Note: SqlGeography is the .NET class name that T-SQL refers to simply as Geography).  The SqlGeographyBuilder class proved to be perfect for accomplishing this task.

At its core, the SqlGeographyBuilder implements the IGeographySink interface.  If you wanted to consume an existing SqlGeography instance, you could implement IGeographySink in your own class, and then invoke the SqlGeography's Populate() instance method, passing in your object as the parameter.  The Populate() method takes care of calling the appropriate IGeographySink methods within your class.

In this case, I'm not starting with an existing SqlGeography instance, so my code will need to call the methods of the SqlGeographyBuilder in the correct order:

IGeographySink

After EndGeography() has been invoked, the new instance is available via the ConstructedGeography property of the SqlGeographyBuilder class. 

Simple enough, right?  Yeah, I'm still a little lost myself...  But, here's some code to help demonstrate what's going on!

First, let's look at the ASCII data.  A single zip code's boundary might be defined as:

      1469      -0.824662148292608E+02       0.413848583827499E+02
      -0.824602851767940E+02       0.413864290595145E+02
      -0.824610630000000E+02       0.413860590000000E+02
      -0.824685900000000E+02       0.413841470000000E+02
      -0.824686034536111E+02       0.413843846804627E+02
      -0.824605990000000E+02       0.413863160000000E+02
      -0.824602851767940E+02       0.413864290595145E+02
END

 

The very first line happens to contain an identifier (maps to a second file that lists the actual USPS zip code).  The coordinate listed in the first line is not actually part of the boundary, but rather appears to be the population center of that area.  The actual boundary begins with the second line, and continues until you encounter the "END".  Also, in case you couldn't tell, coordinates in this data are in Longitude-Latitude order.

Since a Zip Code is a polygon, and since we are working with SqlGeography, we must be aware of ring ordering.  That is, the exterior ring of a polygon must be defined in a counter-clockwise order so that as you "walk the ring", the interior is always to your left.  If you reverse the order, then SqlGeography assumes that you're trying to define a polygon containing the entire world except for the small area inside of the polygon.

Well, in this case, the order of the points of the Zip Code boundary is defined in clockwise order... so, we must be aware of this and call into the SqlGeographyBuilder in the opposite order (so the last point defined in the ASCII data is the first point used while building our new instance). 

To accomplish this, I simply parse the Lat/Long coordinates as "double" types, and then push them onto a stack.  Then, I pop the stack and call into the Builder API with each point.  At the end, I obtain the new SqlGeography instance from the ConstructedGeography property. 

(Note: This is demonstrative code - some things should probably be cleaned up/refactored/error handled... You have been warned)

  public SqlGeography ParseAsGeography(string zipcode_points)
{
    StringReader sr = new StringReader(zipcode_points);
    string line = sr.ReadLine();

    Stack<double[]> Points = new Stack<double[]>();

    while (line != null  && line != "END")
    {
        if (line != String.Empty)
        {
            Points.Push(ParseLatLngValues(line));
        }

        line = sr.ReadLine();
    }

    return CreateGeography(Points);
}

  private
  double[] ParseLatLngValues(string line)
{
    //      -0.838170700000000E+02       0.409367390000000E+02double[] ret = newdouble[2];

    string lng = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[0].Value; string lat = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[1].Value; double.TryParse(lat, out ret[0]); double.TryParse(lng, out ret[1]); return ret; }

  private SqlGeography CreateGeography(Stack<double[]> points)
{
    SqlGeographyBuilder builder = new SqlGeographyBuilder();
    builder.SetSrid(4326);
    builder.BeginGeography(OpenGisGeographyType.Polygon);

    double[] point = points.Pop();

    builder.BeginFigure(point[0], point[1]);

    while (points.Count > 0)
    {
        point = points.Pop();
        builder.AddLine(point[0], point[1]);
    }

    builder.EndFigure();
    builder.EndGeography();

    return builder.ConstructedGeography;
}

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

kick it on DotNetKicks.com

Coding in SQL Server: An Evolution

Tuesday at the NWNUG meeting, Steven Smith spoke on various ways to squeeze performance out of your ASP.NET applications.  This was a fantastic talk, and gave me plenty to think about (since ASP.NET is not my forte, I only consider myself to have an intermediate skillset on this topic).

One suggestion that he made involved caching database writes.  That is, instead of immediately writing logging-type information to the database for every request, which is a relatively expensive operation considering the small payload size, that you could accumulate them in a short-term cache, and then perform the write operation periodically.  Fewer database calls = faster performance.

In his example, he spoke of his advertisement server that might serve many impressions per second, but he doesn't want each impression to incur an expensive database write.  So, he keeps track of the activity locally, and then persists to the database every 5 seconds using a single database call containing multiple data points.

The code that Steve demonstrated utilized XML to contain the data within a single block of text (read: can be passed in as a single parameter to a stored procedure):

<ROOT>
<Activity customerId="ALFKI" viewCount="5" />
<Activity customerId="ANATR" viewCount="7" />
</ROOT>

 

Now, consuming XML from T-SQL is an area that I know very well, so I cringed a little bit when Steve showed the actual stored procedure code itself:

CREATE PROCEDURE dbo.BulkLogCustomerViews @@doc text -- XML Doc... AS DECLARE @idoc int -- Create an internal representation (virtual table) of the XML document... EXEC sp_xml_preparedocument @idoc OUTPUT, @@doc -- Perform UPDATES UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount + ox2.viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( [customerId] NCHAR(5) , viewCount int ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( customerId NCHAR(5) , viewCount int ) ox WHERE NOT EXISTS (SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId) -- Remove the 'virtual table' now... EXEC sp_xml_removedocument @idoc

Now, to Steve's credit, this code works just fine, and can probably be used as-is on all versions of SQL Server from 7.0 through 2008.  But, since we really don't write ASP applications consisting entirely of Response.Write any longer, I'd like to see Steve update his demo to use more modern techniques on the database as well.  ;-)

The first thing that he could do is update the procedure to utilize the XML data type that was first introduced in SQL Server 2005.  This would simplify the code a little bit, and would get rid of the dependency on the COM-based MSXML.dll, which the sp_xml_preparedocument and OPENXML() uses.

CREATE PROCEDURE dbo.BulkLogCustomerViews @doc xml AS -- Perform UPDATES UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount + ox2.viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox WHERE NOT EXISTS ( SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId )

 

Note that the XML data type in SQL Server doesn't need to be a well-formed document.  In this case, Steve could just pass in series of "Activity" elements (no "ROOT" element would be required by SQL Server, so he would also be able to simplify the .NET code that actually creates the XML string):

<Activity customerId="ALFKI" viewCount="5" />
<Activity customerId="ANATR" viewCount="7" />

 

Consequently, the XPath (XQuery, actually) within the nodes() method of the stored procedure code would need to change as well:

@doc.nodes('Activity') as T(activity)


But, we can kick this up a notch and use some SQL Server 2008 features as well. First, there's new "Upsert" capabilities (MERGE statement) that tries to simplify what Steve does with the UPDATE followed by INSERT:

CREATE PROCEDURE dbo.BulkLogCustomerViews @doc xml AS MERGE TopCustomerLog AS target USING (SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') as viewCount FROM @doc.nodes('Activity') as T(activity)) AS source ON (target.CustomerID = source.CustomerID) WHEN MATCHED THEN UPDATE SET target.ViewCount = target.ViewCount + source.viewCount WHEN NOT MATCHED THEN INSERT (CustomerID, ViewCount) VALUES (source.CustomerID, source.viewCount);

 

One more thing that could be done to further simplify this T-SQL is to use a Table-valued Parameter instead of the XML.  This would allow Steve to pass a fully populated table of data into the stored procedure and consume it directly by the MERGE statement.

The first step is to create a T-SQL type that defines the table structure of the parameter (this is a one-time operation, unless the table structure changes):

CREATE TYPE CustomerViewType AS TABLE ( CustomerID nchar(5) NOTNULL, ViewCount intNOTNULL );

 

Now, a parameter can be defined of this type, and used just like any other table-value variable:

ALTER PROCEDURE dbo.BulkLogCustomerViews @views CustomerViewType READONLY AS MERGE TopCustomerLog AS target USING @views AS source ON (target.CustomerID = source.CustomerID) WHEN MATCHED THEN UPDATE SET target.ViewCount = target.ViewCount + source.viewCount WHEN NOT MATCHED THEN INSERT (CustomerID, ViewCount) VALUES (source.CustomerID, source.viewCount);

 

On the ADO.NET side, the table-valued parameter could be represented as a DataTable object (other options also exist), and can be assigned directly as the value of the stored procedure's parameter object:

// Create a data table, and provide its structure DataTable customerViews = new DataTable(); customerViews.Columns.Add("CustomerID", typeof(string)); customerViews.Columns.Add("ViewCount", typeof(int)); // Fill with rows customerViews.Rows.Add("ALFKI", 5); customerViews.Rows.Add("ANATR", 7); using (SqlConnection conn = new SqlConnection("...")) { SqlCommand cmd = conn.CreateCommand(); cmd.CommandType = System.Data.CommandType.StoredProcedure; cmd.CommandText = "dbo.BulkLogCustomerViews"; SqlParameter param = cmd.Parameters.AddWithValue("@views", customerViews); conn.Open(); cmd.ExecuteNonQuery(); }

 

SQL Server 2008 RC0 Install: Sql2005SsmsExpressFacet

 

This morning's goal was to quickly install SQL Server 2008 RC0, and then move on with some project work.  Let's just say that my project work should resume by this afternoon...

In the interest of disk space, I removed an existing installation of SQL Server 2005 Developer Edition.  And then the installation of 2008 RC0 began by installing the "Microsoft.NET Framework 3.5 SP1 (Beta)"...  which is probably "install-smell" for me needing to pave my machine when the product finally RTM's.  But, I digress...

The installation went pretty smoothly until it came time for the "System Configuration Check" that takes place after you select everything that you would like to install, but before the files actually get installed.  In my case, this check failed because "The SQL Server 2005 Express Tools are installed.  To continue, remove the SQL Server 2005 Express Tools."  (This is the "Sql2005SsmsExpressFacet" rule of the installation)

Thank you, Microsoft, for that succinct failure message that includes instructions for resolution...  Except, I didn't have the SQL Server 2005 Express Tools installed.  They didn't show up in my Programs list, in the Start menu, or on my C: drive at all.  How am I to uninstall something that isn't installed?  Hrmmm....

After about an hour's search around my hard drive, I finally went into the registry, and discovered the following key:

HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\ShellSEM

Note: Jan Sotola reports that the affected 64-bit version key is:

HKLM\Software\Wow6432Node\Microsoft\...
   ...\Microsoft SQL Server\90\Tools\ShellSEM

Contained within was some registry information belonging to Red Gate SQL Prompt.  Apparently, despite my removing of the SQL 2005 Express Tools some time ago, this registry key was not removed because the Red Gate information was still there.

On a hunch, I renamed the key to "ShellSEM.old", and the SQL Server 2008 installation carried on.

UPDATE: Shortly after posting this, Theo Spears from Red Gate sent the following email:

"I apologise for this issue; the SQL Prompt team here has been working to address it. You and your readers may be interested to hear that we now have a version which works with SQL Server 2008 RC0, and no longer blocks the installation. To get a copy send us an email at support@red-gate.com"

 

I should clarify that my little rant above was not targeted at Red Gate, but I'm so happy to hear that they are proactively working to resolve this little issue.  I would have just liked for Microsoft to use more than a single registry key as evidence of a conflicting product installation, that's all.