A View Inside My Head

Renewed Jul 01

Thanks again, Microsoft!

SQL Server 2008: Spatial Data, Part 8 Jun 23

 

In this, the eighth part in a series on the new Spatial Data types in SQL Server 2008, I'll step away from the database and do a little spatial coding using .NET.

Redistributable .NET Library

Up to this point in the series, I have demonstrated a lot of interesting (?) things that you can do with the new Spatial data types (Geometry and Geography) in SQL Server 2008.  You might be thinking, "That's swell, and all, but I wish I could do some of that stuff without needing to be tethered to a database."  Well, you know what?  You can!

I mentioned in a previous post that the Spatial data types were implemented as SQLCLR User-Defined Types.  I've since been corrected by Isaac Kunen, who stated that they are more accurately described as System-Defined Types, with the difference being that these are automatically installed and available for use as part of SQL Server 2008, regardless of whether the ENABLE CLR bit has been activated.  Semantics aside, these types are merely classes within a .NET assembly, and Microsoft is making this freely available as part of a Feature Pack for SQL Server (which will be redistributable as part of your stand-alone application, according to Isaac):

SQL Server 2008 RC0 Feature Pack Download Page

(Look for "Microsoft SQL Server System CLR Types," which includes the two Spatial types plus the HierarchyID type.  This link is for RC0, and may not be applicable to future versions as the product is finalized and released.)

Builder API

A new feature that was included with the first Release Candidate (RC0) is the Builder API.  This is a collection of interfaces and classes that helps you to construct spatial types by specifying one point at a time until all points have been added.

The Builder API is not only useful for creating new instances of spatial data, but also for consuming existing instances one point at a time (maybe to convert an instance into another format).  Documentation is light at the moment, so I'm still trying to grok exactly how to best utilize it.

For my first experiment with the API, I obtained some Zip Code Boundary data in ASCII format from the U.S. Census Bureau:

http://www.census.gov/geo/www/cob/z52000.html#ascii

My goal was to parse the data, and then create a new SqlGeography instance for each zip code.  (Note: SqlGeography is the .NET class name that T-SQL refers to simply as Geography).  The SqlGeographyBuilder class proved to be perfect for accomplishing this task.

At its core, the SqlGeographyBuilder implements the IGeographySink interface.  If you wanted to consume an existing SqlGeography instance, you could implement IGeographySink in your own class, and then invoke the SqlGeography's Populate() instance method, passing in your object as the parameter.  The Populate() method takes care of calling the appropriate IGeographySink methods within your class.

In this case, I'm not starting with an existing SqlGeography instance, so my code will need to call the methods of the SqlGeographyBuilder in the correct order:

IGeographySink

After EndGeography() has been invoked, the new instance is available via the ConstructedGeography property of the SqlGeographyBuilder class. 

Simple enough, right?  Yeah, I'm still a little lost myself...  But, here's some code to help demonstrate what's going on!

First, let's look at the ASCII data.  A single zip code's boundary might be defined as:

      1469      -0.824662148292608E+02       0.413848583827499E+02
      -0.824602851767940E+02       0.413864290595145E+02
      -0.824610630000000E+02       0.413860590000000E+02
      -0.824685900000000E+02       0.413841470000000E+02
      -0.824686034536111E+02       0.413843846804627E+02
      -0.824605990000000E+02       0.413863160000000E+02
      -0.824602851767940E+02       0.413864290595145E+02
END

 

The very first line happens to contain an identifier (maps to a second file that lists the actual USPS zip code).  The coordinate listed in the first line is not actually part of the boundary, but rather appears to be the population center of that area.  The actual boundary begins with the second line, and continues until you encounter the "END".  Also, in case you couldn't tell, coordinates in this data are in Longitude-Latitude order.

Since a Zip Code is a polygon, and since we are working with SqlGeography, we must be aware of ring ordering.  That is, the exterior ring of a polygon must be defined in a counter-clockwise order so that as you "walk the ring", the interior is always to your left.  If you reverse the order, then SqlGeography assumes that you're trying to define a polygon containing the entire world except for the small area inside of the polygon.

Well, in this case, the order of the points of the Zip Code boundary is defined in clockwise order... so, we must be aware of this and call into the SqlGeographyBuilder in the opposite order (so the last point defined in the ASCII data is the first point used while building our new instance). 

To accomplish this, I simply parse the Lat/Long coordinates as "double" types, and then push them onto a stack.  Then, I pop the stack and call into the Builder API with each point.  At the end, I obtain the new SqlGeography instance from the ConstructedGeography property. 

(Note: This is demonstrative code - some things should probably be cleaned up/refactored/error handled... You have been warned)

public SqlGeography ParseAsGeography(string zipcode_points)
{
    StringReader sr = new StringReader(zipcode_points);
    string line = sr.ReadLine();

    Stack<double[]> Points = new Stack<double[]>();

    while (line != null  && line != "END")
    {
        if (line != String.Empty)
        {
            Points.Push(ParseLatLngValues(line));
        }

        line = sr.ReadLine();
    }

    return CreateGeography(Points);
}

private double[] ParseLatLngValues(string line)
{
    //      -0.838170700000000E+02       0.409367390000000E+02

    double[] ret = new double[2];

    string lng = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[0].Value; string lat = System.Text.RegularExpressions.Regex
.Matches(line, "\\S+")[1].Value; double.TryParse(lat, out ret[0]); double.TryParse(lng, out ret[1]); return ret; }

private SqlGeography CreateGeography(Stack<double[]> points)
{
    SqlGeographyBuilder builder = new SqlGeographyBuilder();
    builder.SetSrid(4326);
    builder.BeginGeography(OpenGisGeographyType.Polygon);

    double[] point = points.Pop();

    builder.BeginFigure(point[0], point[1]);

    while (points.Count > 0)
    {
        point = points.Pop();
        builder.AddLine(point[0], point[1]);
    }

    builder.EndFigure();
    builder.EndGeography();

    return builder.ConstructedGeography;
}

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

kick it on DotNetKicks.com

Coding in SQL Server: An Evolution Jun 19

Tuesday at the NWNUG meeting, Steven Smith spoke on various ways to squeeze performance out of your ASP.NET applications.  This was a fantastic talk, and gave me plenty to think about (since ASP.NET is not my forte, I only consider myself to have an intermediate skillset on this topic).

One suggestion that he made involved caching database writes.  That is, instead of immediately writing logging-type information to the database for every request, which is a relatively expensive operation considering the small payload size, that you could accumulate them in a short-term cache, and then perform the write operation periodically.  Fewer database calls = faster performance.

In his example, he spoke of his advertisement server that might serve many impressions per second, but he doesn't want each impression to incur an expensive database write.  So, he keeps track of the activity locally, and then persists to the database every 5 seconds using a single database call containing multiple data points.

The code that Steve demonstrated utilized XML to contain the data within a single block of text (read: can be passed in as a single parameter to a stored procedure):

<ROOT>
   <Activity customerId="ALFKI" viewCount="5" />
   <Activity customerId="ANATR" viewCount="7" />
</ROOT>

 

Now, consuming XML from T-SQL is an area that I know very well, so I cringed a little bit when Steve showed the actual stored procedure code itself:

CREATE PROCEDURE dbo.BulkLogCustomerViews
  @@doc text -- XML Doc...
AS

DECLARE @idoc   int

-- Create an internal representation (virtual table) of 
the XML document...
EXEC sp_xml_preparedocument @idoc OUTPUT, @@doc -- Perform UPDATES UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount
+ ox2.viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( [customerId] NCHAR(5) , viewCount int ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( customerId NCHAR(5) , viewCount int ) ox WHERE NOT EXISTS (SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId) -- Remove the 'virtual table' now... EXEC sp_xml_removedocument @idoc

 

Now, to Steve's credit, this code works just fine, and can probably be used as-is on all versions of SQL Server from 7.0 through 2008.  But, since we really don't write ASP applications consisting entirely of Response.Write any longer, I'd like to see Steve update his demo to use more modern techniques on the database as well.  ;-)

The first thing that he could do is update the procedure to utilize the XML data type that was first introduced in SQL Server 2005.  This would simplify the code a little bit, and would get rid of the dependency on the COM-based MSXML.dll, which the sp_xml_preparedocument and OPENXML() uses.

CREATE PROCEDURE dbo.BulkLogCustomerViews
  @doc xml
AS

-- Perform UPDATES 
UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount
+ ox2.viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)')
as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)')
as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox WHERE NOT EXISTS ( SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId )

 


Note that the XML data type in SQL Server doesn't need to be a well-formed document.  In this case, Steve could just pass in series of "Activity" elements (no "ROOT" element would be required by SQL Server, so he would also be able to simplify the .NET code that actually creates the XML string):

<Activity customerId="ALFKI" viewCount="5" />
<Activity customerId="ANATR" viewCount="7" />

 

Consequently, the XPath (XQuery, actually) within the nodes() method of the stored procedure code would need to change as well:

@doc.nodes('Activity') as T(activity)


But, we can kick this up a notch and use some SQL Server 2008 features as well. First, there's new "Upsert" capabilities (MERGE statement) that tries to simplify what Steve does with the UPDATE followed by INSERT:

CREATE PROCEDURE dbo.BulkLogCustomerViews
  @doc xml
AS

MERGE TopCustomerLog AS target
USING (SELECT T.activity.value('@customerId', 'nchar(5)') 
              as CustomerID, 
              T.activity.value('@viewCount', 'int') as viewCount
       FROM   @doc.nodes('Activity') as T(activity)) AS source 
ON    (target.CustomerID = source.CustomerID)
WHEN  MATCHED 
      THEN UPDATE SET target.ViewCount = target.ViewCount + source.viewCount
WHEN  NOT MATCHED
      THEN INSERT (CustomerID, ViewCount)
      VALUES (source.CustomerID, source.viewCount);

 

One more thing that could be done to further simplify this T-SQL is to use a Table-valued Parameter instead of the XML.  This would allow Steve to pass a fully populated table of data into the stored procedure and consume it directly by the MERGE statement.

The first step is to create a T-SQL type that defines the table structure of the parameter (this is a one-time operation, unless the table structure changes):

CREATE TYPE CustomerViewType AS TABLE 
( 
    CustomerID nchar(5) NOT NULL,
    ViewCount int NOT NULL
);

 

Now, a parameter can be defined of this type, and used just like any other table-value variable:

ALTER PROCEDURE dbo.BulkLogCustomerViews
  @views CustomerViewType READONLY
AS

MERGE TopCustomerLog AS target
USING @views AS source 
ON    (target.CustomerID = source.CustomerID)
WHEN  MATCHED 
      THEN UPDATE SET target.ViewCount = target.ViewCount 
+ source.viewCount WHEN NOT MATCHED THEN INSERT (CustomerID, ViewCount) VALUES (source.CustomerID, source.viewCount);

 

On the ADO.NET side, the table-valued parameter could be represented as a DataTable object (other options also exist), and can be assigned directly as the value of the stored procedure's parameter object:

// Create a data table, and provide its structure
DataTable customerViews = new DataTable();
customerViews.Columns.Add("CustomerID", typeof(string));
customerViews.Columns.Add("ViewCount", typeof(int));

// Fill with rows
customerViews.Rows.Add("ALFKI", 5);
customerViews.Rows.Add("ANATR", 7);

using (SqlConnection conn = new SqlConnection("..."))
{
  SqlCommand cmd = conn.CreateCommand();
  cmd.CommandType = System.Data.CommandType.StoredProcedure;
  cmd.CommandText = "dbo.BulkLogCustomerViews";

  SqlParameter param 
= cmd.Parameters.AddWithValue("@views", customerViews); conn.Open(); cmd.ExecuteNonQuery(); }

SQL Server 2008 RC0 Install: Sql2005SsmsExpressFacet Jun 19

This morning's goal was to quickly install SQL Server 2008 RC0, and then move on with some project work.  Let's just say that my project work should resume by this afternoon...

In the interest of disk space, I removed an existing installation of SQL Server 2005 Developer Edition.  And then the installation of 2008 RC0 began by installing the "Microsoft.NET Framework 3.5 SP1 (Beta)"...  which is probably "install-smell" for me needing to pave my machine when the product finally RTM's.  But, I digress...

The installation went pretty smoothly until it came time for the "System Configuration Check" that takes place after you select everything that you would like to install, but before the files actually get installed.  In my case, this check failed because "The SQL Server 2005 Express Tools are installed.  To continue, remove the SQL Server 2005 Express Tools."  (This is the "Sql2005SsmsExpressFacet" rule of the installation)

Thank you, Microsoft, for that succinct failure message that includes instructions for resolution...  Except, I didn't have the SQL Server 2005 Express Tools installed.  They didn't show up in my Programs list, in the Start menu, or on my C: drive at all.  How am I to uninstall something that isn't installed?  Hrmmm....

After about an hour's search around my hard drive, I finally went into the registry, and discovered the following key:

HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\ShellSEM

Note: Jan Sotola reports that the affected 64-bit version key is:
HKLM\Software\Wow6432Node\Microsoft\...
   ...\Microsoft SQL Server\90\Tools\ShellSEM

Contained within was some registry information belonging to Red Gate SQL Prompt.  Apparently, despite my removing of the SQL 2005 Express Tools some time ago, this registry key was not removed because the Red Gate information was still there.

On a hunch, I renamed the key to "ShellSEM.old", and the SQL Server 2008 installation carried on.

UPDATE: Shortly after posting this, Theo Spears from Red Gate sent the following email:

 

"I apologise for this issue; the SQL Prompt team here has been working to address it. You and your readers may be interested to hear that we now have a version which works with SQL Server 2008 RC0, and no longer blocks the installation. To get a copy send us an email at support@red-gate.com"

I should clarify that my little rant above was not targeted at Red Gate, but I'm so happy to hear that they are proactively working to resolve this little issue.  I would have just liked for Microsoft to use more than a single registry key as evidence of a conflicting product installation, that's all.

Software Development Meme Jun 08

This meme is brought to you by Michael Eaton, who just so happens to be a huge fan of my spatial talks.  ;-)

How old were you when you started programming?
Well, we might need to better define "programming" here. 

I started tinkering with computers when I was about 6 or 7 (is that right, Dad?).  In those days, it was more about reading a book about BASIC and typing in the program to run.  Then I would make changes to the program to see the results.  If modifying an existing program is considered "programming", then this is my epoch.

I really didn't start creating my own programs from scratch until I was a little older, maybe 8 or 9.

How did you get started in programming?
My father would give me and my siblings time on the computer (TRS-80s), which I spent going through books about BASIC.  I wanted to make the machine do something rather than just use the machine to do something, like play a game.  (Don't get me wrong, though: I spent plenty of time playing games, too)

What was your first language?
BASIC

What was the first real program you wrote?
I don't remember the small ones.  But an early one that sticks out in my mind was a "catch the falling egg in a basket" game, where a chicken would move randomly across the top of the screen, drop an egg, which you then had to catch in a basket at the bottom of the screen that you controlled using a joystick.  I think that was on my Commodore 64, so I had to be maybe 10 years old (side note: it's amazing how hard it is to pinpoint when these events take place as the years pile on).

What languages have you used since you started programming?
BASIC, Visual Basic, C/C++. C#, SQL, Java, JavaScript, LUA

What was your first professional programming gig?
I was a PC Technicial (and, later correctly changed to Programmer/Analyst) in the IT department at the BAX Global Hub facility.  There I helped maintain a variety of home-grown systems, and created some of my own (like the Automated Load Planning System that was used to assign where freight should be placed on aircraft for optimal center of gravity).

If you knew then what you know now, would you have started programming?
I don't see why not.  I really haven't been jaded in my career, or anything, and it still holds my interest.

If there is one thing you learned along the way that you would tell new developers, what would it be?
Choose a technology that you are passionate about and become an expert in it... and then evangelize that technology to share your passion.  There is so much stuff out there that nobody can be an expert in everything.  Developers that are only able to do a little bit of everything are a commodity resource that can be replaced by cheaper labor at any time.  However, if you have exceptional expertise in one or more select areas, then you become that much more valuable to somebody.

What's the most fun you've ever had ... programming?
I don't know that the actual task of programming has ever been "fun" for me, in the same sense that riding a roller coaster would be.  But, I've done some quirky things while programming just to have fun and be different.  Like implement a Z-Machine emulator in SQLCLR (as my submission to a programming competition) so that I can play Zork using only stored procedure calls...  Or use data from the WoW Armory site to demonstrate how to use SQL Server's XML functionality...  Or write Project Euler solutions in LUA and execute them in-game using WoW.

 

SQL Server 2008: Spatial Data, Part 7 May 16

The Open Geospatial Consortium's Simple Features specification, which SQL Server 2008's Geometry data type is based upon, defines standards for working with spatial data using a flat-earth (projected planar) model.  Ironically, these standards don't exactly cover the intricacies of using an ellipsoidal model, which is needed to "accurately" represent the world that we live in.  In other words, the OGC standards define how to work with paper maps of the world, but not globes.

Fortunately, the SQL Server team recognized that that the Geometry type is inadequate for a lot of scenarios, and implemented a second data type just for representing geospatial data using a true ellipsoidal model: Geography.

In this, the seventh part of a series about the SQL Server Spatial Data Type, I'll examine some of the key differences between the Geometry and the Geography type that developers should be aware of.

 

Latitude and Longitude


Locations on a flat model are defined in terms of X and Y.  There exists some point known as the Origin where X and Y are both zero.  From there, it is defined that values of X will increase (or decrease, in the case of negatve numbers) if you move horizontally away from the Origin.  Likewise, the values of Y will increase if you move vertically away from the Origin. 

By convention, both X and Y will grow to infinity, so flat models do not "wrap around" and start approaching the Origin again if you go too far in one direction.  Usually, a coordinate system will be based on some underlying representation of the real-world, so coordinates that are beyond the defined boundaries of that map are logically undefined.

By contrast, though, an ellipsoidal model does wrap around.  If you started at a point in the middle and kept traveling in a straight line to the right, you will eventually return to that starting point. 

So, it turns out not to be very practical to define points on a ball using X and Y.  Instead, points are defined using angles.  Longitude is the horizontal angle (how far East or West from a Prime Meridian) and ranges from -180 degrees to 180 degrees (with -180 and 180 being the same).  Latitude is the vertical angle (how far North or South from the Equator) and ranges from -90 degrees to 90 degrees (with -90 representing the South Pole and 90 representing the North Pole).

latlon

In terms of the Geography data type, just be aware that there is no X and Y.  Instead, you work with Long and Lat.

Note: All of the SQL Server 2008 CTPs to date that include Spatial support, including the most recent February 2008 version, use Lat-Long ordering within WKT.  This was a design decision based on the fact that the OGC standard did not already define parameter ordering for angular coordinates.  Starting with the first Release Candidate, however, these parameters will be swapped to use Long-Lat ordering.  Doing so will align SQL Server's spatial support with other platforms that have already implemented Long-Lat ordering.  Note also that that this is aligned with the concepts of X and Y, which by convention lists the X value first.

 

Straight Lines


The shortest distance between two points is a straight line.  But, a straight line on a flat-earth model is far different than a straight line on an ellipsoidal model.  To demonstrate, consider the shortest path from Redmond, WA, USA to Cambridge, England, UK:

s7_1 
On this planar projection, it certainly looks like the shortest path.  Even when examined on a 3D model, it looks correct:

s7_2
But, if the camera is moved towards the North Pole, then the error becomes apparent:

s7_3

In the ellipsoidal model, the shortest path between the points is not the red line, which roughly parallels the lines of Latitude, but rather the black arrow!  Converted back to a planar projection, this actual shortest path appears curved:

s7_4

(in this view, the black curve was [hastily] plotted by hand)


Instance Methods


The following slide shows in all of the instance methods that have been implemented for the Geography type as of the February 2008 CTP.  For comparison, instance methods from the Geometry type that do not exist in the Geography type are shown in gray.

GeographyMethods

It may be impossible to define some of these equivalent methods for Geography, simply because the rules are different.  For instance, if you define a set of points that make up a Polygon, what is considered to be the interior and exterior of that shape?  Since the world coordinates wrap around in an ellipsoidal model, you might be intending to represent a shape whose interior is the entire world except for the small portion.  There is simply no way to convey your intent using the methods as described by the OGC standard.

In an attempt to prevent this particular scenario, the SQL Server team has imposed a limit on the size of a Geography in the February 2008 CTP: you cannot define a Geography that is larger than a hemisphere.  

There may very well be logical solutions for working around some of the issues that prevented the SQL team from implementing all of Geometry's methods in the Geography type.  However, in this case, Microsoft appears to be waiting for the OGC to define certain rules as part of a standard rather than coming up with their own assumptions, which could be invalidated later by the standards group going in a different direction.

More on the Geography type later!

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

(next part) SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

So Long, Dustin! May 16

Today is Dustin Campbell's last day as a resident of Ohio.  As announced last month, he has left DevExpress and joined the Blue Monster where he'll surely be doing great things as a Program Manager on the Visual Studio team. 

So, tomorrow morning, his family will embark on a one-way trip to Seattle and take up residence there.  I cannot imagine the whirlwind of emotion that they must be experiencing at the moment!  And, I can't imagine that his wife is doing all of this without ever having been to Seattle!

Over the past couple of years, Dustin has become one of my best friends, so seeing him leave is certainly bittersweet.  I was fortunate enough to spend a little time last night with him, his Trophy Wife, his absolutely adorable little baby, and his dog (who, funny enough, was hung-over the entire time since they did a trial run of a doggie sedative yesterday).  These are great people, and they'll surely be missed.

So, with mixed emotions, I say: So Long, Dustin!  Thanks for all of the Scotch!  (and I'll see you in two weeks in Orlando!)

Project Euler Comes to Azeroth Apr 24

It seems that a lot of my friends are doing Project Euler (according to my High School math teacher, this is pronounced "Oiler").  For example, Bill Wagner has been posting C# solutions, Darrell Hawley has ventured into the Python realm, and Dustin Campbell has been working on F# versions.

I love numbers, and spent a good portion of one summer playing with primes and number fields just for fun (since then, I've discovered WoW, and that takes up all of my time that would otherwise be spent exercising my brain).  Project Euler is actually right up my alley, and while in Seattle, I joked with Dustin that I should post solutions using LUA, and use World of Warcraft as my testbed... 

Problem 1 is finished.  :-D

WoWScrnShot_042408_205532

My no-frills WoW add-on is a simple ACE2 mod that includes AceConsole (for printing to the chat window in the lower left of the screenshot).  I won't bore you with the framework code, but will list my solutions as individual functions on my wiki (where it can grow without polluting my blog's RSS feed). 

As a taste, here's the Problem 1 solution written in LUA.  OnEnable() is my add-on's entry point, and it simply calls into the function Problem001(limit).

function ProjectEuler:OnEnable()
    ProjectEuler:Problem001(1000)
end
function ProjectEuler:Problem001(upperLimit)
self:Print("Sum of all numbers less than " .. upperLimit .. " that are divisible by 3 or 5") local f = function(factor) local n = math.ceil(upperLimit / factor) return n * (n-1) * factor / 2 end local result = f(3) + f(5) - f(15) self:Print(result) end

SQL Server 2008: Spatial Data, Part 6 Apr 11

 

In the Part 4 and Part 5 of the series, I demonstrated some instance methods of the Geometry type that returned a new Geometry based on existing instances.  In this part, I will concentrate on instance methods and properties of the Geometry type that return scalar values and Points.

STArea, STLength


Typically, your spatial data will represent something from the real world.  A LineString may be the collection of points gathered from a GPS device, and together they may represent the path that you took from your home to the office.  A Polygon may be the collection of points around the boundary of governmental territory, like a county or a parish within your state.

In both of these cases, the time will come when you will want to know the length of the LineString (or length of the perimeter of the Polygon) and the area within the Polygon.  OGC standard method STArea() returns a float indicating the area of the instance in square units (or 0, if the instance is not a Polygon and does not have area).  STLength() returns a float indicating the length of the instance in units (or 0, if the instance is a Point and does not have length). 

 

DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
SELECT @g.STArea(), @g.STLength()

Results:

Area       Length
450 102.426406871193

 Spatial_6_1

 

STCentroid


Thinking back to Mr. Bollenbacher's 10th Grade Geometry class, we had to use a compass and straight edge to construct lines bisecting the angles of polygons (primarily triangles).  The point where the angle bisectors met was the exact center, or centroid, of the shape.  Centroids are important because any line that passes through a centroid will divide the Polygon into two parts of equal area.  It should be noted that a Centroid may not actually be on the surface of a Polygon.

The OGC standard method STCentroid() returns a Point indicating the centroid of the shape.  If the instance is not a Polygon (or MultiPolygon), then NULL will be returned.

DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
SELECT @g.STCentroid().ToString()

Results:

POINT (20 30)

Spatial_6_2 
Note: SpatialViewer displays an individual point as an X.

STWithin, STContains


Two OGC standard methods returns a 1 or 0 indicating whether all of the points of one instance exist entirely inside of another instance.  STWithin() tests whether the base instance is inside of the parameter instance, while STContains() tests whether the parameter instance is inside of the base instance.

DECLARE @g geometry = 'POLYGON ((10 10, 13 30, 30 30, 30 15, 
10 10))'
DECLARE @h geometry = 'LINESTRING (16 16, 16 24, 25 18)'
SELECT @g.STContains(@h), @g.STWithin(@h)
SELECT @h.STContains(@g), @h.STWithin(@g)
Results: 1 0

0 1

Spatial_6_3

 

ST is {something}?


There are a number of OGC standard methods to check whether a given instance meets certain specifications: STIsClosed, STIsEmpty, STIsRing, STIsSimple, STIsValid

CLOSED: An instance is considered to be closed if the start point is the same as the end point.  By definition, a Polygon has to be closed, and a Point is not closed.  That really only leaves LineString.  For a collection of objects to be considered closed, all of its members must be closed.

EMPTY: A Geometry can be initialized in a special way as to not contain any points.  In SQL terms, this is sort of like having a NULL value, except it really is an instantiated object.  For example, LINESTRING EMPTY is a valid LineString, but it has no points.  Another humorous example is POINT EMPTY, which initializes to a Point without a Point....  so it's kind of Pointless, right?  (thank you, I'm here all week, tip your waitress).

RING: An instance is considered to be a ring if it is both Closed and Simple.

SIMPLE: An instance is considered to be simple if it does not cross over itself or otherwise touch itself.  For example, a LineString forming the letter 'S' is simple because it never comes in contact with itself.  But, a LineString that forms a Figure-Eight (8) is not simple because it would have to cross over itself.  Likewise, two circles (MultiPolygon) stacked on top of each other to form a Figure-Eight would not be simple because they touch each other.

VALID: A Geometry can cross over itself, but it cannot legally trace over itself.  That is, picture a LineString that backtracks over itself at some point, kind of like how I write my letter "P".  This is not considered to be Valid.

Spatial_6_4

Tip: SQL Server will allow an invalid Geometry to be instantiated, and Microsoft has provided an extension method called MakeValid() that will convert the invalid instance into a valid instance.  In the letter "P" example, instead of the vertical line going down and then back up (as I draw it by hand), the valid form will eliminate the duplication of points simply by start at the bottom and going up (so that the LineString never traces over itself).  If it's not possible to simplify a shape in this way so that there is only one continuous path, then it will be broken up into multiple valid shapes (i.e., a MultiLineString, etc).

 

STX, STY, Z, M


Individual coordinates of a Point can be accessed via the OGC Standard properties STX and STY.  Three-dimensional Points also have a Z coordinate, which can be accessed via Microsoft's extended Z property.  Likewise, four-dimensional Points have a M (for Measure) coordinate, which can be accessed via Microsoft's extended M property.  If Z or M is not defined for a given point, then NULL will be returned.

DECLARE @g geometry = 'POINT(1 2)'
DECLARE @h geometry = 'POINT(1 2 3 4)'
SELECT @g.STX, @g.STY, @g.Z, @g.M
SELECT @h.STX, @h.STY, @h.Z, @h.M

Results:

1    2    NULL    NULL
1    2    3       4

 

STPointOnSurface


When working with spatial data, especially without using a viewer, it can be kind of difficult to pick an arbitrary point that is inside of a Polygon (or on a LineString).  Thankfully, the OGC standard method STPointOnSurface() does just that.  Given a Geometry instance, it will return a somewhat random point that is guaranteed to be located within the interior of that instance.

DECLARE @g geometry = 'POLYGON((10 10, 14 15, 50 12, 45 30, 
10 30, 10 10))'
SELECT @g.STPointOnSurface().ToString() Results: POINT (23 25)

Spatial_6_5

 

STSrid


All of my examples to this point have used the default Spatial Reference ID of 0 (for the Geometry type) simply because I have not been specifying one.  The SRID is the mechanism that defines one geometry as being based on a different set of parameters than a geometry with a different SRID. 

For example, you may have a set of shapes defined where each unit represents one meter, while another set of shapes is based on a reference system where each unit represents 1.5 inches.  It's totally legal to mix these shapes together the same column of a table in your database, provided that you assign a different SRID to each.  SQL Server does not need to know what units represent, because it will never permit the interaction of a shape from one SRID with a shape from another SRID. 

The OGC standard property STSrid will get (or set) the SRID of the Geometry instance.

-- @g will have the default SRID = 0
DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))' -- @h is defined with SRID = 123
DECLARE @h GEOMETRY = GEOMETRY::STGeomFromText('POLYGON((10 10, 40 10, 40 40, 10 10))', 123) select @g.STUnion(@h).ToString() -- Returns NULL because of different SRIDs. But, let's change

-- @g to use SRID = 123
SET @g.STSrid = 123 select @g.STUnion(@h).ToString() -- Returns POLYGON ((10 10, 40 10, 40 40, 10 40, 10 10))

 

Jason, What's Next?


Enough of this flat Earth stuff!  In the next part, I'll explore the Geography data type.  This is where things really start to get interesting.  Stay tuned!

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

(next part) SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

Using PIVOT and RANK Together Apr 10

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it.  Consider the following somewhat normalized table:

AccountNum

Name

Email

0851774002 

John Doe  

jd@foo.com

0851774003   

John Doe   

jd@foo.com

0851774001   

John Doe   

jd@foo.com

0851774100   

John Doe   

jd@foo.com

0851693000   

Bob Public   

bob@bar.com

1138299000   

Jane Doe   

JaneD@baz.com

1353452000   

Jane Doe   

JaneD@baz.com

1028030000   

Jane Doe 

JaneD@baz.com

0851636000   

Jane Doe   

JaneD@baz.com


What he wanted was to collapse the data to one row per person, with a column for each Account Number.  That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table.  So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on.  This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition.  In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person).  By prefixing some text to the rank number, we end up with something like:

SELECT    Name,
          Email,
          AccountNum,
          'AccountNum' +
          CAST(
RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum )
AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4
 

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned.  But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list.  In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values".  This restriction is a little bit of a downside because we need knowledge of the values in the column.  Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT  *
FROM    ( SELECT    Name,
                    Email,
                    AccountNum,
                    'AccountNum'
                    + CAST(RANK() 
          OVER ( PARTITION BY Name, Email 
                 ORDER BY AccountNum ) AS VARCHAR(10)) R
          FROM      myTable
        ) AS rankedSource 
PIVOT 
( 
    MAX(AccountNum) 
    FOR R IN 
    ( [AccountNum1], [AccountNum2], [AccountNum3],
      [AccountNum4], [AccountNum5], [AccountNum6],
      [AccountNum7], [AccountNum8], [AccountNum9],
      [AccountNum10] ) 
) AS pivottable

 

And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name       Email         AccountNum1  AccountNum2
========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002