A View Inside My Head

Jason's Random Thoughts of Interest


SQL Server 2008: Spatial Data, Part 7

The Open Geospatial Consortium's Simple Features specification, which SQL Server 2008's Geometry data type is based upon, defines standards for working with spatial data using a flat-earth (projected planar) model.  Ironically, these standards don't exactly cover the intricacies of using an ellipsoidal model, which is needed to "accurately" represent the world that we live in.  In other words, the OGC standards define how to work with paper maps of the world, but not globes.

Fortunately, the SQL Server team recognized that that the Geometry type is inadequate for a lot of scenarios, and implemented a second data type just for representing geospatial data using a true ellipsoidal model: Geography.

In this, the seventh part of a series about the SQL Server Spatial Data Type, I'll examine some of the key differences between the Geometry and the Geography type that developers should be aware of.


Latitude and Longitude

Locations on a flat model are defined in terms of X and Y.  There exists some point known as the Origin where X and Y are both zero.  From there, it is defined that values of X will increase (or decrease, in the case of negatve numbers) if you move horizontally away from the Origin.  Likewise, the values of Y will increase if you move vertically away from the Origin. 

By convention, both X and Y will grow to infinity, so flat models do not "wrap around" and start approaching the Origin again if you go too far in one direction.  Usually, a coordinate system will be based on some underlying representation of the real-world, so coordinates that are beyond the defined boundaries of that map are logically undefined.

By contrast, though, an ellipsoidal model does wrap around.  If you started at a point in the middle and kept traveling in a straight line to the right, you will eventually return to that starting point. 

So, it turns out not to be very practical to define points on a ball using X and Y.  Instead, points are defined using angles.  Longitude is the horizontal angle (how far East or West from a Prime Meridian) and ranges from -180 degrees to 180 degrees (with -180 and 180 being the same).  Latitude is the vertical angle (how far North or South from the Equator) and ranges from -90 degrees to 90 degrees (with -90 representing the South Pole and 90 representing the North Pole).


In terms of the Geography data type, just be aware that there is no X and Y.  Instead, you work with Long and Lat.

Note: All of the SQL Server 2008 CTPs to date that include Spatial support, including the most recent February 2008 version, use Lat-Long ordering within WKT.  This was a design decision based on the fact that the OGC standard did not already define parameter ordering for angular coordinates.  Starting with the first Release Candidate, however, these parameters will be swapped to use Long-Lat ordering.  Doing so will align SQL Server's spatial support with other platforms that have already implemented Long-Lat ordering.  Note also that that this is aligned with the concepts of X and Y, which by convention lists the X value first.


Straight Lines

The shortest distance between two points is a straight line.  But, a straight line on a flat-earth model is far different than a straight line on an ellipsoidal model.  To demonstrate, consider the shortest path from Redmond, WA, USA to Cambridge, England, UK:

On this planar projection, it certainly looks like the shortest path.  Even when examined on a 3D model, it looks correct:

But, if the camera is moved towards the North Pole, then the error becomes apparent:


In the ellipsoidal model, the shortest path between the points is not the red line, which roughly parallels the lines of Latitude, but rather the black arrow!  Converted back to a planar projection, this actual shortest path appears curved:


(in this view, the black curve was [hastily] plotted by hand)

Instance Methods

The following slide shows in all of the instance methods that have been implemented for the Geography type as of the February 2008 CTP.  For comparison, instance methods from the Geometry type that do not exist in the Geography type are shown in gray.


It may be impossible to define some of these equivalent methods for Geography, simply because the rules are different.  For instance, if you define a set of points that make up a Polygon, what is considered to be the interior and exterior of that shape?  Since the world coordinates wrap around in an ellipsoidal model, you might be intending to represent a shape whose interior is the entire world except for the small portion.  There is simply no way to convey your intent using the methods as described by the OGC standard.

In an attempt to prevent this particular scenario, the SQL Server team has imposed a limit on the size of a Geography in the February 2008 CTP: you cannot define a Geography that is larger than a hemisphere.  

There may very well be logical solutions for working around some of the issues that prevented the SQL team from implementing all of Geometry's methods in the Geography type.  However, in this case, Microsoft appears to be waiting for the OGC to define certain rules as part of a standard rather than coming up with their own assumptions, which could be invalidated later by the standards group going in a different direction.

More on the Geography type later!

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

(next part) SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

Project Euler Comes to Azeroth

It seems that a lot of my friends are doing Project Euler (according to my High School math teacher, this is pronounced "Oiler").  For example, Bill Wagner has been posting C# solutions, Darrell Hawley has ventured into the Python realm, and Dustin Campbell has been working on F# versions.

I love numbers, and spent a good portion of one summer playing with primes and number fields just for fun (since then, I've discovered WoW, and that takes up all of my time that would otherwise be spent exercising my brain).  Project Euler is actually right up my alley, and while in Seattle, I joked with Dustin that I should post solutions using LUA, and use World of Warcraft as my testbed... 

Problem 1 is finished.  :-D


My no-frills WoW add-on is a simple ACE2 mod that includes AceConsole (for printing to the chat window in the lower left of the screenshot).  I won't bore you with the framework code, but will list my solutions as individual functions on my wiki (where it can grow without polluting my blog's RSS feed). 

As a taste, here's the Problem 1 solution written in LUA.  OnEnable() is my add-on's entry point, and it simply calls into the function Problem001(limit).

function ProjectEuler:OnEnable()

function ProjectEuler:Problem001(upperLimit)
   self:Print("Sum of all numbers less than " .. upperLimit .. " that are divisible by 3 or 5")

   local f = function(factor)
      local n = math.ceil(upperLimit / factor)
      return n * (n-1) * factor / 2

   local result = f(3) + f(5) - f(15)



SQL Server 2008: Spatial Data, Part 6

In the Part 4 and Part 5 of the series, I demonstrated some instance methods of the Geometry type that returned a new Geometry based on existing instances.  In this part, I will concentrate on instance methods and properties of the Geometry type that return scalar values and Points.

STArea, STLength

Typically, your spatial data will represent something from the real world.  A LineString may be the collection of points gathered from a GPS device, and together they may represent the path that you took from your home to the office.  A Polygon may be the collection of points around the boundary of governmental territory, like a county or a parish within your state.

In both of these cases, the time will come when you will want to know the length of the LineString (or length of the perimeter of the Polygon) and the area within the Polygon.  OGC standard method STArea() returns a float indicating the area of the instance in square units (or 0, if the instance is not a Polygon and does not have area).  STLength() returns a float indicating the length of the instance in units (or 0, if the instance is a Point and does not have length). 


DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
@g.STArea(), @g.STLength() Results: Area Length
450 102.426406871193




Thinking back to Mr. Bollenbacher's 10th Grade Geometry class, we had to use a compass and straight edge to construct lines bisecting the angles of polygons (primarily triangles).  The point where the angle bisectors met was the exact center, or centroid, of the shape.  Centroids are important because any line that passes through a centroid will divide the Polygon into two parts of equal area.  It should be noted that a Centroid may not actually be on the surface of a Polygon.

The OGC standard method STCentroid() returns a Point indicating the centroid of the shape.  If the instance is not a Polygon (or MultiPolygon), then NULL will be returned.

DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
@g.STCentroid().ToString() Results: POINT (20 30)

Note: SpatialViewer displays an individual point as an X.


STWithin, STContains

Two OGC standard methods returns a 1 or 0 indicating whether all of the points of one instance exist entirely inside of another instance.  STWithin() tests whether the base instance is inside of the parameter instance, while STContains() tests whether the parameter instance is inside of the base instance.

DECLARE @g geometry = 'POLYGON ((10 10, 13 30, 30 30, 30 15, 10 10))'
DECLARE @h geometry = 'LINESTRING (16 16, 16 24, 25 18)'
SELECT @g.STContains(@h), @g.STWithin(@h)
SELECT @h.STContains(@g), @h.STWithin(@g)
Results: 1 0

0 1



ST is {something}?

There are a number of OGC standard methods to check whether a given instance meets certain specifications: STIsClosed, STIsEmpty, STIsRing, STIsSimple, STIsValid

CLOSED: An instance is considered to be closed if the start point is the same as the end point.  By definition, a Polygon has to be closed, and a Point is not closed.  That really only leaves LineString.  For a collection of objects to be considered closed, all of its members must be closed.

EMPTY: A Geometry can be initialized in a special way as to not contain any points.  In SQL terms, this is sort of like having a NULL value, except it really is an instantiated object.  For example, LINESTRING EMPTY is a valid LineString, but it has no points.  Another humorous example is POINT EMPTY, which initializes to a Point without a Point....  so it's kind of Pointless, right?  (thank you, I'm here all week, tip your waitress).

RING: An instance is considered to be a ring if it is both Closed and Simple.

SIMPLE: An instance is considered to be simple if it does not cross over itself or otherwise touch itself.  For example, a LineString forming the letter 'S' is simple because it never comes in contact with itself.  But, a LineString that forms a Figure-Eight (8) is not simple because it would have to cross over itself.  Likewise, two circles (MultiPolygon) stacked on top of each other to form a Figure-Eight would not be simple because they touch each other.

VALID: A Geometry can cross over itself, but it cannot legally trace over itself.  That is, picture a LineString that backtracks over itself at some point, kind of like how I write my letter "P".  This is not considered to be Valid.


Tip: SQL Server will allow an invalid Geometry to be instantiated, and Microsoft has provided an extension method called MakeValid() that will convert the invalid instance into a valid instance.  In the letter "P" example, instead of the vertical line going down and then back up (as I draw it by hand), the valid form will eliminate the duplication of points simply by start at the bottom and going up (so that the LineString never traces over itself).  If it's not possible to simplify a shape in this way so that there is only one continuous path, then it will be broken up into multiple valid shapes (i.e., a MultiLineString, etc).



Individual coordinates of a Point can be accessed via the OGC Standard properties STX and STY.  Three-dimensional Points also have a Z coordinate, which can be accessed via Microsoft's extended Z property.  Likewise, four-dimensional Points have a M (for Measure) coordinate, which can be accessed via Microsoft's extended M property.  If Z or M is not defined for a given point, then NULL will be returned.

DECLARE @g geometry = 'POINT(1 2)'
@h geometry = 'POINT(1 2 3 4)'
@g.STX, @g.STY, @g.Z, @g.M SELECT @h.STX, @h.STY, @h.Z, @h.M Results: 1 2 NULL NULL 1 2 3 4



When working with spatial data, especially without using a viewer, it can be kind of difficult to pick an arbitrary point that is inside of a Polygon (or on a LineString).  Thankfully, the OGC standard method STPointOnSurface() does just that.  Given a Geometry instance, it will return a somewhat random point that is guaranteed to be located within the interior of that instance.

DECLARE @g geometry = 'POLYGON((10 10, 14 15, 50 12, 45 30, 10 30, 10 10))'
SELECT @g.STPointOnSurface().ToString() Results: POINT (23 25)




All of my examples to this point have used the default Spatial Reference ID of 0 (for the Geometry type) simply because I have not been specifying one.  The SRID is the mechanism that defines one geometry as being based on a different set of parameters than a geometry with a different SRID. 

For example, you may have a set of shapes defined where each unit represents one meter, while another set of shapes is based on a reference system where each unit represents 1.5 inches.  It's totally legal to mix these shapes together the same column of a table in your database, provided that you assign a different SRID to each.  SQL Server does not need to know what units represent, because it will never permit the interaction of a shape from one SRID with a shape from another SRID. 

The OGC standard property STSrid will get (or set) the SRID of the Geometry instance.

-- @g will have the default SRID = 0
DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'

-- @h is defined with SRID = 123
DECLARE @h GEOMETRY = GEOMETRY::STGeomFromText('POLYGON((10 10, 40 10, 40 40, 10 10))', 123) select @g.STUnion(@h).ToString() -- Returns NULL because of different SRIDs. But, let's change
-- @g to use SRID = 123

@g.STSrid = 123 SELECT @g.STUnion(@h).ToString() -- Returns POLYGON ((10 10, 40 10, 40 40, 10 40, 10 10))


Jason, What's Next?

Enough of this flat Earth stuff!  In the next part, I'll explore the Geography data type.  This is where things really start to get interesting.  Stay tuned!

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

(next part) SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

Using PIVOT and RANK Together

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it.  Consider the following somewhat normalized table:





John Doe  



John Doe   



John Doe   



John Doe   



Bob Public   



Jane Doe   



Jane Doe   



Jane Doe 



Jane Doe   


What he wanted was to collapse the data to one row per person, with a column for each Account Number.  That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table.  So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on.  This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition.  In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person).  By prefixing some text to the rank number, we end up with something like:

SELECT Name, Email, AccountNum, 'AccountNum' + CAST(RANK() OVER ( PARTITION BY Name, Email ORDERBY AccountNum ) AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned.  But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list.  In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values".  This restriction is a little bit of a downside because we need knowledge of the values in the column.  Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT * FROM ( SELECT Name, Email, AccountNum, 'AccountNum' + CAST ( RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum ) AS VARCHAR(10)) R FROM myTable ) AS rankedSource PIVOT ( MAX (AccountNum) FOR R IN ( [AccountNum1], [AccountNum2], [AccountNum3], [AccountNum4], [AccountNum5], [AccountNum6], [AccountNum7], [AccountNum8], [AccountNum9], [AccountNum10] ) ) AS pivottable


And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name Email AccountNum1 AccountNum2 ========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002

SQL Server 2008: Spatial Data, Part 5

In the previous part of this series, I demonstrated instance methods that transformed a single Geometry type into another useful Geometry.  In this post, we'll go a step further and show methods that allow two or more instances to interact with one another in order to produce a new Geometry.

For my baseline, I'll use two Polygons that overlap each other:

  DECLARE @g geometry 
        = 'POLYGON((10 10, 40 10, 40 40, 10 40, 10 10))'DECLARE @h geometry 
        = 'POLYGON((30 30, 50 30, 50 50, 30 50, 30 30))'



STDifference() returns a new instance consisting of all points from the base instance that do not contain points from the parameter instance.


  SELECT @g.STDifference(@h).ToString();


POLYGON ((10 10, 40 10, 40 30, 30 30, 30 40, 10 40, 10 10))




STIntersection() returns a new instance containing only the points that are in common between the base instance and the parameter instance.


  SELECT @g.STIntersection(@h).ToString();


POLYGON ((30 30, 40 30, 40 40, 30 40, 30 30))




STSymDifference() returns a new instance containing only the points that are unique to both the base instance and the parameter instance (i.e., it excludes the points that STIntersection() would return).

In this case, the set of points is actually two different Polygons.  Because STSymDifference() needs to return a single instance of something, it will wrap those two Polygons into a collection (MultiPolygon).


  SELECT @g.STSymDifference(@h).ToString();


MULTIPOLYGON (((40 30, 50 30, 50 50, 30 50, 30 40, 40 40, 40 30)), 
              ((10 10, 40 10, 40 30, 30 30, 30 40, 10 40, 10 10)))




STUnion() returns a new instance containing all of the points of the base instance and the parameter instance merged together.


  SELECT @g.STUnion(@h).ToString();


POLYGON ((10 10, 40 10, 40 30, 50 30, 50 50, 30 50, 
30 40, 10 40, 10 10))



Blended Types

The instance methods described above do not work just for Polygons.  You can actually use them on different types, or even collections of different types. 

For instance, if we look at the results of using a LineString as the base instance and a Polygon as the parameter instance, STDifference() will return a MultiLineString constisting of the points from the original LineString that do not lie within the Polygon:

  DECLARE @g geometry = 'LINESTRING(9 9, 40 40)'DECLARE @h geometry = 'POLYGON((15 15, 15 30, 30 30, 30 15, 15 15))'SELECT @g.STDifference(@h).ToString();


MULTILINESTRING ((40 40, 30 30), (15 15, 9 9))



STIntersection() will return the points from the original LineString that do lie within the Polygon:

  SELECT @g.STIntersection(@h).ToString();


LINESTRING (30 30, 15 15)



STUnion() cannot determine a single common Geometry type, so it will return a mixed collection of types:

  SELECT @g.STUnion(@h).ToString();


     LINESTRING (40 40, 30 30), 
     POLYGON ((15 15, 30 15, 30 30, 15 30, 15 15)), 
     LINESTRING (15 15, 9 9)



SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

(next part) SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com