A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

Project Euler Comes to Azeroth

It seems that a lot of my friends are doing Project Euler (according to my High School math teacher, this is pronounced "Oiler").  For example, Bill Wagner has been posting C# solutions, Darrell Hawley has ventured into the Python realm, and Dustin Campbell has been working on F# versions.

I love numbers, and spent a good portion of one summer playing with primes and number fields just for fun (since then, I've discovered WoW, and that takes up all of my time that would otherwise be spent exercising my brain).  Project Euler is actually right up my alley, and while in Seattle, I joked with Dustin that I should post solutions using LUA, and use World of Warcraft as my testbed... 

Problem 1 is finished.  :-D

WoWScrnShot_042408_205532

My no-frills WoW add-on is a simple ACE2 mod that includes AceConsole (for printing to the chat window in the lower left of the screenshot).  I won't bore you with the framework code, but will list my solutions as individual functions on my wiki (where it can grow without polluting my blog's RSS feed). 

As a taste, here's the Problem 1 solution written in LUA.  OnEnable() is my add-on's entry point, and it simply calls into the function Problem001(limit).

function ProjectEuler:OnEnable()
   ProjectEuler:Problem001(1000)
end 

function ProjectEuler:Problem001(upperLimit)
   self:Print("Sum of all numbers less than " .. upperLimit .. " that are divisible by 3 or 5")

   local f = function(factor)
      local n = math.ceil(upperLimit / factor)
      return n * (n-1) * factor / 2
   end

   local result = f(3) + f(5) - f(15)

   self:Print(result)

end

SQL Server 2008: Spatial Data, Part 6

In the Part 4 and Part 5 of the series, I demonstrated some instance methods of the Geometry type that returned a new Geometry based on existing instances.  In this part, I will concentrate on instance methods and properties of the Geometry type that return scalar values and Points.

STArea, STLength


Typically, your spatial data will represent something from the real world.  A LineString may be the collection of points gathered from a GPS device, and together they may represent the path that you took from your home to the office.  A Polygon may be the collection of points around the boundary of governmental territory, like a county or a parish within your state.

In both of these cases, the time will come when you will want to know the length of the LineString (or length of the perimeter of the Polygon) and the area within the Polygon.  OGC standard method STArea() returns a float indicating the area of the instance in square units (or 0, if the instance is not a Polygon and does not have area).  STLength() returns a float indicating the length of the instance in units (or 0, if the instance is a Point and does not have length). 

 

DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
SELECT
@g.STArea(), @g.STLength() Results: Area Length
450 102.426406871193

 Spatial_6_1

 

STCentroid


Thinking back to Mr. Bollenbacher's 10th Grade Geometry class, we had to use a compass and straight edge to construct lines bisecting the angles of polygons (primarily triangles).  The point where the angle bisectors met was the exact center, or centroid, of the shape.  Centroids are important because any line that passes through a centroid will divide the Polygon into two parts of equal area.  It should be noted that a Centroid may not actually be on the surface of a Polygon.

The OGC standard method STCentroid() returns a Point indicating the centroid of the shape.  If the instance is not a Polygon (or MultiPolygon), then NULL will be returned.

DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'
SELECT
@g.STCentroid().ToString() Results: POINT (20 30)

Spatial_6_2 
Note: SpatialViewer displays an individual point as an X.

 

STWithin, STContains


Two OGC standard methods returns a 1 or 0 indicating whether all of the points of one instance exist entirely inside of another instance.  STWithin() tests whether the base instance is inside of the parameter instance, while STContains() tests whether the parameter instance is inside of the base instance.

DECLARE @g geometry = 'POLYGON ((10 10, 13 30, 30 30, 30 15, 10 10))'
DECLARE @h geometry = 'LINESTRING (16 16, 16 24, 25 18)'
SELECT @g.STContains(@h), @g.STWithin(@h)
SELECT @h.STContains(@g), @h.STWithin(@g)
Results: 1 0

0 1

Spatial_6_3

 

ST is {something}?


There are a number of OGC standard methods to check whether a given instance meets certain specifications: STIsClosed, STIsEmpty, STIsRing, STIsSimple, STIsValid

CLOSED: An instance is considered to be closed if the start point is the same as the end point.  By definition, a Polygon has to be closed, and a Point is not closed.  That really only leaves LineString.  For a collection of objects to be considered closed, all of its members must be closed.

EMPTY: A Geometry can be initialized in a special way as to not contain any points.  In SQL terms, this is sort of like having a NULL value, except it really is an instantiated object.  For example, LINESTRING EMPTY is a valid LineString, but it has no points.  Another humorous example is POINT EMPTY, which initializes to a Point without a Point....  so it's kind of Pointless, right?  (thank you, I'm here all week, tip your waitress).

RING: An instance is considered to be a ring if it is both Closed and Simple.

SIMPLE: An instance is considered to be simple if it does not cross over itself or otherwise touch itself.  For example, a LineString forming the letter 'S' is simple because it never comes in contact with itself.  But, a LineString that forms a Figure-Eight (8) is not simple because it would have to cross over itself.  Likewise, two circles (MultiPolygon) stacked on top of each other to form a Figure-Eight would not be simple because they touch each other.

VALID: A Geometry can cross over itself, but it cannot legally trace over itself.  That is, picture a LineString that backtracks over itself at some point, kind of like how I write my letter "P".  This is not considered to be Valid.

Spatial_6_4

Tip: SQL Server will allow an invalid Geometry to be instantiated, and Microsoft has provided an extension method called MakeValid() that will convert the invalid instance into a valid instance.  In the letter "P" example, instead of the vertical line going down and then back up (as I draw it by hand), the valid form will eliminate the duplication of points simply by start at the bottom and going up (so that the LineString never traces over itself).  If it's not possible to simplify a shape in this way so that there is only one continuous path, then it will be broken up into multiple valid shapes (i.e., a MultiLineString, etc).

 

STX, STY, Z, M


Individual coordinates of a Point can be accessed via the OGC Standard properties STX and STY.  Three-dimensional Points also have a Z coordinate, which can be accessed via Microsoft's extended Z property.  Likewise, four-dimensional Points have a M (for Measure) coordinate, which can be accessed via Microsoft's extended M property.  If Z or M is not defined for a given point, then NULL will be returned.

DECLARE @g geometry = 'POINT(1 2)'
DECLARE
@h geometry = 'POINT(1 2 3 4)'
SELECT
@g.STX, @g.STY, @g.Z, @g.M SELECT @h.STX, @h.STY, @h.Z, @h.M Results: 1 2 NULL NULL 1 2 3 4

 

STPointOnSurface


When working with spatial data, especially without using a viewer, it can be kind of difficult to pick an arbitrary point that is inside of a Polygon (or on a LineString).  Thankfully, the OGC standard method STPointOnSurface() does just that.  Given a Geometry instance, it will return a somewhat random point that is guaranteed to be located within the interior of that instance.

DECLARE @g geometry = 'POLYGON((10 10, 14 15, 50 12, 45 30, 10 30, 10 10))'
SELECT @g.STPointOnSurface().ToString() Results: POINT (23 25)

Spatial_6_5

 

STSrid


All of my examples to this point have used the default Spatial Reference ID of 0 (for the Geometry type) simply because I have not been specifying one.  The SRID is the mechanism that defines one geometry as being based on a different set of parameters than a geometry with a different SRID. 

For example, you may have a set of shapes defined where each unit represents one meter, while another set of shapes is based on a reference system where each unit represents 1.5 inches.  It's totally legal to mix these shapes together the same column of a table in your database, provided that you assign a different SRID to each.  SQL Server does not need to know what units represent, because it will never permit the interaction of a shape from one SRID with a shape from another SRID. 

The OGC standard property STSrid will get (or set) the SRID of the Geometry instance.

-- @g will have the default SRID = 0
DECLARE @g GEOMETRY = 'POLYGON((10 10, 10 40, 40 40, 10 10))'

-- @h is defined with SRID = 123
DECLARE @h GEOMETRY = GEOMETRY::STGeomFromText('POLYGON((10 10, 40 10, 40 40, 10 10))', 123) select @g.STUnion(@h).ToString() -- Returns NULL because of different SRIDs. But, let's change
-- @g to use SRID = 123


SET
@g.STSrid = 123 SELECT @g.STUnion(@h).ToString() -- Returns POLYGON ((10 10, 40 10, 40 40, 10 40, 10 10))

 

Jason, What's Next?


Enough of this flat Earth stuff!  In the next part, I'll explore the Geography data type.  This is where things really start to get interesting.  Stay tuned!

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

SQL Server 2008: Spatial Data, Part 5

(next part) SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

Using PIVOT and RANK Together

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it.  Consider the following somewhat normalized table:

AccountNum

Name

Email

0851774002 

John Doe  

jd@foo.com

0851774003   

John Doe   

jd@foo.com

0851774001   

John Doe   

jd@foo.com

0851774100   

John Doe   

jd@foo.com

0851693000   

Bob Public   

bob@bar.com

1138299000   

Jane Doe   

JaneD@baz.com

1353452000   

Jane Doe   

JaneD@baz.com

1028030000   

Jane Doe 

JaneD@baz.com

0851636000   

Jane Doe   

JaneD@baz.com


What he wanted was to collapse the data to one row per person, with a column for each Account Number.  That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table.  So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on.  This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition.  In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person).  By prefixing some text to the rank number, we end up with something like:

SELECT Name, Email, AccountNum, 'AccountNum' + CAST(RANK() OVER ( PARTITION BY Name, Email ORDERBY AccountNum ) AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned.  But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list.  In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values".  This restriction is a little bit of a downside because we need knowledge of the values in the column.  Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT * FROM ( SELECT Name, Email, AccountNum, 'AccountNum' + CAST ( RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum ) AS VARCHAR(10)) R FROM myTable ) AS rankedSource PIVOT ( MAX (AccountNum) FOR R IN ( [AccountNum1], [AccountNum2], [AccountNum3], [AccountNum4], [AccountNum5], [AccountNum6], [AccountNum7], [AccountNum8], [AccountNum9], [AccountNum10] ) ) AS pivottable

 

And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name Email AccountNum1 AccountNum2 ========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002

SQL Server 2008: Spatial Data, Part 5

In the previous part of this series, I demonstrated instance methods that transformed a single Geometry type into another useful Geometry.  In this post, we'll go a step further and show methods that allow two or more instances to interact with one another in order to produce a new Geometry.

For my baseline, I'll use two Polygons that overlap each other:

  DECLARE @g geometry 
        = 'POLYGON((10 10, 40 10, 40 40, 10 40, 10 10))'DECLARE @h geometry 
        = 'POLYGON((30 30, 50 30, 50 50, 30 50, 30 30))'

Spatial_5_1

STDifference


STDifference() returns a new instance consisting of all points from the base instance that do not contain points from the parameter instance.

 

  SELECT @g.STDifference(@h).ToString();

Result:

POLYGON ((10 10, 40 10, 40 30, 30 30, 30 40, 10 40, 10 10))

Spatial_5_2

 

STIntersection


STIntersection() returns a new instance containing only the points that are in common between the base instance and the parameter instance.

 

  SELECT @g.STIntersection(@h).ToString();

Result:

POLYGON ((30 30, 40 30, 40 40, 30 40, 30 30))

Spatial_5_3

 

STSymDifference


STSymDifference() returns a new instance containing only the points that are unique to both the base instance and the parameter instance (i.e., it excludes the points that STIntersection() would return).

In this case, the set of points is actually two different Polygons.  Because STSymDifference() needs to return a single instance of something, it will wrap those two Polygons into a collection (MultiPolygon).

 

  SELECT @g.STSymDifference(@h).ToString();

Result:

MULTIPOLYGON (((40 30, 50 30, 50 50, 30 50, 30 40, 40 40, 40 30)), 
              ((10 10, 40 10, 40 30, 30 30, 30 40, 10 40, 10 10)))

 Spatial_5_4

 

STUnion


STUnion() returns a new instance containing all of the points of the base instance and the parameter instance merged together.

 

  SELECT @g.STUnion(@h).ToString();

Results:

POLYGON ((10 10, 40 10, 40 30, 50 30, 50 50, 30 50, 
30 40, 10 40, 10 10))

Spatial_5_5

 

Blended Types


The instance methods described above do not work just for Polygons.  You can actually use them on different types, or even collections of different types. 

For instance, if we look at the results of using a LineString as the base instance and a Polygon as the parameter instance, STDifference() will return a MultiLineString constisting of the points from the original LineString that do not lie within the Polygon:

  DECLARE @g geometry = 'LINESTRING(9 9, 40 40)'DECLARE @h geometry = 'POLYGON((15 15, 15 30, 30 30, 30 15, 15 15))'SELECT @g.STDifference(@h).ToString();

Results:

MULTILINESTRING ((40 40, 30 30), (15 15, 9 9))

Spatial_5_6 

 

STIntersection() will return the points from the original LineString that do lie within the Polygon:

  SELECT @g.STIntersection(@h).ToString();

Results:

LINESTRING (30 30, 15 15)

Spatial_5_7

 

STUnion() cannot determine a single common Geometry type, so it will return a mixed collection of types:

  SELECT @g.STUnion(@h).ToString();

Results:

GEOMETRYCOLLECTION 
(
     LINESTRING (40 40, 30 30), 
     POLYGON ((15 15, 30 15, 30 30, 15 30, 15 15)), 
     LINESTRING (15 15, 9 9)
)

Spatial_5_8

 

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

SQL Server 2008: Spatial Data, Part 4

(next part) SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com

SQL Server 2008: Spatial Data, Part 4

In this, the 4th post in a series (Part 1, Part 2, Part 3) on the new spatial data types in SQL Server 2008, I'll explain some of the methods that are used to transform a single Geometry instance into another useful Geometry instance.  Note that I'm using Geometry for simplicity, but these techniques also work with GeographyEdit: Ok, after starting to take a hard look at Geography, I realized that A LOT of the methods that Geometry offers are not implemented in Geography.  :-/  Sorry to mislead you.

 

Useful TipTo help me to visualize geometries as I explore the capabilities of the spatial data type, I've been using SpatialViewer, a tool written by fellow SQL Server MVP Simon Sabin.

 

As a baseline, I'll be using my highest quality, hand-drawn letter "S".  This LineString is a simple geometry (it does not cross over itself), but is not closed (the starting and ending points are not the same).

 

  DECLARE @g GEOMETRY
SET @g = 'LINESTRING (  69 26, 69 23, 69 21, 67 20, 65 20, 
          63 18, 58 17, 52 17, 51 17, 49 17, 45 18, 44 20, 
          44 21, 42 26, 42 29, 42 32, 42 35, 43 35, 47 38, 
          50 41, 55 42, 58 42, 65 44, 66 44, 67 44, 68 45, 
          69 47, 70 48, 70 50, 71 51, 70 56, 68 60, 68 63, 
          66 65, 65 66, 63 68, 60 71, 59 71, 57 71, 55 71, 
          51 69, 45 65, 44 63, 42 62, 41 59, 41 57, 41 56, 
          41 54, 42 53 )'

Spatial_S_1

 

STEnvelope

A bounding box is a rectangle that is defined by the combination of minimum and maximum X and Y values found in a given Geometry instance.  The OGC standard method STEnvelope() returns the bounding box (a Polygon) for the instance on which it is invoked.  All points of the original instance lie within the new Polygon.

Note: In the following T-SQL examples, I will trail the method calls with a ToString() so that we can examine the resulting WKT.  Normally, you would just work with the geometry (i.e., you wouldn't need call ToString())..

 

  SELECT @g.STEnvelope().ToString()

Result:

POLYGON ((41 17, 71 17, 71 71, 41 71, 41 17))

Spatial_S_4 

Note: In these pictures, the original geometry is drawn in black, and the new geometry is superimposed in red.  If the new geometry has area, then that area will appear as light brown or tan.

 

STConvexHull

biten_apple_xl If something is convex, then it is thought of as having a surface that is curved or rounded outward from the center.  For example, the side of an apple is convex.  If you happen to take a bite out of the apple, however, then that the portion that has been removed is considered to be concave.  To "correct" the concave portion of the half-eaten apple, we could fill in the hole with something like clay, but we would only need to restore enough material so that there was a straight line from the top of the hole to the bottom (and the same for side to side).

The OGC standard method STConvexHull() returns the minimal bounding convex Polygon for a geometry instance.  That is, any convex parts of the original instance will be preserved, and any concave parts will be "filled in", so to speak, by defining a straight line to bypass them.  Like STEnvelope(), all points of the original instance lie within the new Polygon.

 

  SELECT @g.STConvexHull().ToString()

Result:

POLYGON ((71 51, 70 56, 68 63, 66 65, 65 66, 63 68, 60 71, 
          59 71, 57 71, 55 71, 51 69, 45 65, 42 62, 41 59, 
          41 57, 41 56, 41 54, 42 26, 44 20, 45 18, 49 17, 
          51 17, 52 17, 58 17, 63 18, 67 20, 69 21, 71 51))

Spatial_S_2

 

STBuffer

What if you have an existing shape, and you want to make it bigger, but preserve the general... uh, shape of it?  Then you would use the OGC standard method STBuffer(distance)!  This method returns a Polygon that inflates the area around the original geometry instance by a number of units that you provide.  Note that if the original instance is a Point, then the result will be a circle with a radius of the number of units that you provided.  It is also possible to deflate an existing Polygon by supplying a negative buffer value.

 

  SELECT @g.STBuffer(5).ToString()

Result:

POLYGON ((49 12, 51 12, 52 12, 58 12, 
          58.246719360351562 12.00609016418457, 
          58.492688179016113 12.024333953857422, 
          58.737458229064941 12.054683685302734, 
          58.98058032989502 12.097097396850586, 
          (... snipped for clarity ...)
          48.693824768066406 12.009382247924805, 49 12))

Spatial_S_3

What happens if the amount of buffering forces the inflated area to overlap with itself?  The resulting Polygon may develop holes (the area within a hole is still considered part of the exterior of the Polygon).  Here the buffer increases to 8, creating a hole:

 

  SELECT @g.STBuffer(8).ToString()

Result:

POLYGON (( exterior ring points ), ( interior ring points ))

Spatial_S_5

 

STExteriorRing, STInteriorRingN

The result of the very last example was a Polygon with one interior hole.  OGC standards provide a way to access the various components of a polygon (exterior ring and interior rings) individually. 

STExteriorRing() returns just the closed LineString of the Polygon itself.

 

  SELECT @g.STBuffer(8).STExteriorRing().ToString()

Result:

LINESTRING (49 9, 51 9, 52 9, 58 9, 
            58.394750595092773 9.0097446441650391, 
            (... snipped for clarity ...)
            48.022533416748047 9.0599403381347656, 
            48.5101203918457 9.0150127410888672, 49 9)

Spatial_S_6

Similarly, STInteriorRingN(n) is used to return the closed LineString of interior rings. 

Note: This method is accessing a member of a GeomCollection by index, which I plan to cover in a later post.  What's important to know now is that indexing starts at 1, and you will get an error if you specify an index that does not actually exist in the collection.

  
  SELECT @g.STBuffer(8).STInteriorRingN(1).ToString()

Result:

LINESTRING (48.893725268961383 48.937175344014442, 
            49.084709167480469 49.279642105102539, 
            (... snipped for clarity ...)
            48.893725268961383 48.937175344014442)
  Spatial_S_7
 

Extended Methods

Microsoft has implemented some additional methods on Geometry instances to perform tasks that are beyond the scope of the OGC standards.

Reduce(tolerance) is a method that will simplify a given instance using the Douglas-Peucker algorithm.  The result is a an approximation of the original instance containing a fewer number of points.  The accuracy of the new shape improves as the provided tolerance value approaches zero, but more points are necessary to provide that accuracy. 

 

  SELECT @g.Reduce(5).ToString()

Result:

LINESTRING (69 26, 49 17, 42 26, 42 35, 70 48, 
            60 71, 42 62, 42 53)
Spatial_S_8

 

 

BufferWithTolerance(distance, tolerance, relative) is very similar in function to STBuffer(), only it gives you more control over the accuracy of the result.  The tolerance parameter, like in the Reduce() method, controls the amount of acceptable error a resulting line segment can be from what is ideal. 

To understand how tolerance factors into the result, then picture a Point that is buffered into a circle.  To truly represent a circle, we would need a Polygon consisting of a infinite number of points.  This is not practical, but by allowing for error, we can generate a regular Polygon with any number of sides that behaves like a circle.  Smaller tolerances result in Polygons with a very large number of very small sides.  Likewise, larger tolerances result in Polygons with fewer sides, and your circle will start to resemble things like octagons, hexagons, and even squares.

Compare the image below generated with a less accurate tolerance (left image), to the STBuffer(8) example from above (right image).  The difference is subtle, primarily because the scale that we're working with is pretty small.  But, one difference to note is that the ends of the inflated "S" on the left are straight while the ones on the right are rounded.  Overall, the the image on the right has more points, and thus is a more accurate the original "S" shape than the image on the left.

 

  SELECT @g.BufferWithTolerance(8, 10, 0).ToString()

Result:

POLYGON (( ring points ), ( hole points ))
  Spatial_S_9
  Spatial_S_5

SQL Server 2008: Spatial Data, Part 1

SQL Server 2008: Spatial Data, Part 2

SQL Server 2008: Spatial Data, Part 3

(next part) SQL Server 2008: Spatial Data, Part 5

SQL Server 2008: Spatial Data, Part 6

SQL Server 2008: Spatial Data, Part 7

SQL Server 2008: Spatial Data, Part 8

kick it on DotNetKicks.com