A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

Using SQL Server Spatial Objects as ADO.NET Parameter Values

I've previously mentioned that the SQL Server 2008 Spatial data types are freely available for use in your .NET applications, regardless of whether you have SQL Server 2008 or not.  This allows you to incorporate some powerful spatial capabilities right into your application. 

(Look for "Microsoft SQL Server System CLR Types" on this page: http://www.microsoft.com/downloads/details.aspx?FamilyID=228DE03F-3B5A-428A-923F-58A033D316E1&displaylang=en )

However, in most usage scenarios, there will come a time when you have an instance of a SQL Server spatial object in your .NET application, and need to commit it to your SQL Server 2008 database.  How would you do this, without losing fidelity or resorting to serialization of the object to WKT first?

The solutions is to create a Parameter object of type System.Data.SqlDbType.Udt.  Then set the UdtTypeName parameter to the SQL Server-recognized type name (i.e., for SqlGeometry, you would simply use Geometry).

The following code demonstrates executing an UPDATE statement that sets the value of a Spatial field to a newly constructed object.

using (SqlConnection conn = new SqlConnection("Server=.;Integrated Security=true;Initial Catalog=scratch")) { using (SqlCommand cmd = new SqlCommand("UPDATE fe_2007_us_zcta500 SET Boundary=@boundary WHERE id=@id", conn)) { SqlParameter id = cmd.Parameters.Add("@id", System.Data.SqlDbType.Int); SqlParameter boundary = cmd.Parameters.Add("@boundary", System.Data.SqlDbType.Udt); boundary.UdtTypeName = "geometry"; SqlGeometry geom = SqlGeometry.Parse("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"); boundary.Value = geom; id.Value = 123; conn.Open(); cmd.ExecuteNonQuery(); conn.Close(); } }

Coding in SQL Server: An Evolution

Tuesday at the NWNUG meeting, Steven Smith spoke on various ways to squeeze performance out of your ASP.NET applications.  This was a fantastic talk, and gave me plenty to think about (since ASP.NET is not my forte, I only consider myself to have an intermediate skillset on this topic).

One suggestion that he made involved caching database writes.  That is, instead of immediately writing logging-type information to the database for every request, which is a relatively expensive operation considering the small payload size, that you could accumulate them in a short-term cache, and then perform the write operation periodically.  Fewer database calls = faster performance.

In his example, he spoke of his advertisement server that might serve many impressions per second, but he doesn't want each impression to incur an expensive database write.  So, he keeps track of the activity locally, and then persists to the database every 5 seconds using a single database call containing multiple data points.

The code that Steve demonstrated utilized XML to contain the data within a single block of text (read: can be passed in as a single parameter to a stored procedure):

<ROOT>
<Activity customerId="ALFKI" viewCount="5" />
<Activity customerId="ANATR" viewCount="7" />
</ROOT>

 

Now, consuming XML from T-SQL is an area that I know very well, so I cringed a little bit when Steve showed the actual stored procedure code itself:

CREATE PROCEDURE dbo.BulkLogCustomerViews @@doc text -- XML Doc... AS DECLARE @idoc int -- Create an internal representation (virtual table) of the XML document... EXEC sp_xml_preparedocument @idoc OUTPUT, @@doc -- Perform UPDATES UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount + ox2.viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( [customerId] NCHAR(5) , viewCount int ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM OPENXML (@idoc, '/ROOT/Activity',1) WITH ( customerId NCHAR(5) , viewCount int ) ox WHERE NOT EXISTS (SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId) -- Remove the 'virtual table' now... EXEC sp_xml_removedocument @idoc

Now, to Steve's credit, this code works just fine, and can probably be used as-is on all versions of SQL Server from 7.0 through 2008.  But, since we really don't write ASP applications consisting entirely of Response.Write any longer, I'd like to see Steve update his demo to use more modern techniques on the database as well.  ;-)

The first thing that he could do is update the procedure to utilize the XML data type that was first introduced in SQL Server 2005.  This would simplify the code a little bit, and would get rid of the dependency on the COM-based MSXML.dll, which the sp_xml_preparedocument and OPENXML() uses.

CREATE PROCEDURE dbo.BulkLogCustomerViews @doc xml AS -- Perform UPDATES UPDATE TopCustomerLog SET TopCustomerLog.ViewCount = TopCustomerLog.ViewCount + ox2.viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox2 WHERE TopCustomerLog.[customerId] = ox2.[customerId] -- Perform INSERTS INSERT INTO TopCustomerLog ( CustomerID , ViewCount ) SELECT [customerId] , viewCount FROM ( SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') viewCount FROM @doc.nodes('/ROOT/Activity') as T(activity) ) ox WHERE NOT EXISTS ( SELECT customerId FROM TopCustomerLog WHERE TopCustomerLog.customerId = ox.customerId )

 

Note that the XML data type in SQL Server doesn't need to be a well-formed document.  In this case, Steve could just pass in series of "Activity" elements (no "ROOT" element would be required by SQL Server, so he would also be able to simplify the .NET code that actually creates the XML string):

<Activity customerId="ALFKI" viewCount="5" />
<Activity customerId="ANATR" viewCount="7" />

 

Consequently, the XPath (XQuery, actually) within the nodes() method of the stored procedure code would need to change as well:

@doc.nodes('Activity') as T(activity)


But, we can kick this up a notch and use some SQL Server 2008 features as well. First, there's new "Upsert" capabilities (MERGE statement) that tries to simplify what Steve does with the UPDATE followed by INSERT:

CREATE PROCEDURE dbo.BulkLogCustomerViews @doc xml AS MERGE TopCustomerLog AS target USING (SELECT T.activity.value('@customerId', 'nchar(5)') as CustomerID, T.activity.value('@viewCount', 'int') as viewCount FROM @doc.nodes('Activity') as T(activity)) AS source ON (target.CustomerID = source.CustomerID) WHEN MATCHED THEN UPDATE SET target.ViewCount = target.ViewCount + source.viewCount WHEN NOT MATCHED THEN INSERT (CustomerID, ViewCount) VALUES (source.CustomerID, source.viewCount);

 

One more thing that could be done to further simplify this T-SQL is to use a Table-valued Parameter instead of the XML.  This would allow Steve to pass a fully populated table of data into the stored procedure and consume it directly by the MERGE statement.

The first step is to create a T-SQL type that defines the table structure of the parameter (this is a one-time operation, unless the table structure changes):

CREATE TYPE CustomerViewType AS TABLE ( CustomerID nchar(5) NOTNULL, ViewCount intNOTNULL );

 

Now, a parameter can be defined of this type, and used just like any other table-value variable:

ALTER PROCEDURE dbo.BulkLogCustomerViews @views CustomerViewType READONLY AS MERGE TopCustomerLog AS target USING @views AS source ON (target.CustomerID = source.CustomerID) WHEN MATCHED THEN UPDATE SET target.ViewCount = target.ViewCount + source.viewCount WHEN NOT MATCHED THEN INSERT (CustomerID, ViewCount) VALUES (source.CustomerID, source.viewCount);

 

On the ADO.NET side, the table-valued parameter could be represented as a DataTable object (other options also exist), and can be assigned directly as the value of the stored procedure's parameter object:

// Create a data table, and provide its structure DataTable customerViews = new DataTable(); customerViews.Columns.Add("CustomerID", typeof(string)); customerViews.Columns.Add("ViewCount", typeof(int)); // Fill with rows customerViews.Rows.Add("ALFKI", 5); customerViews.Rows.Add("ANATR", 7); using (SqlConnection conn = new SqlConnection("...")) { SqlCommand cmd = conn.CreateCommand(); cmd.CommandType = System.Data.CommandType.StoredProcedure; cmd.CommandText = "dbo.BulkLogCustomerViews"; SqlParameter param = cmd.Parameters.AddWithValue("@views", customerViews); conn.Open(); cmd.ExecuteNonQuery(); }

 

SQL Server 2008 RC0 Install: Sql2005SsmsExpressFacet

 

This morning's goal was to quickly install SQL Server 2008 RC0, and then move on with some project work.  Let's just say that my project work should resume by this afternoon...

In the interest of disk space, I removed an existing installation of SQL Server 2005 Developer Edition.  And then the installation of 2008 RC0 began by installing the "Microsoft.NET Framework 3.5 SP1 (Beta)"...  which is probably "install-smell" for me needing to pave my machine when the product finally RTM's.  But, I digress...

The installation went pretty smoothly until it came time for the "System Configuration Check" that takes place after you select everything that you would like to install, but before the files actually get installed.  In my case, this check failed because "The SQL Server 2005 Express Tools are installed.  To continue, remove the SQL Server 2005 Express Tools."  (This is the "Sql2005SsmsExpressFacet" rule of the installation)

Thank you, Microsoft, for that succinct failure message that includes instructions for resolution...  Except, I didn't have the SQL Server 2005 Express Tools installed.  They didn't show up in my Programs list, in the Start menu, or on my C: drive at all.  How am I to uninstall something that isn't installed?  Hrmmm....

After about an hour's search around my hard drive, I finally went into the registry, and discovered the following key:

HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\ShellSEM

Note: Jan Sotola reports that the affected 64-bit version key is:

HKLM\Software\Wow6432Node\Microsoft\...
   ...\Microsoft SQL Server\90\Tools\ShellSEM

Contained within was some registry information belonging to Red Gate SQL Prompt.  Apparently, despite my removing of the SQL 2005 Express Tools some time ago, this registry key was not removed because the Red Gate information was still there.

On a hunch, I renamed the key to "ShellSEM.old", and the SQL Server 2008 installation carried on.

UPDATE: Shortly after posting this, Theo Spears from Red Gate sent the following email:

"I apologise for this issue; the SQL Prompt team here has been working to address it. You and your readers may be interested to hear that we now have a version which works with SQL Server 2008 RC0, and no longer blocks the installation. To get a copy send us an email at support@red-gate.com"

 

I should clarify that my little rant above was not targeted at Red Gate, but I'm so happy to hear that they are proactively working to resolve this little issue.  I would have just liked for Microsoft to use more than a single registry key as evidence of a conflicting product installation, that's all.

Using PIVOT and RANK Together

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it.  Consider the following somewhat normalized table:

AccountNum

Name

Email

0851774002 

John Doe  

jd@foo.com

0851774003   

John Doe   

jd@foo.com

0851774001   

John Doe   

jd@foo.com

0851774100   

John Doe   

jd@foo.com

0851693000   

Bob Public   

bob@bar.com

1138299000   

Jane Doe   

JaneD@baz.com

1353452000   

Jane Doe   

JaneD@baz.com

1028030000   

Jane Doe 

JaneD@baz.com

0851636000   

Jane Doe   

JaneD@baz.com


What he wanted was to collapse the data to one row per person, with a column for each Account Number.  That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table.  So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on.  This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition.  In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person).  By prefixing some text to the rank number, we end up with something like:

SELECT Name, Email, AccountNum, 'AccountNum' + CAST(RANK() OVER ( PARTITION BY Name, Email ORDERBY AccountNum ) AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned.  But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list.  In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values".  This restriction is a little bit of a downside because we need knowledge of the values in the column.  Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT * FROM ( SELECT Name, Email, AccountNum, 'AccountNum' + CAST ( RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum ) AS VARCHAR(10)) R FROM myTable ) AS rankedSource PIVOT ( MAX (AccountNum) FOR R IN ( [AccountNum1], [AccountNum2], [AccountNum3], [AccountNum4], [AccountNum5], [AccountNum6], [AccountNum7], [AccountNum8], [AccountNum9], [AccountNum10] ) ) AS pivottable

 

And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name Email AccountNum1 AccountNum2 ========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002