A View Inside My Head

Jason's Random Thoughts of Interest
posts - 85, comments - 178, trackbacks - 22

Using PIVOT and RANK Together

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it. Consider the following somewhat normalized table:

AccountNum

Name

Email

0851774002

John Doe

jd@foo.com

0851774003

John Doe

jd@foo.com

0851774001

John Doe

jd@foo.com

0851774100

John Doe

jd@foo.com

0851693000

Bob Public

bob@bar.com

1138299000

Jane Doe

JaneD@baz.com

1353452000

Jane Doe

JaneD@baz.com

1028030000

Jane Doe

JaneD@baz.com

0851636000

Jane Doe

JaneD@baz.com


What he wanted was to collapse the data to one row per person, with a column for each Account Number. That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table. So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on. This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition. In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person). By prefixing some text to the rank number, we end up with something like:

SELECT    Name,
          Email,
          AccountNum,
          'AccountNum' +
          CAST(
RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum )
AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4
 

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned. But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list. In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values". This restriction is a little bit of a downside because we need knowledge of the values in the column. Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT  *
FROM    ( SELECT    Name,
                    Email,
                    AccountNum,
                    'AccountNum'
                    + CAST(RANK() 
          OVER ( PARTITION BY Name, Email 
                 ORDER BY AccountNum ) AS VARCHAR(10)) R
          FROM      myTable
        ) AS rankedSource 
PIVOT 
( 
    MAX(AccountNum) 
    FOR R IN 
    ( [AccountNum1], [AccountNum2], [AccountNum3],
      [AccountNum4], [AccountNum5], [AccountNum6],
      [AccountNum7], [AccountNum8], [AccountNum9],
      [AccountNum10] ) 
) AS pivottable

And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name       Email         AccountNum1  AccountNum2
========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002

Print | posted on Thursday, April 10, 2008 11:03 AM | Filed Under [ Articles SQL ]

Comments have been closed on this topic.

Powered by: