Tuesday, October 19, 2010

Using GroupBy with LINQ

I love LINQ, but sometimes the syntax can be a little strange. In particular, I find the group by syntax to be a little weird, but not too bad once I broke it down.

For this example, let’s assume we have a person table. The person table has a column called Score and has some other fields that don’t really matter for this example. The related table is a Gender table. It has one field of importance and that is the Name field. The two rows in this table are Male and Female. There is a foreign key in the Person table that points to the Gender table. Think of it this way. The person table is the main table, and there could be a drop down list to select the gender for the person.

With that in mind, we want to know what the total score for Males and Females. We want to do this using a group by. The results will be two columns: Gender and Score.

Here is an example of the output that we desire.

Gender Score
Female 2013
Male 1923

Here is the lamdba based LINQ query we would need to do this.

   People
   .GroupBy (e => new {Gender = e.Gender.Name} )
   .Select (byGenderGroup =>
         new 
         {
            Gender = byGenderGroup.Key.Gender,
            Score = byGenderGroup.Sum (t => t.Score)
         }
   );

If we take this line by line we will see that this really isn’t so different from SQL that would be generated. Here is the SQL

   SELECT SUM([t0].[Score]) AS [Score], [t1].[Name] AS [Gender]
FROM [Person] AS [t0]
INNER JOIN [Gender] AS [t1] ON [t1].[ID] = [t0].[GenderID]
GROUP BY [t1].[Name]

Let’s go line by line of the LINQ code.

People is the main table we are working with

GroupBy creates an anonymous type (that way we can easily add additional columns to group by). In this case we group by the Gender.Name just like the last line of the SQL statement.

The Select lines create another anonymous type so that we can select just the columns we want to return. Notice that byGenderGroup doesn’t have a property called Gender. Since byGenderGroup doesn’t represent a person record and actually represents the grouped results, we can access any of the columns that we have grouped by in the above GroupBy line. In this case, Key collection only gives us one property, and that is Gender. The byGenderGroup does have many other methods that are available though. One example is the sum method. In general byGenderGroup has all the aggregate functions you would have in SQL.

FYI, you can also do this without using lamdba expressions, though I personally don’t like the syntax and find it confusing.

var results = from p in People group p by new {Gender = p.Gender.Name} into byGenderGroup
select new {byGenderGroup.Key.Gender, Score = byGenderGroup.Sum (t => t.Score)};

The choice is yours.

No comments: