Tuesday, January 18, 2011

Using LINQ in a foreach loop to build a query

I love using LINQ because it makes building up queries so easy. The exception to this is when using a foreach with a union. I suspect it is just the foreach that is causing this behavior, but for sure the behavior is consistent when using both.

Let’s pretend that you have a list of people that you let the user select one or more people from. You want to pull back the person records of just the records they selected. Maybe you want to do a database round-trip to get the latest data before an edit or may you just want to filter the list in memory. In either case, this post addresses both of them since LINQ is used in both cases. The only difference may be is that instead of a List<People> you may be using an IQueryable<Person> if you are using the database. The tricky step is identical. I know because I was originally seeing the issue on the database example I had. I wrote this using just standard LINQ (to Objects) and in memory objects to make the running of this easy for you.

With that said, I have created a Person class below that is meant to simulate the Person table in the database. I then populate it with three records (Person objects). I then specify that I want just the people with the ids 1 or 2. I then run the query, dump the results. This shows the issue. We only get one record back. Then I run the next query and dump the results. This time the results are correct and give us two records as expected.

What is the difference in implementation you ask? There is very little different. In both cases there is a foreach loop and use the union method to build up a query. Union is similar to using an OR in the where clause of a SQL statement, but often the performance is much better. The downside is that much of the query is duplicated and makes maintenance a pain. No so with LINQ. It is very easy to do so here.

The ONLY difference between what works and what doesn’t work is that I declare a LOCAL variable INSIDE the foreach loop. I then use the LOCAL variable as a parameter in the LINQ expression. I have to assume this causes it to get the value from the variable before the LINQ expression is evaluated and thus it has the value we want. Otherwise, I assume it just looks at the current (last value) of the variable declared in the foreach statement itself at the end after the foreach loop has finished executing. I don’t know exactly how it works, but that is my assumption based on what I see from the behavior of the two methods. I don’t know if I would call this a bug or not, but it is not intuitive to me and makes for it to be very easy to write buggy code if not properly tested.

public class Person
{
public string Name {get; set;}
public int ID {get; set;}
}

List<Person> people = new List<Person>();

void Main()
{
// simulate a database and a table called people that has 3 records
people.Add(new Person{ Name="Brent", ID=1});
people.Add(new Person{ Name="Lance", ID=2});
people.Add(new Person{ Name="Kim", ID=3});

// this could ids of the records the user selected
List<int> idsToFind = new List<int>();
idsToFind.Add(1);
idsToFind.Add(2);

// execute the query
var allPeople = GetPeopleByIDsBroken(idsToFind);

// output the results
allPeople.Dump();

// The results will be ONE record and that is Lance
// This might be surprising because we passed in the ids for Lance and Brent.

// execute the query
var allPeople2 = GetPeopleByIDsWorks(idsToFind);

// output the results
allPeople2.Dump();

// The results will be TWO record and that is Brent and Lance
// This is the expected result and what we wanted.
}




IEnumerable<Person> GetPeopleByIDsBroken(List<int> ids)
{
IEnumerable<Person> builtUpQuery = null;

foreach (int id in ids)
{
// first time through only.
if (builtUpQuery == null)
{
builtUpQuery = people.Where(p => p.ID == id);
}
// all subsequent times
else
{
builtUpQuery = builtUpQuery.Union(people.Where(p => p.ID == id));
}
}

return builtUpQuery;
}

IEnumerable<Person> GetPeopleByIDsWorks(List<int> ids)
{
IEnumerable<Person> builtUpQuery = null;

foreach (int id in ids)
{
// We have to get a local copy of the id.
// Othewise, the value is not taken from the id in the foreach loop
// until the entire expression is evaluated
int localCopyOfID = id;

// first time through only.
if (builtUpQuery == null)
{
builtUpQuery = people.Where(p => p.ID == localCopyOfID);
}
// all subsequent times
else
{
builtUpQuery = builtUpQuery.Union(people.Where(p => p.ID == localCopyOfID));
}
}

return builtUpQuery;
}


3 comments:

Unknown said...

Thanks for this post. Fixed two days of pulling my hair out.

Anonymous said...

thanks so much you helped me alot cause it was a very strange behavior

Anonymous said...

Not a very good idea. You should use Expression Builder