Bare naked LINQ
Two ever-recurring objections to Quaere are “this isn’t really LINQ for Java” and “cool, but this isn’t truly type-safe”. In this post I’ll dive into both by explaining how LINQ works…
LINQ to Objects
The language integration in C# 3.0 and Visual Basic 9 makes LINQ very readable and makes queries easy to write, but this integration is not required to use LINQ. Consider this simple example:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var lowNums =
    from n in numbers
    where n < 5
    select n;
When the C# compiler compiles this snippet, it will be translated into this C# 2.0 compatible code:
int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
IEnumerable<int> lowNums = Enumerable.Where<int>(numbers, delegate(int n) {
    return n < 5;
});
To explain what actually goes on here, let’s recap how some of the new features in C# 3.0 work. Extension methods are one of the new features that are central to LINQ. As with all of the C# 3.0 features, extension methods are just syntactic sugar.
Extension methods are static methods in a static class with the new this modifier on the first parameter. When you import a class with extension methods, the extension methods become available on all applicable types through instance method semantics. The example below shows how you can declare a static class with extension methods, and how an extension method can be invoked.
public static class MyExtensions {
    // The WhoAmI method "extends" anything deriving from System.Object.
    public static string WhoAmI(this object obj) {
        return obj.GetType().Name;
    }
}
public class Program {
    static void Main(string[] args) {
        string s = "Hello World";
        Console.WriteLine(s.WhoAmI());
    }
}
For readers who are familiar with Ruby, this resembles mix-ins. In Ruby you define a mix-in as a module. Modules cannot be instantiated because they are not classes, but you can mix a module’s methods into a class using the include statement. The example below shows a Ruby implementation of the previous C# 3.0 snippet.
module WhoAmI
  def whoAmI?
    "#{self.class.name}"
  end
end
class Object
  include WhoAmI
end
s = "Quaere? I've got ambition, baby!"
s.whoAmI?
When we compile our “who am I” sample, the compiler translates it into the following:
[Extension]
public static class MyExtensions {
    [Extension]
    public static string WhoAmI(object obj);
}
public class Program {
    static void Main(string[] args) {
        string s = "Hello World";
        Console.WriteLine(MyExtensions.WhoAmI(s));
    }
}
Notice that the C# compiler has erased the this modifier and tagged the extension method with an Extension attribute. Notice also that the s.WhoAmI() method call has been translated into MyExtensions.WhoAmI(s).
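Java has no extension methods, so the desugared form is simply what a Java developer writes by hand. As a minimal sketch (the MyExtensions class and whoAmI method here are hypothetical names mirroring the C# sample, not part of Quaere):

```java
// Hypothetical utility class mirroring the desugared C# sample above.
final class MyExtensions {
    private MyExtensions() {}

    // A plain static method: the "extended" object is just the first argument.
    static String whoAmI(Object obj) {
        return obj.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(MyExtensions.whoAmI(s)); // prints String
    }
}
```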
The System.Linq.Enumerable class used in the second LINQ query above is a static holder class for some of LINQ’s extension methods (the others reside in System.Linq.Queryable, which we’ll discuss shortly). However, we don’t use extension method syntax in our sample, but rather pass the numbers array directly to the static implementation of the Where extension method. This method accepts an IEnumerable<TSource>, which is the source for the query, and a delegate adhering to the Func<TSource,bool> signature. This delegate type is defined in the new System.Core assembly, where all of the core LINQ features can be found.
The delegate is used as a predicate for the where clause of the query, and it is invoked for every element within the source collection. Whenever the predicate returns true for an element, that element is included in the query result which will be returned by the Where method.
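To make the source / predicate pairing concrete for Java readers, here is a rough sketch of the same idea. The WhereSketch class and its where method are hypothetical helpers, not Quaere’s API, and unlike Enumerable.Where (which evaluates lazily) this version builds the result list eagerly to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of Quaere or the JDK.
final class WhereSketch {
    private WhereSketch() {}

    // Invokes the predicate once per element and keeps the elements it accepts.
    static <T> List<T> where(Iterable<T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T element : source) {
            if (predicate.test(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 4, 1, 3, 9, 8, 6, 7, 2, 0);
        List<Integer> lowNums = where(numbers, n -> n < 5);
        System.out.println(lowNums); // prints [4, 1, 3, 2, 0]
    }
}
```

Because the predicate is typed against the element type, passing a predicate over strings to a list of integers would be rejected by the compiler, which is the same kind of type safety the C# version enjoys.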
Finding all numbers less than five is a simple example, so let’s look at something different.
string[] categories = { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
Product[] products = Product.getAllProducts();
var q =
    from c in categories
    join p in products on c equals p.Category into ps
    select new { Category=c, Products=ps };
This sample uses a group join operation to get all the products that match a given category bundled as a sequence. There is another C# 3.0 feature at play here: anonymous types. The new { Category=c, Products=ps } projection produces a new class that has no name you can refer to in source code. This is what makes the var keyword necessary. Note that var is not dynamic typing; it tells the compiler to infer the variable’s static type. The class returned by the projection will be generated by the compiler, and the type of the q variable will be an IEnumerable<T> where T is the generated class. The compiler-generated class will resemble this:
internal sealed class CategoryProducts {
    private readonly string category;
    private readonly IEnumerable<Product> products;
    public string Category {
        get {
            return category;
        }
    }
    public IEnumerable<Product> Products {
        get {
            return products;
        }
    }
    public CategoryProducts(string category, IEnumerable<Product> products) {
        this.category = category;
        this.products = products;
    }
}
If we “C# 2.0-ify” the LINQ query, we’ll end up with the following:
string[] categories = new string[] { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
Product[] products = Product.getAllProducts();
IEnumerable<CategoryProducts> q = Enumerable.GroupJoin(
    categories,
    products,
    delegate(string c) {
        return c;
    },
    delegate(Product p) {
        return p.Category;
    },
    delegate(string c, IEnumerable<Product> ps) {
        return new CategoryProducts(c, ps);
    }
);
While the first sample was easy to follow, this one is not. This is where the “LIN” in LINQ comes in: all of the code you’ve seen in the C# 2.0 friendly samples is generated by the compiler from the LINQ queries we write. The great thing about these queries is that they are type safe all the way through.
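If the three delegates make the group join hard to picture, a small Java sketch may help. Everything here (GroupJoinSketch, groupJoin, the Product class and the sample data) is hypothetical and only approximates what Enumerable.GroupJoin does; in particular it runs eagerly, whereas the LINQ operator is deferred.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical illustration of group join semantics; not Quaere's API.
final class GroupJoinSketch {
    private GroupJoinSketch() {}

    static final class Product {
        final String name;
        final String category;
        Product(String name, String category) {
            this.name = name;
            this.category = category;
        }
    }

    static <O, I, K, R> List<R> groupJoin(
            Iterable<O> outer,
            Iterable<I> inner,
            Function<? super O, ? extends K> outerKey,
            Function<? super I, ? extends K> innerKey,
            BiFunction<? super O, ? super List<I>, ? extends R> resultSelector) {
        // Index the inner sequence by key once.
        Map<K, List<I>> lookup = new HashMap<>();
        for (I item : inner) {
            lookup.computeIfAbsent(innerKey.apply(item), k -> new ArrayList<>()).add(item);
        }
        // Emit one result per outer element, paired with its (possibly empty) group.
        List<R> results = new ArrayList<>();
        for (O item : outer) {
            List<I> matches = lookup.getOrDefault(outerKey.apply(item), Collections.emptyList());
            results.add(resultSelector.apply(item, matches));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList("Beverages", "Seafood", "Dairy Products");
        List<Product> products = Arrays.asList(
                new Product("Chai", "Beverages"),
                new Product("Ikura", "Seafood"),
                new Product("Chang", "Beverages"));
        List<String> summary = groupJoin(
                categories, products,
                c -> c,
                p -> p.category,
                (c, ps) -> c + ": " + ps.size());
        System.out.println(summary); // prints [Beverages: 2, Seafood: 1, Dairy Products: 0]
    }
}
```

Note that a category without products still produces a result with an empty group, which is what distinguishes a group join from a plain inner join.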
Naturally, I did not have the option of extending the Java compiler when I wrote Quaere, and this is why Quaere uses a different approach to construct the queries. Still, Quaere isn’t that far from LINQ. Remember that we’ve only seen LINQ to Objects at play thus far. Things are substantially different when we write queries against other sources such as LINQ to SQL.
LINQ to IQueryable
As mentioned, there is another static holder class called Queryable in LINQ. This is where all the extension methods for the general IQueryable interface can be found. Quaere has a similar extensibility mechanism, and this is what is used to implement Quaere for JPA. Let’s look at what my Quaere for JPA sample would look like if it were written using LINQ to SQL:
Northwind northwind=new Northwind("Server=.\\SQLEXPRESS; AttachDBFileName=Northwind.mdf;");
var customersInWashington =
    from c in northwind.Customers
    where c.Region == "WA"
    select c;
The Northwind class is a generated class that extends the DataContext class. Its Customers property returns a Table<Customer> instance, which implements the IQueryable<Customer> interface. If we undress this sample, we’ll end up with the following C# 2.0 code.
Northwind northwind = new Northwind("Server=.\\SQLEXPRESS; AttachDBFileName=Northwind.mdf;");
IQueryable<Customer> customers = northwind.Customers;
// Typed as ParameterExpression so it can be reused in the lambda's parameter list.
ParameterExpression alias = Expression.Parameter(typeof(Customer), "c");
IQueryable<Customer> customersInWA =
    Queryable.Where<Customer>(
        customers,
        Expression.Lambda<Func<Customer, bool>>(
            Expression.Equal(
                Expression.Property(
                    alias,
                    typeof(Customer).GetMethod("get_Region")
                ),
                Expression.Constant(
                    "WA",
                    typeof(string)
                ),
                false,
                typeof(string).GetMethod("op_Equality")
            ),
            new ParameterExpression[] {
                alias
            }
        )
    );
This is quite different from the code we saw in our LINQ to Objects samples, and it is closer to what things look like under Quaere’s hood. The Where method on the Queryable class still takes the source, but it accepts an expression tree describing the predicate rather than the compiled delegate used by Enumerable. One thing to note is that this isn’t fully type safe: the property reference, constant and comparison expressions rely on types passed as arguments. Furthermore, reflection is used to get hold of methods such as the getter for the Region property and System.String’s equality operator.
This expression tree is passed to the IQueryable<Customer> instance’s IQueryProvider implementation, which will be used to execute the query against the database.
For comparison, this is how one would create the same expression tree using Quaere’s expression API:
Identifier c = new Identifier("c");
QueryExpression customersInWashington = new QueryExpression(
    new FromClause(
        Customer.class,
        c,
        new Statement(
            Arrays.<Expression>asList(c)
        )
    ),
    new QueryBody(
        Arrays.<QueryBodyClause>asList(
            new WhereClause(
                new EqualOperator(
                    new Statement(
                        Arrays.<Expression>asList(
                            c,
                            new MethodCall(
                                new Identifier("getRegion"),
                                Arrays.<Expression>asList()
                            )
                        )
                    ),
                    new Statement(
                        Arrays.<Expression>asList(
                            new Constant("WA")
                        )
                    )
                )
            )
        ),
        new SelectClause(
            new Statement(
                Arrays.<Expression>asList(c)
            )
        ),
        null
    )
);
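Why would a provider want a tree instead of a compiled predicate? Because a tree can be inspected and translated. The following is a toy sketch of that idea only: the node classes and the toSql method are hypothetical and are neither Quaere’s nor LINQ’s actual types, and a real provider would bind values as query parameters rather than inline them as literals.

```java
// Hypothetical node types for illustration; not Quaere's or LINQ's classes.
final class ExpressionTreeSketch {
    private ExpressionTreeSketch() {}

    interface Expr {}

    static final class Property implements Expr {
        final String name;
        Property(String name) { this.name = name; }
    }

    static final class Constant implements Expr {
        final Object value;
        Constant(Object value) { this.value = value; }
    }

    static final class Equal implements Expr {
        final Expr left;
        final Expr right;
        Equal(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // A toy "query provider": walks the tree and emits a SQL-like condition.
    static String toSql(Expr e) {
        if (e instanceof Property) {
            return ((Property) e).name;
        }
        if (e instanceof Constant) {
            Object v = ((Constant) e).value;
            // Quote strings and double any embedded single quotes.
            return v instanceof String
                    ? "'" + ((String) v).replace("'", "''") + "'"
                    : String.valueOf(v);
        }
        if (e instanceof Equal) {
            Equal eq = (Equal) e;
            return toSql(eq.left) + " = " + toSql(eq.right);
        }
        throw new IllegalArgumentException("Unsupported node: " + e);
    }

    public static void main(String[] args) {
        Expr predicate = new Equal(new Property("Region"), new Constant("WA"));
        System.out.println("SELECT * FROM Customers WHERE " + toSql(predicate));
        // prints SELECT * FROM Customers WHERE Region = 'WA'
    }
}
```

The property name is a plain string here, so a typo such as "Regoin" would only surface at run time. That is the same gap in type safety discussed above for reflection-based lookups.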
Final words
Even if Quaere has been dubbed “LINQ for Java” by many bloggers, I’ve never claimed that it is the same thing. It is just a library modeled on the same concepts. In fact, there are substantial differences between Java and .NET that suggest it shouldn’t be the same thing. I could have implemented Quaere as a compiler extension, which is how LINQ’s language integration (the “LIN” part) is done, but this would have made Quaere less accessible to the general Java developer.
We’ve seen that LINQ is truly type safe as long as you’re querying IEnumerables, and has “compiler provided” type safety when querying IQueryables. Thomas has recently committed some sketches for changes to the Quaere query API that increase its type safety.
// NOTE: I've changed some things to make it fit better in this post.
// (Sorry about that Thomas :-)
Integer[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
Integer n = alias(numbers);
List<Integer> lowNums = from(n).in(numbers)
    .where(predicate(n, LESS_THAN, 5))
    .select(n);
When we sort out some of the challenges with Thomas’ approach, we’ll have better type safety in Quaere. As for LINQ to “anything that implements IQueryable<T>”, we’re even: the only thing LINQ has that Quaere hasn’t is a compiler that enforces type safety at compile time.