Bare naked LINQ

Published 18 October 07 06:07 AM | andersnoras 

Two recurring objections to Quaere are “this isn’t really LINQ for Java” and “cool, but this isn’t truly type-safe”. In this post I’ll dive into these topics by explaining how LINQ works…

LINQ to Objects

The language integration in C# 3.0 and Visual Basic 9 makes LINQ very readable and makes queries easy to write, but this integration is not required to use LINQ. Consider this simple example:

int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var lowNums =
    from n in numbers
    where n < 5
    select n;

When the C# compiler compiles this snippet, it will be translated into this C# 2.0 compatible code:

int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
IEnumerable<int> lowNums = Enumerable.Where<int>(numbers, delegate(int n) {
    return n < 5;
});

To explain what actually goes on here, let’s recap how some of the new features in C# 3.0 work. Extension methods are one of the new features that are central to LINQ. As with all of the C# 3.0 features, extension methods are just syntactic sugar. Extension methods are static methods in a static class with the new this modifier on the first parameter. When you import a class with extension methods, the extension methods become available on all applicable types through instance method semantics. The example below shows how you can declare a static class with an extension method, and how the extension method can be invoked.

public static class MyExtensions {
    // The WhoAmI method "extends" anything deriving from System.Object.
    public static string WhoAmI(this object obj) {
        return obj.GetType().Name;
    }
}
public class Program {
     static void Main(string[] args) {
           string s="Hello World";
           Console.WriteLine(s.WhoAmI());
     }
}

For readers who are familiar with Ruby, this resembles mix-ins. In Ruby you define a mix-in as a module. A module cannot be instantiated because it is not a class, but its instance methods become available on any class that imports the module with the include statement. The example below shows a Ruby implementation of the previous C# 3.0 snippet; it mixes the module into Object so that every object gains the method.

module WhoAmI
     def whoAmI?
          self.class.name
     end
end

class Object
     include WhoAmI
end

s = "Quaere? I've got ambition, baby!"
puts s.whoAmI?

When we compile our “who am I” sample, the compiler translates it into this snippet:

[Extension]
public static class MyExtensions {
      [Extension]
      public static string WhoAmI(object obj) {
          return obj.GetType().Name;
      }
}
public class Program {
    static void Main(string[] args) {
        string s = "Hello World";
        Console.WriteLine(MyExtensions.WhoAmI(s));      
    }
}

Notice that the C# compiler has erased the this modifier and tagged the extension method with an Extension attribute. Notice also that the s.WhoAmI() method call has been translated into MyExtensions.WhoAmI(s).
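Java has no extension methods, so a Java library has to start from the form the C# compiler ends up with: a plain static helper that takes the receiver as its first argument. Here is a minimal sketch of that shape. The class and method names simply mirror the C# sample and are not part of Quaere.

```java
// Hypothetical Java counterpart of the compiled MyExtensions class.
final class MyExtensions {
    private MyExtensions() {
        // Static holder class, not meant to be instantiated.
    }

    // The "receiver" is just an ordinary first parameter.
    static String whoAmI(Object obj) {
        return obj.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        String s = "Hello World";
        // Same shape as MyExtensions.WhoAmI(s) in the compiled C#.
        System.out.println(MyExtensions.whoAmI(s)); // prints "String"
    }
}
```

The only thing lost compared to C# is the s.WhoAmI() spelling at the call site; the code that actually runs is equivalent.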

The System.Linq.Enumerable class used in the translated query above is a static holder class for some of LINQ’s extension methods (the others reside in System.Linq.Queryable, which we’ll discuss shortly). However, we don’t use extension method syntax in our sample, but rather pass the numbers array directly to the static Where method. This method accepts an IEnumerable<TSource>, which is the source for the query, and a delegate adhering to the Func<TSource,bool> signature. This delegate type is defined in the new System.Core assembly, where all of the core LINQ features can be found. The delegate is used as the predicate for the where clause of the query, and it is invoked for every element in the source collection. Whenever the predicate returns true for an element, that element is included in the query result returned by the Where method.
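To make the source-plus-predicate mechanics concrete, here is a rough Java sketch of a Where-style static method. It is an illustration only: EnumerableSketch and where are made-up names, not Quaere’s API and not how .NET implements it. In particular, Enumerable.Where produces its results lazily as the result is enumerated, while this sketch builds the whole list up front to keep things short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical holder class; not part of Quaere or of .NET.
final class EnumerableSketch {
    private EnumerableSketch() {
    }

    // Invokes the predicate once per element and keeps the elements
    // for which it returns true, preserving the source order.
    static <T> List<T> where(Iterable<? extends T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<>();
        for (T element : source) {
            if (predicate.test(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 4, 1, 3, 9, 8, 6, 7, 2, 0);
        // The anonymous class stands in for the C# 2.0 anonymous delegate.
        List<Integer> lowNums = where(numbers, new Predicate<Integer>() {
            @Override
            public boolean test(Integer n) {
                return n < 5;
            }
        });
        System.out.println(lowNums); // prints [4, 1, 3, 2, 0]
    }
}
```

Because both the source and the predicate are parameterized over the same element type, the Java compiler rejects a predicate written for the wrong type, just as the C# compiler does for the delegate.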

Finding all numbers less than five is a simple example, so let’s look at something different.

string[] categories = { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
Product[] products = Product.getAllProducts();
var q =
    from c in categories
    join p in products on c equals p.Category into ps
    select new { Category=c, Products=ps };

This sample uses a group join operation to get all the products that match a given category, bundled as a sequence. There is another C# 3.0 feature at play here: anonymous types. The new { Category=c, Products=ps } projection produces a new class that has no name the programmer can refer to. This is why the var keyword is needed: it lets the compiler infer the static type of the variable. The class returned by the projection is generated by the compiler, and the type of the q variable is IEnumerable<T>, where T is the generated class. The compiler-generated class will resemble this:

internal sealed class CategoryProducts {
    private readonly string category;
    private readonly IEnumerable<Product> products;
    public string Category {
        get {
            return category;
        }
    }
    public IEnumerable<Product> Products {
        get {
            return products;
        }
    }
    public CategoryProducts(string category, IEnumerable<Product> products) {
        this.category=category;
        this.products=products;
    }
}

If we “C# 2.0-ify” the LINQ query, we’ll end up with the following:

string[] categories = new string[] { "Beverages", "Condiments", "Vegetables", "Dairy Products", "Seafood" };
Product[] products = Product.getAllProducts();
IEnumerable<CategoryProducts> q = Enumerable.GroupJoin(
    categories,
    products,
    delegate(string c) {
        return c;
    },
    delegate(Product p) {
        return p.Category;
    },
    delegate(string c, IEnumerable<Product> ps) {
        return new CategoryProducts(c,ps);
    }
);

While the first sample was easy to follow, this one is not. This is where the “LIN” in LINQ comes in. All of the code you’ve seen in the C# 2.0 friendly samples is generated by the compiler from the LINQ queries we write. The great thing about these queries is that they are type safe all the way through. Naturally, I did not have the option of extending the Java compiler when I wrote Quaere, and this is why Quaere uses a different approach to construct the queries. Still, Quaere isn’t that far from LINQ. Remember that we’ve only seen LINQ to Objects at play so far. Things are substantially different when we write queries against other sources, such as LINQ to SQL.
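For a rough idea of what the group join above boils down to in plain Java, here is a small sketch. It is not Quaere’s implementation. GroupJoinSketch, groupJoin and the Product class are made up for illustration, the sample products are placeholder data, and two simplifications are assumed: the outer element is its own key (as the category strings are in the query above), and the outer elements are distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; the names are not taken from Quaere or .NET.
final class GroupJoinSketch {
    private GroupJoinSketch() {
    }

    // Illustrative stand-in for the Product class used in the post.
    static final class Product {
        final String name;
        final String category;

        Product(String name, String category) {
            this.name = name;
            this.category = category;
        }
    }

    // Pairs every outer element with the list of inner elements whose key
    // equals that outer element. Outer elements without any match still
    // appear in the result, with an empty list.
    static <O, I> Map<O, List<I>> groupJoin(
            List<O> outer, List<I> inner, Function<? super I, ? extends O> innerKey) {
        Map<O, List<I>> result = new LinkedHashMap<>();
        for (O o : outer) {
            result.put(o, new ArrayList<>());
        }
        for (I i : inner) {
            List<I> bucket = result.get(innerKey.apply(i));
            if (bucket != null) {
                bucket.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList("Beverages", "Seafood");
        List<Product> products = Arrays.asList(
                new Product("Chai", "Beverages"),
                new Product("Ikura", "Seafood"),
                new Product("Chang", "Beverages"));
        Map<String, List<Product>> q = groupJoin(categories, products, p -> p.category);
        for (Map.Entry<String, List<Product>> e : q.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " product(s)");
        }
    }
}
```

The map entry plays the role of the compiler-generated CategoryProducts class, and the generic parameters keep the category and product types checked at compile time.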

LINQ to IQueryable

As mentioned, there is another static holder class in LINQ, called Queryable. This is where all the extension methods for the general IQueryable<T> interface can be found. Quaere has the same kind of extensibility mechanism as LINQ, and this is what is used to implement Quaere for JPA. Let’s look at what my Quaere for JPA sample would look like if it was written using LINQ to SQL:

Northwind northwind=new Northwind("Server=.\\SQLEXPRESS; AttachDBFileName=Northwind.mdf;");
var customersInWashington = 
    from c in northwind.Customers
    where c.Region == "WA"
    select c;

The Northwind class is a generated class that extends the DataContext class. Its Customers property returns a Table<Customer> instance, which implements the IQueryable<Customer> interface. If we undress this sample, we’ll end up with the following C# 2.0 code.

Northwind northwind=new Northwind("Server=.\\SQLEXPRESS; AttachDBFileName=Northwind.mdf;");
IQueryable<Customer> customers=northwind.Customers;
ParameterExpression alias=Expression.Parameter(typeof(Customer),"c");
IQueryable<Customer> customersInWA = 
    Queryable.Where<Customer>(
        customers,
        Expression.Lambda<Func<Customer, bool>>(
            Expression.Equal(
                Expression.Property(
                    alias, 
                    typeof(Customer).GetMethod("get_Region")
                ),
                Expression.Constant(
                    "WA",
                    typeof(string)
                ),
                false,
                typeof(string).GetMethod("op_Equality")
            ),
            new ParameterExpression[] {
                alias
            }
        )
    );

This is quite different from the code we saw in our LINQ to Objects samples, and it is closer to what things look like under Quaere’s hood. The Where method on the Queryable class accepts the predicate as an expression tree rather than as the compiled delegate used by Enumerable. One thing to note is that this isn’t fully type safe. Instead, the property reference, constant and comparison expressions rely on types passed as arguments. Furthermore, reflection is used to get hold of methods such as the getter for the Region property and System.String’s equality operator. This expression tree is passed to the IQueryProvider implementation of the IQueryable<Customer> instance, which will be used to execute the query against the database.

For comparison, this is how one would create the same expression tree using Quaere’s expression API:

Identifier c = new Identifier("c");
QueryExpression customersInWashington = new QueryExpression(
    new FromClause(
       Customer.class,
       c,
       new Statement(
           Arrays.<Expression>asList(c)
       )
   ),
   new QueryBody(
       Arrays.<QueryBodyClause>asList(
           new WhereClause(
               new EqualOperator(
                   new Statement(
                       Arrays.<Expression>asList(
                           c,
                           new MethodCall(
                               new Identifier("getRegion"),
                               Arrays.<Expression>asList()
                           )
                       )
                   ),
                   new Statement(
                       Arrays.<Expression>asList(
                           new Constant("WA")
                       )
                   )
               )
           )
       ),
       new SelectClause(
           new Statement(
               Arrays.<Expression>asList(c)
           )
       ),
       null
    )
);
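Both trees above name the region getter as a string, which is why neither is checked by a compiler. The following miniature sketch shows the consequence. It is a deliberately tiny, made-up expression tree (ExpressionSketch and its node factories are hypothetical and much simpler than either LINQ’s or Quaere’s), evaluated through reflection.

```java
import java.lang.reflect.Method;
import java.util.Objects;

// Hypothetical miniature expression tree; far simpler than LINQ's or Quaere's.
final class ExpressionSketch {
    private ExpressionSketch() {
    }

    interface Expression {
        Object evaluate(Object row) throws ReflectiveOperationException;
    }

    // Illustrative entity with the getter the tree refers to by name.
    public static final class Customer {
        private final String region;

        public Customer(String region) {
            this.region = region;
        }

        public String getRegion() {
            return region;
        }
    }

    // Leaf node holding a literal value.
    static Expression constant(Object value) {
        return row -> value;
    }

    // Node that calls a public no-argument method, looked up by name at run time.
    static Expression methodCall(String methodName) {
        return row -> {
            Method method = row.getClass().getMethod(methodName);
            return method.invoke(row);
        };
    }

    // Node that compares the results of two sub-expressions with equals().
    static Expression equal(Expression left, Expression right) {
        return row -> Objects.equals(left.evaluate(row), right.evaluate(row));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Customer customer = new Customer("WA");

        Expression ok = equal(methodCall("getRegion"), constant("WA"));
        System.out.println(ok.evaluate(customer)); // prints true

        // The typo compiles fine and only fails when the tree is evaluated.
        Expression typo = equal(methodCall("getRegoin"), constant("WA"));
        try {
            typo.evaluate(customer);
        } catch (NoSuchMethodException e) {
            System.out.println("No such method: " + e.getMessage());
        }
    }
}
```

When the C# compiler builds the tree from a query expression it checks c.Region for you; when the tree is written by hand, in either language, a misspelled member name only surfaces at run time.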

Final words

Even if Quaere has been dubbed “LINQ for Java” by many bloggers, I’ve never claimed that it is the same thing. It is just a library modeled on the same concepts. In fact, there are substantial differences between Java and .NET that suggest it shouldn’t be the same thing. I could have written Quaere as a compiler macro, which is how LINQ’s language integration (the “LIN” part) is done, but this would have made Quaere less accessible to the general Java developer. We’ve seen that LINQ is truly type safe as long as you’re querying IEnumerables, and has “compiler provided” type safety when querying IQueryables. Thomas has recently committed some sketches for changes to the Quaere query API that increase its type safety.

// NOTE: I've changed some things to make it fit better in this post.
// (Sorry about that Thomas :-) 
Integer[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
Integer n = alias(numbers);
List<Integer> lowNums = from(n).in(numbers)
    .where(predicate(n, LESS_THAN, 5))
    .select(n);

When we sort out some of the challenges with Thomas’ approach, we’ll have better type safety in Quaere. As for LINQ to “anything that implements IQueryable<T>”, we’re even. The only thing LINQ has that Quaere hasn’t is a compiler that enforces type safety at compile time.

Comments

# Casper said on October 18, 2007 7:13 PM:

"...the only thing LINQ has that Quaere hasn’t is a compiler that enforces type safety at compile time" which is a pretty huge thing to have in place for an über static language like Java (no pre-processor, checked exceptions, no type inference etc...).

An interesting article but it spawned a few questions in my head. How about the Java type erasure problem, surely there is a bunch of stuff Quaere can't query unlike LINQ? And would a true LINQ port be possible if Java had support for lambda expressions and extension methods?

# Thomas Mueller said on October 18, 2007 10:10 PM:

Quaere can be even made simpler:

Integer[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

List<Integer> lowNums = from(numbers, N)

   .where(N.lessThan(5)).select();

This query is actually type safe. Type erasure is not a problem here.

> stuff Quaere can't query

I don't think there is a limit. A few things are simpler in Quaere, and a few things more complicated, for example anonymous classes.

By the way, does LINQ have autocomplete, I mean does the IDE understand that a 'where' can follow now? This is not a problem for Quaere.

# Casper said on October 19, 2007 3:09 AM:

"By the way, does LINQ have autocomplete"

Yes it does, and it also assists in column table name lookup so you don't have to guess in blind like with JPA annotations.

# Anders Norås' Blog : Auto completion for LINQ and Quaere (take 3) said on October 19, 2007 4:50 AM:

PingBack from http://andersnoras.com/blogs/anoras/archive/2007/10/18/auto-completion-for-linq-and-quaere-take-3.aspx

# Sigurd said on October 19, 2007 11:58 AM:

Casper, I think Anders has shown that extension methods aren't a must have for LINQ - after all he's written the queries without using them...

# Anders Norås' Blog said on November 6, 2007 2:21 PM:

Casper wrote an interesting comment to my screen cast about LINQ and Quaere auto completion which deserves

# Gordon M said on November 26, 2007 7:52 AM:

The lack of delegates/function pointers in Java would make for a cumbersome approach using observer classes, so that the actual query statement cannot be completed in a single statement. Ahem...

# Anders Norås' Blog said on December 6, 2007 2:39 AM:

A while ago Neal Gafter proposed extension methods as a new feature in Java 7. This is an interesting

# Anders Norås' Blog said on January 30, 2008 12:52 PM:

Today it is exactly one year since I picked up blogging after a long break. To celebrate, I’ll

# dave^2=-1 said on February 1, 2008 2:22 AM:

Reading Anders' post on Lexical Closures, Deferred Execution and Kicker Methods with respect to LINQ
