Back in October 2008 Anders Hejlsberg announced that C#.next would potentially ship with the notion of the “compiler as a service” (around the 55-minute mark for the impatient). Everyone was very excited. Suddenly meta-programming was about to become a boat-load easier, as it wouldn’t require IL generation, Expression Tree building or other esoteric techniques – just plain text.
Some time later the initiative was given the codename “Roslyn”, and its first CTP was released in October 2011, three years after its initial announcement.
Today I aim to look at the current state of play in Roslyn: how to use it, what it is good for, and what it isn’t good for.
What is Roslyn?
Roslyn is principally a rewrite of the C# compiler in C#, and of the VB.NET compiler in VB.NET. Historically both compilers were written in C++, which has hampered the progression of the languages almost to the point of stagnancy*. For all the talk of “opening the black box” and allowing easier meta-programming, DSLs and REPLs, Roslyn is effectively a way for Microsoft to clean up the compiler and move forward with a clean code-base; it is less about allowing developers to achieve those end goals. However, this blog is going to cover the public face of Roslyn – its APIs.
Roslyn as an API set can really be thought of in 3 areas -
- Access to the AST (abstract syntax tree) & compilation of the AST
- Visual Studio integration
- Scripting & REPL
Immediately when I see C# and scripting together I see disaster. Recently I attended the Software Architect 2012 conference where Roslyn was one of the tracks. Roslyn was heavily promoted for its scripting capabilities. The speaker proclaimed “You can now provide Excel-like formulas using Roslyn” and “You’ll be able to let users write plugins for your software”. My thoughts on this are simple:
- Excel formulas don’t require a compiler. The Cellz parser by Phil Trelford is a couple of hundred lines of code and provides 90% of the functionality of Excel. Building this as a scripting language opens the doors for security holes.
- If you want to allow plugins in your application you might want to consider providing a public plugin API rather than a compiler.
- If you really want scripting, consider using a scripting language – IronPython has been perfectly usable on .NET for years, along with IronRuby, IronScheme and IronJS. Other languages such as Lua (which was specifically designed for scripting) have .NET bindings too.
Access to the AST and compilation is more interesting and will be the focus of this blog post.
Working with Roslyn
The easiest way to get access to Roslyn is through NuGet. For the code in this post I’ll be using F#, which makes consuming the Roslyn API easier thanks to its advanced pattern matching.
To use Roslyn it’s also good to have a goal – in this case we’ll be attempting to take some C# code, parse it to find some “smelly code”, and to fix that code. The smelly code is here:
As you can see it uses a LINQ Count() to see if there are >0 items in an IEnumerable<T> and returns 1 or 0 based on the result. A simple optimization would be to replace the .Count() > 0 call with .Any().
We start (as always) by opening the relevant Roslyn namespaces.
We can then simply load the code into the AST like so (assuming the code is in a value called badCode)
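As a minimal sketch of this step: the script below targets the current `Microsoft.CodeAnalysis.CSharp` NuGet package rather than the 2012 CTP, so the namespaces and the `CSharpSyntaxTree.ParseText` entry point are assumptions relative to the code this post was written against. The C# held in `badCode` is a hypothetical stand-in that merely fits the description of the smelly snippet, and the `#r "nuget: ..."` directive assumes `dotnet fsi` from .NET 5 or later.

```fsharp
#r "nuget: Microsoft.CodeAnalysis.CSharp"

open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp

// Hypothetical stand-in for the smelly snippet: Count() > 0 on an IEnumerable<T>.
let badCode = """
using System.Collections.Generic;
using System.Linq;

public class Smelly
{
    public int HasItems(IEnumerable<int> items)
    {
        if (items.Count() > 0)
            return 1;
        return 0;
    }
}
"""

// Parsing only builds the (immutable) syntax tree; nothing is compiled yet.
let tree : SyntaxTree = CSharpSyntaxTree.ParseText(badCode)
let root : SyntaxNode = tree.GetRoot()

printfn "Root node kind: %A" (root.Kind())
```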
We can print out the AST by writing a small recursive function
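One way such a recursive function might look, again assuming the current `Microsoft.CodeAnalysis.CSharp` package instead of the CTP. The name `describe` and the sample source are made up for illustration; the function returns the lines instead of printing directly so the result is easy to inspect.

```fsharp
#r "nuget: Microsoft.CodeAnalysis.CSharp"

open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp

// Produces one line per node, indented by depth, showing the node's CLR type name.
let rec describe (depth: int) (node: SyntaxNode) : string list =
    let line = String.replicate depth "  " + node.GetType().Name
    let children =
        node.ChildNodes()
        |> Seq.collect (describe (depth + 1))
        |> Seq.toList
    line :: children

// Hypothetical sample input, kept tiny so the dump stays readable.
let sample = "class C { int F(int x) { return x + 1; } }"

let lines = CSharpSyntaxTree.ParseText(sample).GetRoot() |> describe 0
lines |> List.iter (printfn "%s")
```

Only nodes are walked here; tokens and trivia (whitespace, comments) are skipped, which keeps the dump focused on the structure that matters for pattern matching.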
Which will produce the following output for our small code snippet -
As you can see – that’s quite a large amount of output for such a small snippet of code. The part we’re really interested in is this
So we can define our strategy to detect this bad code as follows -
- Parse the tree to find a BinaryExpressionSyntax (the comparison in the IfStatementSyntax)
- See if either side is a method call called Count()
- See if either side is a literal with a value of “0”.
- Ensure the Count method is on a type of IEnumerable<T>
All nodes in Roslyn derive from a single base node called SyntaxNode so it’s useful in F# to write some Active Patterns to help deal with this:
Now for the meat! We are going to define a function which accepts a BinaryExpressionSyntax and returns an option type (the return type we’ll cover shortly) – there’s quite a lot of code here -
Originally I wrote code like this in C#; it was over 300 lines of code and quite impenetrable. F# and pattern matching have helped to reduce this significantly – ironic that using a C# compiler API in F# is simpler, isn’t it**?
Notice how nowhere in the above code are we checking the type of the memberAccess against IEnumerable – that’s because the AST contains NO type information. To get the type information we need an additional step – we need to actually compile the AST.
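A compact sketch of the idea, not the original listing: a handful of active patterns over `SyntaxNode`, and an `isBadUseOfCount` that returns the expression `Count()` was called on when the comparison looks like `x.Count() > 0` or `0 < x.Count()`. The pattern names (`Binary`, `CountCall`, `ZeroLiteral`) and the sample source are invented here, and the API names follow the current `Microsoft.CodeAnalysis.CSharp` package rather than the CTP. The check is purely syntactic.

```fsharp
#r "nuget: Microsoft.CodeAnalysis.CSharp"

open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp
open Microsoft.CodeAnalysis.CSharp.Syntax

// Active patterns that downcast a SyntaxNode to the concrete node types we care about.
let (|Binary|_|) (node: SyntaxNode) =
    match node with
    | :? BinaryExpressionSyntax as b -> Some b
    | _ -> None

let (|Invocation|_|) (node: SyntaxNode) =
    match node with
    | :? InvocationExpressionSyntax as i -> Some i
    | _ -> None

let (|MemberAccess|_|) (node: SyntaxNode) =
    match node with
    | :? MemberAccessExpressionSyntax as m -> Some m
    | _ -> None

// Matches `<target>.Count()` with no arguments and yields the target expression.
let (|CountCall|_|) (node: SyntaxNode) =
    match node with
    | Invocation inv when inv.ArgumentList.Arguments.Count = 0 ->
        match inv.Expression :> SyntaxNode with
        | MemberAccess m when m.Name.Identifier.ValueText = "Count" -> Some m.Expression
        | _ -> None
    | _ -> None

// Matches a literal whose source text is exactly "0".
let (|ZeroLiteral|_|) (node: SyntaxNode) =
    match node with
    | :? LiteralExpressionSyntax as lit when lit.Token.Text = "0" -> Some ()
    | _ -> None

// Some target for `target.Count() > 0` or `0 < target.Count()`, otherwise None.
let isBadUseOfCount (expr: BinaryExpressionSyntax) : ExpressionSyntax option =
    let left = expr.Left :> SyntaxNode
    let right = expr.Right :> SyntaxNode
    match expr.Kind(), left, right with
    | SyntaxKind.GreaterThanExpression, CountCall target, ZeroLiteral -> Some target
    | SyntaxKind.LessThanExpression, ZeroLiteral, CountCall target -> Some target
    | _ -> None

// Hypothetical sample input containing one offending comparison.
let sample =
    "class C { bool F(System.Collections.Generic.List<int> xs) { return xs.Count() > 0; } }"

let hits =
    CSharpSyntaxTree.ParseText(sample).GetRoot().DescendantNodes()
    |> Seq.choose (function Binary b -> isBadUseOfCount b | _ -> None)
    |> Seq.toList

printfn "Found %d suspicious comparison(s)" hits.Length
```

Returning the target expression rather than a bool means a later rewrite step already has the piece it needs to build the replacement.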
The returned Compilation object has a method to return the “SemanticModel” – the part which matches type information to SyntaxNodes. If we modify the function isBadUseOfCount to take an additional parameter we can add the following code to check the type:
OK. So now we have a fairly robust way of detecting our code smell. But how do we fix this?
The AST provides an immutable tree structure which represents the code, so we cannot modify it directly. Realizing this may be a common usage of Roslyn, the designers have provided an implementation of the Visitor pattern to help create a new tree. To access this we must inherit from the class SyntaxRewriter. This is where the return type from isBadUseOfCount is going to be handy, as this is where we are going to take the returned value and replace it. Here’s my implementation:
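A hedged sketch of the compile-then-ask step, using the current `Microsoft.CodeAnalysis.CSharp` package. It takes a slightly different route from the one described: rather than inspecting the receiver's type, it asks the semantic model which method the call binds to and checks that it is LINQ's `System.Linq.Enumerable.Count`. The helper name `bindsToLinqCount` and the sample source are hypothetical, and loading references from `TRUSTED_PLATFORM_ASSEMBLIES` assumes the script runs on .NET Core / .NET 5+.

```fsharp
#r "nuget: Microsoft.CodeAnalysis.CSharp"

open System
open System.IO
open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp
open Microsoft.CodeAnalysis.CSharp.Syntax

// Hypothetical sample input.
let source = """
using System.Collections.Generic;
using System.Linq;

class Smelly
{
    int HasItems(IEnumerable<int> items) { return items.Count() > 0 ? 1 : 0; }
}
"""

let tree = CSharpSyntaxTree.ParseText(source)

// Assumption: the host exposes the framework assemblies through this AppContext key.
let references =
    (AppContext.GetData("TRUSTED_PLATFORM_ASSEMBLIES") :?> string).Split(Path.PathSeparator)
    |> Array.map (fun path -> MetadataReference.CreateFromFile(path) :> MetadataReference)

// Creating the compilation is what makes type and symbol information available.
let compilation =
    CSharpCompilation.Create("Analysis", syntaxTrees = [ tree ], references = references)

let model = compilation.GetSemanticModel(tree)

// True when the invocation resolves to a method named Count declared on System.Linq.Enumerable.
let bindsToLinqCount (model: SemanticModel) (call: InvocationExpressionSyntax) =
    match ModelExtensions.GetSymbolInfo(model, call).Symbol with
    | :? IMethodSymbol as m ->
        m.Name = "Count" && m.ContainingType.ToDisplayString() = "System.Linq.Enumerable"
    | _ -> false

let calls =
    tree.GetRoot().DescendantNodes()
    |> Seq.choose (function :? InvocationExpressionSyntax as i -> Some i | _ -> None)
    |> Seq.toList

for call in calls do
    printfn "%s binds to Enumerable.Count: %b" (call.ToString()) (bindsToLinqCount model call)
```

A user-defined `Count()` method on some unrelated class would pass the syntactic test but fail this one, which is the point of paying for the compilation.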
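As an illustration only: in the current `Microsoft.CodeAnalysis.CSharp` package the base class is called `CSharpSyntaxRewriter` and the factory is `SyntaxFactory`, so those names differ from the CTP-era `SyntaxRewriter` mentioned above. The rewriter below uses a simplified, syntax-only `isBadUseOfCount` (no semantic check), and `CountToAnyRewriter` plus the sample source are hypothetical names.

```fsharp
#r "nuget: Microsoft.CodeAnalysis.CSharp"

open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp
open Microsoft.CodeAnalysis.CSharp.Syntax

// Yields the receiver of a parameterless `.Count()` call, if expr is one.
let tryGetCountTarget (expr: ExpressionSyntax) : ExpressionSyntax option =
    match expr with
    | :? InvocationExpressionSyntax as inv when inv.ArgumentList.Arguments.Count = 0 ->
        match inv.Expression with
        | :? MemberAccessExpressionSyntax as m when m.Name.Identifier.ValueText = "Count" ->
            Some m.Expression
        | _ -> None
    | _ -> None

let isZero (expr: ExpressionSyntax) =
    match expr with
    | :? LiteralExpressionSyntax as lit -> lit.Token.Text = "0"
    | _ -> false

// Syntax-only detection of `x.Count() > 0` and `0 < x.Count()`.
let isBadUseOfCount (node: BinaryExpressionSyntax) =
    if node.Kind() = SyntaxKind.GreaterThanExpression && isZero node.Right then
        tryGetCountTarget node.Left
    elif node.Kind() = SyntaxKind.LessThanExpression && isZero node.Left then
        tryGetCountTarget node.Right
    else
        None

type CountToAnyRewriter() =
    inherit CSharpSyntaxRewriter(false)

    override this.VisitBinaryExpression(node: BinaryExpressionSyntax) =
        match isBadUseOfCount node with
        | Some target ->
            // Build the replacement from text, carrying over the surrounding trivia.
            let text =
                node.GetLeadingTrivia().ToFullString()
                + target.ToString() + ".Any()"
                + node.GetTrailingTrivia().ToFullString()
            SyntaxFactory.ParseExpression(text) :> SyntaxNode
        | None -> base.VisitBinaryExpression(node)

// Hypothetical sample input.
let sample =
    "class C { bool F(System.Collections.Generic.IEnumerable<int> xs) { return xs.Count() > 0; } }"

let original = CSharpSyntaxTree.ParseText(sample).GetRoot()
let rewritten = CountToAnyRewriter().Visit(original)

printfn "%s" (rewritten.ToFullString())
```

The original tree is left untouched; `Visit` hands back a new root, which is what the immutability of the AST implies.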
The SyntaxRewriter automatically walks the tree for us and calls into the virtual methods when it reaches those types. In our case there are 2 scenarios – there’s a BinaryExpression with a bad use of Count(), and there’s not. If our function returns Some then we have detected a bad use of count and return a new node in place of the incoming node. Otherwise we return the base implementation. Here is the implementation of the lines 7 and 8 in C# as a brief example of how ugly this can look in C#. Imagine trying to do C# Codegen like this:
To tie all this together, here is the main implementation, which fills in the blanks of how all this code interacts.
You can crack open your favourite .NET decompiler and see the amazing results. There’s an obvious bug (answers on a postcard) but it is functional in the simple cases. But for all that work (and there was lots) it raises a question:
Is this useful?
I’d argue that the main use for Roslyn (outside of maintenance at Microsoft) is for companies wanting to sell refactoring tools. JetBrains has this market currently sewn up with ReSharper, and I’m not convinced they’ll throw away years of code to switch to an immature and unfinished platform. Possibly Roslyn will enable a competitor to ReSharper to enter the market by providing a lower barrier to entry, but I doubt there’s much market for it.
I can’t see any developer really spending much time playing with this outside of toy projects – it’s just not that friendly or productive. Bear in mind I could have achieved the same refactoring (without any type checks) using a couple of lines of regex, or paid a couple of hundred dollars for a ReSharper license. As it stands it’s taken a couple of hours of playing with Roslyn.
And here’s the rub – exposing the compiler directly in this way is complex. Exposing complex things results in complex code. Complexity isn’t necessarily bad – it just limits how useful it can be to 99% of its audience.
Ironically, over at MSR in Cambridge Don Syme and the F# team have exposed the F# compiler in a more friendly way in Type Providers. By opening up the compiler with extension points for providing metadata to the compiler they’ve made it easier to consume typed data in F# from sources other than .NET IL. The uses for this are really wide ranging and exciting.
Roslyn is a tremendous engineering feat – it’s a clean(-ish) API over a complex problem. I’m not convinced whether people will really use it, as the complexity involved is staggering for anything other than simple demos.
Roslyn as a project is now over 4 years old and can only compile C# 3 code. That means currently there is no support for async or (ironically) dynamic code. Eric Lippert’s recent departure from Microsoft and the Roslyn project is surely another blow in getting the project completed and out of the door at MS.
Choosing to write a C# compiler in C# is a noble idea (for years languages have been written in “themselves” – Delphi was written in Delphi, F# written in F#, etc – it’s an excellent example of dogfooding). The problem is that C# just isn’t a great language for tackling this problem. F#, with its support for pattern matching, discriminated unions and lex/yacc (along with its immutability by default, which is a cornerstone of Roslyn), would have been a better choice on the .NET platform, and while C++ is not perfect there are lots of options for lexing and parsing in the C/C++ space, which again C# really lacks.
** F# is an ideal language not only for consuming this API but also for writing code parsers and compilers. One has to wonder whether, had Roslyn been developed in F#, it would be finished by now.