The Roslyn Incident

Back in October 2008 Anders Hejlsberg announced that C#.next would potentially ship with the notion of the “compiler as a service” (around the 55-minute mark for the impatient). Everyone was very excited. Suddenly meta-programming was about to become a boat-load easier, as it wouldn’t require IL generation, Expression Tree building or other esoteric techniques – just plain text.

Some time later the initiative was given the codename “Roslyn”, and its first CTP was released in October 2011, three years after its initial announcement.

Today I aim to look at the current state of play in Roslyn: how to use it, what it is good for, and what it isn’t good for.

What is Roslyn?

Roslyn is principally a rewrite of the C# compiler in C#, and the VB.NET compiler in VB.NET. Historically both compilers were written in C++, which has hampered the progression of the languages almost to the point of stagnancy*. For all the talk of “opening the black box” and allowing easier meta-programming, DSLs and REPLs, Roslyn is effectively a way to clean up the compiler and make it easier for Microsoft to move forward with a clean code-base, and is less about allowing developers to achieve those end goals. However, this post is going to cover the public face of Roslyn – its APIs.

Roslyn as an API set can really be thought of in 3 areas -

  • Access to the AST (abstract syntax tree) & compilation of the AST
  • Visual Studio integration
  • Scripting & REPL

Immediately when I see C# and scripting together I see disaster. Recently I attended the Software Architect 2012 conference, where Roslyn was one of the tracks. Roslyn was heavily promoted for its scripting capabilities. The speaker proclaimed “You can now provide Excel-like formulas using Roslyn” and “You’ll be able to let users write plugins for your software”. My thoughts on this are simple:

  • Excel formulas don’t require a compiler. The Cellz parser by Phil Trelford is a couple of hundred lines of code and provides 90% of the functionality of Excel. Exposing a full scripting language for this instead opens the door to security holes.
  • If you want to allow plugins in your application you might want to consider providing a public plugin API rather than a compiler.
  • If you really want scripting, consider using a scripting language – IronPython has been perfectly usable on .NET for years, along with IronRuby, IronScheme and IronJS. Other languages such as Lua (which was specifically designed for scripting) have .NET bindings too.

Access to the AST and compilation is more interesting and will be the focus of this blog post.

Working with Roslyn

The easiest way to get access to Roslyn is through NuGet. For the code in this post I’ll be using F#, which makes consuming the Roslyn API easier due to its advanced pattern matching.

To use Roslyn it’s also good to have a goal – in this case we’ll be attempting to take some C# code, parse it to find some “smelly code”, and to fix that code. The smelly code is here:

As you can see it uses a LINQ Count() to see if there are >0 items in an IEnumerable<T> and returns 1 or 0 based on the result. A simple optimization would be to replace the .Count() > 0 call with .Any().
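For illustration, a hypothetical snippet matching that description could be held in an F# string as follows. The class and method names (Sample, HasItems) are invented here and are assumptions, not the code originally analysed; only the value name badCode comes from the text.

```fsharp
// Hypothetical C# input matching the description above.
// The names Sample and HasItems are invented for illustration.
let badCode = """
using System.Collections.Generic;
using System.Linq;

public static class Sample
{
    public static int HasItems(IEnumerable<int> items)
    {
        // Count() enumerates the whole sequence just to compare with zero.
        if (items.Count() > 0)
        {
            return 1;
        }
        return 0;
    }
}
"""
```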

We start (as always) by opening the relevant Roslyn namespaces.

We can then simply load the code into the AST like so (assuming the code is in a value called badCode)

We can print out the AST by writing a small recursive function

Which will produce the following output for our small code snippet -

As you can see – that’s quite a large amount of output for such a small snippet of code. The part we’re really interested in is this

So we can define our strategy to detect this bad code as follows -

  1. Parse the tree to find a BinaryExpressionSyntax (the comparison in the IfStatementSyntax)
  2. See if either side is a method call called Count()
  3. See if either side is a literal with a value of “0”.
  4. Ensure the Count method is on a type of IEnumerable<T>
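Steps 1 to 3 of this strategy can be sketched without Roslyn at all. The block below is a minimal illustration on a toy expression type: the Expr union and its case names are assumptions made up for this sketch and are not Roslyn's node types, and step 4 (the type check) is deliberately left out because a purely syntactic tree carries no type information. Only the function name isBadUseOfCount is borrowed from the post.

```fsharp
// Toy expression tree; a hypothetical stand-in for a real syntax tree.
type Expr =
    | Name of string
    | Literal of int
    | Call of target: Expr * methodName: string
    | Binary of left: Expr * op: string * right: Expr

/// Returns the receiver of Count() when the comparison means "Count() > 0",
/// checking both operand orders (steps 2 and 3 say "either side").
let isBadUseOfCount (expr: Expr) : Expr option =
    match expr with
    | Binary(Call(target, "Count"), ">", Literal 0) -> Some target   // x.Count() > 0
    | Binary(Literal 0, "<", Call(target, "Count")) -> Some target   // 0 < x.Count()
    | _ -> None

// items.Count() > 0 is flagged; items.Count() > 5 is not.
let smelly = Binary(Call(Name "items", "Count"), ">", Literal 0)
let harmless = Binary(Call(Name "items", "Count"), ">", Literal 5)
let flagged = isBadUseOfCount smelly        // Some (Name "items")
let notFlagged = isBadUseOfCount harmless   // None
```

Returning an option rather than a bool means the caller gets back the piece of the tree it needs to build a replacement.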

All nodes in Roslyn derive from a single base node called SyntaxNode so it’s useful in F# to write some Active Patterns to help deal with this:
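Independently of Roslyn's own types, the general technique looks roughly like this: partial active patterns wrap the runtime type tests on a class hierarchy so that nested matches read like matches on a union. The Node classes below are hypothetical stand-ins invented for this sketch (they are not Roslyn's SyntaxNode subclasses), as are the pattern names.

```fsharp
// Hypothetical class hierarchy with a single base type; NOT Roslyn's types.
type Node() = class end

type IdentifierNode(name: string) =
    inherit Node()
    member this.Name = name

type LiteralNode(value: int) =
    inherit Node()
    member this.Value = value

type InvocationNode(target: Node, methodName: string) =
    inherit Node()
    member this.Target = target
    member this.MethodName = methodName

type BinaryNode(left: Node, op: string, right: Node) =
    inherit Node()
    member this.Left = left
    member this.Operator = op
    member this.Right = right

// Partial active patterns: each hides one type test and exposes the useful parts.
let (|BinaryExpr|_|) (node: Node) =
    match node with
    | :? BinaryNode as b -> Some(b.Left, b.Operator, b.Right)
    | _ -> None

let (|Invocation|_|) (node: Node) =
    match node with
    | :? InvocationNode as i -> Some(i.Target, i.MethodName)
    | _ -> None

let (|IntLiteral|_|) (node: Node) =
    match node with
    | :? LiteralNode as l -> Some l.Value
    | _ -> None

// With the patterns in place, the nested shape can be matched in one line.
let describe (node: Node) =
    match node with
    | BinaryExpr(Invocation(_, "Count"), ">", IntLiteral 0) -> "Count() > 0"
    | BinaryExpr _ -> "some other binary expression"
    | _ -> "not a binary expression"
```

The same shape should carry over to any tree built from classes with a common base, which is why it suits an API designed for C# consumers.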

Now for the meat! We are going to define a function which accepts a BinaryExpressionSyntax and returns an option type (the return type we’ll cover shortly) – there’s quite a lot of code here -

Originally I wrote code like this in C#, and it was over 300 lines of code and quite impenetrable. F# and pattern matching have helped to significantly reduce this – ironic that using a C# compiler API in F# is simpler, isn’t it**?

Notice how nowhere in the above code are we checking the type of the memberAccess against IEnumerable – that’s because the AST contains NO type information. To get the type information we need an additional step – we need to actually compile the AST.

The returned Compilation object has a method to return the “SemanticModel” – the part which matches type information to SyntaxNodes. If we modify the function isBadUseOfCount to take an additional parameter we can add the following code to check the type:

OK. So now we have a fairly robust way of detecting our code smell. But how do we fix this?

The AST provides an immutable tree structure which represents the code, so we cannot modify it directly. Realizing this may be a common usage of Roslyn, the designers have provided an implementation of the Visitor pattern to help create a new tree. To access this we must inherit from the class SyntaxRewriter. This is where the return type from isBadUseOfCount is going to be handy, as this is where we are going to take the returned value and replace it. Here’s my implementation:

The SyntaxRewriter automatically walks the tree for us and calls into the virtual methods when it reaches those types. In our case there are 2 scenarios – there’s a BinaryExpression with a bad use of Count(), and there’s not. If our function returns Some then we have detected a bad use of count and return a new node in place of the incoming node. Otherwise we return the base implementation. Here is the implementation of the lines 7 and 8 in C# as a brief example of how ugly this can look in C#. Imagine trying to do C# Codegen like this:
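Independently of that listing, the visit-and-replace behaviour described above can be sketched in plain F#. This is a minimal illustration under stated assumptions, not Roslyn's SyntaxRewriter: the Expr type, the Rewriter base class and CountToAnyRewriter are all hypothetical names invented here, and the sketch only handles Count() on the left of the comparison.

```fsharp
// Toy immutable tree; a hypothetical stand-in for a real syntax tree.
type Expr =
    | Name of string
    | Literal of int
    | Call of target: Expr * methodName: string
    | Greater of left: Expr * right: Expr
    | If of condition: Expr * whenTrue: Expr * whenFalse: Expr

/// Base rewriter: walks the tree and rebuilds it unchanged.
/// Subclasses override a Visit* method to substitute new nodes.
type Rewriter() =
    abstract VisitGreater : left: Expr * right: Expr -> Expr
    default this.VisitGreater(left, right) =
        Greater(this.Visit(left), this.Visit(right))

    member this.Visit(expr: Expr) : Expr =
        match expr with
        | Name _ | Literal _ -> expr
        | Call(target, methodName) -> Call(this.Visit(target), methodName)
        | Greater(left, right) -> this.VisitGreater(left, right)
        | If(c, t, f) -> If(this.Visit(c), this.Visit(t), this.Visit(f))

/// Replaces "x.Count() > 0" with "x.Any()"; everything else falls through to the base.
type CountToAnyRewriter() =
    inherit Rewriter()
    override this.VisitGreater(left, right) =
        match left, right with
        | Call(target, "Count"), Literal 0 -> Call(this.Visit(target), "Any")
        | _ -> base.VisitGreater(left, right)

// if (items.Count() > 0) 1 else 0
let original = If(Greater(Call(Name "items", "Count"), Literal 0), Literal 1, Literal 0)
// A new tree is produced; the original value is left untouched.
let rewritten = CountToAnyRewriter().Visit(original)
```

Because the tree is immutable, the rewriter never edits a node in place: every change is expressed by returning a different node from an override.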

To tie all this together here is the main implementation, which fills in the blanks of how all this code interacts.

You can crack open your favourite .NET decompiler and see the amazing results. There’s an obvious bug (answers on a postcard) but it is functional in the simple cases. But for all that work (and there was lots) it raises a question:

Is this useful?

I’d argue that the main use for Roslyn (outside of maintenance at Microsoft) is for companies wanting to sell refactoring tools. JetBrains has this market currently sewn up with ReSharper, and I’m not convinced they’ll throw away years of code to switch to an immature and unfinished platform. Possibly Roslyn will help a ReSharper competitor enter the market by providing a lower barrier to entry, but I doubt there’s much market for it.

I can’t see any developer really spending much time playing with this outside of toy projects – it’s just not that friendly or productive. Bear in mind I could have achieved the same refactoring (without any type checks) using a couple of lines of regex, or paid a couple of hundred dollars for a ReSharper license. As it stands it’s taken a couple of hours of playing with Roslyn.
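To make the regex comparison concrete, a rough sketch of such a textual rewrite is shown below. The names countGreaterThanZero and replaceCountWithAny are invented for this illustration. As the paragraph says, this does no type checking: it would also rewrite a Count() call on something that is not an IEnumerable<T>, and it cannot tell code apart from comments or string literals.

```fsharp
open System.Text.RegularExpressions

// Matches ".Count() > 0" with optional whitespace around ">".
// The trailing \b stops it matching longer numbers such as "05".
let countGreaterThanZero = Regex(@"\.Count\(\)\s*>\s*0\b")

/// Purely textual replacement of ".Count() > 0" with ".Any()".
let replaceCountWithAny (source: string) : string =
    countGreaterThanZero.Replace(source, ".Any()")

let before = "if (items.Count() > 0) return 1;"
let after = replaceCountWithAny before   // "if (items.Any()) return 1;"
```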

And here’s the rub – exposing the compiler directly in this way is complex. Exposing complex things results in complex code. Complexity isn’t necessarily bad – it just limits how useful it can be to 99% of its audience.

Ironically, over at MSR in Cambridge Don Syme and the F# team have exposed the F# compiler in a more friendly way in Type Providers. By opening up the compiler with extension points for providing metadata to the compiler they’ve made it easier to consume typed data in F# from sources other than .NET IL. The uses for this are really wide-ranging and exciting.

Conclusion

Roslyn is a tremendous engineering feat – it’s a clean(-ish) API over a complex problem. I’m not convinced whether people will really use it, as the complexity involved is staggering for anything other than simple demos.

Roslyn as a project is now over 4 years old and can only compile C# 3 code. That means currently there is no support for async or (ironically) dynamic code. Eric Lippert’s recent departure from Microsoft and the Roslyn project is surely another blow in getting the project completed and out of the door at MS.

Choosing to write a C# compiler in C# is a noble idea (for years languages have been written in “themselves” – Delphi was written in Delphi, F# written in F#, etc – it’s an excellent example of dogfooding). The problem is that C# just isn’t a great language for tackling this problem. F#, with its support for pattern matching and discriminated unions and its lex/yacc tooling (along with its immutability by default, which is a cornerstone of Roslyn), would have been a better choice on the .NET platform, and while C++ is not perfect there are lots of options for lexing and parsing in the C/C++ space, which again C# really lacks.

Bootnotes

* Ironically C# is now more likely to stagnate due to the lack of a principal designer rather than poor tooling choices at Microsoft. Anders appears to have effectively left the .NET space to bring static typing to JavaScript with TypeScript.

** F# is an ideal language for not only consuming this API but for also writing code parsers and compilers. One has to wonder if Roslyn had been developed in F# if they’d be finished by now.

About thedo666

Software developer trying to learn a new language - English!
This entry was posted in C#, Compiler, F#, Roslyn.

26 Responses to The Roslyn Incident

  1. Eric Lippert says:

    There are a number of inaccuracies in this article. To pick a few of the most obvious: First, the use of C++ as an implementation language for the C# and VB languages did not cause the languages to “stagnate”. The idea is ridiculous on the face of it; Microsoft produced five major releases of the C# and VB languages in the last twelve years with that codebase, providing new tools for literally millions of customers. This is *vitality*, not *stagnancy*. Second, my departure will not have a large impact on the Roslyn schedule or its quality; had I felt that Roslyn would be unsuccessful without me, I would not have left. It is in great shape and has an excellent team. Third, C# is an *awesome* implementation language for a compiler. And fourth, of course Roslyn and C# still have Anders as their chief architect. His (excellent) work on TypeScript doesn’t change that.

    • thedo666 says:

      Hi Eric. Firstly, thank you for taking the time to read my ramblings.

      Obviously you have the benefit of being on the inside (or did), and this is clearly an opinion piece based purely on my own (limited) view of the project. I’d like to address some of your points however.

      **Assuming** that the C++ codebase in use for C# 5 is a direct descendant of the code used in the v1 compiler, that’s at least 10 years of history for 1 executable file – if the codebase wasn’t a cause of stagnation then I apologise. I’ve seen very few 10-year-old pieces of code that haven’t been the source of ever-slowing release schedules – in fact I’m not sure I’ve seen any.

      Regarding stagnancy – again this is merely my view. I see Scala on the horizon with a very C#-like syntax doing **many** things I would have thought would have been suited to C# – pattern matching, immutable local values, compile-time mixing, etc. F# has many of these too. Now I know I’m not your only customer, but as these features become commonplace (and they are) C# falls behind – in mindshare anyway. 4 years ago LINQ blew us away. Nothing has had that impact since in C#, which is how I would define stagnancy.

      Regarding C# as an *awesome* language to build a compiler: obviously I bow to your experience here – you’re obviously a heavyweight in this arena – but in my opinion there are *better* languages. I guess more people will side with you for 2 reasons –

      1: You’re Eric Lippert :)
      2: A lot of people only know C# and hold a blinkered view that it’s the best language ever ™. I’m not saying it’s bad. I’m just saying it’s a hammer and sometimes you need a screwdriver.

      Finally – the Anders thing. Again – you know, I don’t. I’m just saying what I’m hearing a LOT of – C# 2 came out and 3 was being talked about. Same when C# 3 came out. So far we’re hearing nothing from MS about the future (?) of C#. If anything fills me with any confidence it’s the passion Miguel and his Mono team have for keeping the platform alive.

      I totally respect your opinion, and I hope none of the blog came across as “Fact” or “Insider information” – it’s just my humble opinion. For the record (as stated in the main blog) I think that Roslyn is a complex solution to a complex problem – not a bad thing in itself. There was a lot of buzz about what it was good for and I simply felt compelled to write up my thoughts – not many people seem to be writing much about it yet (present company excepted of course).

      Have a merry Christmas

      • Eric Lippert says:

        To follow up on a few of your points:

        * There is no doubt that a crufty 10+ year old C++ codebase does make it harder to add new features, but that doesn’t mean that the language is stagnating; far from it. Our decision to do a total rewrite in C# was not taken lightly, and we do hope that doing so will make it cheaper to add new language features. However, that said, it is important to remember that the principal “brake” that is slowing down language innovation is the need to be as close to 100% backwards compatible as possible with an ever-growing body of real-world mission-critical code that is using ever-more-complex features. Programming languages become victims of their own success, and C# has been very successful. You should expect the rate at which major game-changing feature like generics, iterators, LINQ, dynamic and async are produced to slow somewhat over time.

        * F# is an awesome language to make a compiler in too. We considered developing Roslyn in F#, but remember, we already had an entire team of the most expert C# and VB programmers you could ask for, so the decision to go with C# and VB as implementation languages was an easy one.

        * Microsoft is making major investments in tooling for C#, VB, JavaScript, C++ and other languages. Making great tools so that developers can profitably make software for Microsoft’s platforms is what it is all about. Sometimes that investment will take the form of making the dev tools better and sometimes it will take the form of language innovation; what’s important is that the investments are being made.

        Thanks for your cogent criticisms of Roslyn, by the way. Consider posting them to the Roslyn forum; I’m sure that Roslyn program management would love to get your feedback!

        And happy Christmas to you too!

  2. Philip says:

    Actually, there’s one way this could become used by way more than refactoring tools or toy projects: by having a mechanism that allows you to have code in your project that modifies the AST pre-compilation.

    Imagine if you could create a SyntaxRewriter in your project that would be called during every compilation – similar to Boo. We could get rid of boilerplate INotifyPropertyChanged code, add “isDirty” support to entity classes without having to write code in every property, change all event handlers to weak event handlers without having to mess with syntax, enforce much more nuanced null checking or generic constraints…

    • thedo666 says:

      Absolutely fair – that’s another use for it (the INotifyPropertyChanged example). But it also kind of backs up a point of mine – you wouldn’t want to code-gen anything more than 1 line of code in it.

      I just can’t see software houses spending time doing that level of customisation. Which is why I mentioned the potential for bringing a ReSharper competitor out – but doing it myself? I’ll pass.

      As for more type checks – that would require 2 compile steps (you couldn’t get the type information without a 1st compile) so pre-compilation is probably a bad term.

      Merry Christmas!

    • nicolas says:

      Cast in terms of multistage programming, this is all about which ‘runtime level’ you are working on. One’s dynamic runtime is another’s static environment, and they are all geared toward producing new types. Navigating with ease between those levels would make those scenarios possible, and going up the ladder you’d naturally get uncluttered “business DSL” for free.
      So far the only innovation I see in that space is the type providers in F#, which are great for 99% of our current programming cases.
      But if we extend that thinking, which is definitely the right direction, we’ll need a lighter way to switch “level” than compiling a dll and launching a Visual Studio attached as a debugger to another Visual Studio.

      What is quite nice though is the level of language competition in the computing ecosystem: Scala, Erlang, Ruby, and somehow Haskell.

      In that platform battle MSFT has one nice and decent gem, F#, which elegantly tackles the problems that count today: big data, parsimonious and expressive code, and web programming capacity improving at a fast pace.
      Those features make it a real contender to become the data-integration language of tomorrow.

      C# will always be there, and is a very good language, but I don’t quite see where it is going. It will be there in the future, but will it write the future?

  3. nicolas says:

    The killer feature of Roslyn would be multi-stage compilation/runtime. You read it here first.
    Of course, F# is miles ahead in terms of ease of use. In that language, ideas are code, and code is ideas. It beats C# OO in every way, and on top of that has first-class functional credentials.

    But Roslyn, I feel, is a different story: one still in the making for sure, but one that could yield something. In the future…

  4. Jon Harrop says:

    I find Eric Lippert’s comments on your blog quite self-contradictory:

    “C# is an *awesome* implementation language for a compiler”

    “…we already had an entire team of the most expert C# and VB programmers you could ask for, so the decision to go with C# and VB as implementation languages was an easy one”

    So an entire team of experts worked for over 4 years to copy a basic feature other languages have had for decades and have yet to deliver but Eric still claims that C# is a good tool for problems like this. To me, that is strong evidence that C# is not only a poor implementation language for a compiler but that teams of experts using C# cannot compete with an individual developer using a language bred for metaprogramming.

    • Rodrick Chapman says:

      My understanding is that Roslyn is not merely a rewrite of the compiler. It’s designed to be used during the editing process and so has to return useful information about code that is in a “bad” state (which will be the case most of the time while editing). So, in addition to having syntax nodes for a normal AST, it also needs syntax nodes for various kinds of incomplete and bad code.

      I imagine that they didn’t know exactly how many kinds of syntax nodes that they would need; this amounts to having an unbounded sum type and pattern matching breaks down in such a situation.

      They also have to make it fast enough to work between keystrokes.

      • Jon Harrop says:

        “I imagine that they didn’t know exactly how many kinds of syntax nodes that they would need; this amounts to having an unbounded sum type and pattern matching breaks down in such a situation”. Firstly, if they didn’t have a design they were incompetent. Secondly, this is still nothing new. Thirdly, pattern matching does not “break down” in the presence of open sum types. Exceptions are open sum types. OCaml’s polymorphic variants are open sum types. In Mathematica, everything is open and can be pattern matched over.

  5. Craig Stuntz says:

    The Delphi compiler is written in C, not Delphi. (Some of) the Delphi IDE is written in Delphi, but not the compiler.

  6. jpobst says:

    “Choosing to write a C# compiler in C# is a noble idea…”

    Mono’s C# compiler has been written in C# since the beginning (over 10 years ago). Same for Mono’s VB.NET compiler.

    • thedo666 says:

      The Mono team managed to build the compiler in C# in under 4 years also. Wonder what the hold-up is with Roslyn?

      You **can** use C# to do almost anything. It’s just not what I would have chosen :)

      • jpobst says:

        As an open source project, the quality standard for Mono is much, much lower. If it works on the developer’s machine, you release it, get bug reports, fix them, release again. There are still plenty of reported bugs on Mono’s compiler that haven’t been fixed.

        Microsoft has to release a product that actually works. Which means it has to compile the existing billions of lines of C# code exactly the same (including any bugs that people have come to rely on.)

        But yes, I agree that 4+ years is pushing it. :)

    • Foo says:

      Why doesn’t Microsoft just adopt the Mono compiler?

  7. Various Furriness says:

    tl;dr

    Have you tried Nemerle, guys?

  8. Pingback: F# Weekly #52, 2012 – New Year Edition « Sergey Tihon's Blog

  9. Mememe says:

    Agree with “long building” of Roslyn (and same long for “VS using Roslyn”). I cannot wait 5 more years until MS product will be polished. It could be way faster if MS looked at Nemerle.

  10. SamNium says:

    Please take a look at http://qinjection.codeplex.com/.
    I’m developing it on top of the Roslyn API.
    There is still work to do … and any comments are welcome.

  11. Arturo Hernandez says:

    You make a really good argument for building roslyn in F#. I suspect the choice may have to do with risk aversion. Similar to the question all programmers face. Do we really want to venture into writing our everyday production code in F#? Or do we stay in our comfort zone.

    I would disagree in the use cases for Roslyn. Writing code in one way, to have it be translated and executed in a very different way is certainly a use for it. LINQ does that. But it could do much more, and roslyn is a good way to get there.

  12. Pingback: Building a C# compiler in F# | Neil Danson's Blog

  13. Hung Nguyen says:

    The only thing that matters to me, and to 99% of .NET developers I believe, is whether the new Roslyn compiler can make applications written in C# run more quickly or not. The fact is that desktop applications written in C# run faster than Java but a lot more slowly than C/C++.

    • thedo666 says:

      Exactly – RyuJIT, SIMD and .NET Native are much more likely to make that happen than a re-architected compiler, which by its own definition should produce the same code as before, meaning Roslyn itself shouldn’t contribute to any performance increases.

      Another goal is to include features more easily – C# has certainly fallen behind the “state-of-the-art” in terms of language features, but the current set of proposed C# 6 features are pretty minimal.
