Monday, April 5, 2010

Switching from C# to Java

In 2008 Red Hat acquired Qumranet, a startup focused on virtualization. Among other products, Qumranet had developed a management application for virtualization.

The management application was written in C#, and one of the first tasks we got was to make it cross-platform. That was to be expected, considering that the acquisition was made by Red Hat...

We started exploring the web, looking for ideas on how to approach this task. At the beginning things did not look promising: most of the references we found on porting projects from one technology to another were about complete failures. The only suggestion that we saw all over was not to change technology and architecture at the same time.

Armed with this important advice we kept digging around and realized there were two different paths we could take. The first was to stick with C#, and the second, surprisingly, was to change technology.

Sticking with C# required a technology to help us run it on Linux. We found Mono, an open source implementation of Microsoft's .NET Framework that runs on Linux. The second option we found was Grasshopper, a Mainsoft project that compiles MSIL to Java bytecode. The idea is to code in C#, compile it to Java bytecode and run it on a JRE on any JRE-enabled platform, Linux included. Cool stuff, right?
Both solutions were taken off the table. We were on our way to open sourcing our project (Red Hat, remember...) and we wanted to use a technology supported by Red Hat. In addition, a basic POC we did (POC #1) using Mono showed that the technology was immature at the time and did not meet our needs.

Okay, switching technology then. At this point Java looked like the natural choice. It is cross-platform, has a lot of enterprise frameworks ready to use, and its syntax and OO principles are very similar to those of C#, which is an advantage when it comes to training the development team.

Here is a problem for CS guys:
input: 2.5 developers, a 4-month period, 100k lines of C# code and a relatively mature management application
output: parity application in Java
constraint: during these 4 months 4 developers are adding required features to the C# version
algorithm: ?

After we realized we had no cheat sheet for this problem, we considered 3 options:
1. Manual - Write it all from scratch in Java
2. Hybrid - Integrate Java modules in the C# application
3. Automatic - Automatically convert the code (Yes, we believe in miracles. At this point, who wouldn't?)

Manual - The obvious option was writing it from scratch, every developer's dream. Who does not think that the second time around he will write the same code in the most generic, flawless way? Nice, but no thank you. First of all, we would most likely not make the same mistakes again; instead we would introduce new bugs, which would probably take forever to fix. In addition, we would probably lose whatever maturity the application had. Another con, which eventually ruled this option out, was that writing from scratch in Java while trying to catch up with the moving target in C# seemed impossible to us. We did another POC (POC #2) with a new generic architecture to get an estimate of the time and amount of work for this approach.

Hybrid - The current architecture of the C# version is based on the Command design pattern. Generally speaking, the flows/actions in the system can be mapped to commands. The Hybrid approach was to gradually port the flows from C# to Java and, during the process, have both technologies live side by side. The obvious pros of such an approach were that we could have the system running at all times, which is good for the system's maturity; we could keep developing new features in that period (writing them in Java, of course); and we could harness the whole team for the conversion (+4 developers to work with us). As for the cons, this was considered a high-risk path: we could not deliver such a product until the conversion process was complete, and the time estimates for this process were kind of random. It actually put the next release at risk (it was originally planned in C# and was due at the end of the 4-month period). Another con was that the Hybrid approach required some of the infrastructure to be ready in Java before we started migrating flows, and that turned out to be a lot of code, so we might as well port the whole thing. One other difficulty was that managing database transactions in such a system was very problematic. Anyway, the high risk was decisive, and yet another POC went down the drain (POC #2.5; we can't actually call it a POC because it was never executed...).
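
To make the Hybrid idea a bit more concrete, here is a minimal sketch of how flows mapped to commands could be routed either to a ported Java command or back to the legacy side. This is only an illustration under assumptions, not our actual code: the names Command, AddHostCommand, LegacyBridgeCommand and HybridDispatcher are all hypothetical, and the "bridge" simply returns a string where a real system would have to call into the C# application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the real application.
interface Command {
    String execute(String parameters);
}

// Stand-in for a flow that has already been ported to Java.
class AddHostCommand implements Command {
    @Override
    public String execute(String parameters) {
        return "java:AddHost(" + parameters + ")";
    }
}

// Stand-in for a bridge that would forward the call to the C# side.
// Here it only builds a string so the sketch stays self-contained.
class LegacyBridgeCommand implements Command {
    private final String actionName;

    LegacyBridgeCommand(String actionName) {
        this.actionName = actionName;
    }

    @Override
    public String execute(String parameters) {
        return "legacy:" + actionName + "(" + parameters + ")";
    }
}

// Routes each action to its Java command if one is registered,
// otherwise falls back to the legacy bridge.
class HybridDispatcher {
    private final Map<String, Command> ported = new HashMap<>();

    void registerPorted(String actionName, Command command) {
        ported.put(actionName, command);
    }

    String run(String actionName, String parameters) {
        Command command = ported.get(actionName);
        if (command == null) {
            command = new LegacyBridgeCommand(actionName);
        }
        return command.execute(parameters);
    }
}

class HybridSketch {
    public static void main(String[] args) {
        HybridDispatcher dispatcher = new HybridDispatcher();
        dispatcher.registerPorted("AddHost", new AddHostCommand());
        System.out.println(dispatcher.run("AddHost", "host1"));    // handled in Java
        System.out.println(dispatcher.run("RemoveHost", "host1")); // falls back to legacy
    }
}
```

Even in this toy form you can see where the trouble starts: the dispatcher and the bridge are infrastructure that must exist in Java before the first flow is migrated, and a transaction that spans a Java command and a legacy command has no obvious owner.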

Automatic - The inspiration for this was an article about Boeing and automatic conversion. We thought, "if Boeing can do it, so can we". Sounds stupid? Well, it is.
Luckily for us we did not think so at the time. We looked for automatic conversion tools from C# to Java and came across 2 of them. One was net2java, an abandoned project, which did not help us much. The second was Tangible, with which we started a POC (POC #3).
At the beginning it looked horrible: we converted the project and got over 50K compilation errors. But, as I mentioned, the Boeing article was inspiring (plus we did not have other options), so we looked into the compilation errors. What we found was that some of them were repetitive, and that with a little work we could eliminate some of them.
We started by sending Tangible one or two issues which dominated the error list. We were pleasantly surprised by Tangible's support. They were so cooperative and fast (or shall I say "he was"; Dave was one very efficient developer) that it encouraged us to send a second batch of bugs, followed by a third and a fourth. We ended up filing more than 200 issues, big and small, over a 6-month period on our journey to Java.

In addition to the ongoing effort with Tangible we had to dumb down the C# code.
For example, we had to remove the usage of the LINQ library on the C# side, since it was not converted to Java properly.
At the same time we wrote some "sed" scripts to manipulate the Java output and fix some errors, such as packaging, adding import statements at the beginning of the class, or adding a static data member (logger) to each class.
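
To give a feel for that kind of post-processing, here is a rough sketch of the same three fixes (package line, import statements, static logger member) written in Java rather than sed, purely for illustration. Everything in it is an assumption: the class name ConvertedSourceFixer, the package and import values, and the choice of java.util.logging for the logger are made up for the example, and our real scripts were sed one-liners, not this code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a post-conversion fix-up step.
class ConvertedSourceFixer {
    // Matches the first class declaration up to and including its opening brace.
    // Group 2 captures the class name.
    private static final Pattern CLASS_OPEN =
            Pattern.compile("(class\\s+(\\w+)[^{]*\\{)");

    static String fix(String source, String packageName, List<String> imports) {
        // Build the header: package line, blank line, imports, blank line.
        StringBuilder header = new StringBuilder();
        header.append("package ").append(packageName).append(";\n\n");
        for (String imp : imports) {
            header.append("import ").append(imp).append(";\n");
        }
        header.append("\n");

        Matcher m = CLASS_OPEN.matcher(source);
        if (!m.find()) {
            // No class declaration found: only prepend the header.
            return header + source;
        }

        // Insert a static logger field right after the opening brace.
        // java.util.logging is an assumption made for this example.
        String logger = "\n    private static final java.util.logging.Logger log =\n"
                + "            java.util.logging.Logger.getLogger("
                + m.group(2) + ".class.getName());";
        return header + source.substring(0, m.end()) + logger + source.substring(m.end());
    }

    public static void main(String[] args) {
        String converted = "public class VmManager {\n    void start() {}\n}\n";
        System.out.println(fix(converted, "com.example.manager",
                Arrays.asList("java.util.List")));
    }
}
```

A text-level fix like this knows nothing about Java syntax, so it only works because the converter's output is very regular. That is also why it was worth scripting: the same mechanical change had to be reapplied every time we reconverted the code.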

Obviously there was some manual work as well, but I'll elaborate on the technical details of the conversion process itself in my next post.

It took us around 4 months to stabilize the process and get to a working version of our application in Java. We got to a parity version in Java which passed 90% of our automated tests. We sometimes still can't believe it.
We have not run a scalability test on the Java version yet; I will blog about it as soon as we have some results.

Although for automobiles the next generation is hybrid, for us automatic conversion did the job.