Sunday, December 4, 2011

Migrating oVirt from MS-SQL to PostgreSQL

This is the second post in the series about changing our application from Microsoft-based technologies to open-source technologies.

The first post was about switching from C# to Java; this post is about changing the DBMS
from MS-SQL to PostgreSQL.

When we initially considered how to introduce this fundamental change to the application without affecting its stability/maturity, we thought it would be right to:
  • Limit the code changes to the DB access layer code
  • Leave the DB schema and other components in the application untouched
By following these guidelines we could run our existing test suites before and after the changes, as well as test the application on existing databases with real data to find regressions.

At first we thought of using JPA and abstracting the DBMS layer, with no tight coupling to Postgres. While looking into JPA we realized that we do not have an 'object-oriented' database schema, and getting one would have required changes both to the DB schema and to the Java entities.
Changing the Java entities meant changing code all over the application, so although JPA seems like the long-term solution we strive for, we were looking for a stepping stone to ease the migration path.

The application's DB access layer was based on MS-SQL stored procedures (sets of pre-compiled SQL statements). In parallel to JPA we looked into migrating the SPs to Postgres functions.
We found a tool named SQLWays for doing that automatically; after a few cycles with the tool we got about 70% of the work done. It handled both the SP translation and the DB schema creation scripts.
In addition we wrote some sed scripts to manipulate the generated Postgres functions, and made manual adaptations for specific, more complicated SPs, table views, etc.
Most of our SPs are simple ANSI SQL and the DB layer does not change often, which helped us get the work done relatively fast. Using Postgres functions was starting to feel like a real alternative.

The next step was executing the functions from within the code, which was straightforward except for one change that is worth mentioning.

Cursors:

In MS-SQL we used a global cursor to return the SP results, and the tool translated it to REFCURSORs. The problem was that we had code in the application that accessed the SP results after the DB transaction was closed. In MS-SQL this worked because the global cursor allows accessing the results after the transaction is committed.
REFCURSORs in Postgres, on the other hand, are closed when the transaction is committed, so we failed to retrieve the data.
PostgreSQL offers another way to handle this - SETOF functions, which return a set of rows that PostgreSQL treats like a table, so the results are available regardless of the DB transaction state.
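To illustrate the difference, here is a minimal JDBC sketch, assuming a hypothetical function named get_all_vms() and placeholder connection details (this is not the actual oVirt code): with a REFCURSOR the rows must be read while the transaction is still open, while a SETOF function is queried like a table and the fetched rows remain usable after the commit.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

public class CursorExample
{
    public static void main(String[] args) throws SQLException
    {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/engine", "user", "password"))
        {
            // REFCURSOR variant: must run inside an open transaction,
            // and the cursor is gone once that transaction is committed.
            conn.setAutoCommit(false);
            try (CallableStatement cs = conn.prepareCall("{ ? = call get_all_vms() }"))
            {
                cs.registerOutParameter(1, Types.OTHER); // refcursor
                cs.execute();
                try (ResultSet rs = (ResultSet) cs.getObject(1))
                {
                    while (rs.next()) { /* read the rows here, before the commit */ }
                }
            }
            conn.commit(); // after this point the refcursor can no longer be read

            // SETOF variant: a plain query; the rows are fetched into the
            // ResultSet like any table, regardless of the transaction state.
            conn.setAutoCommit(true);
            try (PreparedStatement ps = conn.prepareStatement("SELECT * FROM get_all_vms()");
                 ResultSet rs = ps.executeQuery())
            {
                while (rs.next()) { /* read the rows */ }
            }
        }
    }
}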

After handling the cursors we got the application running on Postgres; you can imagine the big smiles it put on our faces. Only it did not last long...
We soon realized we had a huge problem which we did not see coming - "read uncommitted"?!?

Read uncommitted

The ANSI SQL standard defines 4 transaction isolation levels: serializable, repeatable read, read committed and read uncommitted.
The isolation level controls the degree of locking that occurs when selecting data.
Our application used MS-SQL's 'read uncommitted' level, which according to the spec means that data that has been updated but not yet committed may be read by other transactions.

I am not going to defend the choice of this isolation level, as I never supported it, but I will describe the motivations for using it.
One motivation was a more responsive UI: reflecting the current application status as fast as possible, even at the 'price' of occasionally exposing the user to a temporary state.
Another motivation was to reduce the possibility of deadlocks: in MS-SQL, row locks can be escalated to page or table locks for large read operations [see 1], so by using read uncommitted, table locks are less likely to occur.

Regardless of whether read uncommitted is the right solution for those issues, the fact is that the application is indeed very responsive and rarely experiences deadlocks.

When we got the application running on Postgres we noticed that it behaved differently: the UI was less responsive and we stumbled into deadlocks quite often. Looking into it a little, we found that in Postgres, when you specify the
read-uncommitted isolation level you de facto get read-committed [see 2].
That 'forced' us to find an alternative solution for getting good UI responsiveness and avoiding table locking, and so we introduced the compensation mechanism.
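As a small illustration, here is a minimal JDBC sketch of requesting the isolation level; the connection details are placeholders and the exact handling of the request can vary by driver version, but per the PostgreSQL documentation referenced in [2], asking for read uncommitted yields read committed behaviour:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class IsolationExample
{
    public static void main(String[] args) throws SQLException
    {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/engine", "user", "password"))
        {
            // The application asks for the lowest isolation level...
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            // ...but PostgreSQL treats READ UNCOMMITTED as READ COMMITTED,
            // so dirty reads never happen and the old MS-SQL behaviour is lost.
        }
    }
}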

Compensation

After digging around we figured that we could improve system behaviour by reducing the locking scope to shorter code segments, or in other words by dealing with the relatively long transactions in the application.

The idea was to break flows in the application into short segments, each segment executed within an independent transaction.
By breaking the flow into segments we solved both issues:
- The UI responsiveness improved because the data was committed earlier and was 'available' to other transactions.
- The DB locking scope was shorter, thus reducing the number of deadlocks.
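Here is a minimal sketch of the idea, with hypothetical segment names and plain JDBC wiring (the real flows and transaction handling in oVirt are different); the point is only that each segment commits on its own instead of one long transaction spanning the whole flow:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class SegmentedFlow
{
    private final DataSource ds;

    public SegmentedFlow(DataSource ds)
    {
        this.ds = ds;
    }

    public void runFlow() throws SQLException
    {
        runInTransaction(this::allocateResources); // segment 1 - committed here
        runInTransaction(this::updateStatuses);    // segment 2 - committed here
        runInTransaction(this::finalizeFlow);      // segment 3 - committed here
    }

    private void runInTransaction(Segment segment) throws SQLException
    {
        try (Connection conn = ds.getConnection())
        {
            conn.setAutoCommit(false);
            try
            {
                segment.execute(conn);
                conn.commit();   // the segment's data is now visible to others
            }
            catch (SQLException e)
            {
                conn.rollback(); // only the current segment is rolled back
                throw e;
            }
        }
    }

    private void allocateResources(Connection conn) throws SQLException { /* ... */ }
    private void updateStatuses(Connection conn) throws SQLException { /* ... */ }
    private void finalizeFlow(Connection conn) throws SQLException { /* ... */ }

    private interface Segment
    {
        void execute(Connection conn) throws SQLException;
    }
}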

This solution could have been perfect if we did not have to handle error flows :)

Handling error flows was interesting: we could encounter an error after some segments were already committed, which means we need to undo work that was done by previously committed transactions.

For that we introduced the compensation mechanism.
The basic concept of compensation is that in each segment, if a persisted entity changes, you take an in-memory snapshot of the original entity before changing it. When you reach the end of a segment and commit the transaction associated with it, you also persist the in-memory snapshots into the DB (in a dedicated compensation table), and then you move on to the next segment. If the flow ends successfully you remove the snapshots from the DB.
In case of a failure in the flow we compensate for the previously committed transactions by
recovering the original data according to the snapshots we persisted in the DB.



We also implemented two optimizations for the compensation:
1. Snapshotting an entity only before the first time it changes (compensation currently always returns to the initial state, before the flow started, so even if the entity changed a couple of times we only need the original state).
2. Snapshotting only part of the entity properties, as most of the time the only property that changed is the entity status.
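The following is a minimal, hypothetical sketch of the compensation idea in Java, including the two optimizations; the names (BusinessEntity, EntitySnapshot, CompensationDao, CompensationContext) are made up for illustration and are not the actual oVirt classes:

import java.util.HashMap;
import java.util.Map;

public class CompensationContext
{
    // A persisted entity that can be snapshotted and changed during a flow.
    public interface BusinessEntity
    {
        String getId();
    }

    // A copy of the original values needed to restore an entity; per
    // optimization 2 it may hold only the properties that actually change
    // (usually just the status), not the whole entity.
    public interface EntitySnapshot
    {
        void restore(); // write the original values back to the entity's table
    }

    // Persists snapshots in the dedicated compensation table.
    public interface CompensationDao
    {
        void save(String flowId, EntitySnapshot snapshot);
        Iterable<EntitySnapshot> loadByFlow(String flowId);
        void deleteByFlow(String flowId);
    }

    private final String flowId;
    private final CompensationDao dao;
    private final Map<String, EntitySnapshot> pending = new HashMap<String, EntitySnapshot>();

    public CompensationContext(String flowId, CompensationDao dao)
    {
        this.flowId = flowId;
        this.dao = dao;
    }

    // Optimization 1: take an in-memory snapshot only before the FIRST change
    // to an entity; later changes in the same flow are ignored.
    public void beforeChange(BusinessEntity entity, EntitySnapshot snapshot)
    {
        if (!pending.containsKey(entity.getId()))
        {
            pending.put(entity.getId(), snapshot);
        }
    }

    // Called at the end of each segment, inside the segment's transaction:
    // persist the in-memory snapshots so they survive the commit.
    public void persistPendingSnapshots()
    {
        for (EntitySnapshot snapshot : pending.values())
        {
            dao.save(flowId, snapshot);
        }
        pending.clear();
    }

    // The flow finished successfully: the snapshots are no longer needed.
    public void cleanup()
    {
        dao.deleteByFlow(flowId);
    }

    // The flow failed after some segments were already committed: undo them
    // by restoring the original data from the persisted snapshots.
    public void compensate()
    {
        for (EntitySnapshot snapshot : dao.loadByFlow(flowId))
        {
            snapshot.restore();
        }
        dao.deleteByFlow(flowId);
    }
}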


The work is done and the application is running on Postgres.
If you want to take a peek at the code, it is open source so go ahead and look.

The project is oVirt - http://www.ovirt.org
The oVirt-engine-core code is available for HTTP browsing.

enjoy,
Livnat

[1] - http://msdn.microsoft.com/en-us/library/ms173763.aspx
[2] - http://www.postgresql.org/docs/8.4/static/transaction-iso.html

Wednesday, September 21, 2011

oVirt

It's been over a year since the "Switching from C# to Java" post, and the full Linux version is ready and public (although still in beta)!

The really big news is that now the code is being open sourced and it is part of the oVirt open source stack.

oVirt is a new open source stack for virtualization, supported by some of the biggest players in the market: Cisco, IBM, Intel, NetApp, Red Hat and SUSE, and we expect more support as the project expands.

As part of launching oVirt we are having a three-day workshop.
The workshop includes sessions covering the technical project details, governance, getting involved, usage and much more.
The full git repos (source), site and forums will be launched at the event.

The workshop is open to the public and takes place at Cisco's main campus in San Jose, California, November 1st-3rd, 2011.
Read more on the event page


We are very excited about open sourcing the code; we have been waiting for this for quite some time and now it is actually happening.
By opening the code we hope the project will grow faster, become better and introduce new capabilities that users are actually looking for.
We want to provide a true, open alternative in the virtualization world.


Livnat

Monday, July 19, 2010

Recruiting a new employee

I find recruiting to be a complicated and delicate process.
The first non-trivial task is to define what you're looking for; that's one of the most critical parts of the process. Skip this part and you can end up with poor results.
When you're buying shoes before going to a beach party you wouldn't buy high heeled shoes, no matter how pretty they look.
After you know what you are looking for there are 2 important questions to answer when you hire a new team member:
1. Does the candidate have the required technical skills?
2. Will he/she fit in socially?
To answer these 2 questions we have a lengthy process. I would like to share with you 2 of the phases which I find very helpful:
1. Interview by the team members
2. Technical exercise – a programming exercise given on site.


climbing the villarica


Interview by the team members:
I find many advantages in having the team members interview a candidate. It is a vote of confidence in your team members. For me it is like saying "I trust you and I think you are capable of knowing what is 'good' for our team". I also find that being part of the decision making about the future composition of the team is empowering.
In addition it gives your team members another set of skills that they might need in the future, which is great.
By making my team members part of the recruiting process they feel committed to the success of the new team member. I find them to be more supportive and involved in helping the new team member feel at home.
There is another very important aspect to recruiting: you want to make sure the candidate wants to come to work for your company, and is aware of how great it is.
Talking to his peers might make your candidate more enthusiastic about coming to work for your company. The candidate feels more comfortable asking a colleague questions about the day-to-day work and usually finds the answers to be more reliable.
Many people get to meet the candidate along the way, which I find beneficial for both sides.
It gives the candidate a better understanding of the type of people working for the company, and it gives the company a better chance to evaluate the candidate correctly.
A great byproduct of having your team members interview candidates is freeing up your time to focus on other tasks. Delegating in this case works great (for me it is more than a byproduct, it is a significant time saver).
It is worth mentioning that not every team member is suitable for performing interviews, for example new team members, but that's for another post…

climbing the villarica


Technical exercise:
This is a great step in our interviewing process: it separates candidates with good theoretical knowledge from the ones who can also deliver results.
The technical exercise is about using new APIs and reading existing code, not about testing which technologies and frameworks the candidate already knows.
For example, I wrote a bowling simulator. Before each match the bowler has to deposit an item with the cashier and retrieve it after the match ends. The code was written for a single-threaded match, and the candidate needs to modify it to run more than one match simultaneously.
This step checks the ability to handle a task from start to end and deliver results. In this step I pay attention to the questions the candidate asks me: for example, there are candidates who ask a lot of questions along the way, which might indicate they need more guidance, and there are the ones who don't ask questions and solve a problem which is different from the one presented to them.
Another thing which I find very important in this step is that by the end of the exercise the candidate produces a program which I can run.
It is less important whether there are bugs or whether the program solves 100% of the problem. I find those who are actually capable of delivering some kind of solution to have more potential.
I also put less emphasis on the time it takes; obviously it cannot take 2 days, but I don't think that one candidate taking 1 hour and another taking 2 hours has much significance. I usually tend to relate it to the stress the candidate might feel during an interview.
Anyway, there is no guarantee for the results of the recruiting process; it can always surprise you.
Make sure you take into consideration all the things you have noticed along the way: the candidate's technical skills, behavior and general attitude. If you have a clear picture: "Yes, I would like to work with this candidate in the future" or "No, this candidate is not suitable for the team's needs", then you probably did something right.

I would like to thank Oded for helping with this post.

Saturday, May 15, 2010

Chasing a moving target

Chasing a moving target:

A situation where you tell your customer "… I am not sure when the next version is expected or what features will be included in it but let me assure you sir, it is going to be good…" is a situation you probably want to avoid. Engineers can actually say that and feel good about themselves, but the product management, marketing and sales, well they will not tolerate it and unfortunately they can be loud...really loud.

If you can relate to the above you probably understand what we were trying to avoid in our porting project.
We were committed to moving to Java and, in the meantime, releasing a C# version with new features.

In my last post I described the different options we considered for porting from C# to Java and concluded with the path we chose: automatic conversion.
This post is about the technical process of automatic porting and the difficulties in chasing a moving target.

One of the objectives was reducing entropy: we could not commit to how long it would take us to stabilize an automatically converted project. It is actually more than stabilizing; it is also filling the infrastructure gaps and fixing bugs originating in the Java-C# behavioral differences (e.g. transaction management). We couldn't wait to finish the next C# version and only then start porting to Java (the product management guys, remember...).
We got into a situation where we were working on the Java porting in parallel to developing the next C# version. In addition, we obviously couldn't drop features between the two versions: if we introduced a feature in the next C# version it had to be in the Java version as well (the product managers can explain why, and believe me they can be very persuasive). That's how we found ourselves chasing a moving target.





We tried to figure out how we could utilize the automatic porting and keep up with the ongoing C# version. We considered 2 options: the first was to do a one-time conversion followed by a double commit (one for C# and one for Java) for each feature/fix in the product; the other was to create a recurring automatic process in which we would get for "free" all the changes committed in C#.

The one-time conversion seemed technically easier: after an initial conversion to Java we would maintain two code bases (one in C# and one in Java) and the team would double commit each feature/bug fix in the product. By taking this approach we could write features for the C# version and fill in the gaps in the Java version. The problem was that the feature set and the bug fixes were too massive; taking this path might have ended with missing the deadline for the C# version (the double commit is cumbersome: writing the code twice, reviewing it twice, testing it twice, etc.), and probably not stabilizing the Java version. Another problem was that this manual process is error prone (doing the same thing twice...), plus, at least at the beginning, commits to the Java code base could not be tested (the Java code did not immediately run after the conversion; there was some manual work to be done).

We had to engineer a recurring process with the following goals in mind:
1. Develop features in C#
2. (Magically) get them in the Java version
3. Have parallel work on the Java version: filling in the missing infrastructure, fixing conversion bugs, etc.
4. Stabilize the Java version.

Our application is logically divided into Frontend and Backend (no surprise there).
This post focuses on the Backend side. We wanted to have the Backend converted to Java and working with the C# Frontend (as a stepping stone): to get the C# client, which is written in WPF and communicates with the C# backend using WCF, to work with a Java backend running on JBoss AS and communicating using web services.
In parallel there was another effort to port the Frontend to Java technologies.

The process we came up with was as follows:
We had a repository for the ongoing C# development, and we created a branch from this repository in which we dumbed down the code so that it could go through the automatic conversion, for example by commenting out the usage of LINQ, which the converter did not process well.
Here is a simplistic example demonstrating the process; this is NOT how our RHEV-M code looks (public data members and such).
C# using LINQ:

public class Person
    {
        public Guid id;
        public String name;
        public List<Person> friends;

        public Person(String name)
        {
            this.name = name;
            id = Guid.NewGuid();
            friends = new List<Person>();
        }

        public void addFriend(Person p)
        {
            friends.Add(p);
        }

        public List<String> getFriendNames()
        {
            List<String> names = friends.Select(a => a.name).ToList();
            return names;
        }

    }
C# after dumb down:

public class Person
    {
        public Guid id;
        public String name;
        public List<Person> friends;

        public Person(String name)
        {
            this.name = name;
            id = Guid.NewGuid();
            friends = new List<Person>();
        }

        public void addFriend(Person p)
        {
            friends.Add(p);
        }

        public List<String> getFriendNames()
        {
            List<String> names = new List<String>();
            foreach (Person p in friends)
            {
                names.Add(p.name);
            }
            return names;
        }
    }
Java:

public class Person
    {
        public Guid id;
        public String name;
        public java.util.ArrayList<Person> friends;

        public Person(String name)
        {
            this.name = name;
            id = Guid.NewGuid();
            friends = new java.util.ArrayList<Person>();
        }

        public final void addFriend(Person p)
        {
            friends.add(p);
        }

        public final java.util.ArrayList<String> getFriendNames()
        {
            java.util.ArrayList<String> names = new java.util.ArrayList<String>();
            for (Person p : friends)
            {
                names.add(p.name);
            }
            return names;
        }
    }

The dumb-down branch was the input for the converter, and the output was committed to a third branch which was in Java but did not compile (the raw output of the converter; as you can see in the example, Guid is not part of the JDK).
On top of that we created a post-processing phase:
1. File manipulation: some sed scripts to manipulate the packages of the files, add the import statements, etc.
2. Creation of a compatibility package, in which we created placeholders for the System classes used in C# that the converter did not translate into Java, for example 'Guid', 'DateTime', etc.
3. Generating the DB layer.
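As an illustration of point 2, here is a hypothetical compatibility placeholder for 'Guid', backed by java.util.UUID so that the converted call sites (Guid.NewGuid() and friends) keep compiling; this is our sketch, not the actual oVirt class:

import java.util.UUID;

public final class Guid
{
    public static final Guid Empty = new Guid(new UUID(0L, 0L));

    private final UUID uuid;

    public Guid(UUID uuid)
    {
        this.uuid = uuid;
    }

    // Mirrors C#'s Guid.NewGuid()
    public static Guid NewGuid()
    {
        return new Guid(UUID.randomUUID());
    }

    public UUID getUuid()
    {
        return uuid;
    }

    @Override
    public boolean equals(Object other)
    {
        return other instanceof Guid && uuid.equals(((Guid) other).uuid);
    }

    @Override
    public int hashCode()
    {
        return uuid.hashCode();
    }

    @Override
    public String toString()
    {
        return uuid.toString();
    }
}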

With the data access layer we had to be a little more creative. In C# we have a DBFacade class which executes stored procedures; this was not translated into Java. We used the automatic conversion to generate the DBFacade class in Java, but it contained only the method signatures. We wrote a Java utility which filled in the methods by generating code that uses JDBC to execute the stored procedures. The utility used the method names to "guess" the stored procedures we wanted to execute and the method parameters to populate the stored procedure calls.
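To give a feel for it, here is a hypothetical example of the kind of method body such a utility could generate; the method, stored procedure name and parameters are made up, not taken from the real DBFacade:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class DbFacade
{
    private final DataSource ds;

    public DbFacade(DataSource ds)
    {
        this.ds = ds;
    }

    // Generated from a signature such as UpdateVmStatus(vmId, status):
    // the stored procedure name is "guessed" from the method name and the
    // method parameters are bound to the procedure's parameters in order.
    public void updateVmStatus(String vmId, int status) throws SQLException
    {
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM UpdateVmStatus(?, ?)"))
        {
            ps.setString(1, vmId);
            ps.setInt(2, status);
            ps.execute();
        }
    }
}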

And then the code was committed to the last/"clean" branch.

We did this conversion process almost every week; the idea was to have an ongoing 'parity' Java version.
The process is very technical and takes only a few hours from start to end. Now we had a process for getting all the code written in the C# version into Java, and we could work in parallel on the gaps in the Java version.
There were some ground rules for working on the Java version. One of them was to change ONLY what must be changed: for example, the naming convention was inherited from C# and is not consistent with the familiar Java naming convention, but the idea was not to change that at this point, because every change we made in Java might have caused a conflict in the last step of the merging process, and that was very painful. Another rule was NO optimizations/cleanups in Java while we were converting, again to minimize conflicts.

There were some infrastructure gaps we worked on, like Active Directory, the XML-RPC layer we used in C#, transaction management, etc., and there were some technical gaps, like implementing the compatibility placeholders, filling in the LINQ code, etc.

The nice part was when we actually got the Java version running and found bugs which were not caught in C#: we fixed them in the C# version, and after about a week, when we ran the conversion process again, we had the fixes in Java.


Cheers,
Livnat

Monday, April 5, 2010

Switching from C# to Java

In 2008 Red Hat acquired Qumranet, a startup whose focus was virtualization. Among other products, Qumranet developed a management application for virtualization.

The management application was written in C#, and one of the first tasks we got was to make it cross-platform. Well, this was expected considering the fact that the acquisition was done by Red Hat...

We started exploring the web looking for ideas on how to approach this task. At the beginning things did not look promising: most of the references we found about porting projects from one technology to another were about complete failures. The only obvious suggestion that we saw all over was not to change technology and architecture at the same time.

Armed with this important advice we kept digging around and realized there are two different paths we can take. The first was to stick with C#, and the second, surprisingly, was to change technology.

Sticking with C# required technologies to help us run it on Linux. We found Mono, which is an open source implementation of Microsoft's .NET Framework that can be executed on Linux, and the second option we found was Grasshopper, a Mainsoft project that compiles MSIL to Java bytecode. The idea is to code in C#, compile it to Java bytecode and run the code on a JRE on any JRE-enabled platform, Linux included. Cool stuff, right?
These 2 solutions were taken off the table: we were on our way to open sourcing our project (Red Hat, remember...) and we wanted to use a technology supported by Red Hat. In addition, a basic POC we did (POC #1) using Mono showed that the technology was immature at the time and did not meet our needs.

Okay, switching technology then. At this point Java looked like the natural choice: it is cross-platform, has a lot of enterprise frameworks ready to use, and its syntax and OO principles are very similar to C#, which is an advantage when it comes to training the development team.

Here is a problem for CS guys:
input: 2.5 developers, a 4-month period, 100k lines of C# code and a relatively mature management application
output: parity application in Java
constraint: during these 4 months, 4 developers are adding required features to the C# version
algorithm: ?


After we realized we had no cheat sheet for this problem, we considered 3 options:
1. Manual - writing it all from scratch in Java
2. Hybrid - integrating Java modules into the C# application
3. Automatic - automatically converting the code (Yes, we believed in miracles; at this point who wouldn't?)

Manual - The obvious option was writing it from scratch, every developer's dream. Who does not think that the second time around he will write the same code in the most generic, flawless way? Well, nice, but no thank you. First of all, it is most likely that we would not make the same mistakes again; instead we would introduce new bugs which would probably take forever to fix. In addition, we would probably lose whatever maturity the application has. Another con, which eventually ruled this option out, was that writing from scratch in Java while trying to catch up with the moving target in C# seemed impossible to us. We did another POC (POC #2) with a new generic architecture to get an estimate of the time and amount of work for this approach.


Hybrid - The current architecture of the C# version is based on the Command design pattern. Generally we can say that the flows/actions in the system can be mapped to commands. The Hybrid approach was to gradually port the flows from C# to Java, and during the process have both technologies live side by side. The obvious pros of such an approach are that we could have the system running at all times, which is good for system maturity; we could also keep developing new features during that period (writing them in Java, of course); and we could harness the whole team for the conversion (+4 developers to work with us). Looking at the cons, this is considered a high-risk path: we could not deliver such a product until the conversion process was complete, and the time estimates for this process were kind of random. It actually risked the next release (which was originally planned in C# and was due at the end of the 4-month period). Another con was that for the Hybrid approach we had to have some of the infrastructure ready in Java before we started migrating flows, which turned out to be a lot of code; we might as well port the whole thing. One other difficulty we encountered was that managing database transactions in such a system was very problematic. Anyway, the high risk was decisive, and yet another POC went down the drain (POC #2.5; we can't actually call it a POC because it was never executed...).


Automatic - The inspiration for this was an article about Boeing and automatic conversion. Well, we thought, "if Boeing can do it, so can we". Sounds stupid? Well, it is.
Luckily for us we did not think that at the time. We looked for automatic conversion tools from C# to Java and actually came across two of them: one was net2java, an abandoned project, which did not help us much; the second was Tangible, with which we started a POC (POC #3).
At the beginning it looked horrible: we converted the project and got over 50K compilation errors. But as I mentioned, the Boeing article was inspiring (plus we did not have other options), so we looked into the compilation errors and found that some of them were repetitive, and with a little work we could eliminate some of them.
We started by sending Tangible one or two issues which dominated the error list. We were pleasantly surprised by Tangible's support; they were so cooperative and fast (or shall I say "he was"; Dave was one very efficient developer) that it actually encouraged us to send a second chain of bugs, followed by a third and a fourth. We ended up filing more than 200 bigger and smaller issues (over a 6-month period) that we encountered on our journey to Java.

In addition to the ongoing effort with Tangible we had to dumb down the C# code.
For example, we had to remove the usage of the LINQ library on the C# side since it was not converted to Java properly.
At the same time we wrote some sed scripts to manipulate the Java output and fix some issues, like packaging, adding import statements at the beginning of each class, or adding a static data member (a logger) to each class.

Obviously there was some manual work as well, but I'll elaborate on the technical details of the conversion process itself in my next post.

It took us around 4 months to stabilize the process and get to a working version of our application in Java. We got to a parity version in Java which passed 90% of our automatic testing. We sometimes still can't believe it.
We have not done a scalability test on the Java version yet; I will blog about it as soon as we have some results.


Although for automobiles the next generation is hybrid, for us automatic conversion did the job.

Cheers,
Livnat