Friday, May 16, 2008

Source control with continuous integration process

Recently, while reading the book Continuous Integration: Improving Software Quality and Reducing Risk, I came across the following practice:

Commit code frequently. Try to make small changes and commit each task when it is completed. Waiting more than a day or so to commit code to the version control repository makes integration time-consuming and may prevent developers from being able to use the latest changes. The longer you wait to integrate with others, the more difficult your integration will prove to be. Developers do not want to commit their code until it is “perfect”.

Yes, committing code frequently is very important, especially in a CI (Continuous Integration) environment. But how frequent is frequent? The passage above suggests that more than a day is too long, but should a day be the benchmark? In practice, an assigned task can usually be broken down into many smaller units. Does that mean we can commit each small unit as soon as it is completed? Yes, but that will probably result in code being committed every hour. Moreover, code is usually developed from the smallest units up, before you work on the bigger units that depend on them. So if no other part of the project depends on a small unit yet, there isn’t really any advantage in committing it just so that others can make use of it.

From experience, people working on very closely related tasks often encounter conflicts when attempting to merge code. If even merging code is a problem, integration is harder still. Therefore, closely related tasks assigned to different developers have to be properly managed. Managing this process is very important.

Using source control is highly recommended, but I have noticed that common practices for committing to and updating from source control are seldom discussed. Is that because everyone already knows these practices? I doubt it, given my recent experience working on a project with 6 developers, some of whom have more than 5 years of experience.

  1. A single developer can commit code up to 5 times in a span of 2 minutes. What is happening here? These people are committing not from the top (trunk) level but separately from each folder where changes were made. Imagine you happen to get an update in between the 5 commits. The build fails, probably because your update did not capture the last 2 commits, on which the first 3 depend.
  2. Every time I get an update of the source, my build fails. Usually this is not due to a single mistake by one person; very often 3 to 5 mistakes are found that have caused the build to fail. What is happening here? Partly it is due to point 1, and partly the code is not fully committed: one folder is committed but not another that it depends on.
  3. Tests fail. Developers are not getting an update and running the tests locally to ensure that the latest version merges and integrates seamlessly with their changes.
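
The partial commits described in the first two points above can be caught before they happen. Here is a minimal sketch in Python, assuming you already have the list of locally changed paths (for example from your version control tool's status command). The function name `left_behind` and the sample paths are hypothetical, made up for illustration only.

```python
from pathlib import PurePosixPath


def left_behind(changed_paths, commit_dir):
    """Return changed paths that a commit made from commit_dir would miss."""
    root = PurePosixPath(commit_dir)
    missed = []
    for path in changed_paths:
        p = PurePosixPath(path)
        # A path is covered only if it is commit_dir itself or lies beneath it.
        if p != root and root not in p.parents:
            missed.append(path)
    return missed


# Hypothetical working copy with changes in two folders.
changed = ["core/model.py", "core/util.py", "web/views.py"]
print(left_behind(changed, "core"))  # ['web/views.py']
print(left_behind(changed, "."))     # []
```

If the list is not empty, committing from that folder alone would leave the repository with only part of the change, which is exactly the situation that breaks the build for whoever updates next.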

The consequence of such poor discipline is many man-hours lost. Each time I get an update, I spend on average half an hour to an hour going through the errors and hunting down the person who caused the build to fail. While trying to get to the bottom of this problem, I realized that Cruise Control had not had a single successful build for a week, at an average of up to 40 builds a day. Digging deeper, and after much questioning, I found that some projects had been removed from the build because some tests were causing the whole build process to hang.

Not wanting to give up, I added those projects back in to see where the problem was. One amazing problem I found was a test that actually opened Notepad and caused the whole build to hang. Another test in one of the projects kept causing an error in NAnt that needed human intervention: someone had to log in to the build server and close the error prompt. If nobody did that, NAnt timed out and each build simply stalled. Worse still, no one made any noise about this issue.
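
One way to keep a hanging test from stalling everything is to run the test command under a time limit. The sketch below is only an illustration in Python, not how the build in this project was configured; `run_with_timeout` is a hypothetical helper, and the sleeping child process merely stands in for a test that never returns.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return 'passed', 'failed', or 'timed out'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


# Stand-in for a hanging test: a child process that sleeps for 30 seconds.
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hanging, timeout_seconds=1))  # timed out
```

Note that this only stops the process that was started directly. A window opened by the test itself, such as Notepad, may be a separate process and could still need cleaning up on the build server.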

Therefore, to minimize such occurrences, I have come up with the following guidelines for using source control and continuous integration:

  1. Get an update before working on any task.
  2. Work on the task until the necessary tests execute successfully.
  3. Get an update again before any commit.
  4. Run all tests again to ensure that your task or changes integrate with the latest version without any problem.
  5. Commit your task in one revision.
  6. Check that the build on the build server runs successfully.
  7. Fix any reported error, or roll back the change, immediately.
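
Steps 3 to 5 can be wrapped in a small script so that nobody commits without updating and testing first. The following is a rough sketch in Python, assuming Subversion (`svn update`, `svn commit -m`) and run from the top of the working copy so that the whole task goes in as one revision. The function `update_test_commit` and the `run-tests` command are hypothetical placeholders for whatever your project uses.

```python
import subprocess


def run(cmd):
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def update_test_commit(message, test_cmd, runner=run):
    """Return None on success, or the name of the first step that failed."""
    steps = [
        ("update", ["svn", "update"]),                 # step 3
        ("test", list(test_cmd)),                      # step 4
        ("commit", ["svn", "commit", "-m", message]),  # step 5
    ]
    for name, cmd in steps:
        if runner(cmd) != 0:
            return name  # stop here; later steps never run
    return None


# Dry run with a fake runner in which the tests fail, so nothing is committed.
calls = []


def fake_runner(cmd):
    calls.append(cmd[:2])
    return 1 if cmd[0] == "run-tests" else 0


print(update_test_commit("Finish task", ["run-tests"], runner=fake_runner))  # test
print(calls)  # [['svn', 'update'], ['run-tests']]
```

The sketch relies on exit codes only. A real script would also have to check whether the update reported conflicts, since an update can finish without an error code and still leave conflicted files behind.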

Lastly, the book mentioned above is a great read for anyone who is interested in implementing a continuous integration process or enhancing their existing one. The book covers continuous integration of the database, source code, tests and deployment. Imagine going without this process and running into problems like those above: getting things right would be a nightmare.

1 comment:

Anders Sandvig said...

I think you have some good points here, especially about the importance of updating your working copy before doing changes or committing something. Often when people are having problems with merge conflicts or breaking the build, the situation could easily have been avoided if they had started working on the latest revision and updated before they committed. If you always keep up-to-date with the current revision, whatever obstacles you encounter will almost always be smaller than if you don't update and wait until the very end with resolving every conflict.

The problem with programmers working "too closely" on the same part of the project I think is better resolved by people actually talking to each other. No matter how often—or seldom—you commit your changes, it won't help if you don't communicate with your team about what you are working on. Again, I think frequent commits are preferred because they will often force people to discuss issues and resolve conflicts early.