Hi Boris, Chris, Doug and Ed
Hey, how’s it going? I hope that you are all well.
Like a mall on Black Friday, around here it’s peak Eclipse committing season. Lots of bugs to fix. Lots of builds to run. As you know, builds are quite a pain point at Eclipse. I’m excited about the possibilities in the b3 project to make things better. Building software is complex, and Eclipse is no exception. In addition to tooling to make builds easier, we need hardware to make builds faster. Our build today takes about five hours to complete, and the tests take an additional 6.5 hours. Really, it’s not pretty.
Many Eclipse projects run their builds on Hudson on build.eclipse.org. Hudson is fantastic because there’s a rich set of plugins that you can use to enhance the functionality of your build. Also, since this server has local access to the eclipse.org filesystem for code checkouts, you’re less prone to network errors which can break the build. It also has LDAP integration with your committer login, so you can restrict your build configuration to the committers on your project. In theory, if you need more build machines to run your build, you can use the Amazon EC2 plugin to provision more machines in the cloud, or other plugins to start builds on local slave machines. Good stuff.
However, one of the things that the Foundation doesn’t provide today is test machines. This means that we can’t run our build at the Eclipse Foundation. The Eclipse Project builds zips for 14 different platforms. We run JUnit tests on three native platforms: Windows, Linux and Mac. They are the most commonly downloaded platforms. We need test machines to ensure that we don’t have any bugs specific to a platform. Why do our tests take so long? We have 54,000 JUnit tests. You don’t produce quality software by skimping on tests.
This isn’t just about the Eclipse and Equinox projects. This could be very useful for other projects; for instance, the XSL tools project has expressed interest in using test servers. In addition, these machines could be used as slave machines for running the build in the event that the main Hudson server is too busy. If we had enough machines, we could run more tests in parallel and reduce the time it takes our build to complete. This would be a big win for the community and our committers.
One thing I investigated is running tests in the cloud. However, most cloud services don’t provide a way to run tests on Macs, and we need to make sure that our Mac users are happy. If there is a way, I’d appreciate a link. In addition, one of the advantages of running tests on machines local to the eclipse.org filesystem is that we don’t spend time copying stuff back and forth across the network. It’s just there.
So, what I’m asking from you is that, at the next board meeting, you please bring up the issue of funding test infrastructure at the Eclipse Foundation. It might even be an advertising opportunity for one of the member companies if they donated hardware. Other companies could donate money to pay for the additional rack space. I don’t know right now what the final technical solution will be or what it will cost. All I’m asking right now is to start the conversation.
For many years, the Eclipse project has been criticized for not being open enough. Having our build process fully on eclipse.org servers would make us more open. It would also allow any of the Eclipse and Equinox project committers, regardless of company affiliation, to initiate a build. If we had enough hardware, our build could be faster and we could spend less time waiting for builds, and more time fixing the bugs the builds reveal.
Please bring this issue up at the next board meeting.
P.S. Right now we have the following test machines and our tests take about 6-8 hours to complete. Obviously, if we had more machines running tests in parallel, the build would take less time.
1) JUnit: 2 Linux, 2 Windows, 1 Mac, 1 test CVS server
2) Performance: 2 Windows, 2 Linux, 1 database server
Kim, thank you for blogging about this. I'll bring it up at the next board meeting. Board meetings are every month (face to face once every quarter), and we are encouraged to bring up any committer-related issues. If there is anything else that you feel should be discussed at the board, contact any (or all) of the committer reps, using whatever channel works best for you.
Kim, I am surprised that you felt you needed to take the “political” route of addressing this to the committer reps.
As far as I'm concerned, this is not a controversial topic. If the community wants testing machines, we will figure out a way to get some testing machines. It might take a while, but that's just the way it is.
Is there a bug open for this request where we can track the conversation?
I wasn't trying to be political. My understanding from Denis is that there isn't a budget for test machines today. So the only mechanism I know of to influence the budget is to ask the committer reps to ask the board. That's all 🙂
Here is a bug where this discussion started.
@Mike: Bug 294330 is one of the bugs. The original idea was to have the community use the SWARM plugin for Hudson so that machines could come and go and the swarm could grow as needed. However, there is no way to ensure that a particular slave machine is used only for testing. There were also some valid security concerns raised. Most of these concerns are eliminated by the Foundation providing the necessary testing platforms.
There is precedent for this, as the Apache Software Foundation provides a master Hudson server plus at least 9 slave machines to help out with the various projects. See Apache's Hudson for a complete example.
The main ASF page on builds is a little out of date, as it looks like there are multiple Ubuntu slaves on their Hudson instance.
As more projects start to use Hudson because of all the benefits it provides, we'll need more slaves or at least executors available to keep people and projects happy. Having fast builds is key to getting committers and the community feedback so they can react sooner rather than later.
Oh, and Mike, in Bug 294330 we even have community members willing to donate some time and space on their machines to help out. We just need to figure out a way to make it happen that satisfies all parties involved.
I opened up this bug to request test servers at the foundation.