The mission of OptaPlanner (renamed from Drools Planner) is to enable normal Java™ programmers to solve NP-complete problems in production. This open source project brings intelligent heuristic and metaheuristic algorithms mainstream by implementing them in a stable, scalable, reusable and easy-to-use manner. It’s used in production all over the world.
Although OptaPlanner is focused on production use – and this requirement does take up a lot of my time – it’s very comforting to me that my research is rooted in a production-tested framework. It allows me to validate, test and benchmark my research contraptions more thoroughly.
Let me explain this with an example: a few years ago, I wrote a SimulatedAnnealingAcceptor, which enriches the Local Search solver with an acceptance criterion – turning it into simulated annealing. At that time, it was still a prototype. And in fact, my first two implementations failed miserably at getting any decent results…
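The core of such an acceptor is the classic Metropolis criterion: always accept an improving move, and accept a worsening move with a probability that shrinks as the temperature cools. Here is a minimal illustrative sketch of that criterion; the class and method names are hypothetical, not OptaPlanner’s actual API.

```java
import java.util.Random;

public class SimulatedAnnealingSketch {

    /**
     * Metropolis acceptance: always accept an improving (or equal) move;
     * accept a worsening move with probability exp(scoreDelta / temperature),
     * where scoreDelta < 0 for a worsening move (higher scores are better).
     */
    static boolean accept(double scoreDelta, double temperature, Random random) {
        if (scoreDelta >= 0.0) {
            return true; // improving or sideways move: always accept
        }
        double acceptProbability = Math.exp(scoreDelta / temperature);
        return random.nextDouble() < acceptProbability;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int acceptedHot = 0, acceptedCold = 0;
        for (int i = 0; i < 10_000; i++) {
            // Same worsening move (delta = -1), two temperatures.
            if (accept(-1.0, 10.0, random)) acceptedHot++;
            if (accept(-1.0, 0.1, random)) acceptedCold++;
        }
        System.out.println("hot accepts:  " + acceptedHot);  // near exp(-0.1) of 10000, ~90%
        System.out.println("cold accepts: " + acceptedCold); // near exp(-10) of 10000, ~0
    }
}
```

At a high temperature the acceptor behaves almost like a random walk; as the temperature approaches zero it degenerates into pure hill climbing.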
Battle-tested components
A lot of things could have caused the problem, but I was able to eliminate many potential causes early on, because the rest of the code was unit/integration/production tested. For example: I could reasonably trust that the Score was accurate for the currently selected move.
So why didn’t my original implementation produce any good results? Was the SA acceptor accepting too many moves, or too few? Or was it simply accepting the wrong moves? Or was the code slow, creating a performance bottleneck?
Luckily, enabling logging quickly answered these questions for me:
12:14:57,035 DEBUG Step index (1), time spend (1651), score (0hard/-96soft), new best score (0hard/-95soft), accepted/selected move count (1/30) for picked step (2010-01-20(Wed)_L->4(4) => 7(7)).
12:14:57,069 DEBUG Step index (2), ..., accepted/selected move count (1/92) ...
12:14:57,087 DEBUG Step index (3), ..., accepted/selected move count (1/84) ...
...
12:14:57,921 DEBUG Step index (104), ..., accepted/selected move count (1/7) ...
12:14:57,928 DEBUG Step index (105), ..., accepted/selected move count (1/9) ...
...
Everything looked normal, except for the selected move count: it went down over time, which meant my SA acceptor was accepting moves more easily as time progressed… Not good. Based on this, I decided to double-check whether my formula was somehow reversed. Bingo, that was it!
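That symptom is exactly what a reversed formula produces. The sketch below contrasts the correct Metropolis probability with one hypothetical way to reverse it (swapping the two terms in the exponent); I don’t know the exact form of the original bug, so the reversed variant here is purely illustrative. The correct probability falls toward 0 as the temperature cools, while the reversed one climbs toward 1 – i.e. it accepts more easily over time.

```java
public class ReversedFormulaSketch {

    /** Correct Metropolis probability for a worsening move of the given magnitude. */
    static double correctProbability(double scoreDeltaMagnitude, double temperature) {
        return Math.exp(-scoreDeltaMagnitude / temperature);
    }

    /** Hypothetical reversal: the two terms in the exponent swapped. */
    static double reversedProbability(double scoreDeltaMagnitude, double temperature) {
        return Math.exp(-temperature / scoreDeltaMagnitude);
    }

    public static void main(String[] args) {
        // Cool the temperature from 10 down to 0.1 for a worsening move of size 1.
        for (double t : new double[] {10.0, 1.0, 0.1}) {
            System.out.printf("T=%5.1f  correct=%.4f  reversed=%.4f%n",
                    t, correctProbability(1.0, t), reversedProbability(1.0, t));
        }
        // correct falls toward 0 as T cools; reversed climbs toward 1,
        // matching the "accepting more easily over time" symptom in the logs.
    }
}
```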
Did my fixed SA prototype have better results than the existing algorithms, such as Tabu Search? This is a hard question to answer, because it depends on the use case, the dataset, the available time and many more parameters. There’s no simple yes/no answer. But still I needed to objectively prove that my SA prototype was worth keeping around.
With a little help from the benchmarker toolkit in OptaPlanner, I was able to quickly compare different algorithms (including the SA prototype) on different datasets for different parameters, with the benchmark report:
A benchmark report contains many charts and tables to compare algorithms based on results, performance, scalability, etc. Specifically, a “best score over time” chart made it clear that for this dataset, my SA prototype was not going to beat Tabu Search any time soon:
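The bookkeeping behind such a chart is easy to sketch: each solver run folds its stream of step scores into a monotone “best score so far” series, and the series of different algorithms are then plotted against each other. This is not the benchmarker’s actual code, and the sample scores below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BestScoreSeries {

    /** Fold a stream of step scores into a monotone "best so far" series. */
    static List<Integer> bestSoFar(int[] stepScores) {
        List<Integer> series = new ArrayList<>();
        int best = Integer.MIN_VALUE;
        for (int score : stepScores) {
            best = Math.max(best, score); // higher is better
            series.add(best);
        }
        return series;
    }

    public static void main(String[] args) {
        // Made-up soft scores from a few local search steps.
        int[] stepScores = {-96, -95, -97, -93, -94, -90};
        System.out.println(bestSoFar(stepScores));
        // prints a non-decreasing series: [-96, -95, -95, -93, -93, -90]
    }
}
```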
Was my SA prototype useless, seeing that Tabu Search performed better? It turned out it wasn’t, because on other use cases it did perform better. OptaPlanner features 11 complete examples, such as TSP, VRP, course scheduling, employee rostering, etc. Most of the examples solve a well-known problem defined by one of the realistic OR competitions.
Testing my new prototype on each of these examples, without having to implement any of them myself, showed that in some use cases the SA implementation performed better than Tabu Search.
This set of pre-existing examples, datasets and already well-performing algorithms enables me – together with the benchmarker toolkit – to better determine the strengths, weaknesses, opportunities and threats of any new algorithm I invent.
A nice benefit of working on an open source project is that others might review, improve or expand your code. And when you read each other’s code, the algorithm discussions become much more valuable. Important details aren’t abstracted away. Miscommunication is flushed out.
Doing research on top of existing production code has its advantages. Especially with open source code, where you can rewrite any subcomponent as you see fit, it’s increasingly hard to argue for reinventing yet another OR framework.