The LOTOS-EUROS model was developed by TNO and RIVM. It has a long history and today it is one of the leading models in its field in Europe. As it is also used for operational purposes, it is important that the run time is kept to a minimum. Therefore, the main part of the code had already been parallelized to make good use of the computing hardware at KNMI. However, the performance turned out to be far lower than expected and varied considerably from one computing platform to another.
Steps to a higher performance
The performance was improved in a number of steps:
- We started by analysing the performance with our own tools on our own computers. That confirmed the results that had been found before on the computers of KNMI.
- Then, we took a closer look at the computations that were causing the problems:
  - In one case, it was possible to shift the parallelization to a higher level. This made it possible to distribute larger chunks of work and consequently reduced the overhead of starting and stopping parallel threads. A downside was that we had to introduce extra arrays to store intermediate results.
  - At one location in the code, the same interpolation was done repeatedly. This was made faster by storing the interpolation weights instead of recomputing them every time.
- Finally, we did extensive performance tests to see if the modifications were sufficient.
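The idea of storing interpolation weights, mentioned in the list above, can be illustrated with a minimal sketch. It assumes 1-D linear interpolation between two fixed grids; the function names `compute_weights` and `apply_weights` and the grids are hypothetical and are not taken from the LOTOS-EUROS code, which may use a different interpolation scheme.

```python
from bisect import bisect_right


def compute_weights(src, dst):
    """For each target point, return (left source index, weight of right neighbour).

    Assumes `src` is sorted ascending and has at least two points.
    This is the expensive search step; it depends only on the grids.
    """
    weights = []
    for x in dst:
        # Index of the source interval containing x, clamped to a valid interval.
        i = bisect_right(src, x) - 1
        i = max(0, min(i, len(src) - 2))
        w = (x - src[i]) / (src[i + 1] - src[i])
        weights.append((i, w))
    return weights


def apply_weights(weights, field):
    """Interpolate a field given on the source grid, reusing stored weights."""
    return [(1.0 - w) * field[i] + w * field[i + 1] for i, w in weights]


# Hypothetical grids; in a model these stay fixed during a run.
src_grid = [0.0, 1.0, 2.0, 3.0]
dst_grid = [0.5, 1.25, 3.0]

# Computed once, outside the time loop.
stored = compute_weights(src_grid, dst_grid)

# Inside the time loop only the cheap weighted sum remains.
for step in range(3):
    field = [10.0 * k + step for k in range(len(src_grid))]
    interpolated = apply_weights(stored, field)
```

The trade-off is extra memory for the stored indices and weights in exchange for skipping the interval search at every call, which pays off when the grids do not change between calls.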
A good result
In the end, the reference computation took 80 seconds on 8 cores, where it took 125 seconds before. On a single core the computation took 280 seconds. A speed-up of only about 3.5 (280 s / 80 s) seems little for eight cores, but that is because large parts of the code have not yet been parallelized. As the parallel part gets faster, the non-parallelized parts start to dominate the overall performance.
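A rough way to read these timings is through Amdahl's law. The sketch below is an idealized estimate, not an analysis from the original work: it assumes the parallel part scales perfectly over the cores and ignores threading overhead, and the function names are hypothetical.

```python
def parallel_fraction(t_single, t_multi, n_cores):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n)."""
    speedup = t_single / t_multi
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


def amdahl_speedup(p, n_cores):
    """Speed-up predicted by Amdahl's law for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n_cores)


# Timings quoted in the text: 280 s on one core, 80 s on eight cores.
p = parallel_fraction(280.0, 80.0, 8)

# Time spent in the non-parallelized part under this model.
serial_seconds = (1.0 - p) * 280.0

# Upper bound on the speed-up with unlimited cores, if p stays the same.
max_speedup = 1.0 / (1.0 - p)

print(f"parallel fraction: {p:.3f}")
print(f"serial part: {serial_seconds:.1f} s")
print(f"speed-up limit: {max_speedup:.2f}")
```

Under these assumptions roughly 82% of the single-core run time is parallelized, and the remaining serial part of about 51 seconds caps the achievable speed-up at about 5.4, no matter how many cores are used.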