Mathematics for fast processing of genetic data

In this blog series, we demonstrate that our mathematical strengths are an important complement to our programming skills. Using a mathematical approach, we arrive at solutions that programming alone would not reach.

The posts are based on interviews with colleagues who recently completed projects in which mathematics played a decisive role.

In this first post, we speak with Maarten, who conducted a fascinating project for Wageningen University & Research (WUR). He worked on efficiently storing and processing a vast amount of genetic data.

Maarten, can you briefly describe what you did in your project?

“I worked on a research project that combined enormous amounts of genetic data with pedigree and performance data to estimate breeding values. Such calculations normally require a tremendous amount of computing power and memory. My contribution showed that this is feasible on relatively standard computers, without expensive specialized hardware.”

How is this genetic information stored in a computer?

“A cow’s DNA can be stored as a series of numbers between 1 and 4. Normally, each number takes up 32 bits, but because there are only four possible values, 2 bits per number are enough. That cuts the storage space by a factor of 16, and we also used the compact format to make the calculations faster and more efficient.”
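To make the 2-bit idea concrete, here is a minimal Python sketch. It is an illustration only, not the storage format used in the project: the function names `pack` and `unpack` and the choice to put four values in each byte, lowest bits first, are assumptions made for this example.

```python
def pack(values):
    """Pack a sequence of values in 1..4 into bytes, four values per byte."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 1 <= v <= 4:
            raise ValueError(f"value out of range: {v}")
        # Store v - 1 (0..3) in the 2-bit slot at position i % 4 of byte i // 4.
        packed[i // 4] |= (v - 1) << (2 * (i % 4))
    return bytes(packed)


def unpack(packed, count):
    """Recover the first `count` values (1..4) from the packed bytes."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) + 1 for i in range(count)]


values = [1, 4, 2, 3, 3, 1]
packed = pack(values)
assert unpack(packed, len(values)) == values
print(len(packed))  # 2 bytes, versus 24 bytes when stored as 32-bit integers
```

The number of values has to be stored separately, because the last byte may be only partly filled.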

How was this data used for calculations?

“With this data, you calculate the heritability of certain traits, which results in a breeding value for each animal. This involved a database of nearly 30 million animals. Thanks to our approach, these calculations could be performed in a scalable manner. For the client, this meant they received reliable results without the need for expensive computing infrastructure.”

What was your contribution?

“I developed a special method for matrix multiplication that optimally utilizes the compact DNA format. This made the calculations significantly faster than with standard solutions, and everything fits within the working memory of a computing server. This approach is a good example of how our mathematical perspective leads to solutions that programming alone wouldn’t have been able to achieve.”

Are you satisfied with the final result?

“Absolutely. The client was impressed that their enormous datasets could be processed on regular computers. This allowed them to continue their research more quickly, without investing in expensive hardware. Moreover, the result has led to a scientific publication in which my contribution is explicitly mentioned. This ensures that the impact of this project will continue to be visible in future research. For us, this is exactly the kind of impact we’re proud of: smart mathematics that delivers real results.”

The Mathematical Details

The most challenging performance problem lay in linear algebra with compactly stored vectors. Existing, highly optimized libraries like BLAS primarily work with standard floats, not with compact small integers. Maarten developed a modified implementation of matrix-matrix multiplication that could work with compactly stored vectors in combination with standard floats.
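The general idea of multiplying a compactly stored matrix with a matrix of standard floats can be sketched as follows. This is not Maarten's implementation, which is not shown in this post; it is a simplified NumPy illustration in which small blocks of rows are unpacked on the fly, so the full matrix never exists in float form. The names `pack_rows`, `unpack_rows` and `packed_matmul`, the block size, and the use of codes 0 to 3 are all assumptions for this example.

```python
import numpy as np


def pack_rows(codes):
    """Pack a 2-D array of codes 0..3 into uint8, four codes per byte along each row."""
    n_rows, n_cols = codes.shape
    padded = np.zeros((n_rows, -(-n_cols // 4) * 4), dtype=np.uint8)
    padded[:, :n_cols] = codes
    quads = padded.reshape(n_rows, -1, 4)
    return (quads[:, :, 0] | (quads[:, :, 1] << 2)
            | (quads[:, :, 2] << 4) | (quads[:, :, 3] << 6)).astype(np.uint8)


def unpack_rows(packed, n_cols):
    """Expand packed rows back into one code (0..3) per column."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, :, None] >> shifts) & 3
    return codes.reshape(packed.shape[0], -1)[:, :n_cols]


def packed_matmul(packed, n_cols, B, block_rows=256):
    """Compute A @ B where A is stored packed; only one block of A is unpacked at a time."""
    out = np.empty((packed.shape[0], B.shape[1]), dtype=np.float64)
    for start in range(0, packed.shape[0], block_rows):
        stop = start + block_rows
        block = unpack_rows(packed[start:stop], n_cols).astype(np.float64)
        out[start:stop] = block @ B  # the float product itself is left to BLAS
    return out


rng = np.random.default_rng(0)
codes = rng.integers(0, 4, size=(1000, 37), dtype=np.uint8)
B = rng.standard_normal((37, 5))
packed = pack_rows(codes)
result = packed_matmul(packed, codes.shape[1], B)
assert np.allclose(result, codes.astype(np.float64) @ B)
```

A production version would more likely be written in a compiled language and would unpack directly inside the innermost loops, but the memory argument is the same: the packed matrix takes a quarter of a byte per entry, and only one small block is ever expanded.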

His insight into the larger algorithm allowed for clever choices in how the linear system was partitioned. For example, some matrices never needed to exist in their entirety, and sometimes only the transpose was needed to continue the calculations. Furthermore, it was crucial that the compact storage format didn’t cause any delays elsewhere.
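A generic example of a matrix that never needs to exist in its entirety is a product such as A·Aᵀ that is only ever applied to a vector. The sketch below is a textbook illustration of that principle, not a description of the matrices in this project; the function name `gram_times_vector` and the sizes are assumptions.

```python
import numpy as np


def gram_times_vector(A, v):
    """Return (A @ A.T) @ v without ever forming the n-by-n matrix A @ A.T."""
    return A @ (A.T @ v)


rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50))
v = rng.standard_normal(2000)
fast = gram_times_vector(A, v)
# Check against the explicit product, which needs a 2000 x 2000 intermediate.
assert np.allclose(fast, (A @ A.T) @ v)
```

With n rows and k columns, the explicit product needs n × n numbers in memory, while the reordered version only ever holds vectors of length n and k.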

The performance of the custom matrix multiplication had to match that of the BLAS routines. With this, VORtech made a decisive contribution to the performance of the entire project.