Developing data science applications

The amount of data that is generated in the technological and scientific sectors is growing by the day. Building applications to analyse this data and turn it into reliable and useful information is VORtech's core competence.

VORtech offers a range of capabilities for developing data science applications. Which of these activities we actually carry out depends on the needs of the specific project and on your own capabilities. Below we sketch how we can help you in each phase of a data science project.

Project definition

In this phase, you will define the goal of the project together with our consultants: what is it that you want to achieve? Companies that have a good overview of their business processes can usually define a project quickly. But sometimes a data science project cuts across several business processes, and then things quickly become complex. In addition, there are often non-technical issues: who holds the rights to the data and the results, are people willing to accept the outcomes, are there ethical issues?

In the project definition phase the business case is central: what will it cost and what will it deliver? For companies that are only just starting with data science, there is a lot of uncertainty in these aspects. It is usually difficult to estimate how much effort will be needed to collect and cleanse the data and to make it available in a convenient form. Likewise, it is initially hard to estimate the value of the results. Therefore, many projects start with a pilot phase.

Pilot phase

In the pilot phase, the necessary data will be collected and the quality will be assessed. Some preliminary analyses will be done to assess what kind of results can be expected. This will help to set proper expectations for the business case and to put the right focus in the project. In any case, this will provide the input for a go/no-go decision.

Data engineering and preprocessing

When the project is about a one-time analysis of data, then it is usually not too difficult to obtain the data. You can either give us a file with the data or provide access to the database so that we can collect the data ourselves. Cleaning the data is usually where most of the effort is: there are typically a lot of problems in data that has not been used for analysis before. This can be related to sensor faults or other input errors, but it can also be related to incompleteness or inconsistency of data. Until recently, data was hardly ever collected with data science applications in mind and can therefore often not be used as is. In practice, a lot of our effort is spent on filtering out the dirt.
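The kinds of problems mentioned above (sensor faults, incomplete records, inconsistent entries) can be illustrated with a minimal Python sketch. It is an illustration only, not a description of VORtech's actual tooling: the `Reading` record, the `clean_readings` function, the valid range and the sample values are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    """One hypothetical sensor measurement."""
    timestamp: str          # assumed to be an ISO 8601 string
    value: Optional[float]  # None marks a missing measurement


def clean_readings(readings, valid_min, valid_max):
    """Return the usable readings plus a count of rejects per reason."""
    kept = []
    rejected = {"missing": 0, "out_of_range": 0, "duplicate": 0}
    seen_timestamps = set()
    for reading in readings:
        if reading.value is None:
            # Incomplete record: no value was stored.
            rejected["missing"] += 1
        elif not (valid_min <= reading.value <= valid_max):
            # Physically implausible value, e.g. a sensor fault code.
            rejected["out_of_range"] += 1
        elif reading.timestamp in seen_timestamps:
            # Inconsistent data: the same moment was recorded twice.
            rejected["duplicate"] += 1
        else:
            seen_timestamps.add(reading.timestamp)
            kept.append(reading)
    return kept, rejected


# Made-up sample data, for illustration only.
raw = [
    Reading("2024-01-01T00:00", 20.5),
    Reading("2024-01-01T00:10", None),
    Reading("2024-01-01T00:20", -999.0),
    Reading("2024-01-01T00:30", 21.0),
    Reading("2024-01-01T00:30", 21.0),
]
kept, rejected = clean_readings(raw, valid_min=-40.0, valid_max=60.0)
print(len(kept), rejected)
```

Counting the rejects per reason, rather than silently dropping them, gives a first impression of the data quality that can be discussed with the people who know the data.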

A close cooperation with your own employees is essential in this phase, for they know how the data is stored and what the meaning of the data is. If the data sets are really big, special facilities are needed. VORtech has experts that can deal with large data sets.

In many cases, data science is not about generating a single report, but more about building an operational application that can be used at any moment or even be part of an automated business process. In that case, the data science application needs to be linked to operational business data stores. With their strong background in software engineering, VORtech's employees are well equipped for this work.

Algorithm development and analysis

This is the most knowledge-intensive part of any project. Our data scientists will develop a suitable model or perform a fitting analysis to obtain the right prediction or insight from the data. That is more than providing a single number: the reliability of the results is just as important. Without a good understanding of the uncertainty in the numbers, it is all too easy to draw the wrong conclusion.
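As an illustration of reporting uncertainty alongside a single number, the sketch below computes a percentile bootstrap interval for a mean, using only the Python standard library. The bootstrap is just one common technique, chosen here as an example; the function name `bootstrap_mean_interval`, its parameters and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, confidence=0.95, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample the data with replacement and record the mean each time.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return statistics.fmean(values), lower, upper


# Made-up measurements, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]
estimate, lower, upper = bootstrap_mean_interval(sample)
print(f"mean = {estimate:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

A wide interval signals that a conclusion based on the point estimate alone would be premature, for example because more data is needed.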

Our work is based exclusively on open-source tools. There are many commercial offerings on the market, but in many cases there is no need to spend much money on those.

Testing

Once the results are there, it is time to discuss them with the users. This discussion often leads to important insights that will help to improve the analyses or the algorithms. It is very common that several iterations are needed before the result is really optimal for the intended use.

Knowledge transfer

VORtech wants its customers to be able to work with the results of a project themselves. We make our money from the services that we offer, not from any specific product. This means that we will make an effort throughout the project to make you understand what we do and how we do it. Even so, many customers like to have a support contract after the project so that they can easily ask us for extra information or for some advice. But a support contract is not essential: you as a customer are free to continue working on the results or even to hire another consultant. It’s up to us to make sure that you do not feel a need to do so.