What is a data platform?

Data-driven decision making does not simply require data; you need a way to read, share, and govern it, to explore its lineage and metadata, and to trust that the underlying data is of consistently high quality.

A data platform is the interface through which your employees access data. It is the platform that handles access requests and permissions to sensitive data, that ensures data is kept up to date, and that lets employees explore data relevant to a given business case. It is where data scientists conduct experiments and build data products on live data directly from the relevant sources.

This helps your business run more smoothly, efficiently and consistently.

How can my business benefit from data platforms?

A data platform can in itself cover many of your needs, from storing operational data to performing analysis. Building a data platform turns data into a strategic asset, allowing you to make decisions based on real-time information.

What makes a good data platform?

The Strengths & Pitfalls

While every company’s needs are unique, across our industry experience we have observed some common strengths and recurring pitfalls of data platforms.


Seamless Data Flow Across Tools and Departments

Interoperability is a system’s ability to operate and integrate with other systems. In our experience, certain employees are most comfortable in Excel, for instance, so it’s crucial that the platform lets people explore and transform data there. Each department also has tools specific to its needs; a department working with geographic information system (GIS) data, for example, might have its own preferred tooling. In such cases, we ensure that the platform can both pull data from and push data into those systems seamlessly.
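As a minimal sketch of what this looks like in practice, a curated dataset can be published to Excel, and edits read back, with a few lines of pandas (the file and sheet names below are purely illustrative):

    import pandas as pd

    # Read a curated asset maintained by the platform (illustrative path).
    orders = pd.read_parquet("curated/orders.parquet")

    # Publish it as a workbook that business users can explore in Excel.
    orders.to_excel("exports/orders.xlsx", sheet_name="Orders", index=False)

    # Changes made by the department can be pulled back in the same way.
    revised = pd.read_excel("exports/orders.xlsx", sheet_name="Orders")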


Optimized Data Freshness & Reduced Costs

Without the appropriate safeguards, a data platform can easily turn into a mess. Employees create high-quality data assets, but a single individual may be responsible for keeping a specific dataset updated, so data can sit stale for weeks without anyone knowing. Similarly, if the dependencies between data are not properly tracked, they become unmanageable for any human brain, and it becomes impossible to ensure that each piece of data is updated accordingly.

With the right systems, we can reason about freshness across assets: as soon as an invoice is filed in one system, the data platform ensures that the relevant reports are automatically updated and the related parties notified. Good freshness is not merely ‘as soon as possible’; certain domains tolerate slower freshness, and cost can be reduced by relaxing the freshness of certain assets. For example, physical sensors might emit readings every millisecond, yet yearly summaries of those sensors rarely need to update more often than daily or weekly.

We move away from the perspective of:

“we want this to run every night”

into

“we never want this data to have a higher delay than 4 hours”

“every morning at 6 we want these tables to be up to date”

which reflect the actual business rules and procedures. We can similarly map constraints such as ‘these assets cannot be updated before noon’, which might reflect regulatory restrictions or the closing of certain markets.
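As a sketch of how such rules can be expressed in code, Dagster (described further below) lets a freshness policy be attached directly to an asset definition. The exact API has evolved across Dagster versions, and the asset names below are our own illustration:

    from dagster import FreshnessPolicy, asset

    # "We never want this data to have a higher delay than 4 hours."
    @asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=4 * 60))
    def invoice_report():
        ...

    # "Every morning at 6 we want these tables to be up to date."
    @asset(
        freshness_policy=FreshnessPolicy(
            maximum_lag_minutes=60,
            cron_schedule="0 6 * * *",
        )
    )
    def morning_tables():
        ...

Because the policy states the business rule rather than a run schedule, the orchestrator is free to decide when, and how often, each asset actually needs to run.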


Bridging the Gap Between Business & Domain

Most companies of any meaningful size have different departments and domains, each with its own understanding of various terms. Similarly, certain areas and data aspects might be highly technical in nature or require a deep level of business understanding.

We strive to include actual domain experts in a given case whenever possible: ideally enabling tech-savvy people from the relevant business area to develop the assets themselves, or alternatively building them in close collaboration. This ensures that each asset in the platform is of high quality and is understandable to the employees who access it, without a lengthy introduction.

We aim to work in a problem-driven manner

That is, we first define a data product and get a clear understanding of the underlying domain, after which a solution is built with the explicit intent of solving a clear problem. Tying this information to the data, and exposing it to future readers, makes the data easier to explore later and its context easier to understand.
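One small sketch of tying such context to the data: in Dagster, a description and metadata can be attached directly to the asset definition, so future readers see the purpose alongside the data itself (all names and values below are illustrative):

    from dagster import asset

    @asset(
        description="Monthly churn per customer segment, "
                    "built for the retention initiative.",
        metadata={
            "domain": "customer-success",
            "owner": "retention-team",
            "source_systems": "CRM, billing",
        },
    )
    def monthly_churn():
        ...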


Building Trust and Agility through Data Democratization & Transparency

An extension of being business-driven is enabling all employees, where applicable, to refine and explore data in the tools they are comfortable with, and enabling data scientists to write the code they need in the language of their choice, without worrying about scheduling and subsequent maintenance.

Similarly, everyone with adequate security clearance should be able to explore data and utilize it across domains, breaking down the silos that easily form around departments.

We often find that the problems we aim to solve in a business require data from various domains and call for a cross-disciplinary effort.

By making the data easy to access, understand, and explore, ideas and reflections bubble up from employees across the company. And when everyone can see the lineage, i.e. which assets depend on which and when each was last updated, confidence in the data grows. This also minimizes vendor lock-in and improves maintainability.


Promoting Quality Checks

When democratizing data and allowing a wider range of people to contribute, certain quality gates become crucial. We generally encourage a four-eyes principle, ensuring that any change is vetted by at least one other person; where possible, we also bring in automated tooling to help.

We help the people developing new data assets create relevant gates and checks, ensuring that invalid or faulty data is automatically flagged and an alarm is raised.
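As a sketch of such an automated gate, Dagster’s asset checks can flag faulty data every time an asset is updated (the asset and column names below are our own illustration):

    import pandas as pd
    from dagster import AssetCheckResult, asset, asset_check

    @asset
    def invoices() -> pd.DataFrame:
        # Stand-in for data pulled from the invoicing system.
        return pd.DataFrame({"invoice_id": [1, 2], "amount": [100.0, 250.0]})

    # Fails, and can raise an alarm, if any invoice has a non-positive amount.
    @asset_check(asset=invoices)
    def amounts_are_positive(invoices: pd.DataFrame) -> AssetCheckResult:
        bad = invoices[invoices["amount"] <= 0]
        return AssetCheckResult(passed=bad.empty, metadata={"invalid_rows": len(bad)})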


We promote a code-driven development style

This might sound alien and scary to some. Working code-driven simply means that changes and features are developed through code in some shape or form; this might be SQL, Python, or other technologies the business is comfortable with. We have seen initial hesitancy, but also a productivity boost and growing comfort once rolled out, with certain people from each department driving the change in close collaboration with less code-savvy coworkers.

As a variety of people contribute, a clear trail of changes becomes crucial. In case of an incident, we want to be able to rebuild all data assets without major manual intervention, and when bugs eventually occur, we want to be able to trace the steps that led up to them.

Working code-driven simplifies the review of changes, improves maintainability, and makes it easy to generalize steps and practices that repeat. One might have a similar procedure for filtering noise in sensor data or for aggregating monthly finance numbers; standardizing and reusing such steps is trivial when working code-driven.
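A small sketch of that kind of reuse: a single vetted helper for smoothing noisy sensor readings that every team can apply to its own data (the function and column names are our own illustration):

    import pandas as pd

    def smooth_sensor(df: pd.DataFrame, column: str, window: str = "5min") -> pd.DataFrame:
        """Replace a noisy sensor column with its rolling median.

        Assumes df is indexed by timestamp; the window can be tuned
        per sensor instead of copying the procedure around.
        """
        out = df.copy()
        out[column] = out[column].rolling(window).median()
        return out

    # Each department reuses the same reviewed step:
    # readings = smooth_sensor(readings, column="temperature_c", window="1min")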

Big Data? No, Right Data!

Neurospace believes that data should be a strategic asset, and that its size should be the least important aspect.

This is why, at Neurospace, we use the Right Data Framework, meaning that we only retrieve data when it has a purpose. This strategic approach ensures:

  • data that is relevant and valuable to the business case at hand
  • no unmanageable amounts of data
  • faster and more efficient gathering and analysis of data

Technologies

There are multiple technologies out there that can help you build your data platform. Neurospace builds platforms using Azure or Google Cloud Platform.

Azure Databricks

Databricks is an all-in-one data platform for exploring and viewing data, defining transformations, experimenting with various machine learning tools, and much more. It integrates with a wide range of tools, and the underlying data can be explored through Power BI, Excel, or your own custom systems. Its interactive editor lets data scientists create notebooks and define transformations and operations on data in SQL, Python, and R, and it offers a wide range of integrations, from geodata to streaming.
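As a small sketch of what a notebook transformation might look like (inside a Databricks notebook, spark is provided; the table names below are our own illustration):

    from pyspark.sql import functions as F

    # Aggregate raw invoice lines into a monthly revenue table.
    invoices = spark.read.table("finance.invoice_lines")

    monthly = (
        invoices
        .groupBy(F.date_trunc("month", "invoiced_at").alias("month"))
        .agg(F.sum("amount").alias("revenue"))
    )

    monthly.write.mode("overwrite").saveAsTable("finance.monthly_revenue")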

Google Cloud Platform

Google Cloud Platform, or GCP for short, is an enterprise-scale suite of modular cloud computing services offering numerous solutions within data storage, computing, and machine learning.

Dagster

Dagster is a data orchestration tool that allows the definition and execution of assets together with their dependencies, required freshness, quality gates, and checks. It is open source, meaning there is no initial cost, though its makers also provide a great cloud solution packed with relevant tooling. It is code-driven and lets developers either use standardized integrations or build their own, making the possibilities essentially endless. Where we have run into limitations with other tooling, Dagster offers a great way of reasoning about assets. It can be used with any underlying data storage, imposes no clear limits, and can be combined with, for instance, Databricks or other tools. We use it internally, integrating with a wide range of platforms.
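A minimal sketch of that asset model, showing how dependencies, and therefore lineage, are declared directly in code (the asset names below are illustrative):

    from dagster import Definitions, asset

    @asset
    def raw_invoices():
        # Stand-in for an extract from the invoicing system.
        return [{"invoice_id": 1, "amount": 100.0}]

    # Naming raw_invoices as a parameter declares the dependency,
    # making the lineage explicit and visible in Dagster's UI.
    @asset
    def invoice_report(raw_invoices):
        return sum(row["amount"] for row in raw_invoices)

    defs = Definitions(assets=[raw_invoices, invoice_report])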

But first... Why Neurospace?

Regardless of which specific technology is used, Neurospace can guarantee your platform will:

  • Have a solid and well-documented architecture that lives up to data safety and data quality standards.

  • Guarantee both reproducibility and traceability thanks to the Infrastructure as Code tools we use.

  • Incorporate a number of tools that make the intake and processing of new data sources a simpler task.

Not sure yet?

Our Clients Say...

"

The AI Camp gave us insight into what Machine Learning is, and how it can create value. During the process we gained the knowledge required to begin a machine learning project in our company.

"

Jens Rishøj Skov

Kredsløb A/S

"

We had high expectations for the AI camp, including learning about machine learning and a common understanding of an optimization task that affects several heat production plants in Copenhagen, in order to get cheap district heating for our customers. The AI camp proved that there is a clear opportunity for optimization and we are confident in the results of the analysis. Neurospace has been quite committed, and has been good at asking about correlations, in order to get an understanding of the problem. CTR is ready to continue working with the pilot project, as our goals with the AI Camp have been fully met.

"

Michal Brahm Thomsen

CTR I/S

"

The AI Camp concept fits perfectly into our innovation agenda, where we need new technology and people to work together. In the process, we have become wiser about optimizations in production and gained an understanding of how machine learning can solve these in collaboration with our employees. The AI Camp process solves this perfectly. We are extremely satisfied and achieved all our goals with the AI Camp.

"

Martin Jørgensen

Danish Crown A/S

"

We are a mixed department of both IT experts and laypeople. Despite the varying levels of expertise on the subject, all participants have gained a better understanding of what AI and ML are, how to ensure collecting the right data, and where it makes sense to use these technologies. Additionally, both Maria and Bo from Neurospace are extremely professional and pleasant instructors.

"

Martin Lauritzen

Green Energy Scandinavia A/S

"

We have been introduced to the new world of machine learning by Neurospace. Their AI Camp provides a good basic understanding of machine learning and data. At Nordic Sugar Nakskov, we have gained an understanding of why it is important to examine correlations systematically instead of limiting the usage of sensors and measurements based on gut feelings.

"

Anders Juul-Jørgensen

Nordic Sugar A/S