In this blog post we will go through the different approaches to big data, define what big data is, and show how to use it to create value. There are many opinions on how to handle and use big data, but the reality is that many people do not have a clear idea of how much data is being generated in their companies.

With the introduction of machine learning and artificial intelligence, the need has grown for large datasets that come from different sources and are of the right quality. With large datasets it is possible to build reliable machine learning models that can give companies a competitive edge. This is going to be a game changer for companies, both by improving decision making and by making production more efficient.

Big Data Definition

Initially, Big Data was defined by three V’s:

“... High Volume, High Velocity, and/or high Variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Laney, 2001)


High Volume refers to how many observations a given dataset contains. Some articles treat the volume as sufficient once it reaches a certain number of terabytes or even exabytes. However, no fixed number of terabytes is always sufficient. The right volume depends on the given business case.

High Velocity refers to how frequently new data is retrieved and processed. Many machine learning tasks, such as predictive maintenance, require data with a high velocity (seconds, minutes) to be more accurate.
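
As a rough illustration of velocity, the sketch below measures how often new readings arrive in a dataset by taking the median gap between consecutive timestamps. The function name `median_interval_seconds` and the sample readings are made up for this example; they are assumptions, not part of any particular system.

```python
from datetime import datetime
from statistics import median


def median_interval_seconds(timestamps):
    """Return the median gap in seconds between consecutive readings."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two timestamps to measure velocity")
    # Pair each reading with the next one and measure the gap between them.
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return median(gaps)


# Hypothetical sensor readings, invented for illustration only.
readings = [
    datetime(2024, 1, 1, 12, 0, 0),
    datetime(2024, 1, 1, 12, 0, 10),
    datetime(2024, 1, 1, 12, 0, 20),
    datetime(2024, 1, 1, 12, 0, 50),
]

interval = median_interval_seconds(readings)
print(f"Median interval between readings: {interval:.0f} s")
```

If the median interval turns out to be hours while the business case needs data every few seconds, the velocity is too low for that case, no matter how large the dataset is.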

High Variety refers to the different types of data: structured, semi-structured and unstructured data such as text files, pictures, sensor data, web data etc.

Over the years, more V’s have been added to the definition of Big Data, such as Veracity (representative, unbiased data) and Value (do the gathering and analysis of the data create value for you and your company?).

It is important to know that Big Data does not mean that you should collect all the data you can. Be tactical and start collecting information from those places where an analysis can create value for your business, and make sure that the data is unbiased, of high quality, and representative.

The evolution of analytics

Analytics is about how data is used and what kind of value it generates. It can be divided into three eras named analytics 1.0, analytics 2.0 and analytics 3.0 [1].

Analytics 1.0 - the era of Business Intelligence (BI)

In this era the approach is to be data informed. Data is used primarily for history writing and reporting purposes. The most common example is the monthly financial report that summarizes what happened during the last month. Often Excel sheets are used to present and deliver information.

This was the era of the Enterprise Data Warehouse, used to capture information, and of Business Intelligence software, used to present and report it.

Statements:
- Decisions are based primarily on experience and intuition
- Data sources are relatively small and come from internal systems
- Most of the time is spent gathering data - not putting it to use
- Data is not used as a strategic asset in the decision process

Analytics 2.0 - the era of Big Data

In this era the amount of data is growing, and the sources are shifting from purely internal to a combination of internal and external sources. The number of data sources puts new requirements on how to process all the information, increasing the need to use both internal and external processing capacity to handle it at the speed needed. A lot of the data that appeared in this era was unstructured and required new technologies to put it to proper use, such as machine learning (ML), computer vision systems and artificial intelligence (AI). The main focus in this era is to start using data to predict what will happen in the future, instead of only using data for “history writing”.
Working with data in this new way requires a new type of employee, and data analysts and data scientists appear in companies to handle this new business area.

Statements:
- Complex, large, unstructured data sources
- New analytical and computational capabilities
- Data stored and analyzed in public or cloud computing environments
- Machine learning methods increase the speed of analysis
- Visual analytics offers predictive and prescriptive techniques
- Online companies start to build business on data

Analytics 3.0 - the era of data-enriched offerings

This era is characterized by the fact that all businesses can create data-based services and products. Data is not merely supplied but used for decision making, on both the supplier side and the customer side. A term often used for such a company is “data centric”. Data is quite often embedded in production and decision-making processes, which makes it much harder for managers to “avoid” using data. Data and analytics will be present in all processes in companies.

Statements:
- Create data and analytics-based products
- Not just supplying data - helping and guiding customers in decision making
- Rapid and agile insight delivery
- All businesses can create data-based products and services
- Focus will shift from software development to data analysis
- Heavy reliance on machine learning
- Strict structures in place to communicate data science findings to decision makers
- High speed and agility needed

What is the right data?

To figure out what is the right and what is the wrong data, it is essential to start by defining the purpose or goal: “What does the company desire to achieve?”
In many situations companies start the process by gathering as much data as possible. Two years later the company starts actually looking at the data. The process can be much faster if only the right data is found and stored. In other words - the right data is the data that can form the basis for analyses that serve the purpose.

How to start the journey

To start the journey locally, moving from analytics 1.0 to 2.0, there are several steps to take to create a success within a reasonable timeframe.

  1. Define the purpose. Find the area where you expect to find the highest value - in other words, find the cases that generate the highest internal value. When you show the rest of the organization a success, it is much easier to get funding and support to extend the work with Big Data to the rest of the company.

  2. Always start working bottom up: start small, create a success and then scale up later.

  3. Identify the data that can help the company achieve the purpose. There are two possible outcomes: either the data is available but needs attention before it can be used, or no data is available. If there is no data, the journey starts with generating the data, closely followed by finding a way to store it. If data is available, a cleaning process starts to establish the quality of the data.

  4. Involve the organization. There will be people involved in both the transformation process and the implementation afterwards. The best result always comes from having people involved from the beginning so that they take ownership of both processes.

  5. There will always be a change management angle in the work with Big Data and machine learning. New tools will generate the need for changes in the current processes. This is an area that many people tend to postpone until very late in the journey. That can be a major mistake, and the result can be a lower outcome.
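
The cleaning process mentioned in step 3 usually begins with a simple look at how complete the available data is. The sketch below is one minimal way to do that, assuming the data can be read as a list of records with plain values; the function name `quality_report`, the field names and the sample records are hypothetical and only serve as an illustration.

```python
def quality_report(records, required_fields):
    """Summarize missing values and exact duplicate rows in a list of dicts.

    Assumes every value is a plain (hashable) value such as a string or number.
    """
    total = len(records)
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for record in records:
        # Count a field as missing when it is absent, None or an empty string.
        for field in required_fields:
            if record.get(field) in (None, ""):
                missing[field] += 1
        # Two rows are duplicates when all required fields are identical.
        key = tuple(record.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": total,
        "missing_share": {
            field: (count / total if total else 0.0)
            for field, count in missing.items()
        },
        "duplicate_rows": duplicates,
    }


# Hypothetical machine readings, invented for illustration only.
records = [
    {"machine_id": "A1", "temperature": 71.5, "timestamp": "2024-01-01T12:00:00"},
    {"machine_id": "A1", "temperature": None, "timestamp": "2024-01-01T12:00:10"},
    {"machine_id": "A2", "temperature": 69.0, "timestamp": "2024-01-01T12:00:00"},
    {"machine_id": "A2", "temperature": 69.0, "timestamp": "2024-01-01T12:00:00"},
]

report = quality_report(records, ["machine_id", "temperature", "timestamp"])
print(report)
```

A report like this does not say whether the data is unbiased or representative, but it quickly shows whether the available data needs attention before it can be used for the chosen purpose.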

Conclusion

Moving from analytics 1.0 to analytics 2.0, and later all the way to analytics 3.0, requires top management focus and a defined data strategy. The process does not necessarily have to run for many years; it is possible to create small successes fast to gain momentum in the organization. The possible gain for most companies is huge, and if you do not start now, it is likely that your competitors will and then be more competitive moving forward. There is no doubt that this journey, working with Big Data, will generate a lot of internal value, but it is not enough just to store the data; it has to be brought into use.

// Lars Endrup, Business Development Manager @neurospace

References

[1] Davenport, Thomas H. The era of analytics, Harvard Business Review 2013.