An anomaly is an observation that deviates from what would be considered normal. An anomaly is also called an outlier or a novelty.
Artificial Intelligence (AI) is the theory and development of computer systems that can perform tasks normally considered to require human-like intelligence, such as visual perception, speech recognition, decision making, and translation between languages. AI is mostly developed using machine learning.
A binary problem is an either-or problem, i.e. there are only two options.
A Business Continuity Plan is a guideline for how the company handles different disaster scenarios, such as a hacker attack, data being lost in a natural disaster, or a partnership with a key supplier ceasing.
Classification is a category of problems in which you wish to determine whether a given object belongs to one class or another.
The Cloud consists of computer services such as servers, storage, and databases, which are available on-demand. This means you only pay for the resources used, and therefore do not have to bear the cost and administration of owning the hardware. The service is provided by a cloud provider such as Google, Amazon, or Microsoft.
Computer Vision is an area within artificial intelligence which focusses on extracting information from images. Examples are identification of objects in images or classification of quality. Multiple mathematical methods exist; however, machine learning has proven to be especially useful. In the manufacturing industry, Computer Vision is often referred to as Machine Vision.
A Data Lake is a data storage repository that can store unbounded amounts of data. The data is often stored in a raw format, meaning the data has not been processed between the source and storage. A Data Lake is often used in analytical contexts; however, the analysis often requires data scientists or personnel of similar expertise, since unprocessed data is more complex to access and use.
Data leakage is the use of a value during development of a model that cannot be available at the time of prediction. The leaked value often contains the information you are trying to predict.
A Data Mesh is an architectural paradigm for data management that enables analytical data at scale by following a domain-driven design approach and utilising distributed systems. This means that the data and its management are divided into business domains. The concept of data mesh therefore consists of both technical implementation details and organisational management principles.
A Data Warehouse is a type of data management system that collects data from multiple sources following a structured schema. The system also makes data accessible through a single access point, often using SQL. Data Warehouses are mostly used in analytical contexts such as business intelligence.
False negatives are when a test wrongly predicts that a condition is not there. In other words, the test result is negative, but that result is wrong.
False positives are when a test wrongly predicts that a condition is there. In other words, the test result is positive, but that result is wrong.
A feature is a measurable individual characteristic that is used for predicting the output value. Vibration, temperature, and sound are examples of three features.
Feature importance is a way to analyse which features have the highest impact on the model’s predictions.
Frequency Aliasing is a state in which signal data (sensor data such as vibration) is collected at too low a sampling frequency. This causes the signal to be incorrectly translated from analog to digital, resulting in signal distortion. The Nyquist-Shannon sampling theorem gives the required rate: the sampling frequency must be more than twice the highest frequency present in the signal.
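As a rough illustration of aliasing, the sketch below samples a cosine at too low a rate and compares it with a slower cosine. The function names and the 70 Hz / 100 Hz figures are hypothetical, chosen only for the example.

```python
import math


def alias_frequency(signal_hz: float, sample_hz: float) -> float:
    """Apparent frequency of a sine at signal_hz when sampled at sample_hz."""
    # Sampling folds every frequency into the range [0, sample_hz / 2].
    nearest_multiple = round(signal_hz / sample_hz) * sample_hz
    return abs(signal_hz - nearest_multiple)


def sample_cosine(freq_hz: float, sample_hz: float, n_samples: int) -> list[float]:
    """Sample cos(2*pi*f*t) at the times t = n / sample_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_hz) for n in range(n_samples)]


# Hypothetical case: a 70 Hz vibration sampled at only 100 Hz.
# The samples cannot be told apart from those of a 30 Hz vibration.
fast = sample_cosine(70.0, 100.0, 8)
slow = sample_cosine(30.0, 100.0, 8)
samples_match = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow))
```

Sampling the same 70 Hz signal at more than 140 Hz would avoid the ambiguity.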
Frozen data refers to the situation where data from sensors is not transmitted to the desired destination, such as a data platform. Often, frozen data shows up as the same measurement repeated over an extended period of time.
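One simple way to look for frozen data is to search for long runs of identical consecutive readings. The sketch below assumes the readings are held in a plain list; the function name and the run-length threshold are hypothetical and would need tuning per sensor.

```python
def find_frozen_runs(values, min_length):
    """Return (start, end) index pairs of runs where the same value repeats
    at least min_length times in a row (end is inclusive)."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs


# Hypothetical temperature readings with a suspicious flat stretch.
readings = [20.1, 20.3, 21.0, 21.0, 21.0, 21.0, 20.8, 20.8]
frozen = find_frozen_runs(readings, min_length=4)
```

A flat stretch is only a hint: some signals are legitimately constant, so the threshold should reflect how the sensor normally behaves.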
An imbalanced dataset is a dataset in which the majority of the data belongs to one class, leaving one or more other classes underrepresented.
Internet of Things (IoT) describes physical objects, such as machines, that are connected to the internet through sensors and/or have their functionality extended through embedded systems.
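A quick check for imbalance is to look at the share of each class among the labels. This is a minimal sketch; the function name and the "ok" / "fault" labels are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of observations that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels: most machines are fine, very few have a fault.
labels = ["ok"] * 95 + ["fault"] * 5
distribution = class_distribution(labels)
```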
The mean is the same as the average: the sum of the values divided by the number of values.
Multicollinearity describes when two or more features are highly correlated, in the extreme case perfectly. It is then possible to predict one feature by knowing the other. If two features are perfectly correlated with one another, one should be removed, as no information gets lost and multicollinear variables can reduce the model performance.
A Neural Network (NN), sometimes called Artificial Neural Network (ANN), is a computer system within machine learning which is inspired by the neurons and synapses of the human brain. When there is a lot of information in the data, the complexity increases, which is often reflected in the neural network having several hidden layers. Using such multi-layer networks is called deep learning.
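Correlation between two features can be checked with the Pearson correlation coefficient. The sketch below is a plain implementation with hypothetical data: the same temperature recorded in Celsius and Fahrenheit is perfectly correlated, so one of the two columns carries no extra information.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical features: the same temperature in two units, plus vibration.
temperature_c = [20.0, 22.5, 25.0, 30.0, 35.0]
temperature_f = [c * 9 / 5 + 32 for c in temperature_c]
vibration = [0.3, 0.1, 0.4, 0.2, 0.35]

r_duplicate = pearson_correlation(temperature_c, temperature_f)  # close to 1
r_other = pearson_correlation(temperature_c, vibration)
```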
A null-value is an observation where data is missing. Null is used to represent that no value has been set. A dataset with many null-values suggests that we may be missing important information.
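A minimal sketch for measuring how much data is missing in a column follows. It assumes missing values are represented as None or as a floating-point NaN; the function names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as null-values (an assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    """Fraction of the observations that are null-values."""
    if not values:
        return 0.0
    return sum(is_missing(v) for v in values) / len(values)
```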
Overall Equipment Effectiveness (OEE) is a measure of the productivity of production. It accounts for losses such as unplanned downtime, stops between shifts, and bad quality products.
Overfitting is a condition of a machine learning model where the model has learned the patterns from the training data too well, so much so that the model cannot generalise to new data.
The P-F interval is the interval between the moment a sign of potential failure is registered and the moment of malfunction (breakdown).
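OEE is commonly calculated as the product of three ratios: availability, performance, and quality. The text above does not spell this formula out, so the sketch below should be read as the widely used convention rather than this glossary's own definition; the function name and the input figures are hypothetical.

```python
def oee(availability, performance, quality):
    """OEE as availability x performance x quality, each given as a ratio in [0, 1]."""
    ratios = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical shift: 90 % uptime, 95 % of ideal speed, 99 % good parts.
score = oee(0.90, 0.95, 0.99)
```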
Precision and Recall are used to evaluate classification models. Precision expresses how many of the predicted positives are true positives, i.e. it is lowered by false positives. Recall, also called sensitivity, expresses how many of the actual positives the model finds, i.e. it is lowered by false negatives.
Preventive Maintenance describes a maintenance approach in which you prevent breakdowns through planned maintenance, for instance based on time.
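The relationship between true positives, false positives, false negatives, and the two metrics can be sketched as follows. The function names and the example labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are correct (0.0 if none predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground truth and model output (True = condition is there).
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, True, True, False, False, True]
tp, fp, fn, tn = confusion_counts(actual, predicted)
```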
A probability is a value between 0 and 1 that indicates how likely an event is to occur. The closer the value is to 1 the more likely it is to occur.
A Qualitative variable takes values that can be categorised into specific groups, such as sex or age group.
Quantitative variables are measurable. You can calculate a mean and standard deviation of the values, i.e. the value is numeric (continuous or discrete).
Quartiles describe a process in which data is sorted in ascending order and then divided into fractions. Typically the data is divided at the lower quartile (25%), the median (50%), and the upper quartile (75%). Quartiles are a good aggregation method, as they help reveal anomalies as well as the distribution of the data.
Reactive maintenance describes a maintenance approach in which you perform maintenance once a breakdown has occurred.
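The sketch below computes the three quartiles with Python's statistics module and then flags anomalies with the common 1.5 x IQR rule of thumb. That rule, the function names, and the sample values are assumptions made for the example, not something this glossary prescribes.

```python
import statistics


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    statistics.quantiles sorts the data itself; method="inclusive" treats
    the values as the whole population rather than a sample.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


def iqr_outliers(values, k=1.5):
    """Values further than k interquartile ranges outside the quartiles."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspicious reading.
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
suspicious = iqr_outliers(measurements)
```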
Reinforcement Learning (RL) is a specific learning approach for Machine learning, where the model learns by trial and error. The objective is to maximize reward in a particular situation, for instance by maximizing points in a game.
Reproducible analyses mean that it is possible to recreate the results of an analysis if the same data, code, and tools are used.
Right Data is the concept of being strategic about what data is collected in the company. Data is not collected before you know its purpose and how it can help the company reach its strategic goals.
The standard deviation is used to quantify the amount of variation in the dataset.
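As a small worked example, the mean and standard deviation of a list of numbers can be computed with Python's statistics module. The values are hypothetical; note that the population and sample versions of the standard deviation differ slightly.

```python
import statistics

# Hypothetical measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
# pstdev treats the values as the whole population (divides by n).
population_sd = statistics.pstdev(values)
# stdev treats the values as a sample (divides by n - 1).
sample_sd = statistics.stdev(values)
```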
Sudden failures are errors that occur on machines due to either faulty mounting or random causes. They often occur shortly after maintenance has been performed and, through predictive maintenance, can be detected in time to turn off the machine, reducing the likelihood of hazardous situations.
To train and verify a machine learning model, a dataset is split into training, validation, and test datasets. The majority of the data will be used for training the model. Validation data is used to validate the results of the training session, and the test data is used to evaluate whether the model is generalizable.
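A minimal sketch of such a split is shown below. The 70/15/15 proportions, the function name, and the fixed seed are assumptions for the example, and the random shuffle assumes the rows are independent of each other (for time-ordered data a chronological split is usually more appropriate).

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the rows and split them into train, validation and test lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    # Whatever is left over becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
```

The three lists do not overlap, so the test data stays unseen until the final evaluation.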
A machine learning model is often trained on historical data. In the training phase, the machine learning model is introduced to a larger dataset that it uses to learn patterns in the dataset. Based on patterns in historical data, the trained model will be able to make predictions on unseen data.
True negatives are when a test correctly predicts a condition is not there.
True positives are when a test correctly predicts a condition is there.
Unstructured Data is any data that is not structured in a predefined way such as images, audio, text, etc. Unstructured data can have structured metadata which describes the content of the data.
Unsupervised Learning is a learning approach in machine learning where the true output is unknown during training. Instead, the model seeks patterns in the data, e.g. to group customers based on their buying behavior.