AI-assisted anomaly detection from log data
Pusa, Teemu (2023-12-08)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe20231215154952
Abstract
As software production continues to grow, so does the volume of log data it generates. This surge has made it impractical for human operators to manually review every log line that software systems produce, which has driven the development of automatic anomaly detection methods. Such methods would allow system operators to respond to incidents more quickly and help improve software quality.
In the past, anomaly detection from log data relied heavily on predefined rules. However, with the complexity of modern software systems, finding experts for every system component to write these rules has become difficult. Additionally, it is very labor-intensive to implement these rules. This has spurred interest in unsupervised anomaly detection methods.
The purpose of this thesis is to investigate which methods can be used for automatic anomaly detection, what is required to use them in a production system, and how well deep learning-based methods work with log data produced by hundreds of embedded devices. The thesis begins with a literature review of the methods used for anomaly detection from log data. It then outlines the infrastructure required for efficient anomaly detection and concludes by testing the DeepLog deep learning method on real log data from a production system.
The key findings suggest that the DeepLog model performs effectively for anomaly detection when trained in an unsupervised fashion. However, it is essential to ensure that anomalous samples do not dominate the training data. This can be achieved either by completely excluding them from the training set or by ensuring that no single anomalous sample overwhelms the entire dataset, which could lead to overfitting. Moreover, the training dataset can be continuously refined by eliminating recognized anomalous sequences and subsequently retraining the model.
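The two data-hygiene steps described above can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes each training sample is a fixed-length window of integer log-key IDs, and the names `cap_duplicates`, `drop_flagged`, and `max_copies` are hypothetical. Capping identical copies is only one possible way to keep a single repeated sample from dominating the training set.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Assumption: a training sample is a window of integer log-key IDs.
LogSequence = Tuple[int, ...]


def cap_duplicates(sequences: Iterable[LogSequence], max_copies: int) -> List[LogSequence]:
    """Keep at most `max_copies` identical copies of any sequence, preserving order."""
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    seen: Counter = Counter()
    kept: List[LogSequence] = []
    for seq in sequences:
        if seen[seq] < max_copies:
            seen[seq] += 1
            kept.append(seq)
    return kept


def drop_flagged(sequences: Iterable[LogSequence], flagged: Set[LogSequence]) -> List[LogSequence]:
    """Remove sequences that have been recognized as anomalous before retraining."""
    return [seq for seq in sequences if seq not in flagged]


# Hypothetical data: one sequence is repeated far more often than the others.
train = [(1, 2, 3)] * 2 + [(9, 9, 9)] * 5 + [(1, 2, 4)]
capped = cap_duplicates(train, max_copies=2)
refined = drop_flagged(capped, flagged={(9, 9, 9)})
```

In this sketch, `capped` would be used for an initial training run, and `refined` for a later retraining round once `(9, 9, 9)` has been recognized as anomalous.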