The mathematics of neural networks behind image recognition
Jäppinen, Jarno (2024-05-24)
The publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2024061049172
Abstract
This thesis presents a special type of neural network for image recognition: the convolutional neural network. Before that, the neural network in general, with its components, architecture and operation, is presented. The idea of a convolutional neural network is to reduce the amount of mathematical computation by focusing on extracting the information that is necessary for recognition. With the convolutional technique, networks can be made quite large and complex without the computational requirements growing proportionally. The thesis also introduces the mathematics needed to train a neural network: the computation is based on optimizing the weight coefficients of the network by minimizing the error function with the backpropagation algorithm.
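As a brief illustration of the training step described above, the gradient descent weight update can be sketched as follows; the notation is generic rather than taken from the thesis (E is the error function, w_ij a weight coefficient, and \eta the learning rate):

w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta \, \frac{\partial E}{\partial w_{ij}} \bigg|_{w = w^{(t)}}

Backpropagation supplies the partial derivatives layer by layer by applying the chain rule from the output layer backwards through the network.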
The work also includes a simple fully connected neural network and a more complex convolutional neural network, each trained on two different datasets: CIFAR-10 and Fashion-MNIST. The CIFAR-10 dataset consists of small colour images, which were more challenging for the models to recognize than the Fashion-MNIST images, which are greyscale and slightly smaller. With the CIFAR-10 data, the convolutional neural network reached fairly good recognition accuracy, while the fully connected network did not achieve very good learning results. With the Fashion-MNIST dataset, both models, fully connected and convolutional, learned to recognize the objects quite well.
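As a rough sketch of the kind of convolutional model described above, assuming the Keras API and generic hyperparameters (the thesis does not state which framework or architecture details were used), a Fashion-MNIST classifier could be set up roughly like this:

import tensorflow as tf

# Load Fashion-MNIST: 28x28 greyscale images in 10 clothing categories.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0  # add a channel axis and scale pixels to [0, 1]
x_test = x_test[..., None] / 255.0

# A small convolutional network: convolution and pooling layers extract features,
# and a fully connected head performs the final classification.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Training minimizes the cross-entropy error function; the weight gradients are
# computed with backpropagation (here using the Adam optimizer).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

A fully connected baseline of the kind compared in the thesis would simply replace the convolution and pooling layers with additional Dense layers on the flattened image.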
I conclude that the learning of recognition is particularly influenced by the content of the material. If the images are clear and contain only one object, classification is easy, just as it is for humans. But if there are many different things in the picture, it is much harder for a machine, as it is for a human, to tell what the picture shows. The amount of computation required is also strongly affected by the size of the images and the number of classification categories.