Investigation of the performance of OpenPose pose estimation system in real-world environments
Jousjärvi, Niko (2023-02-06)
This publication is subject to copyright. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Closed access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2023022027886
Abstract
The last decade has been a golden age of machine learning, data analytics and artificial
intelligence. These fields see large-scale developments every year, and one of the most exciting
areas of artificial intelligence is machine vision. Machine vision allows us to build
systems that can analyze what is happening in an environment much better than any human
ever could. With the help of machine vision we can automate businesses that have traditionally been
operated exclusively by humans, such as convenience stores. We can also create
new surveillance systems that track and analyze people’s behavior and movement in a space,
whether for general safety or for optimizing the layout and inner workings of that space. With machine
vision, a large grocery store can run with only a handful of employees while an artificial
intelligence tracks customers and the items they have taken from the shelves.
The field of machine vision is vast, and this thesis only scratches the surface of the
systems and methods that exist today. We look at so-called pose estimation
methods, which are designed to recognize human subjects in an image and then estimate the
pose of each subject. We also take a closer look at one of these methods,
OpenPose, and test how it performs on video footage taken from real-world
environments with multiple subjects moving in the scene at the same time. The video footage
comes from the VIRAT video dataset.
The results of our analysis show that OpenPose can perform well in optimal scenarios,
but it has trouble when subjects are far away from the camera or
grouped together tightly, or when the scene is generally busy with objects other than humans,
as on a city street. Overall, our results are inconclusive, not only because we
examine only OpenPose rather than a set of different pose estimation methods, but also
because the amount of video data in the VIRAT video dataset is not enough to draw
firm conclusions about the performance of OpenPose. With additional video footage and a
broader set of analysis methods, we could better assess how these systems perform in real-world conditions.