AI-assisted Software Development Effort Estimation
Koskinen, Karri (2021-06-07)
The publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
Open access
Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2021060835687
Abstract
Effort estimation is a critical aspect of software project management. Without accurate estimates of the developer effort a particular project will require, the project's timeline and resourcing cannot be efficiently planned, which greatly increases the likelihood of the project failing to meet at least some of its goals.
The goal of this thesis is to apply machine learning methods to analyze the work hour data logged by individual employees in order to provide project management with useful estimations of how much more effort it will take to finish a given project, and how long that will take. The work is conducted for ATR Soft Oy, using the data from their internal work hour logging tool.
First, a literature review is conducted to determine which estimation methods and tools are currently used in the software industry, and what objectives and requirements organizations commonly set for their estimation processes. The basics of machine learning are explained, and a brief look is taken at how machine learning is currently used to support software engineering and project management. The literature review revealed that although machine learning methods have been applied to software project estimation for decades, such data-driven methods generally suffer from a lack of relevant historical project data and are therefore not commonly used in the industry.
Initial insights were gathered from the work hour data, and the analysis goals were refined accordingly. The data was then pre-processed into a form suitable for training machine learning models. Two modeling scenarios were tested: creating a single general model from all available data, and creating multiple project-specific models of a more limited scope.
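As an illustration of the two modeling scenarios, the following minimal Python sketch contrasts a single pooled model with per-project models. It does not reproduce the thesis's actual models or features: the log entry format `(project, employee, day, hours)`, the weekly aggregation, the lag-one linear regression, the helper names (`weekly_hours`, `general_model`, `project_models`, `predict`) and the example entries are all hypothetical assumptions chosen to keep the sketch short and dependency-free.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log format: (project, employee, day, hours). Made-up example data.
EXAMPLE_ENTRIES = [
    ("A", "e1", date(2021, 1, 4), 8.0),
    ("A", "e2", date(2021, 1, 5), 4.0),
    ("A", "e1", date(2021, 1, 11), 6.0),
    ("A", "e1", date(2021, 1, 18), 3.0),
    ("B", "e3", date(2021, 1, 4), 2.0),
    ("B", "e3", date(2021, 1, 11), 4.0),
    ("B", "e3", date(2021, 1, 18), 6.0),
]


def weekly_hours(entries):
    """Pre-processing: sum individual log entries into per-project weekly totals,
    ordered by ISO (year, week). Weeks without entries are simply absent."""
    totals = defaultdict(lambda: defaultdict(float))
    for project, _employee, day, hours in entries:
        iso_year, iso_week, _ = day.isocalendar()
        totals[project][(iso_year, iso_week)] += hours
    return {p: [weeks[k] for k in sorted(weeks)] for p, weeks in totals.items()}


def lag_pairs(series):
    """Training pairs of (hours in week t-1, hours in week t)."""
    return list(zip(series[:-1], series[1:]))


def fit_line(pairs):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    if not pairs:
        raise ValueError("need at least one (x, y) pair")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = cov_xy / var_x if var_x else 0.0  # constant x: predict the mean of y
    return mean_y - b * mean_x, b


def general_model(series_by_project):
    """Scenario 1: a single model fitted on pairs pooled from every project."""
    pooled = [pair for s in series_by_project.values() for pair in lag_pairs(s)]
    return fit_line(pooled)


def project_models(series_by_project):
    """Scenario 2: one narrower model per project (needs at least two weeks)."""
    return {p: fit_line(lag_pairs(s))
            for p, s in series_by_project.items() if len(s) >= 2}


def predict(model, last_week_hours):
    """Estimate next week's hours from the most recent week's hours."""
    a, b = model
    return a + b * last_week_hours
```

With the made-up entries above, project A's weekly totals are [12, 6, 3] and project B's are [2, 4, 6]. The per-project fits are (0.0, 0.5) and (2.0, 1.0), while the pooled fit has a slope of only 1/7, which shows how a general model can blur opposing trends that separate project-specific models keep, at the cost of each model seeing less data.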
The modeling performance data indicates that machine learning models based on work hour data can, in some situations, achieve better results than traditional expert estimation. The models developed here are not reliable enough to be used as the sole estimation method, but they can provide useful additional information to support decision-making.