Data analysis with limited data availability: prostate cancer prediction and characterization as a case study
Montoya Perez, Ileana (2024-05-17)
University of Turku
Permanent address of the publication:
https://urn.fi/URN:ISBN:978-951-29-9646-9
Abstract
Research studies conducted on limited datasets (i.e., datasets of tens to at most a few hundred observations) may be the only practical option in many research areas, because data collection can be costly, complex, or both. Analyzing such datasets is challenging because it can lead to inaccurate results. In this thesis, we addressed this challenge in the context of prostate cancer research by empirically assessing the predictive and characterization capabilities of attributes, with the following objectives: to evaluate the predictive power of features extracted from prostate magnetic resonance imaging (MRI) using cross-validation techniques; to develop and evaluate a cross-validation method for small sample sizes that allows receiver operating characteristic (ROC) analysis; and to identify and compare relevant predictors among MRI features, clinical variables, gene expressions, and kallikreins for prostate cancer detection and stratification. To achieve these objectives, we used data from approved studies and registered clinical trials at Turku University Hospital, drawing on a strong collaboration between university departments and hospitals. This collaboration enabled the collection of diverse, high-quality features to support research on prostate cancer diagnosis and prognosis.
The results of this thesis can be summarized as follows. First, our evaluation of radiomic features from various MRI modalities demonstrates the potential of these features for stratifying prostate tumors into low- and high-risk groups. Second, regarding model evaluation with ROC analysis and cross-validation, our research reveals a significant negative bias in the area under the ROC curve (AUC) when it is estimated by leave-one-out cross-validation (LOOCV), and it introduces a novel method, tournament leave-pair-out cross-validation (TLPOCV), as a more reliable alternative to LOOCV for ROC analysis. Finally, our results provide empirical evidence of the predictive potential of quantitative and qualitative MRI features, clinical variables, gene expressions, and kallikreins, both individually and in combination, for detecting and stratifying prostate cancer.
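The negative bias of LOOCV-based AUC estimates can be illustrated with a deliberately uninformative model. The sketch below is a toy illustration under stated assumptions, not the thesis's experimental setup: the scorer `prior_fit_predict` and the synthetic labels are hypothetical, and the pair-based estimator is plain leave-pair-out cross-validation over positive-negative pairs, not the tournament variant (TLPOCV) introduced in the thesis. The model ignores the features and scores every test point by the fraction of positives in its training fold. Under pooled LOOCV, holding out a positive lowers that fraction and holding out a negative raises it, so every positive is ranked below every negative. Leave-pair-out instead scores both held-out points with the same model, which yields the expected chance-level AUC.

```python
from itertools import product
from statistics import mean


def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))


def prior_fit_predict(train_X, train_y, test_X):
    """Hypothetical uninformative model: ignores features, scores with the training positive rate."""
    p = mean(train_y)
    return [p] * len(test_X)


def pooled_loocv_auc(X, y, fit_predict):
    """LOOCV: score each observation with a model trained on the other n-1, then pool all scores."""
    scores = []
    for i in range(len(y)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        scores.append(fit_predict(train_X, train_y, [X[i]])[0])
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    return auc(pos, neg)


def lpocv_auc(X, y, fit_predict):
    """Leave-pair-out: for each (positive, negative) pair, train on the other n-2 and compare the pair."""
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    total = 0.0
    for i, j in product(pos_idx, neg_idx):
        keep = [k for k in range(len(y)) if k not in (i, j)]
        s_pos, s_neg = fit_predict([X[k] for k in keep], [y[k] for k in keep], [X[i], X[j]])
        total += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return total / (len(pos_idx) * len(neg_idx))


# Synthetic toy data (assumption): 10 positives, 10 negatives, features carry no information.
y = [1] * 10 + [0] * 10
X = [[0.0] for _ in y]
print(pooled_loocv_auc(X, y, prior_fit_predict))  # 0.0
print(lpocv_auc(X, y, prior_fit_predict))         # 0.5
```

Because the model has no access to any signal, the true AUC is 0.5. The pooled LOOCV estimate of 0.0 is therefore an artifact of how the folds shift the class balance, while the pair-based estimate recovers chance level.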
The findings in this research are of interest not only to medical professionals and healthcare providers engaged in prostate cancer research but also to those involved in analyzing and learning from size-constrained datasets while achieving clinically meaningful evaluation outcomes.