Metaheuristic Algorithms Enhanced Multi-Modal Sensor Fusion for Object Recognition in Autonomous Vehicles
Ghosh, Debom (2025-02-06)
The publication is subject to copyright. It may be read and printed for personal use. Commercial use is prohibited.
Closed access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025021011162
Abstract
This research details a sensor fusion framework for autonomous vehicles, developed and evaluated on two benchmark datasets, KITTI and BDD100K. Our primary contribution is an adaptive feature selection mechanism that dynamically tunes the parameters of the underlying sensor fusion algorithms. This real-time parameter tuning optimizes how data from the vehicle's heterogeneous, multi-modal sensor suite is fused: specifically, data from a 64-channel (or 32-channel) LiDAR and an array of multispectral cameras.
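Since the title points to metaheuristic optimization but the abstract does not name a specific algorithm, the sketch below uses particle swarm optimization purely as an illustration of how per-modality fusion weights could be tuned in this setting. Everything here is a hypothetical stand-in: the toy class-separation loss, the synthetic LiDAR and camera feature arrays, and the names `fusion_loss` and `pso_tune` are assumptions, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fusion_loss(weights, lidar_feats, camera_feats, labels):
    """Toy surrogate loss: negative class separation of the weighted
    feature blend (lower is better). A stand-in for a real recognition metric."""
    w = np.clip(weights, 0.0, 1.0)
    fused = w[0] * lidar_feats + w[1] * camera_feats
    score = fused.mean(axis=1)  # crude 1-D projection per sample
    return -(score[labels == 1].mean() - score[labels == 0].mean())

def pso_tune(loss_fn, args, n_particles=20, n_iters=50, dim=2):
    """Minimal particle swarm optimizer over fusion weights in [0, 1]^dim."""
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([loss_fn(p, *args) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # standard PSO velocity update: inertia + cognitive + social terms
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([loss_fn(p, *args) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Synthetic stand-ins for per-frame LiDAR and camera feature vectors.
lidar = rng.normal(size=(200, 16))
camera = rng.normal(size=(200, 16))
labels = rng.integers(0, 2, size=200)
lidar[labels == 1] += 0.5  # make LiDAR the more informative cue in this toy setup

w_lidar, w_cam = pso_tune(fusion_loss, (lidar, camera, labels))
print(f"tuned fusion weights: LiDAR={w_lidar:.2f}, camera={w_cam:.2f}")
```

In a real pipeline the loss would be a downstream recognition metric computed on held-out frames, and the weight vector would plausibly extend to one entry per feature group rather than one per sensor.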
The proposed architecture introduces three key innovations. First is a hierarchical feature extraction pipeline that exploits spatio-temporal correlations across modalities to extract features both efficiently and accurately. Second is an adaptive weight optimization strategy that adjusts the relative weights of the features extracted from the different sensor modalities as environmental conditions change. Third is a real-time optimization framework that maintains high visual fidelity while remaining robust across many different kinds of scenes. The overall architecture is thoroughly validated on the industry-standard KITTI and BDD100K datasets, and ablation studies verify the necessity and sufficiency of each component.
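As a rough picture of the second innovation, the following toy sketch turns a per-modality confidence score into normalized fusion weights, so that a degraded stream (for example, a camera in fog) contributes less to the fused representation. The confidence heuristic, the offset constant, and the function names are all assumptions made for illustration; the thesis's actual weighting strategy is not specified in the abstract.

```python
import numpy as np

def modality_confidence(feats, offset=3.0):
    """Heuristic confidence proxy: mean feature energy squashed to (0, 1).
    The energy cue and the offset are arbitrary choices for this toy example."""
    energy = np.linalg.norm(feats, axis=-1).mean()
    return 1.0 / (1.0 + np.exp(-(energy - offset)))

def adaptive_fuse(lidar_feats, camera_feats):
    """Normalize per-modality confidences into fusion weights, so a
    low-signal stream contributes proportionally less to the fused features."""
    c_l = modality_confidence(lidar_feats)
    c_c = modality_confidence(camera_feats)
    w_l, w_c = c_l / (c_l + c_c), c_c / (c_l + c_c)
    return w_l * lidar_feats + w_c * camera_feats, (w_l, w_c)

rng = np.random.default_rng(1)
lidar = rng.normal(size=(100, 32))
camera = 0.1 * rng.normal(size=(100, 32))  # simulate a washed-out (foggy) camera

fused, (w_l, w_c) = adaptive_fuse(lidar, camera)
print(f"weights: LiDAR={w_l:.2f}, camera={w_c:.2f}; fused shape={fused.shape}")
```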
The framework's reliability across different operational situations is confirmed through thorough validation with 10-fold cross-validation and statistical significance testing (p < 0.01). It performs particularly well in challenging conditions, such as heavy rain, fog, or nighttime driving, where traditional single-sensor approaches often degrade.
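For concreteness, here is a minimal sketch of the evaluation protocol described above: 10-fold cross-validation of a fused-feature model against a single-sensor baseline, followed by a significance test against the p < 0.01 threshold. The paired t-test, the synthetic data, and the logistic-regression stand-in classifiers are assumptions; only the protocol itself (10 folds, p < 0.01) comes from the abstract.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
# Synthetic stand-ins: a fused-feature view and a noisier single-sensor view.
X_fused = rng.normal(size=(500, 20))
X_single = X_fused[:, :10] + 0.5 * rng.normal(size=(500, 10))
y = (X_fused[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

fused_acc, single_acc = [], []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X_fused):
    for X, accs in ((X_fused, fused_acc), (X_single, single_acc)):
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))

# Paired t-test over the ten matched folds, checked against p < 0.01.
t_stat, p_val = stats.ttest_rel(fused_acc, single_acc)
print(f"fused={np.mean(fused_acc):.3f}  single={np.mean(single_acc):.3f}  p={p_val:.4f}")
```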