An Eye for AI: A Multimodal Bottleneck Transformer Approach for Predicting Individual Eye Movements : Towards Foundation Models for Human Factors & Neuroscience
Dolmans, Tenzing (2023-06-19)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2023072691718
Abstract
Human perception has been a subject of study for centuries. Various eye tracking methods in many study designs have shed light on individual differences in perception and visual navigation. However, accurately identifying individuals based on gaze behaviour remains a challenge. Artificial intelligence (AI) based methods have led to large successes in domains such as vision and language; they are also making their introduction in human factors & neuroscience (HFN). Leveraging AI for HFN requires quantities of data several orders of magnitude larger than the field is used to organising; there exists a clear discrepancy in the standardisation of data publication. In this work, we take steps towards foundation models (FM) for HFN by highlighting important data insights from AI. A multimodal bottleneck transformer is proposed, a model architecture that can effectively and efficiently represent and work with the varying modalities encountered in HFN. Results indicate that classification of individuals and prediction of gaze are possible, given more training data.