Investigating and Modeling the Effects of Task and Context on Drivers' Attention

Date

2024-07-18

Authors

Kotseruba, Iuliia

Abstract

Driving, despite its widespread nature, is a demanding and inherently risky activity. Any lapse in attention, such as failing to check traffic signals or not noticing the actions of other road users, can have severe consequences. Driver monitoring and assistance technology aims to mitigate these risks, but building it requires a deeper understanding of how drivers observe their surroundings to make decisions.

In this dissertation, we investigate the link between where drivers look, the tasks they perform, and the surrounding context. To do so, we first conduct a meta-study of the behavioral literature, which documents the overwhelming importance of top-down (task-driven) effects on gaze. Next, we survey applied research and show that most models do not make this connection explicit: instead, they establish correlations between where drivers looked and images of the scene, without considering drivers' actions and the environment.

Next, we annotate and analyze the four largest publicly available datasets that contain driving footage and eye-tracking data. The new annotations for task and context show that the data is dominated by trivial scenarios (e.g. driving straight, standing still) and help uncover problems with typical data recording and processing pipelines that result in noisy, missing, or inaccurate data, particularly in safety-critical scenarios (e.g. at intersections). For the only dataset with raw data available, we create a new ground truth that alleviates some of the discovered issues. We also provide recommendations for future data collection.
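To illustrate the kind of data quality check such an analysis involves (a minimal sketch, not the dissertation's actual pipeline; the array layout and confidence threshold are assumptions), the snippet below flags per-frame gaze samples that are missing, fall outside the image, or carry low tracker confidence:

import numpy as np

def flag_unreliable_gaze(gaze_xy, conf, conf_thresh=0.8):
    # gaze_xy: (N, 2) gaze points normalized to [0, 1]
    # conf:    (N,) per-frame eye-tracker confidence in [0, 1]
    # Returns a boolean mask, True where the sample is unreliable.
    missing = np.isnan(gaze_xy).any(axis=1)                 # dropped samples
    out_of_frame = ((gaze_xy < 0) | (gaze_xy > 1)).any(axis=1)
    low_conf = conf < conf_thresh                           # noisy tracking
    return missing | out_of_frame | low_conf

Frames flagged this way can then be excluded or re-annotated before ground-truth saliency maps are computed.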

Using the new annotations and ground truth, we benchmark a representative set of bottom-up models for gaze prediction, i.e. those that do not represent the task explicitly. We conclude that while the corrected ground truth boosts performance, implicit representations are not sufficient to capture the effects of task and context on where drivers look.
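Benchmarks of this kind typically compare predicted and ground-truth saliency maps with distribution-based metrics. The sketch below is illustrative only, not the benchmark code used in the dissertation (the epsilon value is an assumption); it computes two standard metrics, KL divergence and Pearson's correlation coefficient (CC):

import numpy as np

def kld(pred, gt, eps=1e-7):
    # KL divergence between ground-truth and predicted saliency maps
    # (lower is better); both maps are normalized to sum to 1.
    p = gt / (gt.sum() + eps)
    q = pred / (pred.sum() + eps)
    return float(np.sum(p * np.log(p / (q + eps) + eps)))

def cc(pred, gt, eps=1e-7):
    # Pearson correlation between the two maps (higher is better).
    p = (pred - pred.mean()) / (pred.std() + eps)
    g = (gt - gt.mean()) / (gt.std() + eps)
    return float((p * g).mean())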

Lastly, motivated by these findings, we propose a task- and context-aware model for driver gaze prediction with an explicit representation of the driver's actions and the surrounding context. The first version of the model, SCOUT, improves state-of-the-art performance by over 80% overall and by 30% on the most challenging scenarios. We then propose SCOUT+, which relies on more readily available route and map information, similar to what the driver might see on an in-car navigation screen. SCOUT+ achieves results comparable to those of the version that uses more precise numeric and text labels.
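To make the idea of explicit task and context representation concrete, here is a minimal, hypothetical sketch (not SCOUT's actual architecture; all module names, label sets, and dimensions are assumptions for illustration). Discrete task and context labels are embedded and fused with visual features before a gaze saliency map is decoded:

import torch
import torch.nn as nn

class TaskContextGazeNet(nn.Module):
    # Toy gaze predictor conditioned on task and context labels.
    def __init__(self, n_tasks=8, n_contexts=8, d=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, d, kernel_size=3, padding=1)  # stand-in visual encoder
        self.task_emb = nn.Embedding(n_tasks, d)        # e.g. turn left, go straight
        self.ctx_emb = nn.Embedding(n_contexts, d)      # e.g. intersection, highway
        self.decoder = nn.Conv2d(d, 1, kernel_size=1)   # per-pixel gaze logits

    def forward(self, frames, task_id, ctx_id):
        feats = self.backbone(frames)                         # (B, d, H, W)
        cond = self.task_emb(task_id) + self.ctx_emb(ctx_id)  # (B, d)
        feats = feats + cond[:, :, None, None]                # broadcast additive fusion
        return self.decoder(feats).squeeze(1)                 # (B, H, W) saliency logits

Additive fusion is the simplest conditioning choice; attention-based fusion over the feature map is a common alternative when task effects are spatially localized.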

Keywords

Computer science, Artificial intelligence, Cognitive psychology
