DISENTANGLING VISUAL CONCEPTS ACROSS SPACE AND TIME: FROM IMAGE HIERARCHIES TO VIDEO DYNAMICS


Authors

Kowal, Matthew Paul

Abstract

This dissertation advances the interpretability of deep vision models, with a particular focus on disentangling representations across space, layers, and time. As deep learning systems increasingly underpin critical applications, understanding their internal representations and decision-making processes is essential. The dissertation is structured into three parts that address this challenge from complementary perspectives. The first part introduces a framework for quantifying static and dynamic information in spatiotemporal models, offering a principled measure of how such models encode temporal dependencies compared with their static counterparts. The second part presents a methodology for discovering and localizing semantically meaningful concepts within these spatiotemporal models, enabling a deeper understanding of the internal features that drive predictions. The third part extends this analysis by identifying interlayer concept circuits, i.e., structured pathways through which concepts propagate across layers, revealing how information flows and transforms within deep image models. Together, these contributions provide a toolkit for interpreting the multilayered and spatiotemporal characteristics of complex neural architectures and lay the groundwork for more transparent and accountable artificial intelligence systems in dynamic visual domains.

Keywords

Computer science, Artificial intelligence
