Theses

This collection consists of theses and dissertations produced by graduate students affiliated with the York Centre for Vision Research. These works represent significant contributions to the interdisciplinary field of vision science and have been approved in accordance with the academic standards of their respective graduate programs (including Biology, Digital Media, Electrical Engineering and Computer Science, Interdisciplinary Studies, Kinesiology & Health Science, Philosophy, Physics & Astronomy, Psychology, and others). This collection is managed, and deposits are authorized, by the Coordinator for the Centre.

Recent Submissions

Now showing 1 - 20 of 60
  • Item (Open Access)
    A Solution for Scale Ambiguity in Generative Novel View Synthesis
    (2025-04-10) Forghani, Fereshteh; Brubaker, Marcus
    Generative Novel View Synthesis (GNVS) uses generative models to produce plausible unseen views of a scene given an initial view and the relative camera motion between the input and target views. A key limitation of current generative methods lies in their susceptibility to scale ambiguity, an inherent challenge in multi-view datasets caused by the use of monocular techniques to estimate camera positions from uncalibrated video frames. In this work, we present a novel approach to tackle this scale ambiguity in multi-view GNVS by optimizing the scales as parameters in an end-to-end fashion. We also introduce Sample Flow Consistency (SFC), a novel metric designed to assess scale consistency across samples with the same camera motion. Through various experiments, we demonstrate that our approach yields improvements in terms of SFC, providing more consistent and reliable novel view synthesis.
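    As a rough illustration of the scale-optimization idea described above, the sketch below treats one scene scale per training sequence as a learnable parameter applied to the camera translation; all names, shapes, and training details are illustrative assumptions, not the thesis code.

```python
# Hypothetical sketch: per-sequence scene scales optimized end-to-end.
import torch

class LearnableScales(torch.nn.Module):
    def __init__(self, num_sequences: int):
        super().__init__()
        # One log-scale per training sequence; exp() keeps scales positive.
        self.log_scale = torch.nn.Parameter(torch.zeros(num_sequences))

    def forward(self, seq_idx: torch.Tensor, translation: torch.Tensor):
        # Rescale the metrically ambiguous camera translation so that all
        # sequences share a consistent scale during training.
        return translation * torch.exp(self.log_scale[seq_idx]).unsqueeze(-1)

scales = LearnableScales(num_sequences=100)
t = torch.randn(8, 3)              # a batch of relative camera translations
idx = torch.randint(0, 100, (8,))  # which sequence each view pair came from
t_rescaled = scales(idx, t)        # passed on to the generative model;
# scales.log_scale receives gradients from the generative loss end-to-end.
```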
  • Item (Open Access)
    From Discrete to Continuous: Learning 3D Geometry from Unstructured Points by Random Continuous Space Queries
    (2025-04-10) Jia, Meng; Kyan, Matthew J.
    In this dissertation, we focus on generalizing recent point convolution methods and building well-behaved point-cloud 3D shape features to achieve more robust, invariant, and versatile implicit neural representations (INR) of 3D shapes. In recent efforts to explore point-cloud based learning methods to improve 3D shape analysis, much attention has been paid to the use of INR-based frameworks. Existing methods, however, mostly formulate models with an encoder-decoder architecture that incorporates a global shape embedding space, which often fails to model fine-grained local details efficiently, limiting overall generalization performance. To overcome this problem, we propose a convolutional feature space sampling operation (Dual-Feature Sampling, or DFS) and develop a novel INR learning framework (Stochastic Continuous Function Learning, or SCFL). This framework is first adapted and evaluated for surface reconstruction of generic objects from sparsely sampled point clouds, a task that has been extensively used to benchmark INR 3D shape learning methods. This study demonstrates impressive capabilities of our method, namely: 1) an ability to faithfully recover fine details and uncommon shape characteristics; 2) improved robustness to point-cloud rotation; 3) flexibility to handle different levels of sparsity in the input point clouds; 4) significantly better generalization in the presence of unseen shape categories. In addition, the DFS operator proposed for this framework is well-formulated and general enough to be easily integrated into existing systems designed to address more complex 3D shape tasks. In this work, we harness this powerful ability to represent shape within a newly proposed SCFL-based occupancy network, applied to shape-based processing problems in medical image registration and segmentation. Specifically, our network is adapted and applied to two different, traditionally challenging problems: 1) liver image-to-physical registration; and 2) tumour-bearing whole-brain segmentation. In both of these tasks, significant deformation can severely degrade and hinder performance. We illustrate, however, that accuracy in both tasks can be considerably improved over baseline methods using our proposed network. Finally, through the course of the investigations conducted, an intensive effort has been made throughout the dissertation to review, analyze, and offer speculative insights into the features of these proposed innovations, their role in the configurations presented, and their possible utility in other scenarios and configurations that may warrant future investigation. It is our hope that the work in this dissertation may help to spark new ideas to advance the state of the art in learning-based representation of 3D shapes and encourage more interest in novel applications of INR to solve real-world problems.
  • Item (Open Access)
    Underwater gesture-based human-to-robot communication
    (2025-04-10) Codd-Downey, Robert Frank; Jenkin, Michael
    Underwater human-to-robot interaction presents significant challenges due to the harsh environment, including reduced visibility from suspended particulate matter and high attenuation of light and electromagnetic waves generally. Divers have developed an application-specific gesture language that has proven effective for diver-to-diver communication underwater. Given the wide acceptance of this language for underwater communication, it would seem an appropriate mechanism for diver-to-robot communication as well. Effective gesture recognition systems must address several challenges. Designing a gesture language involves balancing expressiveness and system complexity. Detection techniques range from traditional computer vision methods, suitable for small gesture sets, to neural networks for larger sets requiring extensive training data. Accurate gesture detection must handle noise and distinguish between repeated gestures and single gestures held for longer durations. Reliable communication also necessitates a feedback mechanism to allow users to correct miscommunications. Such systems must also deal with the need to recognize individual gesture tokens and their sequences, a problem that is hampered by the lack of large-scale labelled datasets of individual tokens and gesture sequences. Here these problems are addressed through weakly supervised learning and a sim2real approach that reduces by several orders of magnitude the effort required to obtain the necessary labelled dataset. This work addresses the communication task by (i) developing a traditional diver and diver-part recognition system (SCUBANetV1+), (ii) using this recognition within a weak supervision approach to train SCUBANetV2, a diver hand gesture recognition system, and (iii) feeding SCUBANetV2's individual gesture recognitions to the Sim2Real-trained SCUBALang LSTM network, which translates temporal gesture sequences into phrases. This neural network pipeline effectively recognizes diver hand gestures in video data, demonstrating success on structured sequences. Each of the individual network components is evaluated independently, and the entire pipeline is evaluated formally using imagery obtained in both the open ocean and in pool environments. As a final evaluation, the resulting system is deployed within a feedback structure and evaluated using a custom unmanned underwater vehicle. Although this work concentrates on underwater gesture-based communication, the technology and learning process introduced here can be deployed in other environments for which application-specific gesture languages exist.
  • Item (Open Access)
    Normalized Moments for Photo-realistic Style Transfer
    (2025-04-10) Canham, Trevor Dalton; Brown, Michael S.
    Style transfer, the operation of matching appearance features between source and target images, is a complex and highly subjective problem. Due to the profundity of the concept of artistic style, the optimal solution is ill-defined, so the variety of approaches that have been proposed represents partial solutions with varying degrees of efficiency, usability, and appearance of results. In this work, a photo-realistic style transfer method for image and video is proposed that is based on vision science principles and on a recent mathematical formulation for the deterministic decoupling of features. As a proxy for mimicking the effects of camera color rendering or post-processing, the employed features (the first through fourth order moments of the color distribution) represent important cues for visual adaptation and pre-attentive processing. The method is evaluated on the above criteria in a series of application-relevant experiments and is shown to produce results of high visual quality, without spatio-temporal artifacts; validation tests in the form of observer preference experiments show that it compares very well with the state of the art (deep learning, optimal transport, etc.). The computational complexity of the algorithm is low, and a numerical implementation that is amenable to real-time video application is proposed and demonstrated. Finally, general recommendations for photo-realistic style transfer are discussed.
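    As a minimal illustration of moment matching between color distributions, the sketch below matches only the first two per-channel moments (mean and standard deviation); the method above goes further, deterministically decoupling and matching moments up to fourth order (skewness and kurtosis). Function names are illustrative.

```python
# Hypothetical sketch: per-channel matching of the first two color moments.
import numpy as np

def match_mean_std(source, target):
    """source, target: float arrays of shape (H, W, 3) in [0, 1]."""
    out = np.empty_like(source)
    for c in range(3):
        s, t = source[..., c], target[..., c]
        # Standardize the source channel, then impose the target's
        # mean and standard deviation.
        out[..., c] = (s - s.mean()) / (s.std() + 1e-8) * t.std() + t.mean()
    return np.clip(out, 0.0, 1.0)
```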
  • Item (Open Access)
    Investigating Pannexin1a-Mediated Mechanisms Of Pain And Neuroinflammation Using Zebrafish
    (2025-04-10) Jeyarajah, Darren; Zoidl, George
    This thesis explores the role of Panx1a in modulating pain and neuroinflammatory responses in zebrafish (Danio rerio). Nociception was induced using acetic acid (AA) treatments. Behavioral assays conducted on Panx1a knockout (KO) zebrafish larvae demonstrate significant alterations in response to AA-induced pain. Pharmacological interventions using probenecid, a Panx1 inhibitor, and ibuprofen, a cyclooxygenase (COX) inhibitor, reveal their potential in modulating pain behaviors and rescuing nociceptive deficits. Furthermore, molecular analyses employing quantitative polymerase chain reaction (qPCR) and RNA sequencing (RNA-seq) elucidate the regulatory impact of Panx1a on gene expression related to nociception, neuroinflammation, and synaptic plasticity. In summary, this thesis provides evidence of Panx1a's involvement in pain and neuroinflammation, proposing zebrafish as a viable model for studying nociception.
  • Item (Open Access)
    Developing A Non-Human Primate Model To Dissect The Neural Mechanism Of Facial Emotion Processing
    (2025-04-10) Taghian Alamooti, Shirin; Kar, Kohitij
    Facial emotion recognition is a cornerstone of social cognition, vital for interpreting social cues and fostering communication. Despite extensive research in human subjects, the neural mechanisms underlying this process remain incompletely understood. This thesis investigates these mechanisms using a non-human primate model to provide deeper insights into the neural circuitry involved in facial emotion processing. We embarked on a comparative analysis of facial emotion recognition between humans and rhesus macaques. Using a carefully curated set of facial expression images from the Montreal Set of Facial Displays of Emotion (MSFDE), we designed a series of binary emotion discrimination tasks. Our innovative approach involved detailed behavioral metrics that revealed significant parallels in emotion recognition patterns between the two species. These findings highlight the macaques’ potential as a robust model for studying human-like facial emotion recognition. Building on these behavioral insights, the second phase of our research delved into the neural underpinnings of this cognitive process. We conducted large-scale, chronic multi-electrode recordings in the inferior temporal (IT) cortex of rhesus macaques. By mapping the neural activity associated with the classification of different facial emotions, we uncovered specific neural markers that correlate strongly with behavioral performance. These neural signatures provide compelling evidence for the role of the IT cortex in processing complex emotional cues. Our findings bridge the gap between behavioral and neural perspectives on facial emotion recognition, offering a comprehensive understanding of the underlying mechanisms. This research not only underscores the evolutionary continuity of social cognition across primate species but also sets the stage for future explorations into the neural basis of emotion processing. The integration of behavioral analysis with advanced neural recording techniques presents a powerful framework for advancing our knowledge of social cognition and its disorders.
  • Item (Open Access)
    Gaze-Contingent Multi-Modal and Multi-Sensory Applications
    (2024-11-07) Vinnikov, Margarita; Allison, Robert
    Gaze-contingent displays are applications that are driven by the user's gaze. They are an important tool for many multi-modal and multi-sensory domains. They can be used to precisely control the retinal image in real time to study visual control of natural behaviour through experimentation, or to improve user experience in virtual reality applications. In this dissertation, I explored the application of gaze-contingent display technology to different modalities and senses and evaluated whether such applications can be useful for simulation, psychophysical research, and human-computer interaction. Specifically, I looked at a visual gaze-contingent display and an audio gaze-contingent display. I examined the effects of simulated visual defects on users' perception and control of self-motion during locomotion. I found that gaze-contingent display simulations of visual defects significantly altered visual patterns and impaired the accuracy and precision of judgements of heading. I also examined the impact of simulating gaze-contingent depth-of-field for monocular and stereoscopic displays. The experimental data showed that the alleviation of negative effects associated with stereo displays depends on the user's age and the types of scenes that are viewed. Finally, I simulated gaze-contingent audio displays that imitated the cocktail party effect. My audio enhancement techniques turned out to be very beneficial for applications that have to deal with the user's attention to multiple sources of sound (speakers), such as teleconferences and social games. Overall, in this dissertation, I demonstrated that gaze-contingent systems can be used in many aspects of virtual system design and, if combined together (used for multiple cues and senses), can be a very powerful tool for augmenting and improving the overall user experience.
  • Item (Open Access)
    Image White Balance for Multi-Illuminant Scenes
    (2024-11-07) Arora, Aditya; Derpanis, Konstantinos G.
    Performing white-balance (WB) correction for scenes with multiple illuminants remains a challenging task in computer vision. Most previous methods estimate per-pixel scene illumination directly in the RAW sensor image space. Recent work explored an alternative fusion strategy, where a neural network fuses multiple white-balanced versions of the input image processed to sRGB using pre-defined white-balance settings. Inspired by this line of work, we present two contributions targeting fusion-based multi-illuminant WB correction. First, we introduce a large-scale multi-illumination dataset rendered from RAW images to support training and evaluating fusion models. The dataset comprises over 16,000 sRGB images with corresponding ground-truth white-balance-corrected sRGB images. Next, we introduce an attention-based architecture to fuse five white-balance settings. This architecture yields an improvement of up to 25% over prior work.
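    A toy sketch of the fusion step, assuming per-pixel weights over K pre-rendered WB settings; the tiny convolutional weight predictor below is a stand-in assumption, not the attention architecture from the thesis.

```python
# Hypothetical sketch: per-pixel weighted fusion of K white-balance settings.
import torch
import torch.nn as nn

class WBFusion(nn.Module):
    def __init__(self, k: int = 5):
        super().__init__()
        # Stand-in weight predictor; the thesis uses an attention-based model.
        self.weight_net = nn.Sequential(
            nn.Conv2d(3 * k, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k, 3, padding=1),
        )

    def forward(self, renditions: torch.Tensor) -> torch.Tensor:
        # renditions: (B, K, 3, H, W), the same image under K WB settings.
        b, k, c, h, w = renditions.shape
        logits = self.weight_net(renditions.reshape(b, k * c, h, w))
        weights = torch.softmax(logits, dim=1)                  # (B, K, H, W)
        return (weights.unsqueeze(2) * renditions).sum(dim=1)   # (B, 3, H, W)

fused = WBFusion()(torch.rand(2, 5, 3, 64, 64))
```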
  • Item (Open Access)
    Symmetry-based monocular 3D vehicle ground-truthing for traffic analytics
    (2024-11-07) Tran, Trong Thao; Elder, James
    3D object detection is critical for autonomous driving and traffic analytics. Current research relies on LiDAR-derived ground truth for training and evaluation. However, LiDAR ground truth is expensive and usually inaccurate in the far field due to sparse LiDAR returns. Assuming a fully calibrated camera and a 3D terrain model, we explore whether inexpensive RGB imagery can be used to obtain 3D ground truth based on the bilateral symmetry of motor vehicles. From manually annotated symmetry points and tire-ground contact points, we infer a vertical symmetry plane and 3D point cloud to estimate vehicle location, pose, and dimensions. These estimates are input into a probabilistic model derived from a standard public motor vehicle dataset to form maximum a posteriori estimates of remaining dimensions. Evaluations on a public traffic dataset show that this novel symmetry-based approach is more accurate than LiDAR-based ground-truthing on single frames and comparable to LiDAR-based methods that propagate information across frames.
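    As an illustration of the geometric core, the sketch below fits a symmetry plane to manually annotated pairs of bilaterally symmetric points; the simple averaging scheme is an illustrative assumption, not necessarily the estimator used in the thesis.

```python
# Hypothetical sketch: fitting a symmetry plane (unit normal n, point p0)
# from corresponding left/right symmetric 3D points.
import numpy as np

def fit_symmetry_plane(left, right):
    """left, right: (N, 3) arrays of corresponding symmetric points."""
    diffs = left - right
    # Each left-right difference is (ideally) parallel to the plane normal;
    # normalize, align signs, and average for a simple robust estimate.
    dirs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    dirs *= np.sign(dirs @ dirs[0])[:, None]
    n = dirs.mean(axis=0)
    n /= np.linalg.norm(n)
    # The plane passes through the midpoints of all symmetric pairs.
    p0 = ((left + right) / 2.0).mean(axis=0)
    return n, p0
```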
  • Item (Open Access)
    Probing Human Visual Strategies Using Interpretability Methods for Artificial Neural Networks
    (2024-10-28) Kashef Alghetaa, Yousif Khalid Faeq; Kar, Kohitij
    Unraveling human visual strategies during object recognition remains a challenge in vision science. Existing psychophysical methods used to investigate these strategies are limited in how accurately they can interpret human decisions. Recently, artificial neural network (ANN) models, which show remarkable similarities to human vision, have provided a window into human visual strategies. However, inconsistencies among different techniques hinder the use of explainable AI (XAI) methods to interpret ANN decision-making. Here, we first develop and validate, in silico, a novel surrogate method that uses behavioral probes in ANNs with explanation-masked images to address these challenges. Then, by identifying the XAI method and ANN with the highest human alignment, we provide a working hypothesis and an effective approach to explain human visual strategies during object recognition -- a framework relevant to many other behaviors.
  • Item (Open Access)
    Influence of a visual landmark shift on memory-guided reaching in the monkey
    (2024-03-16) Lin, Jennifer Yi Xuan; Crawford, John Douglas
    Reach and gaze data were collected from one female Macaca mulatta monkey (ML) trained to perform a memory-guided reaching task, to determine the influence of allocentric cue shifts on reaching responses in the non-human primate. A landmark (4 ‘dots’ spaced 10° apart, forming the corners of a virtual square) was presented at 1 of 15 locations on a touch screen. The landmark either reappeared at the same location (stable landmark condition) or shifted by 8° in one of 8 directions (landmark shift condition). ‘No-landmark’ controls were the same, but without the landmark. The presence of a stable landmark increased the accuracy of both gaze and touch responses and the precision of gaze. In the landmark shift condition, reaches shifted partially (mean = 29%) with the landmark. Overall, these data suggest that the monkey is influenced by visual landmarks when reaching to remembered targets in a similar way to humans.
  • Item (Open Access)
    Key-Frame Based Motion Representations for Pose Sequences
    (2024-03-16) Thasarathan, Harrish Patrick; Derpanis, Konstantinos
    Modelling human motion is critical for computer vision tasks that aim to perceive human behaviour. Extending current learning-based approaches to successfully model long-term motions remains a challenge. Recent works rely on autoregressive methods, in which motions are modelled sequentially. These methods tend to accumulate errors and, when applied to typical motion modelling tasks, are limited to sequences of only about four seconds. We present a non-autoregressive framework that represents motion sequences as a set of learned key-frames without explicit supervision. We explore continuous and discrete generative frameworks for this task and design a key-framing transformer architecture to distill a motion sequence into key-frames and their relative placements in time. We compare our learned key-frame placement approach against a naive uniform placement strategy, and further compare key-frame distillation using our transformer architecture with an alternative common sequence modelling approach. We demonstrate the effectiveness of our method by reconstructing motions up to 12 seconds long.
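    To make the decoding step concrete, the sketch below reconstructs a dense pose sequence from key-frames and their placements in time; plain linear interpolation stands in for whatever decoder the model actually learns, and all shapes are illustrative.

```python
# Hypothetical sketch: decoding a dense pose sequence from key-frames.
import numpy as np

def decode_keyframes(key_poses, key_times, T):
    """key_poses: (K, D); key_times: (K,) in [0, 1]; returns (T, D)."""
    t = np.linspace(0.0, 1.0, T)
    order = np.argsort(key_times)
    kt, kp = key_times[order], key_poses[order]
    # Interpolate each pose dimension independently between key-frames.
    return np.stack(
        [np.interp(t, kt, kp[:, d]) for d in range(kp.shape[1])], axis=1)

# E.g., 6 key-frames of a 51-D pose decoded to a 300-frame sequence.
seq = decode_keyframes(np.random.randn(6, 51), np.sort(np.random.rand(6)), 300)
```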
  • Item (Open Access)
    Examining Autoexposure for Challenging Scenes
    (2024-03-16) Yang, Beixuan; Brown, Michael S.
    Autoexposure (AE) is a critical step that cameras apply to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with unchanging illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of platforms to evaluate AE algorithms and suitable image datasets. To address this issue, we have designed a software platform allowing AE algorithms to be used in a plug-and-play manner with the dataset. In addition, we have captured a new 4D exposure dataset that provides a complete solution space (i.e., all possible exposures) over a temporal sequence with moving objects, bright lights, and varying lighting. Our dataset and associated platform enable repeatable evaluation of different AE algorithms and provide a much-needed starting point to develop better AE methods.
  • Item (Open Access)
    Active Visual Search: Investigating human strategies and how they compare to computational models
    (2024-03-16) Wu, Tiffany; Tsotsos, John K.
    Real-world visual search by fully active observers has not been sufficiently investigated. Whilst the visual search paradigm has been widely used, most studies use a 2D, passive observation task, in which immobile subjects search through stimuli on a screen. Computational models have similarly been compared to human performance only to the degree of 2D image search. I conducted an active search experiment in a 3D environment, measuring eye and head movements of untethered subjects during search. Results reveal patterns that constitute search strategies, such as repeated search paths within and across subjects. Learning trends were found, but only in target-present trials. Foraging models encapsulate subjects' location-leaving actions, whilst robotics models capture viewpoint-selection behaviours. Eye movement models were less applicable to 3D search. The richness of the data collected from this experiment opens many avenues of exploration, and the possibility of modelling active visual search in a more human-informed manner.
  • Item (Open Access)
    Fine Granularity is Critical for Intelligent Neural Network Pruning
    (2023-12-08) Heyman, Andrew Baldwin; Zylberberg, Joel
    Neural network pruning is a popular approach to reducing the computational costs of training and/or deploying a network, and aims to do so while minimizing accuracy loss. Pruning methods that remove individual weights (fine granularity) yield better ratios of accuracy to parameter count, while methods that preserve some or all of a network’s structure (coarser granularity, e.g. pruning channels from a CNN) take better advantage of hardware and software optimized for dense matrix computations. We compare intelligent iterative pruning using several different criteria sampled from the literature against random pruning at initialization across multiple granularities on two different image classification architectures and tasks. We find that the advantage of intelligent pruning (with any criterion) over random pruning decreases dramatically as granularity becomes coarser. Our results suggest that, compared to coarse pruning, fine pruning combined with efficient implementation of the resulting networks is a more promising direction for improving accuracy-to-cost ratios.
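    The two granularities compared above can be made concrete with a magnitude-pruning sketch; the L1 channel criterion, fractions, and tensor shapes below are illustrative assumptions, not the exact criteria sampled in the thesis.

```python
# Hypothetical sketch: magnitude pruning at fine vs. coarse granularity.
import torch

def prune_fine(w: torch.Tensor, frac: float) -> torch.Tensor:
    # Fine granularity: zero out the smallest-magnitude individual weights.
    k = max(1, int(frac * w.numel()))
    thresh = w.abs().flatten().kthvalue(k).values
    return w * (w.abs() > thresh).float()

def prune_channels(w: torch.Tensor, frac: float) -> torch.Tensor:
    # Coarse granularity: w is (out_ch, in_ch, kH, kW); drop whole output
    # channels with the smallest L1 norm (hardware-friendly, less flexible).
    norms = w.abs().sum(dim=(1, 2, 3))
    k = max(1, int(frac * w.shape[0]))
    keep = (norms > norms.kthvalue(k).values).float()
    return w * keep[:, None, None, None]

w = torch.randn(64, 32, 3, 3)
print(prune_fine(w, 0.9).eq(0).float().mean())      # ~90% of weights zeroed
print(prune_channels(w, 0.5).eq(0).float().mean())  # ~half the channels zeroed
```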
  • Item (Open Access)
    Mug Shots: Systematic Biases in the Perception of Facial Orientation within Pictorial Spaces
    (2023-12-08) Esser, Maxwell Jacob Rosenstein; Troje, Nikolaus
    Pictures are 2-D projections of a 3-D world, so pictorial spaces behave differently than the 3-D visual spaces we inhabit. For instance, the angular orientation of a face pictured in half-profile view is systematically overestimated by the human observer – a 35° view is estimated to be approximately 45°. What causes this perceptual orientation bias? We tested three different hypotheses. (1) The phenomenon is specific to pictorial projections, due to the twofoldness of the medium, and does not occur in 3-D space. (2) It can be explained by the depth compression expected when the vantage point of the observer is closer to the picture than the point of projection. (3) The visual system uses a shape prior that does not match the elliptical horizontal cross-section of a typical head. Our results support the third hypothesis, and the effect can be mitigated by adding geometric information through structure-from-motion.
  • Item (Open Access)
    A 360-degree Omnidirectional Photometer Using a Ricoh Theta Z1
    (2023-12-08) MacPherson, Ian Michael; Brown, Michael S.
    Spot photometers measure the luminance emitted or reflected from a small surface area in a physical environment. Because the measurement is limited to a "spot," capturing dense luminance readings for an entire environment is impractical. This thesis demonstrates the potential of using an off-the-shelf commercial camera to operate as a 360-degree luminance meter. The method uses the Ricoh Theta Z1 camera, which provides a full 360-degree omnidirectional field of view and an API to access the camera's minimally processed RAW images. Working from the RAW images, this thesis describes a calibration method to map the RAW images under different exposures and ISO settings to luminance values. By combining the calibrated sensor with multi-exposure high-dynamic-range imaging, a cost-effective mechanism for capturing dense luminance maps of environments is provided. The results show that the Ricoh Theta calibrated as a luminance meter performs well when validated against a significantly more expensive spot photometer.
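    A much-simplified sketch of the calibration idea: assuming a linear sensor after black-level subtraction, a single gain K fitted against a reference spot photometer maps RAW values (normalized by exposure time and ISO) to luminance. The single-constant model, the black level, and all names are assumptions, not the thesis procedure.

```python
# Hypothetical sketch: mapping RAW pixel values to luminance (cd/m^2).
import numpy as np

def raw_to_luminance(raw, exposure_s, iso, K, black_level=64.0):
    # Assume the sensor is linear after black-level subtraction, so the
    # normalized signal is proportional to scene luminance.
    signal = np.maximum(np.asarray(raw, dtype=float) - black_level, 0.0)
    return K * signal / (exposure_s * iso)

def fit_gain(raw_patches, exposures, isos, meter_cd_m2, black_level=64.0):
    # Least-squares fit of the gain K from patches whose luminance was also
    # measured with a reference spot photometer.
    x = np.maximum(np.asarray(raw_patches, dtype=float) - black_level, 0.0)
    x /= exposures * isos
    return float(np.dot(x, meter_cd_m2) / np.dot(x, x))
```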
  • Item (Open Access)
    Volumetric Attribute Compression for 3D Point Clouds using Feedforward Network with Geometric Attention
    (2023-08-04) Do, Viet Ho Tam Thuc; Cheung, Gene
    We study 3D point cloud attribute compression using a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$, we quantize and encode a parameter vector $\theta$ that characterizes $f$ at the encoder, for reconstruction $f_{\hat{\theta}}(\mathbf{x})$ at known 3D points $\mathbf{x}$ at the decoder, where $\hat{\theta}$ is a quantized version of $\theta$. Extending a previous work, the Region Adaptive Hierarchical Transform (RAHT), which employs piecewise constant functions to span a nested sequence of function spaces, we propose a feedforward linear network that implements higher-order B-spline bases spanning function spaces without eigen-decomposition. The feedforward network architecture means that the system is amenable to end-to-end neural learning. The key to our network is a space-varying convolution, similar to a graph operator, whose weights are computed from the known 3D geometry for normalization. We show that the number of layers in the normalization at the encoder is equivalent to the number of terms in a matrix inverse Taylor series. Experimental results on real-world 3D point clouds show up to 2-3 dB gain over RAHT in energy compaction and 20-30% bitrate reduction.
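    As a toy illustration of the volumetric formulation (not the proposed network), $f_\theta$ can be evaluated at arbitrary 3D points as a weighted sum of shifted separable B-spline bases; a first-order (hat) kernel on a unit grid stands in for the higher-order bases used above.

```python
# Toy sketch: evaluating a volumetric attribute function as a sum of
# coefficients times shifted tensor-product B-spline bases.
import numpy as np

def hat(t):
    # First-order (linear) B-spline kernel on a unit grid.
    return np.maximum(1.0 - np.abs(t), 0.0)

def f_theta(x, theta, centers):
    """x: (M, 3) query points; theta: (N,) coefficients; centers: (N, 3)."""
    # (M, N) separable basis evaluations, weighted by the coefficients.
    B = (hat(x[:, None, 0] - centers[None, :, 0])
         * hat(x[:, None, 1] - centers[None, :, 1])
         * hat(x[:, None, 2] - centers[None, :, 2]))
    return B @ theta

# Example: a 4x4x4 grid of basis centers and random coefficients.
g = np.stack(np.meshgrid(*[np.arange(4.0)] * 3, indexing="ij"), -1).reshape(-1, 3)
vals = f_theta(np.random.rand(10, 3) * 3, np.random.randn(64), g)
```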
  • Item (Open Access)
    Sparse Shape Encoding for Improved Instance Segmentation
    (2023-08-04) Liu, Keyi; Elder, James
    Neurophysiological studies suggest that neurons in the intermediate visual area V4 of the primate cortex encode a sparse representation of object shape. While there are metabolic arguments for such sparse representations, there are also potential advantages for inference. Here we explore whether sparse shape encoding can yield benefits for instance segmentation. Specifically, we encode 2D object shape using a Distance Transform Map (DTM) and learn a sparse basis for this representation. To make use of this encoding, we design an instance segmentation head to estimate the sparse coefficients of each object, and then recover the shape from the zero-crossing level set of the corresponding DTM. Our novel SparseShape encoding approach produces fewer topological errors than the state-of-the-art, yields competitive mask AP on the MS COCO benchmark, and exhibits superior generalization performance on the Cityscapes traffic instance segmentation task.
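    A minimal sketch of the shape representation described above: encode a binary mask as a signed DTM, project it onto a low-dimensional linear basis (an orthonormal basis such as PCA stands in here for the learned sparse basis), and recover the mask from the zero-crossing level set of the reconstruction.

```python
# Hypothetical sketch: signed DTM encoding and zero-crossing decoding.
import numpy as np
from scipy.ndimage import distance_transform_edt as edt

def signed_dtm(mask):
    # mask: binary {0,1} array; positive distances inside, negative outside.
    m = mask.astype(np.uint8)
    return edt(m) - edt(1 - m)

def encode_decode(mask, basis, mean):
    """basis: (K, H*W) orthonormal rows; mean: (H*W,) mean DTM."""
    d = signed_dtm(mask).ravel() - mean
    coeffs = basis @ d                            # K shape coefficients
    recon = (basis.T @ coeffs + mean).reshape(mask.shape)
    return recon > 0                              # zero-crossing level set
```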
  • Item (Open Access)
    Augmented Reality Water-Level Task
    (2023-08-04) Abadi, Romina; Allison, Robert
    The "Water-Level Task" asks participants to draw the water level in a tilted container. Studies have shown that many adults have difficulty with the task. Our study aimed to determine whether this misconception about water orientation persists in a more natural environment. We implemented a water-in-container effect to create an augmented reality (AR) version of the Water-Level Task (AR-WLT). In the AR-WLT, participants interacted with two containers half full of water in a HoloLens 2 AR display and were asked to determine which looked more natural. In at least one of the two simulations, the water surface did not remain horizontal. A traditional online WLT was created to recruit low- and high-scoring participants. Our results showed that low-scoring individuals were more likely to make errors in the AR version. However, participants did not choose simulations close to their 2D drawings, suggesting that different cognitive and perceptual factors were involved in the different environments.