Centre for Vision Research
Browsing Centre for Vision Research by Title
Now showing 1 - 20 of 230
Item Open Access
3D Modelling for Improved Visual Traffic Analytics (2018-08-27) Soto, Eduardo R Corral; Elder, James H.

Advanced Traffic Management Systems utilize diverse types of sensor networks with the goal of improving the mobility and safety of transportation systems. These systems require information about the state of the traffic configuration, including volume, vehicle speed, density, and incidents, which is useful in applications such as urban planning, collision avoidance systems, and emergency vehicle notification systems, to name a few. Sensing technologies are an important part of Advanced Traffic Management Systems because they enable estimation of the traffic state. Inductive Loop Detectors are often used to sense vehicles on highway roads. Although this technology has proven to be effective, it has limitations: installation and replacement costs are high and cause traffic disruptions, and the sensing modality provides very limited information about the vehicles being sensed; no vehicle appearance information is available. Traffic camera networks are also used in advanced traffic monitoring centers, where the cameras are controlled by a remote operator. The amount of visual information provided by such cameras can be overwhelmingly large, which may cause operators to miss important traffic events happening in the field. This dissertation focuses on visual traffic surveillance for Advanced Traffic Management Systems, specifically the research and development of computer vision algorithms that contribute to the automation of highway traffic analytics systems that require estimates of traffic volume and density. This dissertation makes three contributions. The first contribution is an integrated vision surveillance system called 3DTown, in which cameras installed at a university campus, together with the proposed algorithms, produce vehicle and pedestrian detections that augment a 3D model of the university with dynamic information from the scene.
A second major contribution is a technique for extracting road lines from highway images, which are used to estimate the tilt angle and the focal length of the camera. This technique is useful when the operator changes the camera pose. The third major contribution is a method to automatically extract the active road lanes and model the vehicles in 3D to improve vehicle count estimation by individuating 2D segments of imaged vehicles that have been merged due to occlusions.

Item Open Access
A 360-degree Omnidirectional Photometer Using a Ricoh Theta Z1 (2023-12-08) MacPherson, Ian Michael; Brown, Michael S.

Spot photometers measure the luminance emitted or reflected from a small surface area in a physical environment. Because the measurement is limited to a "spot," capturing dense luminance readings for an entire environment is impractical. This thesis demonstrates the potential of using an off-the-shelf commercial camera to operate as a 360-degree luminance meter. The method uses the Ricoh Theta Z1 camera, which provides a full 360-degree omnidirectional field of view and an API to access the camera's minimally processed RAW images. Working from the RAW images, this thesis describes a calibration method to map the RAW images under different exposures and ISO settings to luminance values. By combining the calibrated sensor with multi-exposure high-dynamic-range imaging, a cost-effective mechanism for capturing dense luminance maps of environments is provided. The results show that the Ricoh Theta calibrated as a luminance meter performs well when validated against a significantly more expensive spot photometer.

Item Open Access
A Cloud-Based Extensible Avatar For Human Robot Interaction (2019-07-02) AlTarawneh, Enas Khaled Ahm; Jenkin, Michael

Adding an interactive avatar to a human-robot interface requires the development of tools that animate the avatar so as to simulate an intelligent conversation partner.
Here we describe a toolkit that supports interactive avatar modeling for human-computer interaction. The toolkit utilizes cloud-based speech-to-text software that provides active listening, a cloud-based AI to generate appropriate textual responses to user queries, and a cloud-based text-to-speech generation engine to generate utterances for this text. This output is combined with a cloud-based 3D avatar animation synchronized to the spoken response. Generated text responses are embedded within an XML structure that allows for tuning the nature of the avatar animation to simulate different emotional states. An expression package controls the avatar's facial expressions. The introduced rendering latency is obscured through parallel processing and an idle loop process that animates the avatar between utterances. The efficiency of the approach is validated through a formal user study.

Item Open Access
A Role for Hippocampal Sharp-wave Ripples in Active Visual Search (2017-07-27) Leonard, Timothy S.; Hoffman, Kari

Sharp-wave ripples (SWRs) in the hippocampus are thought to contribute to memory formation, though this effect has only been demonstrated in rodents. The SWR, a large deflection in the hippocampal LFP (local field potential), is known to occur primarily during slow-wave sleep and during immobility and consummatory behaviors. SWRs have widespread effects throughout the cortex and are directly implicated in memory formation: their occurrence correlates with correct performance, and their ablation impairs memory in spatial memory tasks. Though SWRs have been reported in primates, their role is poorly understood. Whether or not SWRs play a role in memory formation, as they do in rodents, has yet to be confirmed. This work encompasses three separate studies with the goal of determining whether there is a link between SWR occurrence and memory formation in the macaque.
Chapter 2 establishes the validity of the modified Change Blindness task as a memory task that is sensitive to normal hippocampal function in monkeys. Chapter 3 establishes that SWR events occur during waking (and stationary) activity, during visual search, in the macaque. Until this work, the prevalence of SWRs in macaques during waking exploration was unknown. Chapter 4 shows that gaze during SWRs was more likely to be near the target object on repeated than on novel presentations, even after accounting for overall differences in gaze location with scene repetition. The increase in ripple likelihood near remembered visual objects suggests a link between ripples and memory in primates; specifically, SWRs may reflect part of a mechanism supporting the guidance of search based on experience. Together, these studies reveal several novel findings and establish an important step towards understanding the role that SWRs play in memory formation in predominantly visual primate brains.

Item Open Access
A Solution for Scale Ambiguity in Generative Novel View Synthesis (2025-04-10) Forghani, Fereshteh; Brubaker, Marcus

Generative Novel View Synthesis (GNVS) involves generating plausible unseen views of a scene, given an initial view and the relative camera motion between the input and target views, using generative models. A key limitation of current generative methods lies in their susceptibility to scale ambiguity, an inherent challenge in multi-view datasets caused by the use of monocular techniques to estimate camera positions from uncalibrated video frames. In this work, we present a novel approach to tackle this scale ambiguity in multi-view GNVS by optimizing the scales as parameters in an end-to-end fashion. We also introduce Sample Flow Consistency (SFC), a novel metric designed to assess scale consistency across samples with the same camera motion.
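The core idea of treating an unknown reconstruction scale as an optimizable parameter can be illustrated with a deliberately simplified toy sketch (hypothetical, not the thesis pipeline: the thesis optimizes scales end-to-end inside a generative model, whereas this toy recovers one relative scale by plain gradient descent on a hand-written consistency loss):

```python
import numpy as np

# Toy illustration: two monocular reconstructions of the same camera path
# report translations that differ only by an unknown global scale per
# sequence. We fix one sequence's scale (gauge freedom) and recover the
# other by gradient descent on a consistency loss, parameterizing the
# scale in log space so it stays positive.

rng = np.random.default_rng(0)
true_path = rng.normal(size=(10, 3))   # ground-truth camera translations
t_a = 2.0 * true_path                  # sequence A, unknown scale 2.0
t_b = 0.5 * true_path                  # sequence B, unknown scale 0.5

log_s = 0.0                            # log-scale applied to sequence B
lr = 0.2
for _ in range(1000):
    s = np.exp(log_s)
    resid = t_a - s * t_b              # consistency residual
    # gradient of 0.5 * mean(resid**2) with respect to log_s
    grad = -s * np.sum(resid * t_b) / t_b.size
    log_s -= lr * grad

recovered = np.exp(log_s)              # should approach 2.0 / 0.5 = 4.0
```

The gauge fix matters: without pinning one scale, the loss is invariant to scaling both sequences together and the problem is underdetermined.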
Through various experiments, we demonstrate that our approach yields improvements in terms of SFC, providing more consistent and reliable novel view synthesis.

Item Open Access
A Spherical Visually-Guided Robot (2020-11-13) Dey, Bir Bikram; Jenkin, Michael R.

Spherical robots provide a number of advantages over their wheeled counterparts, but they also present a number of challenges and complexities. Chief among these are issues related to locomotive strategies and to sensor placement and processing given the rolling nature of the device. Here we describe Dragon Ball, a visually tele-operated spherical robot. The Dragon Ball utilizes a geared wheel to move the center of mass of the vehicle, coupled with a torque wheel to change direction. Wide-angle cameras mounted on the robot's horizontal axis provide a 360-degree view of the space around the robot and are used to simulate a traditional pan-tilt-zoom camera mounted on the vehicle for visual tele-operation. The resulting vehicle is well suited for deployment in contaminated environments for which vehicle remediation is a key operational requirement.

Item Open Access
A Study of Colour Rendering in the In-Camera Imaging Pipeline (2020-05-11) Karaimer, Hakki Can; Brown, Michael S.

Consumer cameras, such as digital single-lens reflex (DSLR) and smartphone cameras, have onboard hardware that applies a series of processing steps to transform the initial captured raw sensor image into the final output image provided to the user. These processing steps collectively make up the in-camera image processing pipeline. This dissertation studies the processing steps related to colour rendering, which can be categorized into two stages. The first stage converts an image's sensor-specific raw colour space to a device-independent perceptual colour space.
The second stage further processes the image into a display-referred colour space and includes photo-finishing routines to make the image appear visually pleasing to a human. This dissertation makes four contributions towards the study of camera colour rendering. The first contribution is the development of a software-based research platform that closely emulates the in-camera image processing pipeline hardware. This platform allows the examination of the various image states of the captured image as it is processed from the sensor response to the final display output. Our second contribution is to demonstrate the advantage of having access to intermediate image states within the in-camera pipeline that provide more accurate colourimetric consistency among multiple cameras. Our third contribution is to analyze the current colourimetric method used by consumer cameras and to propose a modification that improves its colour accuracy. Our fourth contribution is to describe how to customize a camera imaging pipeline using machine vision cameras to produce high-quality perceptual images for dermatological applications. The dissertation concludes with a summary and future directions.

Item Open Access
A Unified Multiscale Encoder-Decoder Transformer for Video Segmentation (2024-07-18) Karim, Rezaul; Wildes, Richard P.

This dissertation presents an end-to-end trainable, unified multiscale encoder-decoder transformer for dense video estimation, with a focus on segmentation. We investigate this direction by exploring unified multiscale processing throughout the pipeline of feature encoding, context encoding and object decoding in an encoder-decoder model. Correspondingly, we present a Multiscale Encoder-Decoder Video Transformer (MED-VT) that uses multiscale representation throughout and employs an optional input beyond video (e.g., audio), when available, for multimodal processing (MED-VT++).
Multiscale representation at both encoder and decoder yields three key benefits: (i) implicit extraction of spatiotemporal features at different levels of abstraction for capturing dynamics without reliance on additional preprocessing, such as computing object proposals or optical flow; (ii) temporal consistency at encoding; and (iii) coarse-to-fine detection for high-level (e.g., object) semantics to guide precise localization at decoding. Moreover, we explore temporal consistency through a transductive learning scheme that exploits many-to-many label propagation across time. To demonstrate the applicability of the approach, we provide empirical evaluation of MED-VT/MED-VT++ on three unimodal video segmentation tasks (Automatic Video Object Segmentation (AVOS), actor-action segmentation and Video Semantic Segmentation (VSS)) and a multimodal task (Audio-Visual Segmentation (AVS)). Results show that the proposed architecture outperforms alternative state-of-the-art approaches on multiple benchmarks using only video (and optional audio) as input, without reliance on additional preprocessing such as object proposals or optical flow. We also document the model's internal learned representations with a detailed interpretability study encompassing both quantitative and qualitative analyses.

Item Open Access
Abnormal Brain Connectivity in the Primary Visual Pathway in Human Albinism (2016-09-20) Grigorian, Anahit; Schneider, Keith A.

In albinism, the ipsilateral projection of retinal axons is significantly reduced, and most fibres project contralaterally. The retina and optic chiasm have been proposed as sites for misrouting. The number of lateral geniculate nucleus (LGN) relay neurons has been linked to LGN volume, suggesting a correlation between LGN size and the number of tracts traveling through the optic radiation (OR) to the primary visual cortex (V1).
Using diffusion data and both deterministic and probabilistic tractography, we studied differences in the OR between participants with albinism and controls. Statistical analyses measured white matter integrity in areas corresponding to the OR, as well as LGN-to-V1 connectivity. Results revealed reduced white matter integrity and connectivity in the OR region in albinism compared to controls, suggesting altered structural development. Previous reports of smaller LGN volumes and the altered thalamo-cortical connectivity reported here demonstrate the effect of misrouting on the structural organization of the visual pathway in albinism.

Item Open Access
ACT-R Based Models For Learning Interactive Layouts (2015-01-26) Das, Arindam; Stuerzlinger, Wolfgang

This dissertation presents research on the learning of interactive layouts. I develop two models based on a theory of cognition known as ACT-R (Adaptive Control of Thought-Rational) and validate them against experimental data collected by other researchers. The first model is a simulation model that emulates the transition from novice to expert level in text entry. The model transcribes the presented English letters on a traditional phone keypad. It predicts the non-movement time to copy a pre-cued letter, and it explains the visual exploration strategy that a user may employ along the novice-to-expert continuum. The second model is a closed-form model that accounts for the combined effect of practice, decay, proactive interference and mental effort on task completion time while practicing target acquisition on an interactive layout. The model can quantitatively compare a set of layouts in terms of the mental effort expended to learn them. My first model provides insight into how much practice a learner needs to progress from novice to expert level on an interactive layout.
My second model provides insight into how effortful it is to learn a layout relative to other layouts.

Item Open Access
Action Intention Modulates the Activity Pattern in Early Visual Areas (2018-08-27) Velji-Ibrahim, Jena; Crawford, John Douglas

The activity pattern in the early visual cortex (EVC) can be used to predict upcoming actions, as it is functionally connected to higher-order motor areas. However, the mechanism by which the EVC enhances action-relevant features is unclear. We explored this using fMRI. Participants performed Align or Open Hand movements toward two oriented objects. We localized the calcarine sulcus, corresponding to the periphery, and the occipital pole, corresponding to the fovea. During planning, univariate analysis did not reveal significant results, so we used multi-voxel pattern analysis (MVPA) to decode action type and object orientation. Though the objects were located in the periphery, we found significant decoding accuracy for orientation, in an action-dependent manner, in the occipital pole and action network areas. We established functional connectivity between the EVC and somatomotor areas during planning using psychophysiological interaction (PPI) analysis. Taken together, our results show that object orientation processing is modulated by action preparation.

Item Open Access
Active Observers in a 3D World: Human Visual Behaviours for Active Vision (2022-12-14) Solbach, Markus Dieter; Tsotsos, John K.

Human-like performance in computational vision systems is yet to be achieved. In fact, human-like visuospatial behaviours are not well understood, a crucial capability for any robotic system whose role is to be a real assistant. This dissertation examines the human visual behaviours involved in solving a well-known visual task, the same-different task, which is used as a probe to explore the space of active human observation during visual problem-solving. It asks a simple question: "are two objects the same?"
To study this question, we created a set of novel objects of known complexity to push the boundaries of the human visual system. We wanted to examine these behaviours in contrast to the static, 2D, display-driven experiments done to date. We thus needed to develop a complete infrastructure for an experimental investigation using 3D objects and active, freely moving human observers. We built a novel psychophysical experimental setup that allows precise and synchronized gaze and head-pose tracking to analyze subjects performing the task. To the best of our knowledge, no other system provides the same characteristics. We have collected detailed, first-of-its-kind data of humans performing a visuospatial task in hundreds of experiments. We present an in-depth analysis across different metrics of humans solving this task: subjects demonstrated up to 100% accuracy for specific settings, and no trial used fewer than six fixations. We provide a complexity analysis revealing that human performance in solving this task is about O(n), where n is the size of the object. Furthermore, we discovered that our subjects used many different visuospatial strategies and showed that these are deployed dynamically. Strikingly, no learning effect was observed that affected accuracy. With this extensive and unique data set, we addressed its computational counterpart. We used reinforcement learning to learn the three-dimensional same-different task and discovered crucial limitations that could only be overcome if the task was simplified to the point of trivialization.
Lastly, we formalized a set of suggestions to inform the enhancement of existing machine learning methods, based on our findings from the human experiments and multiple tests we performed with modern machine learning methods.

Item Open Access
Active Visual Search: Investigating human strategies and how they compare to computational models (2024-03-16) Wu, Tiffany; Tsotsos, John K.

Real-world visual search by fully active observers has not been sufficiently investigated. Whilst the visual search paradigm has been widely used, most studies use a 2D, passive observation task in which immobile subjects search through stimuli on a screen. Computational models have similarly been compared to human performance only to the degree of 2D image search. I conduct an active search experiment in a 3D environment, measuring the eye and head movements of untethered subjects during search. Results show patterns that form search strategies, such as repeated search paths within and across subjects. Learning trends were found, but only in target-present trials. Foraging models encapsulate subjects' location-leaving actions, whilst robotics models capture viewpoint selection behaviours. Eye movement models were less applicable to 3D search. The richness of the data collected from this experiment opens many avenues of exploration, and the possibility of modelling active visual search in a more human-informed manner.

Item Open Access
Age Estimation from Children's Faces (2015-08-28) Harrington, Alexandra Elizabeth; Wilkinson, Frances E.

In this thesis, we addressed the question of whether people can estimate age from the faces of children 7 to 11 years of age. We found that undergraduates were able to make accurate relative age judgments for males and females, even for faces as little as two years apart, and that their performance improved as the age difference between the faces being compared increased.
They were also able to make accurate absolute age judgments that increased with increasing face age for both genders. We also examined estimation bias: while estimates were generally low in bias, the bias that did occur was in the direction of the mean age of the stimuli. Additionally, we found a general advantage for male faces presented in frontal view. Finally, we looked at one possible factor influencing age estimates: facial expression. It was unlikely that facial expression was a primary cue informing age estimates.

Item Open Access
AI-Assisted Pipeline for 3D Face Avatar Generation (2024-11-07) Fadaeinejad, Amin; Troje, Nikolaus

Filling virtual environments with realistic-looking avatars is essential for games, film production, and virtual reality. Creating a fun and engaging experience requires a wide variety of different-looking avatars. There are two main methods to create realistic-looking avatars: scanning a real person's face using a light room, or having an artist/designer create the avatar manually using advanced tools. Both approaches are expensive in terms of time, computing, and human labour. This thesis leverages generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) to automate avatar creation. Our pipeline offers control over three aspects: face shape, skin color, and fine details like beards or wrinkles. This provides artists flexibility in avatar creation and can integrate with tools like MOSAR for controlling avatars from 2D images.

Item Open Access
Altered White Matter Structure in Adults Following Early Monocular Enucleation (2018-03-01) Wong, Nikita Ann; Steeves, Jennifer

Visual deprivation from early monocular enucleation (the surgical removal of one eye) results in a number of long-term behavioural and morphological adaptations in the visual, auditory, and multisensory systems.
This thesis investigates how the loss of one eye early in life affects structural connectivity within the brain. A combination of diffusion tensor imaging and tractography was used to examine structural differences in 18 tracts throughout the brains of adult participants who had undergone early monocular enucleation, compared to binocularly intact controls. We report significant structural changes to white matter in early monocular enucleation participants that extend beyond the primary visual pathway to include interhemispheric, auditory and multisensory tracts, as well as several long association fibres. Overall, these results suggest that early monocular enucleation has long-term effects on white matter structure throughout the brain.

Item Open Access
An Evaluation of Saliency and Its Limits (2019-11-22) Wloka, Calden Frank; Tsotsos, John K.

The field of computational saliency modelling has its origins in psychophysical studies of visual search and low-level attention, but over the years it has shifted focus heavily toward performance-based model development and benchmarking. This dissertation examines the current state of saliency research from the perspective of its relationship to human visual attention, and presents research along three different but complementary avenues: a critical examination of the metrics used to measure saliency model performance, a software library intended to facilitate the exploration of saliency model applications outside of standard benchmarks, and a novel model of fixation control that extends fixation prediction beyond a static saliency map to an explicit prediction of an ordered sequence of saccades. The examination of metrics provides a more direct window into algorithm spatial bias than competing methods, and presents evidence that spatial considerations cannot be completely isolated from stimulus appearance when accounting for human fixation locations.
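For context on the kind of benchmark metric under examination, one widely used fixation-prediction score, Normalized Scanpath Saliency (NSS), can be sketched as follows (a generic illustration of the standard metric, not code from the dissertation's software library):

```python
import numpy as np

# NSS z-scores the saliency map, then averages its values at human fixation
# locations: a high score means fixated locations received above-average
# saliency; a score near zero means chance-level prediction.

def nss(saliency_map, fixations):
    """saliency_map: 2D array; fixations: list of (row, col) points."""
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([z[r, c] for r, c in fixations]))

# Toy example: a map with one bright Gaussian blob.
h, w = 64, 64
ys, xs = np.mgrid[0:h, 0:w]
smap = np.exp(-((ys - 20) ** 2 + (xs - 40) ** 2) / (2 * 5.0 ** 2))

score_on_blob = nss(smap, [(20, 40), (21, 39)])   # fixations on the blob
score_off_blob = nss(smap, [(60, 5), (5, 5)])     # fixations elsewhere
```

Because NSS only samples the map at fixated pixels, it is sensitive to exactly the kind of spatial bias discussed above: a model can inflate its score by concentrating mass where fixations are a priori likely.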
Experimentation over psychophysical stimuli reveals that many of the most recent models, all of which achieve high benchmark performance for fixation prediction, fail to identify salient targets in basic feature search, more complex singleton search, and search asymmetries, suggesting an overemphasis on the specific performance benchmarks that are widely used in saliency modelling research and a need for more diverse evaluation. Further experiments test how different saliency algorithms predict fixations across space and time, finding a consistent spatiotemporal pattern of saliency prediction across almost all tested algorithms. The fixation control model outperforms competing methods at saccade sequence prediction according to a number of trajectory-based metrics, and produces qualitatively more human-like fixation traces than those sampled from static maps. Together, these studies suggest that saliency should not be viewed in isolation but rather as a component of a larger visual attention system, and this work provides a number of tools and techniques that will facilitate further understanding of visual attention.

Item Open Access
An Exploration of Sex- and Hormone-Related Differences in Cognitive-Motor Performance, Brain Network Integrity, and Recovery Metrics Following Concussion (2021-11-15) Pierias, Alanna; Sergio, Lauren E.

Concussion, a form of mild traumatic brain injury, presents differently in all who sustain it. This makes diagnosis, recovery tracking, and return-to-play decisions extremely difficult. Along with general individual differences, sex and sex hormones may affect incidence, symptoms, and time to recovery. The purpose of these dissertation projects was to better understand the general effects of concussion and to add to the literature informing clinicians of what to look for following injury.
Within this, another focus was to explore the potential impact of sex-related factors on current metrics for examining concussive injury. Data were collected from varsity-level university athletes with and without a previous history of concussion. Data included visuomotor performance on standard and non-standard reaching tasks; sport concussion assessment tool measures, including symptom scales and balance; resting-state functional magnetic resonance imaging; and hormone levels (estradiol, testosterone, and progesterone) via salivary enzyme immunoassays. Results from the three studies add to the literature surrounding both individualized and sex-related differences in outcomes following concussion, as well as the impact of hormones on general visuomotor performance and resting-state functional connectivity (rs-FC) in females. Sex-specific differences were noted in visuomotor performance and symptom presentation. Progesterone and testosterone both exhibited positive relationships with visuomotor performance, and progesterone also exhibited a positive relationship with rs-FC in the dorsal attention network and a negative relationship with rs-FC in the salience ventral attention network. Overall, these results provide more detailed insight into the heterogeneous nature of concussion and support the importance of considering individuality in all return-to-play decisions.

Item Open Access
Analyzing Color Imaging Failure on Consumer Cameras (2022-12-14) Tedla, SaiKiran Kumar; Brown, Michael S.

There are currently many efforts to use consumer-grade cameras for home-based health and wellness monitoring. Such applications rely on users to capture images with their personal cameras for analysis in a home environment. When color is a primary feature for diagnostic algorithms, the camera requires color calibration to ensure accurate color measurements.
Given the importance of such diagnostic tests for users' health and well-being, it is important to understand the conditions under which color calibration may fail. To this end, we analyzed a wide range of camera sensors and environmental lighting conditions to determine (1) how often color calibration failure is likely to occur, and (2) the underlying reasons for failure. Our analysis shows that in well-lit environments, it is rare to encounter a camera sensor and lighting condition combination that results in color imaging failure. Moreover, when color imaging does fail, the cause is almost always attributable to spectrally poor environmental lighting rather than the camera sensor. We believe this finding is useful for scientists and engineers developing color-based applications with consumer-grade cameras.

Item Open Access
Analytically Defined Spatiotemporal ConvNets for Spacetime Image Understanding (2020-08-11) Hadji, Isma; Wildes, Richard

This dissertation introduces a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. This representation is designed to combine the benefits of the multilayer architecture of Convolutional Networks (ConvNets) with a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that, unlike most contemporary convolutional networks, no learning is involved; rather, all design decisions are specified analytically with theoretical motivations. This approach makes it possible to understand what information is being extracted at each stage and layer of processing, as well as to minimize heuristic choices in design. Another key aspect of the network is its recurrent nature, whereby the output of each layer of processing feeds back to the input. The resulting multilayer architecture systematically reveals hierarchical image structure in terms of the multiscale, multiorientation properties of visual spacetime.
To illustrate the utility of the proposed research, the designed networks have been tested on two spacetime image understanding tasks: dynamic texture recognition and video object segmentation. Further, the role of learning in the context of the proposed analytic approach to network design is systematically explored, yielding a promising hybrid architecture. Finally, a new, large-scale dynamic texture dataset is introduced and used for evaluation.
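The analytic, learning-free flavour of such a design can be illustrated with a small toy (hypothetical, not the dissertation's network): spacetime orientation is measured with fixed, analytically specified Gaussian-derivative filters, and a pattern translating rightward produces far more energy in the rightward-tuned channel than in the leftward one.

```python
import numpy as np

# Fixed Gaussian-derivative filters, specified analytically (no learning).
# A pattern moving rightward at 1 px/frame is oriented along the diagonal
# of the x-t plane; opponent combinations of spatial and temporal
# derivative responses yield direction-tuned "motion energy" channels.

def gaussian_deriv(n=9, sigma=1.5):
    """1D first-derivative-of-Gaussian filter (sums to zero)."""
    x = np.arange(n) - n // 2
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return -x * g / (sigma ** 2 * g.sum())

# An x-t slice containing a step edge moving rightward at 1 pixel/frame.
T, X = 32, 64
xt = np.zeros((T, X))
for t in range(T):
    xt[t, 16 + t:] = 1.0

d = gaussian_deriv()
conv = lambda v: np.convolve(v, d, mode="same")
dx = np.apply_along_axis(conv, 1, xt)   # spatial derivative response
dt = np.apply_along_axis(conv, 0, xt)   # temporal derivative response

# Compare direction-tuned energies on the interior (full filter support).
core = np.s_[4:-4, 4:-4]
energy_right = np.sum((dx[core] - dt[core]) ** 2)  # rightward-tuned channel
energy_left = np.sum((dx[core] + dt[core]) ** 2)   # leftward-tuned channel
# For rightward motion, energy_right dominates energy_left.
```

Because every filter is given in closed form, it is clear what each response measures, which is the interpretability advantage the abstract attributes to the analytic approach.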