Evaluating Temporal Queries over Videos

Date

2023-12-08

Authors

Chen, Yueting

Abstract

Videos are an important part of people's daily lives, and they continue to grow in volume, size, and variety of content. Recent advances in Computer Vision (CV) algorithms have improved both accuracy and efficiency, making highly accurate video annotation practical. In this work, we follow a general framework that first obtains annotations using state-of-the-art CV algorithms, and we then study three research problems on evaluating temporal queries over such annotations.

Specifically, we first investigate temporal queries that consider only co-occurrence relationships between objects in video feeds. We take a first step toward defining such queries so that they incorporate physical aspects of video capture, such as object occlusion. We propose two techniques, Marked Frame Set (MFS) and Sparse State Graph (SSG), that organize all detected objects in the intermediate data generation layer and, for the given queries, minimize the number of objects and frames that must be considered during query evaluation.
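To make the notion of a co-occurrence query concrete, the following is a minimal baseline sketch, not the MFS or SSG technique. It assumes each frame has already been annotated with a set of object labels, and it treats occlusion crudely by bridging up to `max_gap` consecutive frames in which a queried object is missing. The function name, its parameters, and the gap-bridging rule are illustrative assumptions and do not come from the thesis.

```python
from typing import Iterable, List, Set, Tuple


def cooccurrence_intervals(
    frames: List[Set[str]],
    query: Iterable[str],
    min_len: int = 1,
    max_gap: int = 0,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame intervals where all query labels co-occur.

    Hypothetical baseline: up to `max_gap` consecutive non-matching frames
    are bridged, a rough stand-in for brief occlusion.
    """
    wanted = set(query)
    intervals: List[Tuple[int, int]] = []
    start = None  # first matching frame of the open interval
    last = None   # most recent matching frame
    for i, labels in enumerate(frames):
        if not wanted <= labels:
            continue  # at least one queried label is missing in this frame
        if start is None:
            start = i
        elif i - last - 1 > max_gap:
            # The gap since the last match is too long: close the open interval.
            if last - start + 1 >= min_len:
                intervals.append((start, last))
            start = i
        last = i
    # Close the final open interval, if any.
    if start is not None and last - start + 1 >= min_len:
        intervals.append((start, last))
    return intervals
```

This baseline scans every frame for every query. The abstract's stated goal for MFS and SSG is to avoid exactly that cost by reducing the objects and frames examined.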

Then, we consider queries with a ranking mechanism that retrieve clips from large video repositories in which objects co-occur in a query-specified fashion. We propose a two-phase approach: we build indexes during the Ingestion Phase, and then answer queries during the Query Phase using the Partition-Based Query Processing (PBQP) algorithm, which efficiently produces the query-specified number of results with the highest scores.
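As an illustration of ranked clip retrieval, the sketch below scores every fixed-length window by the fraction of its frames in which all queried labels co-occur and returns the k best windows. It is an exhaustive baseline under assumed definitions, not the PBQP algorithm: the scoring function, the fixed clip length, and the name `top_k_clips` are hypothetical choices made only for this example.

```python
import heapq
from typing import Iterable, List, Set, Tuple


def top_k_clips(
    frames: List[Set[str]],
    query: Iterable[str],
    clip_len: int,
    k: int,
) -> List[Tuple[int, float]]:
    """Return up to k (start_frame, score) pairs, highest score first.

    Assumed score: fraction of frames in the window where all query labels
    co-occur. Ties are broken in favor of the earlier window.
    """
    wanted = set(query)
    hits = [1 if wanted <= labels else 0 for labels in frames]
    if clip_len <= 0 or clip_len > len(hits) or k <= 0:
        return []

    # Sliding-window sum so each window is scored in constant time.
    window = sum(hits[:clip_len])
    scored = [(window / clip_len, 0)]
    for start in range(1, len(hits) - clip_len + 1):
        window += hits[start + clip_len - 1] - hits[start - 1]
        scored.append((window / clip_len, start))

    best = heapq.nlargest(k, scored, key=lambda s: (s[0], -s[1]))
    return [(start, score) for score, start in best]
```

Because this baseline scores every window at query time, it does no work during ingestion. The two-phase design described above instead moves work into index construction so that the Query Phase can avoid an exhaustive scan.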

Finally, we consider both spatial and temporal information using graph representations and define the problem of Spatial and Temporal Constrained Ranked Retrieval (STAR Retrieval) over videos. Based on the graph representation, we propose a two-phase approach: in the ingestion phase, we construct and materialize the Graph Index (GI); in the query phase, we efficiently compute the top-ranked windows (video clips) according to the window matching score. We propose two algorithms that perform Spatial Matching (SMA) and Temporal Matching (TM) separately, each with an early-stopping mechanism.

We present the details of the above three research problems and our proposed methods. Via experiments conducted on various datasets, we show the effectiveness of our proposed methods.

Keywords

Computer science
