A Unified Model for Tracking and Image-Video Object Detection

P. Liu, R. Wang, P. Zhang, Omid Poursaeed, Y. Zhou, X. Cao, S. Roy, A. Shah, S. Lim

Abstract

Recent advances in deep learning have pushed image object detection (OD) to new heights through learning-based, data-driven approaches. Video OD, by contrast, remains less explored, mostly because annotating video data is far more expensive. Meanwhile, Multi-Object Tracking (MOT) is closely related to video OD. However, most MOT datasets are class-specific, which limits a model's flexibility to track other kinds of objects. We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model. Experiments demonstrate that TrIVD achieves state-of-the-art performance across all image/video OD and MOT tasks.

Type

Conference proceedings

Publication

ArXiv

Date

January, 2023
