VAN: Versatile Affinity Network for End-to-end Online Multi-Object Tracking

Hyemin Lee (POSTECH)*, Inhan Kim (POSTECH), Daijin Kim (Pohang University of Science and Technology)

Keywords: Motion and Tracking

Abstract: In recent years, tracking-by-detection has become the most popular multi-object tracking (MOT) method, and appearance features based on deep convolutional neural networks (CNNs) have been successfully applied to improve candidate association. Several MOT methods adopt single-object tracking (SOT) and handcrafted rules to deal with incomplete detection, resulting in numerous false positives (FPs) and false negatives (FNs). However, a separately trained SOT network is not directly adaptable because domains can differ, and handcrafted rules contain a considerable number of hyperparameters, which makes the MOT method difficult to optimize. To address these issues, we propose a versatile affinity network (VAN) that performs the entire MOT process in a single network, including target-specific SOT to handle incomplete detection, affinity computation between targets and candidates, and the decision to terminate tracking. We train the VAN in an end-to-end manner using event-aware learning, which is designed to reduce the potential errors caused by FNs, FPs, and identity switching. The proposed VAN significantly reduces the number of hyperparameters and handcrafted rules required for the MOT framework and improves MOT performance. We implement the VAN using two baselines with different candidate refinement methods to demonstrate its effects. We also conduct extensive experiments, including ablation studies, on three public benchmark datasets: 2D MOT2015, MOT2016, and MOT2017. The results indicate that the proposed method improves tracking performance over the baseline methods and outperforms recent state-of-the-art MOT methods in terms of several tracking metrics, including MOT accuracy (MOTA), identity F1 score (IDF1), percentage of mostly tracked targets (MT), and FP.
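
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how MOTA and IDF1 are conventionally computed from accumulated error counts. It is not taken from the paper: the function names `mota` and `idf1` and the example counts are hypothetical, and the formulas are the standard definitions used by the MOT benchmarks.

```python
def mota(num_fn: int, num_fp: int, num_idsw: int, num_gt: int) -> float:
    """MOT accuracy: 1 - (FN + FP + identity switches) / ground-truth boxes.

    All counts are summed over every frame of the sequence.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of identity precision and identity recall."""
    denom = 2 * idtp + idfp + idfn
    # With no detections and no ground truth the score is undefined;
    # returning 0.0 here is an assumption made for this sketch.
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_mota = mota(num_fn=10, num_fp=5, num_idsw=5, num_gt=100)
example_idf1 = idf1(idtp=8, idfp=2, idfn=2)
```

Because FNs, FPs, and identity switches all enter the MOTA numerator, reducing any one of the three error types raises the score, and MOTA can be negative when the errors outnumber the ground-truth boxes.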

