Selected Publications
See the full list at Google Scholar.
*: corresponding author.
Motion-Guided Visual Tracking.
Pengyu Zhang, Simiao Lai, Dong Wang, Huchuan Lu.
Machine Intelligence Research (MIR), to appear.
In this study, we exploit motion cues to guide visual trackers without bells and whistles. First, we decouple motion into two types: camera motion and object motion. Then, we predict them individually via the proposed camera motion modeling and object trajectory prediction modules. Each module contains a motion detector and a verifier. For camera motion, we apply an off-the-shelf keypoint matching method to detect camera movement and propose a novel self-supervised camera motion verifier to validate its confidence. Given the previous object trajectory, object trajectory prediction aims to predict the future location of the target and select a reliable trajectory to handle fast object motion and occlusion. Extensive experiments on several mainstream tracking datasets, including OTB100, DTB70, TC128, UAV123, VOT2018 and GOT10k, demonstrate the effectiveness and generalization ability of our module at real-time speed.
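The camera motion modeling step can be illustrated with a short sketch: keypoints are matched between consecutive frames and a global motion model warps the previous target box into the current frame. ORB features, RANSAC homography, and the function names below are illustrative assumptions standing in for the off-the-shelf matcher mentioned above, not the paper's exact pipeline.

```python
# Minimal sketch: estimate global camera motion between two grayscale
# frames via keypoint matching, then warp the previous target box.
import cv2
import numpy as np

def estimate_camera_motion(prev_gray, curr_gray):
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return None  # not enough texture to match reliably
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if len(matches) < 8:
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # 3x3 inter-frame camera motion

def compensate_box(box, H):
    """Warp an (x, y, w, h) box from the previous frame into the current
    frame using the estimated camera motion."""
    x, y, w, h = box
    corners = np.float32([[x, y], [x + w, y], [x, y + h], [x + w, y + h]])
    warped = cv2.perspectiveTransform(corners.reshape(-1, 1, 2), H).reshape(-1, 2)
    x0, y0 = warped.min(axis=0)
    x1, y1 = warped.max(axis=0)
    return (float(x0), float(y0), float(x1 - x0), float(y1 - y0))
```

In the full method, a self-supervised verifier would then decide whether this motion estimate is trustworthy before it is used to shift the search region.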
To provide a thorough review of multi-modal tracking, different aspects of multi-modal tracking algorithms are summarized under a unified taxonomy, with specific focus on visible-depth (RGB-D) and visible-thermal (RGB-T) tracking. Subsequently, a detailed description of the related benchmarks and challenges is provided. Extensive experiments were conducted to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, various future directions, including model design and dataset construction, are discussed from different perspectives for further research.
In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: is the tracker ready for updating in the current frame? The proposed module effectively integrates geometric, discriminative, and appearance cues in a sequential manner, and then mines the sequential information with a designed cascaded LSTM module. Moreover, we strengthen the effect of appearance information in the module: an additional local outlier factor is introduced and integrated into a newly designed network. We integrate our meta-updater into eight different types of online-update trackers. Extensive experiments on four long-term and two short-term tracking benchmarks demonstrate that our meta-updater is effective and has strong generalization ability.
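The core idea, reading a short history of per-frame cues with stacked LSTMs and emitting an update/skip decision, can be sketched as follows. The two-layer cascade, cue dimension, and sequence length are illustrative assumptions and do not reproduce the paper's exact network.

```python
# Sketch of a meta-updater: an LSTM cascade reads a short history of
# per-frame cues and decides whether updating the tracker is safe.
import torch
import torch.nn as nn

class MetaUpdater(nn.Module):
    def __init__(self, cue_dim=8, hidden=64):
        super().__init__()
        self.lstm1 = nn.LSTM(cue_dim, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits over {skip, update}

    def forward(self, cues):
        # cues: (batch, steps, cue_dim) - geometric, discriminative, and
        # appearance cues (e.g., box change, response score, similarity,
        # local outlier factor) concatenated per frame.
        h, _ = self.lstm1(cues)
        h, _ = self.lstm2(h)
        return self.head(h[:, -1])  # decide from the last time step

updater = MetaUpdater()
history = torch.randn(1, 20, 8)  # cue vectors from the last 20 frames
should_update = updater(history).argmax(dim=1).item() == 1
```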
In this paper, we construct a large-scale, highly diverse benchmark for visible-thermal UAV tracking (VTUAV), including 500 sequences with 1.7 million high-resolution frame pairs. In addition, comprehensive applications (short-term tracking, long-term tracking and segmentation mask prediction) with diverse categories and scenes are considered for exhaustive evaluation. Moreover, we provide coarse-to-fine attribute annotations, where frame-level attributes are provided to exploit the potential of challenge-specific trackers. We also design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at various levels. Extensive experiments on several datasets reveal the effectiveness of HMFT and the complementarity of different fusion types.
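The "various levels" of fusion are commonly read as pixel-, feature-, and decision-level fusion; the sketch below illustrates only that general pattern. The tiny backbone, the weight sharing between branches, and the simple averaging are placeholders, not HMFT's actual architecture.

```python
# Illustrative multi-level RGB-T fusion: input-level (fused image),
# feature-level (concatenated features), and decision-level (averaged
# response maps) combined in one forward pass.
import torch
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.pixel_fuse = nn.Conv2d(6, 3, kernel_size=1)   # RGB+T -> fused image
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.feat_fuse = nn.Conv2d(64, 32, kernel_size=1)  # concat -> fused feature
        self.head = nn.Conv2d(32, 1, kernel_size=1)        # response map

    def forward(self, rgb, thermal):
        thermal3 = thermal.repeat(1, 3, 1, 1)              # lift 1-channel T to 3
        fused_img = self.pixel_fuse(torch.cat([rgb, thermal3], dim=1))
        f_rgb, f_t = self.backbone(rgb), self.backbone(thermal3)
        fused_feat = self.feat_fuse(torch.cat([f_rgb, f_t], dim=1))
        # decision-level fusion: average the two branches' responses
        return 0.5 * self.head(self.backbone(fused_img)) + 0.5 * self.head(fused_feat)

model = MultiLevelFusion()
response = model(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))
```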
A Unified Approach for Tracking UAVs in Infrared.
Jinjian Zhao, Xiaohan Zhang, Pengyu Zhang*.
International Conference on Computer Vision Workshops (ICCVW), 2021.
[Paper]
We design a unified framework comprising a local tracker, a camera motion estimation module, a bounding box refinement module, a re-detection module and a model updater. The camera motion estimation module provides motion compensation for the local tracker. Then, the bounding box refinement module estimates an accurate bounding box. If the target is missing, we switch to the re-detection module to re-localize the target when it reappears. We also adopt a model updater to control the updating process and filter out unreliable samples. Extensive experimental results on nine visible/thermal datasets show the effectiveness and generalization of our framework.
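The framework's per-frame behavior reduces to a small state machine: track locally with motion compensation, refine when confident, and switch to re-detection when the target is lost. The sketch below assumes hypothetical component interfaces (`local_tracker`, `redetector`, and so on) and shows only this switching logic, not the actual implementation.

```python
# Per-frame control flow of a unified tracking framework. All components
# in `modules` are hypothetical interfaces; only the switching logic
# between local tracking and re-detection is illustrated.
def track_frame(frame, state, modules):
    # camera motion compensation keeps the local search region on target
    motion = modules["camera_motion"].estimate(state["prev_frame"], frame)
    search_box = modules["camera_motion"].compensate(state["box"], motion)

    if state["target_lost"]:
        # target missing: search globally to re-localize it
        box, score = modules["redetector"].search(frame)
    else:
        box, score = modules["local_tracker"].track(frame, search_box)

    if score >= state["confidence_threshold"]:
        box = modules["refiner"].refine(frame, box)  # accurate final box
        state["target_lost"] = False
        # the model updater filters out unreliable samples before updating
        if modules["updater"].is_reliable(frame, box, score):
            modules["local_tracker"].update(frame, box)
        state["box"] = box
    else:
        state["target_lost"] = True  # keep the last box, re-detect next frame

    state["prev_frame"] = frame
    return state
```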
In this work, we observe that implicit attribute information can boost model discriminability, and propose a novel attribute-driven representation network to improve RGB-T tracking performance. First, according to appearance changes in RGB-T tracking scenarios, we divide the major and special challenges into four typical attributes and design an attribute-driven residual branch to mine attribute-specific properties. Furthermore, we aggregate these representations at the channel and pixel levels with the proposed attribute ensemble network (AENet) to adaptively fit the attribute-agnostic tracking process. We conduct extensive experiments on three RGB-T tracking benchmarks to compare the proposed tracker with other state-of-the-art methods. Experimental results show that our tracker achieves highly competitive results at real-time speed.
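A minimal sketch of the attribute-driven idea: a shared feature trunk plus one lightweight residual branch per attribute, aggregated with learned channel- and pixel-level weights. The branch design, layer sizes, and gating scheme below are illustrative assumptions rather than the exact AENet.

```python
# Sketch: four attribute-specific residual branches over shared features,
# aggregated by channel- and pixel-level weights (a stand-in for AENet).
import torch
import torch.nn as nn

class AttributeEnsemble(nn.Module):
    def __init__(self, channels=32, num_attrs=4):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_attrs)])
        self.channel_gate = nn.Sequential(          # one weight per branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, num_attrs, kernel_size=1))
        self.pixel_gate = nn.Conv2d(channels, num_attrs, kernel_size=1)

    def forward(self, feat):
        res = torch.stack([b(feat) for b in self.branches], dim=1)        # (B,A,C,H,W)
        w_c = torch.softmax(self.channel_gate(feat), dim=1).unsqueeze(2)  # (B,A,1,1,1)
        w_p = torch.softmax(self.pixel_gate(feat), dim=1).unsqueeze(2)    # (B,A,1,H,W)
        # attribute-agnostic aggregation: weighted residual sum + identity
        return feat + (res * w_c * w_p).sum(dim=1)

block = AttributeEnsemble()
out = block(torch.randn(2, 32, 31, 31))  # same shape as the input features
```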
In this study, we propose a novel RGB-T tracking framework that jointly models appearance and motion cues. First, to obtain a robust appearance model, we develop a novel late fusion method to infer the fusion weight maps of the RGB and thermal (T) modalities. The fusion weights are determined by offline-trained global and local multi-modal fusion networks and then used to linearly combine the response maps of the RGB and T modalities. Second, when the appearance cue is unreliable, we take motion cues, i.e., target and camera motions, into account to keep the tracker robust. We further propose a tracker switcher to flexibly switch between the appearance and motion trackers. JMMAC took first place on the public sets of VOT2019-RGBT and VOT2020-RGBT.
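The late-fusion step itself, a per-pixel weighted combination of the two modalities' response maps, takes only a few lines; how the weight maps are produced (the offline-trained global and local fusion networks) is not shown. Names and shapes below are assumptions for illustration.

```python
# Sketch of late fusion: linearly combine RGB and thermal response maps
# with predicted weight maps, then read off the fused peak location.
import numpy as np

def fuse_responses(resp_rgb, resp_t, w_rgb, w_t, eps=1e-6):
    """resp_*: (H, W) response maps; w_*: (H, W) non-negative weights."""
    total = w_rgb + w_t + eps              # normalize weights to sum to 1
    fused = (w_rgb * resp_rgb + w_t * resp_t) / total
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, peak

resp_rgb, resp_t = np.random.rand(17, 17), np.random.rand(17, 17)
w_rgb = np.full((17, 17), 0.7)             # e.g., RGB trusted more in daylight
w_t = np.full((17, 17), 0.3)
fused, peak = fuse_responses(resp_rgb, resp_t, w_rgb, w_t)
```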
In this paper, we propose a fast and object-adaptive spatial regularization (FOSR) model to alleviate these drawbacks. With FOSR, more discriminative filters can be obtained efficiently by joint learning in the spatial and frequency domains. Besides, an object-adaptive SR map that contains object information can be learned offline and online in a data-driven manner. Extensive experiments on two benchmarks, OTB-2015 and VOT-2016, validate the effectiveness and generality of our model in helping state-of-the-art SR-based trackers achieve more than a 5x speedup and relative gains of 3.7% and 3.3% in the success and precision plots on OTB-2015, respectively. Additionally, FOSR helps pure CF-based trackers remarkably improve their accuracy at comparable speed.
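For reference, spatially regularized correlation filters are usually learned by minimizing an objective of the following form, in which the spatial map w penalizes filter energy outside the object region; this is the common SRDCF-style formulation, shown here for orientation, and FOSR's exact model may differ.

```latex
\min_{f}\ \Big\| \sum_{d=1}^{D} x_d \ast f_d - y \Big\|^2
        + \sum_{d=1}^{D} \big\| w \odot f_d \big\|^2
```

Here x_d are the feature channels of the training sample, f_d the filter channels, y the desired response, and \odot element-wise multiplication; an object-adaptive method learns w from data instead of fixing it by hand.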