Funding: The support of the National Natural Science Foundation of China (Grant No. 52127809, author Z.W, http://www.nsfc.gov.cn/; No. 51625501, author Z.W, http://www.nsfc.gov.cn/) is greatly appreciated.
Abstract: Label assignment refers to determining positive/negative labels for each sample to supervise the training process. Existing Siamese-based trackers primarily use fixed label assignment strategies according to human prior knowledge; thus, they can be sensitive to predefined hyperparameters and fail to fit the spatial and scale variations of samples. In this study, we first develop a novel dynamic label assignment (DLA) module to handle the diverse data distributions and adaptively distinguish the foreground from the background based on the statistical characteristics of the target in visual object tracking. The core of the DLA module is a two-step selection mechanism. The first step selects candidate samples according to the Euclidean distance between training samples and ground truth, and the second step selects positive/negative samples based on the mean and standard deviation of candidate samples. The proposed approach is general-purpose and can be easily integrated into anchor-based and anchor-free trackers for optimal sample-label matching. According to extensive experimental findings, Siamese-based trackers with DLA modules can refine target locations and outperform baseline trackers on OTB100, VOT2019, UAV123 and LaSOT. Particularly, DLA-SiamRPN++ improves SiamRPN++ by 1% AUC and DLA-SiamCAR improves SiamCAR by 2.5% AUC on OTB100. Furthermore, hyperparameter analysis experiments show that the DLA module hardly increases spatio-temporal complexity: the proposed approach maintains the same speed as the original tracker without additional overhead.
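The two-step selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say which quantity the mean and standard deviation are computed over, nor how many candidates are kept, so the sketch below assumes a per-sample quality score (for example, IoU with the ground truth), a threshold of mean plus standard deviation, and a hypothetical candidate count `k`. The function name and parameters are illustrative and are not taken from the paper.

```python
import math
import statistics


def dynamic_label_assignment(points, scores, gt_center, k=9):
    """Sketch of a two-step positive/negative selection.

    points:    list of (x, y) training-sample locations
    scores:    per-sample quality score, assumed here to be IoU with the
               ground truth (an assumption, not stated in the abstract)
    gt_center: (x, y) centre of the ground-truth box
    k:         hypothetical number of candidates kept in step 1
    Returns (labels, threshold) with labels[i] in {0, 1}.
    """
    # Step 1: keep the k samples closest to the ground-truth centre
    # (Euclidean distance).
    dists = [math.hypot(x - gt_center[0], y - gt_center[1]) for x, y in points]
    order = sorted(range(len(points)), key=lambda i: dists[i])
    candidates = order[:k]

    # Step 2: derive a threshold from the candidates' own statistics.
    # Assumed form: mean + population standard deviation.
    cand_scores = [scores[i] for i in candidates]
    threshold = statistics.mean(cand_scores) + statistics.pstdev(cand_scores)

    # Candidates at or above the threshold are positive; every other
    # sample (including non-candidates) is negative.
    labels = [0] * len(points)
    for i in candidates:
        if scores[i] >= threshold:
            labels[i] = 1
    return labels, threshold
```

Because the threshold is recomputed from each target's candidates, it shifts with the target's size and position instead of relying on a fixed, hand-tuned value, which is the behaviour the abstract attributes to dynamic assignment.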
Funding: The National Science Fund for Distinguished Young Scholars of China (No. 51625501) and the Aeronautical Science Foundation of China (No. 201946051002).
Abstract: Small-object detection has long been a challenge. High-megapixel cameras are used to solve this problem in industry. However, current detectors are inefficient for high-resolution images. In this work, we propose a new module called Pre-Locate Net, which is a plug-and-play structure that can be combined with most popular detectors. We draw on classification ideas to obtain candidate regions in images, greatly reducing the amount of calculation and thus achieving rapid detection in high-resolution images. Pre-Locate Net mainly includes two parts: candidate region classification and behavior classification. Candidate region classification is used to obtain a candidate region, and behavior classification is used to estimate the scale of an object. Different follow-up processing is adopted according to different scales to balance the variance of the network input. Unlike the popular candidate region generation methods, we abandon bounding-box regression and adopt classification, so that a candidate region can be predicted in the shallow network. We build a high-resolution dataset of aircraft and landing gears covering complex scenes to verify the effectiveness of our method. Compared to state-of-the-art detectors (e.g., Guided Anchoring, Libra-RCNN, and FASF), our method achieves the best mAP of 94.5 on 1920×1080 images at 16.7 FPS.
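The idea of obtaining candidate regions by classification rather than box regression can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe how the image is partitioned, so the sketch assumes a regular grid of cells, each with a foreground probability produced by some shallow classifier, and a hypothetical score threshold. The function name, the grid layout, and the threshold are not from the paper, and the scale-estimation branch is not modelled.

```python
def candidate_regions(cell_scores, image_size, threshold=0.5):
    """Map per-cell foreground scores to pixel-space candidate regions.

    cell_scores: 2D list (rows x cols) of foreground probabilities from a
                 hypothetical shallow classifier (assumed, not from the paper)
    image_size:  (width, height) of the full-resolution image in pixels
    threshold:   hypothetical cut-off for keeping a cell
    Returns a list of (x_min, y_min, x_max, y_max) boxes.
    """
    width, height = image_size
    rows, cols = len(cell_scores), len(cell_scores[0])
    cell_w, cell_h = width / cols, height / rows

    regions = []
    for r in range(rows):
        for c in range(cols):
            # A cell becomes a candidate region purely by classification;
            # no box coordinates are regressed.
            if cell_scores[r][c] >= threshold:
                regions.append((
                    round(c * cell_w),
                    round(r * cell_h),
                    round((c + 1) * cell_w),
                    round((r + 1) * cell_h),
                ))
    return regions
```

Only the retained cells would then be passed to a full detector, which is where the reduction in computation on high-resolution inputs would come from in such a scheme.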