Abstract:
UAV-based target tracking takes place in complex scenes, and fast target motion causes significant scale variation in the tracked object. Attention mechanisms are a prevalent way to handle complex backgrounds. However, existing attention mechanisms focus primarily on the relationship between the template and the search region and neglect the interdependencies among features at different levels. To address these problems, this paper employs a spatiotemporal attention mechanism that integrates temporal and spatial information to model the background in drone tracking and cope with complex backgrounds. To alleviate the high computational complexity and instability of attention mechanisms in feature fusion, Top-k operations are performed before feature fusion to select the most relevant features. In addition, in the feature extraction stage, dual features are extracted as input to the adaptive anchor box generation module to handle target scale changes. Finally, removing the redundant weights of the baseline from the loss function improves training effectiveness and tracking performance. Experimental results demonstrate that the proposed algorithm effectively mitigates the impact of scale variations and external factors on UAV-based target tracking performance. It achieves an accuracy of 76.6% on the UAV dataset, with accuracy exceeding 72% across multiple challenging scenarios.
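The Top-k selection step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain scaled dot-product attention between flattened search-region features (queries) and template features (keys and values), and the function name `topk_attention` and all shapes are hypothetical. Each query keeps only its k highest-scoring keys before the softmax, so low-relevance features contribute nothing to the fused output.

```python
import numpy as np

def topk_attention(query, key, value, k):
    """Attend from each query row to only its k highest-scoring keys.

    query: (n_q, d), key: (n_k, d), value: (n_k, d_v). Illustrative only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)          # (n_q, n_k) similarity scores
    k = min(k, scores.shape[-1])
    # Column indices of the k largest scores in each row (unordered within the k).
    top_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    # Additive mask: 0 for kept entries, -inf for everything else.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, top_idx, 0.0, axis=-1)
    masked = scores + mask
    # Numerically stable softmax over the kept entries only.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights
```

With this masking, each row of the attention matrix has at most k nonzero weights, which is one common way to sparsify attention. The sketch still computes the full score matrix, so it shows the selection logic rather than any computational saving.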