Towards Gradient Equalization and Feature Diversification for Long-Tailed Multi-Label Image Recognition

Abstract

Multi-label image recognition with convolutional neural networks has achieved remarkable progress in the past few years. However, most existing multi-label image recognition methods suffer from the long-tailed data distribution problem, i.e., head categories occupy most training samples, while tailed classes have few samples. This work firstly studies the influence of long-tailed data distribution on existing multi-label image recognition methods. Based on this, two crucial issues of the existing methods are identified: 1) severe gradient imbalance between head and tailed categories, even though re-balancing strategies are adopted; 2) the lack of diversity of tail category training samples. To tackle the first issue, this paper proposes a group sampling strategy to create group-wise balanced data distribution. Meanwhile, a dynamic gradient balancing loss is proposed to equalize the gradient for all categories. To tackle the second issue, this paper proposes a diversity enhancement module to fuse the information across all categories, preventing the network from overfitting tail classes. Furthermore, it also balances the gradient, promoting the discriminability of learned classifiers. Our method significantly outperforms the baseline method and achieves competitive performance with state-of-the-art methods on VOC-LT and COCO-LT datasets. Extensive ablation studies are conducted to verify the effectiveness of the essential proposals.

Publication
In IEEE Transactions on Multimedia (TMM), 2025