Publication Detail

The Text-to-Image Person Re-identification (TI-ReID) task aims to retrieve images of a person given a textual description of that person. Mainstream methods focus on the cross-modal alignment of local features and overlook the intra-modal and cross-modal relationships between different features. As a result, the learned person features lack high-level semantic information. To address this, we propose the Progressive Relationship-Mining Graph Network (RMGNet), which comprises an Intra-Modal Relationship-Mining (IMRM) module and a Cross-Modal Relationship-Mining (CMRM) module. These modules model and mine the semantic relationships among different features. Specifically, the IMRM module models and mines the high-level semantic interrelationships within the image features and within the text features. The CMRM module uses a nearest-neighbor method to model cross-modal semantic relationships, strengthening the cross-modal semantic correspondence of the person features. On this basis, we design the Adaptive Corner Center (Acc) loss and the Coarse-to-Fine Learning (C2FL) strategy, which together ensure that the network receives consistent and effective metric-learning supervision throughout training. To validate the proposed method, we conduct extensive experiments on three widely used datasets: CUHK-PEDES, ICFG-PEDES, and RSTPReid. The method achieves mAP of 70.59%, 41.62%, and 49.58% on these datasets, respectively, surpassing current state-of-the-art methods.
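
The abstract does not say how the nearest-neighbor step in the CMRM module is implemented. As a purely illustrative sketch, and not the authors' method, the following shows one common way to link features across modalities: each image feature is connected to its k most similar text features. The function names `cosine` and `cross_modal_knn`, the use of cosine similarity, the value of `k`, and the toy feature vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_modal_knn(image_feats, text_feats, k=1):
    """Hypothetical helper: for each image feature, return the indices of its
    k most similar text features, ordered from most to least similar."""
    edges = []
    for img in image_feats:
        sims = [cosine(img, txt) for txt in text_feats]
        ranked = sorted(range(len(text_feats)), key=lambda j: sims[j], reverse=True)
        edges.append(ranked[:k])
    return edges


# Toy 2-D features, invented for illustration only.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
text_feats = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

edges = cross_modal_knn(image_feats, text_feats, k=2)
print(edges)  # [[0, 2], [1, 2]]
```

Here `edges[i]` lists the text features linked to image feature `i`; such links could serve as the cross-modal edges of a graph, although how RMGNet actually builds and uses its graph is not described in the abstract.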