Object-Centric Transformer Framework for Fine-Grained Image-Text Retrieval with Global Consistency
Abstract: Cross-modal image-text retrieval enables efficient heterogeneous modality interaction via vision-language semantic alignment, advancing multimodal intelligence applications. However, ...