Abstract: Zero-shot referring expression comprehension aims to localize the bounding boxes in an image that correspond to given textual prompts, which requires: (i) a fine-grained disentanglement of ...