Comparative overview of two 3DVG approaches. (a) Supervised 3DVG involves input from 3D scans combined with text queries, guided by object-text pair annotations; (b) Zero-shot 3DVG identifies the ...
Abstract: End-to-end Visual Question Answering (VQA) models take both an image and a question as input and directly provide an answer to the question as output. Recently, visual program synthesis has ...
Abstract: Visual program synthesis is a promising approach to exploit the reasoning abilities of large language models for compositional computer vision tasks. Previous work has used few-shot ...