Abstract: We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first ...