The rapid evolution of generative AI has produced tools capable of creating images, music, and short video clips. But a new class of platforms is beginning to tackle something much more ambitious: ...
Abstract: Script event prediction aims to predict multiple subsequent events based on a given sequence of past events. While one-step script event prediction methods have achieved relatively high ...
Abstract: Visual affordance grounding aims to segment all possible interaction regions between people and objects from an image/video, which benefits many applications, such as robot grasping and ...