Abstract: Audio-Visual Question Answering (AVQA) requires complex reasoning across auditory and visual modalities. While recent advancements leverage sophisticated spatio-temporal representations, ...
Abstract: Visual place recognition is a fundamental task for applications such as visual localization and loop closure detection. Existing methods perform well in controlled environments, ...