The contents of this repository are provided under the Creative Commons Attribution 4.0 (CC BY 4.0) license, as laid out in the LICENSE.md file. The R projects in this repository designed for the ...
We introduce D-ORCA, a dialogue-centric omni-modal large language model optimized for robust audio-visual captioning. Unlike existing models that struggle with speaker attribution and temporal ...