Abstract: Facial Expression Recognition (FER) is a crucial task in affective computing, but its conventional focus on the seven basic emotions limits its applicability to the complex and expanding ...
Abstract: We present a modular pipeline for summarizing broadcast news videos using large language and vision models, specifically integrating Whisper for ASR, TransNetV2 for shot segmentation, LLaVA ...