Abstract: Visual Language Models (VLMs) have swiftly accelerated the blending of the visual modality with textual information, enabling more natural and contextually aware human–AI interaction. This ...