Abstract: Multi-label image classification, which involves recognizing multiple objects within a single image, is a fundamental task in computer vision. Recently, Visual-Language Models (VLMs) have ...
We are excited to release the CapRL 2.0 series: CapRL-Qwen3VL-2B and CapRL-Qwen3VL-4B. These models feature fewer parameters while delivering even more powerful captioning performance. Notably, ...
Stage-1 Generation: The code in this stage is mainly built on the PyTorch framework. Specifically, it requires PyTorch version 1.10.0 or later, along with the ...
Abstract: Image steganography conceals secret data within a cover image to generate a new image (stego image) in a manner that makes the secret data undetectable. The main problem in image ...