Abstract: Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers (ViTs) being the primary choice due to their good scalability ...