Abstract: Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent works have sought to transfer knowledge embedded in vision-language models from 2D to 3D domains.