NAVE
Networked Augmented Virtual Environment (NAVE) Group
Publication: Tong Chen, Shengjia Liang, Yuan Xiong, Qiang Zhou, Qichuan Geng and Zhong Zhou. Enhancing open-vocabulary scene understanding via push–pull alignment in gaussian splatting[J]. The Visual Computer, 2026, 42(1): 38. (CCF rank C Journal)

Open-vocabulary scene understanding based on 3D Gaussian Splatting (3DGS) has shown promising potential for applications such as embodied agents and object localization. By integrating open-vocabulary embeddings into spatial 3D Gaussians, these models enable a more comprehensive understanding of scenes. However, existing methods often suffer from misalignment caused by the gap between the RGB and language modalities, which leads to incorrect interpretations of similar-looking objects. To address this issue, we propose a cross-modal integration approach that aligns multiple representations through spatial Gaussian positioning. We introduce Push-Pull alignment in Gaussian Splatting (PPGS), a novel bimodal framework that bridges the RGB and language modalities through cohesive representation fields. Leveraging the illumination-invariant properties of language embeddings, we design a bridge module that uses the geometrically grounded positions of the Gaussians as a direct link between the two modalities. This module significantly enhances cross-modal alignment, improves high-fidelity rendering, and ensures accurate language feature embeddings. Furthermore, our framework dynamically adjusts gradients according to the distinct optimization requirements of RGB and language during joint learning, ensuring stable and efficient convergence. Comprehensive experiments demonstrate that PPGS achieves superior language query accuracy and enhanced visual quality compared to existing language-embedded representations, with mean Intersection over Union (mIoU) increasing by 6% and Peak Signal-to-Noise Ratio (PSNR) improving over mainstream methods, while requiring only 50% of the training time.
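
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how mIoU and PSNR are commonly computed. It is not the paper's evaluation code: the function names (`psnr`, `iou`, `mean_iou`), the flat-list data layout, and the choice to score an empty union as 1.0 are assumptions made only for illustration.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak Signal-to-Noise Ratio (dB) between two equally sized flat
    sequences of pixel intensities in [0, max_value]."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def iou(pred_mask, gt_mask):
    """Intersection over Union of two binary masks given as flat 0/1 sequences."""
    if len(pred_mask) != len(gt_mask):
        raise ValueError("masks must have equal length")
    intersection = sum(1 for p, g in zip(pred_mask, gt_mask) if p and g)
    union = sum(1 for p, g in zip(pred_mask, gt_mask) if p or g)
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(mask_pairs):
    """Average IoU over (predicted, ground-truth) mask pairs, e.g. one pair per text query."""
    scores = [iou(pred, gt) for pred, gt in mask_pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)
```

In an open-vocabulary setting, each mask pair would typically correspond to one language query, with the predicted mask obtained by thresholding the rendered language-feature relevance map; the thresholding step is left out here because the abstract does not specify it.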

Created by admin at 2026-03-27 13:36:22