Eight Papers Accepted by ICCV 2025
- SMSTracker: Tri-path score mask sigma fusion for multi-modal tracking.
- Spatial preference rewarding for MLLMs spatial understanding.
- Label-efficient generalizable depth completion with projection ambiguity and consistency.
- PCR-GS: COLMAP-free 3D Gaussian splatting via pose co-regularizations.
- Face retouching with diffusion data generation and spectral restorement.
- Versatile transition generation with image-to-video diffusion.
- TimeExpert: An expert-guided video LLM for video temporal grounding.
- R1-VL: Learning to reason with multimodal large language models via step-wise group relative policy optimization.