1

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians

Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

Robustness and Generalization via Generative Adversarial Training

Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling

Self-supervised Learning of Point Clouds via Orientation Estimation

Neural Puppet: Generative Layered Cartoon Characters

Differential Privacy has Disparate Impact on Model Accuracy