RelationField

Relate Anything in Radiance Fields

Sebastian Koch1,2       Johanna Wald3       Mirco Colosi2       Narunas Vaskevicius2      
Pedro Hermosilla4       Federico Tombari3,5       Timo Ropinski1
1Ulm University 2Bosch Center for AI 3Google 4TU Vienna 5TU Munich


RelationField is the first framework to extract inter-object relationships directly from neural radiance fields.

Abstract

Neural radiance fields are an emerging 3D scene representation and have recently been extended to learn features for scene understanding by distilling open-vocabulary features from vision-language models. However, current methods primarily focus on object-centric representations, supporting object segmentation or detection, while understanding semantic relationships between objects remains largely unexplored. To address this gap, we propose RelationField, the first method to extract inter-object relationships directly from neural radiance fields. RelationField represents relationships between objects as pairs of rays within a neural radiance field, effectively extending its formulation to include implicit relationship queries. To teach RelationField complex, open-vocabulary relationships, relationship knowledge is distilled from multi-modal LLMs. To evaluate RelationField, we solve open-vocabulary 3D scene graph generation and relationship-guided instance segmentation tasks, achieving state-of-the-art performance in both.

Method


RelationField learns a 3D relationship feature field (a) that can be queried with a relationship query location (b); the relationship field of the 3D volume changes depending on the selected position. The relationship feature is sampled and rendered along a ray using NeRF's rendering weights. The language loss maximizes the cosine similarity between the sparse features extracted from the 2D views and the rendered 3D relationship features.
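The rendering and loss step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: rel_mlp (a query-conditioned feature head), the feature dimensions, and the precomputed NeRF rendering weights are hypothetical placeholders.

import torch
import torch.nn.functional as F

def render_relationship_feature(rel_mlp, query_embedding, sample_points, weights):
    """Render a relationship feature along one ray, conditioned on a query location.

    rel_mlp         -- hypothetical MLP head mapping (point, query) -> relationship feature
    query_embedding -- encoding of the selected relationship query location
    sample_points   -- (num_samples, 3) 3D sample positions along the ray
    weights         -- (num_samples,) NeRF volume-rendering weights for the same samples
    """
    # Condition every sample on the query location, so the field changes with the query.
    query = query_embedding.expand(sample_points.shape[0], -1)
    features = rel_mlp(torch.cat([sample_points, query], dim=-1))  # (num_samples, feat_dim)
    # Composite the per-sample features with the rendering weights.
    return (weights.unsqueeze(-1) * features).sum(dim=0)           # (feat_dim,)

def relationship_language_loss(rendered_feature, target_feature):
    """Maximize cosine similarity between the rendered 3D relationship feature and
    the 2D relationship feature extracted for the corresponding pixel pair."""
    return 1.0 - F.cosine_similarity(
        rendered_feature.unsqueeze(0), target_feature.unsqueeze(0)
    ).mean()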
We estimate 2D relationship proposals from a multi-modal LLM prompted with SoM (e) for each training view and encode the extracted textual relationship descriptions into the image plane (d). A pair pixel sampler samples subject and object pixels (c) whose relationship features are distilled into the 3D volume.
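A minimal sketch of how such per-view proposals could be turned into distillation targets, assuming the multi-modal LLM output has already been parsed into per-object masks and (subject, relationship text, object) triplets; text_encoder and the data layout are illustrative assumptions rather than the actual pipeline.

import torch

def sample_relationship_pixel_pairs(masks, triplets, text_encoder, num_pairs=32):
    """Turn per-view relationship proposals into supervision for 3D distillation.

    masks        -- dict: object id -> boolean (H, W) mask from the SoM-prompted view
    triplets     -- list of (subject_id, relationship_text, object_id) from the multi-modal LLM
    text_encoder -- assumed callable mapping a string to a feature vector
    Returns a list of (subject_pixel, object_pixel, relationship_feature) training samples.
    """
    samples = []
    for subj_id, rel_text, obj_id in triplets:
        rel_feature = text_encoder(rel_text)        # encode the textual relationship description
        subj_pixels = masks[subj_id].nonzero()      # (N_s, 2) pixel coordinates
        obj_pixels = masks[obj_id].nonzero()        # (N_o, 2) pixel coordinates
        if len(subj_pixels) == 0 or len(obj_pixels) == 0:
            continue
        # Randomly pair subject and object pixels; each pair supervises one pair of rays.
        s_idx = torch.randint(len(subj_pixels), (num_pairs,))
        o_idx = torch.randint(len(obj_pixels), (num_pairs,))
        for s, o in zip(subj_pixels[s_idx], obj_pixels[o_idx]):
            samples.append((s, o, rel_feature))
    return samples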

Qualitative results


Using RelationField, you can perform free-form queries that capture various relationships and affordances in different scenes.
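Such a free-form query can be scored against the rendered relationship features with a simple cosine similarity, assuming the distilled features share an embedding space with a text encoder; the function below is a hypothetical sketch, not the actual query interface.

import torch
import torch.nn.functional as F

def relationship_relevancy_map(rendered_features, query_text, text_encoder):
    """Score every rendered pixel against a free-form relationship query.

    rendered_features -- (H, W, feat_dim) relationship features rendered for the
                         currently selected query location
    query_text        -- e.g. "is standing on" or "can be opened"
    text_encoder      -- assumed text encoder matching the distilled feature space
    """
    query = F.normalize(text_encoder(query_text), dim=-1)   # (feat_dim,)
    feats = F.normalize(rendered_features, dim=-1)          # (H, W, feat_dim)
    return (feats * query).sum(dim=-1)                      # (H, W) relevancy map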

BibTeX


  @article{koch2024relationfield,
      title={RelationField: Relate Anything in Radiance Fields},
      author={Koch, Sebastian and Wald, Johanna and Colosi, Mirco and Vaskevicius, Narunas and Hermosilla, Pedro and Tombari, Federico and Ropinski, Timo},
      year={2024},
  }