Case study · Spatial understanding

Semantic search over gaussian splats

A street block reconstructed from drone imagery and segmented per class. Pick a concept below and the matching splats light up.

01 / Scene

The viewer

The scene is large, so the first load takes a moment. Best viewed on desktop.

Loading scene
Drag to orbit · WASD to move · QE up/down · Scroll to zoom
Concepts
02 / Method

How it was built

Drone frames go through structure-from-motion to solve camera poses, then into a 3D gaussian-splat fit that reconstructs the block as a few hundred thousand oriented, coloured gaussians. A text-promptable segmentation model then masks each captured view for a fixed vocabulary of classes. A multi-view lifting step fuses those 2D masks back onto the underlying gaussians, so each splat ends up tagged with the classes it belongs to.
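The multi-view lifting step can be sketched as a per-splat vote: project each gaussian's center into every segmented view, read the class under that pixel, and take the majority across views. This is a minimal illustration, not the actual pipeline; the function name, camera representation, and voting rule are assumptions.

```python
import numpy as np

def lift_masks_to_splats(centers, cameras, masks, n_classes):
    """Fuse per-view 2D class masks onto 3D gaussian centers by majority vote.

    centers : (N, 3) gaussian centers in world space
    cameras : list of (K, R, t, (H, W)) pinhole cameras
    masks   : list of (H, W) int arrays, -1 = unlabelled, else a class id
    Returns an (N,) array with one class id per splat (-1 if never observed).
    """
    votes = np.zeros((len(centers), n_classes), dtype=np.int32)
    for (K, R, t, (H, W)), mask in zip(cameras, masks):
        cam = centers @ R.T + t                      # world -> camera frame
        in_front = cam[:, 2] > 1e-6                  # drop points behind the camera
        uv = cam @ K.T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
        u = uv[:, 0].round().astype(int)
        v = uv[:, 1].round().astype(int)
        ok = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        cls = mask[v[ok], u[ok]]                     # class under each projection
        valid = cls >= 0                             # ignore unlabelled pixels
        idx = np.flatnonzero(ok)[valid]
        np.add.at(votes, (idx, cls[valid]), 1)       # accumulate one vote per view
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1              # splats no mask ever covered
    return labels
```

A real implementation would also weight votes by visibility (a splat can be occluded in a view where its center still projects inside the image), but the majority-vote core is the same idea.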

What you see here is one pre-computed scene, not a live pipeline. The viewer is static: the splat file and the per-class index are shipped to the browser, and the interaction is just a lookup that dims everything outside the selected class.
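The selection itself reduces to an array lookup over the pre-computed index. The viewer runs in the browser, but the logic can be sketched in a few lines of Python; the function name, the dict-of-indices shape of the index, and the dimming factor are assumptions for illustration.

```python
import numpy as np

def apply_concept(opacities, class_index, concept, dim=0.05):
    """Dim every splat outside the selected concept.

    opacities   : (N,) base opacity per splat, as shipped in the splat file
    class_index : dict mapping concept name -> array of splat indices
    concept     : selected concept name, or None to restore the full scene
    Returns a new opacity array; the splat data itself is never modified.
    """
    if concept is None:
        return opacities.copy()
    out = opacities * dim          # everything starts dimmed
    sel = class_index[concept]
    out[sel] = opacities[sel]      # selected class keeps full opacity
    return out
```

Because the index is computed offline, selecting a concept costs one indexed write over the opacity buffer, which is why the interaction stays responsive even with hundreds of thousands of splats.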

The source is not public yet; a longer write-up will follow.