Code Section
1. Program
2. Issues
Conda installation process:
Installing the CUDA version of PyTorch via conda can silently result in the CPU-only build being installed.
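To check which build actually landed, a quick test (in any environment with PyTorch installed) is:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
If torch.version.cuda prints None, the CPU build was installed; reinstalling with an explicit CUDA pin (e.g. conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia) usually resolves it.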
Undefined symbol: iJIT_NotifyEvent
Error: libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
Cause: likely an incompatibility between the Docker container and the conda environment; newer MKL releases dropped this symbol.
Solution: pin MKL back to 2024.0:
pip install mkl==2024.0
3. Notes
4. Local Reproduction
Create your own dataset:
Requirements: colmap and ffmpeg
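For reference, a rough end-to-end dataset pipeline, assuming the official gaussian-splatting repo's convert.py script (paths and frame rate are illustrative):
mkdir -p data/water_bottle/input
# Sample frames from the phone video into the input/ layout convert.py expects
ffmpeg -i water_bottle.mp4 -qscale:v 1 -vf fps=2 data/water_bottle/input/%04d.jpg
# Run COLMAP (feature extraction, matching, sparse reconstruction) via the repo script
python convert.py -s data/water_bottle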
Run in the conda environment:
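A typical invocation, assuming the repo's train.py interface and the dataset layout above:
python train.py -s data/water_bottle -m output/water_bottle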
If the conda environment lacks libGL (a common symptom when running inside a Docker container), install it:
conda install -c conda-forge libgl
Visualize with SIBR_viewers:
# Dependencies
sudo apt install -y libglew-dev libassimp-dev libboost-all-dev libgtk-3-dev libopencv-dev libglfw3-dev libavdevice-dev libavcodec-dev libeigen3-dev libxxf86vm-dev libembree-dev
# Project setup
cd SIBR_viewers
git checkout fossa_compatibility  # Only needed for Ubuntu 22.04
cmake -Bbuild . -DCMAKE_BUILD_TYPE=Release  # Add -G Ninja to build faster
cmake --build build -j24 --target install
# Run
./install/bin/SIBR_gaussianViewer_app -m ../output/water_bottle/
When running the viewer from inside the conda environment (in the Docker container), you may hit an X11 error:
[SIBR] ## ERROR ##: FILE /workspace/gaussian-splatting/SIBR_viewers/src/core/graphics/Window.cpp LINE 30, FUNC glfwErrorCallback GLX: Failed to create context: GLXBadFBConfig
Solution:
a. On the host machine, check the OpenGL version:
glxinfo | grep OpenGL   # 'OpenGL core profile version string' should report e.g. 4.6
b. Inside the Docker container, override the Mesa GL version to match:
export MESA_GL_VERSION_OVERRIDE=4.6
c. Run again inside Docker:
./install/bin/SIBR_gaussianViewer_app -m ../output/water_bottle/
Local Data Test
Original data: a video captured on a phone:
SIBR:
Gaussian distribution visualization:
Paper Section
1. Preliminaries
Sparse / Semi-Dense / Dense SLAM
Sparse SLAM: Traditional visual SLAM, such as VINS-Mono, tracks feature points in images to estimate the camera's trajectory. It provides only limited information and cannot fully describe the scene.
Semi-Dense SLAM: LiDAR-based SLAM, such as LIO-SAM, scans the environment and generates more map data, but still does not fully describe the scene.
Dense SLAM: Generates a mesh, point cloud, or Gaussian splats (3DGS), providing a detailed global structure. This can help downstream systems recognize obstacles, pedestrians, and objects for better path planning and decision-making in autonomous driving.
Structure-from-Motion (SfM)
Concept: The technique of inferring 3D structure from images taken from multiple viewpoints. With this method, the geometric information of a 3D scene can be extracted from a set of 2D images, producing a sparse point cloud.
Process (a code sketch follows this list):
(1) Input Images: The SfM method typically relies on two or more camera views, from which image feature points are extracted.
(2) Feature Matching: Features are extracted from the images, and matching techniques (such as SIFT or SURF) find correspondences across different views. (SIFT and SURF are common feature extraction algorithms based on image texture; they extract key points, and matching algorithms determine the correspondences between those key points.)
(3) Camera Position and Orientation: SfM also estimates the camera's motion trajectory (i.e., the position and orientation of the camera), which helps locate feature points in 3D space.
(4) 3D Reconstruction: Using geometric methods such as triangulation, the matched feature points are converted into points in 3D space, producing a sparse point cloud.
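As a concrete sketch of steps (2) to (4), here is a minimal two-view reconstruction using OpenCV, assuming a known intrinsic matrix K and two overlapping images; this is a toy version of what COLMAP automates at scale, not the paper's code.
import cv2
import numpy as np

# K is the 3x3 camera intrinsic matrix (assumed known here; SfM can also estimate it)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# (2) Feature extraction and matching with SIFT
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# (3) Relative camera pose from the essential matrix
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# (4) Triangulate matched points into a sparse 3D point cloud
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at the origin
P2 = K @ np.hstack([R, t])                          # second camera from recovered pose
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
sparse_cloud = (pts4d[:3] / pts4d[3]).T             # Nx3 points, up to scale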
Sparse Point Cloud Characteristics:
(1) Sparsity: Since SfM relies on identifiable features in the image (such as corners or edges), the feature points do not cover the entire scene; a sparse point cloud therefore contains relatively few points, scattered across the scene.
(2) Incompleteness: These points represent only specific parts of the scene, such as salient corners or boundaries, and cannot fully describe its geometric shape. To fill the gaps and produce a dense point cloud, additional techniques (such as traditional MVS or the popular 3DGS) are typically needed.
COLMAP
Concept: COLMAP is an open-source 3D reconstruction tool that combines Structure-from-Motion (SfM) and Multi-View Stereo (MVS), offering both graphical and command-line interfaces. It can automatically extract features, match feature points, estimate camera poses, and generate both sparse and dense 3D point clouds from ordered or unordered image sets. COLMAP is widely used in computer vision, 3D modeling, and augmented reality.
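For orientation, the sparse stage of the COLMAP command-line pipeline typically looks like this (paths illustrative; the gaussian-splatting repo's convert.py wraps essentially these steps):
colmap feature_extractor --database_path db.db --image_path images   # SIFT features per image
colmap exhaustive_matcher --database_path db.db                      # match features across images
mkdir sparse
colmap mapper --database_path db.db --image_path images --output_path sparse   # SfM: poses + sparse cloud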
Fast Rasterization
Concept: A rendering technique that converts the geometric shapes of a 3D scene (such as lines and polygons) into a 2D pixel grid for display on screen; it is one step of the graphics rendering pipeline. Fast rasterization accelerates this 3D-to-2D conversion through optimized algorithms.
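As a toy illustration of the idea (nothing like the paper's tile-based CUDA rasterizer), a minimal point rasterizer with a pinhole camera and a depth test might look like:
import numpy as np

def rasterize_points(points_xyz, colors, K, H, W):
    """Project 3D points (camera frame) and write each point's color into
    its nearest pixel, with nearer points overwriting farther ones."""
    img = np.zeros((H, W, 3))
    depth = np.full((H, W), np.inf)
    uvw = points_xyz @ K.T                          # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    for (u, v), z, c in zip(uv, points_xyz[:, 2], colors):
        if 0 <= u < W and 0 <= v < H and z < depth[v, u]:   # depth test
            depth[v, u] = z
            img[v, u] = c
    return img

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [0.2, 0.1, 4.0]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
img = rasterize_points(pts, cols, K, H=480, W=640)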
Volumetric Rendering
Concept: A rendering technique that considers not only the surfaces of a scene but also the volumetric data within it. By simulating how light propagates through participating media (such as smoke, clouds, or fog), volume rendering produces more realistic and detailed visuals; it is commonly used for translucent objects and scenes with complex light transport. In 3D Gaussian Splatting (3DGS), volume rendering shows up as the compositing rule used to blend Gaussians into the final 2D image.
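That compositing rule is front-to-back alpha blending, C = Σ_i c_i α_i Π_{j<i} (1 - α_j). A minimal per-pixel sketch with toy values (not the paper's CUDA kernel):
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted samples for one pixel."""
    pixel = np.zeros(3)
    transmittance = 1.0                 # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):    # nearest sample first
        pixel += transmittance * a * np.asarray(c)
        transmittance *= (1.0 - a)
        if transmittance < 1e-4:        # early stop once the pixel is nearly opaque
            break
    return pixel

# Two splats covering the same pixel, sorted near -> far:
print(composite(colors=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], alphas=[0.6, 0.8]))
# red contributes 0.6, blue contributes 0.4 * 0.8 = 0.32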
Blender
Concept: An open-source 3D modeling and rendering suite. Once 3D point cloud data is imported into Blender, it can be visualized and rendered; techniques such as volumetric rendering and fast rasterization can be used to tune the rendered result.
2. Paper Logic
| Step | Description | Technology Used |
|------|-------------|-----------------|
| 1. Input Images | Collect a set of 2D images from different viewpoints. | |
| 2. Feature Point Extraction | Apply feature extraction algorithms like SIFT (Scale-Invariant Feature Transform). | SIFT, SURF, ORB |
| 3. Sparse Point Cloud Generation | Use Structure-from-Motion (SfM) or similar techniques to generate a sparse point cloud and estimate camera poses. | Structure-from-Motion (SfM) |
| 4. Gaussian Point Cloud Representation | Convert the sparse point cloud into a 3D Gaussian point cloud, where each point represents a Gaussian distribution with position, opacity, and color information. | 3D Gaussian Splatting |
| 5. Optimization | Optimize the Gaussian points by adjusting their positions, opacities, and colors based on new images. | Bundle Adjustment, Optimization Algorithms |
| 6. Rendering | Render the Gaussian point cloud into a 2D image using fast rasterization techniques. | Fast Rasterization, Volumetric Rendering |
| 7. Output | Output the optimized 2D image and the optimized Gaussian point cloud for further applications, such as rendering or localization. | |
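To make step 4 concrete, one 3D Gaussian primitive can be pictured as a small record. Field names below are illustrative; the official implementation stores positions, scales, rotations, opacities, and spherical-harmonic color coefficients as learnable tensors.
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    mean: np.ndarray       # (3,) center position in world space
    scale: np.ndarray      # (3,) per-axis extent; with rotation, defines the covariance
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z) orienting the ellipsoid
    opacity: float         # alpha in [0, 1], used in volume-rendering compositing
    color: np.ndarray      # (3,) RGB (the paper uses spherical harmonics for view dependence)

    def covariance(self) -> np.ndarray:
        """Sigma = R S S^T R^T, the covariance parameterization from the 3DGS paper."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T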