GPT-Enhanced Physically Based Dataset Generation for Computer Vision Tasks

Image generated by DALL-E

The rapidly evolving field of computer vision has seen significant advancements in pose estimation and 3D reconstruction. However, the development of robust methods requires datasets that offer a wide range of scenarios, including deformable shapes and varying materials. Currently, there is a gap in the availability of such datasets. This project aims to fill this gap by utilizing cutting-edge techniques to generate a unique dataset that can serve as a benchmark for future research.

 

The primary goal of this project is to develop a comprehensive framework for generating near-photorealistic images that accurately represent deformable objects and changing materials. This framework will be used to probe the limitations of current pose estimation and 3D reconstruction methods, including NeRF (Neural Radiance Fields), NeuS (neural implicit surfaces), and 3D Gaussian Splatting.

 

Tasks and Responsibilities:

  • Mitsuba3 Integration:
    1. Investigate how Mitsuba3 can be used to generate realistic images with changing materials.
    2. Identify additional variables that Mitsuba3 can generate for annotations, comparing these with existing datasets.
  • Taichi Utilization:
    1. Explore Taichi’s capabilities for creating synthetic images with deformable shapes and simulating object collisions and motion.
    2. Assess the potential applications of Taichi in this context.
  • LLM (Large Language Model) Involvement:
    1. Utilize GPT-4 or another suitable LLM to generate content for diverse scenes, thereby overcoming the impracticality of manually creating millions of scenarios.
  • Evaluation and Benchmarking:
    1. Conduct a thorough evaluation of current reconstruction and pose estimation methods using the generated dataset.
    2. Summarize the findings in a comprehensive benchmark report, highlighting strengths, weaknesses, and areas for improvement in these methods.
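
To give a rough feel for how the LLM and Mitsuba3 tasks could connect, here is a minimal sketch, not the project's actual pipeline. It assumes the LLM returns a small JSON scene description; the schema, the names `LLM_RESPONSE`, `parse_scene_spec`, and `frame_scene_dicts`, and the choice of a linearly changing roughness are all hypothetical. The sketch validates the generated text and turns it into one Mitsuba-style scene dictionary per frame, using the `principled` BSDF to model a material that changes over time. Sensor and emitter entries are omitted, so each dictionary is only a fragment of a renderable scene.

```python
import json

# Hypothetical JSON an LLM might return for one scene (the schema is an assumption).
LLM_RESPONSE = """
{
  "name": "rusting_ball",
  "shape": "sphere",
  "frames": 3,
  "material": {
    "base_color": [0.8, 0.3, 0.1],
    "roughness_start": 0.1,
    "roughness_end": 0.9
  }
}
"""

# Shape plugin names accepted by this sketch.
ALLOWED_SHAPES = {"sphere", "cube"}


def parse_scene_spec(text):
    """Parse and validate LLM output, since generated text may be malformed."""
    spec = json.loads(text)
    if spec["shape"] not in ALLOWED_SHAPES:
        raise ValueError(f"unsupported shape: {spec['shape']}")
    if spec["frames"] < 1:
        raise ValueError("frames must be >= 1")
    material = spec["material"]
    if len(material["base_color"]) != 3:
        raise ValueError("base_color must have three components")
    for key in ("roughness_start", "roughness_end"):
        if not 0.0 <= material[key] <= 1.0:
            raise ValueError(f"{key} must lie in [0, 1]")
    return spec


def frame_scene_dicts(spec):
    """Build one scene dictionary per frame with linearly interpolated roughness."""
    material = spec["material"]
    n = spec["frames"]
    scenes = []
    for i in range(n):
        # t runs from 0 (first frame) to 1 (last frame).
        t = i / (n - 1) if n > 1 else 0.0
        roughness = (1.0 - t) * material["roughness_start"] + t * material["roughness_end"]
        scenes.append({
            "type": "scene",
            "integrator": {"type": "path"},
            "object": {
                "type": spec["shape"],
                "bsdf": {
                    "type": "principled",
                    "base_color": {"type": "rgb", "value": list(material["base_color"])},
                    "roughness": roughness,
                },
            },
            # A sensor and an emitter would have to be added before rendering.
        })
    return scenes


spec = parse_scene_spec(LLM_RESPONSE)
scenes = frame_scene_dicts(spec)
for index, scene in enumerate(scenes):
    print(index, round(scene["object"]["bsdf"]["roughness"], 3))
```

Once a sensor and a light source are added, each dictionary could presumably be passed to `mitsuba.load_dict` for rendering. Validating the LLM output before building scenes is a deliberate choice here: with millions of generated scenarios, malformed or out-of-range descriptions are better rejected early than discovered at render time.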

 

Prerequisites:

  1. Python
  2. PyTorch
  3. Some knowledge of deep learning, LLM fine-tuning, and computer graphics.

 

Contact:

[email protected]

[email protected]

[1] GPT-4 Technical Report, OpenAI, arXiv 2023

[2] 3D Gaussian Splatting for Real-Time Radiance Field Rendering, SIGGRAPH 2023

[3] Neuralangelo: High-Fidelity Neural Surface Reconstruction, CVPR 2023

[4] BakedSDF: Meshing Neural SDFs for Real-Time View Synthesis, SIGGRAPH 2023

[5] Instant Neural Graphics Primitives with a Multiresolution Hash Encoding, SIGGRAPH 2022

[6] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects, arXiv 2024