Although recent work on text-conditional 3D object generation has shown promising results, state-of-the-art methods typically require several GPU hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in seconds or minutes. In this article, we explore an alternative 3D object generation method that produces 3D models in just 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, then produces a 3D point cloud using a second diffusion model that conditions on the generated image. Although our method still falls short of the state of the art in terms of sample quality, it is one to two orders of magnitude faster to sample, providing a practical compromise for certain use cases. We publish our pre-trained point cloud diffusion models, along with our code and evaluation models, at this https URL.
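The two-stage pipeline described above can be sketched in code. This is a minimal illustration only: the function names and stub bodies below are hypothetical stand-ins for the two diffusion models (the real system uses large neural networks, not random generators), showing only how the stages compose, with the second stage conditioned on the first stage's output image.

```python
import random

def text_to_image_model(prompt, size=64):
    """Stage 1 (hypothetical stub): generate a single synthetic view
    (an RGB image as a size x size grid of (r, g, b) tuples) from text."""
    rng = random.Random(prompt)  # deterministic per prompt, for illustration
    return [[(rng.random(), rng.random(), rng.random()) for _ in range(size)]
            for _ in range(size)]

def image_to_point_cloud_model(image, n_points=1024):
    """Stage 2 (hypothetical stub): generate an n_points-long list of
    (x, y, z) points, conditioned on the generated image."""
    rng = random.Random(repr(image[0][0]))  # "condition" on the image content
    return [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
            for _ in range(n_points)]

def text_to_point_cloud(prompt, n_points=1024):
    """Compose the two stages: text -> single view -> 3D point cloud."""
    view = text_to_image_model(prompt)
    return image_to_point_cloud_model(view, n_points)

cloud = text_to_point_cloud("a red traffic cone")
print(len(cloud), len(cloud[0]))  # 1024 points, 3 coordinates each
```

The key design point is that the only interface between the stages is the generated image, which is why sampling is so much faster than methods that optimize a 3D representation directly against the text prompt.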
A system for generating 3D point clouds from complex prompts