While recent work on text-conditional 3D object generation has shown promising results, state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state of the art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at this https URL.
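To make the two-stage pipeline concrete, the sketch below shows its overall structure in Python. It is a minimal illustration, not the released implementation: `text_to_image_model` and `image_to_point_cloud_model` are hypothetical stand-ins for the two diffusion models described above, and the tensor shapes are assumptions chosen only for clarity.

```python
# Sketch of the two-stage text -> image -> point cloud pipeline.
# The model functions below are placeholders, not the released API.
import numpy as np


def text_to_image_model(prompt: str) -> np.ndarray:
    # Placeholder: a real text-to-image diffusion model would iteratively
    # denoise an image conditioned on the text prompt.
    return np.zeros((256, 256, 3), dtype=np.uint8)


def image_to_point_cloud_model(image: np.ndarray, num_points: int = 4096) -> np.ndarray:
    # Placeholder: a real image-conditional diffusion model would denoise a
    # set of colored points conditioned on the synthetic view.
    return np.zeros((num_points, 6), dtype=np.float32)  # (x, y, z, r, g, b) per point


def text_to_point_cloud(prompt: str) -> np.ndarray:
    """Stage 1: synthesize a single view; stage 2: condition a point cloud on it."""
    synthetic_view = text_to_image_model(prompt)
    return image_to_point_cloud_model(synthetic_view)


if __name__ == "__main__":
    cloud = text_to_point_cloud("a red traffic cone")
    print(cloud.shape)  # (4096, 6): coordinates plus RGB for each point
```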