Stability AI has yet again come up with a breakthrough model in AI image generation. I am talking about SDXL Turbo!
SDXL Turbo uses the Adversarial Diffusion Distillation approach to achieve state-of-the-art performance. You no longer have to run many inference steps and wait for good-quality image generation: SDXL Turbo can produce a high-quality image in just 1-4 steps.
SDXL Turbo model weights are available on Hugging Face for anyone to try out.
How Does the Adversarial Diffusion Distillation Technique Speed Up AI Image Generation?
Adversarial Diffusion Distillation or ADD is an innovative approach that condenses the generation process of diffusion models from hundreds of steps to a mere 1-4 steps without sacrificing the quality of the generated images. The result? High-fidelity images produced in real-time.
Diffusion models have gained popularity for their capacity to create highly detailed and diverse images. However, their computational intensity, due to a lengthy inference process, has hindered their practical application. The new ADD method addresses this challenge head-on by introducing a two-pronged training goal.
- The adversarial loss component of ADD ensures that the model outputs images closely resembling real photos right from the first forward pass. This eliminates the blurriness and distortions that have plagued other distillation methods.
- The distillation loss hinges on a pretrained diffusion model (DM) serving as the ‘teacher.’ By leveraging the rich knowledge of an already trained DM, the student retains the strong compositionality that is a signature strength of larger DMs.
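To build intuition, the two signals above are simply combined into a single training objective. The sketch below uses dummy NumPy arrays in place of real model outputs; the hinge-style adversarial loss and the 2.5 weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy sketch of ADD's two-part training objective on dummy data.
# Shapes, the hinge loss form, and the 2.5 weighting are illustrative only.
rng = np.random.default_rng(0)
student_out = rng.standard_normal((4, 3, 8, 8))  # student's one-step sample
teacher_out = rng.standard_normal((4, 3, 8, 8))  # teacher DM's denoised target
disc_score = rng.standard_normal(4)              # discriminator scores on student images

# Distillation loss: match the teacher's output (mean squared error).
distill_loss = np.mean((student_out - teacher_out) ** 2)

# Hinge-style adversarial loss for the generator: push discriminator scores up.
adv_loss = np.mean(np.maximum(0.0, 1.0 - disc_score))

# Weighted sum of the two signals forms the overall training loss.
total_loss = adv_loss + 2.5 * distill_loss
```

In the real method these losses backpropagate into the student diffusion model's weights; here they are just scalars computed from random arrays.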
The researchers have configured ADD so that it operates without classifier-free guidance during inference, further cutting down on the computational load. Remarkably, the model still retains the capacity for iterative refinement, an advantage over previous single-step image generation models like GANs.
To learn more about ADD, refer to this paper.
Run SDXL Turbo in Kaggle Notebook
If you want to learn how to use Kaggle Notebooks to run image generation models, then check out this tutorial.
Let’s open a fresh Kaggle notebook and enable the GPU accelerator. We will start off by installing the Diffusers library along with its dependencies.
!pip install accelerate transformers
!pip install git+https://github.com/huggingface/diffusers
Now we can import and define the SDXL Turbo model pipeline.
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")  # attach pipeline to GPU
Finally, let’s generate our first image with SDXL Turbo. I will use only one inference step.
prompt = "A futuristic space traveler, cinematic and highly detailed."
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
That was fast! I have never seen any diffusion model generate a good-quality image this fast. SDXL Turbo is exceptional and definitely a huge milestone for text-to-image models.
This is an example of image generation from scratch, where the user passes a prompt to an AI model and gets an image as output. However, we can also use SDXL Turbo to generate images conditioned on an input image combined with a text prompt. This is known as image-to-image generation.
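A minimal image-to-image sketch using Diffusers' AutoPipelineForImage2Image is shown below. The input image URL and prompt are example values borrowed from Hugging Face's documentation assets; note that for SDXL Turbo the product num_inference_steps * strength should be at least 1, so 2 steps at strength 0.5 performs a single denoising step.

```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Reuse the SDXL Turbo weights in the image-to-image pipeline.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Any 512x512 input image works; this URL is an example documentation asset.
init_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
).resize((512, 512))

prompt = "cat wizard, detailed, fantasy art"

# For SDXL Turbo, num_inference_steps * strength must be >= 1.
image = pipe(prompt=prompt, image=init_image, num_inference_steps=2,
             strength=0.5, guidance_scale=0.0).images[0]
```

Because strength controls how much of the input image is preserved, lower values keep the composition of the original photo while the prompt steers the style.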