Stability AI has yet again come up with a breakthrough model in AI image generation. I am talking about SDXL Turbo!
SDXL Turbo uses the Adversarial Diffusion Distillation approach to achieve state-of-the-art performance. You no longer have to run many inference steps and wait for good-quality image generation: SDXL Turbo can produce a high-quality image in just 1-4 steps.
SDXL Turbo model weights are available on Hugging Face for anyone to try out.
How Does the Adversarial Diffusion Distillation Technique Speed Up AI Image Generation?
Adversarial Diffusion Distillation or ADD is an innovative approach that condenses the generation process of diffusion models from hundreds of steps to a mere 1-4 steps without sacrificing the quality of the generated images. The result? High-fidelity images produced in real-time.
Diffusion models have gained popularity for their capacity to create highly detailed and diverse images. However, their computational intensity, due to a lengthy inference process, has hindered their practical application. The new ADD method addresses this challenge head-on by introducing a two-pronged training goal.
- The adversarial loss component of ADD ensures that the model outputs images closely resembling real photos right from the first forward pass. This eliminates the blurriness and distortions that have plagued other distillation methods.
- The distillation loss hinges on a pretrained diffusion model (DM) serving as the ‘teacher.’ By leveraging the rich knowledge of an already trained DM, the student retains the strong compositionality that is a signature strength of larger DMs.
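To build intuition, the two signals above are simply combined into a single training objective. The sketch below uses dummy NumPy arrays in place of real model outputs; the hinge-style adversarial loss and the 2.5 weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy sketch of ADD's two-part training objective on dummy data.
# Shapes, the hinge loss form, and the 2.5 weighting are illustrative only.
rng = np.random.default_rng(0)
student_out = rng.standard_normal((4, 3, 8, 8))  # student's one-step sample
teacher_out = rng.standard_normal((4, 3, 8, 8))  # teacher DM's denoised target
disc_score = rng.standard_normal(4)              # discriminator scores on student images

# Distillation loss: match the teacher's output (mean squared error).
distill_loss = np.mean((student_out - teacher_out) ** 2)

# Hinge-style adversarial loss for the generator: push discriminator scores up.
adv_loss = np.mean(np.maximum(0.0, 1.0 - disc_score))

# Weighted sum of the two signals forms the overall training loss.
total_loss = adv_loss + 2.5 * distill_loss
```

In the real method these losses backpropagate into the student diffusion model's weights; here they are just scalars computed from random arrays.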
The researchers have configured ADD so that it operates without classifier-free guidance during inference, further cutting down on the computational load. Remarkably, the model still retains the capacity for iterative refinement, an advantage over previous single-step image generation models like GANs.
To learn more about ADD, refer to this paper.
Run SDXL Turbo in Kaggle Notebook
If you want to learn how to use Kaggle Notebooks to run image generation models, then check out this tutorial.
Let’s open a fresh Kaggle notebook and enable the GPU accelerator. We will start off by installing the Diffusers library along with its dependencies.
!pip install accelerate transformers
!pip install git+https://github.com/huggingface/diffusers
Now we can import and define the SDXL Turbo model pipeline.
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")  # attach pipeline to GPU
Finally, let’s generate our first image with SDXL Turbo. I will use only one inference step.
prompt = "A futuristic space traveler, cinematic and highly detailed."
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
That was fast! I have never seen any diffusion model generate a good-quality image this fast. SDXL Turbo is exceptional and definitely a huge milestone for text-to-image models.
This is an example of image generation from scratch, where the user passes a prompt to an AI model and gets an image as output. However, we can also use SDXL Turbo to generate images conditioned on an input image combined with a text prompt. This is known as image-to-image generation.
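A minimal image-to-image sketch using Diffusers' AutoPipelineForImage2Image is shown below. The input image URL and prompt are example values borrowed from Hugging Face's documentation assets; note that for SDXL Turbo the product num_inference_steps * strength should be at least 1, so 2 steps at strength 0.5 performs a single denoising step.

```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Reuse the SDXL Turbo weights in the image-to-image pipeline.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Any 512x512 input image works; this URL is an example documentation asset.
init_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
).resize((512, 512))

prompt = "cat wizard, detailed, fantasy art"

# For SDXL Turbo, num_inference_steps * strength must be >= 1.
image = pipe(prompt=prompt, image=init_image, num_inference_steps=2,
             strength=0.5, guidance_scale=0.0).images[0]
```

Because strength controls how much of the input image is preserved, lower values keep the composition of the original photo while the prompt steers the style.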