In this tutorial, we will use Stable Diffusion, ControlNet, and the Diffusers Python library to merge a logo with AI-generated images. The code was written for Google Colab, but you can also run it on your own system if you have a CUDA-capable GPU.
What is ControlNet?
ControlNet is a deep learning model that controls the image generation of Stable Diffusion models. For example, with ControlNet we can take the pose of a person and use it to generate different images, each showing someone in the same pose.
In addition to pose, ControlNet can condition image generation on other properties of an image, such as the depth of the objects, the edges in the image, or a segmentation of the image into different objects.
In this tutorial, we will use ControlNet with Python in Colab, but you can also use ControlNet with the no-code tool Automatic1111.
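Each conditioning type mentioned above is typically served by its own ControlNet checkpoint. The lookup below is a minimal sketch of how one might switch between them. The checkpoint names other than the Canny one are assumptions that follow the same naming as the model used later in this tutorial, so verify them on the Hugging Face Hub before relying on them. The helper `checkpoint_for` is hypothetical and not part of Diffusers.

```python
# Conditioning type -> ControlNet checkpoint name.
# Only the "canny" entry is used in this tutorial; the others are assumed
# to follow the same naming scheme and should be verified on the Hub.
CONTROLNET_CHECKPOINTS = {
    "canny": "lllyasviel/sd-controlnet-canny",        # edges
    "openpose": "lllyasviel/sd-controlnet-openpose",  # human pose
    "depth": "lllyasviel/sd-controlnet-depth",        # depth map
    "seg": "lllyasviel/sd-controlnet-seg",            # segmentation map
}


def checkpoint_for(condition):
    """Return the checkpoint name for a conditioning type (case-insensitive)."""
    try:
        return CONTROLNET_CHECKPOINTS[condition.lower()]
    except KeyError:
        known = ", ".join(sorted(CONTROLNET_CHECKPOINTS))
        raise ValueError(f"Unknown condition {condition!r}; expected one of: {known}") from None


print(checkpoint_for("canny"))
```

The returned string could then be passed to `ControlNetModel.from_pretrained` in place of the hard-coded Canny checkpoint.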
Major libraries used in the code
Apart from the usual libraries like numpy, os, PIL, and matplotlib, we will also use the following libraries:
- torch – 2.0.0
- diffusers – 0.19.1
- transformers – 4.30.2
Make sure you use the same versions of these libraries; otherwise, the code might break.
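One way to confirm the installed versions before running the rest of the notebook is a small check like the sketch below. It uses only the standard library. The helper name `find_mismatches` is made up for this example, and the handling of local version suffixes such as "+cu118" is an assumption about how the installed wheels are labeled.

```python
from importlib import metadata

# Versions listed in this tutorial.
PINNED = {"torch": "2.0.0", "diffusers": "0.19.1", "transformers": "4.30.2"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore a local build suffix such as "2.0.0+cu118" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, the installed versions match the ones listed above.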
Import packages and models
Let’s import the required modules and libraries for image generation using Stable Diffusion.
import os
import cv2
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

Create a pipeline with the Stable Diffusion model and the ControlNet model. We will be using the dreamlike-photoreal-2.0 Stable Diffusion model for this task. You can use another Stable Diffusion model as well, as long as it is compatible with the ControlNet checkpoint.
The Diffusers library allows us to easily import both ControlNet and Stable Diffusion models from Hugging Face.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)

# use GPU
pipe.to("cuda")
Let’s load the logo image. This is the image that we will merge with the images generated by the dreamlike-photoreal-2.0 model.
sample_image = Image.open("step.png")
sample_image

The next step is to apply a Canny edge detector to this logo.

# convert image from PIL format to NumPy array
canny_image = np.array(sample_image)

low_threshold = 100
high_threshold = 200
canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)
plt.imshow(canny_image)
Expand canny image
The default size of the images generated by the Stable Diffusion model is 512 x 512. It is advisable for the ControlNet input image to have the same dimensions. Therefore, we will use the function below to expand the Canny image of the logo to 512 x 512 by placing it on a black canvas.

def expand_image(img):
    # create a black canvas of dimensions 512 x 512
    background = np.zeros((512, 512), dtype=np.uint8)

    h, w = img.shape[:2]
    y, x = 100, 200  # top-left position (row, column) of the logo on the canvas

    # ensure the pasted image doesn't go beyond the bounds of the background
    y_end = min(y + h, background.shape[0])
    x_end = min(x + w, background.shape[1])
    background[y:y_end, x:x_end] = img[:y_end - y, :x_end - x]
    return background

# expand the logo image
big_canny = expand_image(canny_image)

# convert NumPy array to PIL image
big_canny = Image.fromarray(big_canny)

# display expanded Canny image
big_canny

As you can see, we now have a bigger image with the Canny edges of the logo highlighted.
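The position y, x = 100, 200 inside expand_image is hard-coded, so a logo of a different size may end up off-center or clipped. As a hedged alternative, the sketch below computes the top-left position that centers an image on the canvas. The helper `centered_offset` is hypothetical and not part of the original code.

```python
def centered_offset(canvas_size, image_size):
    """Return the (y, x) top-left position that centers the image on the canvas.

    Both sizes are (height, width) tuples. If the image is larger than the
    canvas along an axis, the offset for that axis is clamped to 0.
    """
    canvas_h, canvas_w = canvas_size
    img_h, img_w = image_size
    y = max((canvas_h - img_h) // 2, 0)
    x = max((canvas_w - img_w) // 2, 0)
    return y, x


# A 100 x 200 (height x width) edge image on a 512 x 512 canvas.
print(centered_offset((512, 512), (100, 200)))  # (206, 156)
```

Inside expand_image, the hard-coded pair could then be replaced with something like y, x = centered_offset(background.shape, img.shape[:2]).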
Generate images using Stable Diffusion
Now we will generate a few images with the help of a prompt.
prompt = "((top view from drone)), (an extremely detailed natural landscape), vegetation, plants, hdr, 4k, ((volumetric lights)), digital painting, beautiful, colorful, serene, intricate, slow shutter speed"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

image = pipe(
    prompt,
    big_canny,
    negative_prompt=negative_prompt,
    guidance_scale=9,
    num_images_per_prompt=6,
    num_inference_steps=35,
).images
Let’s define a function to display the images in a grid format.
def image_grid(imgs, rows, cols):
    # the grid must have exactly one cell per image
    assert len(imgs) == rows * cols

    # all images are assumed to share the size of the first one
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    # paste images row by row, left to right
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
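Because image_grid asserts that rows * cols equals the number of images, changing num_images_per_prompt also means changing the grid shape. The sketch below shows one possible way to derive a shape automatically. The helper `grid_shape` is an illustrative addition, not part of the original code.

```python
import math


def grid_shape(n_images):
    """Return (rows, cols) with rows * cols == n_images, as square as possible.

    cols is the largest divisor of n_images that does not exceed its square
    root, so rows >= cols. A prime count falls back to a single column.
    """
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    cols = max(d for d in range(1, math.isqrt(n_images) + 1) if n_images % d == 0)
    return n_images // cols, cols


print(grid_shape(6))  # (3, 2)
```

With this helper, a call such as image_grid(image, *grid_shape(len(image))) would satisfy the assertion for any number of generated images.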
# display generated images in a 3 x 2 grid
image_grid(image, 3, 2)

As you can see, the logo has merged naturally with these colorful images generated by the dreamlike-photoreal-2.0 model. We can further refine these images by using a better prompt and by tuning hyperparameters such as the number of inference steps and the guidance scale.
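A simple way to explore these hyperparameters is to sweep over a few combinations and compare the results. The sketch below only builds the list of settings. The specific values are illustrative assumptions rather than recommendations, and the helper `parameter_grid` is hypothetical.

```python
from itertools import product


def parameter_grid(guidance_scales, step_counts):
    """Return one keyword-argument dict per (guidance scale, step count) pair."""
    return [
        {"guidance_scale": scale, "num_inference_steps": steps}
        for scale, steps in product(guidance_scales, step_counts)
    ]


# Illustrative values around the ones used in this tutorial (9 and 35).
settings = parameter_grid([7, 9, 11], [25, 35])

for kwargs in settings:
    print(kwargs)
    # With the pipeline from this tutorial loaded, each setting could be used as:
    # images = pipe(prompt, big_canny, negative_prompt=negative_prompt, **kwargs).images
```

Each setting triggers a full generation run, so it is worth keeping the grid small on a shared Colab GPU.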