Merge a Logo with an Image using ControlNet

In this tutorial, we will use Stable Diffusion, ControlNet, and the Diffusers Python library to merge a logo with AI-generated images. The code was written for Google Colab, but you can run it on your own system as well if you have access to a decent GPU.

What is ControlNet?

ControlNet is a deep learning model that helps control the image generation of Stable Diffusion models. For example, with the help of ControlNet, we can take the pose of a person and use it to generate different types of images, each featuring someone in the same pose.

[Image: ControlNet Canny example. Source: huggingface.co/blog/controlnet]

In addition to pose, ControlNet can condition image generation on other properties of an image, such as the depth of the objects, the edges in the image, or a segmentation of the image into different objects.
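Each conditioning type has its own ControlNet checkpoint, so switching from one to another is mostly a matter of loading a different model. Here is a minimal sketch (the lllyasviel/sd-controlnet-depth checkpoint is one of the officially released conditionings; later in this tutorial we will load the Canny variant the same way):

import torch
from diffusers import ControlNetModel

#load a ControlNet checkpoint conditioned on depth maps instead of edges
controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth",
    torch_dtype=torch.float16
)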

In this tutorial, we will use ControlNet with Python in Colab, but you can also use ControlNet with the no-code tool Automatic1111.

Major libraries used in the code

Apart from the usual libraries like numpy, os, PIL, and matplotlib, we will also use the following libraries:

  • torch – 2.0.0
  • diffusers – 0.19.1
  • transformers – 4.30.2

Make sure you use the same versions of these libraries or else the code might break for you.
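In Colab, you can pin these versions with a single cell (a minimal example; Colab typically ships numpy, PIL, matplotlib, and OpenCV already, so only these three need pinning):

!pip install torch==2.0.0 diffusers==0.19.1 transformers==4.30.2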

Import packages and models

Let’s import the required modules and libraries for image generation using Stable Diffusion.

import os
import cv2
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

Create a pipeline with the Stable Diffusion model and the ControlNet model. We will be using the dreamlike-photoreal-2.0 Stable Diffusion model for this task, but you can use any other diffusion model as well.

The Diffusers library allows us to easily import both ControlNet and Stable Diffusion models from Hugging Face.

#load the ControlNet model trained on Canny edge maps
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny",
    torch_dtype=torch.float16
)

#plug the ControlNet into a Stable Diffusion pipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
)

#use GPU
pipe.to("cuda")
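If you are running low on GPU memory, Diffusers can instead offload idle model components to the CPU. This is optional and should be called instead of pipe.to("cuda") (it requires the accelerate package):

#optional: offload idle components to the CPU to save VRAM
#call this instead of pipe.to("cuda")
pipe.enable_model_cpu_offload()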

Preprocess Image

Let’s load the logo image. This is the image that we will merge with the images generated by the dreamlike-photoreal-2.0 model.

sample_image = Image.open("step.png")
sample_image

The next step is to apply a Canny edge detector to this logo.

#convert image from PIL format to numpy array
canny_image = np.array(sample_image)

low_threshold = 100
high_threshold = 200

canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)

plt.imshow(canny_image)

Expand canny image

The default size of the images generated by the Stable Diffusion model is 512 x 512. It is advisable for the ControlNet input image to have the same dimensions. Therefore, we will use the function below to expand the logo image to 512 x 512.

def expand_image(img):
    #create a black canvas of dimensions 512 x 512
    background = np.zeros((512, 512), dtype=np.uint8)

    h, w = img.shape[:2]
    y, x = 100, 200 #top-left position where the logo is pasted

    #ensure the pasted image doesn't go beyond the bounds of the background
    y_end, x_end = min(y + h, background.shape[0]), min(x + w, background.shape[1])

    background[y:y_end, x:x_end] = img[:y_end - y, :x_end - x]

    return background
#expand the logo-image
big_canny = expand_image(canny_image)

#convert numpy array to PIL image
big_canny = Image.fromarray(big_canny)

#display expanded canny image
big_canny

As you can see, now we have a bigger image with Canny edges of the logo highlighted.
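The function above pastes the logo at a fixed offset of (100, 200). If you would rather center the logo on the canvas, you can compute the offsets from the canvas and logo sizes. A small, hypothetical variation (it assumes the logo is smaller than the canvas):

def expand_image_centered(img, size=512):
    #create a black canvas of the target size
    background = np.zeros((size, size), dtype=np.uint8)

    h, w = img.shape[:2]
    #offsets that center the logo; assumes h, w <= size
    y, x = (size - h) // 2, (size - w) // 2

    background[y:y + h, x:x + w] = img
    return background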

Generate images using Stable Diffusion

Now we will generate a few images with the help of a prompt.

prompt = "((top view from drone)), (an extremely detailed natural landscape), vegetation, plants, hdr, 4k, ((volumetric lights)), digital painting, beautiful, colorful, serene, intricate, slow shutter speed"

negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

image = pipe(
    prompt,
    big_canny,
    negative_prompt=negative_prompt,
    guidance_scale=9,
    num_images_per_prompt=6,
    num_inference_steps=35
).images
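Generation is stochastic, so your results will differ from run to run. If you want reproducible outputs, you can pass a seeded generator to the pipeline (optional; the seed value is arbitrary):

#optional: seed the pipeline for reproducible results
generator = torch.Generator(device="cuda").manual_seed(42)
#then add generator=generator to the pipe(...) call above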

Let’s define a function to display the images in a grid format.

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
#display generated images in a 3 x 2 grid
image_grid(image, 3, 2)

As you can see, the logo has naturally merged with these colorful images generated from the dreamlike-photoreal-2.0 model. We can further refine these images by using a better prompt and fine-tuning the hyperparameters like the number of inference steps and the guidance scale.
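If you want to keep the results, you can save each generated image to disk (a simple example; the file names are arbitrary):

#save each generated image to disk
for i, img in enumerate(image):
    img.save(f"generated_{i}.png")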
