Now Access Google's Gemini Pro Models in Kaggle Notebooks For Free

Good news, folks! Google’s Gemini Pro and Gemini Pro Vision models are now available on Kaggle for free. That means you can run tasks such as text generation and image understanding in Kaggle Notebooks for free.

Gemini models are Google's top-of-the-line generative AI models right now. They have exceptional image, audio, video, and text understanding capabilities, making them a great choice for a wide range of use cases.

I will show you how to use Gemini models in a Kaggle Notebook in this article.


Get Gemini API key

To access the Gemini Pro and Gemini Pro Vision models, you first need a Gemini API key.

To get your API key, visit makersuite.google.com/app/apikey and sign in with your Google account. Then click on “Create API key in new project” to generate an API key, and copy the key somewhere safe.

Save Gemini API key in Kaggle Secrets

Kaggle provides quite a useful feature in its Notebooks called Secrets.

Secrets allow you to store sensitive information, like API keys, passwords, and other secret information, securely.

So, open a fresh Kaggle Notebook, go to the Add-ons option, and select Secrets from the dropdown menu.

Then store your Gemini API key and assign it a label. I am using the label “GEMINI_API_KEY” for my API key.

Gemini Pro in Action – Text Generation

Let’s use the Gemini Pro model to generate some text. Before that, let’s configure the Gemini API with the key stored in Kaggle Secrets.

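from PIL import Image  # used later to load the bill image
from kaggle_secrets import UserSecretsClient
import google.generativeai as genai

# read the API key stored under the "GEMINI_API_KEY" label in Kaggle Secrets
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GEMINI_API_KEY")

# authenticate all subsequent Gemini API calls
genai.configure(api_key=api_key)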
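If you ever run the same notebook outside Kaggle, the kaggle_secrets module will not be available. Below is a minimal sketch of a loader that tries Kaggle Secrets first and then falls back to an environment variable with the same name. The helper name load_gemini_api_key and the environment-variable fallback are my own assumptions for illustration, not part of Kaggle's or Google's tooling.

```python
import os


def load_gemini_api_key(label="GEMINI_API_KEY"):
    """Return the API key stored under `label` (hypothetical helper).

    Tries Kaggle Secrets first; if that is unavailable (for example when
    running outside Kaggle), falls back to an environment variable with
    the same name.
    """
    try:
        # Only importable inside a Kaggle Notebook.
        from kaggle_secrets import UserSecretsClient
        key = UserSecretsClient().get_secret(label)
    except Exception:
        # Assumption: any failure here means "no Kaggle secret available".
        key = os.environ.get(label)
    if not key:
        raise RuntimeError(f"No API key found under the label {label!r}.")
    return key
```

You could then pass the result to the configure call, for example genai.configure(api_key=load_gemini_api_key()).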

Now we can use the Gemini Pro large language model to perform text generation based on a prompt.

model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("How to make Fresh Lime Soda?")

# display the LLM response (the notebook shows the raw string)
response.text

Output:

'Ingredients:

- 1/2 cup fresh lime juice (from about 4 limes)

- 1/2 cup sugar

- 2 cups sparkling water

- Lime slices, for garnish

Instructions:

1. In a small saucepan, combine the lime juice and sugar. Bring to a simmer over medium heat, stirring occasionally to dissolve the sugar.

2. Remove from heat and let cool for 5 minutes.

3. Pour the lime juice mixture into a large pitcher or serving container.

4. Add the sparkling water and stir to combine.

5. Garnish with lime slices and serve immediately.

Tips:

- For a sweeter soda, add more sugar to taste.

- For a more tart soda, add more lime juice.

- You can also add other flavors to your soda, such as mint, ginger, or cucumber.

- To make a fizzy lime soda, use sparkling water that has not been opened.

- Serve your soda immediately for the best flavor.'

As you can see, the response from Gemini Pro is detailed and well structured, with ingredients, step-by-step instructions, and tips. The quality is comparable to what you would expect from OpenAI models.

Gemini Pro Vision in Action – Image Understanding

With the same API, we can also access the Gemini Pro Vision model. I will show you an interesting use case where we can use this model for visual question-answering.

I have an image of a restaurant bill that I will pass to the Gemini Pro Vision model and ask a few questions.

Let’s load the image.

# load image
bill_image = Image.open("sample_bill.JPG")

# display image
bill_image

Now load the vision language model (VLM).

model = genai.GenerativeModel('gemini-pro-vision')

Now I will use this vision model to fetch two pieces of information:

(a) List all the items on the bill along with their total prices only.

(b) Find the VAT amount on the bill.

instructions = """List down all the items and the total amount for each item. 
                  Ignore Qty and Rate.
                  What is the VAT amount?"""

response = model.generate_content([instructions, bill_image])

print(response.text)

Output:
- Deviled Crab - 296.00
- Fresh Lime Soda - 130.00
- Fish Diana - 378.00
- Surf & Turf - 493.00
- P. A. Souffle - 133.00

**VAT amount** = 207.35
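If you want to use these values in code, you can parse the model's text into Python objects. The sketch below assumes the response follows exactly the layout shown above ("- item - amount" lines plus a "VAT amount" line). The model's formatting can vary between runs, so treat the patterns as a starting point. The function name parse_bill is hypothetical, and sample_output is simply the output above copied into a string.

```python
import re

# The output shown above, copied into a string for demonstration.
sample_output = """- Deviled Crab - 296.00
- Fresh Lime Soda - 130.00
- Fish Diana - 378.00
- Surf & Turf - 493.00
- P. A. Souffle - 133.00

**VAT amount** = 207.35"""


def parse_bill(text):
    """Return (items, vat): a dict of item name -> amount, and the VAT amount."""
    items = {}
    vat = None
    for line in text.splitlines():
        line = line.strip()
        # Item lines look like "- <name> - <amount>".
        item_match = re.match(r"^-\s*(.+?)\s+-\s+(\d+(?:\.\d+)?)$", line)
        if item_match:
            items[item_match.group(1)] = float(item_match.group(2))
            continue
        # VAT line looks like "**VAT amount** = <amount>".
        vat_match = re.search(r"VAT amount\**\s*=\s*(\d+(?:\.\d+)?)", line)
        if vat_match:
            vat = float(vat_match.group(1))
    return items, vat


items, vat = parse_bill(sample_output)
print(items)
print("Sum of items:", sum(items.values()))
print("VAT:", vat)
```

In the notebook you would pass response.text to parse_bill instead of the copied string.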

Awesome! This is what true democratization of AI looks like.

If you want to take this to the next level, try using the Gemini Pro models to perform question-answering on your own custom data using the retrieval-augmented generation (RAG) technique.
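To give a rough idea of what that involves, here is a deliberately simplified sketch of the retrieval and prompt-building steps. It ranks documents by plain word overlap with the question, which is only a stand-in for the embedding-based search a real RAG pipeline would typically use. The function names and the tiny document list are hypothetical and exist only for illustration.

```python
import re


def _words(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return the top_k documents sharing the most words with the question."""
    q_words = _words(question)
    ranked = sorted(documents, key=lambda d: len(q_words & _words(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question, documents, top_k=2):
    """Build a prompt that asks the model to answer from retrieved context only."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical custom data, made up from statements in this article.
docs = [
    "Kaggle Secrets store API keys and passwords securely.",
    "Gemini Pro generates text from a prompt.",
    "Gemini Pro Vision answers questions about images.",
]
prompt = build_rag_prompt("Which model answers questions about images?", docs, top_k=1)
print(prompt)
```

The resulting prompt string could then be passed to model.generate_content(prompt) with the 'gemini-pro' model configured earlier.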

Prateek
