Automatically Generate MCQs using ChatGPT and LangChain

Introduction

Multiple choice questions (MCQs) play a significant role in education and assessment. They provide an objective means of evaluating a learner’s knowledge and understanding of a subject. Since each question has only one correct answer, grading is consistent and less prone to subjective bias, which helps ensure a fair assessment of all students.

Moreover, multiple choice questions are easy to grade, especially with the help of technology. Automated grading systems can quickly and accurately score MCQs, reducing the workload for educators and providing immediate feedback to students.

However, manually creating quality quiz questions is a time-consuming and labor-intensive process. Generative AI models, such as ChatGPT or GPT-4, can significantly reduce the time and effort required to generate a large number of questions for quizzes or exams, allowing educators to focus on other aspects of teaching and learning.

Generating MCQs with AI can therefore benefit educators, students, and the overall educational process. In this article, we will use the ChatGPT API and LangChain to generate multiple choice questions from text data.

Prerequisites

Before diving into the process of generating multiple choice questions using ChatGPT and LangChain, it’s essential to have a basic understanding of the following prerequisites:

  1. Python Programming Knowledge: Familiarity with Python programming is necessary, as the code examples and implementation will be in Python. You should be comfortable with basic programming concepts, such as variables, loops, functions, and libraries.
  2. Natural Language Processing (NLP): A basic understanding of NLP concepts and techniques will be helpful, as the project involves working with text data and language models. Familiarity with tokenization, text preprocessing, and regular expressions will be beneficial.
  3. Access to the OpenAI API: To generate multiple choice questions using GPT, you’ll need access to the OpenAI API for a ChatGPT model. This requires obtaining an API key and setting up authentication to use the model in your Python code.

What is LangChain?

LangChain is a framework for developing applications powered by language models. We will use the LangChain framework with the OpenAI API and Python to generate objective questions with multiple choices from text content. LangChain helps us get the output in a consistent format that can be used in a downstream task.

Implementation of Question Generation using LangChain

Setting Up the Environment

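We will use Google Colab for the implementation of question generation. You can also run the code on your local system, but note that it targets the langchain and openai releases available when this article was written; later releases (for example, version 1.0 and above of the openai package) changed some of the interfaces used here.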

Download the full code from here.

In Colab, we first need to install the langchain and openai libraries.

!pip install langchain
!pip install openai

Import Libraries

Next, import the required libraries and modules. Specify your OpenAI API secret key as well. You can get your OpenAI API key from https://platform.openai.com/account/api-keys.

import os
import re
import json

# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using ChatGPT model (gpt-3.5-turbo)
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

# Enter your API Key
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
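
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of one alternative, the hypothetical helper `load_api_key` below (not part of the original code) reuses the key if the environment variable is already set and otherwise prompts for it without echoing the input.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, ask=getpass, name="OPENAI_API_KEY"):
    """Return the API key from `env`, prompting once if it is missing.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        # getpass hides the typed key so it does not end up in cell output
        key = ask(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} was not provided")
        # Store it so libraries that read the environment can find it
        env[name] = key
    return key
```

Calling `load_api_key()` once before creating the chat model would replace the hardcoded assignment.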

Input Text to Create Questions

The following input text will be used with the OpenAI API. The MCQs will be generated from this text passage.

# prompt
text = """
Adoption of 3D printing has reached critical mass as those who have yet 
to integrate additive manufacturing somewhere in their supply chain are 
now part of an ever-shrinking minority. Where 3D printing was only suitable 
for prototyping and one-off manufacturing in the early stages, it is now 
rapidly transforming into a production technology.

Most of the current demand for 3D printing is industrial in nature. 
Acumen Research and Consulting forecasts the global 3D printing market 
to reach $41 billion by 2026.

As it evolves, 3D printing technology is destined to transform almost 
every major industry and change the way we live, work, and play in the future.
"""

Specify Output Format of Questions using LangChain

A benefit of LangChain is that it lets you specify the output format of large language models such as ChatGPT. We will first define the response schema by naming the components of a multiple choice question: the question, its options, and the correct answer.

We have also added a description for each component. These descriptions help the ChatGPT model understand the role of each component.

# The schema I want out
response_schemas = [
    ResponseSchema(name="question", description="A multiple choice question generated from input text snippet."),
    ResponseSchema(name="option_1", description="First option for the multiple choice question. Use this format: 'a) option'"),
    ResponseSchema(name="option_2", description="Second option for the multiple choice question. Use this format: 'b) option'"),
    ResponseSchema(name="option_3", description="Third option for the multiple choice question. Use this format: 'c) option'"),
    ResponseSchema(name="option_4", description="Fourth option for the multiple choice question. Use this format: 'd) option'"),
    ResponseSchema(name="answer", description="Correct answer for the question. Use this format: 'd) option' or 'b) option', etc.")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()

print(format_instructions)

Output:

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"question": string  // A multiple choice question generated from input text snippet.
	"option_1": string  // First option for the multiple choice question. Use this format: 'a) option'
	"option_2": string  // Second option for the multiple choice question. Use this format: 'b) option'
	"option_3": string  // Third option for the multiple choice question. Use this format: 'c) option'
	"option_4": string  // Fourth option for the multiple choice question. Use this format: 'd) option'
	"answer": string  // Correct answer for the question. Use this format: 'd) option' or 'b) option', etc.
}
```
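
These instructions ask the model to wrap its JSON in a fenced block. To make the expected shape concrete, here is a minimal sketch, using only the standard library, of how a reply containing a single object in this format could be unwrapped and checked against the six field names. It is an illustration and not LangChain's own parser; `extract_fenced_json` and `missing_fields` are hypothetical helpers.

```python
import json
import re

EXPECTED_FIELDS = ["question", "option_1", "option_2",
                   "option_3", "option_4", "answer"]

# Three backticks, built this way so the example is easy to embed in text
FENCE = "`" * 3


def extract_fenced_json(reply):
    """Return the JSON object inside the first json-fenced block of `reply`."""
    pattern = FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no json block found in reply")
    return json.loads(match.group(1))


def missing_fields(obj):
    """List the expected field names that are absent from `obj`."""
    return [field for field in EXPECTED_FIELDS if field not in obj]
```

A reply that passes `extract_fenced_json` and yields an empty list from `missing_fields` has the structure the schema describes.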

Define Prompt Template

Let’s instantiate the ChatGPT model. We will use the gpt-3.5-turbo model to generate the questions.

# create ChatGPT object
chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

Now let’s create the prompt template. It lets us change only the input text in the prompt, without repeating the instructions each time.

# The prompt template that brings it all together
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("""Given a text input, generate multiple choice questions 
        from it along with the correct answer. 
        \n{format_instructions}\n{user_prompt}""")  
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)
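
Conceptually, the template fixes the instructions and the format section once and leaves a single slot for the passage. The sketch below mimics that idea with plain Python string formatting. It is not how LangChain implements templates, and `make_prompt_builder` is a hypothetical name used only for illustration.

```python
def make_prompt_builder(template, **fixed):
    """Bind the fixed variables now and return a function that fills in the rest."""
    def build(**variables):
        return template.format(**fixed, **variables)
    return build


TEMPLATE = (
    "Given a text input, generate multiple choice questions "
    "from it along with the correct answer.\n"
    "{format_instructions}\n{user_prompt}"
)

# The format instructions are bound once; only the passage changes per call
build_prompt = make_prompt_builder(
    TEMPLATE, format_instructions="<format instructions>"
)

print(build_prompt(user_prompt="First passage."))
print(build_prompt(user_prompt="Second passage."))
```

The two calls produce prompts that differ only in their last line, which is the behavior the `partial_variables` argument provides in the template above.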

Generate MCQs using ChatGPT API

user_query = prompt.format_prompt(user_prompt = text)
user_query_output = chat_model(user_query.to_messages())

print(user_query_output.content)

Output:

```json
{
	"question": "What is the current status of adoption of 3D printing?",
	"option_1": "a) It is still in the early stages of development",
	"option_2": "b) It has reached critical mass",
	"option_3": "c) It is only suitable for prototyping",
	"option_4": "d) It is rapidly transforming into a production technology",
	"answer": "b) It has reached critical mass"
}
{
	"question": "What is the forecasted size of the global 3D printing market by 2026?",
	"option_1": "a) $41 million",
	"option_2": "b) $41 billion",
	"option_3": "c) $41 trillion",
	"option_4": "d) $41 quadrillion",
	"answer": "b) $41 billion"
}
{
	"question": "What is the potential impact of 3D printing technology?",
	"option_1": "a) It will only impact a few industries",
	"option_2": "b) It will transform almost every major industry",
	"option_3": "c) It will have no impact on our lives",
	"option_4": "d) It will only impact the way we play",
	"answer": "b) It will transform almost every major industry"
}
```

So, we have successfully generated multiple choice questions and their options with the help of the ChatGPT API. However, the output is a single markdown-formatted string. We have to convert it into a list of dictionaries before the questions can be used in a downstream task.

Post Processing of Output Text

Now we will use regular expressions to convert the ChatGPT output into a list of dictionaries, one per question.

def parse_json_like(text):
    # Regular expression pattern to match JSON objects
    object_pattern = r'\{.*?\}'
    # Regular expression pattern to capture key-value pairs
    pair_pattern = r'"(.*?)":\s*"(.*?)"'

    # Find all JSON-like objects in the text
    objects = re.findall(object_pattern, text, re.DOTALL)

    # List to store dictionaries
    dict_list = []

    # Iterate over each object
    for obj in objects:
        # Find all key-value pairs within the object
        pairs = re.findall(pair_pattern, obj)

        # Convert pair tuples into a dictionary
        d = {key: value for key, value in pairs}
        dict_list.append(d)

    return dict_list

# Parse the string into list of dictionaries
parsed_questions = parse_json_like(user_query_output.content)

parsed_questions

Output:

[{'question': 'What is the current status of adoption of 3D printing?',
  'option_1': 'a) It is still in the early stages of development',
  'option_2': 'b) It has reached critical mass',
  'option_3': 'c) It is only suitable for prototyping',
  'option_4': 'd) It is rapidly transforming into a production technology',
  'answer': 'b) It has reached critical mass'},
 {'question': 'What is the forecasted size of the global 3D printing market by 2026?',
  'option_1': 'a) $41 million',
  'option_2': 'b) $41 billion',
  'option_3': 'c) $41 trillion',
  'option_4': 'd) $41 quadrillion',
  'answer': 'b) $41 billion'},
 {'question': 'What is the potential impact of 3D printing technology?',
  'option_1': 'a) It will only impact a few industries',
  'option_2': 'b) It will transform almost every major industry',
  'option_3': 'c) It will have no impact on our lives',
  'option_4': 'd) It will only impact the way we play',
  'answer': 'b) It will transform almost every major industry'}]
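
The regular expressions above work for this reply, but they would cut a value short if it contained an escaped quote or a closing brace. As a minimal alternative sketch, assuming the reply contains well-formed JSON objects, the standard library's `json.JSONDecoder.raw_decode` can find where each object ends. `parse_json_objects` is a hypothetical name.

```python
import json


def parse_json_objects(text):
    """Return every top-level JSON object found in `text`, in order."""
    decoder = json.JSONDecoder()
    objects = []
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # returns it together with the index where it ended
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict):
            objects.append(obj)
        pos = text.find("{", end)
    return objects
```

Calling `parse_json_objects(user_query_output.content)` should give the same kind of list of dictionaries, with string escapes decoded by the JSON parser instead of by pattern matching.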

We can use much larger text documents as input and generate more questions. The quality of the questions can further be improved by using more descriptive prompts or instructions.
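
Longer documents usually need to be split so that each request stays within the model's context window. The sketch below is an assumption and not part of the original notebook: it groups paragraphs into chunks under a character budget. `chunk_paragraphs` and the default `max_chars` value are hypothetical, and a token-based limit would be more precise.

```python
def chunk_paragraphs(text, max_chars=1500):
    """Group blank-line-separated paragraphs into chunks of at most `max_chars`.

    A single paragraph longer than `max_chars` is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `user_prompt` to the prompt template in turn, and the parsed questions from all chunks collected into one list.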


Question Generation using OpenAI’s Function Calling

Another way to generate content in a fixed format is function calling, a feature of the OpenAI API that works with the GPT-4 and GPT-3.5-turbo models.

With this method, you don’t have to install LangChain.

Let’s use Function Calling to generate multiple choice questions for the same input text.

import openai
import json

# Set OpenAI API key
openai.api_key = "YOUR OpenAI API KEY"

text = """
Adoption of 3D printing has reached critical mass as those who have yet 
to integrate additive manufacturing somewhere in their supply chain are 
now part of an ever-shrinking minority. Where 3D printing was only suitable 
for prototyping and one-off manufacturing in the early stages, it is now 
rapidly transforming into a production technology.

Most of the current demand for 3D printing is industrial in nature. 
Acumen Research and Consulting forecasts the global 3D printing market 
to reach $41 billion by 2026.

As it evolves, 3D printing technology is destined to transform almost 
every major industry and change the way we live, work, and play in the future.
"""

Below, we define the structure of the function that will be used for function calling with the GPT-4 model.

The name of the function is “create_mcq” and its description is “Create multiple choice questions from the input text with four candidate options. Three options are incorrect and one is correct. Indicate the correct option after each question.”

The description is important here: it tells the GPT-4 model what the function is supposed to do.

The “parameters” field contains the format of the output that we want GPT-4 to follow while generating its response.

mcq_function = [
    {
        "name": "create_mcq",
        "description": "Create multiple choice questions from the input text with four candidate options. Three options are incorrect and one is correct. Indicate the correct option after each question.",
        "parameters": {
            "type": "object",
            "properties": {
                "questions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "question": {
                                "type": "string",
                                "description": "A multiple choice question extracted from the input text.",
                            },
                            "options": {
                                "type": "array",
                                "items": {
                                    "type": "string",
                                    "description": "Candidate option for the extracted multiple choice question.",
                                },
                            },
                            "answer": {
                                "type": "string",
                                "description": "Correct option for the multiple choice question.",
                            },
                        },
                    },
                }
            },
            # "required" is part of the JSON Schema, so it belongs inside "parameters"
            "required": ["questions"],
        },
    }
]

Now we will pass the input text passage and the function defined in the previous section to extract the questions along with their candidate options and the correct answers in a consistent format.

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": text}],
    functions=mcq_function,
    # Force the model to call create_mcq instead of replying in free text
    function_call={"name": "create_mcq"},
)

print(response['choices'][0]['message']['function_call']['arguments'])

Output:

{
  "questions": [
    {
      "question": "What is one area where 3D printing is increasingly being used?",
      "options": [
        "Food processing",
        "Fashion design",
        "Production technology",
        "Event planning"
      ],
      "answer": "Production technology"
    },
    {
      "question": "What is the projected global market value of 3D printing by 2026, according to Acumen Research and Consulting?",
      "options": [
        "$30 billion",
        "$41 billion",
        "$50 billion",
        "$60 billion"
      ],
      "answer": "$41 billion"
    },
    {
      "question": "Which statement best describes the evolution of 3D printing?",
      "options": [
        "It has not evolved beyond prototyping.",
        "Its adoption is declining.",
        "It is transforming nearly all major industries.",
        "It has no future impact."
      ],
      "answer": "It is transforming nearly all major industries."
    }
  ]
}

As you can see, the final output is a well-structured set of multiple choice questions with answers in JSON format.
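
The `arguments` field is returned as a JSON string, and the model is not guaranteed to follow the schema exactly, so it is worth checking the result before use. Here is a minimal sketch, with `validate_mcqs` as a hypothetical helper, that loads the string and separates questions that have four options and an answer matching one of them from those that do not.

```python
import json


def validate_mcqs(arguments_json, n_options=4):
    """Parse function-call arguments and split questions into valid and rejected."""
    data = json.loads(arguments_json)
    valid, rejected = [], []
    for item in data.get("questions", []):
        options = item.get("options", [])
        ok = (
            isinstance(item.get("question"), str)
            and isinstance(options, list)
            and len(options) == n_options
            # The stated answer must be one of the listed options
            and item.get("answer") in options
        )
        (valid if ok else rejected).append(item)
    return valid, rejected
```

Passing `response['choices'][0]['message']['function_call']['arguments']` to `validate_mcqs` would return the usable questions as Python dictionaries, along with any that need to be regenerated or reviewed by hand.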

Conclusion

In this article, we explored the process of generating multiple choice questions automatically from text data using ChatGPT/OpenAI API, LangChain and Function Calling. By leveraging AI, educators can save time and resources while creating personalized, diverse, and adaptive assessments for students.

The step-by-step process outlined in the article to generate quiz questions serves as a foundation for further exploration and improvement in the field of automatic content generation. By experimenting with the provided code and techniques, you can contribute to the development of more advanced and effective educational tools that harness the power of generative AI and large language models (LLMs).
