In this tutorial, we will build a multi-hop question-answering system using DSPy, with Qdrant as the data source. We will demonstrate how to load a PDF file, an ebook on global warming, into our vector database. By the end of this tutorial, we'll be able to answer complex questions based on the information contained within the book.
What is DSPy?
DSPy, short for Declarative Self-Improving Language Programs, is a framework that combines Large Language Models (LLMs) and Retrieval Models (RMs) to handle complex tasks. Instead of relying on hand-crafted prompting techniques, it emphasizes a structured, programming-first approach to building language-model pipelines. This makes applications more reliable: the entire pipeline can be recompiled for specific tasks without constant manual adjustments to prompts. You can read more about DSPy here.
Now let's try to understand what Multi-Hop Question Answering is.
Multi-Hop Question Answering
Multi-Hop Question Answering is a task in natural language processing (NLP) where the system needs to find an answer by connecting information from multiple pieces of text or data sources. Unlike single-hop QA, where the answer can be found directly in one passage, multi-hop QA requires reasoning across different sources or parts of the text to arrive at the correct answer.
For example, to answer the question "What is the capital of the country where the Eiffel Tower is located?", the system needs to first identify that the Eiffel Tower is in France and then determine that the capital of France is Paris. This involves two steps, or "hops," connecting multiple pieces of information to provide the correct answer.
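To make the "hop" idea concrete, here is a toy sketch in plain Python. The fact table and the lookup function are invented purely for illustration; a real system would retrieve each fact from a document store rather than a dictionary.

```python
# A toy knowledge base; each "hop" is one lookup.
facts = {
    "Eiffel Tower location": "France",
    "capital of France": "Paris",
}

def answer_two_hop(landmark):
    # Hop 1: find the country containing the landmark.
    country = facts[f"{landmark} location"]
    # Hop 2: find the capital of that country.
    return facts[f"capital of {country}"]

print(answer_two_hop("Eiffel Tower"))  # Paris
```

The key point is that neither fact alone answers the question; the output of the first lookup becomes the input of the second.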
To build a multi-hop QA system, we can use the Baleen approach. Baleen enhances multi-hop QA systems by promoting a modular pipeline development approach, where distinct modules handle specific tasks like fact retrieval, reasoning, and answer synthesis, enabling reusability and dynamic reconfiguration.
Its dynamic recompilation engine adapts the processing pipeline in real-time based on the specific question, improving efficiency and reducing processing time. By emphasizing structured programming practices, Baleen reduces the fragility of traditional prompt-based systems and improves error handling, making the system more reliable.
Additionally, Baleen integrates seamlessly with large language models (LLMs), supporting advanced reasoning and information retrieval, and helps scale multi-hop QA systems to manage more queries and complex tasks effectively.
Our multi-hop QA RAG will be using the Qdrant vector database to retrieve information. Qdrant is an open-source vector database and vector search engine that provides a fast and scalable vector similarity search service. It is designed to handle high-dimensional vectors for performance and massive-scale AI applications. You can learn more about the Qdrant vector database here.
Now let's dive into the code to understand the implementation.
Code Implementation
Following are the dependencies that we need to install:
pip install dspy-ai[qdrant]
pip install pypdf langchain_community
We are installing pypdf to load PDF files, and langchain_community to use the PyPDFLoader tool.
We will begin by loading the PDF file and splitting it into smaller chunks called documents.
from langchain_community.document_loaders import PyPDFLoader
# loading and splitting PDF
loader = PyPDFLoader("global_warming.pdf")
docs = loader.load_and_split()
Initializing Qdrant client
Next, we will initialize the Qdrant client, which will be used to store and fetch the documents. To initialize the Qdrant client we will need the QDRANT_URL and the QDRANT_API_KEY.
import dspy
from dspy.retrieve.qdrant_rm import QdrantRM
from qdrant_client import QdrantClient
from google.colab import userdata

turbo = dspy.OpenAI(model="gpt-3.5-turbo")

qdrantClient = QdrantClient(
    url=userdata.get('QDRANT_URL'),
    prefer_grpc=True,
    api_key=userdata.get('QDRANT_API_KEY'),
)
Storing data in Qdrant DB
After initializing the QdrantClient, we will store the data in the DB. First, we extract the text from each document into the doc_contents list of strings. Next, we call the QdrantClient add function to save the contents of doc_contents. The name of our Qdrant collection is global_warming.
doc_contents = [doc.page_content for doc in docs]
doc_ids = list(range(1, len(docs) + 1))

qdrantClient.add(
    collection_name="global_warming",
    documents=doc_contents,
    ids=doc_ids,
)
After executing the above code, we can find the new collection global_warming, along with its documents, in the Qdrant collection dashboard.
Retrieving documents from Qdrant DB
Let's try to perform a simple retrieval from the Qdrant DB.
from google.colab import userdata
import dspy
from dspy.retrieve.qdrant_rm import QdrantRM
turbo = dspy.OpenAI(model="gpt-3.5-turbo")
qdrant_retriever_model = QdrantRM("global_warming", qdrantClient)
dspy.settings.configure(lm=turbo, rm=qdrant_retriever_model)
retrieve = dspy.Retrieve(k=3)
question = "What is radiation balance?"
topK_passages = retrieve(question).passages

print(f"Top {retrieve.k} passages for question: {question} \n", "\n")

for idx, passage in enumerate(topK_passages):
    print(f"{idx+1}]", passage, "\n")
For our question "What is radiation balance?", we were able to retrieve the top 3 relevant passages from the Qdrant vector DB.
Building Multi-Hop Question Answering Pipeline
So far we have learned to initialize the Qdrant client, store data in the vector DB, and fetch documents from the DB. Now let's dive into the code of Multi-Hop Question Answering.
First, we will define two DSPy signatures: GenerateSearchQuery and GenerateAnswer.
The GenerateSearchQuery signature is the intermediate query-generation step for the hop behavior. The context and the question are sent as input to the signature, and the output is a search query to find missing information.
The GenerateAnswer signature is used to generate answers from the given context. The question and the context are passed to the signature, and it outputs the answer to the question.
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()


class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 10 and 15 words")
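Conceptually, DSPy renders a signature's docstring and fields into a prompt template at call time. The sketch below is a simplified, hypothetical rendering written for this tutorial (it is not DSPy's actual implementation), showing how the docstring becomes the instruction line and each field becomes a labeled slot.

```python
# Hypothetical sketch of how a signature's docstring and fields
# could be rendered into a prompt; not DSPy's real internals.
def render_signature(instructions, inputs, outputs):
    lines = [instructions, "---", "Follow the following format.", ""]
    for name, desc in {**inputs, **outputs}.items():
        lines.append(f"{name.capitalize()}: {desc}")
    return "\n".join(lines)

prompt = render_signature(
    "Answer questions with short factoid answers.",
    inputs={"context": "may contain relevant facts", "question": "${question}"},
    outputs={"answer": "often between 10 and 15 words"},
)
print(prompt)
```

The output mirrors the "Follow the following format." layout that appears in the execution logs later in this post.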
The above two signatures will be used to build the Baleen pipeline. Baleen is designed as a framework to enhance the development of LLM-based applications by prioritizing a programming-first approach over traditional prompt-based techniques. It can significantly enhance the development and performance of multi-hop QA systems.
Here we define a simple module called SimplifiedBaleen. Inside the __init__ method of our module, we define a few sub-modules. generate_query is a list of ChainOfThought modules, one per hop, each performing step-by-step reasoning over the GenerateSearchQuery signature to generate the query for that hop. dspy.Retrieve fetches the top k passages from the vector DB. generate_answer uses the ChainOfThought module over the GenerateAnswer signature to produce the answer to the question.
from dsp.utils import deduplicate

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops

    def forward(self, question):
        context = []
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
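The deduplicate helper from dsp.utils merges the passages accumulated across hops so the final context contains no repeats. A minimal order-preserving equivalent, shown here only to illustrate what the pipeline relies on:

```python
def deduplicate_passages(passages):
    # Keep the first occurrence of each passage, preserving order,
    # so context from earlier hops stays ahead of later hops.
    seen = set()
    unique = []
    for p in passages:
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique

print(deduplicate_passages(["a", "b", "a", "c"]))  # ['a', 'b', 'c']
```

Without this step, passages retrieved in multiple hops would appear in the context more than once and waste prompt tokens.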
Now let's run our code to get an answer from our multi-hop QA.
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
# Question asked to multi-hop QA RAG.
my_question = "What is causing global warming? And what are the different components of the reason?"
# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen() # uncompiled (i.e., zero-shot) program
pred = uncompiled_baleen(my_question)
# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
For the given question "What is causing global warming? And what are the different components of the reason?", we got the answer: "Greenhouse gases, like carbon dioxide, methane, and nitrous oxide, trap heat."
Below are the full output logs:
Question: What is causing global warming? And what are the different
components of the reason?
Predicted Answer: Greenhouse gases, like carbon dioxide, methane,
and nitrous oxide, trap heat.
Retrieved Contexts (truncated): ['Chapter 1\nGlobal warming and climate change\nThe phrase ‘global warming’ has become familiar to many people as\none of the important environmental issues of our day. Many opinions\nhave been expressed co...', 'Notes 75\nexplanation which involve the deep ocean circulation. It is also clear\nfrom paleodata that large changes have occurred at different times in thepast in the formation of deep water and in the ...', 'Notes 13\n2 Make up a simple questionnaire about climate change, global warming and\nthe greenhouse effect to find out how much people know about these sub-\njects, their relevance and importance. Analyse...', 'Chapter 2\nThe greenhouse effect\nThe basic principle of global warming can be understood by considering\nthe radiation energy from the Sun that warms the Earth’s surface and\nthe thermal radiation from t...', 'Chapter 3\nThe greenhouse gases\nThe greenhouse gases are those gases in the atmosphere which, by\nabsorbing thermal radiation emitted by the Earth’s surface, have a blan-\nketing effect upon it. The most...']
To inspect the reasoning it used to generate the answer, run the command below:
turbo.inspect_history(n=3)
Below is the log trace of the multi-hop QA execution. I have removed the vector DB fetched document text from the logs to keep it clean and readable.
The placeholder <context_fetched_from_Qdrant> marks where the DB-fetched documents were present.
Write a simple search query that will help answer a complex question.
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the query}. We ...
Query: ${query}
---
Context:
<context_fetched_from_Qdrant>
Question: What is causing global warming? Give details about the different components resulting in global warming?
Reasoning: Let's think step by step in order to produce the query. We need to identify the different components contributing to global warming in order to understand its causes.
Query: What are the greenhouse gases and their global warming potential (GWP) as listed in the Kyoto Protocol?
Answer questions with short factoid answers.
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 10 and 15 words
---
Context:
<context_fetched_from_Qdrant>
Question: What is causing global warming? Give details about the different components resulting in global warming?
Reasoning: Let's think step by step in order to Answer: Human activities, such as burning fossil fuels, deforestation, and industrial processes.
Answer questions with short factoid answers.
---
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 10 and 15 words
---
Context:
<context_fetched_from_Qdrant>
Question: What is causing global warming? Give details about the different components resulting in global warming?
Reasoning: Let's think step by step in order to Answer: Human activities, such as burning fossil fuels, deforestation, and industrial processes.
Answer: Human activities like burning fossil fuels and deforestation contribute to global warming.
I hope you found this blog on building a multi-hop QA using DSPy useful. If you have any questions regarding the topic, please don't hesitate to ask in the comment section. I will be more than happy to address them.
I regularly create similar content on LangChain, LlamaIndex, Vector databases, and other RAG topics. If you'd like to read more articles like this, consider subscribing to my blog.
If you're in the Generative AI space or the LLM application domain, let's connect on LinkedIn! I'd love to stay connected and continue the conversation. Reach me at: linkedin.com/in/ritobrotoseth