Retrieval and question answering (QA) are important building blocks of LLM applications because they allow LLMs to answer questions grounded in external data. They matter across industries, with applications in customer service, education, and healthcare.
Retrieval is the process of finding the most relevant documents from a large corpus of text.
QA is a task in which a system is given a question and must return the most relevant answer. QA systems typically combine retrieval and reasoning: retrieval finds the most relevant documents, and reasoning extracts the answer from them. In this blog, we will see how to get answers to questions over our personal data.
Below, the first image illustrates how personal data is stored in a vector DB, and the second illustrates retrieval and answering of user queries.
Similarity Search
To understand how retrieval works, we first need to understand similarity search. Similarity search compares the vector representations (embeddings) of two documents. An embedding model maps each document to a point in a high-dimensional space, so that documents with similar meaning land close together. The similarity between two documents is then calculated by measuring the distance between their vectors: the closer the vectors, the more similar the documents.
There are a number of different ways to calculate the distance between two vectors. One common method is to use the cosine similarity measure. The cosine similarity measure is calculated by taking the dot product of the two vectors and dividing it by the product of their norms. The dot product of two vectors is the sum of the products of their corresponding elements. The norm of a vector is the square root of the sum of the squares of its elements.
Another common method for calculating the distance between two vectors is to use the Euclidean distance measure. The Euclidean distance measure is calculated by taking the square root of the sum of the squared differences between the corresponding elements of the two vectors.
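The two measures above can be written in a few lines of code. Here is a minimal sketch using NumPy; the function names `cosine_similarity` and `euclidean_distance` are my own for illustration, not from any particular library.

```python
import numpy as np

def cosine_similarity(a, b):
    # dot product of the two vectors, divided by the product of their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # square root of the sum of squared element-wise differences
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

Identical vectors have cosine similarity 1 and Euclidean distance 0, while orthogonal vectors have cosine similarity 0.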
The similarity search algorithm is used to find documents that are similar to a given document. It can also be used to find documents that are similar to a set of documents.
When querying a vector DB with similarity search, it yields a ranked list of documents ordered by relevance, with the most relevant document at index zero. The 'k' parameter, passed at query time, determines how many results are returned.
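Conceptually, this is just scoring every stored vector against the query and keeping the top k. A toy sketch over precomputed embedding vectors (the function name `top_k` is mine; real vector DBs use approximate indexes rather than this brute-force scan):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k most similar documents, most relevant first."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [cos(query_vec, d) for d in doc_vecs]
    # sort indices by descending similarity, keep the first k
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])[:k]
```

The most relevant document lands at index zero of the returned list, matching the behavior described above.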
To optimize selection, we can employ Maximum Marginal Relevance (MMR), an algorithm aimed at enhancing relevance. Below, I provide a concise explanation of MMR, along with a sample Python code snippet for reference.
MMR (Maximum Marginal Relevance)
Maximum Marginal Relevance (MMR) is a technique that improves retrieval quality by iteratively selecting documents that are both relevant to the query and dissimilar to the documents already selected. Starting from a candidate set of relevant documents, each step picks the candidate with the highest marginal relevance, and the process repeats until the desired number of documents has been chosen. MMR has been shown to be effective in improving the performance of retrieval systems.
The trade-off between relevance and diversity is controlled by a weighting parameter (exposed as lambda_mult in LangChain's max_marginal_relevance_search). A higher value (close to 1) prioritizes relevance, while a lower value (close to 0) emphasizes diversity.
A document is deemed to possess high marginal relevance if it satisfies two criteria: relevance to the query and minimal similarity to previously chosen documents. The primary objective is to maximize marginal relevance in tasks involving retrieval and summarization.
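The selection loop described above can be sketched directly. This is a simplified, from-scratch illustration of the MMR scoring rule, not LangChain's implementation; the function name `mmr` and the `lambda_mult` weighting are assumptions modeled on the description above.

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily select k documents balancing query relevance and diversity."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            relevance = cos(query_vec, doc_vecs[i])
            # similarity to the closest already-selected document
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            # marginal relevance: reward relevance, penalize redundancy
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lambda_mult=1.0 this degenerates to plain similarity ranking; lowering it makes the second pick avoid near-duplicates of the first.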
import os
import openai
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
openai.api_key = os.environ['OPENAI_API_KEY']
embedding = OpenAIEmbeddings()
texts = [
"""Siddhartha Gautama, most commonly referred to as the Buddha was a wandering ascetic and religious teacher who lived in South Asia during the 6th or 5th century and founded Buddhism.""",
"""According to Buddhist tradition, he was born in Lumbini, in what is now Nepal, to royal parents of the Shakya clan, but renounced his home life to live as a wandering ascetic. After leading a life of mendicancy, asceticism, and meditation, he attained enlightenment at Bodh Gaya in what is now India. The Buddha thereafter wandered through the lower Indo-Gangetic Plain, teaching and building a monastic order. He taught a Middle Way between sensual indulgence and severe asceticism, leading to Nirvana, that is, freedom from ignorance, craving, rebirth, and suffering. His teachings are summarized in the Noble Eightfold Path, a training of the mind that includes ethical training and meditative practices such as sense restraint, kindness toward others, mindfulness, and jhana/dhyana (meditation proper). He died in Kushinagar, attaining parinirvana. The Buddha has since been venerated by numerous religions and communities across Asia.""",
"""A couple of centuries after his death, he came to be known by the title Buddha, which means "Awakened One" or "Enlightened One". His teachings were compiled by the Buddhist community in the Vinaya, his codes for monastic practice, a compilation of teachings based on his discourses. These were passed down in Middle Indo-Aryan dialects through an oral tradition. Later generations composed additional texts, such as systematic treatises known as Abhidharma, biographies of the Buddha, collections of stories about his past lives known as Jataka tales, and additional discourses, i.e., the Mahayana sutras.""",
]
smalldb = Chroma.from_texts(texts, embedding=embedding)
question = "Where was Buddha born?"
smalldb.similarity_search(question, k=2)  # top-2 most similar chunks
smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)  # fetch 3 candidates, return 2 diverse results
The above code returns the portions of the text that contain the answer to the question. Here is the output:
[Document(page_content='According to Buddhist tradition, he was born in Lumbini, in what is now Nepal, to royal parents of the Shakya clan, but renounced his home life to live as a wandering ascetic. After leading a life of mendicancy, asceticism, and meditation, he attained enlightenment at Bodh Gaya in what is now India. The Buddha thereafter wandered through the lower Indo-Gangetic Plain, teaching and building a monastic order. He taught a Middle Way between sensual indulgence and severe asceticism, leading to Nirvana, that is, freedom from ignorance, craving, rebirth, and suffering. His teachings are summarized in the Noble Eightfold Path, a training of the mind that includes ethical training and meditative practices such as sense restraint, kindness toward others, mindfulness, and jhana/dhyana (meditation proper). He died in Kushinagar, attaining parinirvana. The Buddha has since been venerated by numerous religions and communities across Asia.', metadata={}),
Document(page_content='Siddhartha Gautama, most commonly referred to as the Buddha was a wandering ascetic and religious teacher who lived in South Asia during the 6th or 5th century and founded Buddhism.', metadata={})]
Now that we have an understanding of how retrieval works, let's see how question answering works. For answering we use the RetrievalQA chain, which under the hood calls load_qa_chain. The load_qa_chain function loads a question-answering chain in the LangChain library. It creates a chain that can answer questions based on a given context or document: the retrieved context is passed to the LLM, which generates relevant answers to questions about it. Below, I have shared sample code that uses the RetrievalQA chain to fetch answers from our documents.
Understanding RetrievalQA chain
What is the Retrieval QA chain?
The Retrieval QA chain is a type of chain that combines retrieval and question answering to provide answers to questions. Given a question, it first uses a retriever to fetch the most relevant documents, then passes those documents to the language model as context so that the generated answer is grounded in them.
What is a retrieval system?
A retrieval system, or retriever, is an interface that returns documents given an unstructured query. A retriever does not need to store the documents itself; it only needs to be able to return or retrieve them. In the context of the Retrieval QA chain, the retriever supplies the relevant information for the question being asked.
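To make the interface concrete, here is a toy retriever in plain Python. It ranks documents by the number of query words they share; the class name `KeywordRetriever` is hypothetical, and the `get_relevant_documents` method name simply mirrors the retriever interface described above.

```python
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str

class KeywordRetriever:
    """Toy retriever: ranks documents by count of shared query words."""
    def __init__(self, docs):
        self.docs = [Document(d) for d in docs]

    def get_relevant_documents(self, query, k=2):
        q = set(query.lower().split())
        # sort by descending word overlap with the query
        ranked = sorted(
            self.docs,
            key=lambda d: -len(q & set(d.page_content.lower().split())),
        )
        return ranked[:k]
```

A real retriever would use embeddings and a vector store rather than keyword overlap, but the contract is the same: a query goes in, a ranked list of documents comes out.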
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
llm_name = "gpt-3.5-turbo-0301"
llm = ChatOpenAI(model_name=llm_name, temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=smalldb.as_retriever()
)
question = "Where was Buddha born?"
result = qa_chain({"query": question})
result["result"]
It gave the following answer:
'According to Buddhist tradition, Buddha was born in Lumbini, which is now located in Nepal.'