In this blog post, we will walk through the internals of these three Vector store retrievers:
Simple Retriever
MultiQuery Retriever
ContextualCompression Retriever
We are using an in-memory FAISS Vector store in our example. To populate it, we scraped data from a few web pages, split the scraped text into chunks, embedded the chunks, and loaded the embeddings into the Vector store.
The pages to scrape are collected in a urls list.
I have created a custom function called scrape that scrapes the content of a given URL. The scraped web page content is collected in the content_list variable.
content_list = []
for url in urls:
    content = scrape(url)
    content_list.append(content)
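The scrape function itself is not shown in this post. Below is a minimal sketch of what it might look like, assuming it uses requests and BeautifulSoup; the actual implementation may differ.
import requests
from bs4 import BeautifulSoup

def scrape(url):
    # Hypothetical implementation: fetch the page and return its visible text
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator="\n", strip=True)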
The next step is to feed all the fetched content to the FAISS Vector store.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Split the scraped content into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=1000, chunk_overlap=150
)
docs = text_splitter.create_documents(content_list)

# Embed the chunks and load them into an in-memory FAISS index
embeddings = OpenAIEmbeddings()
faiss_db = FAISS.from_documents(docs, embeddings)
Now that we have loaded data to the in-memory Vector store, it's time to try out the retrievers.
Simple Retriever
As the name suggests, Simple Retriever is a basic retriever provided by Langchain to query the Vector store.
# Create a retriever over the FAISS store that returns the top 2 matches
simple_retriever = faiss_db.as_retriever(search_kwargs={"k": 2})

question = "What are vector embeddings?"
result = simple_retriever.get_relevant_documents(question)
Here we have created a FAISS retriever with a k value of 2. The k parameter determines how many documents the retriever should return. To retrieve documents, we use the get_relevant_documents method. Below are the two retrieved documents:
[
Document(page_content='Vector embeddings?\n\nAt this point, we’ve defined an unfamiliar concept with an even more unfamiliar concept, but don’t be concerned. Vector embeddings are really just a simplified numerical representation of complex data, used to make it easier to run generic machine-learning algorithms on sets of that data. By taking real-world objects and translating them to vector embeddings — numerical representations — those numbers can be fed into machine learning algorithms to determine semantic similarity.\n\nFor example, let’s consider the phrase “one in a million.” Is this phrase more similar to “once in a lifetime” or more similar to “a million to one”? You might have some intuition about which pairing is more similar. By creating a vector embedding for each of these phrases, machine learning goes beyond human intuition to generate actual metrics to quantify that similarity.', metadata={}),
Document(page_content='Vector embeddings are incredibly powerful, and this is by no means an exhaustive list — head to our example apps section to go deeper. You can also read more about the basics of vector search to see how Pinecone can help you wrangle vector embeddings.\n\nShare via: \n\nRoie Schwaber-Cohen\n\nDeveloper Advocate\n\nWhat problem are we trying to solve?\n\nWhat are vector embeddings?\n\nWhat can I do with vector embeddings?', metadata={})
]
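Under the hood, this retriever is a thin wrapper around the Vector store's similarity search. With the default search_type of "similarity", the call above should be roughly equivalent to querying the store directly (a sketch, not the exact internal code):
# Roughly equivalent direct call; the retriever adds callbacks and
# other plumbing on top of this
result = faiss_db.similarity_search(question, k=2)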
MultiQuery Retriever
Here, we start by using an LLM to generate several variants of the user's question from different perspectives. For each generated question, we fetch the related documents from the Vector store. Then we deduplicate across all the fetched documents to build a single list of unique ones.
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
import logging

llm = ChatOpenAI(temperature=0)
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=faiss_db.as_retriever(), llm=llm
)

# Log the alternative queries that the retriever generates
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

question = "What are vector embeddings?"
multi_query_result = multi_query_retriever.get_relevant_documents(query=question)
When we run the retriever, it generates three queries from the original question:
How do vector embeddings work?
Can you explain the concept of vector embeddings?
What is the purpose of vector embeddings?
The relevant documents for all the queries are fetched, deduplicated, and returned to the user. Below are the documents returned for the three queries above.
[
Document(page_content='While this visualization represents only three dimensions of the embeddings, it can help us understand how the embedding model works. There are multiple data points highlighted in the visualization, each representing a vector embedding for a word. As the name suggests, word2vec embeds words. Words that appear close to one another are semantically similar, while far-apart words have different semantic meanings.\n\nOnce trained, an embedding model can transform our raw data into vector embeddings. That means it knows where to place new data points in the vector space.\n\nAs we saw with word2vec, within the context of the model, vectors that are close together have a contextual similarity, whereas far-apart vectors are different from one another. That’s what gives our vector meaning — its relationship with other vectors in the vector space depends on how the embedding model “understands” the domain it was trained on.\n\nWhat can I do with vector embeddings?', metadata={}),
Document(page_content='What can I do with vector embeddings?\n\nVector embeddings are an incredibly versatile tool and can be applied in many domains. Generally speaking, an application would use a vector embedding as its query and produce other vector embeddings which are similar to it, with their corresponding values. The difference between applications of each domain is the significance of this similarity.\n\nHere are some examples:\n\nSemantic Search - search engines traditionally work by searching for overlaps of keywords. By leveraging vector embeddings, semantic search can go beyond keyword matching and deliver based on the query’s semantic meaning.\n\nQuestion-answering applications - by training an embedding model with pairs of questions and corresponding answers, we can create an application that would answer questions that have not been seen before.', metadata={}),
Document(page_content='Vector embeddings?\n\nAt this point, we’ve defined an unfamiliar concept with an even more unfamiliar concept, but don’t be concerned. Vector embeddings are really just a simplified numerical representation of complex data, used to make it easier to run generic machine-learning algorithms on sets of that data. By taking real-world objects and translating them to vector embeddings — numerical representations — those numbers can be fed into machine learning algorithms to determine semantic similarity.\n\nFor example, let’s consider the phrase “one in a million.” Is this phrase more similar to “once in a lifetime” or more similar to “a million to one”? You might have some intuition about which pairing is more similar. By creating a vector embedding for each of these phrases, machine learning goes beyond human intuition to generate actual metrics to quantify that similarity.', metadata={}),
Document(page_content='The challenge of working with vector embeddings is that traditional scalar-based databases can’t keep up with the complexity and scale of such data, making it difficult to extract insights and perform real-time analysis. That’s where vector databases come into play – they are intentionally designed to handle this type of data and offer the performance, scalability, and flexibility you need to make the most out of your data.\n\nWith a vector database, we can add advanced features to our AIs, like semantic information retrieval, long-term memory, and more. The diagram below gives us a better understanding of the role of vector databases in this type of application:\n\nLet’s break this down:\n\nFirst, we use the embedding model to create vector embeddings for the content we want to index.\n\nThe vector embedding is inserted into the vector database, with some reference to the original content the embedding was created from.', metadata={}),
Document(page_content='In those cases, we can use vector embeddings as a form of automatic feature engineering. Instead of manually picking the required features from our data, we apply a pre-trained machine learning model that will produce a representation of this data that is more compact while preserving what’s meaningful about the data.\n\nWhat are vector embeddings?\n\nBefore we delve into what vector embeddings are, let’s talk about vectors. A vector is a mathematical structure with a size and a direction. For example, we can think of the vector as a point in space, with the “direction” being an arrow from (0,0,0) to that point in the vector space.\n\nAs developers, it might be easier to think of a vector as an array containing numerical values. For example:\n\n.4\n\nWhen we look at a bunch of vectors in one space, we can say that some are closer to one another, while others are far apart. Some vectors can seem to cluster together, while others could be sparsely distributed in the space.', metadata={}),
Document(page_content='We’ll soon explore how these relationships between vectors can be useful.\n\nVectors are an ideal data structure for machine learning algorithms — modern CPUs and GPUs are optimized to perform the mathematical operations needed to process them. But our data is rarely represented as vectors. This is where vector embedding comes into play. It’s a technique that allows us to take virtually any data type and represent it as vectors.\n\nBut it isn’t as simple as just turning data into vectors. We want to ensure that we can perform tasks on this transformed data without losing the data’s original meaning. For example, if we want to compare two sentences — we don’t want just to compare the words they contain but rather whether or not they mean the same thing. To preserve the data’s meaning, we need to understand how to produce vectors where relationships between the vectors make sense.', metadata={})
]
Internal Workings
The MultiQueryRetriever uses the prompt below to generate the alternative versions of the user's question:
You are an AI language model assistant. Your task is
to generate 3 different versions of the given user
question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question,
your goal is to help the user overcome some of the limitations
of distance-based similarity search. Provide these alternative
questions separated by newlines. Original question: {question}
Below is the low-level design of how it works. First, the MultiQueryRetriever generates different variants of the user query by calling the LLM. It then iterates through each generated query and fetches the relevant documents from the in-memory FAISS Vector store. Finally, it merges all the documents fetched from the Vector store and deduplicates the list.
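To make that flow concrete, here is a simplified sketch of the same logic. This is not the actual LangChain implementation, which wraps the prompt in an LLMChain with an output parser, but it captures the steps described above.
# Simplified sketch of the MultiQueryRetriever flow

# Abbreviated version of the prompt shown above
MULTI_QUERY_PROMPT = (
    "You are an AI language model assistant. Generate 3 different versions "
    "of the given user question to retrieve relevant documents from a "
    "vector database. Provide these alternative questions separated by "
    "newlines. Original question: {question}"
)

def multi_query_sketch(question, llm, retriever):
    # 1. Ask the LLM for alternative phrasings of the question
    response = llm.predict(MULTI_QUERY_PROMPT.format(question=question))
    queries = [line for line in response.split("\n") if line.strip()]

    # 2. Retrieve documents for each query and deduplicate by content
    seen, unique_docs = set(), []
    for q in queries:
        for doc in retriever.get_relevant_documents(q):
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                unique_docs.append(doc)
    return unique_docs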
ContextualCompression Retriever
In the ContextualCompressionRetriever, the documents retrieved from the Vector store are compressed using the context of the given query: the irrelevant content is filtered out of each retrieved document.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# LLMChainExtractor uses the LLM to pull out only the query-relevant
# parts of each retrieved document
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=faiss_db.as_retriever()
)

question = "What are vector embeddings?"
compressed_docs = compression_retriever.get_relevant_documents(question)
For the above code, the contextual compressor returned four documents, and we can see that the documents returned this time are much shorter than the previous retrievers' output.
[
Document(page_content='Vector embeddings are really just a simplified numerical representation of complex data, used to make it easier to run generic machine-learning algorithms on sets of that data. By taking real-world objects and translating them to vector embeddings — numerical representations — those numbers can be fed into machine learning algorithms to determine semantic similarity.', metadata={}),
Document(page_content='Vector embeddings are incredibly powerful, and this is by no means an exhaustive list — head to our example apps section to go deeper.', metadata={}),
Document(page_content='In those cases, we can use vector embeddings as a form of automatic feature engineering. Instead of manually picking the required features from our data, we apply a pre-trained machine learning model that will produce a representation of this data that is more compact while preserving what’s meaningful about the data.\n\nBefore we delve into what vector embeddings are, let’s talk about vectors. A vector is a mathematical structure with a size and a direction. For example, we can think of the vector as a point in space, with the “direction” being an arrow from (0,0,0) to that point in the vector space.\n\nAs developers, it might be easier to think of a vector as an array containing numerical values. For example:\n\n.4\n\nWhen we look at a bunch of vectors in one space, we can say that some are closer to one another, while others are far apart. Some vectors can seem to cluster together, while others could be sparsely distributed in the space.', metadata={}),
Document(page_content='Vector embeddings are one of the most fascinating and useful concepts in machine learning. They are central to many NLP, recommendation, and search algorithms. If you’ve ever used things like recommendation engines, voice assistants, language translators, you’ve come across systems that rely on embeddings.', metadata={})
]
Internal Workings
ContextualCompressionRetriever uses LLMChainExtractor to extract the relevant content from each document. LLMChainExtractor internally calls the LLM to perform the extraction. The following is the prompt template used by LLMChainExtractor.
Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return {no_output_str}.
Remember, *DO NOT* edit the extracted parts of the context.
> Question: {{question}}
> Context:
>>>
{{context}}
>>>
Extracted relevant parts:
For our example, below is one of the prompts sent to the LLM.
Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT.
Remember, *DO NOT* edit the extracted parts of the context.
> Question: What are vector embeddings?
> Context:
>>>
What can I do with vector embeddings?\n\nVector embeddings are an incredibly versatile tool and can be applied in many domains. Generally speaking, an application would use a vector embedding as its query and produce other vector embeddings which are similar to it, with their corresponding values. The difference between applications of each domain is the significance of this similarity.\n\nHere are some examples:\n\nSemantic Search - search engines traditionally work by searching for overlaps of keywords. By leveraging vector embeddings, semantic search can go beyond keyword matching and deliver based on the query’s semantic meaning.\n\nQuestion-answering applications - by training an embedding model with pairs of questions and corresponding answers, we can create an application that would answer questions that have not been seen before.
>>>
Extracted relevant parts:
The LLD diagram illustrates the flow: first, the list of documents is fetched from the Vector store for the user query. After that, the LLMChainExtractor compresses each document so that it contains only the content relevant to the user's question.
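In code, that flow is roughly equivalent to the following sketch (simplified; the actual retriever also forwards callbacks and other plumbing):
# Simplified sketch of the ContextualCompressionRetriever flow,
# not the actual LangChain source
def compression_sketch(question, base_retriever, compressor):
    # 1. Fetch candidate documents from the base retriever
    docs = base_retriever.get_relevant_documents(question)
    # 2. Run the LLM-based extractor over each document; documents for
    #    which the LLM finds no relevant content are dropped
    return compressor.compress_documents(docs, question)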
I hope this blog post gave you a better understanding of how Vector store retrievers work. If you have any questions about the topic, please don't hesitate to ask in the comments section. I will be more than happy to address them. I regularly create similar content on Langchain, LLM, and AI topics; if you'd like to receive more articles like this, consider subscribing to my blog.
If you're in the Langchain space or LLM domain, let's connect on LinkedIn! I'd love to stay connected and continue the conversation. Reach me at: linkedin.com/in/ritobrotoseth