Unveiling the Secrets of Langchain Memory: How LLM Models Retain the Past for Future Conversations

What is memory and why do we need it?

Memory is a kind of storage used to retain information from previous interactions. We need memory whenever we want the LLM's output to take those previous interactions into account.

We have already seen ChatGPT derive context from previous interactions. In the example below, I asked it to write a Java program that prints 5 numbers.

In the next interaction, without giving any additional context, I asked it to print only the even numbers, and it automatically updated the existing Java code.

Memory is an essential component of a chat application. It involves maintaining a concept of state throughout the interactions between the user and the language model.

LLM models like OpenAI's don't have the capability to retain previous conversations on their own. Let's see that with an example. I will use the same prompts that I used above, but this time I will call the OpenAI API directly and fetch the response.
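
Something along the lines of the following sketch (using the same legacy langchain OpenAI wrapper as the rest of this post; outputs will vary):

import os
from langchain.llms import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"
llm = OpenAI(temperature=0)

# First call: ask for Java code that prints 5 numbers
print(llm("Write a Java program to print 5 numbers"))

# Second call: a completely fresh request with no context, so the model
# has no way of knowing we are referring to the Java code above
print(llm("Now print only the even numbers"))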

In the first interaction it returned Java code that prints 5 numbers, but in the second interaction, since it didn't have the context, it returned Python code that prints only the even numbers.

Understanding the concept of chaining

Let's have a look at the below interaction:

Human: Add two number 16 and 8 and give me the result
AI: The sum of 16 and 8 is 24.
Human: Now divide the result with 4
AI: If we divide the result, which is 24, by 4, the quotient is 6.
Human: Now multiply the result with 15
AI: If we multiply the result, which is 6, by 15, the product is 90.

This is an example of a chained conversation: each interaction with the model can be thought of as a chain.

Chain 1:

Human: Add two number 16 and 8 and give me the result
AI: The sum of 16 and 8 is 24.

Chain 2:

Human: Now divide the result with 4
AI: If we divide the result, which is 24, by 4, the quotient is 6.

Now let's look at the same example, but from the perspective of the prompt. Chains, too, operate in a stateless manner, treating each incoming query independently, just like the underlying LLMs. So the question is: if LLMs and chat models are stateless, how are they able to take the previous interactions into account?

The answer to this lies in the prompt:

When we made the first request to the chat model, our prompt was:

Human: Add two number 16 and 8 and give me the result

But when we make the second request to the chat model, we need to send the previous interaction in the prompt as well, so that it can make an appropriate decision. The prompt for the second interaction will be:

Human: Add two number 16 and 8 and give me the result
AI: The sum of 16 and 8 is 24.
Human: Now divide the result with 4
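
To make this concrete, here is a minimal sketch (not taken from the original run, just illustrating the idea) of how the second request carries the first interaction inside the prompt itself:

import os
from langchain.llms import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"
llm = OpenAI(temperature=0)

# The previous turn is embedded in the prompt text itself
history = (
    "Human: Add two number 16 and 8 and give me the result\n"
    "AI: The sum of 16 and 8 is 24.\n"
)

# The stateless LLM can only "remember" that the result was 24
# because we send that turn again as part of the new prompt
print(llm(history + "Human: Now divide the result with 4\nAI:"))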

I hope this gives a better understanding of the internal workings of chat models. Next, let's look at the different kinds of memory that are used to persist interactions in Langchain.

There are a few common types of memory that one will frequently use in Langchain:

  1. ConversationBufferMemory

  2. ConversationBufferWindowMemory

  3. ConversationTokenBufferMemory

  4. ConversationSummaryMemory

ConversationBufferMemory

import os
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"

# Stores the full conversation history verbatim
memory = ConversationBufferMemory()

conversation_with_summary = ConversationChain(
    llm=OpenAI(temperature=0.6),
    memory=memory,
    verbose=True  # print the full prompt, including the stored history, on every call
)
conversation_with_summary.predict(input="Hi, what's up?")

ConversationBufferMemory is the simplest form of memory. We use it to store the interactions between the human and the AI. We first assign a ConversationBufferMemory instance to the memory variable. Then we initialize the ConversationChain, passing the LLM model to be used and the memory, and setting verbose to True. With verbose set to True we can see the conversation that is persisted in memory.

Below is the output when verbose is set to True:

Below is the output when verbose is set to False:
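
If you want to inspect what the buffer is holding without relying on verbose logging, you can also read the memory back directly. A small sketch, continuing the example above (outputs will vary):

# Run one more turn, then read the stored transcript back
conversation_with_summary.predict(input="I am learning about Langchain memory.")

# ConversationBufferMemory keeps the full transcript verbatim
print(memory.load_memory_variables({}))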

ConversationBufferWindowMemory

ConversationBufferWindowMemory keeps a list of the interactions of the conversation over time, but only uses the last k interactions. This is useful for keeping a sliding window of the most recent interactions, so that the buffer does not grow too large.

import os
from langchain.chains import ConversationChain
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferWindowMemory

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("You are an answering bot, whatever question I ask you, you need to answer them in a sentence."),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{input}")
])

# Keep only the last k=3 human/AI exchanges, returned as message objects
memory = ConversationBufferWindowMemory(k=3, return_messages=True)
memory.save_context(
    {"input": "What is AI?"},
    {"output": "AI, or artificial intelligence, refers to the development of computer systems that can perform tasks typically requiring human intelligence, such as problem-solving, learning, and decision-making."},
)
memory.save_context(
    {"input": "Give name of 5 AI companies?"},
    {"output": "Five AI companies are Google, Microsoft, IBM, Amazon, and Tesla."},
)
memory.save_context(
    {"input": "How AI makes decision?"},
    {"output": "AI makes decisions by using algorithms and data to analyze information and generate an output or action based on predefined rules or patterns it has learned during its training process."},
)

conversation_with_summary = ConversationChain(
    llm=OpenAI(temperature=0.6),
    memory=memory,
    prompt=prompt,
    verbose=True
)

print( conversation_with_summary.predict(input="Tell me something about the first AI company in the List") )
print( conversation_with_summary.predict(input="How to train an AI?") )
print( conversation_with_summary.predict(input="Tell me something about the fifth AI company in the List") )

In the above code, we have defined a system message in the prompt that instructs the LLM to answer each question in a single sentence.

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("You are an answering bot, whatever question I ask you, you need to answer them in a sentence."),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{input}")
])

Next, we have defined a ConversationBufferWindowMemory and set the value of k to 3, which means it will retain only the last 3 interactions in its memory.

memory = ConversationBufferWindowMemory(k=3, return_messages=True)

We have preloaded some conversations into the memory so that it has the context for the conversation we want to have next. One important exchange that is stored is the list of names of 5 AI companies: Google, Microsoft, IBM, Amazon, and Tesla.

memory.save_context(
    {"input": "What is AI?"},
    {"output": "AI, or artificial intelligence, refers to the development of computer systems that can perform tasks typically requiring human intelligence, such as problem-solving, learning, and decision-making."},
)
memory.save_context(
    {"input": "Give name of 5 AI companies?"},
    {"output": "Five AI companies are Google, Microsoft, IBM, Amazon, and Tesla."},
)
memory.save_context(
    {"input": "How AI makes decision?"},
    {"output": "AI makes decisions by using algorithms and data to analyze information and generate an output or action based on predefined rules or patterns it has learned during its training process."},
)

When we call the LLM with our first question, "Tell me something about the first AI company in the List", it goes through the list of companies stored in memory and picks the first one, which is Google.

In the above output, we can also see that the memory holds all 3 conversations that we set as context.

When we ask the next question, "How to train an AI?", we can see that the previous question, "Tell me something about the first AI company in the List", is now stored in the memory, while the first question that we set as context has been dropped from the memory.

When we ask the third and last question, "Tell me something about the fifth AI company in the List", the list of companies has already been dropped from the memory because we set the window size to 3 interactions. The LLM now has no idea which list of AI companies we are referring to, so it arbitrarily picks Microsoft and describes it.
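
To verify what the window memory is holding at this point, you can read it back directly. A small sketch, assuming it runs right after the three predict() calls above (with return_messages=True the history comes back as message objects):

# With k=3, only the three most recent human/AI exchanges remain in the window
for message in memory.load_memory_variables({})["history"]:
    print(type(message).__name__, ":", message.content)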

ConversationTokenBufferMemory

import os
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationTokenBufferMemory

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"

conversation_with_summary = ConversationChain(
    llm=OpenAI(),
    # Keep only as much recent history as fits within 60 tokens
    memory=ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")

ConversationTokenBufferMemory is almost the same as ConversationBufferWindowMemory; the only difference is that it uses token length rather than the number of interactions to determine when to flush older interactions. In the above example, while initializing the memory we used the max_token_limit parameter, which sets the maximum number of tokens to keep in the buffer.
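
As a rough sketch of the token-based pruning, reusing the imports from the block above (the exact cut-off depends on the tokenizer, so treat the numbers as illustrative):

# Seed the buffer with two exchanges; with max_token_limit=60, older
# exchanges are dropped once the buffer grows beyond roughly 60 tokens
token_memory = ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60)
token_memory.save_context({"input": "What is AI?"},
                          {"output": "AI stands for artificial intelligence."})
token_memory.save_context({"input": "Name one AI company."},
                          {"output": "Google is one well-known AI company."})

# Only whatever fits within the token limit is returned
print(token_memory.load_memory_variables({}))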

ConversationSummaryMemory

import os
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

os.environ["OPENAI_API_KEY"] = "sk-<your-openAI-key>"

llm = OpenAI(temperature=0)
conversation_with_summary = ConversationChain(
    llm=llm,
    # A separate LLM instance generates the running summary of the conversation
    memory=ConversationSummaryMemory(llm=OpenAI()),
    verbose=True
)
conversation_with_summary.predict(input="Hi, what's up?")

ConversationSummaryMemory keeps summarizing the conversation between the human and the AI over time. It summarizes the conversation as it happens and stores only the current summary in memory. This memory is most useful for longer conversations, where keeping the exact word-for-word message history would take up too many tokens.
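
A short sketch, reusing the imports from the block above, to show that only a running summary is stored rather than the verbatim transcript:

summary_memory = ConversationSummaryMemory(llm=OpenAI(temperature=0))
summary_memory.save_context({"input": "Hi, I am planning a trip to Japan."},
                            {"output": "That sounds exciting! When are you planning to go?"})
summary_memory.save_context({"input": "Sometime in April, to see the cherry blossoms."},
                            {"output": "April is a great choice for cherry blossom season."})

# The history comes back as a condensed summary, not the raw messages
print(summary_memory.load_memory_variables({}))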

Still, we need to be careful when using this memory; it may not be suitable in every case. Below is an edge case that I ran into:

I first greeted the model and then asked it to solve the following math puzzle:

A person was born on May 14, 40 B.C. and died on May 14, 30 A.D. How many years did this person live?

The model tried to solve it and came up with an answer. I then asked the model to solve the same problem step by step:

Are you sure about this, try to solve this problem step by step

Before processing the above instruction, the model had already summarised the previous conversation, and in doing so it had failed to store the problem itself. Below is the summary that the model created:

The human asks the AI how it is doing and the AI responds that it is doing 
great and helping a customer with a technical issue. The human then requests 
help from the AI, to which the AI agrees and asks what it can do to assist 
the human. The human then requests that the AI solve a maths puzzle for them, 
to which the AI agrees and asks for more information on the type of maths puzzle 
needed to be solved. The AI then solves the maths puzzle and concludes that the 
person lived for 80 years.

Since it no longer had the problem in its memory, it essentially invented its own version of the problem to match the answer.

I hope you found this blog post on LLM memory useful. I regularly create similar content on Langchain, LLM, and AI topics. If you'd like to receive more articles like this, consider subscribing to my blog.

If you're in the Langchain space or LLM domain, let's connect on Linkedin! I'd love to stay connected and continue the conversation. Reach me at: linkedin.com/in/ritobrotoseth
