Building Narad: A Stateful AI Chatbot That Actually Remembers

Exploring developments in AI and building things that I learn. Currently diving deep into LLMs, RAG systems, and Agentic AI. I'm here to document what I learn.

Most AI chatbot demos share the same fatal flaw — refresh the page and the conversation is gone. The "memory" is an illusion, held together by session state that evaporates the moment the process restarts. These demos look impressive in a Jupyter notebook. They fall apart the moment you deploy them.

Narad is my attempt to build something genuinely different: a multi-session chatbot with real persistence, streaming responses, and a clean thread management UI — built on LangGraph and deployed on Render with a PostgreSQL backend.

This post is about the decisions I made, the things that broke, and what I learned building it.


What Narad Does (and Why It Matters)

Narad is a deployed conversational AI with the following properties:

  • Multiple independent conversation threads, each with isolated memory

  • Persistence that survives process restarts and redeployments

  • Streaming responses via LangGraph's message stream

  • Thread management — create, resume, and delete conversations from the sidebar

  • LangSmith observability, organized per thread

  • Deployed live at narad-chat.onrender.com

The stack is intentionally simple: LangGraph + GPT-4o + Streamlit + PostgreSQL + LangSmith + Render.


The Problem with Typical Approaches

The standard AI chatbot uses ConversationBufferMemory or just appends messages to a list. This works locally for a single session. It fails in production for three reasons.

First, it's stateless across restarts. The moment your server restarts — which happens constantly in cloud deployments — all conversation history is gone.

Second, it doesn't support multiple sessions. If two users hit the same endpoint, their conversations collide. There's no concept of thread isolation.

Third, there's no way to restore a past conversation. The history exists only in RAM, invisible to anything outside the current process.

Some projects solve this with st.session_state in Streamlit, which is slightly better — at least the history persists across rerenders within a browser tab. But it still dies on refresh, and it's fundamentally UI state being used as application state.


Architecture

My first instinct was to keep things simple — a single chat loop with in-memory state. That worked until I needed multiple conversations with isolated history and persistence across restarts. At that point, the limitations of a linear chain became obvious.

Most tutorials use simple chains or wrapper abstractions, and they work — until you need persistent state, tool loops, or multi-step control flow. At that point, you either build your own orchestration layer or use something designed for it.

LangGraph gave me that control without forcing me into manual state management — which is notoriously error-prone.

The core of Narad is a LangGraph StateGraph with a single chat_node:

START → chat_node → END

On paper, the graph looks trivial. The value isn't the graph structure — it's the state management LangGraph gives you around it. That's what enables persistence, and it's implemented through checkpointing.

Every time the graph executes, LangGraph writes a checkpoint. For a single turn, that's three checkpoints: the initial empty state, the state after the user message is added, and the state after the AI response. These checkpoints are keyed by thread_id, which is how LangGraph isolates conversations.
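To make the keying concrete, here is a toy model of a checkpoint store, written as a sketch rather than as LangGraph's implementation. `ToyCheckpointStore` and `run_turn` are hypothetical names, and the three snapshots per turn simply mirror the pattern described above.

```python
from collections import defaultdict
from copy import deepcopy


class ToyCheckpointStore:
    """In-memory stand-in for a checkpointer: one snapshot history per thread_id."""

    def __init__(self):
        self._history = defaultdict(list)  # thread_id -> list of state snapshots

    def put(self, thread_id, state):
        # Store a copy so later mutations do not rewrite history.
        self._history[thread_id].append(deepcopy(state))

    def latest(self, thread_id):
        # An unknown thread starts from an empty state.
        snapshots = self._history.get(thread_id)
        return deepcopy(snapshots[-1]) if snapshots else {"messages": []}

    def count(self, thread_id):
        return len(self._history.get(thread_id, []))


def run_turn(store, thread_id, user_text, reply_fn):
    """Simulate one turn and snapshot the state three times along the way."""
    state = store.latest(thread_id)
    store.put(thread_id, state)  # state as loaded, before the new input
    state["messages"].append(("user", user_text))
    store.put(thread_id, state)  # after the user message
    state["messages"].append(("assistant", reply_fn(state["messages"])))
    store.put(thread_id, state)  # after the AI response
    return state
```

Two calls with different `thread_id` values never see each other's messages, which is the isolation property the real checkpointer provides.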

The ChatState is minimal:

class ChatState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

The add_messages reducer handles message accumulation automatically — you return a new message, it gets appended to the list. No manual state management required.
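A reducer is just a function that merges the old value with the update a node returns. The sketch below is a simplified stand-in, not the real add_messages: `Msg` and `merge_messages` are hypothetical names, and the replace-by-id behaviour is modeled on how add_messages is documented to treat messages that share an ID.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Msg:
    role: str
    content: str
    id: str = field(default_factory=lambda: str(uuid4()))


def merge_messages(existing, new):
    """Append new messages; a message whose id already exists replaces the old one."""
    merged = list(existing)
    index_by_id = {m.id: i for i, m in enumerate(merged)}
    for msg in new:
        if msg.id in index_by_id:
            merged[index_by_id[msg.id]] = msg  # same id: overwrite in place
        else:
            index_by_id[msg.id] = len(merged)
            merged.append(msg)  # new id: append to the end
    return merged
```

The node only returns the new message; the reducer decides how it joins the existing list.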


Streaming

The difference between invoke and streaming is the difference between a chatbot that feels slow and one that feels responsive. With invoke, the user stares at a blank screen until the entire response is generated. With streaming, tokens appear as they're generated, which feels closer to a human conversation and makes for a much better UX.

LangGraph's stream_mode="messages" returns (message_chunk, metadata) tuples. Each chunk is an AIMessageChunk with a .content field. Passing this to st.write_stream renders tokens progressively:

ai_message = st.write_stream(
    chunk.content for chunk, metadata in chatbot.stream(
        {'messages': [HumanMessage(content=user_input)]},
        config=CONFIG,
        stream_mode="messages"
    )
)

st.write_stream also returns the full assembled string, which gets appended to message_history for rendering on subsequent rerenders.
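The generator expression above passes every chunk straight through. A slightly more defensive variant, sketched here with stand-in objects instead of a real graph, skips chunks whose content is empty or not a string. `text_chunks` and `fake_stream` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, Tuple


def text_chunks(stream: Iterable[Tuple[object, dict]]) -> Iterator[str]:
    """Yield only non-empty string content from (chunk, metadata) tuples."""
    for chunk, _metadata in stream:
        content = getattr(chunk, "content", "")
        if isinstance(content, str) and content:
            yield content


def fake_stream():
    # Stand-in for chatbot.stream(..., stream_mode="messages").
    for piece in ["Hel", "", "lo", " there"]:
        yield SimpleNamespace(content=piece), {"langgraph_node": "chat_node"}


# Joining the chunks gives the same full string that gets stored for rerenders.
full_text = "".join(text_chunks(fake_stream()))
```

In the app, `text_chunks(chatbot.stream(...))` could be handed to st.write_stream in place of the inline generator.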


The Resume Chat Feature

The multi-session UI was the most interesting part of the frontend work. Each conversation is identified by a thread_id — a UUID generated when the chat starts. The sidebar lists all threads, labeled by the user's first message. Clicking a thread loads that conversation and resumes it.

Under the hood, load_conversation() calls chatbot.get_state() to fetch the latest LangGraph checkpoint for that thread, converts the HumanMessage and AIMessage objects into the {"role", "content"} format the UI expects, and stores the result in st.session_state['message_history'].
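The conversion step might look roughly like the sketch below. It relies on the `.type` attribute that LangChain messages carry ("human", "ai"); the role names and the `to_ui_history` helper are assumptions, and the messages here are stand-in objects rather than real HumanMessage or AIMessage instances.

```python
from types import SimpleNamespace

# Assumed mapping from message types to the roles the chat UI renders.
ROLE_BY_TYPE = {"human": "user", "ai": "assistant"}


def to_ui_history(messages):
    """Convert message objects into {"role", "content"} dicts, skipping other types."""
    history = []
    for msg in messages:
        role = ROLE_BY_TYPE.get(getattr(msg, "type", None))
        if role is not None:
            history.append({"role": role, "content": msg.content})
    return history


# Stand-in messages; a system message is dropped because the UI has no role for it.
example_messages = [
    SimpleNamespace(type="system", content="You are Narad."),
    SimpleNamespace(type="human", content="hi"),
    SimpleNamespace(type="ai", content="hello"),
]
ui_history = to_ui_history(example_messages)
```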

On app startup, retrieve_all_threads() calls checkpointer.list() to rebuild the sidebar from whatever threads exist in the database — so the history survives a full app restart, not just a page reload.

A bug worth documenting

When implementing thread recovery on startup, the natural approach was to call load_conversation(thread_id) inside the checkpointer.list() loop — iterate over threads, fetch messages for each, extract the first message as a label.

This caused the app to hang silently after processing the first checkpoint.

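The root cause was a deadlock. checkpointer.list() is a lazy iterator that keeps a lock held for as long as you are iterating over it. Calling get_state() inside that loop tries to acquire a lock on the same connection, so the second call blocks indefinitely. I hit this while still on the SQLite checkpointer, and the lock involved is most likely the saver's own non-reentrant lock around its single connection rather than a SQLite file lock.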
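The shape of the problem can be reproduced without any database. This is a toy model, not LangGraph's code: `ToySaver` is a hypothetical class whose `list()` generator holds a non-reentrant lock while it yields, and `get_state()` uses a timeout so the demo fails fast instead of hanging.

```python
import threading


class ToySaver:
    """Stand-in for a checkpointer that guards one connection with a non-reentrant lock."""

    def __init__(self, rows):
        self._rows = rows
        self._lock = threading.Lock()

    def list(self):
        # The lock stays held for as long as the caller keeps iterating.
        with self._lock:
            for row in self._rows:
                yield row

    def get_state(self, thread_id, timeout=0.1):
        # Without the timeout this call would block forever, like the silent hang.
        if not self._lock.acquire(timeout=timeout):
            raise TimeoutError("lock is held by the list() iterator")
        try:
            return next(r for r in self._rows if r["thread_id"] == thread_id)
        finally:
            self._lock.release()


def nested_call_blocks(saver):
    """Return True if get_state() cannot get the lock while list() is being iterated."""
    gen = saver.list()
    try:
        for row in gen:
            try:
                saver.get_state(row["thread_id"])
            except TimeoutError:
                return True
        return False
    finally:
        gen.close()  # closing the generator releases the lock
```

Once the iterator is closed, the same `get_state()` call succeeds, which is why the hang only shows up with the nested call.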

The fix was to stop calling get_state() inside the loop entirely. Instead, I read the message data directly from the checkpoint object that checkpointer.list() already provides:

messages = checkpoint.checkpoint.get("channel_values", {}).get("messages", [])

One read operation, no nested database calls, no deadlock.
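Put together, the sidebar labels can be built in a single pass over the iterator. The sketch below uses stand-in objects shaped like the tuples checkpointer.list() yields (a `.config` with the thread_id and a `.checkpoint` dict); `thread_labels` and `fake_tuple` are hypothetical names, and the 40-character cut-off is an arbitrary choice.

```python
from types import SimpleNamespace


def thread_labels(checkpoint_tuples, max_len=40):
    """Map thread_id -> label from the first human message, with no nested lookups."""
    labels = {}
    for tup in checkpoint_tuples:
        thread_id = tup.config["configurable"]["thread_id"]
        if thread_id in labels:
            continue  # already labeled from another checkpoint of this thread
        messages = tup.checkpoint.get("channel_values", {}).get("messages", [])
        first_human = next(
            (m for m in messages if getattr(m, "type", "") == "human"), None
        )
        if first_human is not None:
            labels[thread_id] = str(first_human.content)[:max_len]
    return labels


def fake_tuple(thread_id, messages):
    # Stand-in for one item yielded by checkpointer.list().
    return SimpleNamespace(
        config={"configurable": {"thread_id": thread_id}},
        checkpoint={"channel_values": {"messages": messages}},
    )
```

Because a thread's first human message is the same in every checkpoint that contains it, the order in which checkpoints arrive does not change the label.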


The Two-Layer Memory Model

One architectural detail worth understanding: the app maintains conversation history in two places simultaneously.

st.session_state['message_history'] is a list of {"role": ..., "content": ...} dicts used exclusively for rendering the chat UI. Streamlit rerenders the entire page on every interaction, so this list needs to exist in session state.

LangGraph's checkpointer holds the actual [HumanMessage, AIMessage, ...] list used as LLM context on each turn.

These are not redundant. They serve different purposes. The UI layer formats messages for display. The LangGraph layer formats messages for the model. Conflating them would mean either losing Streamlit's rendering control or leaking UI concerns into the graph state.

When a user switches threads, load_conversation() fetches the LangGraph state and converts it into the UI format, populating message_history from the checkpointer.
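A minimal sketch of how the UI layer could be kept consistent, using a plain dict as a stand-in for st.session_state. The `thread_id` and `message_history` keys come from the app; `chat_threads`, `init_state`, and `switch_thread` are hypothetical names.

```python
import uuid


def init_state(state):
    """Create each key only if it is missing, so reruns keep existing values."""
    if "thread_id" not in state:
        state["thread_id"] = str(uuid.uuid4())
    if "message_history" not in state:
        state["message_history"] = []
    if "chat_threads" not in state:
        state["chat_threads"] = []
    if state["thread_id"] not in state["chat_threads"]:
        state["chat_threads"].append(state["thread_id"])


def switch_thread(state, thread_id, stored_history):
    """Point the UI at another thread and replace (not extend) the rendered history."""
    state["thread_id"] = thread_id
    state["message_history"] = list(stored_history)
```

Replacing the list on a switch keeps the UI layer a pure projection of the checkpointer, so the two layers cannot drift apart.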


The Persistence Journey

This is where the project evolved the most.

And it matters more than demos suggest. If a chatbot can't remember conversations across sessions, it can't be used for support systems, personal assistants, or any long-running workflow. Persistence isn't a feature — it's a requirement for real-world use.

I started with MemorySaver — LangGraph's in-memory checkpointer. It worked perfectly in the notebook. The chatbot remembered names across turns, exactly as expected. The resume chat feature even worked — switching threads loaded the right history.

Then I restarted the app. Everything was gone.

MemorySaver is just a Python dict. It lives in RAM. The moment the Python process dies, so does every conversation.

The natural upgrade is SQLite via SqliteSaver. SQLite writes to a file on disk, so conversations survive process restarts. I verified this — kill the app, restart it, history still there. SQLite also has no external dependencies, which makes local development simple.
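The restart test is easy to reproduce with plain sqlite3. This is a sketch of the idea only: the `toy_checkpoints` table and the helper names are made up for illustration and have nothing to do with SqliteSaver's real schema.

```python
import json
import os
import sqlite3
import tempfile


def save_messages(db_path, thread_id, messages):
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS toy_checkpoints "
                "(thread_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO toy_checkpoints VALUES (?, ?)",
                (thread_id, json.dumps(messages)),
            )
    finally:
        conn.close()


def load_messages(db_path, thread_id):
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT messages FROM toy_checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []
    finally:
        conn.close()


def survives_restart():
    """Write with one connection, then read with a fresh one, as a new process would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "chatbot.db")
        save_messages(path, "thread-1", ["hi", "hello"])
        return load_messages(path, "thread-1") == ["hi", "hello"]
```

The data outlives the connection because it lives in the file, which is also why it disappears when the file's container does.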

But SQLite has a deployment problem. On Render's free tier, web services run in containers that get replaced on every redeploy. The SQLite file lives inside the container. When the container is replaced, the file is gone. You'd lose all conversations on every deploy.

The fix: move state to a managed external service. PostgreSQL on Render runs as a separate service with its own persistent volume. It survives container replacements. It survives redeployments. This is what production persistence looks like.

The code change was minimal:

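# SQLite
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver
conn = sqlite3.connect('chatbot.db', check_same_thread=False)
checkpointer = SqliteSaver(conn)

# Postgres
import psycopg
from psycopg.rows import dict_row
from langgraph.checkpoint.postgres import PostgresSaver
# A hand-made connection needs autocommit=True and dict rows for PostgresSaver
conn = psycopg.connect(DB_URI, autocommit=True, prepare_threshold=0, row_factory=dict_row)
checkpointer = PostgresSaver(conn)
checkpointer.setup()  # creates the checkpoint tables on first run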

Three lines changed. The graph, the nodes, the Streamlit UI — nothing else touched. That's the payoff of good architecture: swappable components.

The persistence hierarchy in practice

| Checkpointer | Survives restart? | Survives redeploy? | Use case |
|---|---|---|---|
| MemorySaver | No | No | Development / notebooks |
| SqliteSaver | Yes | No | Local testing |
| PostgresSaver | Yes | Yes | Production |

Every LangGraph tutorial starts with MemorySaver. Almost none explain why you can't ship it.


Thread Deletion

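An underappreciated detail: the checkpointer API I was working with didn't expose a delete method (newer LangGraph releases appear to add a delete_thread() method, so check your version first). Deleting a thread from the UI only removes it from st.session_state, so on the next restart checkpointer.list() brings it back.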

The fix is direct SQL. PostgreSQL's checkpoint tables are just regular tables. Deleting rows by thread_id permanently removes a thread:

def delete_thread(thread_id):
    # The connection runs in autocommit mode, so wrap the deletes in an explicit
    # transaction: either every table is cleaned or none is.
    tid = str(thread_id)
    with conn.transaction():
        conn.execute("DELETE FROM checkpoints WHERE thread_id = %s", (tid,))
        conn.execute("DELETE FROM checkpoint_writes WHERE thread_id = %s", (tid,))
        conn.execute("DELETE FROM checkpoint_blobs WHERE thread_id = %s", (tid,))

The table names differ between SQLite (writes) and Postgres (checkpoint_writes, checkpoint_blobs). Worth checking with SELECT tablename FROM pg_tables WHERE schemaname = 'public' before writing the delete queries.
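Before pointing a delete routine at production data, it can be sanity-checked locally. The sketch below is a stand-in that runs against an in-memory SQLite database with the three Postgres table names listed above, so it uses `?` placeholders instead of `%s`. `delete_thread_atomic` and `demo_connection` are hypothetical helpers, and the two-column tables are a simplification of the real schema.

```python
import sqlite3

# Table names as listed above for the Postgres checkpointer; verify them in your schema.
CHECKPOINT_TABLES = ("checkpoints", "checkpoint_writes", "checkpoint_blobs")


def delete_thread_atomic(conn, thread_id):
    """Delete a thread's rows from every checkpoint table in one transaction."""
    deleted = 0
    with conn:  # commit if all deletes succeed, roll back if any fails
        for table in CHECKPOINT_TABLES:
            # Table names come from a fixed tuple; only the value is parameterized.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE thread_id = ?", (str(thread_id),)
            )
            deleted += cur.rowcount
    return deleted


def demo_connection():
    """In-memory database holding two threads, for trying the function out."""
    conn = sqlite3.connect(":memory:")
    for table in CHECKPOINT_TABLES:
        conn.execute(f"CREATE TABLE {table} (thread_id TEXT, payload TEXT)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)", [("a", "x"), ("b", "y")]
        )
    conn.commit()
    return conn
```

The check worth making is that the other thread's rows are untouched in all three tables after the delete.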


Observability with LangSmith

LangSmith integration is two environment variables:

LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_key

No code changes. LangGraph sends traces automatically.

What makes it genuinely useful here is that traces are organized by thread_id. Every turn in a conversation produces a trace tagged with its thread — you can follow the full history of a specific conversation, see every message, every token count, every latency measurement.

I also added metadata to the config to make traces more readable in LangSmith:

CONFIG = {
    "configurable": {"thread_id": st.session_state['thread_id']},
    "metadata": {"thread_id": st.session_state['thread_id']},
    "run_name": "chat_turn",
}
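One way to keep the two thread_id entries in sync is to build the config from a small helper instead of a literal. `make_config` is a hypothetical name; the keys are the same ones used in CONFIG above.

```python
import uuid


def make_config(thread_id=None):
    """Build a per-thread run config; a new UUID is generated when no thread is given."""
    tid = str(thread_id or uuid.uuid4())
    return {
        "configurable": {"thread_id": tid},  # selects the checkpoint history
        "metadata": {"thread_id": tid},      # attaches the thread to the trace
        "run_name": "chat_turn",
    }
```

Calling it with st.session_state['thread_id'] on each turn means a thread switch can never leave the trace metadata pointing at the previous conversation.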

For debugging, this is invaluable. When something goes wrong in a specific conversation, you can pull up that thread's trace history and see exactly what happened — without adding a single line of logging code.


What Makes Narad Different

Most chatbot demos optimize for simplicity. Narad optimizes for realism.

Conversations survive restarts. Multiple users interact with isolated threads. System behavior is observable through LangSmith traces. Threads can be created, resumed, and permanently deleted.

These aren't extra features; they're baseline requirements for anything resembling a production system. That shift in mindset informed every choice I made in this project.


Lessons Learned

MemorySaver is not a persistence solution. It's a development convenience. The moment you care about data surviving a process restart, you need a real checkpointer. Plan for this from the start rather than migrating later.

Containerized deployments kill local file state. SQLite feels like persistence until your container gets replaced. Any state that needs to survive deployment must live outside the container.

Streamlit's rendering model requires discipline. Streamlit reruns the entire script on every interaction. Every stateful value has to live in st.session_state, initialized defensively, and updated in the right order. The bugs that come from Streamlit are almost always ordering bugs.

checkpointer.list() is a lazy iterator with a lock. Don't make database calls inside it. Read everything you need from the checkpoint objects it yields.

LangGraph's checkpointer tables are just SQL tables. You can query them directly, join them, delete from them. Don't treat the checkpointer as a black box.


What's Next

In the next post, I'll cover adding tool augmentation to Narad — web search, a calculator, and real-time stock data — and the specific bugs that only surface when you add a tool loop to a LangGraph graph.

Narad is live at narad-chat.onrender.com. The source is on GitHub.
