Supercharging RAG: Building an Agentic Analyst with Nanonets DocStrange & LangGraph

Leverage the Nanonets DocStrange API for high-quality PDF-to-Markdown conversion and LangGraph for an intelligent, self-correcting agentic RAG pipeline. Build advanced Q&A systems that truly understand your complex documents.

Looking for the UI? Try docstrange.nanonets.com.
API endpoint: https://extraction-api.nanonets.com/extract

Retrieval-Augmented Generation (RAG) has become the standard for chatting with documents. But for those building RAG pipelines for real-world applications, a stark reality emerges: standard RAG often fails catastrophically on complex, unstructured documents.

We've identified two critical bottlenecks that hinder advanced RAG implementations:

The Bottlenecks:

  1. Poor Data Quality from Ingestion: Complex PDFs (financial reports with nested tables, technical manuals, legal contracts) are often flattened into unreadable text by traditional OCR and PDF loaders. This "garbage in" leads directly to "garbage out" in your RAG system, as the LLM lacks the structured context to understand the document's true meaning.
  2. Dumb RAG Pipelines: Most RAG pipelines are linear: Query -> Retrieve -> Generate. They lack the intelligence to assess retrieval quality, adapt to challenging queries, or self-correct when initial attempts fail. This leads to brittle systems that struggle with nuanced or multi-step questions.

Our Agentic Solution:

To overcome these, we will build an Agentic RAG Analyst that doesn't just "chat" but actively reasons, plans, and self-corrects. Our solution addresses each bottleneck head-on:

  1. For Data Quality: We’ll leverage Nanonets DocStrange to transform complex PDFs into clean, LLM-optimized Markdown. This preserves critical structural elements like tables and headers, ensuring our agent receives high-fidelity data.
  2. For Intelligent Pipelines: We’ll use LangGraph to construct a "Self-RAG" agent. This agent will intelligently grade its own retrieval quality and, if necessary, re-strategize by transforming the query or attempting alternative retrieval methods before generating an answer.

The Stack

  - Nanonets DocStrange API – converts complex PDFs into clean, LLM-ready Markdown
  - LangChain (text splitters, HuggingFace embeddings, Chroma integration) – chunking and indexing
  - ChromaDB – persistent vector store for retrieval
  - LangGraph – orchestrates the self-correcting agent workflow
  - OpenAI (gpt-4.1) – document grading, answer generation, and query rewriting

Part 1: The Foundation – Perfect Data with DocStrange

An agent is only as good as its observations. Standard OCR tools flatten documents, destroying the structural context an agent needs to understand that a specific row of numbers belongs to "Q3 2023 Revenue" and not "Q4 2022 Expenses."

Markdown is the ideal format for RAG because it natively preserves this hierarchy (headers, lists, tables) in a way that both embedding models and LLMs inherently understand.
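
To make this concrete, here is a hypothetical fragment of the kind of Markdown a structure-preserving converter returns for a financial table (the figures are purely illustrative). The header and the table layout survive, so downstream components can tell which period each number belongs to:

## Q3 2023 Financial Highlights

| Metric          | Q3 2023 | Q3 2022 |
|-----------------|---------|---------|
| Revenue         | $4.2B   | $3.8B   |
| Adjusted EBITDA | $1.1B   | $0.9B   |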

We'll use Nanonets DocStrange to handle the heavy lifting of converting a messy PDF into clean Markdown.

Step 0: Install Dependencies

First, ensure all necessary Python libraries are installed by running the following command in your terminal or development environment:

pip install langchain_community chromadb langchain_openai langchain-huggingface langchain-chroma langchain-text-splitters langgraph requests

Step 1: Document Ingestion and Markdown Conversion with Nanonets DocStrange

This block demonstrates how to use the Nanonets DocStrange API to convert a PDF document into LLM-ready Markdown.

Key aspects:

  - Authenticates with a Bearer token (your DocStrange API key).
  - POSTs the PDF to the extraction endpoint with output_type="markdown".
  - Handles missing files, HTTP errors, and unexpected response shapes explicitly.

import requests
import os  # used for file paths and optional output caching

API_KEY = "YOUR_DOCSTRANGE_API_KEY"  # replace with your key (docstrange.nanonets.com -> menu -> API documentation)
url = "https://extraction-api.nanonets.com/extract"  # DocStrange extraction endpoint

# Ensure 'samples/annual_report.pdf' exists or create a dummy file for testing.
# In a production setup, you'd manage file paths and potentially retrieve documents from storage.
try:
    with open("samples/annual_report.pdf", "rb") as f:
        files = {"file": f}
        data = {"output_type": "markdown"}  # options: markdown, json, csv, html

        headers = {"Authorization": f"Bearer {API_KEY}"}

        resp = requests.post(url, headers=headers, files=files, data=data, timeout=300)
        resp.raise_for_status() # Raise an exception for HTTP errors (4xx or 5xx)

        raw_markdown_data = resp.json()["content"]
        print("✅ Successfully extracted Markdown from PDF.")
except FileNotFoundError:
    print("Error: 'samples/annual_report.pdf' not found. Please ensure the file exists.")
    raw_markdown_data = "" # Initialize empty to prevent downstream errors
except requests.exceptions.RequestException as e:
    print(f"Error during DocStrange API call: {e}")
    raw_markdown_data = ""
except KeyError:
    print(f"Error: 'content' key not found in DocStrange response. Response: {resp.json()}")
    raw_markdown_data = ""
    
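Because the API call is the step you least want to repeat, it can help to cache the returned Markdown to disk and reuse it on later runs. A minimal sketch (the outputs/ path is just an example):

# Optionally cache the Markdown locally so repeated runs don't re-call the API
if raw_markdown_data:
    os.makedirs("outputs", exist_ok=True)
    with open("outputs/annual_report.md", "w", encoding="utf-8") as out:
        out.write(raw_markdown_data)
    print("✅ Cached Markdown at outputs/annual_report.md")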

Step 2: Structure-Aware Splitting and Chunking

After obtaining the raw Markdown, we need to split it into manageable chunks suitable for embedding and retrieval. Instead of naive character splitting, we leverage Markdown's inherent structure.

Key aspects:

  - MarkdownHeaderTextSplitter first splits on #, ##, and ### headers so each chunk stays tied to its section.
  - RecursiveCharacterTextSplitter then breaks any oversized section into 1024-character chunks with 128-character overlap.

from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

if raw_markdown_data:
    # 1. Split by logical sections first using Markdown headers
    headers_to_split_on = [
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]

    markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
    md_header_splits = markdown_splitter.split_text(raw_markdown_data)
    print(f"Split into {len(md_header_splits)} chunks based on Markdown headers.")

    # 2. Further split large sections if necessary, keeping structure context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1024, # Maximum size of each text chunk
        chunk_overlap=128 # Overlap between chunks to maintain context
    )
    final_splits = text_splitter.split_documents(md_header_splits)
    print(f"Further split into {len(final_splits)} final document chunks.")
else:
    final_splits = []
    print("No markdown data available; skipping document splitting.")

# For production readiness:
# - Experiment with `chunk_size` and `chunk_overlap` to optimize for your specific document types and query patterns.
# - Consider different splitting strategies for highly specialized documents (e.g., code, legal contracts).
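
A quick way to verify that the structure survived is to inspect the header metadata that MarkdownHeaderTextSplitter attaches to each chunk. A small sanity-check sketch:

# Sanity check: each chunk carries the section headers it came from in its metadata,
# e.g. {"Header 1": "...", "Header 2": "..."} (keys match headers_to_split_on above).
if final_splits:
    sample = final_splits[0]
    print("Sample chunk metadata:", sample.metadata)
    print("Sample chunk preview:", sample.page_content[:200], "...")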

Step 3: Embeddings and Vector Store Initialization

Once we have our well-structured text chunks, we convert them into numerical vectors (embeddings) and store them in a vector database. We're using HuggingFace embeddings for flexibility and ChromaDB for its ease of use and persistence.

Key aspects:

  - HuggingFaceEmbeddings with sentence-transformers/all-MiniLM-L6-v2, a solid speed/quality trade-off.
  - A named, persistent Chroma collection so the index can be reused across runs.

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

if final_splits:
    # Create embeddings using a HuggingFace model
    # "sentence-transformers/all-MiniLM-L6-v2" is a good balance of performance and speed.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    print("✅ HuggingFace Embeddings model loaded.")

    # Define persistence for ChromaDB
    persist_dir = "indexes/annual_report_chroma"
    collection_name = "annual_report_2023"

    # Build the Chroma vector index from the processed documents.
    # Note: Chroma.from_documents adds these documents on every run; for repeated
    # runs against the same persist_directory, load the existing collection instead.
    vectorstore = Chroma.from_documents(
        documents=final_splits,
        embedding=embeddings,
        persist_directory=persist_dir,
        collection_name=collection_name,
    )

    print(f"✅ Built/Loaded Chroma index '{collection_name}' at '{persist_dir}' with {len(final_splits)} documents.")
else:
    vectorstore = None # Ensure vectorstore is None if no documents were processed
    print("No documents were processed; ChromaDB will not be initialized.")

# For production readiness:
# - Choose an embedding model that best fits your domain and performance requirements (check the MTEB leaderboard).
# - Implement versioning for your embedding models and vector indices.
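
Because the index is persisted, later sessions can reload the existing collection instead of re-embedding everything. A minimal sketch, assuming the same paths and embedding model as above:

# Re-open the persisted collection in a later session (no re-embedding of stored chunks).
existing_store = Chroma(
    collection_name="annual_report_2023",
    persist_directory="indexes/annual_report_chroma",
    embedding_function=embeddings,
)
hits = existing_store.similarity_search("revenue", k=2)
print(f"Reloaded collection; sample query returned {len(hits)} chunks.")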

Part 2: The Brain – Building the Agent with LangGraph

Standard RAG is a straight line: Query -> Retrieve -> Generate.

Agentic RAG is a loop: Query -> Plan -> Retrieve -> Grade -> (Maybe Re-Retrieve) -> Generate -> Verify.

We will implement a simplified Self-RAG architecture using LangGraph. Our agent will have the ability to "reflect" on whether the documents it found are actually relevant before trying to answer.

Step 4: Define Agent State

LangGraph agents operate on a shared state. This TypedDict defines the structure of the information that gets passed between different nodes in our graph.

Key aspects:

  - question: the current (possibly rewritten) user query.
  - documents: the retrieved chunks that survive grading.
  - generation: the final answer produced by the agent.

from typing import Annotated, List, Dict
from typing_extensions import TypedDict
from langchain_core.documents import Document

class AgentState(TypedDict):
    question: str
    generation: str
    documents: List[Document]
    # We can add more state here, like 'retry_count' for advanced flows

# For production readiness:
# - Expand the AgentState to include more complex state management (e.g., chat history, number of retrieval attempts, specific tool outputs).
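
To illustrate the note above, an expanded state might look like the sketch below. The extra fields (retry_count, chat_history) are hypothetical and are not used by the graph built in this tutorial:

# Hypothetical richer state for more advanced flows; the graph below sticks to the basic AgentState.
class ExtendedAgentState(TypedDict):
    question: str
    generation: str
    documents: List[Document]
    retry_count: int           # number of query rewrites attempted so far
    chat_history: List[str]    # prior conversation turns for multi-turn use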

Step 5: Define Agent Nodes (Retrieve, Grade, Generate, Transform)

These Python functions represent the "nodes" in our LangGraph. Each node takes the AgentState as input, performs an action (e.g., retrieve documents, grade them, generate an answer), and returns an updated AgentState.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel, Field
import os # Ensure os is imported for API key access

# Initialize our main LLM for grading, generation, and query transformation.
# Load the API key from an environment variable rather than hard-coding it.
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY environment variable is not set. Export it before running this step.")

llm = ChatOpenAI(model="gpt-4.1", api_key=openai_api_key, temperature=0)

# --- Node: Retrieve Documents ---
def retrieve_node(state: AgentState):
    """
    Retrieve documents from the vector store based on the question.
    """
    print("---RETRIEVE DOCUMENTS---")
    question = state["question"]
    
    if vectorstore is None:
        print("Warning: Vector store not initialized. Cannot retrieve documents.")
        return {"documents": [], "question": question}

    # Use the initialized ChromaDB retriever
    retriever = vectorstore.as_retriever()
    documents = retriever.invoke(question)
    
    return {"documents": documents, "question": question}


# --- Node: Grade Documents ---
# Defines the expected structured output from the LLM for grading.
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""
    binary_score: str = Field(description="Documents are relevant to the question, 'yes' or 'no'")

# Configure the LLM to output in the `GradeDocuments` Pydantic format.
structured_llm_grader = llm.with_structured_output(GradeDocuments)

system_grader = """You are a grader assessing relevance of a retrieved document to a user question. \n
    If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant. \n
    Give a binary score of 'yes' or 'no' to indicate whether the document is relevant to the question."""

grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_grader),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

# Create the grading chain: prompt -> LLM (with structured output)
retrieval_grader = grade_prompt | structured_llm_grader

def grade_documents(state: AgentState):
    """
    Determines whether the retrieved documents are relevant to the question.
    If any document is not relevant, we will filter it out.
    """
    print("---CHECK DOCUMENT RELEVANCE---")
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    for d in documents:
        # Invoke the LLM grader to score the document's relevance
        score = retrieval_grader.invoke({"question": question, "document": d.page_content})
        grade = score.binary_score
        if grade == "yes":
            print(f"---GRADE: DOCUMENT RELEVANT - Chunk starting with: {d.page_content[:50]}...")
            filtered_docs.append(d)
        else:
            print(f"---GRADE: DOCUMENT NOT RELEVANT - Chunk starting with: {d.page_content[:50]}...")
            continue
            
    return {"documents": filtered_docs, "question": question}


# --- Node: Generate Answer ---
system_generator = """You are an assistant for question-answering tasks. 
    Use the following retrieved context to answer the question. 
    If you don't know the answer, just say that you don't know. 
    Keep the answer concise and accurate.
    """
generate_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_generator),
        ("human", "Retrieved context: \n\n {documents} \n\n Question: {question}"),
    ]
)
# Create the generation chain: prompt -> LLM -> string output parser
generation_chain = generate_prompt | llm | StrOutputParser()

def generate_node(state: AgentState):
    """
    Generate an answer using the retrieved documents and the question.
    """
    print("---GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    
    if not documents:
        print("No relevant documents for generation.")
        return {"generation": "I couldn't find enough relevant information to answer this question.", "documents": documents, "question": question}

    # Format documents for the prompt by concatenating their content
    docs_content = "\n\n".join([d.page_content for d in documents])
    
    generation = generation_chain.invoke({"documents": docs_content, "question": question})
    
    return {"generation": generation, "documents": documents, "question": question}


# --- Node: Transform Query ---
system_transform_query = """You are a query rewrite expert. Your goal is to rewrite a user's question 
    to be more effective for document retrieval, especially if initial retrieval failed. 
    Consider synonyms, broader or narrower terms, or breaking down complex queries.
    Return only the rewritten query, nothing else.
    """
transform_query_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_transform_query),
        ("human", "Original question: {question}"),
    ]
)
# Create the query transformation chain: prompt -> LLM -> string output parser
transform_query_chain = transform_query_prompt | llm | StrOutputParser()

def transform_query_node(state: AgentState):
    """
    Rewrite the user's original query to improve retrieval.
    """
    print("---TRANSFORM QUERY---")
    question = state["question"]
    
    rewritten_query = transform_query_chain.invoke({"question": question})
    print(f"---Rewritten Query: {rewritten_query}---")
    
    # Update the state with the new, transformed query
    return {"question": rewritten_query}

# For production readiness:
# - Fine-tune LLM prompts for each node (grader, generator, transformer) with specific examples for your domain.
# - Implement logging for each node's input, output, and decision path.
# - Add cost tracking for LLM calls.
# - Consider caching for LLM responses.
# - Implement more sophisticated error handling within nodes (e.g., fallbacks if an LLM call fails).
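
As one example of the error-handling point above, the grading loop can be wrapped so that a failed grader call keeps the document rather than silently dropping it. A hedged sketch (a drop-in alternative to grade_documents, not wired into the graph below):

def grade_documents_with_fallback(state: AgentState):
    """Like grade_documents, but keeps a document if the grader call itself fails."""
    question = state["question"]
    filtered_docs = []
    for d in state["documents"]:
        try:
            score = retrieval_grader.invoke({"question": question, "document": d.page_content})
            if score.binary_score == "yes":
                filtered_docs.append(d)
        except Exception as e:
            print(f"---GRADER CALL FAILED ({e}); KEEPING DOCUMENT AS A PRECAUTION---")
            filtered_docs.append(d)
    return {"documents": filtered_docs, "question": question}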

Step 6: Build the LangGraph Workflow

This section defines the flow of our agent using LangGraph. We're creating a state machine where each node represents a step, and conditional edges determine the path based on the agent's current state.

Key aspects:

  - Entry point is retrieve, followed by grade_documents.
  - A conditional edge routes to generate when relevant documents remain, or to transform_query when none do.
  - transform_query loops back to retrieve; generate ends the run.

from langgraph.graph import END, StateGraph

# Initialize the StateGraph with our defined AgentState
workflow = StateGraph(AgentState)

# Define Nodes: Associate our functions with symbolic names in the graph
workflow.add_node("retrieve", retrieve_node) # Standard vector store retrieval
workflow.add_node("grade_documents", grade_documents) # LLM-powered document relevance grading
workflow.add_node("generate", generate_node) # LLM-powered answer generation
workflow.add_node("transform_query", transform_query_node) # LLM-powered query rewriting

# Define Flow: Set the entry point and edges between nodes

# 1. Start by retrieving documents
workflow.set_entry_point("retrieve")

# 2. After retrieval, move to grading the documents
workflow.add_edge("retrieve", "grade_documents")

# 3. Conditional edge: Decide whether to generate an answer or transform the query
def decide_to_generate(state: AgentState):
    """
    This function acts as a conditional router.
    It checks if relevant documents were found after grading.
    """
    print("---DECIDE TO GENERATE---")
    if not state["documents"]:
        # If no relevant documents remain after grading, try to transform the query
        print("---DECISION: ALL DOCUMENTS IRRELEVANT, TRANSFORM QUERY---")
        return "transform_query"
    else:
        # If relevant documents are available, proceed to generate an answer
        print("---DECISION: GENERATE---")
        return "generate"

# Add the conditional edges based on the `decide_to_generate` function's return
workflow.add_conditional_edges(
    "grade_documents", # The node from which this conditional decision is made
    decide_to_generate, # The function that makes the decision
    {
        "transform_query": "transform_query", # If "transform_query" is returned, go to 'transform_query' node
        "generate": "generate",               # If "generate" is returned, go to 'generate' node
    },
)

# 4. If the query was transformed, loop back to retrieval with the new query
workflow.add_edge("transform_query", "retrieve")

# 5. If an answer was generated, end the graph execution
workflow.add_edge("generate", END)

# Compile the graph into an executable agent
app = workflow.compile()
print("✅ LangGraph agent compiled successfully!")

# For production readiness:
# - Implement more complex branching, including multiple retry attempts for query transformation, or escalating to human review.
# - Integrate more tools for the agent (e.g., a calculator tool for financial analysis, a search tool for external knowledge).
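
To implement the multiple-retry idea above, the router can cap how many times the query gets rewritten before the agent answers with whatever it has. A sketch that assumes a retry_count field has been added to the state (see the extended state in Step 4) and that transform_query_node increments it:

MAX_QUERY_REWRITES = 2  # illustrative cap

def decide_to_generate_with_cap(state):
    """Route to transform_query only while the rewrite budget lasts; otherwise generate regardless."""
    if state["documents"]:
        return "generate"
    if state.get("retry_count", 0) < MAX_QUERY_REWRITES:
        return "transform_query"
    print("---DECISION: REWRITE BUDGET EXHAUSTED, GENERATE WITH AVAILABLE CONTEXT---")
    return "generate"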

Step 7: Querying the Agent

Now that our agent is compiled, we can interact with it by invoking its app object with a question. The agent will then execute the defined workflow to provide an answer.

# Example query for your agent
question = "Summarize key revenue insights from the 2023 section." 

print(f"\n--- Starting Agent Query for: '{question}' ---")

# Invoke the agent with the initial question
final_state = app.invoke({"question": question})

print("\n--- Final Agent Output ---")
print(f"Original Question: {question}")
print(f"Final Question (may be transformed): {final_state['question']}")
print(f"Answer: {final_state['generation']}")

print("\n--- Testing a query that might need transformation ---")
# This second query tests the agent's ability to transform the query if initial retrieval fails
question_two = "What are the primary risks associated with the company's operations?"
print(f"\n--- Starting Agent Query for: '{question_two}' ---")
final_state_two = app.invoke({"question": question_two})

print("\n--- Second Query Final Agent Output ---")
print(f"Original Question: {question_two}")
print(f"Final Question (may be transformed): {final_state_two['question']}")
print(f"Answer: {final_state_two['generation']}")

# For production readiness:
# - Build a user-friendly interface to interact with the agent.
# - Integrate the agent into existing workflows (e.g., customer support, internal knowledge management).
# - Monitor agent performance (accuracy, latency, cost) and continuously improve prompts and models.
# - Implement human-in-the-loop mechanisms for edge cases or when the agent expresses uncertainty.
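
For the monitoring point above, the compiled graph can also be streamed so each node's update is visible as the agent runs, which makes decision paths much easier to debug. A minimal sketch:

# Stream state updates node by node for observability.
for step in app.stream({"question": "Summarize key revenue insights from the 2023 section."}):
    for node_name, update in step.items():
        print(f"[{node_name}] updated keys: {list(update.keys())}")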

Why this wins

Imagine a user asks: "What was the Q3 adjusted EBITDA margin deviation compared to the previous fiscal year?"

Scenario A (Standard RAG):

The vector DB finds a paragraph mentioning "EBITDA" from Q1 and a table from Q4 because they superficially match the keywords. The LLM, forced to answer, hallucinates a deviation based on irrelevant data.

Scenario B (Our Agentic RAG):

  1. Ingest: DocStrange perfectly captured the Q3 table as Markdown, preserving its structure.
  2. Retrieve: We fetch standard chunks using our ChromaDB retriever.
  3. Grade: The agent (an LLM in the grade_documents node) recognizes that the retrieved Q1 data is irrelevant to the specific "Q3" question and discards it.
  4. Loop (if needed): If, after grading, no relevant documents remain, the agent invokes the transform_query_node to rewrite the query (e.g., to "Q3 fiscal year 2023 adjusted EBITDA table") and re-attempts retrieval. This self-correction mechanism is crucial.
  5. Generate: Once truly relevant documents are retrieved and graded, the generate_node uses an LLM to produce the precise calculation or summary requested, based only on the high-quality, relevant context.

Conclusion

By combining Nanonets DocStrange's ability to accurately extract structured data from complex PDFs into LLM-friendly Markdown with LangGraph's power to orchestrate intelligent, self-correcting agent workflows, we've elevated RAG from a simple search tool to a reliable and robust research assistant. This agentic approach unlocks true understanding from complex document collections, driving accuracy, reducing hallucinations, and making your LLM applications genuinely intelligent.