
treerag

A lightweight structural RAG library — index documents as hierarchical trees, query them with any LangChain LLM. No vector database required.

Most RAG systems split documents into fixed-size chunks and find relevant chunks through embedding similarity search. This works for many cases, but it loses the natural structure of documents — context gets cut mid-paragraph, related sections get separated, and the retrieval step has no awareness of how the document is actually organized.

treerag takes a different approach. It preserves the document's structure by building a hierarchical tree from its headers, generates a summary and entity tags for each section, and uses an LLM to navigate that tree outline when answering questions — exactly the way a human expert would scan a table of contents before diving into the relevant section.

Python ≥ 3.10 · MIT license · Works with OpenAI, Anthropic, Gemini, and Ollama
pip install "treerag[openai]"
from treerag import index_document, make_summarizer, ask, make_retriever
from langchain_openai import ChatOpenAI

llm        = ChatOpenAI(model="gpt-4o")
search_llm = ChatOpenAI(model="gpt-4o-mini")

# Index once
doc = index_document("my_doc.pdf", summarizer=make_summarizer(search_llm))

# Ask
retriever = make_retriever(llm=llm, search_llm=search_llm)
result = ask("What does this cover?", doc, retriever)
print(result.content)
print(result.references)
# Stream the answer as it is generated
for chunk in ask("What does this cover?", doc, retriever, stream=True):
    if isinstance(chunk, dict):
        refs = chunk["__references__"]
    else:
        print(chunk, end="", flush=True)

# Async usage
from treerag import aask, make_async_retriever

retriever = make_async_retriever(llm)
result = await aask("What does this cover?", doc, retriever)
print(result.content)


Installation

Install treerag with the LLM provider of your choice — each provider is an optional dependency, so you only install what you need.

Requirements

treerag requires Python 3.10 or higher. The core library has no mandatory LLM dependencies — provider packages are installed as optional extras so you only pull in what you actually use.

With pip

Choose the provider that matches your LLM. If you're just getting started, openai is the recommended choice.

# OpenAI (recommended for most users)
pip install "treerag[openai]"

# Anthropic Claude
pip install "treerag[anthropic]"

# Google Gemini
pip install "treerag[gemini]"

# Ollama — run models locally for free
pip install "treerag[ollama]"

# Install support for all providers at once
pip install "treerag[all]"

With uv

If you're using uv as your package manager, use uv add instead.

uv add "treerag[openai]"

Base install (no LLM provider)

You can install treerag without any LLM provider if you want to use the indexing pipeline standalone or bring your own LangChain model separately.

pip install treerag

Setting up your API key

treerag reads your API key from environment variables. The easiest way is to create a .env file in your project root and load it with python-dotenv.

# .env
OPENAI_API_KEY=sk-...

# For Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# For Google Gemini
GOOGLE_API_KEY=...
from dotenv import load_dotenv
load_dotenv()  # call this before creating any LLM instances

Quickstart

Get up and running in under 5 minutes. You'll index a document, ask a question, and see references to the exact sections that were used to answer it.

1. Index your document

Indexing is a one-time operation per document. It reads the file, splits it into sections by header depth, calls your LLM to generate a summary and entity tags for each section in parallel, then saves the result to a local registry file. The whole process typically takes 10–60 seconds depending on document size and LLM speed.

We recommend using a cheaper model like gpt-4o-mini for summarization — it's fast, accurate, and costs a fraction as much.

from treerag import index_document, make_summarizer
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()

# Use a cheaper model for summarization
search_llm = ChatOpenAI(model="gpt-4o-mini")

doc = index_document(
    "my_doc.pdf",
    summarizer=make_summarizer(search_llm)
)
# Saved to indexed_docs.json automatically
# doc["doc_id"]  ← save this UUID for later

2. Ask a question

Asking a question triggers two LLM calls: first the search LLM reads the tree outline to identify which sections are relevant, then the answer LLM reads the full text of those sections and generates a response. The returned object is a standard LangChain AIMessage with an extra .references attribute showing exactly which sections were used.

from treerag import ask, make_retriever

# Use a smarter model for final answers
llm = ChatOpenAI(model="gpt-4o")
retriever = make_retriever(llm=llm, search_llm=search_llm)

result = ask("What does this document cover?", doc, retriever)

print(result.content)            # the answer text
print(result.references)         # [{node_id, title, path, file_name}]
print(result.response_metadata)  # token usage, model name, finish reason

The document is saved to indexed_docs.json automatically. On every subsequent run you can skip index_document() and load the saved doc with get_document_by_id(), which is instant.

3. Load on subsequent runs

Once a document is indexed, its tree structure and section content are saved to indexed_docs.json. On every subsequent run you can skip indexing entirely and load the saved document directly — loading is an instant dictionary lookup by UUID with no LLM calls. You never need to re-index unless the document changes. Browse all indexed docs with list_documents().

from treerag import list_documents, get_document_by_id

# See what's in the registry
for d in list_documents():
    print(d["name"], "→", d["doc_id"], f"({d['total_sections']} sections)")

# Load by UUID (instant — no re-indexing)
doc = get_document_by_id("your-uuid-here")
result = ask("Your question", doc, retriever)

How it works

treerag uses a two-stage pipeline. The first stage indexes your document into a hierarchical tree. The second stage uses that tree to answer questions — without any vector database or embedding model.

Why a tree instead of chunks?

Traditional RAG systems split documents into fixed-size text chunks — typically 500–1000 tokens — and retrieve relevant chunks using embedding similarity search. This approach has a fundamental limitation: chunking ignores document structure. A chunk might span two unrelated topics. A key answer might be split across a chunk boundary. The retriever has no idea whether section A is a child of section B or completely unrelated to it.

treerag indexes documents as they are actually written — as hierarchical sections with parent-child relationships. Sections stay together, the structure becomes searchable metadata rather than lost noise, and the LLM navigates the tree to find relevant content, just like a human expert scanning a table of contents.

Indexing pipeline

When you call index_document(), six steps run in sequence. The most expensive step is summarization, which makes one LLM API call per section. For a 30-section document with max_workers=10, all 30 calls fire in parallel and typically complete in 5–10 seconds total.

1. read_file() reads .txt, .md, .pdf, .docx files or scrapes a URL
2. parse_sections() splits the text by markdown headers (h1–h6) into a flat section list
3. make_summarizer(llm) runs parallel LLM calls, one per section, generating a summary + entity tags
4. build_hierarchy() assembles the parent → child tree from header depth levels
5. flatten_tree() creates a {node_id: content} dict for fast lookup by ID
6. save_registry() persists the indexed document to indexed_docs.json, keyed by UUID
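The tree-assembly step (step 4) can be sketched with a simple stack. This is an illustrative reimplementation of the idea, not treerag's actual build_hierarchy() code, and the (depth, title) tuples stand in for the richer section dicts the real pipeline passes around:

```python
# Illustrative sketch of the tree-assembly idea behind build_hierarchy()
# (not treerag's actual code): a stack turns flat (depth, title) sections
# into a nested parent -> child tree.
def assemble(sections):
    root = {"title": "ROOT", "depth": 0, "children": []}
    stack = [root]
    for depth, title in sections:
        node = {"title": title, "depth": depth, "children": []}
        # Pop until the top of the stack is a shallower header,
        # i.e. this section's parent.
        while stack[-1]["depth"] >= depth:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]

tree = assemble([(1, "Intro"), (2, "Install"), (3, "Requirements"), (2, "Usage")])
# "Install" and "Usage" become children of "Intro";
# "Requirements" nests under "Install".
```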

Q&A pipeline

Each call to ask() makes exactly two LLM calls — one for tree search and one for answer generation. The tree search step only reads a compact text outline (not the full document), so it completes quickly and cheaply. The answer generation step receives only the relevant sections, not the entire document, keeping context concise and costs low.

1. user query
2. tree search: the search LLM reads the tree outline + tags and returns a list of relevant node IDs
3. fetch content: full section text is retrieved from flat_nodes by node ID — instant, no re-reading the file
4. answer generation: the answer LLM reads the fetched sections and generates a grounded response
5. AIMessage: returned with .references showing exactly which sections were used
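The flow above can be reduced to a few lines of plain Python. This is a conceptual sketch only (the real retriever builds richer prompts and parses the model's output), with search_llm and answer_llm standing in for LangChain chat-model calls:

```python
# Conceptual sketch of the two-call Q&A flow (illustrative, not
# treerag's actual retriever). search_llm and answer_llm stand in
# for LangChain chat-model invocations.
def answer(query, doc, search_llm, answer_llm):
    # Call 1: the search LLM sees only the compact outline, never the full text.
    node_ids = search_llm(
        f"Outline:\n{doc['tree_outline']}\n\nQuestion: {query}\n"
        "Return the IDs of the relevant sections."
    )
    # Fetch full section text by ID: a plain dict lookup, no file re-reading.
    context = "\n\n".join(doc["flat_nodes"][i]["text"] for i in node_ids)
    # Call 2: the answer LLM sees only the fetched sections.
    return answer_llm(f"Context:\n{context}\n\nQuestion: {query}")
```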

The tree outline

The outline is a compact indented text representation of the entire document structure. It is what the search LLM reads when deciding which sections to retrieve. Each line contains the section ID (used to fetch content), the title, a short summary, and a list of entity tags. The combination of summary and tags ensures that both topical and entity-based queries find the right sections.

- [0000] Introduction — Overview of the library and its core concepts | Tags: rag, tree, indexing, vectorless
  - [0001] Installation — Steps to install via pip or uv | Tags: pip, uv, Python, 3.10
    - [0002] Requirements — Minimum Python version and dependencies | Tags: Python, version, langchain
  - [0003] Usage — How to index and query documents | Tags: index_document, ask, make_retriever
- [0004] Advanced — Configuration and customisation options | Tags: prompts, providers, production

Entity tags are critical for accurate search. Without them, a question like "who was Ghulam Sarwar?" would fail because the summary says "Discussion of conspiracy" — the name only appears deep in the section text. Tags force the LLM to read the entire section before generating, so no entity is missed.

index_document()

The main entry point for indexing. Reads a file or URL, runs the full indexing pipeline, and saves the result to the local registry. Returns the indexed document dict which can be passed directly to ask().

You typically call index_document() once per document. The result is saved to the local registry (indexed_docs.json) automatically, so you can reload it instantly on future runs using get_document_by_id() without any LLM calls. If your document changes, pass overwrite=True to re-index and replace the old entry.

index_document(
    file_path: str,
    summarizer: SummarizerFn = null_summarizer,
    registry_path: str = "indexed_docs.json",
    overwrite: bool = False,
    persist: bool = True,
    verbose: bool = True,
) -> dict
file_path (required)
    Path to a file (.txt, .md, .pdf, .docx) or a URL starting with http/https.
summarizer (default: null_summarizer)
    Function that generates summaries and entity tags per section. Use make_summarizer(llm) for AI-powered summaries.
registry_path (default: "indexed_docs.json")
    Path where the local registry is stored. Change this to use a different file or location.
overwrite (default: False)
    If True, re-indexes the document even if it already exists in the registry. Useful when the document has been updated.
persist (default: True)
    If False, skips saving to the local registry. Use this in production when you manage storage yourself (PostgreSQL, Redis, etc.).
verbose (default: True)
    Prints step-by-step progress to stdout. Set False for cleaner output in production.

Return value

Returns a dict containing everything needed to ask questions. The key fields are tree_outline (used by the search LLM) and flat_nodes (used to fetch full section text).

{
  "doc_id":         "3f7a1c2d-9b4e-...",  # UUID — use this to reload later
  "file_name":      "my_doc.pdf",
  "file_path":      "/path/to/my_doc.pdf",
  "indexed_at":     "2026-04-10 12:00:00",
  "total_sections": 28,
  "tree_outline":   "...",  # indented outline with summaries + tags
  "flat_nodes":     {...},  # {node_id: {title, text, summary, tags, path}}
  "tree":           [...],  # full nested tree structure
}

Examples

# PDF with AI summaries and entity tags
doc = index_document("report.pdf", summarizer=make_summarizer(llm))

# Index a URL (scrapes the page and parses HTML headings as sections)
doc = index_document("https://docs.example.com/guide", summarizer=make_summarizer(llm))

# No LLM — fast, zero cost, less accurate for entity search
doc = index_document("notes.md")

# Re-index after the document has changed
doc = index_document("report.pdf", overwrite=True)

# Production — don't save to local file, handle storage yourself
doc = index_document("report.pdf", persist=False)
redis.set(doc["doc_id"], json.dumps(doc))

Supported formats

treerag reads five types of input — four file formats and any URL. All inputs are converted to a common structure before parsing, so the same tree-building logic works for all of them.

Markdown (.md, .markdown)
    Headers are used directly — # becomes depth 1, ## depth 2, and so on. This is the richest format because the structure is already explicit.
Plain text (.txt)
    If no markdown headers are found, the text is chunked by paragraphs (~500 words per section). Section titles are auto-generated as "Section 1", "Section 2", etc.
PDF (.pdf)
    Text is extracted page by page via PyMuPDF. Each page is labelled with a --- PAGE N --- marker so sections can be traced back to specific pages.
Word (.docx)
    Paragraphs are extracted via python-docx. Heading styles (Heading 1, Heading 2) are mapped to the corresponding markdown header depths.
URL (http://, https://)
    The page is scraped via requests + BeautifulSoup. HTML h1–h6 tags are converted to markdown headers (# through ######) automatically before parsing.
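The plain-text fallback can be sketched as follows. This is an illustrative approximation, not treerag's actual parser; the ~500-word budget and auto-generated titles come from the description above:

```python
# Sketch of the plain-text fallback: no headers found, so group
# paragraphs into sections of roughly max_words words each, with
# auto-generated "Section N" titles. Illustrative, not treerag's code.
def chunk_plain_text(text, max_words=500):
    sections, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = len(para.split())
        # Flush the current section once adding this paragraph would
        # push it past the word budget.
        if current and count + words > max_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        sections.append("\n\n".join(current))
    return [(f"Section {i + 1}", body) for i, body in enumerate(sections)]
```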

All five input types produce the same output — a raw text string with markdown-style headers. This string is passed to parse_sections(), which splits it into a flat list of sections. The same tree-building, summarization, and Q&A logic applies regardless of the original file format.

URLs: JavaScript-rendered pages (React/Vue/Angular SPAs) are not supported because the scraper only sees the pre-render HTML. For JS-heavy sites, use crawl4ai or Playwright to extract the rendered content first, then pass the extracted markdown string to parse_sections() directly.
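If you capture the rendered HTML yourself (with Playwright, for example), you still need to turn its headings into the markdown-header string parse_sections() expects. A minimal stdlib-only sketch of that conversion (illustrative; treerag's own scraper uses BeautifulSoup and handles more edge cases):

```python
# Minimal sketch: convert rendered HTML headings (h1-h6) into
# markdown headers, leaving other text as plain paragraphs.
# Stdlib only; ignores scripts, styles, and nested-markup edge
# cases a real scraper handles.
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0  # heading depth of the tag we are inside, 0 = none

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._depth = int(tag[1])  # "h2" -> 2

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._depth = 0

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._depth:
            self.lines.append("#" * self._depth + " " + text)
        else:
            self.lines.append(text)

def html_to_markdown(html: str) -> str:
    parser = HeadingConverter()
    parser.feed(html)
    return "\n\n".join(parser.lines)
```

The resulting string can be passed straight to parse_sections().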

Using pipeline steps directly

You can call each step in the indexing pipeline individually if you need more control — for example to parse text you have already extracted, or to inspect the section structure before running a full index.

from treerag import read_file, parse_sections, build_hierarchy, flatten_tree

text     = read_file("my_doc.md")
sections = parse_sections(text)
tree     = build_hierarchy(sections)
flat     = flatten_tree(tree)

# Inspect any section
print(flat["0010"]["text"])     # full section text
print(flat["0010"]["tags"])     # entity tags list
print(flat["0010"]["path"])     # breadcrumb: "Chapter 2 > Section 11"
print(flat["0010"]["summary"])  # AI-generated summary

make_summarizer()

Creates a summarizer that generates a short summary and extracts all named entity tags for each section. Sections are processed in parallel — one dedicated LLM call per section — giving the model focused attention on each piece of content for maximum accuracy.

The reason treerag processes sections one at a time rather than bundling them is accuracy. When all sections go into a single large prompt, the LLM often misattributes tags across sections or generates generic summaries. A focused per-section call produces significantly more reliable entity extraction, especially for long documents with many named people, places, and organizations.
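The per-section fan-out can be sketched like this. It is an illustrative reduction, not treerag's actual summarizer (which also parses tags out of each response and fills the nodes in place); call_llm stands in for one chat-model invocation:

```python
# Sketch of the per-section fan-out: one focused LLM call per section,
# run in parallel threads. Illustrative, not treerag's actual code.
from concurrent.futures import ThreadPoolExecutor

def summarize_all(sections, call_llm, max_workers=10):
    def one(section):
        # Each call sees exactly one section, keeping the model focused.
        return call_llm(f"Summarize and extract entity tags:\n{section['text']}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves section order even though calls run in parallel.
        return list(pool.map(one, sections))
```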

make_summarizer(
    llm,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
    user_prompt_template: str = DEFAULT_USER_PROMPT,
    max_workers: int = 10,
) -> SummarizerFn
llm
    Any LangChain BaseChatModel.
system_prompt
    Override for domain-specific summarization.
user_prompt_template
    Must contain {node_id}, {title}, {text}.
max_workers
    Number of parallel threads. Reduce to 3–5 if rate limited.

Use gpt-4o-mini for summarization and gpt-4o for answers — cuts indexing cost by ~10x.

Domain-specific prompts

# Historical documents
summarizer = make_summarizer(llm, system_prompt=
    "You are a historian. Extract ALL named people, places, organizations "
    "and events. Read the entire text — every name matters."
)

# Legal documents
summarizer = make_summarizer(llm, system_prompt=
    "You are a legal expert. Extract all parties, clauses, "
    "obligations and key terms precisely."
)

# Rate limit control
summarizer = make_summarizer(llm, max_workers=3)

ask()

The main Q&A function. Runs a two-step pipeline: first the search LLM reads the tree outline to identify which sections are relevant, then the answer LLM reads those sections and generates a grounded response. Returns a LangChain AIMessage with a .references list showing the exact sections that were used.

You can control the response format with return_raw (default True for raw AIMessage, False for plain dict) and enable live streaming with stream=True. Use extra_context to pass question-specific hints that help the search LLM navigate to the right sections.

ask(
    query: str,
    document: dict,
    retriever: RetrieverFn,
    extra_context: str = "",
    stream: bool = False,
    return_raw: bool = True,
    verbose: bool = False,
) -> AIMessage | dict | Generator

Response modes

# Default: raw AIMessage
result = ask("Who was Ghulam Sarwar?", doc, retriever)

result.content            # answer text
result.references         # [{node_id, title, path, file_name}, ...]
result.response_metadata  # {token_usage: {...}, model_name: "gpt-4o"}

# Plain dict instead of AIMessage
result = ask("What is this?", doc, retriever, return_raw=False)
result["answer"]
result["references"]

# Extra context to guide the tree search
result = ask(
    "Who was Ghulam Sarwar?",
    doc, retriever,
    extra_context="Historical document about Gandhi's assassination trial."
)

make_retriever()

Creates a sync retriever function that powers the two-step Q&A pipeline: tree navigation to find relevant sections, followed by answer generation from the retrieved content.

The retriever accepts two different LLM instances — a cheaper one for tree search and a smarter one for answer generation. The tree search step is simple (read an outline, return node IDs) so a fast cheap model is ideal. The answer generation step requires reasoning, so a more capable model produces better results there.

make_retriever(
    llm,
    search_llm = None,
    search_system_prompt: str = DEFAULT,
    search_user_prompt_template: str = DEFAULT,
    answer_system_prompt: str = DEFAULT,
) -> RetrieverFn

Cost-optimized setup

retriever = make_retriever(
    llm=ChatOpenAI(model="gpt-4o"),           # answers
    search_llm=ChatOpenAI(model="gpt-4o-mini"),  # tree search (cheap)
)

Custom answer prompt

retriever = make_retriever(
    llm,
    answer_system_prompt="""You are a legal assistant.
Answer using ONLY the provided context.
Always respond in markdown. Cite section names."""
)

Streaming

Stream answer tokens as they are generated rather than waiting for the complete response. Useful for building chat interfaces where you want the answer to appear incrementally, giving users immediate feedback instead of a blank screen followed by a full answer.

When stream=True, ask() returns a generator. Each item is either a string chunk (part of the answer text) or a dict containing the section references. The reference dict is always the very last item yielded.

for chunk in ask("What is this?", doc, retriever, stream=True):
    if isinstance(chunk, dict):
        refs = chunk["__references__"]
    else:
        print(chunk, end="", flush=True)

Streamlit

stream = ask(question, doc, retriever, stream=True, return_raw=False)
placeholder = st.empty()
full_answer = ""

for chunk in stream:
    if isinstance(chunk, dict):
        references = chunk["__references__"]
    else:
        full_answer += chunk
        placeholder.markdown(full_answer + "▌")

placeholder.markdown(full_answer)

Async

treerag provides full async support through aask(), aask_multi(), and make_async_retriever(). The async API is identical to the sync API — the only differences are using make_async_retriever() and awaiting the calls.

Use async when building FastAPI endpoints, LangGraph pipelines, or any application where you need to handle multiple concurrent queries without blocking. The async retriever uses ainvoke() and astream() internally for both the search and answer generation steps.

from treerag import aask, make_async_retriever

retriever = make_async_retriever(llm)
result = await aask("What is this?", doc, retriever)
print(result.content)

# Streaming
async for chunk in await aask("What is this?", doc, retriever, stream=True):
    if not isinstance(chunk, dict):
        print(chunk, end="", flush=True)

FastAPI SSE

@app.get("/ask")
async def ask_endpoint(q: str, doc_id: str):
    doc = get_document_by_id(doc_id)

    async def generate():
        async for chunk in await aask(q, doc, retriever, stream=True):
            if not isinstance(chunk, dict):
                yield chunk

    return StreamingResponse(generate(), media_type="text/plain")

Multi-document Q&A

Query multiple indexed documents simultaneously and receive a single unified answer. Each reference in the response includes the source file_name so you always know which document each piece of information came from.

ask_multi() queries each document independently to find relevant sections, merges all the results, then runs a single answer generation call over the combined context. This means the final answer can draw from any combination of the provided documents.

from treerag import ask_multi

doc1 = get_document_by_id("uuid-1")
doc2 = get_document_by_id("uuid-2")

result = ask_multi(
    "What are the key findings?",
    [doc1, doc2],
    retriever
)
print(result.content)
print(result.references)  # includes file_name per section

None entries in the list are silently skipped — safe to pass results from get_document_by_id() that may return None.

When to use multi-document

Use ask_multi() when you have a collection of related documents — multiple chapters of a book, quarterly reports, a set of specification docs, or a support knowledge base. For a single focused document, ask() is faster and cheaper since it skips the per-document query overhead.

Async version

Use aask_multi() with make_async_retriever() for async multi-document queries in FastAPI or LangGraph pipelines. The interface is identical to ask_multi() but all internal LLM calls use ainvoke().

from treerag import aask_multi

result = await aask_multi(
    "What changed between v1 and v2?",
    [doc1, doc2], make_async_retriever(llm)
)

Custom prompts

Both the summarizer and retriever accept custom system prompts. Tailoring prompts to your document type is one of the most impactful improvements you can make — it directly improves the quality of entity tags at index time, which in turn improves search accuracy at query time.

The default prompts are intentionally generic. For specialized document types — historical biographies, legal contracts, medical records, financial reports — a domain-specific prompt will produce dramatically better results with the same underlying model.

Summarizer prompt

# Historical
summarizer = make_summarizer(llm, system_prompt=
    "You are a historian. Extract ALL named people, places, "
    "organizations and events. Every name matters."
)

# Medical
summarizer = make_summarizer(llm, system_prompt=
    "You are a medical expert. Extract diagnoses, treatments, "
    "medications and dosages."
)

Retriever answer prompt

retriever = make_retriever(
    llm,
    answer_system_prompt="""Answer using ONLY the provided context.
Respond in markdown. Cite section names for each claim."""
)

Per-question context

result = ask(
    "What were the key provisions?",
    doc, retriever,
    extra_context="1947 Indian partition document. Focus on Bengal clauses."
)

Providers

treerag is provider-agnostic — it works with any LangChain-compatible chat model. You can use different providers for different steps of the pipeline, for example a local Ollama model for summarization (free) and a hosted model for final answer generation (better quality).

All providers are installed as optional extras so you only pull in what you need. The same API works identically across all providers — switching from OpenAI to Anthropic is a single line change in your LLM instantiation.

For local development and testing, Ollama lets you run models like Llama 3 or Mistral on your own machine with no API key and no cost. For production, a hosted provider typically gives better quality and reliability.

OpenAI
from langchain_openai import ChatOpenAI

retriever = make_retriever(
    llm=ChatOpenAI(model="gpt-4o"),
    search_llm=ChatOpenAI(model="gpt-4o-mini"),
)
Anthropic
from langchain_anthropic import ChatAnthropic

retriever = make_retriever(
    ChatAnthropic(
        model="claude-haiku-4-5-20251001"
    )
)
Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI

retriever = make_retriever(
    ChatGoogleGenerativeAI(
        model="gemini-2.0-flash")
)
Ollama — local, free
from langchain_ollama import ChatOllama

# No API key required
retriever = make_retriever(
    ChatOllama(model="llama3")
)

Production

By default treerag saves indexed documents to a local JSON file. In production you will typically want to store documents in a proper database or cache. Use persist=False to skip local storage and manage the document dict yourself.

The indexed document dict is plain JSON-serializable Python — no special objects or dependencies. Store it in PostgreSQL as a JSONB column, Redis as a string, MongoDB as a document, or anywhere else that accepts JSON. When you need to answer a question, just load the dict and pass it directly to ask().

# Index without saving locally
doc = index_document("my_doc.pdf", persist=False)

# Store in your system
db.execute("INSERT INTO docs VALUES (?, ?)", doc["doc_id"], json.dumps(doc))
redis.set(doc["doc_id"], json.dumps(doc))

# Load and ask
doc = json.loads(db.get("uuid"))
result = ask("Your question", doc, retriever)

When to use persist=False

Use persist=False when you are building a multi-user application where different users index different documents, when you need to store documents in an existing database alongside other application data, or when you want full control over document lifecycle — expiry, versioning, access control. The local registry has no concept of users or permissions, so for any production application with more than one user, managing storage yourself is the right approach.

ask() only needs tree_outline, flat_nodes, and file_name. Strip everything else before storing to reduce size.
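Following that tip, trimming the dict before storage is a one-liner. A sketch (field names come from index_document()'s return value; doc_id is kept here as the storage key):

```python
# Keep only what ask() needs, plus doc_id as the storage key, before
# persisting. Field names are from index_document()'s return value.
def slim(doc: dict) -> dict:
    keep = ("doc_id", "file_name", "tree_outline", "flat_nodes")
    return {k: doc[k] for k in keep}
```

Store slim(doc) instead of doc and the Q&A path works unchanged.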

Registry

The local registry is a JSON file that stores all indexed documents. Each document is keyed by a UUID generated at index time — not the filename. This means you can rename or move the original file without affecting your ability to reload the indexed document.

The registry is designed for development and single-user applications. For production systems with multiple users or large document collections, use persist=False and manage storage in your own database where you have control over access patterns and scaling.

from treerag import list_documents, get_document, get_document_by_id, delete_document

# List all
for d in list_documents():
    print(d["name"], d["doc_id"], d["total_sections"])

# Load by UUID — O(1)
doc = get_document_by_id("3f7a1c2d-9b4e-4f8a-b2d1-6e5c3a9f0e12")

# Load by file name — O(n)
doc = get_document("my_doc.pdf")

# Delete
delete_document("3f7a1c2d-9b4e-4f8a-b2d1-6e5c3a9f0e12")

Always prefer get_document_by_id() over get_document(). UUID lookup is a direct dictionary key access (O(1)); filename lookup iterates over all registry values (O(n)) to find a matching file_name field.
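The complexity difference is easy to see from the registry's shape. This is an assumed illustration based on the docs above, not treerag's actual code:

```python
# Illustrative shape of the registry: a dict keyed by UUID. Looking up
# by ID is one dict access; looking up by file name scans every value.
registry = {
    "uuid-1": {"file_name": "a.pdf", "tree_outline": "...", "flat_nodes": {}},
    "uuid-2": {"file_name": "b.pdf", "tree_outline": "...", "flat_nodes": {}},
}

def by_id(doc_id):        # O(1): direct key access
    return registry.get(doc_id)

def by_name(file_name):   # O(n): linear scan over all values
    return next((d for d in registry.values()
                 if d["file_name"] == file_name), None)
```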

Custom registry path

By default the registry is written to indexed_docs.json in the current working directory. You can change this per call by passing registry_path. This is useful for keeping separate registries for different projects or environments.

# Use a project-specific registry
doc = index_document("my_doc.md", registry_path="/data/project_a/registry.json")

# List from a custom path
docs = list_documents(registry_path="/data/project_a/registry.json")

API reference

Complete list of all public functions and their signatures. For detailed usage examples and explanations, see the individual documentation pages.

All functions are importable directly from treerag:

from treerag import (
    index_document, make_summarizer, null_summarizer,
    ask, aask, ask_multi, aask_multi,
    make_retriever, make_async_retriever,
    read_file, parse_sections, build_hierarchy,
    flatten_tree, build_tree_outline,
    list_documents, get_document, get_document_by_id, delete_document,
    load_registry, save_registry,
)

Indexing

Functions for reading, parsing, and indexing documents. index_document() is the main entry point that orchestrates the full pipeline. The individual functions below it let you use each step in isolation.

index_document(file_path, ...)
    Index a file or URL.
make_summarizer(llm, ...)
    Parallel summarizer with entity tags.
null_summarizer
    No-op — skips the LLM; good for testing.
read_file(file_path)
    Read a file or URL into a raw string.
parse_sections(text)
    Parse text into a flat section list.
build_hierarchy(sections)
    Build the nested tree.
flatten_tree(tree)
    Flatten to {node_id: content}.
build_tree_outline(tree)
    Build the outline string with tags.

Q&A

Functions for querying indexed documents. Use ask() / aask() for a single document and ask_multi() / aask_multi() to search across multiple documents simultaneously. Always create the retriever once and reuse it across multiple ask() calls.

ask(query, doc, retriever, ...)
    Sync Q&A — single document.
aask(query, doc, retriever, ...)
    Async Q&A — single document.
ask_multi(query, docs, retriever, ...)
    Sync Q&A — multiple documents.
aask_multi(query, docs, retriever, ...)
    Async Q&A — multiple documents.
make_retriever(llm, ...)
    Create a sync retriever.
make_async_retriever(llm, ...)
    Create an async retriever.

Registry

Functions for managing the local indexed_docs.json registry. Use get_document_by_id() for fast O(1) lookup by UUID. The lower-level load_registry() and save_registry() functions give you direct access to the registry dict if you need to migrate data or build tooling on top of it.

list_documents()
    List all indexed documents.
get_document_by_id(doc_id)
    Get by UUID — O(1).
get_document(file_name)
    Get by file name — O(n).
delete_document(doc_id)
    Delete by UUID.
load_registry(path?)
    Load the registry JSON from disk.
save_registry(registry, path?)
    Save the registry JSON to disk.