When I'm building RAG apps, the thing that quietly kills quality (and budgets) is turning a document into a pile of "semantic confetti": chunk everything, embed everything, then hope an ANN search surfaces the right bits.
If your document base is large, retrieval easily turns into semantic hide-and-seek across 1.2M chunks, and you can spend thousands of tokens confidently hallucinating near the answer.
Approaches like PageIndex might be a better fit for your case and are worth an experiment, which is the point of this article.
So instead of vectorizing chunks and doing ANN lookup, you can represent the document as a hierarchical tree, which is basically an LLM-optimized table of contents.
The model reasons its way down the tree (e.g. "we're in Risk Factors, then Liquidity, then Covenant breaches…") and pulls context from the exact branch it needs.
This eliminates embedding drift, "close enough" chunks, and accidental detours into unrelated sections that just happen to share a few buzzwords.
Mafin 2.5 is a reasoning-based RAG system for financial document analysis, powered by PageIndex. It achieved a state-of-the-art 98.7% accuracy on the FinanceBench benchmark, significantly outperforming traditional vector-based RAG systems.

Disclaimer: I'm not affiliated with PageIndex in any capacity.
At a high level, here's how PageIndex works:
(1) Tree generation (indexing): PDF is converted into a hierarchy of nodes (sections/subsections), each with metadata like title, node_id, page_index, and associated text.

(2) Reasoning-based retrieval: LLM chooses which nodes to open next (tree search), optionally producing a rationale + a list of node_ids to consult.

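To make step (1) concrete, here's a hypothetical sketch of the kind of tree that indexing produces. The field names (title, node_id, page_index, nodes) follow the description above, but the section titles, IDs, and exact output shape here are illustrative, not real PageIndex output:

```python
# Hypothetical tree for an annual report; field names follow the
# metadata described above (title, node_id, page_index, nodes).
tree = {
    "title": "Annual Report 2023",
    "node_id": "0000",
    "page_index": 1,
    "nodes": [
        {
            "title": "Risk Factors",
            "node_id": "0001",
            "page_index": 14,
            "nodes": [
                {"title": "Liquidity", "node_id": "0002", "page_index": 21, "nodes": []},
            ],
        },
    ],
}

def walk(node, depth=0):
    """Print the tree as an indented, LLM-friendly table of contents."""
    print("  " * depth + f"{node['node_id']}  {node['title']} (p.{node['page_index']})")
    for child in node.get("nodes", []):
        walk(child, depth + 1)

walk(tree)
```

This "table of contents as data" is what the retrieval step navigates: the model never sees raw chunks, only titles and IDs, until it decides which branch to open.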
This is important for applications, especially ones that have to behave in front of users, auditors, or me at 2 a.m.:
- Precision without brute force: You are not paying to embed and re-embed every paragraph or to ship huge top-k results downstream.
- Better UX and explainability: "I went to Management's Discussion and Analysis, then Results of Operations" is a story you can show users, and it's far more defensible than "the embedding returned these chunks."
- Lower latency + lower token burn: Tree navigation can keep context tight. Tight context means faster responses and fewer "let me restate the entire document back to you" moments.
- Structure-aware truthfulness: In complex financial docs (SEC filings, earnings releases, footnotes), where something appears is half the meaning. PageIndex can treat that structure as the retrieval primitive, not an afterthought.
Give it a try and let me know whether it performs better in your case; this is a truly promising approach.
The repository has a very simple structure:
- run_pageindex.py: entry-point script to run indexing on a PDF (local usage).
- pageindex/: the core library package used by the runner script.
- cookbook/: example notebooks / demos (often the fastest way to understand intended workflows).
- tutorials/: guided examples.
- tests/: sample PDFs + expected/generated outputs (useful for regression checks).
It's also very easy to set up.
Getting Started with PageIndex
(1) Create a virtualenv + install deps
```
python -m venv .venv
source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install --upgrade -r requirements.txt
```

(2) Create a .env file in the repo root with:

```
CHATGPT_API_KEY=your_openai_key_here
```

(3) Run the entry script:

```
python run_pageindex.py
```

You can customize the processing with additional optional arguments:
--model OpenAI model to use (default: gpt-4o-2024-11-20)
--toc-check-pages Pages to check for table of contents (default: 20)
--max-pages-per-node Max pages per node (default: 10)
--max-tokens-per-node Max tokens per node (default: 20000)
--if-add-node-id Add node ID (yes/no, default: yes)
--if-add-node-summary Add node summary (yes/no, default: yes)
--if-add-doc-description Add doc description (yes/no, default: yes)

There is also markdown support in PageIndex.
You can use the --md_path flag to generate a tree structure for a markdown file:
```
python3 run_pageindex.py --md_path /path/to/your/document.md
```

How the indexing + retrieval flow fits together
Here's the end-to-end flow you should design around:
(1) Indexing / tree generation
- Input: a PDF (markdown is also supported via --md_path; other formats are not yet)
- Output: hierarchical tree of nodes (TOC-like).
(2) Retrieval (tree search)
- Provide (query + tree structure)
- Ask for JSON output containing "thinking" and "node_list"
This is the core mechanism for reasoning-based navigation.
(3) Answer synthesis
- gather node text (and/or page-level content),
- feed it into your final answer prompt.
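The three steps above can be sketched end to end. The prompt wording, the call_llm stub, and its canned reply below are all illustrative stand-ins (swap in your actual model client); only the JSON contract with "thinking" and "node_list" comes from the flow described above:

```python
import json

def build_search_prompt(query, tree):
    """Step (2): ask the LLM to navigate the tree and pick node IDs."""
    return (
        "You are given a document's table-of-contents tree as JSON.\n"
        f"Tree: {json.dumps(tree)}\n"
        f"Question: {query}\n"
        'Reply with JSON: {"thinking": "...", "node_list": ["..."]}'
    )

def call_llm(prompt):
    # Stub standing in for a real model call (e.g. an OpenAI chat completion).
    return '{"thinking": "Liquidity risk lives under Risk Factors.", "node_list": ["0002"]}'

def tree_search(query, tree):
    reply = json.loads(call_llm(build_search_prompt(query, tree)))
    return reply["thinking"], reply["node_list"]

def synthesize(query, node_texts):
    """Step (3): the final answer prompt sees only the selected branches."""
    context = "\n\n".join(node_texts)
    return f"Answer '{query}' using only:\n{context}"

toy_tree = {"title": "Report", "node_id": "0000", "nodes": []}
thinking, node_list = tree_search("What are the liquidity risks?", toy_tree)
print(node_list)  # ['0002']
```

The "thinking" string is the navigation rationale you can surface to users, which is exactly the explainability win discussed earlier.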
Production integration options (SDK + HTTP APIs)
Even if you run local indexing, you may want the official SDK/API patterns for production workflows:
(1) Python SDK (hosted service)
```
from pageindex import PageIndexClient

pi_client = PageIndexClient(api_key="YOUR_API_KEY")
result = pi_client.submit_document("./2023-annual-report.pdf")
doc_id = result["doc_id"]
```

Then:
- check status: pi_client.get_document(doc_id)
- fetch tree: pi_client.get_tree(doc_id) (supports node_summary)
- OCR: pi_client.get_ocr(doc_id, format="page|node|raw")
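Since document processing is asynchronous, a small polling helper around the SDK calls is handy. This is a sketch under one assumption I have not verified against the SDK docs: that get_document returns a dict with a "status" field that reads "completed" when the tree is ready. Check the actual response shape before relying on it:

```python
import time

def wait_for_tree(pi_client, doc_id, poll_seconds=5, timeout=600):
    """Poll the hosted service until processing finishes, then fetch the tree.

    Assumes get_document(doc_id) returns {"status": "completed", ...} when
    done -- an assumption; verify the field name against the SDK docs.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        doc = pi_client.get_document(doc_id)
        if doc.get("status") == "completed":
            return pi_client.get_tree(doc_id)
        time.sleep(poll_seconds)
    raise TimeoutError(f"document {doc_id} not processed in {timeout}s")
```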
(2) Direct REST endpoints (hosted service)
- POST https://api.pageindex.ai/doc/: upload a PDF; returns doc_id.
- GET https://api.pageindex.ai/doc/{doc_id}/?type=tree: retrieve processing status and the tree; summary toggles node summaries.
- GET https://api.pageindex.ai/doc/{doc_id}/?type=ocr&format=page|node|raw: retrieve OCR results in different formats.
- POST https://api.pageindex.ai/chat/completions: supports messages, optional doc_id (string or array), optional stream.
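If you're wiring these endpoints up yourself, small URL builders keep the query-string logic in one place. The paths and parameters below come straight from the endpoint list above; the auth header scheme is deliberately omitted, so check the official API docs for how to pass your key (e.g. with the requests library):

```python
BASE = "https://api.pageindex.ai"

def tree_url(doc_id, summary=False):
    """GET URL for processing status + tree; summary toggles node summaries."""
    url = f"{BASE}/doc/{doc_id}/?type=tree"
    return (url + "&summary=true") if summary else url

def ocr_url(doc_id, fmt="page"):
    """GET URL for OCR results; fmt is one of page | node | raw."""
    assert fmt in ("page", "node", "raw")
    return f"{BASE}/doc/{doc_id}/?type=ocr&format={fmt}"

# A real call would then be, roughly (auth header per the API docs):
#   requests.get(tree_url(doc_id), headers=...)
```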
Multi-document search patterns
By default, PageIndex focuses on reasoning-based RAG within a single document, and there are 3 recommended multi-doc workflows:
- Search by Metadata (when docs are distinguishable by metadata)
- Search by Semantics (when docs differ by topic/content)
- Search by Description (lightweight approach for a small number of docs)
In practice, this often becomes:
- Pick candidate doc(s) (metadata/semantic/description stage)
- Run tree search inside the chosen doc(s)
- Synthesize an answer
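The two-stage pattern above can be sketched for the "search by description" workflow. In practice stage one would be an LLM ranking call over the doc descriptions; the keyword-overlap select_docs below is a toy stand-in for that call, and the doc IDs and descriptions are made up:

```python
# Toy corpus: doc_id -> short human-written description.
docs = {
    "10k_2023": "Annual report (10-K) for fiscal year 2023",
    "q2_earnings": "Q2 2024 earnings release and call transcript",
}

def select_docs(query, docs):
    """Stage 1 (stand-in): pick candidate docs whose description overlaps
    the query. A real system would ask an LLM to rank the descriptions."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, desc in docs.items()
            if terms & set(desc.lower().split())]

picked = select_docs("What did the Q2 earnings say about margins?", docs)
print(picked)  # ['q2_earnings']
# Stage 2 would then run PageIndex tree search inside each picked doc,
# and stage 3 synthesizes one answer from the gathered node text.
```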
Here are other resources where you can dive deeper:
- Cookbooks: hands-on, runnable examples and advanced use cases.
- Tutorials: practical guides and strategies, including Document Search and Tree Search.
- Blog: technical articles, research insights, and product updates.
- MCP setup & API docs: integration details and configuration options.
Good luck in your experiments!