Unified indexing centralizes enterprise data into one corpus before retrieval, while federated search brokers queries out to source systems and merges their pre-ranked results. For Retrieval-Augmented Generation, strict federation is a failure mode: it offers near-real-time freshness, but a broker that never sees the full corpus cannot compute the global corpus statistics that relevance normalization and semantic vector search require. The modern standard is a unified "System of Context" that applies Reciprocal Rank Fusion (RRF) and late-binding security over one index.
1. The Engineering Hypothesis: Contextual Dissonance
In a RAG architecture the LLM relies entirely on the precision of the retrieved context. The hypothesis: stateless federation is incompatible with semantic search. When a search engine acts as a broker, it receives pre-ranked lists from source APIs. Jira might return a relevance score of 0.9 from a keyword-frequency model; SharePoint might return 0.9 from a probabilistic model. To the broker these scores look identical, yet they represent vastly different relevance, and merging them creates a poisoned context window where low-value documents displace high-value answers. Federation also can't support vector search unless every source exposes an embedding API, which remains rare. The fix is a shift from a "Search" mental model to a "Knowledge Graph" one, the same move that structured data nesting makes for commerce agents.
2. Forensic Evidence
2.1 The mathematics of hybrid fusion (RRF)
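Linear combinations of scores fail because lexical (BM25) and semantic (cosine) scores follow different distributions: BM25 scores are unbounded and depend on corpus statistics, while cosine similarity is confined to [-1, 1]. The industry standard is Reciprocal Rank Fusion, which ignores absolute scores and ranks documents by their position in multiple result sets: each document's fused score is the sum of 1 / (k + rank) over the lists it appears in, where k is a smoothing constant that dampens outliers. Modern engines (Elasticsearch 8.14+) fuse these signals natively via "Retrievers," avoiding client-side complexity.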
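A minimal sketch of rank fusion may make the idea concrete. It assumes the commonly used smoothing constant k = 60 (not stated in this article), and the document IDs and function name are hypothetical. It illustrates the technique only and is not the Elasticsearch implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ranks are 1-based; absolute retriever scores are never consulted.
    k=60 is an assumed default, not a value taken from the article.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a vector retriever.
bm25_hits = ["jira-101", "wiki-7", "sp-42"]
vector_hits = ["wiki-7", "sp-42", "slack-9"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists (here wiki-7 and sp-42) outrank a document that tops only one list, which is the behavior a raw-score merge cannot guarantee.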
2.2 The security latency penalty: the HNSW-ACL conflict
Zero Trust forces the engine to filter results by user permissions, which creates tension in late-binding (query-time) resolution. HNSW graphs work by hopping between nearest neighbors. If a pre-filter (ACL) removes nodes before the search, the graph becomes disconnected, the algorithm gets trapped in a local neighborhood (an island of accessible nodes), and it can't traverse to the true nearest neighbor. The metric is stark: naive pre-filtering on sparse permissions can cut recall by 40-60%, or balloon latency by orders of magnitude if it forces a brute-force scan.
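The island effect can be reproduced on a toy graph. The sketch below is a deliberate simplification: it uses a single-layer, one-dimensional proximity graph and a plain greedy walk rather than real multi-layer HNSW, and every name in it is hypothetical. It contrasts removing denied nodes before the search with traversing them while only returning permitted results.

```python
def greedy_search(positions, edges, query, entry, allowed, traverse_denied):
    """Greedy walk toward `query`; returns the closest allowed node reached.

    traverse_denied=False models naive pre-filtering (denied nodes vanish
    from the graph). traverse_denied=True lets the walk pass through denied
    nodes but never returns them.
    """
    def dist(node):
        return abs(positions[node] - query)

    current = entry
    best_allowed = entry if entry in allowed else None
    while True:
        neighbors = [n for n in edges[current] if traverse_denied or n in allowed]
        closer = [n for n in neighbors if dist(n) < dist(current)]
        if not closer:
            return best_allowed  # local minimum: nothing nearer is reachable
        current = min(closer, key=dist)
        if current in allowed and (
            best_allowed is None or dist(current) < dist(best_allowed)
        ):
            best_allowed = current


# Ten nodes on a line, each linked to its immediate neighbors.
positions = {i: float(i) for i in range(10)}
edges = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
allowed = {0, 1, 8, 9}  # the user may read only these documents

pre_filtered = greedy_search(positions, edges, 9.2, 0, allowed, traverse_denied=False)
filter_aware = greedy_search(positions, edges, 9.2, 0, allowed, traverse_denied=True)
print(pre_filtered, filter_aware)
```

With denied nodes removed, the walk stalls on the island {0, 1} and returns node 1; when it may hop across denied nodes it reaches node 9, the true nearest permitted neighbor.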
3. The Unique Insight: Markdown Is the AEO Schema for Enterprise RAG
Most engineers obsess over the embedding model (OpenAI vs Cohere). The silent killer of intranet RAG is the parsing pipeline. The 1,500-site audit showed that unstructured data blocks AI intelligibility, and the same applies internally. Standard OCR extracts text linearly (Row1 Col1 Row1 Col2), destroying the spatial relationships in tables, so when an LLM receives flattened table data it has to guess which value belongs to which header, and it hallucinates the associations. Just as client-side rendering blocks external agents by obfuscating the DOM, PDF flattening blocks internal RAG agents by destroying the layout. The fix is layout-aware Markdown as the canonical storage format: LLMs see large volumes of Markdown in pre-training (GitHub READMEs, for example) and handle it well, so a # H1 signals a topic shift and a | Col | Col | row preserves the grid. Converting PDFs to Markdown before embedding is, in effect, performing AEO on your internal corpus, the same case made in the token tax / Markdown-vs-HTML study.
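To see what flattening loses, the sketch below renders the same table twice: once as a linear text stream, as a naive extractor might, and once as a Markdown grid. The table contents and function names are invented for illustration; a real pipeline would obtain the cells from a layout-aware parser.

```python
def flatten(rows):
    """Mimic linear extraction: cells in reading order, structure discarded."""
    return " ".join(cell for row in rows for cell in row)


def to_markdown_table(rows):
    """Render rows (first row is the header) as a Markdown table."""
    def line(cells):
        # Escape literal pipes so cell text cannot break the grid.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator] + [line(r) for r in body])


# Hypothetical table content.
rows = [
    ["Region", "Q1", "Q2"],
    ["EMEA", "1.2", "1.4"],
    ["APAC", "0.9", "1.1"],
]

flat_text = flatten(rows)
markdown_text = to_markdown_table(rows)
print(flat_text)
print(markdown_text)
```

In the flattened string nothing ties "1.4" to "EMEA" or "Q2"; in the Markdown version the row and column boundaries survive into the chunk that gets embedded.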
4. Reproduction Steps / The Fix
To build a federated-style unified RAG system that respects ACLs and structure, work through four steps.
Step 1, layout-aware ingestion: replace standard text extractors with a vision-based parser (LlamaParse, Azure Document Intelligence) targeting Markdown output; for complex tables, use visual retrieval (ColPali) that embeds the page image directly, bypassing text-extraction failures, the same logic behind structuring complex data for agents.
Step 2, parent-child indexing: don't chunk arbitrarily; split Markdown into small child chunks (256 tokens) for high-precision vector retrieval, store the surrounding parent window (the full section), match on the child and return the parent to the context window.
Step 3, optimized late-binding security: implement bitset caching for ACLs (pre-calculate a bitset of document access for high-frequency groups like "All Engineering") and use filtered HNSW or "HoneyBee" role-based partitioning to prevent graph disconnection during vector search.
Step 4, decay-function scoring: apply a Gaussian decay (origin now, scale 30d, or 7d for fast-moving Slack data) so older authoritative documents stay valid while ancient status updates are treated as noise.
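Steps 2 through 4 can be sketched together in a few lines of Python. This is a toy under stated assumptions, not a production design: similarity scores are supplied by hand instead of coming from a vector index, the ACL check runs over a small candidate list rather than inside an HNSW traversal, and the decay follows Elasticsearch-style Gaussian semantics, where the multiplier equals an assumed decay value of 0.5 at a distance of one scale. All class, function, and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    parent_id: str     # section this small chunk was cut from
    doc_index: int     # bit position of the owning document in ACL bitsets
    age_days: float    # time since last update
    similarity: float  # stand-in for a vector similarity score


def gaussian_decay(age_days, scale_days=30.0, decay=0.5):
    """Multiplier of 1.0 at age 0, falling to `decay` at `scale_days`."""
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(age_days ** 2) / (2.0 * sigma_sq))


def build_acl_bitset(readable_doc_indexes):
    """Precompute one integer bitset per group; cache and reuse per query."""
    bits = 0
    for index in readable_doc_indexes:
        bits |= 1 << index
    return bits


def retrieve_parents(children, parents, acl_bits, top_k=2, scale_days=30.0):
    """Match on child chunks, drop unreadable ones, return parent sections."""
    scored = [
        (c.similarity * gaussian_decay(c.age_days, scale_days), c)
        for c in children
        if (acl_bits >> c.doc_index) & 1  # late-binding permission check
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results, seen = [], set()
    for _, child in scored:
        if child.parent_id not in seen:  # one parent per result slot
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
        if len(results) == top_k:
            break
    return results


parents = {"p1": "Section A (full text)", "p2": "Section B (full text)",
           "p3": "Restricted section"}
children = [
    Child("p1", 0, age_days=1, similarity=0.80),
    Child("p1", 0, age_days=1, similarity=0.70),
    Child("p2", 1, age_days=60, similarity=0.90),
    Child("p3", 2, age_days=0, similarity=0.99),
]
all_engineering = build_acl_bitset([0, 1])  # this group cannot read doc 2

context = retrieve_parents(children, parents, all_engineering)
print(context)
```

The restricted chunk never reaches the context window despite having the highest similarity, and the 60-day-old chunk drops below the fresh one once the decay multiplier is applied. In a real engine the bitset would be passed to the vector index as a filter instead of being applied to a pre-scored list.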
The contrarian point that cuts against the entire "federated search" sales pitch: the freshness advantage everyone buys federation for is the one thing RAG can't actually use. A broker that returns the newest possible document with the wrong relevance score feeds the LLM a confident wrong answer faster, and speed-to-hallucination is not a feature. The boring, slightly stale unified index that normalizes its scores wins every time the answer has to be correct rather than merely recent.
5. Reference Sources
- Glean: Is MCP + federated search killing the index?
- Elastic: Weighted Reciprocal Rank Fusion (RRF)
- arXiv: HoneyBee: Efficient Role-based Access Control for Vector Databases via Dynamic Partitioning (Zhong et al., 2025)
- arXiv: ColPali: Efficient Document Retrieval with Vision Language Models
- Unstructured.io: Preserving Table Structure for Better Retrieval

