Problem Statement
Organisations accumulate large volumes of internal knowledge — policies, procedures, technical documentation, meeting notes, and historical decisions — across files, wikis, and email threads. Finding accurate answers requires either knowing exactly where to look or spending time searching through multiple sources. General-purpose language models cannot access this private data and often fabricate answers when asked about domain-specific topics. The challenge was to build an accurate, trustworthy question-answering system grounded in real organisational knowledge.
Key Challenges:
- Grounding model answers in private data without fine-tuning
- Effective retrieval from large, heterogeneous document collections
- Minimising hallucination — ensuring answers cite actual source material
- Keeping the knowledge base current as documents are updated
- Handling multi-step questions requiring reasoning over multiple retrieved passages
System Architecture
Documents are ingested, chunked, and embedded into a vector database. At query time, the user's question is embedded and used to retrieve the most semantically relevant chunks. These chunks are injected into an LLM prompt as context, constraining the model to answer only from provided evidence. Source citations are returned alongside the answer for verification.
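The flow above can be sketched end-to-end with toy components (a bag-of-words "embedding" and an in-memory index stand in for the real encoder and vector database; the prompt wording is illustrative):

```python
import re

def embed(text):
    # Toy embedding: bag-of-words counts (stands in for a neural encoder).
    vec = {}
    for token in re.findall(r"\w+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5 or 1.0
    return dot / (norm(a) * norm(b))

def retrieve(question, index, k=2):
    # Rank stored chunks by similarity to the embedded question.
    q = embed(question)
    return sorted(index, key=lambda d: cosine(q, d["vec"]), reverse=True)[:k]

def build_prompt(question, chunks):
    # Constrain the model to the retrieved evidence.
    context = "\n".join(c["text"] for c in chunks)
    return ("Answer ONLY from the context below; say 'not found' otherwise.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

docs = ["Expenses are reimbursed within 30 days of submission.",
        "The VPN requires multi-factor authentication for remote access."]
index = [{"text": t, "vec": embed(t)} for t in docs]
question = "How many days until expenses are reimbursed?"
prompt = build_prompt(question, retrieve(question, index, k=1))
```

Only the expense-policy chunk survives retrieval, so the prompt contains the relevant evidence and nothing about the VPN.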
Document Ingestion Pipeline
Documents are parsed, split into semantically coherent chunks with overlap, embedded using a sentence embedding model, and stored in the vector database with metadata (source, version, section) for filtering.
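A minimal sketch of that ingestion step, assuming word-window chunking with overlap and a stubbed embedding call (the metadata keys mirror those named above):

```python
def embed(text):
    # Stub for the sentence-embedding model call.
    return [float(len(text))]

def chunk_with_overlap(words, size=50, overlap=10):
    # Fixed-size windows that share `overlap` words with the previous chunk.
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def ingest(doc_text, source, version, store):
    # Parse -> chunk -> embed -> store with metadata for filtering.
    words = doc_text.split()
    for section, chunk in enumerate(chunk_with_overlap(words)):
        store.append({"text": chunk,
                      "vector": embed(chunk),
                      "meta": {"source": source, "version": version,
                               "section": section}})

store = []
ingest(" ".join(f"w{i}" for i in range(120)), "handbook.pdf", "v3", store)
```

A 120-word document yields three chunks, each carrying ten words of overlap with its neighbour plus its source metadata.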
Retrieval Engine
Hybrid retrieval combining dense vector search (semantic similarity) with sparse BM25 keyword matching. Retrieval results are re-ranked by relevance to the specific query before being passed to the generation stage.
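The case study does not name the exact fusion method; reciprocal rank fusion (RRF) is one common way to merge a dense ranking with a BM25 ranking before re-ranking, sketched here (`k=60` is the conventional constant):

```python
def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    # Each argument is a list of doc ids ordered best-first. RRF rewards
    # documents ranked highly in either list without needing the two
    # scoring scales to be comparable.
    scores = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "c", "a"])
```

Document "b" wins because it ranks near the top of both lists, even though neither list ranks it first overall in isolation's raw scores.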
Grounded Generation
Retrieved chunks are injected into the LLM prompt with explicit instructions to answer only from provided context and to state when the answer is not found — reducing fabrication to near zero for in-scope questions.
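One plausible phrasing of that grounding instruction (the exact wording is an assumption; the key properties are "context only", an explicit refusal path, and citable source ids):

```python
GROUNDED_TEMPLATE = """You are answering from internal documentation.
Use ONLY the context below. If the context does not contain the answer,
reply exactly: "I could not find this in the knowledge base."
Cite the source id of every passage you rely on.

Context:
{context}

Question: {question}
Answer:"""

def render(question, passages):
    # Tag each passage with its source id so the model can cite it.
    context = "\n\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return GROUNDED_TEMPLATE.format(context=context, question=question)

prompt = render("What is the VPN policy?",
                [{"id": "it-sec-4", "text": "VPN requires MFA."}])
```

Giving the model a verbatim refusal string makes "answer not found" easy to detect downstream and to route to the fallback path.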
Knowledge Base Maintenance
Incremental update pipeline monitors document sources for changes, re-embeds modified documents, and removes stale vectors — keeping retrieval results accurate without full re-indexing.
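Change detection can be sketched with content hashing (a stand-in for whatever signal the real pipeline uses, such as modification timestamps or webhooks):

```python
import hashlib

def sync(sources, index_hashes):
    """sources: {doc_id: current text}; index_hashes: {doc_id: stored hash}.
    Returns (ids to re-embed, ids whose vectors should be deleted)."""
    changed, deleted = [], []
    for doc_id, text in sources.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if index_hashes.get(doc_id) != digest:
            changed.append(doc_id)        # new or modified: re-embed
            index_hashes[doc_id] = digest
    for doc_id in list(index_hashes):
        if doc_id not in sources:
            deleted.append(doc_id)        # removed at source: drop stale vectors
            del index_hashes[doc_id]
    return changed, deleted

hashes = {}
sync({"a": "v1", "b": "v1"}, hashes)          # first run: both embedded
changed, deleted = sync({"a": "v2"}, hashes)  # a modified, b removed
```

Only "a" is re-embedded and only "b"'s vectors are dropped; a third run with unchanged input does no work at all.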
Key Engineering Challenges
Chunking Strategy
Challenge: Poor chunking (too small or splitting across logical boundaries) degrades retrieval quality significantly.
Solution: Sentence-boundary-aware chunking with configurable overlap, preserving paragraph structure and section headings as metadata for context-aware retrieval filtering.
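A sketch of that chunker: sentences are packed into chunks up to a word budget, carrying the last sentence forward as overlap so no sentence is ever split mid-way. The naive regex splitter stands in for a proper sentence tokenizer:

```python
import re

def split_sentences(text):
    # Naive splitter on sentence-final punctuation (stand-in for a real
    # sentence tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_sentences(text, max_words=40, heading=None):
    chunks, current, count = [], [], 0
    for sent in split_sentences(text):
        n = len(sent.split())
        if current and count + n > max_words:
            chunks.append({"text": " ".join(current), "heading": heading})
            # One-sentence overlap: start the next chunk with the last sentence.
            current, count = [current[-1]], len(current[-1].split())
        current.append(sent)
        count += n
    if current:
        chunks.append({"text": " ".join(current), "heading": heading})
    return chunks

# Six 10-word sentences with a 25-word budget -> two sentences per chunk.
sents = [" ".join(f"s{i}w{j}" for j in range(9)) + f" s{i}end." for i in range(6)]
chunks = chunk_sentences(" ".join(sents), max_words=25, heading="Policy")
```

Attaching the section heading to every chunk is what enables the metadata filtering mentioned above.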
Hallucination Suppression
Challenge: LLMs tend to fabricate plausible-sounding answers when retrieved context is insufficient.
Solution: Explicit prompt instructions, confidence estimation based on retrieval similarity scores, and fallback responses for low-confidence retrievals directing users to contact a human expert.
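The low-confidence gate can be sketched as a threshold on the best retrieval score; the 0.35/0.6 cutoffs and the fallback wording are illustrative, not production values:

```python
FALLBACK = ("I could not find a reliable answer in the knowledge base. "
            "Please contact a subject-matter expert.")

def generate(question, passages):
    # Stub for the grounded LLM call.
    return "grounded answer"

def answer_or_fallback(question, hits, generate, threshold=0.35):
    """hits: (similarity, chunk) pairs sorted best-first."""
    if not hits or hits[0][0] < threshold:
        # Too little evidence: refuse and route to a human instead of
        # letting the model guess.
        return {"answer": FALLBACK, "sources": [], "confidence": "low"}
    passages = [chunk for _, chunk in hits]
    return {"answer": generate(question, passages),
            "sources": [c["source"] for c in passages],
            "confidence": "high" if hits[0][0] >= 0.6 else "medium"}

low = answer_or_fallback("q", [(0.2, {"source": "a"})], generate)
ok = answer_or_fallback("q", [(0.7, {"source": "a"})], generate)
```

The design choice is that generation is skipped entirely below the threshold, so a weak retrieval can never produce a confident-sounding fabrication.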
Multi-Document Reasoning
Challenge: Some questions require synthesising information across several retrieved passages rather than answering from a single chunk.
Solution: Iterative retrieval with a reasoning step that identifies sub-questions, retrieves for each, and synthesises a final answer — implemented using a lightweight agentic loop.
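The agentic loop's control flow can be sketched as follows; `decompose` and `synthesize` stand in for LLM calls, and all names and the stub knowledge base are illustrative:

```python
def multi_hop_answer(question, decompose, retrieve, synthesize, max_steps=3):
    # Decompose into sub-questions, retrieve evidence for each, then
    # synthesise one answer over all gathered evidence.
    evidence = []
    for sub_q in decompose(question)[:max_steps]:
        evidence.extend(retrieve(sub_q))
    return synthesize(question, evidence)

# Stub collaborators to show the control flow.
kb = {"Who owns service X?": "Team A owns service X.",
      "What is Team A's escalation path?": "Page the Team A on-call."}
decompose = lambda q: list(kb)          # an LLM would produce these sub-questions
retrieve = lambda sub_q: [kb[sub_q]]    # single-chunk lookup per sub-question
synthesize = lambda q, ev: " ".join(ev)

result = multi_hop_answer("How do I escalate an issue with service X?",
                          decompose, retrieve, synthesize)
```

Neither stored chunk answers the original question alone; the answer only emerges once both retrieved passages are synthesised, which is exactly the multi-hop case described above.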
Retrieval Precision vs. Recall
Challenge: Retrieving too few results misses relevant content; too many introduces noise that confuses the model.
Solution: Hybrid dense + sparse retrieval followed by a cross-encoder re-ranker selecting the top-k most relevant passages, balancing precision and coverage.
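The re-ranking stage reduces to scoring each (query, passage) pair jointly and keeping the top-k; `overlap_score` below is a toy stand-in for a real cross-encoder forward pass:

```python
def rerank(query, candidates, score_pair, k=5):
    # score_pair sees the query and passage together, unlike the
    # first-stage retrievers, which score them independently.
    scored = sorted(candidates, key=lambda p: score_pair(query, p),
                    reverse=True)
    return scored[:k]

def overlap_score(query, passage):
    # Toy scorer: shared-token count stands in for the cross-encoder.
    return len(set(query.lower().split()) & set(passage.lower().split()))

top = rerank("reset my password",
             ["the cafeteria menu changes weekly",
              "reset your password via the account portal"],
             overlap_score, k=1)
```

Taking only the top-k survivors is what trades recall from the wide first stage for the precision the generation stage needs.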
Solutions Implemented
- Hybrid Retrieval: Combined vector similarity search with BM25 keyword matching for robust coverage across semantic and lexical query types.
- Cross-Encoder Re-ranking: Secondary relevance model scoring each retrieved passage against the query before passing to the LLM, improving answer grounding.
- Source Attribution: Every answer includes citations to the source documents and sections used, enabling users to verify and explore further.
- Incremental Indexing: Change-detection pipeline that re-embeds only modified documents, keeping the knowledge base fresh without expensive full re-indexing.
- API Layer: REST API exposing query, document ingestion, and knowledge base management endpoints for integration with internal tools and portals.
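The API surface might look like the dispatch table below; the paths, payload shapes, and handler behaviour are assumptions (the case study names query, ingestion, and management endpoints but not their exact contracts), and a real deployment would sit behind a web framework:

```python
def handle_query(body):
    # Would run retrieval + grounded generation.
    return {"answer": "...", "sources": []}

def handle_ingest(body):
    # Would enqueue the document for the ingestion pipeline.
    return {"status": "queued", "doc_id": body["doc_id"]}

def handle_delete(body):
    # Would drop the document's vectors from the index.
    return {"status": "deleted", "doc_id": body["doc_id"]}

ROUTES = {("POST", "/query"): handle_query,
          ("POST", "/documents"): handle_ingest,
          ("DELETE", "/documents"): handle_delete}

def dispatch(method, path, body):
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "unknown endpoint"}
    return 200, handler(body)

status, resp = dispatch("POST", "/documents", {"doc_id": "hb-1"})
```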
Outcome & Impact
- Answer accuracy on in-scope queries
- Questions resolved without human escalation
- Reduced knowledge discovery time
- Source citations on every answer returned