# GraphRAG for VLM: Explainable Policy & Compliance Assistant
A GraphRAG system designed for Vision-Language Models (VLMs) to answer employee questions from policy/legal documents and visually rich PDFs. It combines vector retrieval over multimodal chunks with a knowledge graph to improve scope correctness, reduce wrong-but-similar retrieval, and produce explainable answers with citations and reasoning paths.
## Problem
RAG for VLMs in enterprise policy/compliance is challenging because evidence often lives in visually rich PDFs (tables, forms, headings, footnotes) where meaning depends on layout and cross-references. Plain vector RAG can retrieve text that is semantically similar but contextually wrong (wrong section, outdated clause, missing definitions/exceptions). VLMs can read images, but without structured retrieval they still receive incomplete or inconsistent evidence, making answers hard to justify. The goal is to deliver accurate, scope-consistent answers with clear citations and a traceable explanation of why those sources were used.
## Approach
- Indexing (Offline): Ingest documents (PDF/Word/Wiki) and parse both text and layout signals. Chunk by section/clause (and, for scanned PDFs, by detected regions such as table blocks or form fields). Create embeddings for each chunk and store them in a vector index for semantic retrieval.
- Graph Construction: From each chunk, automatically extract entities (e.g., policy topics, roles, systems, terms, definitions) and relationships (e.g., doc->has_part->section, section->mentions->entity, entity->related_to->entity). Normalize aliases so the graph links the same concept across documents. Store the graph in a Graph DB / property graph.
- Query-time Retrieval (Online): When a user asks a question, first interpret intent/entities, then use the graph as a lightweight filter to narrow candidate documents/sections. Run vector search within that scope to retrieve the strongest evidence chunks. Expand via the graph (1-2 hops) to pull linked definitions, exceptions, and referenced sections that VLM answers often miss.
- Evidence Pack + Generation: Merge and rerank vector + graph-expanded evidence, deduplicate, and build a compact multimodal evidence pack (text snippets + relevant page crops for tables/figures). Feed the pack to the VLM to generate an answer with citations, plus an optional reasoning path derived from the graph edges.
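The online retrieval flow above can be sketched with a toy in-memory setup. Everything here is illustrative, not the project's actual API: hand-crafted 2-d vectors stand in for real embeddings, a list of triples stands in for the graph DB, and `sec-9.9` plays the "wrong-but-similar" section that plain vector search would surface.

```python
import math

# Toy chunk store: sec-9.9 is worded similarly to the query topic but
# belongs to a different document (the wrong-but-similar failure mode).
chunks = {
    "sec-1.2": "Remote work eligibility: employees may work remotely if ...",
    "sec-1.3": "Definitions: a 'remote employee' is ...",
    "sec-9.9": "Remote site travel expenses are reimbursed when ...",
}

# Hand-crafted 2-d stand-ins for real embeddings, chosen so that
# sec-9.9 is *more* similar to the query than sec-1.3 is.
emb = {
    "sec-1.2": (1.00, 0.00),
    "sec-1.3": (0.90, 0.10),
    "sec-9.9": (0.95, 0.05),
}

# Property-graph edges as (source, relation, target) triples.
edges = [
    ("policy-doc", "has_part", "sec-1.2"),
    ("policy-doc", "has_part", "sec-1.3"),
    ("expense-doc", "has_part", "sec-9.9"),
    ("sec-1.2", "mentions", "remote_work"),
    ("sec-1.3", "mentions", "remote_work"),
    ("sec-1.2", "references", "sec-1.3"),  # clause cross-reference
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def graph_scope(entity):
    """Step 1: use the graph as a filter -- sections mentioning the entity."""
    return {s for s, rel, t in edges if rel == "mentions" and t == entity}

def vector_search(query_vec, scope, k=2):
    """Step 2: semantic search restricted to the graph-derived scope."""
    ranked = sorted(scope, key=lambda cid: cosine(query_vec, emb[cid]),
                    reverse=True)
    return ranked[:k]

def expand_one_hop(chunk_ids):
    """Step 3: pull cross-referenced definitions/exceptions (1 hop)."""
    out = set(chunk_ids)
    for cid in chunk_ids:
        out |= {t for s, rel, t in edges
                if s == cid and rel == "references" and t in chunks}
    return out

query_vec = (1.0, 0.0)  # pretend-embedding of "Who can work remotely?"
scope = graph_scope("remote_work")   # sec-9.9 is filtered out here
evidence = expand_one_hop(vector_search(query_vec, scope))
```

Note the contrast this toy makes concrete: an unscoped `vector_search` over all of `emb` would rank `sec-9.9` second by cosine similarity, while the graph filter removes it before search, and the one-hop expansion pulls in the `sec-1.3` definitions that similarity alone ranks last.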
## Results
Compared to (a) no-RAG VLM prompting and (b) plain vector RAG, GraphRAG improves: (1) precision of evidence selection by filtering out wrong-but-similar sections, (2) completeness by reliably pulling linked definitions/exceptions via graph expansion, and (3) trustworthiness by producing a verifiable reasoning trace (graph path) alongside citations. In practice, this yields fewer inconsistent answers, fewer missed exceptions, and clearer justifications that are easier to audit and present to employees and compliance stakeholders.
- Adds a knowledge-graph layer on top of multimodal retrieval to reduce wrong-context answers.
- Supports explainable responses: Answer + Citations + Reasoning Path (graph trace).
- Handles visually rich documents (PDF tables/figures/forms) via VLM-friendly chunking and evidence packaging.
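The merge/rerank/packaging step described under Approach can be sketched as follows. This is a minimal sketch under assumed conventions: the `Evidence` record, the graph-hit score boost of 0.1, and the `to_vlm_prompt` rendering are all hypothetical choices, not the project's actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Evidence:
    chunk_id: str
    text: str
    score: float
    source: str                      # "vector" or "graph"
    page_crop: Optional[str] = None  # path to a rendered table/figure crop

def build_evidence_pack(vector_hits: List[Evidence],
                        graph_hits: List[Evidence],
                        max_items: int = 5) -> List[Evidence]:
    """Merge vector and graph-expanded hits, dedupe by chunk id (keeping
    the best-scoring copy), and rerank. Graph hits get a small boost so
    linked definitions/exceptions survive truncation."""
    best = {}
    for ev in vector_hits + graph_hits:
        score = ev.score + (0.1 if ev.source == "graph" else 0.0)
        if ev.chunk_id not in best or score > best[ev.chunk_id].score:
            best[ev.chunk_id] = Evidence(ev.chunk_id, ev.text, score,
                                         ev.source, ev.page_crop)
    return sorted(best.values(), key=lambda e: e.score,
                  reverse=True)[:max_items]

def to_vlm_prompt(pack: List[Evidence]) -> Tuple[str, List[str]]:
    """Render numbered text citations; page crops ride along as the
    image inputs of the VLM call."""
    lines = [f"[{i}] ({ev.chunk_id}) {ev.text}"
             for i, ev in enumerate(pack, 1)]
    crops = [ev.page_crop for ev in pack if ev.page_crop]
    return "\n".join(lines), crops
```

Numbering the citations in the rendered prompt lets the VLM's answer reference `[1]`, `[2]`, ... directly, which is what makes the final answer auditable against the pack.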