An Empirical Study of Knowledge Graph-Enhanced RAG for Information Security Compliance
Journal
Information
Date Issued
2026-04-20
Author(s)
Jovanovski, Dimitar
Stojcheva, Marija
Dodevska, Mila
DOI
10.3390/info17040389
Abstract
Information security compliance has become critical for organizations worldwide, with the ISO/IEC 27000 family serving as the most widely adopted framework for establishing information security management systems. Despite their global acceptance, these standards are difficult to interpret because of their formal language, abstract structure, and extensive cross-referencing across 97 documents. Traditional retrieval-augmented generation (RAG) systems, which rely on independent text chunking and dense vector retrieval, are poorly suited to such highly interconnected regulatory materials: they often fragment contextual relationships, which reduces accuracy. This study introduces a privacy-preserving RAG framework that integrates LightRAG, a knowledge graph-based retrieval system, with locally hosted open-source language models. Unlike chunk-based RAG systems that treat document segments independently, the proposed system constructs a semantic knowledge graph that explicitly models relationships between clauses through typed edges representing cross-references, semantic similarity, and hierarchical dependencies. To enable rigorous evaluation, we developed a curated benchmark dataset of 222 multiple-choice questions with authoritative ground-truth answers, constructed from official ISO standards, certification preparation materials, and academic sources. On this benchmark, we show that knowledge graph-based retrieval achieves higher accuracy than chunk-based RAG and non-retrieval LLM baselines within the evaluated setup. The analysis indicates that embedding model quality is strongly associated with system performance, that hybrid retrieval modes combining local and global graph traversal tend to yield better accuracy, and that mid-sized open-source models paired with strong retrievers can approach the performance of larger proprietary systems. The best configuration achieves 90.54% accuracy, indicating that graph-structured retrieval is effective for multiple-choice regulatory questions.
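The typed-edge clause graph and the accuracy metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the LightRAG API; the class `ClauseGraph`, the edge-type labels, and the helper `accuracy` are hypothetical names introduced only for illustration, and the one-hop expansion stands in loosely for a local graph traversal.

```python
from collections import defaultdict

# Hypothetical labels for the three relationship kinds named in the abstract.
EDGE_TYPES = frozenset({"cross_reference", "semantic_similarity", "hierarchy"})


class ClauseGraph:
    """Toy clause graph with typed, undirected edges (illustrative only)."""

    def __init__(self):
        self.text = {}                  # clause_id -> clause text
        self.edges = defaultdict(list)  # clause_id -> [(edge_type, neighbor_id)]

    def add_clause(self, clause_id, text):
        self.text[clause_id] = text

    def add_edge(self, src, dst, edge_type):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        # Store the edge in both directions so traversal is symmetric.
        self.edges[src].append((edge_type, dst))
        self.edges[dst].append((edge_type, src))

    def expand(self, seed_ids, allowed_types=EDGE_TYPES):
        """Return the seeds plus their one-hop neighbors over the allowed edge types."""
        result = list(dict.fromkeys(seed_ids))  # de-duplicate, keep order
        for clause_id in list(result):
            for edge_type, neighbor in self.edges[clause_id]:
                if edge_type in allowed_types and neighbor not in result:
                    result.append(neighbor)
        return result


def accuracy(predictions, answers):
    """Fraction of multiple-choice predictions that match the ground truth."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

As a point of arithmetic, 201 correct answers out of 222 gives 201/222 ≈ 90.54%, which matches the reported best accuracy if that figure comes from a single run over the full benchmark. The count of 201 is an inference from the reported numbers and is not stated in the abstract.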
