Submodule 4.3 — LLM + KG integration · Sensemaking Semantic Web

01 · GraphRAG patterns

Beyond naive RAG.

Standard RAG (Retrieval-Augmented Generation) retrieves text chunks by vector similarity to the query and feeds them into the LLM's context window. It works well for single-hop questions ("What is Itachi's canonical motivation?") but breaks down on questions that require traversing multiple relationships ("Which characters across all three arcs have a sensei lineage that connects to Naruto?"). Vector similarity can find chunks mentioning Naruto, but it cannot follow the senseiOf chain without structural guidance.

Microsoft's GraphRAG paper (Edge et al., 2024) formalizes this problem and introduces a solution: use an LLM to extract an entity graph from the text corpus, partition that graph into communities, and pre-summarize each community, then answer queries from entity neighborhoods and community summaries rather than from similar chunks alone. The curriculum's hybrid approach is more targeted: we have an explicit, hand-designed ontology (the Naruto ontology from Module 2) rather than an LLM-derived graph, which makes the traversal more precise and the explanations more interpretable.

Three GraphRAG patterns

Entity-centric retrieval. Identify entities mentioned in the query, use graph traversal to find related entities, and bias vector retrieval toward chunks mentioning those entities. This is the primary pattern in TwinKit Semantic v2.0.
Community summarization (Microsoft GraphRAG). Pre-compute summaries of entity clusters (communities) in the knowledge graph; retrieve the relevant community summaries as context. Good for global questions; requires pre-computation.
Subgraph extraction. For a given query entity, extract a k-hop subgraph as structured context alongside retrieved text chunks. Best for dense relational questions; expensive to compute at runtime for large graphs.
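The subgraph-extraction pattern can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph fits in memory as (subject, predicate, object) tuples; the toy triples and the name `k_hop_subgraph` are hypothetical, and a production version would run the traversal inside the triple store. It returns the induced subgraph on every node within k hops, treating edges as undirected so that senseiOf is followed in both directions.

```python
from collections import deque

# Toy triples standing in for the knowledge graph (hypothetical data).
TRIPLES = [
    ("naruto:KakashiHatake", "naruto:senseiOf", "naruto:NarutoUzumaki"),
    ("naruto:KakashiHatake", "naruto:senseiOf", "naruto:SasukeUchiha"),
    ("naruto:NarutoUzumaki", "naruto:memberOfTeam", "naruto:Team7"),
    ("naruto:SasukeUchiha", "naruto:memberOfTeam", "naruto:Team7"),
    ("naruto:ItachiUchiha", "naruto:familyOf", "naruto:SasukeUchiha"),
]


def k_hop_subgraph(triples, seed, k):
    """Return the triples whose endpoints both lie within k hops of seed,
    treating every edge as traversable in either direction."""
    # Undirected adjacency: node -> set of neighboring nodes.
    neighbors = {}
    for s, _, o in triples:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    # Breadth-first search, recording each node's hop distance from the seed.
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand past k hops
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Induced subgraph: keep triples whose subject and object were both reached.
    return [t for t in triples if t[0] in distance and t[2] in distance]


subgraph = k_hop_subgraph(TRIPLES, "naruto:NarutoUzumaki", k=2)
```

With k=2 from Naruto, the toy graph yields both of Kakashi's senseiOf edges and both Team 7 memberships, but not Itachi's familyOf edge, which sits three hops away.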

Why this matters for the positioning

The ability to articulate when graph augmentation helps and when it does not, backed by actual eval numbers, is rare. Most practitioner blog posts advocate for GraphRAG without running a comparison. The capstone's honest evaluation is what differentiates it from marketing content. A sentence like "I ran 20 questions through both pipelines; graph augmentation helped on 14, made no difference on 4, and was worse on 2. Here is why." is the senior engineer's voice.

02 · Hybrid retrieval pipeline

Five steps per query.

Step 1

Entity extraction

LLM call: extract named entities from the user query. "Which characters trained under Jiraiya?" → entities: [Jiraiya, Naruto Uzumaki (likely)].
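Extracted names usually need to be linked to graph IRIs before they can seed a traversal. One deterministic complement to the LLM call, sketched here as an assumption rather than the TwinKit implementation, is a gazetteer lookup. The surface forms and the `naruto:Jiraiya` IRI are hypothetical; in practice the table would be built from the graph's naruto:canonicalName values.

```python
import re

# Hypothetical gazetteer: lowercase surface form -> entity IRI.
GAZETTEER = {
    "jiraiya": "naruto:Jiraiya",  # hypothetical IRI
    "naruto": "naruto:NarutoUzumaki",
    "naruto uzumaki": "naruto:NarutoUzumaki",
    "kakashi": "naruto:KakashiHatake",
    "itachi": "naruto:ItachiUchiha",
}


def link_entities(query, gazetteer=GAZETTEER):
    """Return IRIs whose surface forms occur as whole words in the query.
    Longer forms are tried first, so 'Naruto Uzumaki' beats 'Naruto'."""
    text = query.lower()
    found = []
    for surface in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, text):
            iri = gazetteer[surface]
            if iri not in found:
                found.append(iri)
            # Blank out the match so a shorter form cannot re-match inside it.
            text = re.sub(pattern, " ", text)
    return found
```

For "Which characters trained under Jiraiya?" this returns only the Jiraiya IRI; inferring that Naruto is relevant is the part the LLM adds.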

Step 2

Graph traversal

SPARQL query: given the extracted entities, find related entities 1–2 hops away. Properties such as senseiOf, rivalOf, and appearsInArc expand the entity set.
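SPARQL endpoints such as Oxigraph can return SELECT results in the W3C SPARQL 1.1 Query Results JSON Format. A minimal sketch of turning that response into the IRI list that feeds Step 3; the function name `bound_iris` is hypothetical, and the HTTP request itself is left out.

```python
import json


def bound_iris(results_json, var):
    """Extract the distinct IRIs bound to `var` from a SPARQL 1.1
    JSON results document (application/sparql-results+json)."""
    doc = json.loads(results_json)
    iris = []
    for row in doc["results"]["bindings"]:
        term = row.get(var)  # unbound (e.g. OPTIONAL) variables are absent
        if term and term["type"] == "uri" and term["value"] not in iris:
            iris.append(term["value"])  # keep first-seen order, skip literals
    return iris
```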

Step 3

Biased retrieval

ChromaDB query: retrieve chunks filtered to those mentioning the expanded entity set. Structural graph results bias the vector retrieval.

Step 4

Context composition

Merge: structured SPARQL results (graph facts) + retrieved text chunks. Graph facts go first because they are more precise for relational questions.

Step 5

Generation

LLM call: answer the user query from the composed context.
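The composition order in Step 4 can be sketched as follows, assuming graph facts arrive as (subject, predicate, object) label triples and chunks as plain strings. The function name and the character budget `max_chars` are hypothetical; a real system would count tokens rather than characters. Because graph facts are emitted first, they are the last thing dropped when the budget runs out.

```python
def compose_context(graph_facts, chunks, max_chars=4000):
    """Build the LLM context: graph facts first, then text chunks,
    stopping once the character budget would be exceeded."""
    parts = ["Graph facts:"] + [f"- {s} {p} {o}." for s, p, o in graph_facts]
    parts += ["", "Retrieved passages:"]
    parts += [f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1)]

    kept, used = [], 0
    for line in parts:
        if used + len(line) + 1 > max_chars:  # +1 for the newline
            break  # later, lower-priority lines are dropped first
        kept.append(line)
        used += len(line) + 1
    return "\n".join(kept)
```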

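# The full hybrid retrieval pipeline (sketch). llm_extract_entities,
# sparql_expand, embed, sparql_results_as_text, format_chunks and
# llm_generate are placeholders for your own implementations.
import chromadb

client = chromadb.PersistentClient(path="./chroma")  # example path
collection = client.get_collection("naruto_chunks")  # example collection name

# Step 1: Extract entities from query
entities = llm_extract_entities(user_query)
# → ["Jiraiya", "naruto:NarutoUzumaki"]

# Step 2: SPARQL graph traversal
related = sparql_expand(entities, hops=2)
# → ["naruto:KakashiHatake", "naruto:SasukeUchiha", "naruto:Team7", ...]

# Step 3: ChromaDB retrieval restricted to the expanded entities.
# Assumes each chunk stores one entity IRI in a scalar metadata field
# named "entity"; $in matches when that value is in the `related` list.
results = collection.query(
    query_embeddings=[embed(user_query)],  # a list: one embedding per query
    where={"entity": {"$in": related}},
    n_results=5,
)
chunks = results["documents"][0]  # documents for the first (only) query

# Step 4: Compose context (graph facts first)
context = sparql_results_as_text(related) + "\n\n" + format_chunks(chunks)

# Step 5: Generate the answer
answer = llm_generate(user_query, context)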
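The `where` clause in the pipeline is a hard filter: chunks whose entity falls outside the expanded set are never returned. If the goal is to bias retrieval rather than restrict it, one option is to over-fetch without the filter and re-rank. This sketch assumes each hit has been assembled as (chunk_id, distance, entity IRIs) from a Chroma result's ids, distances, and metadatas; the function name and the `boost` value of 0.15 are hypothetical placeholders to tune on the eval set.

```python
def rerank_with_graph_boost(hits, related, boost=0.15):
    """Re-rank vector hits, favoring those that mention an expanded entity.

    hits: list of (chunk_id, distance, entity_iris); lower distance = closer.
    related: set of entity IRIs from the graph expansion step.
    """
    def score(hit):
        _, distance, entities = hit
        # Subtract a fixed bonus from the distance when any entity overlaps.
        return distance - (boost if related & set(entities) else 0.0)

    return [hit[0] for hit in sorted(hits, key=score)]
```

With related = {"naruto:KakashiHatake"}, a chunk about Kakashi at distance 0.40 moves ahead of an unrelated chunk at 0.30. Unlike the hard filter, a chunk with no entity overlap can still win if it is similar enough to the query.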

03 · TwinKit Semantic v2.0 architecture

What changes from v1.0.

TwinKit v1.0: markdown ingestion → ChromaDB → LLM generation. TwinKit Semantic v2.0 adds the semantic layer as an optional module that activates when a domain ontology is available.

TwinKit Semantic v2.0: parallel ingestion (text → ChromaDB; structured data → SPARQL endpoint) and hybrid retrieval (graph expansion + biased vector search) before LLM generation.
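One way to make the semantic layer optional is a configuration check that falls back to the v1.0 path when no ontology is available. This is a hypothetical sketch: `TwinKitConfig`, its field names, and `retrieval_mode` are illustrative, not the framework's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwinKitConfig:
    """Hypothetical config: the semantic layer turns on only when both a
    domain ontology and a SPARQL endpoint are configured."""
    chroma_collection: str
    ontology_path: Optional[str] = None    # e.g. "naruto-ontology-1.0.0.ttl"
    sparql_endpoint: Optional[str] = None  # e.g. an Oxigraph query URL


def retrieval_mode(config: TwinKitConfig) -> str:
    """Return which retrieval pipeline this deployment should run."""
    if config.ontology_path and config.sparql_endpoint:
        return "hybrid"       # v2.0: graph expansion + biased vector search
    return "vector-only"      # v1.0 behavior: ChromaDB retrieval alone
```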

What the Naruto demo adds to the framework

The Naruto Knowledge Graph is the demo dataset that makes the framework tangible to non-practitioners. Instead of "a knowledge graph of your domain," it is a clickable, queryable graph of 87 characters, three arcs, and an ontology anyone who watched the show can verify. The Gradio or D3.js Explorer app (Exercise 4.3, step 5) is the front door — it is what people click on before reading the case study.

The two-birds strategy: TwinKit Semantic is the consulting artifact; the Naruto demo is the attention hook. They deploy together with one architecture.

04 · SPARQL queries for hybrid retrieval

The graph traversal queries in the hybrid pipeline.

q01

Entity expansion: given a character name, find all related entities 2 hops away.

Pattern: hybrid retrieval Step 2 · property path expansion · entity set for ChromaDB filter

PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

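# Step 2 of the hybrid pipeline: given entity ?seed (Jiraiya),
# return all entities reachable via any naruto: property within 2 hops,
# following edges in both directions (forward + inverse).
SELECT DISTINCT ?relatedEntity ?relatedName WHERE {
  # Seed entity lookup by name
  ?seed naruto:canonicalName ?seedName .
  FILTER (CONTAINS(LCASE(STR(?seedName)), "jiraiya"))

  {
    # 1 hop: direct neighbors of the seed
    { ?seed ?p1 ?relatedEntity } UNION { ?relatedEntity ?p1 ?seed }
    FILTER (STRSTARTS(STR(?p1), "https://sensemaking-ai.com/ns/naruto#"))
  }
  UNION
  {
    # 2 hops: neighbors of an intermediate entity ?hop1
    { ?seed ?p1 ?hop1 } UNION { ?hop1 ?p1 ?seed }
    { ?hop1 ?p2 ?relatedEntity } UNION { ?relatedEntity ?p2 ?hop1 }
    FILTER (isIRI(?hop1)
            && STRSTARTS(STR(?p1), "https://sensemaking-ai.com/ns/naruto#")
            && STRSTARTS(STR(?p2), "https://sensemaking-ai.com/ns/naruto#"))
  }

  # Keep entities only (drop literals such as names) and exclude the seed
  FILTER (isIRI(?relatedEntity) && ?relatedEntity != ?seed)
  OPTIONAL { ?relatedEntity naruto:canonicalName ?relatedName . }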
}
ORDER BY ?relatedName

How this feeds into the hybrid pipeline

The result set of ?relatedEntity IRIs becomes the filter for the ChromaDB query in Step 3. Any text chunk whose metadata includes entities in this set is prioritized in retrieval. The graph traversal surfaces Naruto (Jiraiya's student), Kakashi (fellow Jonin), Team 7 (Naruto's team), and the Pain's Assault arc (where Jiraiya appears) — entities a vector-only search on "Jiraiya" might miss entirely.

In production, the traversal would be parameterized to control hop count and property coverage. Expanding via all naruto: properties is broad; a more targeted version might restrict traversal to senseiOf, memberOfTeam, and appearsInArc.
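A parameterized expansion query can be built from a property list and a hop count. SPARQL 1.1 property paths support alternatives (`|`), inverse steps (`^`), sequences (`/`), and zero-or-one steps (`?`) but no bounded repetition such as {1,2}, so this sketch chains one required step with hops - 1 optional ones. The function name and the Jiraiya IRI in the usage line are hypothetical; only pass trusted IRIs, since the seed is interpolated into the query text.

```python
NARUTO = "https://sensemaking-ai.com/ns/naruto#"


def expansion_query(seed_iri, properties, hops=2):
    """Build a SPARQL 1.1 SELECT returning entities within `hops` steps of
    seed_iri, following only the listed naruto: properties in either direction."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # One step: any listed property, forward or inverse (^).
    step = "(" + "|".join(f"naruto:{p}|^naruto:{p}" for p in properties) + ")"
    # One required step, then (hops - 1) optional steps: step/step?/step? ...
    path = "/".join([step] + [step + "?"] * (hops - 1))
    return (
        f"PREFIX naruto: <{NARUTO}>\n"
        "SELECT DISTINCT ?relatedEntity WHERE {\n"
        f"  <{seed_iri}> {path} ?relatedEntity .\n"
        f"  FILTER (?relatedEntity != <{seed_iri}>)\n"  # a 2-hop path can return to the seed
        "}"
    )


query = expansion_query(NARUTO + "Jiraiya", ["senseiOf", "memberOfTeam", "appearsInArc"], hops=2)
```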

q02

Multi-hop question answering: which characters share a sensei lineage with Naruto?

Pattern: multi-hop traversal · the class of questions that breaks vector-only RAG

PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

# Find all characters who share a direct sensei with Naruto.
# This takes 2 hops: Naruto ← senseiOf ← Kakashi → senseiOf → Sasuke.
# Pure vector search cannot traverse senseiOf chains reliably.
# For the full lineage (a sensei's own sensei), replace naruto:senseiOf
# with the property path naruto:senseiOf+ in both triple patterns.
SELECT DISTINCT ?co_studentName ?sharedSenseiName WHERE {
  # Naruto's direct sensei(s)
  ?sensei naruto:senseiOf naruto:NarutoUzumaki .

  # Other students of the same sensei
  ?sensei naruto:senseiOf ?co_student .
  FILTER (?co_student != naruto:NarutoUzumaki)

  ?sensei naruto:canonicalName ?sharedSenseiName .
  ?co_student naruto:canonicalName ?co_studentName .
}
ORDER BY ?sharedSenseiName ?co_studentName

Expected output and why this demonstrates the value

Expected rows (co-student / shared sensei): Sakura Haruno / Kakashi Hatake · Sasuke Uchiha / Kakashi Hatake (and Naruto studentOf Jiraiya after u01 runs)

This is one of the multi-hop questions in the 20-question eval set (Exercise 4.3, step 7). A vector-only RAG system, given the question "which characters share a sensei lineage with Naruto?", would retrieve chunks mentioning Naruto and hope that nearby text mentions Kakashi and Team 7. The graph traversal answers it structurally, with no luck required.

The eval framework compares: (a) baseline vector retrieval with this question, (b) hybrid retrieval with this SPARQL query's results as the entity expansion. Score each on accuracy (is the answer correct?) and latency (how long did it take?).
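The two-condition comparison can be scored with a small harness. A sketch, assuming each condition is wrapped as a callable from question to answer string; the names are hypothetical, and exact-match grading is a placeholder (multi-row answers like q02's need a set comparison or a rubric).

```python
import time
from statistics import mean


def evaluate(questions, pipelines):
    """Score each retrieval condition on accuracy and mean latency.

    questions: list of (question, expected_answer) pairs.
    pipelines: dict mapping a condition name to a callable question -> answer.
    """
    report = {}
    for name, answer_fn in pipelines.items():
        correct, latencies = 0, []
        for question, expected in questions:
            start = time.perf_counter()
            answer = answer_fn(question)
            latencies.append((time.perf_counter() - start) * 1000)  # wall-clock ms
            # Placeholder grader: case-insensitive exact match.
            correct += int(answer.strip().lower() == expected.strip().lower())
        report[name] = {
            "accuracy": correct / len(questions),
            "mean_latency_ms": mean(latencies),
        }
    return report
```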

q03

Generate the SPARQL for the Naruto KG Explorer — natural language to graph query.

Pattern: NL-to-SPARQL · LLM-generated query · Explorer app backend

# This is the query the Naruto KG Explorer generates when a user asks:
# "Who are Itachi's known masters?"
# The LLM generates SPARQL; Oxigraph executes it; the UI renders results.

# LLM system prompt excerpt (TwinKit Semantic v2.0):
# "You are a SPARQL query generator for the Naruto Knowledge Graph.
#  The ontology uses PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>.
#  Available properties: canonicalName, senseiOf, studentOf, rivalOf,
#  memberOfTeam, memberOfVillage, hasJutsu, hasRank, appearsInArc, familyOf.
#  Generate only valid SPARQL SELECT queries."

# LLM-generated query for "Who are Itachi's known masters?":

PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

SELECT ?masterName WHERE {
  {
    # Direct: someone who is senseiOf Itachi
    ?master naruto:senseiOf naruto:ItachiUchiha ;
            naruto:canonicalName ?masterName .
  }
  UNION
  {
    # Inverse: Itachi is studentOf someone
    naruto:ItachiUchiha naruto:studentOf ?master .
    ?master naruto:canonicalName ?masterName .
  }
}
ORDER BY ?masterName

The NL-to-SPARQL pattern

The Naruto KG Explorer (Exercise 4.3, step 5) takes natural-language questions, passes them to an LLM with the ontology schema as context, and runs the generated SPARQL against Oxigraph. The generated query is shown to the user alongside the results — transparency about the mechanism is a design choice, not an afterthought.

Error handling matters: generated SPARQL may be syntactically invalid or return 0 rows. The Explorer should degrade gracefully: show the raw query if it fails, let the user edit it, fall back to vector-only retrieval if graph retrieval returns nothing.
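The degradation policy can be sketched as a single function. The three callables are hypothetical stand-ins for the LLM call, the Oxigraph query, and the vector-only pipeline; the returned dict always carries the generated query so the UI can display it and let the user edit it.

```python
def answer_with_fallback(question, generate_sparql, run_sparql, vector_answer):
    """NL-to-SPARQL with graceful degradation.

    generate_sparql: question -> SPARQL string (an LLM call).
    run_sparql: SPARQL string -> list of result rows; raises on failure.
    vector_answer: question -> answer from vector-only retrieval.
    """
    query = generate_sparql(question)
    try:
        rows = run_sparql(query)
    except Exception as err:  # syntax error, timeout, endpoint down, ...
        return {"mode": "vector-only", "query": query, "error": str(err),
                "answer": vector_answer(question)}
    if not rows:  # valid query but zero rows: fall back, keep query visible
        return {"mode": "vector-only", "query": query, "error": None,
                "answer": vector_answer(question)}
    return {"mode": "graph", "query": query, "error": None, "rows": rows}
```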

05 · Evaluation framework

Honesty is the differentiating artifact.

Exercise 4.3 step 7 specifies a 20-question eval set evaluated across three conditions. Here is a framework for structuring those results honestly.

The 20-question set design

5 single-hop factual questions (e.g., "What is Naruto's rank?") — vector retrieval should handle these well. Graph augmentation is not expected to help. A baseline win for vector-only.
10 multi-hop relational questions (e.g., "Who are the students of all Jonin-rank characters?") — graph traversal is necessary for precision. Hybrid should win here.
5 ambiguous or contested questions (e.g., "Who is Itachi's true master?" — answers depend on which source you trust) — neither approach handles uncertainty well. Document both failures honestly.

Question type	Metric	Expected winner	Example honest finding
Single-hop factual	Accuracy (correct/incorrect)	Vector-only	"Graph augmentation added latency with no accuracy gain on simple factual questions."
Multi-hop relational	Accuracy + completeness	Hybrid	"Graph traversal helped precision; vector retrieval caught prose context the graph missed."
Ambiguous / contested	Calibration (does it express uncertainty?)	Neither cleanly	"Both systems confidently answer questions where the correct answer is 'it depends on source.' This is a known LLM limitation, not a graph limitation."
Latency (all types)	Wall-clock ms per query	Vector-only	"Hybrid retrieval adds 300–800ms per query for graph traversal. At this scale, acceptable; at production traffic, cache aggressively."
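Turning per-question scores into the "helped / no difference / worse" counts that these findings quote is a small aggregation. A sketch, assuming one (question type, vector-only score, hybrid score) row is recorded per question; the function name and row shape are hypothetical.

```python
def tally(per_question):
    """Summarize hybrid vs vector-only per question type.

    per_question: list of (question_type, vector_score, hybrid_score), where a
    score is the per-question metric you chose (e.g. 1 correct, 0 wrong).
    Returns {question_type: {"helped": n, "no difference": n, "worse": n}}.
    """
    summary = {}
    for qtype, vector_score, hybrid_score in per_question:
        counts = summary.setdefault(
            qtype, {"helped": 0, "no difference": 0, "worse": 0})
        if hybrid_score > vector_score:
            counts["helped"] += 1
        elif hybrid_score < vector_score:
            counts["worse"] += 1
        else:
            counts["no difference"] += 1
    return summary
```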

The cases where hybrid is worse

The Module 4 README is explicit: document where hybrid retrieval is worse, not just where it is better. Expect to find: (a) questions where the graph entity expansion adds noise rather than signal — unrelated entities get included, biasing retrieval toward irrelevant chunks; (b) questions where vector retrieval already captures the answer from nearby prose and graph augmentation adds only latency; (c) questions where the ontology is missing the relevant relationship and the graph traversal returns nothing useful. All three are honest, publishable findings.

06 · Exercise 4.3 — the capstone

Where the curriculum lands.

The Module 4 README has the full nine-step capstone plan. The most important sequencing note: do the eval before writing the case study. The eval produces your findings; the case study communicates them. Writing the case study first produces marketing copy, not a technical contribution.

The minimum viable capstone

If time pressure forces a scope reduction: (a) deploy Oxigraph on EC2 with the naruto-ontology-1.0.0.ttl loaded and publicly queryable — this is the deployment credential; (b) build a Gradio app that takes a natural-language question, generates SPARQL, runs it, and shows results — this is the Explorer demo; (c) run the eval on 10 questions instead of 20. The case study documents what you built and what the eval showed. That is a complete, publishable capstone even without TwinKit v2.0 integration.

The deployment test

From the Module 4 README: "Is the Naruto KG Explorer deployed and queryable by anyone with a browser?" The URL should be public — not localhost, not behind a VPN, not a screenshot. Anyone should be able to open it and run a SPARQL query or natural-language question. This is the credential test for the whole curriculum.

The publishable artifacts that come out of the capstone

TwinKit v2.0 GitHub release — tagged, with a README that positions the semantic layer and links to the case study
Naruto KG Explorer — live URL, publicly accessible
Case study on barbhs.com: "Building a hybrid semantic + vector knowledge twin (with Naruto)"
LinkedIn capstone announcement — with the eval results as the hook, not the architecture
Conference abstract for 2027 (Connected Data World, SEMANTiCS, or a RAG-track AI conference)

After shipping: update the LinkedIn About section with the consulting positioning statement from the SYLLABUS closing note. The curriculum is complete. The artifacts are the receipts.

07 · Resources

The final reading list.

The paper

Microsoft GraphRAG: "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (Edge et al., 2024)

The foundational paper for this submodule. Read the full paper, not just the abstract — the community summarization approach and the global vs. local query distinction are where the nuance lives.

Working code

GraphRAG GitHub

Microsoft's open-source implementation. The pipeline differs from TwinKit Semantic v2.0 (community graphs vs. ontology-driven traversal) — use it for comparison, not as a template.

Prior reading

Allemang et al. — Ch 14–15

Chapters 14 and 15 cover enterprise modeling and the knowledge graph production landscape. Chapter 15 is the closest thing in the textbook to a deployment guide.

KG + LLM commentary

Bob DuCharme's blog

Recent posts on SPARQL + LLM integration, NL-to-SPARQL, and the practical LLM+KG landscape. The most current practitioner voice in the curriculum's reading list.

Deployment reference

Submodule 4.2 — Deployment guide

The EC2 deployment steps (Exercise 4.1) must be complete before starting the capstone. The Oxigraph endpoint is what the TwinKit Semantic hybrid retriever queries.

Curriculum home

REFLECTIONS.md

The final curriculum artifact. Write the end-of-Module-4 reflection after the capstone ships — what you built, what you learned, what you would do differently. This closes the curriculum.