Sensemaking AI · Sensemaking Semantic Web
Submodule 4.3 · Shipping — Capstone

LLM + KG integration.

GraphRAG patterns, hybrid vector + graph retrieval, the TwinKit Semantic v2.0 architecture, and an honest evaluation framework. The curriculum culminates here.

Module 4 · Weeks 10–12 · Capstone: Exercise 4.3 · TwinKit Semantic v2.0 · GraphRAG · hybrid retrieval · eval

Beyond naive RAG.

Standard RAG (Retrieval-Augmented Generation) retrieves text chunks by vector similarity to the query and feeds them into the LLM's context window. It works well for single-hop questions such as "What is Itachi's canonical motivation?" but breaks down on questions that require traversing multiple relationships, such as "Which characters across all three arcs have a sensei lineage that connects to Naruto?" Vector similarity can find chunks that mention Naruto, but it cannot follow the senseiOf chain without structural guidance.

Microsoft's GraphRAG paper (Edge et al., 2024) formalizes this problem and proposes a solution: use an LLM to extract an entity graph from the text corpus, partition it into communities with generated summaries (rather than building only a semantic index), then assemble context from related entities and community summaries instead of from similar chunks alone. The curriculum's hybrid approach is more targeted: it starts from an explicit, hand-designed ontology (the Naruto ontology from Module 2) rather than a derived community graph, so the traversal is more precise and the explanations are easier to interpret.

Three GraphRAG patterns

Why this matters for the positioning

The ability to articulate when graph augmentation helps and when it does not — backed by actual eval numbers — is rare. Most practitioner blog posts advocate for GraphRAG without running a comparison. The capstone's honest evaluation is what differentiates it from marketing content. "I ran 20 multi-hop questions and graph augmentation helped on 14, made no difference on 4, and was worse on 2. Here is why." That is the senior engineer's voice.

Five steps per query.

Step 1
Entity extraction

LLM call: extract named entities from the user query. "Which characters trained under Jiraiya?" → entities: [Jiraiya, Naruto Uzumaki (likely)].

Step 2
Graph traversal

SPARQL query: given the extracted entities, find related entities 1–2 hops away. Properties such as senseiOf, rivalOf, and appearsInArc expand the entity set.

Step 3
Biased retrieval

ChromaDB query: retrieve chunks filtered to those mentioning the expanded entity set. Structural graph results bias the vector retrieval.

Step 4
Context composition

Merge: structured SPARQL results (graph facts) + retrieved text chunks. Graph facts go first because they are more precise for relational questions.

Step 5
LLM generation

LLM call: answer the user query from the composed context.

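# The full hybrid retrieval pipeline in pseudocode
# (llm_extract_entities, sparql_expand, embed, sparql_results_as_text,
#  format_chunks, and llm_generate stand for your own helpers)

# Step 1: Extract entities from query
entities = llm_extract_entities(user_query)
# → ["Jiraiya", "Naruto Uzumaki"]  (names; resolved to IRIs in Step 2)

# Step 2: SPARQL graph traversal
related = sparql_expand(entities, hops=2)
# → ["naruto:KakashiHatake", "naruto:SasukeUchiha", "naruto:Team7", ...]

# Step 3: ChromaDB retrieval biased toward expanded entities.
# collection is a chromadb Collection; query_embeddings takes a list of vectors,
# and $in matches chunks whose "entities" metadata value appears in `related`.
chunks = collection.query(
    query_embeddings=[embed(user_query)],
    where={"entities": {"$in": related}},
    n_results=5,
)

# Step 4: Compose context, graph facts first
context = sparql_results_as_text(related) + "\n\n" + format_chunks(chunks)

# Step 5: Generate the answer
answer = llm_generate(user_query, context)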
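The pseudocode above leaves every helper abstract. Below is a minimal runnable sketch of Steps 1–4 under loud assumptions: an in-memory triple list stands in for Oxigraph, word overlap stands in for embedding similarity, and first-name matching stands in for the LLM entity extractor. All function names and the toy data are hypothetical, and the final LLM call is left out.

```python
# Toy, in-memory stand-ins (all hypothetical): a triple list instead of
# Oxigraph, word overlap instead of embeddings, name matching instead of an LLM.

TRIPLES = [  # (subject, predicate, object); toy data, not the curriculum graph
    ("naruto:Jiraiya", "naruto:senseiOf", "naruto:NarutoUzumaki"),
    ("naruto:KakashiHatake", "naruto:senseiOf", "naruto:NarutoUzumaki"),
    ("naruto:KakashiHatake", "naruto:senseiOf", "naruto:SasukeUchiha"),
]
NAMES = {
    "naruto:Jiraiya": "Jiraiya",
    "naruto:NarutoUzumaki": "Naruto Uzumaki",
    "naruto:KakashiHatake": "Kakashi Hatake",
    "naruto:SasukeUchiha": "Sasuke Uchiha",
}
CHUNKS = [  # (text, entity): one entity per chunk, like a scalar metadata field
    ("Jiraiya trained Naruto in the Sage arts.", "naruto:NarutoUzumaki"),
    ("Kakashi led Team 7.", "naruto:KakashiHatake"),
    ("Ichiraku Ramen is a noodle shop.", "naruto:IchirakuRamen"),
]


def extract_entities(query):
    """Step 1 stand-in: link entities whose first name appears in the query."""
    q = query.lower()
    return [iri for iri, name in NAMES.items() if name.split()[0].lower() in q]


def expand(seeds, hops=2):
    """Step 2 stand-in: entities within `hops` edges, forward or inverse."""
    seen, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        reached = set()
        for s, _p, o in TRIPLES:
            if s in frontier:
                reached.add(o)
            if o in frontier:
                reached.add(s)
        frontier = reached - seen
        seen |= frontier
    return sorted(seen - set(seeds))


def retrieve(query, allowed, n_results=5):
    """Step 3 stand-in: keep chunks tagged with an allowed entity, rank by overlap."""
    words = set(query.lower().split())
    hits = [text for text, entity in CHUNKS if entity in allowed]
    hits.sort(key=lambda t: len(words & set(t.lower().split())), reverse=True)
    return hits[:n_results]


def compose(entities, chunks):
    """Step 4: graph facts first, then retrieved prose."""
    ents = set(entities)
    facts = [
        f"{NAMES.get(s, s)} {p.split(':', 1)[1]} {NAMES.get(o, o)}"
        for s, p, o in TRIPLES
        if s in ents or o in ents
    ]
    return "\n".join(facts) + "\n\n" + "\n".join(chunks)


def hybrid_context(query):
    seeds = extract_entities(query)          # Step 1
    related = expand(seeds, hops=2)          # Step 2
    chunks = retrieve(query, set(related))   # Step 3
    return compose(related, chunks)          # Step 4; Step 5 sends this to the LLM


print(hybrid_context("Which characters trained under Jiraiya?"))
```

With this toy data, the Jiraiya question expands to Kakashi and Naruto, the three senseiOf facts come before the two surviving prose chunks, and the Ichiraku chunk is filtered out because its entity is not in the expanded set.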

What changes from v1.0.

TwinKit v1.0: markdown ingestion → ChromaDB → LLM generation. TwinKit Semantic v2.0 adds the semantic layer as an optional module that activates when a domain ontology is available.

[Figure] INGESTION: Markdown / source data → text chunks → ChromaDB vector index, and → RDF triples → Oxigraph SPARQL endpoint. RETRIEVAL (hybrid): Hybrid retriever (graph expansion + biased vector search) → LLM generation.
TwinKit Semantic v2.0: parallel ingestion (text → ChromaDB; structured data → SPARQL endpoint) and hybrid retrieval (graph expansion + biased vector search) before LLM generation.

What the Naruto demo adds to the framework

The Naruto Knowledge Graph is the demo dataset that makes the framework tangible to non-practitioners. Instead of "a knowledge graph of your domain," it is a clickable, queryable graph of 87 characters, three arcs, and an ontology anyone who watched the show can verify. The Gradio or D3.js Explorer app (Exercise 4.3, step 5) is the front door — it is what people click on before reading the case study.

The two-birds strategy: TwinKit Semantic is the consulting artifact; the Naruto demo is the attention hook. They deploy together with one architecture.

The graph traversal queries in the hybrid pipeline.

q01
Entity expansion: given a character name, find all related entities 2 hops away.
Pattern: hybrid retrieval Step 2 · property path expansion · entity set for ChromaDB filter
PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

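# Step 2 of the hybrid pipeline: given entity ?seed (Jiraiya),
# return all entities reachable via any naruto: property within 2 hops.
SELECT DISTINCT ?relatedEntity ?relatedName WHERE {
  # Seed entity lookup by name
  ?seed naruto:canonicalName ?seedName .
  FILTER (CONTAINS(LCASE(?seedName), "jiraiya"))

  # 1-2 hop expansion via any naruto: property (forward + inverse).
  # Hop 1 and hop 2 are separate UNION branches so both depths are returned.
  {
    # Hop 1: direct neighbors
    { ?seed ?p1 ?relatedEntity } UNION { ?relatedEntity ?p1 ?seed }
    FILTER (STRSTARTS(STR(?p1), "https://sensemaking-ai.com/ns/naruto#"))
  }
  UNION
  {
    # Hop 2: neighbors of an entity-valued hop-1 neighbor
    { ?seed ?p1 ?hop1 } UNION { ?hop1 ?p1 ?seed }
    FILTER (STRSTARTS(STR(?p1), "https://sensemaking-ai.com/ns/naruto#"))
    FILTER (isIRI(?hop1))
    { ?hop1 ?p2 ?relatedEntity } UNION { ?relatedEntity ?p2 ?hop1 }
    FILTER (STRSTARTS(STR(?p2), "https://sensemaking-ai.com/ns/naruto#"))
  }

  # Keep IRIs only (drops literals such as names) and exclude the seed
  FILTER (isIRI(?relatedEntity) && ?relatedEntity != ?seed)
  OPTIONAL { ?relatedEntity naruto:canonicalName ?relatedName . }
}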
ORDER BY ?relatedName
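Between Step 2 and Step 3, the SPARQL results have to become the entity list used in the ChromaDB filter. A minimal sketch, assuming the endpoint returns the standard SPARQL 1.1 JSON results format and that chunk metadata stores entities as naruto: CURIEs (the same form as the pseudocode's comments); both function names are hypothetical.

```python
import json

NARUTO_NS = "https://sensemaking-ai.com/ns/naruto#"


def bindings_to_entity_ids(results, var="relatedEntity", prefix="naruto:"):
    """Collect IRIs bound to `var` in SPARQL 1.1 JSON results, as CURIEs."""
    if isinstance(results, str):
        results = json.loads(results)
    ids = set()
    for row in results["results"]["bindings"]:
        cell = row.get(var)
        if cell is None or cell["type"] != "uri":
            continue  # unbound, literal, or blank node: not usable as a filter key
        iri = cell["value"]
        if iri.startswith(NARUTO_NS):
            ids.add(prefix + iri[len(NARUTO_NS):])
        else:
            ids.add(iri)
    return sorted(ids)


def chroma_where(entity_ids):
    """Build the metadata filter for Step 3; None means "do not filter"."""
    if not entity_ids:
        return None  # skip the filter: an empty $in list cannot match any chunk
    return {"entities": {"$in": entity_ids}}
```

Returning None for an empty expansion is a design choice: it falls back to plain vector search instead of retrieving nothing.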
q02
Multi-hop question answering: which characters share a sensei lineage with Naruto?
Pattern: multi-hop traversal · the class of questions that breaks vector-only RAG
PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

# Find all characters who share a common sensei ancestor with Naruto.
# The senseiOf+ property path follows the chain to any depth:
# Naruto ← senseiOf+ ← ?sensei → senseiOf+ → ?co_student.
# Pure vector search cannot traverse senseiOf chains reliably.
SELECT DISTINCT ?co_studentName ?sharedSenseiName WHERE {
  # Naruto's sensei lineage (upward, one or more senseiOf hops)
  ?sensei naruto:senseiOf+ naruto:NarutoUzumaki .

  # Other descendants of the same sensei
  ?sensei naruto:senseiOf+ ?co_student .
  FILTER (?co_student != naruto:NarutoUzumaki)

  ?sensei naruto:canonicalName ?sharedSenseiName .
  ?co_student naruto:canonicalName ?co_studentName .
}
ORDER BY ?sharedSenseiName ?co_studentName
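To see what a lineage traversal computes, here is a minimal breadth-first sketch that mirrors a SPARQL senseiOf+ (one-or-more) property path over an in-memory edge list. The edges are toy data for illustration, not the curriculum's graph, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def reachable(edges, start):
    """Nodes reachable from `start` via one or more edges (like a `+` path)."""
    nxt = defaultdict(list)
    for a, b in edges:
        nxt[a].append(b)
    seen, queue = set(), deque(nxt[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(nxt[node])
    return seen


def lineage_peers(sensei_of, person):
    """(peer, shared_sensei) pairs: peer descends from an ancestor of `person`."""
    ancestors = reachable([(b, a) for a, b in sensei_of], person)
    pairs = set()
    for sensei in ancestors:
        for peer in reachable(sensei_of, sensei):
            if peer != person:
                pairs.add((peer, sensei))
    return sorted(pairs)


# Toy senseiOf edges (sensei, student) for illustration only.
SENSEI_OF = [
    ("Jiraiya", "Minato"),
    ("Minato", "Kakashi"),
    ("Kakashi", "Naruto"),
    ("Kakashi", "Sasuke"),
]
print(lineage_peers(SENSEI_OF, "Naruto"))
```

In the toy output, Kakashi and Minato appear as peers because they descend from Naruto's higher ancestors; restricting both sides to a single senseiOf hop would return only Sasuke, via Kakashi.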
q03
Generate the SPARQL for the Naruto KG Explorer — natural language to graph query.
Pattern: NL-to-SPARQL · LLM-generated query · Explorer app backend
# This is the query the Naruto KG Explorer generates when a user asks:
# "Who are Itachi's known masters?"
# The LLM generates SPARQL; Oxigraph executes it; the UI renders results.

# LLM system prompt excerpt (TwinKit Semantic v2.0):
# "You are a SPARQL query generator for the Naruto Knowledge Graph.
#  The ontology uses PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>.
#  Available properties: senseiOf, studentOf, rivalOf, memberOfTeam,
#  memberOfVillage, hasJutsu, hasRank, appearsInArc, familyOf.
#  Generate only valid SPARQL SELECT queries."

# LLM-generated query for "Who are Itachi's known masters?":

PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

SELECT ?masterName WHERE {
  {
    # Direct: someone who is senseiOf Itachi
    ?master naruto:senseiOf naruto:ItachiUchiha ;
            naruto:canonicalName ?masterName .
  }
  UNION
  {
    # Inverse: Itachi is studentOf someone
    naruto:ItachiUchiha naruto:studentOf ?master .
    ?master naruto:canonicalName ?masterName .
  }
}
ORDER BY ?masterName
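Because the Explorer runs LLM-generated SPARQL against a public endpoint, it is worth checking the output before executing it. A minimal sketch of such a guard, assuming the property list from the system prompt above and the naming convention visible in the examples (lowerCamelCase properties, UpperCamelCase entities). The regexes are heuristics, not a SPARQL parser, and all names are hypothetical.

```python
import re

# Properties named in the system prompt, plus canonicalName, which the
# generated queries in this section also use.
ALLOWED_PROPS = {
    "senseiOf", "studentOf", "rivalOf", "memberOfTeam", "memberOfVillage",
    "hasJutsu", "hasRank", "appearsInArc", "familyOf", "canonicalName",
}
# SPARQL Update operations; matching is deliberately conservative.
UPDATE_WORDS = re.compile(
    r"\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b", re.IGNORECASE
)


def check_generated_sparql(query):
    """Return a list of problems; an empty list means the query may be run."""
    problems = []
    # Strip comments: '#' at line start or after whitespace (so IRIs like
    # <...naruto#> survive). Naive: a " #" inside a string literal is also cut.
    body = re.sub(r"(^|\s)#[^\n]*", r"\1", query)
    # Strip PREFIX / BASE declarations from the prologue.
    body = re.sub(r"(?im)^\s*(PREFIX|BASE)\b[^\n]*$", "", body)
    words = body.split()
    if not words or words[0].upper() != "SELECT":
        problems.append("not a SELECT query")
    if UPDATE_WORDS.search(body):
        problems.append("contains a SPARQL Update keyword")
    # Assumed convention: properties are lowerCamelCase, entities
    # (naruto:ItachiUchiha) are UpperCamelCase.
    for prop in sorted(set(re.findall(r"naruto:([a-z]\w*)", body))):
        if prop not in ALLOWED_PROPS:
            problems.append(f"unknown property naruto:{prop}")
    return problems
```

Rejected queries can be sent back to the LLM with the problem list appended to the prompt, rather than shown to the user as errors.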

Honesty is the differentiating artifact.

Exercise 4.3 step 7 specifies 20 multi-hop questions evaluated across three conditions. Here is the framework for structuring those results honestly.

The 20-question set design

Question type | Metric | Expected winner | Honest finding if wrong
Single-hop factual | Accuracy (correct/incorrect) | Vector-only | "Graph augmentation added latency with no accuracy gain on simple factual questions."
Multi-hop relational | Accuracy + completeness | Hybrid | "Graph traversal helped precision; vector retrieval caught prose context the graph missed."
Ambiguous / contested | Calibration (does it express uncertainty?) | Neither cleanly | "Both systems confidently answer questions where the correct answer is 'it depends on source.' This is a known LLM limitation, not a graph limitation."
Latency (all types) | Wall-clock ms per query | Vector-only | "Hybrid retrieval adds 300–800 ms per query for graph traversal. At this scale, acceptable; at production traffic, cache aggressively."
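A minimal sketch for tallying the eval the way an "helped on 14, no difference on 4, worse on 2" finding is reported, broken down by question type, with median added latency. It compares two conditions (vector-only as baseline, hybrid); the third condition from step 7 would need another pair of fields. The record layout and names are hypothetical, and correctness judgments are assumed to come from a human or rubric grader.

```python
import time
from dataclasses import dataclass
from statistics import median


@dataclass
class Result:
    qid: str
    qtype: str          # e.g. "single-hop", "multi-hop", "ambiguous"
    baseline_ok: bool   # vector-only answer judged correct
    hybrid_ok: bool     # hybrid answer judged correct
    baseline_ms: float
    hybrid_ms: float


def timed(fn, *args):
    """Call fn(*args); return (value, wall-clock milliseconds)."""
    start = time.perf_counter()
    value = fn(*args)
    return value, (time.perf_counter() - start) * 1000.0


def tally(results):
    """Per question type: helped / no difference / worse, plus median added latency."""
    summary = {}
    for qtype in sorted({r.qtype for r in results}):
        rows = [r for r in results if r.qtype == qtype]
        summary[qtype] = {
            "helped": sum(r.hybrid_ok and not r.baseline_ok for r in rows),
            "no_difference": sum(r.hybrid_ok == r.baseline_ok for r in rows),
            "worse": sum(r.baseline_ok and not r.hybrid_ok for r in rows),
            "median_extra_ms": median(r.hybrid_ms - r.baseline_ms for r in rows),
        }
    return summary
```

"No difference" counts both-correct and both-wrong questions together; splitting them is worthwhile if the write-up needs to separate "graph added nothing" from "both failed".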
The cases where hybrid is worse

The Module 4 README is explicit: document where hybrid retrieval is worse, not just where it is better. Expect to find: (a) questions where the graph entity expansion adds noise rather than signal — unrelated entities get included, biasing retrieval toward irrelevant chunks; (b) questions where vector retrieval already captures the answer from nearby prose and graph augmentation adds only latency; (c) questions where the ontology is missing the relevant relationship and the graph traversal returns nothing useful. All three are honest, publishable findings.
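The three failure modes above can be tagged per question so the write-up counts them rather than asserting them. A heuristic sketch, assuming each eval record captures how many graph rows came back, which entities were fed to the retrieval filter, and a hand-labelled set of relevant ("gold") entities; these signals and the function are hypothetical.

```python
def triage(graph_fact_count, expanded, gold_entities, baseline_ok, hybrid_ok):
    """Tag one question with failure modes (a)-(c); heuristic and hypothetical.

    graph_fact_count: rows returned by the graph traversal.
    expanded: entity IDs the expansion fed into the retrieval filter.
    gold_entities: entity IDs a human marked as relevant to the question.
    """
    labels = []
    noise = set(expanded) - set(gold_entities)
    if noise and baseline_ok and not hybrid_ok:
        labels.append("a")  # expansion pulled in unrelated entities
    if baseline_ok and hybrid_ok:
        labels.append("b")  # vector-only already answered; graph added latency
    if graph_fact_count == 0:
        labels.append("c")  # ontology lacks the relevant relationship
    return labels
```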

Where the curriculum lands.

The Module 4 README has the full nine-step capstone plan. The most important sequencing note: do the eval before writing the case study. The eval produces your findings; the case study communicates them. Writing the case study first produces marketing copy, not a technical contribution.

The minimum viable capstone

If time pressure forces a scope reduction: (a) deploy Oxigraph on EC2 with the naruto-ontology-1.0.0.ttl loaded and publicly queryable — this is the deployment credential; (b) build a Gradio app that takes a natural-language question, generates SPARQL, runs it, and shows results — this is the Explorer demo; (c) run the eval on 10 questions instead of 20. The case study documents what you built and what the eval showed. That is a complete, publishable capstone even without TwinKit v2.0 integration.

The deployment test

From the Module 4 README: "Is the Naruto KG Explorer deployed and queryable by anyone with a browser?" The URL should be public — not localhost, not behind a VPN, not a screenshot. Anyone should be able to open it and run a SPARQL query or natural-language question. This is the credential test for the whole curriculum.
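The deployment test can be automated from any machine outside your network. A minimal sketch using only the Python standard library, assuming the endpoint accepts SPARQL 1.1 Protocol GET requests with a `query` parameter and returns JSON results; the URL in the usage note is a placeholder, not the real deployment, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ASK_ANY = "ASK { ?s ?p ?o }"


def build_query_request(endpoint, query):
    """SPARQL 1.1 Protocol query via GET, with the query URL-encoded."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def endpoint_is_up(endpoint, timeout=10):
    """True if the endpoint answers ASK { ?s ?p ?o } with true (needs network)."""
    req = build_query_request(endpoint, ASK_ANY)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("boolean") is True
```

Running `endpoint_is_up("https://your-host.example/query")` from a laptop off the VPN checks both halves of the test: a True result means the store is publicly reachable and holds at least one triple.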

The publishable artifacts that come out of the capstone

After shipping: update the LinkedIn About section with the consulting positioning statement from the SYLLABUS closing note. The curriculum is complete. The artifacts are the receipts.

The final reading list.

The paper

Microsoft GraphRAG (Edge et al. 2024)

The foundational paper for this submodule. Read the full paper, not just the abstract — the community summarization approach and the global vs. local query distinction are where the nuance lives.

Working code

GraphRAG GitHub

Microsoft's open-source implementation. The pipeline differs from TwinKit Semantic v2.0 (community graphs vs. ontology-driven traversal) — use it for comparison, not as a template.

Prior reading

Allemang et al. — Ch 14–15

Chapters 14 and 15 cover enterprise modeling and the knowledge graph production landscape. Chapter 15 is the closest thing in the textbook to a deployment guide.

KG + LLM commentary

Bob DuCharme's blog

Recent posts on SPARQL + LLM integration, NL-to-SPARQL, and the practical LLM+KG landscape. The most current practitioner voice in the curriculum's reading list.

Deployment reference

Submodule 4.2 — Deployment guide

The EC2 deployment steps (Exercise 4.1) must be complete before starting the capstone. The Oxigraph endpoint is what the TwinKit Semantic hybrid retriever queries.

Curriculum home

REFLECTIONS.md

The final curriculum artifact. Write the end-of-Module-4 reflection after the capstone ships — what you built, what you learned, what you would do differently. This closes the curriculum.