Primary reading
DuCharme — Ch 5–6
Chapter 5 covers federation in depth including the failure modes. Chapter 6 covers real-world applications — read both before starting Exercise 4.2.
The SERVICE keyword, federation failure modes, triplestore comparison, ontology versioning, and a step-by-step EC2 deployment guide. This is where the knowledge graph goes public.
SPARQL 1.1 defines the SERVICE keyword for federated queries: send part of a query to a remote SPARQL endpoint and join the results with local data. The promise is compelling — your local Naruto graph joins seamlessly with Wikidata's 100 million items. The reality involves latency, rate limits, and fragility that production systems mostly avoid.
The basic pattern:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

# Local data: characters who appear in the Pain's Assault arc
# Wikidata: the arc's Wikipedia article metadata
SELECT ?characterName ?articleTitle WHERE {
  # Local graph pattern
  ?ninja naruto:canonicalName ?characterName ;
         naruto:appearsInArc naruto:PainAssaultArc .

  # Federated pattern: sent to Wikidata's endpoint
  SERVICE <https://query.wikidata.org/sparql> {
    wd:Q26530693 wdt:P18 ?image .   # Naruto series Wikidata item
    BIND("Pain's Assault arc" AS ?articleTitle)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
}
The Fuseki endpoint receives the full query, executes the local pattern against local data, sends the SERVICE subquery to Wikidata, and joins the results. Latency is cumulative — local query time plus Wikidata round-trip time, which can be 2–15 seconds on a cold Wikidata query.
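The join step can be pictured as merging two sets of solutions on their shared variables. The following is a minimal Python sketch of that idea, not a description of how Fuseki implements it; `join_solutions`, `compatible`, and the sample bindings are hypothetical names and data chosen for illustration.

```python
from typing import Dict, List

Solution = Dict[str, str]  # variable name -> bound value


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible if every shared variable has the same value."""
    return all(a[var] == b[var] for var in a.keys() & b.keys())


def join_solutions(local: List[Solution], remote: List[Solution]) -> List[Solution]:
    """Nested-loop join: merge every compatible (local, remote) pair."""
    return [
        {**left, **right}
        for left in local
        for right in remote
        if compatible(left, right)
    ]


# Hypothetical bindings standing in for the two halves of a federated query.
local_rows = [
    {"characterName": "Naruto Uzumaki"},
    {"characterName": "Hinata Hyuga"},
]
remote_rows = [{"articleTitle": "Pain's Assault arc"}]

joined = join_solutions(local_rows, remote_rows)
```

Because the two sides share no variable in this example, every local row pairs with the single remote row. If the remote side produced no rows at all, the join would be empty, which is one reason a failing or empty remote pattern can wipe out otherwise valid local results.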
If the remote endpoint is unavailable or times out, a plain SERVICE clause fails the entire query. SERVICE SILENT instead treats the failure as a single empty solution (one row with no bindings), so the join leaves the local results intact and the rest of the query continues with local data only:
# SILENT: if Wikidata is down, return local results only
SERVICE SILENT <https://query.wikidata.org/sparql> {
  SELECT ?label WHERE {
    wd:Q2009573 rdfs:label ?label .
    FILTER (LANG(?label) = "en")
  }
}
Use SILENT for any remote endpoint you do not control. Wikidata's query service goes down several times per year and rate-limits heavy queries aggressively.
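The effect of SILENT on the overall result can be sketched in a few lines of Python. In the SPARQL 1.1 Federated Query specification, a failed SILENT service call behaves like a single solution with no bindings, which joins with every local row unchanged. The names `service`, `join`, and `wikidata_down` below are hypothetical, and the remote call is simulated rather than sent over the network.

```python
from typing import Callable, Dict, List

Solution = Dict[str, str]


def service(fetch: Callable[[], List[Solution]], silent: bool = False) -> List[Solution]:
    """Run a remote fetch; with silent=True a failure yields one empty solution."""
    try:
        return fetch()
    except Exception:
        if silent:
            return [{}]  # join identity: merges with every local row unchanged
        raise


def join(local: List[Solution], remote: List[Solution]) -> List[Solution]:
    """Merge every pair of rows that agree on their shared variables."""
    return [
        {**left, **right}
        for left in local
        for right in remote
        if all(left[v] == right[v] for v in left.keys() & right.keys())
    ]


def wikidata_down() -> List[Solution]:
    # Hypothetical stand-in for a remote call that times out.
    raise TimeoutError("remote endpoint did not respond")


local_rows = [{"arcName": "Pain's Assault"}]
result = join(local_rows, service(wikidata_down, silent=True))
```

With `silent=True` the local row survives without any remote columns; without it, the `TimeoutError` propagates and the whole query fails, mirroring the difference between SERVICE SILENT and plain SERVICE.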
A query on wdt:P18 (image) that runs today may return nothing next month if the property mapping changes. Federated queries have no schema contract.
Federation earns its cost when: (a) the data is too large to materialize locally, (b) it must be live/real-time, or (c) you do not have the rights to cache it. ESCO is actually a good candidate: the full dataset is large, it is updated by the EU periodically, and you might prefer to hit the live endpoint rather than maintain a stale copy. For the curriculum, the ESCO stubs in resume-001.ttl are good enough for learning; for a production skill-inference pipeline, federated ESCO makes sense.
The Module 4 README names this as a pain point: "no Postgres-equivalent." Choosing a triplestore means accepting specific tradeoffs. Here are the five most relevant for a solo practitioner or small team deploying on EC2:
| Triplestore | Best for | Operations burden | Notable limitation |
|---|---|---|---|
| Oxigraph | Simple deployments, solo ops, RDF-star support | Very low — single Rust binary, no JVM | Smaller community; fewer enterprise features; no SPARQL Update federation |
| Apache Jena Fuseki | Learning, local dev, reference implementation | Medium — requires JVM, more config files | JVM memory footprint; not designed for high-concurrent-write production loads |
| GraphDB (free tier) | OWL reasoning in production, RDF-star, full-text search | Medium — Docker image, proprietary config | Free tier limits; requires registration; proprietary extensions create lock-in |
| Stardog | Enterprise OWL reasoning, data virtualization | Medium — cloud or self-hosted; good tooling | Not free for production; pricing can be high at scale |
| Blazegraph | Long-time backend of the Wikidata Query Service; proven at massive scale | Medium: JVM, complex configuration | No longer actively developed; the Wikimedia Foundation has been evaluating replacement backends |
For Exercise 4.1, the Module 4 README recommends Oxigraph for first deployment: single binary, no JVM, simple systemd service file, RDF-star support out of the box. Fuseki is the fallback if you prefer the reference implementation you have been using throughout the curriculum.
The Module 4 README calls ontology versioning "unsolved." Standard software version semantics (major.minor.patch) do not map cleanly onto ontology changes:
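OWL 2 provides built-in annotation properties for versioning, including `owl:versionInfo`, `owl:priorVersion`, `owl:backwardCompatibleWith`, and `owl:incompatibleWith`: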
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://sensemaking-ai.com/ns/naruto>
    owl:versionInfo "1.0.0" ;
    owl:priorVersion <https://sensemaking-ai.com/ns/naruto/starter> ;
    owl:backwardCompatibleWith <https://sensemaking-ai.com/ns/naruto/starter> .

# Declare incompatibility explicitly when breaking changes are made
# owl:incompatibleWith <...prior-version>
The practical recommendation: use `owl:versionInfo` for display purposes and commit-level version tracking. Treat the Git history as your actual version control — every commit is a point in time you can diff and restore. OWL versioning annotations are supplementary metadata, not a substitute for version control on the file.
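One way to keep `owl:versionInfo` from drifting away from the file it lives in is a small consistency check that could run before a commit. This is a minimal sketch, assuming the filename convention `naruto-ontology-<version>.ttl` used elsewhere in this module and a literal `owl:versionInfo "..."` triple in the file. The regular expressions are a simplification and not a Turtle parser, and the function names are hypothetical.

```python
import re
from typing import Optional

# Simplified patterns: they assume a plain string literal and a semver-style filename.
VERSION_INFO = re.compile(r'owl:versionInfo\s+"([^"]+)"')
FILENAME_VERSION = re.compile(r"-(\d+\.\d+\.\d+)\.ttl$")


def declared_version(turtle_text: str) -> Optional[str]:
    """Return the first owl:versionInfo literal found in the text, if any."""
    match = VERSION_INFO.search(turtle_text)
    return match.group(1) if match else None


def filename_version(filename: str) -> Optional[str]:
    """Return the major.minor.patch version embedded in the filename, if any."""
    match = FILENAME_VERSION.search(filename)
    return match.group(1) if match else None


def versions_agree(filename: str, turtle_text: str) -> bool:
    """True only when the file declares a version and it matches the filename."""
    declared = declared_version(turtle_text)
    return declared is not None and declared == filename_version(filename)


sample = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<https://sensemaking-ai.com/ns/naruto> owl:versionInfo "1.0.0" .
"""
ok = versions_agree("naruto-ontology-1.0.0.ttl", sample)
```

The check says nothing about whether a change is breaking; it only catches the case where the annotation and the released filename disagree, leaving Git history as the real record of what changed.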
These steps assume you already have an EC2 instance with nginx and systemd, following the same pattern as your existing services.
SSH into your EC2 instance. Download the latest Oxigraph release binary for Linux x86_64 from the GitHub releases page. As of mid-2026, the binary is named oxigraph.
# Asset names can differ between releases; confirm the exact filename on the releases page
curl -LO https://github.com/oxigraph/oxigraph/releases/latest/download/oxigraph_linux_x86_64
chmod +x oxigraph_linux_x86_64
sudo mv oxigraph_linux_x86_64 /usr/local/bin/oxigraph
oxigraph --version
Verify the binary runs. Create the data directory:
sudo mkdir -p /var/lib/oxigraph/naruto
sudo chown www-data:www-data /var/lib/oxigraph/naruto
Model after your existing service files. Create /etc/systemd/system/oxigraph.service:
[Unit]
Description=Oxigraph SPARQL endpoint - Naruto KG
After=network.target

[Service]
Type=simple
User=www-data
ExecStart=/usr/local/bin/oxigraph serve \
    --location /var/lib/oxigraph/naruto \
    --bind 127.0.0.1:7878
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable oxigraph
sudo systemctl start oxigraph
sudo systemctl status oxigraph
Verify Oxigraph is listening: curl http://127.0.0.1:7878/ should return an HTML page.
Create /etc/nginx/sites-available/sparql.barbhs.com. The CORS headers are required for browser-side SPARQL queries from the Naruto KG Explorer:
server {
    server_name sparql.barbhs.com;

    location / {
        proxy_pass http://127.0.0.1:7878;
        proxy_set_header Host $host;

        # CORS for browser SPARQL clients
        add_header 'Access-Control-Allow-Origin' '*' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
        add_header 'Access-Control-Allow-Headers' 'Accept, Content-Type, Authorization' always;

        if ($request_method = OPTIONS) {
            # add_header is not inherited by a block that sets its own headers,
            # so the CORS headers are repeated for the preflight response
            add_header 'Access-Control-Allow-Origin' '*' always;
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
            add_header 'Access-Control-Allow-Headers' 'Accept, Content-Type, Authorization' always;
            add_header 'Access-Control-Max-Age' 1728000;
            add_header 'Content-Type' 'text/plain; charset=UTF-8';
            add_header 'Content-Length' 0;
            return 204;
        }
    }

    # SSL: certbot will add this block automatically
}

sudo ln -s /etc/nginx/sites-available/sparql.barbhs.com \
    /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d sparql.barbhs.com
Upload the TTL file to the EC2 instance and load it via the Oxigraph HTTP API or CLI:
scp modules/02-modeling/artifacts/naruto-ontology/naruto-ontology-1.0.0.ttl \
ec2-user@your-instance:/tmp/
# Load via HTTP into the default graph (Graph Store Protocol "default" parameter)
curl -X PUT "https://sparql.barbhs.com/store?default" \
-H "Content-Type: text/turtle" \
--data-binary @/tmp/naruto-ontology-1.0.0.ttl
# Verify with a test query
curl "https://sparql.barbhs.com/query?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10" \
-H "Accept: application/sparql-results+json"
If the query returns results, the store is loaded and queryable.
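The same verification can be scripted. The sketch below builds the request URL with the standard library and counts rows in a SPARQL 1.1 JSON results document. It makes no network call; `query_url`, `row_count`, and the sample response body are hypothetical and only illustrate the shape of the data.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://sparql.barbhs.com/query"  # query endpoint from the step above


def query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """URL-encode a SPARQL query into a GET request URL."""
    return f"{endpoint}?{urlencode({'query': sparql})}"


def row_count(results_json: str) -> int:
    """Count the bindings in a SPARQL 1.1 JSON results document."""
    return len(json.loads(results_json)["results"]["bindings"])


url = query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 10")

# Hypothetical response body, shaped like the SPARQL 1.1 Query Results JSON Format.
sample_response = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {
            "s": {"type": "uri", "value": "https://sensemaking-ai.com/ns/naruto#PainAssaultArc"},
            "p": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
            "o": {"type": "uri", "value": "https://sensemaking-ai.com/ns/naruto#Arc"},
        },
    ]},
})
rows = row_count(sample_response)
```

A row count of zero on a `SELECT * ... LIMIT 10` query is the signal that the endpoint is reachable but the load step did not take effect.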
In your Uptime Kuma instance, add a new HTTP monitor for https://sparql.barbhs.com/query?query=ASK%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D. The ASK query returns a valid SPARQL response (200 OK with true) whenever the endpoint is up and data is loaded. This is a better health check than just pinging the root path.
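The logic behind this health check can be sketched as a pure function: the endpoint counts as healthy only when it answers 200 and the ASK result is `true`. This is a minimal illustration, assuming the response is requested as SPARQL JSON results (where an ASK answer looks like `{"head": {}, "boolean": true}`); `health_url` and `is_healthy` are hypothetical names, not part of Uptime Kuma or Oxigraph.

```python
import json
from urllib.parse import quote


def health_url(endpoint: str) -> str:
    """Build the ASK health-check URL with percent-encoded spaces and braces."""
    return f"{endpoint}?query={quote('ASK { ?s ?p ?o }', safe='')}"


def is_healthy(status: int, body: str) -> bool:
    """Healthy means HTTP 200 and an ASK result of true (endpoint up, data loaded)."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("boolean") is True
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


url = health_url("https://sparql.barbhs.com/query")
```

An empty store answers 200 with `false`, so checking the boolean and not just the status code is what distinguishes "up" from "up and loaded".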
Create modules/04-shipping/docs/triplestore-deployment.md documenting: Oxigraph version installed, systemd config decisions, nginx config, how to reload data, how to back up the data directory, and any EC2-specific surprises. This becomes the infrastructure case study artifact.
Run these from your local Fuseki against the public Wikidata endpoint. They demonstrate the federation patterns for Exercise 4.2 and directly expose the failure modes described in Section 2.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

# Local: which arcs are in the Naruto ontology?
# Remote: look up the Naruto series item on Wikidata for metadata
SELECT ?arcName ?wikidataLabel WHERE {
  ?arc a naruto:Arc ;
       schema:name ?arcName .
  FILTER (LANG(?arcName) = "en")

  SERVICE SILENT <https://query.wikidata.org/sparql> {
    SELECT ?wikidataLabel WHERE {
      wd:Q2009573 rdfs:label ?wikidataLabel .
      FILTER (LANG(?wikidataLabel) = "en")
    }
  }
}
ORDER BY ?arcName
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

# Exercise 4.2 specifies using your father's published patents
# as an example of real federated data retrieval.
# Replace Q_ID with the actual Wikidata ID for the person.
# Search at wikidata.org to find the Q-number first.
SELECT ?patent ?patentLabel ?date WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?patent wdt:P31 wd:Q253623 ;                      # instance of: patent
            wdt:P61 wd:REPLACE_WITH_INVENTOR_Q_ID .   # P61: discoverer or inventor (P178 is "developer")
    OPTIONAL { ?patent wdt:P577 ?date . }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
}
ORDER BY DESC(?date)
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <https://schema.org/>

# Query Wikidata for items tagged as Naruto characters.
# This is what a nightly materialization job would run to
# prefetch metadata rather than federating at query time.
SELECT ?character ?characterLabel ?gender ?voiceActor WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?character wdt:P31 wd:Q15711870 .    # instance of: fictional character
    ?character wdt:P1080 wd:Q2009573 .   # from fictional universe: Naruto
    OPTIONAL { ?character wdt:P21 ?gender . }
    OPTIONAL { ?character wdt:P725 ?voiceActor . }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
}
ORDER BY ?characterLabel
LIMIT 20
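For the nightly materialization idea mentioned in the last query, the remaining work after fetching results is turning them into triples that can be loaded locally. The following is a rough Python sketch of that conversion, assuming results arrive in the SPARQL 1.1 JSON format. The `PREDICATES` mapping, the `to_ntriples` function, and the placeholder entity IRIs are all hypothetical, and only IRI-valued bindings are handled.

```python
from typing import Dict, Iterable, List

Binding = Dict[str, Dict[str, str]]  # SPARQL JSON binding: var -> {"type", "value"}

# Hypothetical mapping from result variables to the properties used in the query.
PREDICATES = {
    "gender": "http://www.wikidata.org/prop/direct/P21",
    "voiceActor": "http://www.wikidata.org/prop/direct/P725",
}


def to_ntriples(bindings: Iterable[Binding], subject_var: str = "character") -> List[str]:
    """Emit one N-Triples line per bound IRI value; unbound OPTIONALs are skipped."""
    lines = []
    for row in bindings:
        subject = row[subject_var]["value"]
        for var, predicate in PREDICATES.items():
            term = row.get(var)
            if term is not None and term["type"] == "uri":
                lines.append(f"<{subject}> <{predicate}> <{term['value']}> .")
    return lines


# Placeholder rows; the entity IRIs are illustrative, not real Wikidata items.
rows = [
    {
        "character": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_CHARACTER_A"},
        "gender": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_GENDER"},
    },
    {
        "character": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_CHARACTER_B"},
    },
]
triples = to_ntriples(rows)
```

The resulting lines could be written to a file and loaded into the local store, so that queries at request time never depend on Wikidata being reachable.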
The recommended deployment triplestore. Check the releases page for the latest Linux binary. The README covers the HTTP API — this is how you load data, run queries, and manage the store from the CLI.
The full SERVICE keyword specification. Section 4 covers SERVICE SILENT semantics. Read before Exercise 4.2 to understand the behavior you are observing.
The UPDATE operations needed to manage data lifecycle in your deployed triplestore. The INSERT DATA and COPY operations from 4.1 are what you will use to load and refresh data after deployment.
The capstone: TwinKit Semantic v2.0, hybrid retrieval, the Naruto KG Explorer, and the honest evaluation framework. Requires the deployment from Exercise 4.1 to be complete first.
The artifact for Exercise 4.1: document every decision made during the EC2 deployment. This becomes the infrastructure case study and is part of the Module 4 README checklist.