Sensemaking AI · Sensemaking Semantic Web
Submodule 4.2 · Shipping

Federation and deployment.

The SERVICE keyword, federation failure modes, triplestore comparison, ontology versioning, and a step-by-step EC2 deployment guide. This is where the knowledge graph goes public.

Module 4 · Weeks 10–12
Exercise 4.1: EC2 deployment
Exercise 4.2: Federation experiment
SERVICE · Oxigraph · nginx · CORS

One query, two endpoints.

SPARQL 1.1 defines the SERVICE keyword for federated queries: send part of a query to a remote SPARQL endpoint and join the results with local data. The promise is compelling — your local Naruto graph joins seamlessly with Wikidata's 100 million items. The reality involves latency, rate limits, and fragility that production systems mostly avoid.

The basic pattern:

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:  <http://www.bigdata.com/rdf#>
PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>

# Local data: characters who appear in the Pain's Assault arc
# Wikidata: the arc's Wikipedia article metadata
SELECT ?characterName ?articleTitle WHERE {
  # Local graph pattern
  ?ninja naruto:canonicalName ?characterName ;
         naruto:appearsInArc   naruto:PainAssaultArc .

  # Federated pattern — sent to Wikidata's endpoint
  SERVICE <https://query.wikidata.org/sparql> {
    wd:Q26530693 wdt:P18 ?image .  # Naruto series Wikidata item
    BIND("Pain's Assault arc" AS ?articleTitle)
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "en" .
    }
  }
}

The Fuseki endpoint receives the full query, executes the local pattern against local data, sends the SERVICE subquery to Wikidata, and joins the results. Latency is cumulative — local query time plus Wikidata round-trip time, which can be 2–15 seconds on a cold Wikidata query.
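
One way to see the cumulative latency is to time the same query with and without its SERVICE block. The sketch below uses only the Python standard library. The endpoint URL (a Fuseki dataset named "naruto" on the default port) and the helper names `build_request` and `timed_select` are assumptions for illustration, not part of the curriculum.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed local Fuseki query endpoint; the dataset name "naruto" is hypothetical.
LOCAL_ENDPOINT = "http://localhost:3030/naruto/sparql"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def timed_select(endpoint: str, query: str, timeout: float = 60.0):
    """Run a SELECT query and return (row_count, elapsed_seconds)."""
    request = build_request(endpoint, query)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    elapsed = time.perf_counter() - start
    return len(results["results"]["bindings"]), elapsed
```

Calling `timed_select(LOCAL_ENDPOINT, query)` once with the SERVICE block and once without gives a rough estimate of how much of the total is the remote round trip.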

The SILENT modifier

If the remote endpoint is unavailable or times out, a plain SERVICE clause fails the entire query. With SERVICE SILENT, a failed remote call is ignored: the clause evaluates to a single solution with no bindings, so the rest of the query continues with the local data alone:

# SILENT: if Wikidata is down, return local results only
SERVICE SILENT <https://query.wikidata.org/sparql> {
  SELECT ?label WHERE {
    wd:Q2009573 rdfs:label ?label .
    FILTER (LANG(?label) = "en")
  }
}

Use SILENT for any remote endpoint you do not control. Wikidata's query service goes down several times per year and rate-limits heavy queries aggressively.
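
The effect of SILENT on the join can be modelled in a few lines. This is a toy model of the semantics in the SPARQL 1.1 Federated Query specification, not how Fuseki or Oxigraph implement it. The names `join`, `evaluate_service`, and `call_remote` are hypothetical. The key point is that a failed SILENT call yields one empty solution, which joins with every local row, whereas zero rows would wipe out the local results.

```python
from typing import Callable, Dict, List

Solution = Dict[str, str]


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Join two solution sequences on the variables they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


def evaluate_service(
    call_remote: Callable[[], List[Solution]], silent: bool
) -> List[Solution]:
    """Evaluate a SERVICE clause; call_remote stands in for the HTTP request."""
    try:
        return call_remote()
    except Exception:
        if not silent:
            raise  # plain SERVICE: the whole query fails
        return [{}]  # SERVICE SILENT: one solution with no bindings
```

With `silent=True` and a failing remote call, `join(local_rows, evaluate_service(...))` returns the local rows unchanged, which is the behavior to expect when Wikidata is down.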

Why production systems mostly avoid it.

When federation is the right choice

Federation earns its cost when: (a) the data is too large to materialize locally, (b) it must be live/real-time, or (c) you do not have the rights to cache it. ESCO is actually a good candidate — the full dataset is large, it is updated by the EU periodically, and you might prefer to hit the live endpoint rather than maintain a stale copy. For the curriculum, the ESCO stubs in resume-001.ttl are good enough for learning; for a production skill-inference pipeline, federated ESCO makes sense.

Five viable options, each with rough edges.

The Module 4 README names this as a pain point: "no Postgres-equivalent." Choosing a triplestore means accepting specific tradeoffs. Here are the five most relevant for a solo practitioner or small team deploying on EC2:

| Triplestore | Best for | Operations burden | Notable limitation |
|---|---|---|---|
| Oxigraph | Simple deployments, solo ops, RDF-star support | Very low: single Rust binary, no JVM | Smaller community; fewer enterprise features; no SPARQL Update federation |
| Apache Jena Fuseki | Learning, local dev, reference implementation | Medium: requires JVM, more config files | JVM memory footprint; not designed for high-concurrent-write production loads |
| GraphDB (free tier) | OWL reasoning in production, RDF-star, full-text search | Medium: Docker image, proprietary config | Free tier limits; requires registration; proprietary extensions create lock-in |
| Stardog | Enterprise OWL reasoning, data virtualization | Medium: cloud or self-hosted; good tooling | Not free for production; pricing can be high at scale |
| Blazegraph | Wikidata's stack; proven at massive scale | Medium: JVM, complex configuration | No longer actively developed; Wikimedia has been evaluating replacement backends for the Wikidata Query Service |

For Exercise 4.1, the Module 4 README recommends Oxigraph for first deployment: single binary, no JVM, simple systemd service file, RDF-star support out of the box. Fuseki is the fallback if you prefer the reference implementation you have been using throughout the curriculum.

Semver doesn't quite map.

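The Module 4 README calls ontology versioning "unsolved." Standard software version semantics (major.minor.patch) do not map cleanly onto ontology changes. Adding a single axiom can change what a reasoner infers without breaking any existing query, so it is unclear whether that counts as a patch, a minor release, or a breaking change.

OWL 2 provides annotation properties for versioning. Three appear in the example below, and a fourth, owl:incompatibleWith, is shown in the closing comment: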

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<https://sensemaking-ai.com/ns/naruto>
    owl:versionInfo           "1.0.0" ;
    owl:priorVersion          <https://sensemaking-ai.com/ns/naruto/starter> ;
    owl:backwardCompatibleWith <https://sensemaking-ai.com/ns/naruto/starter> .

# Declare incompatibility explicitly when breaking changes are made
# owl:incompatibleWith <...prior-version>

The practical recommendation: use `owl:versionInfo` for display purposes and commit-level version tracking. Treat the Git history as your actual version control — every commit is a point in time you can diff and restore. OWL versioning annotations are supplementary metadata, not a substitute for version control on the file.
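
A line-level Git diff of a Turtle file is sensitive to formatting, so a triple-level diff can be a useful complement. The sketch below assumes both versions have already been converted to N-Triples (one triple per line) and contain no blank nodes, whose labels are not stable across serializations. The function names are hypothetical.

```python
def load_triples(ntriples_text: str) -> set:
    """Return the set of non-empty, non-comment N-Triples lines."""
    return {
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def diff_triples(old_text: str, new_text: str) -> dict:
    """Compare two N-Triples documents as sets of triples."""
    old, new = load_triples(old_text), load_triples(new_text)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

Removed triples are the ones most likely to break existing queries, so they are a reasonable first thing to review before bumping `owl:versionInfo`.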

Add a triplestore to your existing EC2 infrastructure.

These steps assume you already have an EC2 instance with nginx and systemd, following the same pattern as your existing services.

Oxigraph deployment steps

1 Install Oxigraph on the EC2 instance

SSH into your EC2 instance. Download the latest Oxigraph release binary for Linux x86_64 from the GitHub releases page. As of mid-2026, the binary is named oxigraph.

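# Release asset names usually include the version and target, so copy the
# exact Linux x86_64 file name from the releases page before running this.
ASSET=oxigraph_linux_x86_64   # placeholder: replace with the exact asset name
curl -fLO "https://github.com/oxigraph/oxigraph/releases/latest/download/${ASSET}"
chmod +x "${ASSET}"
sudo mv "${ASSET}" /usr/local/bin/oxigraph
oxigraph --version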

Verify the binary runs. Create the data directory:

sudo mkdir -p /var/lib/oxigraph/naruto
sudo chown www-data:www-data /var/lib/oxigraph/naruto

2 Create the systemd service file

Model after your existing service files. Create /etc/systemd/system/oxigraph.service:

[Unit]
Description=Oxigraph SPARQL endpoint — Naruto KG
After=network.target

[Service]
Type=simple
User=www-data
ExecStart=/usr/local/bin/oxigraph serve \
  --location /var/lib/oxigraph/naruto \
  --bind 127.0.0.1:7878
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable oxigraph
sudo systemctl start oxigraph
sudo systemctl status oxigraph

Verify Oxigraph is listening: curl http://127.0.0.1:7878/ should return an HTML page.

3 Configure nginx reverse proxy with CORS

Create /etc/nginx/sites-available/sparql.barbhs.com. The CORS headers are required for browser-side SPARQL queries from the Naruto KG Explorer:

server {
    server_name sparql.barbhs.com;

    location / {
        proxy_pass http://127.0.0.1:7878;
        proxy_set_header Host $host;

        # CORS for browser SPARQL clients
        add_header 'Access-Control-Allow-Origin' '*' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
        add_header 'Access-Control-Allow-Headers'
            'Accept, Content-Type, Authorization' always;

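        # Preflight: add_header is not inherited into a block that sets its
        # own headers, so the CORS headers are repeated here.
        if ($request_method = OPTIONS) {
            add_header 'Access-Control-Allow-Origin' '*' always;
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
            add_header 'Access-Control-Allow-Headers'
                'Accept, Content-Type, Authorization' always;
            add_header 'Access-Control-Max-Age' 1728000;
            add_header 'Content-Type' 'text/plain; charset=UTF-8';
            add_header 'Content-Length' 0;
            return 204;
        }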
    }

    # SSL — certbot will add this block automatically
}

Enable the site, test the configuration, reload nginx, and request a certificate:

sudo ln -s /etc/nginx/sites-available/sparql.barbhs.com \
           /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d sparql.barbhs.com
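
Because nginx does not inherit add_header directives into a block that declares its own, it is worth confirming that the preflight (OPTIONS) response actually carries the CORS headers. The following is a minimal sketch using the standard library. The helper names are hypothetical, and the `origin` argument stands in for whatever origin the Naruto KG Explorer is served from.

```python
import urllib.request

REQUIRED_PREFLIGHT_HEADERS = (
    "access-control-allow-origin",
    "access-control-allow-methods",
    "access-control-allow-headers",
)


def fetch_preflight_headers(url: str, origin: str) -> dict:
    """Send a CORS preflight request and return the response headers."""
    request = urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": "POST",
            "Access-Control-Request-Headers": "content-type",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


def missing_cors_headers(headers: dict) -> list:
    """List required CORS headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_PREFLIGHT_HEADERS if name not in present]
```

An empty list from `missing_cors_headers(fetch_preflight_headers(url, origin))` suggests a browser preflight should pass. A non-empty list names the headers to look for in the nginx config.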

4 Load the Naruto ontology into the deployed store

Upload the TTL file to the EC2 instance and load it via the Oxigraph HTTP API or CLI:

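# Use the login user for your AMI (ubuntu on Ubuntu, ec2-user on Amazon Linux)
scp modules/02-modeling/artifacts/naruto-ontology/naruto-ontology-1.0.0.ttl \
    ec2-user@your-instance:/tmp/

# Load via HTTP into the default graph (Graph Store Protocol); -f fails on HTTP errors
curl -f -X PUT "https://sparql.barbhs.com/store?default" \
  -H "Content-Type: text/turtle" \
  --data-binary @/tmp/naruto-ontology-1.0.0.ttl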

# Verify with a test query
curl "https://sparql.barbhs.com/query?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10" \
  -H "Accept: application/sparql-results+json"

If the query returns results, the store is loaded and queryable.
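
The percent-encoded query string in the test command does not need to be written by hand. As a small sketch, the standard library's `urlencode` produces the same encoding. The helper name `query_url` is hypothetical.

```python
import urllib.parse

QUERY_ENDPOINT = "https://sparql.barbhs.com/query"


def query_url(endpoint: str, sparql: str) -> str:
    """Build a SPARQL Protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})


print(query_url(QUERY_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10"))
# prints https://sparql.barbhs.com/query?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10
```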

5 Add to Uptime Kuma and document the deployment

In your Uptime Kuma instance, add a new HTTP keyword monitor for https://sparql.barbhs.com/query?query=ASK%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D with the keyword true. The ASK query returns 200 OK whenever the endpoint is up, and the result is true only when data is loaded, so matching on the keyword catches an empty store as well as an outage. This is a better health check than just pinging the root path.
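
The same check can be expressed in code, for example in a deployment smoke test. This sketch assumes the response was requested as application/sparql-results+json. The function name `is_healthy` is hypothetical.

```python
import json


def is_healthy(status_code: int, body: str) -> bool:
    """True only if the endpoint answered 200 and the ASK result is true."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("boolean") is True
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not a SPARQL results object
        return False
```

A 200 response with `"boolean": false` means the endpoint is up but the store is empty, which a status-code-only check would miss.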

Create modules/04-shipping/docs/triplestore-deployment.md documenting: Oxigraph version installed, systemd config decisions, nginx config, how to reload data, how to back up the data directory, and any EC2-specific surprises. This becomes the infrastructure case study artifact.

Three queries — Exercise 4.2 patterns.

Run these from your local Fuseki against the public Wikidata endpoint. They demonstrate the federation patterns for Exercise 4.2 and directly expose the failure modes described in Section 2.

f01
Join local Naruto arc data with Wikidata's article for the Naruto series.
Pattern: SERVICE federation · local + remote join · latency measurement
PREFIX wd:      <http://www.wikidata.org/entity/>
PREFIX wdt:     <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:      <http://www.bigdata.com/rdf#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:  <https://schema.org/>
PREFIX naruto:  <https://sensemaking-ai.com/ns/naruto#>

# Local: which arcs are in the Naruto ontology?
# Remote: look up the Naruto series item on Wikidata for metadata
SELECT ?arcName ?wikidataLabel WHERE {
  ?arc a naruto:Arc ;
       schema:name ?arcName .
  FILTER (LANG(?arcName) = "en")

  SERVICE SILENT <https://query.wikidata.org/sparql> {
    SELECT ?wikidataLabel WHERE {
      wd:Q2009573 rdfs:label ?wikidataLabel .
      FILTER (LANG(?wikidataLabel) = "en")
    }
  }
}
ORDER BY ?arcName

f02
Retrieve your father's published patents from Wikidata and join with local portfolio data.
Pattern: real-world federation · Wikidata as external data source · the Exercise 4.2 example
PREFIX wd:      <http://www.wikidata.org/entity/>
PREFIX wdt:     <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:      <http://www.bigdata.com/rdf#>

# Exercise 4.2 specifies using your father's published patents
# as an example of real federated data retrieval.
# Replace REPLACE_WITH_INVENTOR_Q_ID with the actual Wikidata ID for the person.
# Search at wikidata.org to find the Q-number first.

SELECT ?patent ?patentLabel ?date WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?patent wdt:P31 wd:Q253623 ;  # instance of: patent
            wdt:P61 wd:REPLACE_WITH_INVENTOR_Q_ID .  # discoverer or inventor
    OPTIONAL { ?patent wdt:P577 ?date . }
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "en" .
    }
  }
}
ORDER BY DESC(?date)
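
If the query is kept as a template and filled in by a script, it helps to validate the Q-number before substituting it, so a typo does not produce a query that silently matches nothing. A minimal sketch follows. The function name is hypothetical, and the placeholder string is the one used in the query above.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")
PLACEHOLDER = "REPLACE_WITH_INVENTOR_Q_ID"


def fill_inventor(query_template: str, qid: str) -> str:
    """Substitute a validated Wikidata item ID for the placeholder."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if PLACEHOLDER not in query_template:
        raise ValueError("placeholder not found in query template")
    return query_template.replace(PLACEHOLDER, qid)
```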

f03
Simulate what a materialization refresh job would retrieve — all Naruto character Wikidata items.
Pattern: data materialization planning · what to prefetch to avoid runtime federation
PREFIX wd:      <http://www.wikidata.org/entity/>
PREFIX wdt:     <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:      <http://www.bigdata.com/rdf#>
PREFIX schema:  <https://schema.org/>

# Query Wikidata for items tagged as Naruto characters.
# This is what a nightly materialization job would run to
# prefetch metadata rather than federating at query time.
SELECT ?character ?characterLabel ?gender ?voiceActor WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?character wdt:P31    wd:Q15711870 .  # instance of: animated character
    ?character wdt:P1080  wd:Q2009573 .   # from fictional universe: Naruto
    OPTIONAL { ?character wdt:P21 ?gender . }
    OPTIONAL { ?character wdt:P725 ?voiceActor . }
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "en" .
    }
  }
}
ORDER BY ?characterLabel
LIMIT 20
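
A materialization job would then turn the query results into triples that can be loaded into the local store. The sketch below converts a SPARQL JSON results document into N-Triples lines. The function names and the variable-to-predicate mapping are hypothetical, and choosing which predicates to store locally is left to the pipeline.

```python
def term_to_ntriples(term: dict) -> str:
    """Serialize one SPARQL JSON results term as an N-Triples term."""
    if term["type"] == "uri":
        return f"<{term['value']}>"
    if term["type"] == "bnode":
        return f"_:{term['value']}"
    escaped = (
        term["value"]
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if "xml:lang" in term:
        return f'"{escaped}"@{term["xml:lang"]}'
    if "datatype" in term:
        return f'"{escaped}"^^<{term["datatype"]}>'
    return f'"{escaped}"'


def bindings_to_ntriples(results: dict, subject_var: str, predicates: dict) -> list:
    """Emit one triple per bound variable listed in predicates."""
    lines = []
    for row in results["results"]["bindings"]:
        subject = term_to_ntriples(row[subject_var])
        for var, predicate_iri in predicates.items():
            if var in row:  # OPTIONAL variables may be unbound
                obj = term_to_ntriples(row[var])
                lines.append(f"{subject} <{predicate_iri}> {obj} .")
    return lines
```

For the query above, `subject_var` would be `"character"` and `predicates` would map names such as `"characterLabel"` and `"gender"` to the predicate IRIs you want in the local graph.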

Reading, tools, and next steps.

Primary reading

DuCharme — Ch 5–6

Chapter 5 covers federation in depth including the failure modes. Chapter 6 covers real-world applications — read both before starting Exercise 4.2.

Triplestore

Oxigraph GitHub

The recommended deployment triplestore. Check the releases page for the latest Linux binary. The README covers the HTTP API — this is how you load data, run queries, and manage the store from the CLI.

Federation spec

W3C SPARQL 1.1 Federated Query

The full SERVICE keyword specification. Section 2.3 (Service Execution Failure) covers SERVICE SILENT. Read before Exercise 4.2 to understand the behavior you are observing.

Prior submodule

Submodule 4.1 — SPARQL UPDATE

The UPDATE operations needed to manage data lifecycle in your deployed triplestore. The INSERT DATA and COPY operations from 4.1 are what you will use to load and refresh data after deployment.

Next submodule

Submodule 4.3 — LLM + KG integration

The capstone: TwinKit Semantic v2.0, hybrid retrieval, the Naruto KG Explorer, and the honest evaluation framework. Requires the deployment from Exercise 4.1 to be complete first.

Deployment docs

triplestore-deployment.md (you write this — Exercise 4.1 deliverable)

The artifact for Exercise 4.1: document every decision made during the EC2 deployment. This becomes the infrastructure case study and is part of the Module 4 README checklist.