Submodule 3.3 — SHACL and vocabulary alignment

01 · SHACL in production

Going deeper than the Module 2 introduction.

Submodule 2.3 introduced SHACL as the closed-world answer to OWL's open-world assumption. This submodule goes deeper: the full constraint component vocabulary, SPARQL-based custom constraints, severity levels, and how SHACL fits into a production data pipeline.

The SHACL data model

SHACL defines two kinds of shapes:

sh:NodeShape: targets specific nodes in the graph (via sh:targetClass, sh:targetNode, sh:targetSubjectsOf, or sh:targetObjectsOf) and applies property constraints to them.
sh:PropertyShape: constrains the values reached through a path (sh:path). It can be written inline as a blank node, as in the example below, or given an IRI so that several NodeShapes can reference it via sh:property.

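# Prefixes used by the shape examples in this submodule
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix naruto: <https://sensemaking-ai.com/ns/naruto#> .

# A NodeShape targeting all naruto:Ninja nodes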
naruto:NinjaShape
    a sh:NodeShape ;
    sh:targetClass naruto:Ninja ;
    sh:property [
        sh:path     naruto:canonicalName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:message  "A Ninja must have exactly one canonicalName."@en ;
        sh:severity sh:Violation
    ] ;
    sh:property [
        sh:path     naruto:memberOfVillage ;
        sh:class    naruto:Village ;
        sh:minCount 1 ;
        sh:message  "A Ninja must belong to at least one Village."@en
    ] .
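The sketch below is a minimal Python model of what a validator does with the two property shapes above. It collects focus nodes via the target class, then counts and type-checks the values on each path. It rests on several simplifying assumptions:

- Triples are plain (subject, predicate, object) tuples rather than an RDF library's graph.
- `PropertyConstraint` and `validate` are hypothetical names.
- Python types stand in for XSD datatypes.
- sh:class is not checked.
- The sample ninjas are invented.

```python
from dataclasses import dataclass
from typing import Optional

RDF_TYPE = "rdf:type"


@dataclass
class PropertyConstraint:
    """Hypothetical, simplified stand-in for one SHACL property shape."""
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    value_type: Optional[type] = None  # crude stand-in for sh:datatype
    message: str = ""
    severity: str = "sh:Violation"  # SHACL's default when sh:severity is omitted


def validate(triples, target_class, constraints):
    """Return one result per (focus node, failed constraint), report-style."""
    # sh:targetClass: every subject typed with the target class is a focus node.
    focus_nodes = sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    )
    results = []
    for node in focus_nodes:
        for c in constraints:
            values = [o for s, p, o in triples if s == node and p == c.path]
            too_few = len(values) < c.min_count
            too_many = c.max_count is not None and len(values) > c.max_count
            wrong_type = c.value_type is not None and any(
                not isinstance(v, c.value_type) for v in values
            )
            if too_few or too_many or wrong_type:
                results.append({"focusNode": node, "resultPath": c.path,
                                "severity": c.severity, "message": c.message})
    return results


ninja_shape = [
    PropertyConstraint("naruto:canonicalName", min_count=1, max_count=1,
                       value_type=str,
                       message="A Ninja must have exactly one canonicalName."),
    PropertyConstraint("naruto:memberOfVillage", min_count=1,
                       message="A Ninja must belong to at least one Village."),
]

# Hypothetical sample data: one complete Ninja, one missing both properties.
triples = [
    ("naruto:Naruto", RDF_TYPE, "naruto:Ninja"),
    ("naruto:Naruto", "naruto:canonicalName", "Naruto Uzumaki"),
    ("naruto:Naruto", "naruto:memberOfVillage", "naruto:Konoha"),
    ("naruto:Unnamed", RDF_TYPE, "naruto:Ninja"),
]

report = validate(triples, "naruto:Ninja", ninja_shape)
conforms = not report  # sh:conforms is true only when there are no results
for r in report:
    print(r["focusNode"], r["resultPath"], r["message"])
```

The `conforms` flag follows the rule that a report conforms only when it contains no results. A real validator also emits the report itself as RDF.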

Key constraint components

Component	Constrains	Example
`sh:minCount` / `sh:maxCount`	Number of values	Exactly one canonicalName
`sh:datatype`	Literal type	canonicalName must be xsd:string
`sh:class`	Object type	memberOfVillage value must be a naruto:Village
`sh:pattern`	Regex on literals	villageCode matches "^[A-Z]{2}$" (anchored; an unanchored pattern matches substrings)
`sh:in`	Allowed values	confidence in {canonical, disputed, fan-theory}
`sh:hasValue`	Required specific value	a node must be of rdf:type naruto:Ninja
`sh:node`	Reference another shape	employer must conform to OrganizationShape
`sh:sparql`	Custom SPARQL constraint	No circular senseiOf chains
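Two rows of the table hide a subtlety worth seeing in code. sh:pattern is evaluated with SPARQL REGEX semantics, where a match anywhere in the string is enough. An unanchored "[A-Z]{2}" therefore accepts any value that merely contains two capitals.

The sketch below approximates this with Python's re module. Its regex dialect differs from XPath regex in places, though not for the character class used here. `pattern_ok` and `in_ok` are hypothetical helper names.

```python
import re


def pattern_ok(value: str, regex: str) -> bool:
    # sh:pattern uses SPARQL REGEX semantics: a match anywhere in the string
    # counts, so re.search (not re.fullmatch) is the closer Python analogue.
    return re.search(regex, value) is not None


def in_ok(value, allowed) -> bool:
    # sh:in: the value must be one of the listed members.
    return value in allowed


CONFIDENCE = {"canonical", "disputed", "fan-theory"}

print(pattern_ok("xKOy", "[A-Z]{2}"))    # True: contains two capitals
print(pattern_ok("xKOy", "^[A-Z]{2}$"))  # False: anchored, whole string must match
print(pattern_ok("KO", "^[A-Z]{2}$"))    # True
print(in_ok("disputed", CONFIDENCE))     # True
print(in_ok("rumour", CONFIDENCE))       # False
```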

Severity levels

SHACL has three built-in severity levels, attached with sh:severity:

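sh:Violation: the default when sh:severity is omitted; a data ingestion pipeline should normally reject the node.
sh:Warning: the constraint reflects best practice but is not blocking; flag for review.
sh:Info: informational; record it without blocking.

Severity does not change conformance in the spec itself: a validation report's sh:conforms is false whenever any result is produced, whatever its severity. Treating Warning and Info as non-blocking is a policy your pipeline applies on top of the report.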

Use severity to express policy rather than binary pass/fail. A missing schema:url on an organization might be a Warning; a missing canonicalName on a Ninja is a Violation. Both constraints live in the same shape file.
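Here is a minimal sketch of severity as pipeline policy. It assumes validation results arrive as dicts with a "severity" key, as in a parsed validation report. The `POLICY` mapping is a hypothetical example of the reject/flag/accept rules described above, not something SHACL defines. The spec only sets sh:conforms, which is false for any result at all.

```python
# Hypothetical pipeline policy: SHACL reports results; the pipeline decides.
POLICY = {
    "sh:Violation": "reject",
    "sh:Warning": "flag",
    "sh:Info": "accept",
}
_RANK = {"accept": 0, "flag": 1, "reject": 2}


def route(results):
    """Return (spec_conforms, pipeline_action) for a list of result dicts."""
    spec_conforms = not results  # sh:conforms ignores severity entirely
    action = "accept"
    for r in results:
        # A missing severity (spec default: sh:Violation) or a custom one fails closed.
        candidate = POLICY.get(r.get("severity"), "reject")
        if _RANK[candidate] > _RANK[action]:
            action = candidate
    return spec_conforms, action


print(route([]))                                                         # (True, 'accept')
print(route([{"severity": "sh:Info"}]))                                  # (False, 'accept')
print(route([{"severity": "sh:Warning"}, {"severity": "sh:Violation"}]))  # (False, 'reject')
```

The fail-closed default for unknown severities is a design choice, not a SHACL requirement. It keeps a custom severity IRI from slipping through unreviewed.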

SPARQL-based constraints

For constraints that the core components cannot express, sh:sparql embeds a SPARQL SELECT query. The validator pre-binds $this to each focus node, and every solution the query returns becomes one validation result.

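# Custom constraint: no ninja may be their own sensei, directly or through a chain.
# The path senseiOf+ matches one or more hops, so it catches both
# "A senseiOf A" and longer cycles such as "A senseiOf B . B senseiOf A".
naruto:NinjaShape sh:sparql [
    a sh:SPARQLConstraint ;
    sh:message "A Ninja cannot be their own sensei, directly or through a chain of senseis."@en ;
    sh:severity sh:Violation ;
    sh:select """
        PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>
        SELECT $this WHERE {
            $this naruto:senseiOf+ $this .
        }
    """ ;
] .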
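A check for `$this naruto:senseiOf $this` only catches direct self-reference. The constraint table's goal of "no circular senseiOf chains" needs reachability, which SPARQL expresses with the one-or-more path `naruto:senseiOf+`.

The Python sketch below shows what such a path computes: a breadth-first search from each node, reporting the node if the search returns to it. The edge list is hypothetical, and each pair is read as (sensei, student).

```python
from collections import defaultdict, deque


def nodes_in_cycles(sensei_of):
    """Nodes that reach themselves via one or more senseiOf edges.

    Equivalent in intent to selecting $this where `$this naruto:senseiOf+ $this`.
    """
    graph = defaultdict(set)
    for sensei, student in sensei_of:
        graph[sensei].add(student)

    flagged = set()
    for start in list(graph):
        seen = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node == start:  # came back to where we began: a cycle
                flagged.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return flagged


# Hypothetical edges: A and B teach each other, C teaches itself, D teaches E.
edges = [("A", "B"), ("B", "A"), ("C", "C"), ("D", "E")]
print(sorted(nodes_in_cycles(edges)))  # ['A', 'B', 'C']
```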

SHACL vs OWL restrictions — choosing between them

Both SHACL and OWL can say "a Ninja must have a Village." They mean different things. OWL's existential restriction says "if this is a Ninja, there must exist some Village it belongs to — possibly unasserted." SHACL's minCount says "if this is a Ninja and there is no asserted Village triple, this is a validation error." Use OWL for reasoning (what can be inferred). Use SHACL for data quality (what must be present). Both can coexist in the same system serving different purposes.
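The open-world/closed-world difference fits in a few lines of Python. This is an illustrative sketch, not a reasoner:

- `shacl_missing_village` reports every Ninja without an asserted village.
- `owl_known_villages` treats the gap as unknown rather than wrong. It fills the gap with a placeholder for the unnamed village that an existential restriction implies.

Both function names, the placeholder string and the sample data are hypothetical.

```python
def shacl_missing_village(ninjas, villages_of):
    """Closed world: a Ninja with no asserted village is a validation result."""
    return sorted(n for n in ninjas if not villages_of.get(n))


def owl_known_villages(ninjas, villages_of):
    """Open world: no error; an unasserted village is merely unknown.

    The placeholder string stands for the unnamed Village an existential
    restriction implies. Reasoners need not materialize it.
    """
    return {n: villages_of.get(n) or {"(some unnamed Village)"} for n in ninjas}


# Hypothetical data: one Ninja with an asserted village, one without.
ninjas = ["naruto:Naruto", "naruto:Unnamed"]
villages_of = {"naruto:Naruto": {"naruto:Konoha"}}
print(shacl_missing_village(ninjas, villages_of))  # ['naruto:Unnamed']
print(owl_known_villages(ninjas, villages_of)["naruto:Unnamed"])
```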

02 · Vocabulary alignment

Not all equivalences are equal.

When connecting your graph to an external vocabulary or dataset, the equivalence predicate you choose has consequences that compound through reasoning. The wrong choice is usually owl:sameAs applied where skos:exactMatch should go.

Predicate	Asserts	Reasoner consequence	Use when
`owl:sameAs`	These two IRIs refer to the same individual — full identity	All properties from both nodes merge. Every triple about A is also about B and vice versa.	You actually intend full property merging. Rare in practice — usually too strong.
`skos:exactMatch`	These two concepts are semantically equivalent in their respective vocabularies	No property merging. Linked for vocabulary alignment purposes only.	Linking a local skill concept to an ESCO concept. Linking a Wikidata entity to your ontology individual.
`skos:closeMatch`	These two concepts are similar but not identical	No property merging. Weaker alignment signal.	A local "data engineering" skill is close to but not exactly ESCO's "apply data engineering methods."
`skos:broadMatch`	The local concept is narrower than the external concept	No property merging. Hierarchical signal for cross-vocabulary navigation.	A specific "PyTorch proficiency" skill broadly matches ESCO's "machine learning" concept.
`owl:equivalentClass`	These two classes have exactly the same extension	Instances of one class are inferred to be instances of the other.	Your naruto:Ninja class is declared equivalent to an external ontology's Fighter class — intentional class merging.

The owl:sameAs hammer in practice

Asserting :BarbaraHidalgo owl:sameAs wikidata:Q12345678 means that, once Wikidata's triples for that entity are in your graph, an OWL reasoner merges every one of them into your local node: birth date, nationality, Wikipedia categories, external identifiers, everything. The merge runs both ways: if your graph has a sensemaking:personalBlog property, it now applies to the Wikidata entity too. This is almost never what you want. Use skos:exactMatch instead: it signals "same referent" without triggering property inheritance. Reserve owl:sameAs for cases where you genuinely need two IRIs to behave identically under all reasoning.
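The property-merging behaviour is easier to see on a toy graph. The sketch below approximates owl:sameAs semantics with union-find. It is simplified: only subject positions are rewritten, whereas a full reasoner also rewrites objects. It then runs the same function on a graph linked by skos:exactMatch, which it leaves untouched. The `ex:birthDate` triple, the blog URL and the function name are hypothetical.

```python
from collections import defaultdict


def same_as_closure(triples):
    """Copy properties across owl:sameAs links, as an OWL reasoner would.

    Simplified: sameAs classes come from union-find, and only subject
    positions are rewritten (a full reasoner also rewrites objects).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    members = defaultdict(set)
    for s, _, o in triples:
        members[find(s)].add(s)
        members[find(o)].add(o)

    inferred = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        for alias in members[find(s)]:
            inferred.add((alias, p, o))
    return inferred


LOCAL, REMOTE = ":BarbaraHidalgo", "wikidata:Q12345678"
base = [
    (LOCAL, "sensemaking:personalBlog", "https://example.org/blog"),
    (REMOTE, "ex:birthDate", "1970-01-01"),  # hypothetical value
]
with_same_as = same_as_closure(base + [(LOCAL, "owl:sameAs", REMOTE)])
with_exact_match = same_as_closure(base + [(LOCAL, "skos:exactMatch", REMOTE)])
print((REMOTE, "sensemaking:personalBlog", "https://example.org/blog") in with_same_as)      # True
print((REMOTE, "sensemaking:personalBlog", "https://example.org/blog") in with_exact_match)  # False
```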

03 · Skill inference

What formal rules can surface that a resume doesn't state.

The Resume Graph Explorer contains explicitly asserted skills — what someone listed on their resume. Formal inference rules can surface skills that are implied by the combination of asserted skills and ESCO's vocabulary structure, or by the duration and nature of work experience.

The inference rules (from the Module 3 README)

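# Rule 1: ESCO-related exposure
# If a person has skill X and X is skos:related to Y in ESCO,
# infer sensemaking:hasInferredSkill ?person ?y at "exposure" level.
# Note: _:basis is a fresh blank node per solution and is not linked to the
# inferred triple, so the provenance it records is detached; the full rule
# would need to attach it (for example through reification).

PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
  _:basis a sensemaking:InferenceBasis ;
          sensemaking:inferenceRule "esco-related-exposure" ;
          sensemaking:exposureLevel "exposure" .
}
WHERE {
  ?person sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept .
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}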

# Rule 2: Long-tenure promotion
# If a person held a role for 2+ years AND has a skill at "advanced",
# escalate skos:related concepts from "exposure" to "demonstrated".

PREFIX xsd:         <http://www.w3.org/2001/XMLSchema#>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
  _:basis a sensemaking:InferenceBasis ;
          sensemaking:inferenceRule "long-tenure-promotion" ;
          sensemaking:exposureLevel "demonstrated" .
}
WHERE {
  ?person sensemaking:hasEmployment ?emp ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept ;
         sensemaking:skillLevel "advanced" .
  ?emp sensemaking:startDate ?start ;
       sensemaking:endDate   ?end .
  # Date subtraction relies on Jena ARQ's XSD date arithmetic (SPARQL 1.1
  # defines none). xsd:duration values with year components such as "P2Y"
  # are only partially ordered, so use a fixed day count, as q04 does.
  FILTER ((?end - ?start) > "P730D"^^xsd:duration)
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}
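Rule 1 can be mirrored outside SPARQL to make the property path and the NOT EXISTS filter explicit. This Python sketch walks skos:exactMatch and then skos:related from each asserted skill, and drops concepts the person already covers. The triples use hypothetical stub IRIs, not the actual contents of skill-inference-starter.ttl. The `_:basis` provenance node is omitted.

```python
def rule1_exposure(triples):
    """Mirror Rule 1's WHERE clause over (s, p, o) tuples.

    ?person hasSkill ?skill . ?skill exactMatch/related ?concept .
    FILTER NOT EXISTS { ?person hasSkill ?s2 . ?s2 exactMatch ?concept . }
    """
    def objects(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}

    inferred = set()
    people = {s for s, p, _ in triples if p == "sensemaking:hasSkill"}
    for person in people:
        skills = objects(person, "sensemaking:hasSkill")
        # ESCO concepts already covered by an asserted skill (the NOT EXISTS part)
        covered = {e for sk in skills for e in objects(sk, "skos:exactMatch")}
        for skill in skills:
            for esco in objects(skill, "skos:exactMatch"):     # first path step
                for concept in objects(esco, "skos:related"):  # second path step
                    if concept not in covered:
                        inferred.add(
                            (person, "sensemaking:hasInferredSkill", concept))
    return inferred


# Hypothetical stub data in the spirit of the starter file
triples = [
    (":alex", "sensemaking:hasSkill", ":python"),
    (":alex", "sensemaking:hasSkill", ":sql"),
    (":python", "skos:exactMatch", "esco:python"),
    (":sql", "skos:exactMatch", "esco:sql"),
    ("esco:python", "skos:related", "esco:version_control"),
    ("esco:python", "skos:related", "esco:sql"),  # already asserted: filtered out
    ("esco:sql", "skos:related", "esco:database_design"),
]
for t in sorted(rule1_exposure(triples)):
    print(t)
```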

What to load for the query lab

Load both modules/01-foundations/artifacts/resume-graph/ttl/resume-001.ttl (the Alex Rivera resume) and modules/03-reasoning/artifacts/skill-inference/skill-inference-starter.ttl (the ESCO-related stubs and inference property declarations) into the same Fuseki dataset before running the query lab. The queries below apply the inference rules to Alex's resume data.

04 · Query lab

Five queries — validation, alignment, and inference.

q01

Which SHACL violations would Alex's resume produce against a PersonShape, the resume-graph counterpart of NinjaShape?

Pattern: SHACL validation simulation via SPARQL · checking minCount constraints manually


PREFIX foaf:        <http://xmlns.com/foaf/0.1/>
PREFIX dcterms:     <http://purl.org/dc/terms/>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# Simulate a PersonShape validation:
# - Every foaf:Person must have foaf:name (minCount 1)
# - Every sensemaking:Resume must have dcterms:created (minCount 1)
# Each UNION branch finds focus nodes that violate one constraint; both
# branches bind ?focusNode so resume violations are not shown as blank rows.

SELECT ?focusNode ?violation WHERE {
  {
    ?focusNode a foaf:Person .
    FILTER NOT EXISTS { ?focusNode foaf:name ?name . }
    BIND("missing foaf:name" AS ?violation)
  }
  UNION
  {
    ?focusNode a sensemaking:Resume .
    FILTER NOT EXISTS { ?focusNode dcterms:created ?date . }
    BIND("missing dcterms:created" AS ?violation)
  }
}
ORDER BY ?focusNode

Expected output

0 rows — Alex's resume has foaf:name and dcterms:created. Try deliberately removing one from the TTL and reloading to see violations appear.

This query manually simulates what a SHACL validator computes automatically. The FILTER NOT EXISTS pattern is the SPARQL equivalent of SHACL's sh:minCount 1 violation check. In production, you would run Apache Jena's shacl validate CLI or a library integration rather than writing this SPARQL manually — but understanding that SHACL validation IS SPARQL-executable is the key insight.
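In a pipeline, the q01 result table usually arrives in the SPARQL 1.1 Query Results JSON format, the `head`/`results`/`bindings` structure Fuseki returns when asked for JSON. The minimal sketch below turns that into violation rows. Some assumptions apply:

- The function name is hypothetical.
- The sample payload is hand-written, not captured from Fuseki.
- The variable names are parameters, so you can pass whatever your SELECT clause projects.

```python
import json


def violations_from_sparql_json(payload, focus_var, message_var):
    """Read a SPARQL 1.1 JSON results document into (focus, message) rows."""
    doc = json.loads(payload)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding object.
        focus = binding.get(focus_var, {}).get("value")
        message = binding.get(message_var, {}).get("value")
        rows.append((focus, message))
    return rows


# Hand-written sample payload; the resume IRI is hypothetical.
sample = json.dumps({
    "head": {"vars": ["focusNode", "violation"]},
    "results": {"bindings": [
        {"focusNode": {"type": "uri", "value": "https://example.org/resume-001"},
         "violation": {"type": "literal", "value": "missing dcterms:created"}},
    ]},
})

rows = violations_from_sparql_json(sample, "focusNode", "violation")
conforms = not rows  # an empty result table means no violations
print(conforms, rows)
```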

Think about this

1. Add a third UNION branch checking that every sensemaking:Employment has at least one sensemaking:startDate. Does the query return a violation for the current data?

2. The real SHACL validator (Apache Jena shacl validate command) produces a validation report as RDF output itself — a graph describing each violation. What advantages does that have over returning violations as SPARQL table rows?

q02

Which resume skills have exactMatch to ESCO, and which have skos:related links available?

Pattern: vocabulary alignment audit · exactMatch vs related · inference readiness check


PREFIX foaf:        <http://xmlns.com/foaf/0.1/>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# A subquery evaluated on its own would drop skills with no exactMatch;
# OPTIONAL + GROUP BY keeps them, with relatedCount = 0.
SELECT ?skillLabel ?hasExactMatch (COUNT(DISTINCT ?related) AS ?relatedCount) WHERE {
  ?person a foaf:Person ;
          sensemaking:hasSkill ?skill .
  ?skill skos:prefLabel ?skillLabel .
  FILTER (LANG(?skillLabel) = "en")
  BIND(EXISTS { ?skill skos:exactMatch ?esco . } AS ?hasExactMatch)
  OPTIONAL { ?skill skos:exactMatch/skos:related ?related . }
}
GROUP BY ?skill ?skillLabel ?hasExactMatch
ORDER BY DESC(?relatedCount)

Expected output

3 rows · Python (true / 3 related) · SQL (true / 3 related) · Machine learning (true / 3 related)

All three skills have exactMatch links (to the ESCO stubs) and each has 3 skos:related concepts. This is the "inference readiness check" — confirming that the vocabulary alignment is in place before running the inference rules. A skill with hasExactMatch = false or relatedCount = 0 would not benefit from Rule 1.

Think about this

1. If you added a skill to Alex's resume without an exactMatch link (e.g., "Excel proficiency"), this query would show it with hasExactMatch = false and relatedCount = 0. That skill would not be inferred to have any related skills. What does this tell you about the cost of incomplete vocabulary alignment?

2. The ESCO portal lets you look up actual skill IRIs and their skos:related links. For Exercise 3.4, you would replace the stubs with real ESCO IRIs and load a portion of the ESCO RDF dataset. How many related skills does the real Python skill in ESCO have?

q03

CONSTRUCT: apply Rule 1 — infer skills from ESCO related concepts.

Pattern: CONSTRUCT as inference rule · skos:exactMatch/skos:related path · FILTER NOT EXISTS for new-only


PREFIX foaf:        <http://xmlns.com/foaf/0.1/>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
}
WHERE {
  ?person a foaf:Person ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept .
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}

Expected output

9 triples (Turtle) · person hasInferredSkill esco_software_testing, esco_version_control, esco_data_pipelines (from Python) · esco_database_design, esco_data_modeling, esco_etl_processes (from SQL) · esco_statistics, esco_data_preprocessing, esco_model_evaluation (from ML)

Switch Fuseki to Turtle format. The CONSTRUCT produces 9 new triples: inferred skills that were not on Alex's resume but are implied by the ESCO-related relationships of his asserted skills. These are "exposure-level" inferences. Rule 1 does not look at skill level, so simply because Alex lists Python, the rule infers he has likely been exposed to software testing, version control, and data pipelines.

Think about this

1. The FILTER NOT EXISTS excludes any related concept that is already an explicitly asserted skill. Alex's asserted skills are Python, SQL, and machine learning. Do any of the related concepts overlap with these asserted skills? If so, they would not appear in the inferred output.

2. This is Rule 1 producing "exposure" level inferences. Write a separate query that returns only the inferred skills where the source skill is at "advanced" level — those would be candidates for Rule 2's "demonstrated" escalation.

q04

CONSTRUCT: apply Rule 2 — escalate to "demonstrated" for long-tenure roles.

Pattern: date arithmetic in SPARQL · tenure-based inference escalation


PREFIX foaf:        <http://xmlns.com/foaf/0.1/>
PREFIX xsd:         <http://www.w3.org/2001/XMLSchema#>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# Skills inferred at "demonstrated" level:
# person held a role for 2+ years AND the asserted skill is "advanced"
CONSTRUCT {
  ?person sensemaking:hasDemonstratedSkill ?relatedConcept .
}
WHERE {
  ?person a foaf:Person ;
          sensemaking:hasEmployment ?emp ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept ;
         sensemaking:skillLevel "advanced" .
  ?emp sensemaking:startDate ?start .
  OPTIONAL { ?emp sensemaking:endDate ?end . }
  BIND(COALESCE(?end, "2026-06-03"^^xsd:date) AS ?effectiveEnd)
  FILTER ((?effectiveEnd - ?start) > "P730D"^^xsd:duration)
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}

Expected output

Triples (Turtle) · hasDemonstratedSkill for esco_software_testing, esco_version_control, esco_data_pipelines (from Python/advanced + Riverbend 2.5yr employment) · esco_database_design, esco_data_modeling, esco_etl_processes (from SQL/advanced + Riverbend 2.5yr)

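The Riverbend Analytics role ran from 2020-01 to 2022-07, about 2.5 years. Note that the query does not tie a skill to a particular role. It only requires some employment longer than 730 days plus a skill marked "advanced", so it cannot confirm that Alex used Python and SQL at that level during the Riverbend role. The COALESCE handles the current role (Northwind Health, no endDate) by substituting a fixed as-of date hard-coded in the query (2026-06-03); change it to the date you run the lab. The date subtraction relies on Jena ARQ's XSD date arithmetic, an extension beyond standard SPARQL 1.1; "P730D" is 730 days, roughly 2 years.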
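The tenure test in q04 reduces to date subtraction with a default end date. The minimal Python sketch below applies the same logic, assuming both dates are plain calendar dates:

- `tenure_days` mirrors the COALESCE and the subtraction.
- `long_tenure` mirrors the strict `> "P730D"` comparison.

The source gives the Riverbend role only as 2020-01 to 2022-07, so the day-level dates below are assumptions, as is the Northwind start date.

```python
from datetime import date
from typing import Optional

AS_OF = date(2026, 6, 3)  # q04's hard-coded stand-in for "today"
THRESHOLD_DAYS = 730      # q04's "P730D"


def tenure_days(start: date, end: Optional[date], as_of: date = AS_OF) -> int:
    # COALESCE(?end, as_of): an open-ended role runs until the as-of date.
    effective_end = end if end is not None else as_of
    return (effective_end - start).days


def long_tenure(start: date, end: Optional[date], as_of: date = AS_OF) -> bool:
    # Mirrors FILTER ((?effectiveEnd - ?start) > "P730D"): strictly greater.
    return tenure_days(start, end, as_of) > THRESHOLD_DAYS


# Assumed dates: a Riverbend-like closed role and a three-month open role.
print(tenure_days(date(2020, 1, 1), date(2022, 7, 1)))  # 912
print(long_tenure(date(2020, 1, 1), date(2022, 7, 1)))  # True
print(tenure_days(date(2026, 3, 3), None))              # 92
print(long_tenure(date(2026, 3, 3), None))              # False
```

The open-ended case corresponds to the three-month scenario in the questions that follow: such a role never clears the threshold, however advanced the skill.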

Think about this

1. The COALESCE for current employment (no endDate) substitutes a fixed as-of date standing in for today. Is that the right semantics? What if Alex has been at Northwind for only 3 months with Python at "advanced"? Should those related skills be "demonstrated"?

2. Rules 1 and 2 produce different output triples (hasInferredSkill vs hasDemonstratedSkill). What would Rule 3 (core competency by co-occurrence) produce? Write the CONSTRUCT template header — the WHERE clause is the harder part.

q05

What skills do the formal rules infer — and which of those would an LLM also identify?

Pattern: inference result summary · the comparison query that sets up Exercise 3.4


PREFIX foaf:        <http://xmlns.com/foaf/0.1/>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

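# After running q03 and q04 CONSTRUCTs and loading the results:
# Summary of all inferred skills, each listed once at its strongest level
SELECT DISTINCT ?skillLabel ?inferenceType WHERE {
  ?person a foaf:Person .
  {
    ?person sensemaking:hasInferredSkill ?concept .
    # Skip exposure rows for concepts q04 already escalated to demonstrated
    FILTER NOT EXISTS { ?person sensemaking:hasDemonstratedSkill ?concept . }
    BIND("exposure" AS ?inferenceType)
  }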
  UNION
  {
    ?person sensemaking:hasDemonstratedSkill ?concept .
    BIND("demonstrated" AS ?inferenceType)
  }
  ?concept skos:prefLabel ?skillLabel .
  FILTER (LANG(?skillLabel) = "en")
}
ORDER BY ?inferenceType ?skillLabel

Expected output (after running q03 + q04 and inserting results)

Up to 9 rows · demonstrated: database design, data modeling, data pipelines, ETL processes, software testing, version control · exposure: model evaluation, data preprocessing, statistics

This is the comparison baseline for Exercise 3.4. Print or save this list. Then run an LLM prompt asking the same question from the raw resume text: "What skills would you infer from this resume that aren't explicitly listed?" Compare the two lists systematically: what did formal rules find that the LLM missed? What did the LLM find that formal rules missed? Where do they agree? Where does each one get it wrong?
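A first pass at the systematic comparison is plain set arithmetic. This sketch assumes both lists are skill-name strings. It normalizes only case and whitespace, so synonyms such as "Git" and "version control" stay separate; resolving those is the human judgment the comparison needs. The formal list is the expected q05 output above. The LLM list is invented for illustration.

```python
def normalize(skills):
    """Lower-case and trim skill names; no synonym handling."""
    return {s.strip().lower() for s in skills}


def compare(formal, llm):
    """Split two inferred-skill lists into agreement and disagreement buckets."""
    f, g = normalize(formal), normalize(llm)
    return {
        "both": sorted(f & g),
        "formal_only": sorted(f - g),
        "llm_only": sorted(g - f),
    }


# Formal list: q05's expected output. LLM list: invented for illustration.
formal = ["database design", "data modeling", "data pipelines", "ETL processes",
          "software testing", "version control", "model evaluation",
          "data preprocessing", "statistics"]
llm = ["Version control", "Git", "Data visualization"]
result = compare(formal, llm)
print(result["both"])      # ['version control']
print(result["llm_only"])  # ['data visualization', 'git']
```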

Think about this

1. The formal rules are only as good as the ESCO vocabulary links. If a skill is not linked to ESCO, it contributes no inferences. An LLM has no such constraint — it draws on broad training data. What does this asymmetry tell you about when to use each approach?

2. The formal rules produce reproducible results from the same data. An LLM may produce different results on successive runs, or may hallucinate skills entirely. For a hiring system, which property matters more: reproducibility or coverage?

05 · Formal reasoning vs LLM reasoning

Where each one wins — and where it doesn't.

This comparison is the consulting-positioning artifact the Module 3 README describes as "highest-leverage." The honest version of the comparison — not a promotional piece for either approach — is what makes it valuable.

Property	Formal reasoning (OWL + SPARQL)	LLM inference
Reproducibility	Deterministic: same data + same rules = same result, always.	Non-deterministic: results vary across runs and model versions.
Coverage	Bounded by vocabulary links and rule coverage. Can miss obvious skills not in ESCO.	Broad: draws on training data. Can surface skills the vocabulary doesn't cover.
Explainability	Every inferred skill traces to a specific rule and evidence triple. Fully auditable.	Opaque: the reasoning is distributed across billions of parameters. Not auditable.
Hallucination risk	No hallucination in the LLM sense: every inference is derived from asserted triples and the loaded vocabulary, so it cannot produce a skill absent from the graph. It can still be wrong if a rule or vocabulary link is wrong.	High: LLMs can confidently assert skills that have no basis in the resume text.
Maintenance cost	High: rules must be written, tested, and updated as the vocabulary evolves.	Low: model upgrades deliver most improvements without manual rule changes.
Nuance	Low: rules are binary — the condition is met or it is not. Cannot handle "probably" without custom confidence scoring.	High: LLMs handle hedged, contextual, and domain-specific inferences naturally.

The honest consulting position

Neither approach is uniformly better. Formal reasoning wins where reproducibility, auditability, and hallucination risk matter — compliance, legal, medical, regulatory. LLM inference wins where coverage, nuance, and low maintenance matter — exploratory discovery, consumer-facing features, rapid iteration. The most honest answer — and the most valuable consulting position — is knowing which context you are in and which risks your client is willing to accept. Building both and comparing them on real data (Exercise 3.4) is how you earn the right to say that confidently.

06 · Bridge to Exercise 3.4

The primary project: Resume skill inference.

Exercise 3.4 (primary project B) builds the Jupyter notebook that demonstrates skill inference end-to-end on 5–10 resume profiles and compares formal rules to LLM inference. The query lab above provides the formal rules side. For the LLM side:

The LLM prompt template

For each resume: "Given the following resume data [paste plain-text resume], what skills would you infer that are NOT explicitly listed? List only skills implied by the work experience, not mentioned directly." Run this prompt with the same resume text used in the formal inference. Record both outputs.

The comparison structure

For each profile, create a table with four columns: Skill name · Formal rules found it (yes/no) · LLM found it (yes/no) · Is it actually a reasonable inference? (your judgment). The fourth column is the ground truth — and your judgment IS the data. After 5-10 profiles, patterns emerge: formal rules miss X, LLM hallucinates Y, both agree on Z. That pattern is the blog post.
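Once each profile's four-column table is filled in, the cross-profile pattern is a tally. The minimal sketch below assumes each row is recorded as a `Row` with the four columns as fields; `Row` and `tally` are hypothetical names. Counting (who found it, was it reasonable) pairs separates formal-rule gaps from LLM hallucinations. The judgments in the example are invented.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    skill: str
    formal: bool      # Formal rules found it
    llm: bool         # LLM found it
    reasonable: bool  # Your judgment: the ground-truth column


def who_found(row: Row) -> str:
    if row.formal and row.llm:
        return "both"
    if row.formal:
        return "formal_only"
    if row.llm:
        return "llm_only"
    return "neither"


def tally(rows):
    """Count (who found it, was it reasonable) pairs across all profiles."""
    return Counter((who_found(r), r.reasonable) for r in rows)


# Hypothetical judgments for one profile.
rows = [
    Row("version control", True, True, True),
    Row("git", False, True, True),
    Row("blockchain", False, True, False),
    Row("statistics", True, False, True),
]
for key, n in sorted(tally(rows).items()):
    print(key, n)
```

Read the tallies this way:

- ("llm_only", False) rows are candidate hallucinations.
- ("llm_only", True) rows are gaps in the formal rules.
- ("formal_only", False) rows point at a bad rule or vocabulary link.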

The artifact lives at modules/03-reasoning/artifacts/skill-inference/ as a Jupyter notebook. The Module 3 README says this connects directly to the existing narrative synthesizer work — the comparison result is positioning gold for the consulting practice.

07 · Resources

Reading, tools, and next module.

SHACL spec

W3C SHACL

Section 2 (Shapes and Constraints) and Section 4 (Core Constraint Components) are the most relevant. SPARQL-based constraints (Section 5) are the advanced feature used in the NinjaShape senseiOf example.

SHACL book

Validating RDF Data

Labra Gayo et al. — open access. The most complete treatment of SHACL and ShEx available. Chapters 3–5 cover the constraint model and validation semantics. Keep open as a reference during Exercise 3.4.

Primary reading

Allemang et al. — Ch 13

Chapter 13 covers vocabulary alignment and the sameAs / exactMatch distinction in production contexts. The examples are from life sciences and finance — different domains, same design decisions.

Starter data

skill-inference-starter.ttl

ESCO-related concept stubs, inference property declarations, and the three inference rule templates as comments. Load alongside resume-001.ttl before running the query lab.

Vocabulary alignment

ESCO API

The ESCO portal's Linked Data download and API. For Exercise 3.4, download the ESCO skills taxonomy as RDF and load it into Fuseki to replace the stubs with real concept IRIs and real skos:related links.

Next module

Module 4 — Shipping

SPARQL UPDATE, federation, deployment, and the TwinKit Semantic v2.0 capstone. Module 4 depends on having the Naruto ontology (Module 2) and either reification artifact (3.3) or skill inference (3.4) ready as demo data.