SHACL spec
W3C SHACL
Section 2 (Data Shapes) and Section 5 (Core Constraint Components) are the most relevant. The SPARQL-based constraints (§8) are the advanced feature used in the NinjaShape circular-reference example.
Closed-world validation in an open-world graph. owl:sameAs vs skos:exactMatch. Formal skill inference applied to resume data — and where it beats (and loses to) LLM reasoning.
Submodule 2.3 introduced SHACL as the closed-world answer to OWL's open-world assumption. This submodule goes deeper: the full constraint component vocabulary, SPARQL-based custom constraints, severity levels, and how SHACL fits into a production data pipeline.
SHACL defines two kinds of shapes:
- sh:NodeShape targets specific nodes in the graph (via sh:targetClass, sh:targetNode, sh:targetSubjectsOf, or sh:targetObjectsOf) and applies property constraints to them.
- sh:PropertyShape is a reusable constraint on a specific property, which can be referenced from multiple NodeShapes via sh:property.

# A NodeShape targeting all naruto:Ninja nodes
naruto:NinjaShape
    a sh:NodeShape ;
    sh:targetClass naruto:Ninja ;
    sh:property [
        sh:path naruto:canonicalName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:message "A Ninja must have exactly one canonicalName."@en ;
        sh:severity sh:Violation
    ] ;
    sh:property [
        sh:path naruto:memberOfVillage ;
        sh:class naruto:Village ;
        sh:minCount 1 ;
        sh:message "A Ninja must belong to at least one Village."@en
    ] .
| Component | Constrains | Example |
|---|---|---|
| sh:minCount / sh:maxCount | Number of values | Exactly one canonicalName |
| sh:datatype | Literal type | canonicalName must be xsd:string |
| sh:class | Object type | memberOfVillage value must be a naruto:Village |
| sh:pattern | Regex on literals | villageCode matches "[A-Z]{2}" |
| sh:in | Allowed values | confidence in {canonical, disputed, fan-theory} |
| sh:hasValue | Required specific value | a node must be of rdf:type naruto:Ninja |
| sh:node | Reference another shape | employer must conform to OrganizationShape |
| sh:sparql | Custom SPARQL constraint | No circular senseiOf chains |
SHACL has three built-in severity levels, attached with sh:severity:
- sh:Violation: the node does not conform; a data ingestion pipeline should reject it (default if not specified).
- sh:Warning: the constraint is best-practice but not blocking; flag for review.
- sh:Info: informational annotation; non-blocking.

Use severity to express policy rather than binary pass/fail. A missing schema:url on an organization might be a Warning; a missing canonicalName on a Ninja is a Violation. Both constraints live in the same shape file. Note that the validation report's sh:conforms flag is false whenever any validation result is produced, whatever its severity, so the pipeline has to apply the severity policy itself.
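The severity policy described above can be sketched as a small pipeline gate. This is a minimal illustration in plain Python, assuming the validation results have already been extracted into dictionaries; the record layout, the `gate` function, and the sample messages are hypothetical and not part of any SHACL library.

```python
from collections import Counter

# Hypothetical result records; a real report carries sh:resultSeverity per result.
results = [
    {"focus": "naruto:ninjaA", "severity": "sh:Violation",
     "message": "A Ninja must have exactly one canonicalName."},
    {"focus": "ex:orgA", "severity": "sh:Warning",
     "message": "Organization is missing schema:url."},
    {"focus": "ex:orgA", "severity": "sh:Info",
     "message": "Informational note."},
]

def gate(results):
    """Reject on any Violation, send to review on any Warning, otherwise accept."""
    counts = Counter(r["severity"] for r in results)
    if counts["sh:Violation"]:
        return "reject", counts
    if counts["sh:Warning"]:
        return "review", counts
    return "accept", counts

decision, counts = gate(results)
print(decision, dict(counts))
```

The gate looks only at severities, so the same shape file can serve a strict ingestion path and a more lenient review path by swapping the policy function.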
For constraints that basic SHACL components cannot express, sh:sparql allows embedding a SPARQL query that returns violation results.
# Custom constraint: no ninja should be their own sensei
naruto:NinjaShape sh:sparql [
    a sh:SPARQLConstraint ;
    sh:message "A Ninja cannot be their own sensei."@en ;
    sh:severity sh:Violation ;
    sh:select """
        PREFIX naruto: <https://sensemaking-ai.com/ns/naruto#>
        SELECT $this WHERE {
            $this naruto:senseiOf $this .
        }
    """ ;
] .
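The constraint above catches only a direct self-reference. Detecting a longer circular senseiOf chain requires transitive reachability, which in SPARQL is usually written with the property path `naruto:senseiOf+`. The same logic can be sketched in plain Python over an adjacency mapping; the function name and the ninja identifiers below are hypothetical, invented for illustration.

```python
def ninjas_in_cycles(sensei_of):
    """Return the ninjas that can reach themselves by following senseiOf edges."""
    def reachable(start):
        # Depth-first walk over everything reachable in one or more steps.
        seen, stack = set(), list(sensei_of.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(sensei_of.get(node, ()))
        return seen

    return {ninja for ninja in sensei_of if ninja in reachable(ninja)}

# Invented data: ninjaA -> ninjaB -> ninjaC -> ninjaA forms a three-step loop.
edges = {
    "ninjaA": ["ninjaB"],
    "ninjaB": ["ninjaC"],
    "ninjaC": ["ninjaA"],
    "ninjaD": ["ninjaE"],
}

print(sorted(ninjas_in_cycles(edges)))
```

A direct self-loop is the one-step special case, so a chain-aware check subsumes the constraint shown above.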
Both SHACL and OWL can say "a Ninja must have a Village." They mean different things. OWL's existential restriction says "if this is a Ninja, there must exist some Village it belongs to — possibly unasserted." SHACL's minCount says "if this is a Ninja and there is no asserted Village triple, this is a validation error." Use OWL for reasoning (what can be inferred). Use SHACL for data quality (what must be present). Both can coexist in the same system serving different purposes.
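The closed-world side of that contrast can be made concrete with a toy check. The sketch below is plain Python over a set of (subject, predicate, object) tuples, not a SHACL engine; the function name and the node identifiers are hypothetical. It counts only asserted triples, which is how sh:minCount behaves.

```python
# Invented data: ninjaB is typed as a Ninja but has no asserted village.
triples = {
    ("naruto:ninjaA", "rdf:type", "naruto:Ninja"),
    ("naruto:ninjaA", "naruto:memberOfVillage", "naruto:villageX"),
    ("naruto:ninjaB", "rdf:type", "naruto:Ninja"),
}

def min_count_violations(triples, target_class, path, min_count=1):
    """Closed-world check: a value that is not asserted does not count."""
    targets = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    violations = []
    for node in sorted(targets):
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violations.append(node)
    return violations

print(min_count_violations(triples, "naruto:Ninja", "naruto:memberOfVillage"))
# prints ['naruto:ninjaB']
```

An OWL reasoner given the same data and an existential restriction would not report ninjaB as an error; it would simply conclude that some unnamed Village exists.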
When connecting your graph to an external vocabulary or dataset, the equivalence predicate you choose has consequences that compound through reasoning. The wrong choice is usually owl:sameAs applied where skos:exactMatch should go.
| Predicate | Asserts | Reasoner consequence | Use when |
|---|---|---|---|
| owl:sameAs | These two IRIs refer to the same individual: full identity | All properties from both nodes merge. Every triple about A is also about B and vice versa. | You actually intend full property merging. Rare in practice; usually too strong. |
| skos:exactMatch | These two concepts are semantically equivalent in their respective vocabularies | No property merging. Linked for vocabulary alignment purposes only. | Linking a local skill concept to an ESCO concept. Linking a Wikidata entity to your ontology individual. |
| skos:closeMatch | These two concepts are similar but not identical | No property merging. Weaker alignment signal. | A local "data engineering" skill is close to but not exactly ESCO's "apply data engineering methods." |
| skos:broadMatch | The local concept is narrower than the external concept | No property merging. Hierarchical signal for cross-vocabulary navigation. | A specific "PyTorch proficiency" skill broadly matches ESCO's "machine learning" concept. |
| owl:equivalentClass | These two classes have exactly the same extension | Instances of one class are inferred to be instances of the other. | Your naruto:Ninja class is declared equivalent to an external ontology's Fighter class: intentional class merging. |
Asserting :BarbaraHidalgo owl:sameAs wikidata:Q12345678 causes an OWL reasoner to treat the two IRIs as one individual, so every Wikidata statement loaded alongside your data (birth date, nationality, Wikipedia categories, external identifiers, everything) now applies to your local node. The merge runs in both directions: if your graph has a sensemaking:personalBlog property, it now applies to the Wikidata entity as well. This is almost never what you want. Use skos:exactMatch instead: it signals "same referent" without triggering property inheritance. Reserve owl:sameAs for cases where you genuinely need two IRIs to behave identically under all reasoning.
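The difference in reasoner consequences can be simulated with a few lines of plain Python. This is a deliberately simplified sketch: it merges only statements in which a sameAs-linked IRI is the subject, whereas full OWL semantics also substitute in object position. The function names, the `ex:dateOfBirth` property, and all values are invented placeholders.

```python
def same_as_cluster(triples, node):
    """Collect every IRI linked to `node` by owl:sameAs, in either direction."""
    cluster, frontier = {node}, [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in cluster:
                    cluster.add(b)
                    frontier.append(b)
    return cluster

def effective_properties(triples, node):
    """Statements that apply to `node` once owl:sameAs identity is honored."""
    cluster = same_as_cluster(triples, node)
    return {(p, o) for s, p, o in triples if s in cluster and p != "owl:sameAs"}

# Invented placeholder statements about the local node and the external node.
base = {
    (":BarbaraHidalgo", "sensemaking:personalBlog", "https://example.org/blog"),
    ("wikidata:Q12345678", "ex:dateOfBirth", "1980-01-01"),
}
with_same_as = base | {(":BarbaraHidalgo", "owl:sameAs", "wikidata:Q12345678")}
with_exact_match = base | {(":BarbaraHidalgo", "skos:exactMatch", "wikidata:Q12345678")}

print(effective_properties(with_same_as, "wikidata:Q12345678"))
print(effective_properties(with_exact_match, "wikidata:Q12345678"))
```

Under owl:sameAs the external node picks up the personal blog statement; under skos:exactMatch it keeps only its own statement, and the link itself remains an ordinary triple on the local node.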
The Resume Graph Explorer contains explicitly asserted skills — what someone listed on their resume. Formal inference rules can surface skills that are implied by the combination of asserted skills and ESCO's vocabulary structure, or by the duration and nature of work experience.
# Rule 1: ESCO-related exposure
# If a person has skill X and X is skos:related to Y in ESCO,
# infer sensemaking:hasInferredSkill ?person ?y at "exposure" level.
CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
  _:basis a sensemaking:InferenceBasis ;
          sensemaking:inferenceRule "esco-related-exposure" ;
          sensemaking:exposureLevel "exposure" .
}
WHERE {
  ?person sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept .
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}

# Rule 2: Long-tenure promotion
# If a person held a role for 2+ years AND has a skill at "advanced",
# escalate skos:related concepts from "exposure" to "demonstrated".
CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
  _:basis a sensemaking:InferenceBasis ;
          sensemaking:inferenceRule "long-tenure-promotion" ;
          sensemaking:exposureLevel "demonstrated" .
}
WHERE {
  ?person sensemaking:hasEmployment ?emp ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept ;
         sensemaking:skillLevel "advanced" .
  ?emp sensemaking:startDate ?start ;
       sensemaking:endDate ?end .
  # Date subtraction is an engine extension, not core SPARQL 1.1, and yields a
  # day-based duration, which cannot be ordered against a year-based "P2Y".
  # Compare against 730 days instead, as the query lab does.
  FILTER ((?end - ?start) > "P730D"^^xsd:duration)
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}
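
To make the logic of Rule 1 easier to trace, here is a minimal plain-Python sketch of the same pattern over a set of (subject, predicate, object) tuples. It is not a replacement for the CONSTRUCT query; the function name and all `ex:` and `esco:` identifiers are hypothetical stand-ins for real resume and ESCO IRIs.

```python
def infer_exposure(triples):
    """Rule 1: follow hasSkill -> skos:exactMatch -> skos:related, skipping asserted concepts."""
    def objects(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}

    inferred = set()
    people = {s for s, p, _ in triples if p == "sensemaking:hasSkill"}
    for person in people:
        # Every external concept the person's asserted skills already map to.
        asserted = set()
        for skill in objects(person, "sensemaking:hasSkill"):
            asserted |= objects(skill, "skos:exactMatch")
        for concept in asserted:
            for related in objects(concept, "skos:related"):
                if related not in asserted:  # mirrors the FILTER NOT EXISTS clause
                    inferred.add((person, "sensemaking:hasInferredSkill", related))
    return inferred

# Invented data: one person with two asserted skills mapped to stub concepts.
triples = {
    ("ex:person1", "sensemaking:hasSkill", "ex:skillPython"),
    ("ex:person1", "sensemaking:hasSkill", "ex:skillSql"),
    ("ex:skillPython", "skos:exactMatch", "esco:python"),
    ("ex:skillSql", "skos:exactMatch", "esco:sql"),
    ("esco:python", "skos:related", "esco:dataAnalysis"),
    ("esco:python", "skos:related", "esco:sql"),
}

print(infer_exposure(triples))
```

Only the data-analysis concept is inferred here; the SQL concept is related to the Python concept but is filtered out because the person already asserts it.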
Load both modules/01-foundations/artifacts/resume-graph/ttl/resume-001.ttl (the Alex Rivera resume) and modules/03-reasoning/artifacts/skill-inference/skill-inference-starter.ttl (the ESCO-related stubs and inference property declarations) into the same Fuseki dataset before running the query lab. The queries below apply the inference rules to Alex's resume data.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# Simulate a PersonShape validation:
#   - Every foaf:Person must have foaf:name (minCount 1)
#   - Every sensemaking:Resume must have dcterms:created (minCount 1)
# Find violations: persons without a name, resumes without a creation date.
# Both UNION branches sit inside the WHERE group and bind the same ?node
# variable, so resume violations are not returned with an empty first column.
SELECT ?node ?violation WHERE {
  {
    ?node a foaf:Person .
    FILTER NOT EXISTS { ?node foaf:name ?name . }
    BIND("missing foaf:name" AS ?violation)
  }
  UNION
  {
    ?node a sensemaking:Resume .
    FILTER NOT EXISTS { ?node dcterms:created ?date . }
    BIND("missing dcterms:created" AS ?violation)
  }
}
ORDER BY ?node

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

SELECT ?skillLabel ?hasExactMatch ?relatedCount WHERE {
  ?person a foaf:Person ;
          sensemaking:hasSkill ?skill .
  ?skill skos:prefLabel ?skillLabel .
  FILTER (LANG(?skillLabel) = "en")
  BIND(EXISTS { ?skill skos:exactMatch ?esco . } AS ?hasExactMatch)
  {
    # Bind ?skill before the OPTIONAL so that skills with no related
    # concepts stay in the result with a count of 0.
    SELECT ?skill (COUNT(DISTINCT ?related) AS ?relatedCount) WHERE {
      ?holder sensemaking:hasSkill ?skill .
      OPTIONAL { ?skill skos:exactMatch/skos:related ?related . }
    }
    GROUP BY ?skill
  }
}
ORDER BY DESC(?relatedCount)

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

CONSTRUCT {
  ?person sensemaking:hasInferredSkill ?relatedConcept .
}
WHERE {
  ?person a foaf:Person ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept .
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# Skills inferred at "demonstrated" level:
# person held a role for 2+ years AND the asserted skill is "advanced"
CONSTRUCT {
  ?person sensemaking:hasDemonstratedSkill ?relatedConcept .
}
WHERE {
  ?person a foaf:Person ;
          sensemaking:hasEmployment ?emp ;
          sensemaking:hasSkill ?skill .
  ?skill skos:exactMatch/skos:related ?relatedConcept ;
         sensemaking:skillLevel "advanced" .
  ?emp sensemaking:startDate ?start .
  OPTIONAL { ?emp sensemaking:endDate ?end . }
  BIND(COALESCE(?end, "2026-06-03"^^xsd:date) AS ?effectiveEnd)
  FILTER ((?effectiveEnd - ?start) > "P730D"^^xsd:duration)
  FILTER NOT EXISTS {
    ?person sensemaking:hasSkill ?s2 .
    ?s2 skos:exactMatch ?relatedConcept .
  }
}
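
The tenure test in the query above (fall back to a fixed date when no end date is asserted, then require more than 730 days) can be checked by hand with Python's standard `datetime` module. This is a sketch of the arithmetic only; the function name is hypothetical, and the default `as_of` date simply copies the literal used in the query.

```python
from datetime import date

def held_two_plus_years(start, end=None, as_of=date(2026, 6, 3)):
    """Mirror COALESCE(?end, fixed date) followed by a strict > 730 days test."""
    effective_end = end if end is not None else as_of
    return (effective_end - start).days > 730

# Spans a leap year: 731 days, so it passes.
print(held_two_plus_years(date(2020, 1, 1), date(2022, 1, 1)))
# Two non-leap years: exactly 730 days, so the strict comparison fails.
print(held_two_plus_years(date(2021, 1, 1), date(2023, 1, 1)))
# Open-ended role: measured up to the as_of date.
print(held_two_plus_years(date(2024, 1, 1)))
```

The second case is worth noting when reading results: a role of exactly two calendar years can fall on either side of the threshold depending on whether it spans a leap day.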

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/>

# After running q03 and q04 CONSTRUCTs and loading the results:
# Summary of all inferred skills with their labels
SELECT DISTINCT ?skillLabel ?inferenceType WHERE {
  ?person a foaf:Person .
  {
    ?person sensemaking:hasInferredSkill ?concept .
    BIND("exposure" AS ?inferenceType)
  }
  UNION
  {
    ?person sensemaking:hasDemonstratedSkill ?concept .
    BIND("demonstrated" AS ?inferenceType)
  }
  ?concept skos:prefLabel ?skillLabel .
  FILTER (LANG(?skillLabel) = "en")
}
ORDER BY ?inferenceType ?skillLabel
This comparison is the consulting-positioning artifact the Module 3 README describes as "highest-leverage." The honest version of the comparison — not a promotional piece for either approach — is what makes it valuable.
| Property | Formal reasoning (OWL + SPARQL) | LLM inference |
|---|---|---|
| Reproducibility | Deterministic: same data + same rules = same result, always. | Non-deterministic: results vary across runs and model versions. |
| Coverage | Bounded by vocabulary links and rule coverage. Can miss obvious skills not in ESCO. | Broad: draws on training data. Can surface skills the vocabulary doesn't cover. |
| Explainability | Every inferred skill traces to a specific rule and evidence triple. Fully auditable. | Opaque: the reasoning is distributed across billions of parameters. Not auditable. |
| Hallucination risk | None: inferences are strictly derived from asserted data. Cannot produce a skill not in the graph. | High: LLMs can confidently assert skills that have no basis in the resume text. |
| Maintenance cost | High: rules must be written, tested, and updated as the vocabulary evolves. | Low: the model updates handle most improvement without manual rule changes. |
| Nuance | Low: rules are binary — the condition is met or it is not. Cannot handle "probably" without custom confidence scoring. | High: LLMs handle hedged, contextual, and domain-specific inferences naturally. |
Neither approach is uniformly better. Formal reasoning wins where reproducibility, auditability, and hallucination risk matter — compliance, legal, medical, regulatory. LLM inference wins where coverage, nuance, and low maintenance matter — exploratory discovery, consumer-facing features, rapid iteration. The most honest answer — and the most valuable consulting position — is knowing which context you are in and which risks your client is willing to accept. Building both and comparing them on real data (Exercise 3.4) is how you earn the right to say that confidently.
Exercise 3.4 (primary project B) builds the Jupyter notebook that demonstrates skill inference end-to-end on 5–10 resume profiles and compares formal rules to LLM inference. The query lab above provides the formal rules side. For the LLM side:
For each resume: "Given the following resume data [paste plain-text resume], what skills would you infer that are NOT explicitly listed? List only skills implied by the work experience, not mentioned directly." Run this prompt with the same resume text used in the formal inference. Record both outputs.
For each profile, create a table with four columns: Skill name · Formal rules found it (yes/no) · LLM found it (yes/no) · Is it actually a reasonable inference? (your judgment). The fourth column is the ground truth — and your judgment IS the data. After 5-10 profiles, patterns emerge: formal rules miss X, LLM hallucinates Y, both agree on Z. That pattern is the blog post.
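The four-column table can be assembled mechanically once both outputs are recorded. The following is a minimal sketch in plain Python, assuming each method's output has been reduced to a set of skill names; the function names and the sample skills are invented placeholders, not real results.

```python
def comparison_rows(formal, llm, reasonable):
    """One row per skill found by either method; `reasonable` holds your own judgment."""
    return [
        {
            "skill": skill,
            "formal": skill in formal,
            "llm": skill in llm,
            "reasonable": reasonable.get(skill),  # None means not yet judged
        }
        for skill in sorted(formal | llm)
    ]

def summarize(rows):
    """Bucket skills into the patterns the write-up is looking for."""
    return {
        "both_agree": [r["skill"] for r in rows if r["formal"] and r["llm"]],
        "formal_only": [r["skill"] for r in rows if r["formal"] and not r["llm"]],
        "llm_only": [r["skill"] for r in rows if r["llm"] and not r["formal"]],
        "llm_unsupported": [
            r["skill"] for r in rows
            if r["llm"] and not r["formal"] and r["reasonable"] is False
        ],
    }

# Invented placeholder outputs for one profile.
formal = {"data analysis", "SQL"}
llm = {"data analysis", "team leadership", "Kubernetes"}
judgment = {"data analysis": True, "SQL": True,
            "team leadership": True, "Kubernetes": False}

rows = comparison_rows(formal, llm, judgment)
print(summarize(rows))
```

Keeping the judgment column as a separate mapping makes it explicit that the fourth column is a human input, not something either method produces.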
The artifact lives at modules/03-reasoning/artifacts/skill-inference/ as a Jupyter notebook. The Module 3 README says this connects directly to the existing narrative synthesizer work — the comparison result is positioning gold for the consulting practice.
Labra Gayo et al. — open access. The most complete treatment of SHACL and ShEx available. Chapters 3–5 cover the constraint model and validation semantics. Keep open as a reference during Exercise 3.4.
Chapter 13 covers vocabulary alignment and the sameAs / exactMatch distinction in production contexts. The examples are from life sciences and finance — different domains, same design decisions.
ESCO-related concept stubs, inference property declarations, and the three inference rule templates as comments. Load alongside resume-001.ttl before running the query lab.
The ESCO portal's Linked Data download and API. For Exercise 3.4, download the ESCO skills taxonomy as RDF and load it into Fuseki to replace the stubs with real concept IRIs and real skos:related links.
SPARQL UPDATE, federation, deployment, and the TwinKit Semantic v2.0 capstone. Module 4 depends on having the Naruto ontology (Module 2) and either reification artifact (3.3) or skill inference (3.4) ready as demo data.