The four query forms — Sub-module 1.2 · Sensemaking Semantic Web

01 · Why a graph needs its own query language

SQL thinks in tables. SPARQL thinks in patterns.

A relational database stores rows and columns. SQL joins them. An RDF store keeps triples — subject, predicate, object. SPARQL matches patterns against those triples the way a regular expression matches text: flexible, composable, and ignorant of storage layout.

The conceptual move is straightforward. A SPARQL graph pattern is a set of triple templates where any position can be a variable or a fixed term. The engine finds every assignment of values to variables that makes all templates simultaneously true against the data. That is the entire core idea. Everything else — filtering, aggregating, sorting, constructing new graphs — is built on top of that matching step.

Where SQL asks "give me all rows from the Orders table where customer_id = 42," SPARQL asks "give me every ?subject that has a foaf:knows predicate to at least one ?object whose name matches 'Alice.'" The difference is not just syntax. RDF has no schema that predetermines which predicates a node might have. SPARQL's pattern matching is the only way to navigate that open-ended structure.

The pattern-matching intuition

A triple pattern is a template with one or more variables. Wrap a set of them in a WHERE clause and the engine finds every combination of values that matches all of them at once:

SELECT ?name ?jutsu
WHERE {
  ?ninja a             sensemaking:Ninja ;
         schema:name   ?name ;
         sensemaking:hasJutsu ?jutsu .
}

Three triple patterns. The engine binds ?ninja, ?name, and ?jutsu to every combination in the graph that satisfies all three simultaneously. A ninja with two jutsu appears twice — one row per jutsu. This is not a flaw; it is how triple-pattern matching works, and it matters when you use OPTIONAL.
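To make the matching step concrete, here is a toy matcher in Python. It is a sketch of the idea only, not how Fuseki or any real engine is implemented; the ex: terms and the data are made up for illustration.

```python
# Toy triple-pattern matcher. Invented data; terms starting with "?" are variables.
TRIPLES = [
    ("ex:Naruto", "rdf:type", "ex:Ninja"),
    ("ex:Naruto", "schema:name", "Naruto Uzumaki"),
    ("ex:Naruto", "ex:hasJutsu", "ex:Rasengan"),
    ("ex:Naruto", "ex:hasJutsu", "ex:ShadowClone"),
    ("ex:Sakura", "rdf:type", "ex:Ninja"),
    ("ex:Sakura", "schema:name", "Sakura Haruno"),
    ("ex:Sakura", "ex:hasJutsu", "ex:Healing"),
    ("ex:Konoha", "rdf:type", "ex:Village"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Return bindings extended so that pattern equals triple, or None."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None  # fixed term does not match
    return result


def solve(patterns, triples):
    """Find every assignment that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


PATTERNS = [
    ("?ninja", "rdf:type", "ex:Ninja"),
    ("?ninja", "schema:name", "?name"),
    ("?ninja", "ex:hasJutsu", "?jutsu"),
]

rows = [(s["?name"], s["?jutsu"]) for s in solve(PATTERNS, TRIPLES)]
for row in rows:
    print(row)
```

Naruto shows up in two rows because two hasJutsu triples satisfy the third pattern; the village never appears because it fails the first one.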

Coming from Neo4j

SPARQL triple patterns map roughly to Cypher's MATCH clause. The key difference: Cypher's graph model lets you put properties on edges (-[:KNOWS {since: 2020}]->). In RDF, and therefore in SPARQL, "since 2020" needs extra triples: either a new node (the event-class pattern) or an RDF-star annotation. That is the same trade-off covered in Sub-module 1.1.

02 · The endpoint model

A SPARQL endpoint is just an HTTP service.

You write a query in a .rq file (or a text box). You send it to an endpoint URL, usually as an HTTP POST (GET also works for short queries). The endpoint returns results: a table of variable bindings for SELECT, a boolean for ASK, a new RDF graph for CONSTRUCT or DESCRIBE. The protocol is standard; the results format is negotiated through the HTTP Accept header, and the response's Content-Type header tells you what came back.

The same query syntax works against any SPARQL-compliant endpoint. What changes is the data behind it and any endpoint-specific extensions — Wikidata's SERVICE wikibase:label is the most common example.
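As a rough illustration of "just an HTTP service", the sketch below builds a SPARQL Protocol request with Python's standard library. The dataset name in the Fuseki URL (starter) and the User-Agent string are placeholders, not values from the starter kit; run_query needs a live endpoint and is not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder dataset name: Fuseki exposes one query URL per dataset.
FUSEKI_ENDPOINT = "http://localhost:3030/starter/sparql"


def build_sparql_request(endpoint, query,
                         accept="application/sparql-results+json"):
    """Build (but do not send) a form-encoded SPARQL Protocol POST."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": accept,  # asks for the JSON results format
            "User-Agent": "sparql-course-example/0.1",  # placeholder
        },
    )


def run_query(endpoint, query, timeout=60):
    """Send the request and decode a JSON result (needs a live endpoint)."""
    request = build_sparql_request(endpoint, query)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_sparql_request(FUSEKI_ENDPOINT, "ASK WHERE { ?s ?p ?o }")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

For CONSTRUCT or DESCRIBE you would pass an RDF media type such as text/turtle as the accept argument instead, since those forms return a graph rather than a bindings table.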

The two endpoints you'll use most

Local Fuseki runs on your machine at localhost:3030. Fast, private, no rate limits. Data you load is yours to query — the Naruto and mythology datasets from the starter kit, your own Turtle files. This is the workbook environment.

Wikidata's query service at query.wikidata.org is a public, read-only SPARQL endpoint over one of the largest structured knowledge graphs on the web. No account needed. Queries are cut off after 60 seconds, and heavy use is rate-limited. Wikidata uses the standard SPARQL protocol with a few proprietary extensions, most importantly SERVICE wikibase:label, which resolves human-readable labels for any entity. Section 5 has three runnable queries.

Why Wikidata uses Q-numbers

Wikidata identifies entities by opaque IRIs like wd:Q49108 (MIT). Human-readable labels are separate rdfs:label values — often dozens of them in different languages. The SERVICE wikibase:label extension picks the right one automatically. It is a convenience wrapper, not part of the SPARQL standard.

03 · Query anatomy

Every SELECT query has the same four parts.

A SELECT query broken into its named components:

① PREFIX declarations — short aliases for full IRIs
PREFIX schema:      <https://schema.org/>
PREFIX sensemaking: <https://sensemaking-ai.com/ns/example#>

② SELECT clause — which variables to project into the result table
SELECT ?name ?teamName

③ WHERE clause — graph pattern that must match
WHERE {
  ?ninja a sensemaking:Ninja ;
         schema:name ?name ;
         sensemaking:memberOfTeam ?team .
  ?team schema:name ?teamName .
}

④ Solution modifiers — post-processing on the matched rows
ORDER BY ?teamName ?name
LIMIT 50

① PREFIX declarations

PREFIX lines define short aliases so you can write schema:name instead of <https://schema.org/name>. They are purely syntactic: not stored, not inferred, not shared between queries. Every query that uses a prefixed name must declare that prefix itself; the alternative is to write the full IRI in angle brackets. (Some endpoints, Wikidata among them, predeclare a few common prefixes, but do not rely on it.) The curriculum's standard prefix block is the starting point for every file.
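A small Python sketch of what "purely syntactic" means: a prefixed name is string substitution against the declared prefixes, nothing more. The expand function is a hypothetical helper written for this page, and it ignores the escaping rules a real parser handles.

```python
# The two prefixes declared in the anatomy example above.
PREFIXES = {
    "schema": "https://schema.org/",
    "sensemaking": "https://sensemaking-ai.com/ns/example#",
}


def expand(term, prefixes):
    """Turn a prefixed name such as schema:name into a full IRI."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]  # already a full IRI, no prefix needed
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


print(expand("schema:name", PREFIXES))
print(expand("sensemaking:Ninja", PREFIXES))
print(expand("<https://schema.org/name>", PREFIXES))
```

Asking for foaf:knows with this table raises an error, which mirrors the parse error an endpoint returns when a query uses a prefix it never declared.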

② SELECT clause

Lists the variables to include in the result table. Use SELECT * to project all bound variables. Use SELECT DISTINCT to remove duplicate rows — useful when the same combination appears through multiple matching paths. You can also compute expressions: SELECT (COUNT(?ninja) AS ?total).

③ WHERE clause

The graph pattern. Each line inside WHERE { } is a triple pattern — subject, predicate, object, where any position can be a variable (?x) or a fixed term (IRI or literal). All triple patterns in the same block must match simultaneously for a row to appear in results. OPTIONAL patterns relax this: OPTIONAL { ?ninja sensemaking:hasJutsu ?j } keeps the ninja in results even if they have no jutsu — ?j is simply unbound for those rows.
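The difference between a required pattern and an OPTIONAL one can be sketched in Python as an inner join versus a left join. The data is invented, and None stands in for an unbound variable.

```python
# Invented data: three ninja, one of whom has no jutsu recorded.
NINJA_NAMES = {
    "ex:Naruto": "Naruto",
    "ex:Sakura": "Sakura",
    "ex:Lee": "Rock Lee",
}
JUTSU = [
    ("ex:Naruto", "ex:Rasengan"),
    ("ex:Naruto", "ex:ShadowClone"),
    ("ex:Sakura", "ex:Healing"),
]


def required_pattern(names, jutsu):
    """hasJutsu as a plain triple pattern: ninja without jutsu drop out."""
    return [(names[n], j) for n, j in jutsu if n in names]


def optional_pattern(names, jutsu):
    """hasJutsu inside OPTIONAL: every ninja is kept (a left join)."""
    rows = []
    for ninja, name in names.items():
        matches = [j for n, j in jutsu if n == ninja]
        if matches:
            rows.extend((name, j) for j in matches)  # one row per match
        else:
            rows.append((name, None))  # ?j stays unbound
    return rows


for row in optional_pattern(NINJA_NAMES, JUTSU):
    print(row)
```

Rock Lee survives only in the OPTIONAL version, and Naruto still appears twice in both: OPTIONAL keeps rows that would be lost, it does not collapse rows that multiply.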

④ Solution modifiers

Applied after the pattern matching, in this order: GROUP BY (aggregate into groups) → HAVING (filter on aggregated values) → ORDER BY (sort) → OFFSET / LIMIT (paginate). You can use all, some, or none.
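The order matters because each step consumes the output of the one before it. A minimal Python sketch of the same pipeline, over invented (name, jutsu) solutions:

```python
from collections import Counter

# Invented solutions, as if produced by the WHERE clause.
SOLUTIONS = [
    ("Naruto", "Rasengan"), ("Naruto", "Shadow Clone"), ("Naruto", "Sage Mode"),
    ("Sakura", "Healing"),
    ("Kakashi", "Chidori"), ("Kakashi", "Sharingan"),
    ("Sasuke", "Chidori"), ("Sasuke", "Sharingan"),
]


def apply_modifiers(solutions, min_count, offset, limit):
    """Roughly the Python equivalent of:

    SELECT ?name (COUNT(?jutsu) AS ?total) WHERE { ... }
    GROUP BY ?name
    HAVING (COUNT(?jutsu) >= min_count)
    ORDER BY DESC(?total) ?name
    OFFSET offset LIMIT limit
    """
    counts = Counter(name for name, _ in solutions)           # GROUP BY + COUNT
    groups = [(n, c) for n, c in counts.items() if c >= min_count]  # HAVING
    groups.sort(key=lambda g: (-g[1], g[0]))                  # ORDER BY
    return groups[offset:offset + limit]                      # OFFSET / LIMIT


print(apply_modifiers(SOLUTIONS, min_count=2, offset=0, limit=2))
print(apply_modifiers(SOLUTIONS, min_count=2, offset=1, limit=2))
```

Sakura is removed by the HAVING step before any sorting or paging happens, so she can never reappear on a later page.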

The semicolon shorthand

Multiple predicates for the same subject can be chained with semicolons: ?ninja schema:name ?name ; sensemaking:hasJutsu ?jutsu . is equivalent to two separate triple patterns with the same subject. This is Turtle syntax carried into SPARQL WHERE clauses — same rule, same effect.

04 · The four query forms

Same language, four different outputs.

The WHERE clause works identically across all four forms. What changes is what the query does with the matched solutions.

SELECT

Returns a table of variable bindings

The everyday form. Each matched solution becomes one row; each projected variable becomes one column. Most SPARQL tooling defaults to showing SELECT results as a table or JSON object.

ASK

Returns true or false

Tests whether any solution to the WHERE clause exists. Usually at least as cheap as SELECT ... LIMIT 1, because the engine can stop at the first solution, and more readable when existence is the only question. Use it for validation, guards, and assertions.

CONSTRUCT

Returns a new RDF graph

Maps matched solutions onto a triple template to produce new triples. Use it to derive inferred relationships, transform data between vocabularies, or materialize a view of the graph for external consumption.

DESCRIBE

Returns all known facts about a URI

Returns an RDF graph containing everything the endpoint knows about one or more resources. What "everything" means is implementation-defined — Fuseki returns all triples where the URI appears as subject. Good for exploration; less useful when you need a precise result shape.
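For SELECT and ASK, a client asking for JSON receives the standard SPARQL JSON results format. The sketch below parses both shapes in Python; the sample payloads are hand-written for illustration (the Rasengan IRI is made up), not captured from a real endpoint.

```python
import json

# Hand-written sample payloads in the SPARQL 1.1 JSON results format.
SELECT_JSON = """
{
  "head": {"vars": ["name", "jutsu"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Naruto Uzumaki"},
     "jutsu": {"type": "uri",
               "value": "https://sensemaking-ai.com/ns/example#Rasengan"}},
    {"name": {"type": "literal", "value": "Rock Lee"}}
  ]}
}
"""
ASK_JSON = '{"head": {}, "boolean": true}'


def parse_results(text):
    """Return a bool for ASK results, or a list of row dicts for SELECT."""
    doc = json.loads(text)
    if "boolean" in doc:
        return doc["boolean"]
    variables = doc["head"]["vars"]
    return [
        # An unbound variable is simply absent from the binding object.
        {var: (binding[var]["value"] if var in binding else None)
         for var in variables}
        for binding in doc["results"]["bindings"]
    ]


print(parse_results(ASK_JSON))
for row in parse_results(SELECT_JSON):
    print(row)
```

CONSTRUCT and DESCRIBE do not use this format at all: they return an RDF serialization such as Turtle, which needs an RDF parser rather than a JSON one.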

SELECT in depth

The most important thing to understand about SELECT is how solution cardinality works. Each row in the result represents one way of satisfying all the triple patterns simultaneously. If a ninja has two jutsu, the same ninja appears twice — once per jutsu. If you want one row per ninja, you have two options: aggregate with COUNT/GROUP_CONCAT, or remove the jutsu triple pattern and accept that jutsu data isn't in this query.

# Returns N rows — one per (ninja, jutsu) combination
SELECT ?name ?jutsuLabel WHERE {
  ?ninja schema:name ?name ;
         sensemaking:hasJutsu ?j .
  ?j skos:prefLabel ?jutsuLabel .
}

# Returns one row per ninja, jutsu concatenated.
# Grouping by ?ninja as well as ?name keeps two ninja who share a name apart.
SELECT ?name (GROUP_CONCAT(?jutsuLabel; separator=", ") AS ?jutsu)
WHERE {
  ?ninja schema:name ?name .
  OPTIONAL { ?ninja sensemaking:hasJutsu ?j . ?j skos:prefLabel ?jutsuLabel . }
}
GROUP BY ?ninja ?name

ASK in depth

ASK is an assertion, not a lookup. The most important thing to know about it: a result of false does not mean the thing does not exist. Under the open-world assumption, it means the pattern was not satisfied by data in this endpoint at this moment. "No data asserting X" is not the same as "X is false."

# true if any ninja has the Sharingan in this dataset
ASK
WHERE {
  ?ninja sensemaking:hasJutsu sensemaking:Sharingan .
}

# false — but that means only "not asserted here"
ASK
WHERE {
  ?ninja sensemaking:hasJutsu sensemaking:FireStyleJutsu .
}

CONSTRUCT in depth

CONSTRUCT takes a template — triples with variable positions — and fills it in for each matched solution. The result is a new RDF graph, not a table. In Fuseki, switch the result format dropdown to Turtle to read it.

# Infer "colleague" from shared team membership
CONSTRUCT {
  ?a sensemaking:colleagueOf ?b .
}
WHERE {
  ?a sensemaking:memberOfTeam ?team .
  ?b sensemaking:memberOfTeam ?team .
  FILTER (STR(?a) < STR(?b))
}

CONSTRUCT is how you materialize inferred triples that a reasoner would derive — useful when you want to precompute results for performance, or when you're transforming data from one vocabulary to another without running a full reasoner.
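What the colleague query does can be imitated in a few lines of Python: find the solutions, then stamp the template once per solution. This is a sketch with invented team data, not Fuseki's implementation.

```python
# Invented (ninja, team) pairs standing in for memberOfTeam triples.
MEMBERS = [
    ("ex:Naruto", "ex:Team7"),
    ("ex:Sakura", "ex:Team7"),
    ("ex:Sasuke", "ex:Team7"),
    ("ex:Lee", "ex:TeamGuy"),
    ("ex:Neji", "ex:TeamGuy"),
]


def construct_colleagues(members):
    """Fill the template (?a colleagueOf ?b) for every matched solution."""
    graph = set()  # an RDF graph is a set, so duplicate triples collapse
    for a, team_a in members:
        for b, team_b in members:
            # Same team, plus the FILTER (STR(?a) < STR(?b)) condition.
            if team_a == team_b and a < b:
                graph.add((a, "ex:colleagueOf", b))
    return graph


for triple in sorted(construct_colleagues(MEMBERS)):
    print(triple)
```

The string comparison drops self-pairs and mirror-image pairs, so each pair of teammates yields exactly one triple, in one direction only. If you need colleagueOf to read both ways, add a second template line or drop the ordering and filter on ?a != ?b instead.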

DESCRIBE in depth

DESCRIBE is the most exploratory form. Give it a URI and it returns whatever the endpoint considers "relevant" about that resource. Fuseki's default is all triples where the URI is the subject. Some endpoints include incoming triples too (triples where the URI is the object).

# Everything the endpoint knows about Kakashi
DESCRIBE sensemaking:KakashiHatake

# DESCRIBE also accepts WHERE clauses
DESCRIBE ?ninja
WHERE {
  ?ninja sensemaking:senseiOf sensemaking:NarutoUzumaki .
}

DESCRIBE is most useful when you do not yet know what predicates a resource has — it is the "tell me everything" query. Once you know the shape of the data, SELECT is almost always clearer and more precise.
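The two behaviours described above (subject-only versus subject plus incoming triples) differ by a single condition. A toy Python version, with invented triples and a hypothetical ex:ledBy predicate:

```python
# Invented triples; ex:ledBy is a made-up predicate for the incoming case.
TRIPLES = [
    ("ex:KakashiHatake", "rdf:type", "ex:Ninja"),
    ("ex:KakashiHatake", "schema:name", "Kakashi Hatake"),
    ("ex:KakashiHatake", "ex:senseiOf", "ex:NarutoUzumaki"),
    ("ex:Team7", "ex:ledBy", "ex:KakashiHatake"),
    ("ex:NarutoUzumaki", "schema:name", "Naruto Uzumaki"),
]


def describe(resource, triples, include_incoming=False):
    """Return the triples an endpoint might consider 'about' a resource."""
    return [
        t for t in triples
        if t[0] == resource or (include_incoming and t[2] == resource)
    ]


print(len(describe("ex:KakashiHatake", TRIPLES)))
print(len(describe("ex:KakashiHatake", TRIPLES, include_incoming=True)))
```

Because the choice is left to the implementation, the same DESCRIBE query can return different graphs on different endpoints, which is one more reason to switch to SELECT or CONSTRUCT once the shape of the data is known.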

Which form to use

Default to SELECT. Use ASK when you need a boolean check. Use CONSTRUCT when you need to produce new triples or transform data. Use DESCRIBE when exploring unknown data — find out what predicates exist, then write a targeted SELECT.

05 · Three Wikidata queries

Real data, public endpoint, no setup.

These three queries run against Wikidata's public SPARQL endpoint at query.wikidata.org. Copy each one, paste it into the query editor, click Run. Each one also matches an exercise from Exercise 1.1.

Wikidata uses wd: for entity IRIs (wd:Q49108 = MIT) and wdt: for property IRIs (wdt:P69 = "educated at"). The SERVICE wikibase:label block is a Wikidata extension that resolves human-readable labels — without it you see raw IRIs. These conventions are Wikidata-specific; standard SPARQL endpoints do not have them.

WQ1

Books written by Adrian Tchaikovsky, most recent first

Pattern: SELECT · single property · ORDER BY year · OPTIONAL for date

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:  <http://www.bigdata.com/rdf#>

SELECT ?book ?bookLabel ?pubYear WHERE {
  ?book wdt:P50 wd:Q21050585 .  # P50 = author; Q21050585 = Tchaikovsky
  OPTIONAL {
    ?book wdt:P577 ?pubDate .
    BIND(YEAR(?pubDate) AS ?pubYear)
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY DESC(?pubYear)

Notice: ?bookLabel is never defined in the WHERE clause — the label SERVICE populates it automatically from the ?book IRI. Also notice OPTIONAL for ?pubYear: some items lack a publication date, so without OPTIONAL they would drop from the results entirely.

WQ2

People who studied at MIT and have an English Wikipedia article

Pattern: two triple patterns · sitelinks filter · LIMIT

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd:  <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>   # Wikidata's RDF uses the http:// form; the https:// form matches nothing here

SELECT ?person ?personLabel WHERE {
  ?person wdt:P69 wd:Q49108 .   # P69 = educated at; Q49108 = MIT
  ?article schema:about ?person ;
           schema:inLanguage "en" ;
           schema:isPartOf <https://en.wikipedia.org/> .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY ?personLabel
LIMIT 30

WQ3

Occupations in Wikidata whose English label contains "data"

Pattern: FILTER with CONTAINS · LANG filter · rdfs:label

PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?occ ?label WHERE {
  ?occ wdt:P31 wd:Q28640 ;      # P31 = instance of; Q28640 = profession (a kind of occupation)
       rdfs:label ?label .
  FILTER (LANG(?label) = "en")
  FILTER (CONTAINS(LCASE(?label), "data"))
}
ORDER BY ?label

This query uses standard SPARQL's rdfs:label with a LANG filter rather than the label SERVICE — both approaches work on Wikidata. Notice LCASE() before CONTAINS(): without it, "Data scientist" and "data scientist" are different strings. Case normalization is a common SPARQL defensive pattern.

When the Wikidata endpoint times out

Wikidata queries must complete in 60 seconds. Complex patterns without LIMIT, or patterns that hit very large sets (all people, all items), will time out. Always add a LIMIT clause when exploring. The helper UI at query.wikidata.org shows example queries for common domains — modify those rather than starting from scratch.
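A timed-out public query is an ordinary failure that client code has to expect. One possible way to handle it, sketched here in Python, is to try endpoints in order and fall back to a local one that holds the data you need. The fetch function is injected, the Fuseki dataset name (starter) is a placeholder, and fake_fetch only simulates a timeout so the example runs offline.

```python
ENDPOINTS = [
    "https://query.wikidata.org/sparql",
    "http://localhost:3030/starter/sparql",  # placeholder dataset name
]


def query_with_fallback(query, endpoints, fetch):
    """Try each endpoint in order; return (endpoint, result) from the first
    one that answers. fetch(endpoint, query) does the actual HTTP call."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint, query)
        except OSError as exc:  # covers TimeoutError and urllib's URLError
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


def fake_fetch(endpoint, query):
    """Stand-in for a real HTTP call: the public endpoint always times out."""
    if "wikidata" in endpoint:
        raise TimeoutError("query timed out")
    return {"head": {}, "boolean": True}


used, result = query_with_fallback("ASK WHERE { ?s ?p ?o }", ENDPOINTS, fake_fetch)
print(used, result)
```

A fallback only helps if the local store actually contains comparable data, so for exploratory Wikidata queries the cheaper fix is usually a tighter pattern and a LIMIT.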

06 · Pain points

Where SPARQL surprises developers.

The LANG tag trap. If a resource has multilingual labels (schema:name "Olympians"@en, "Caelestes"@la), a query that binds ?name to that predicate without a LANG filter returns one row per label: two rows per resource here. Counts and other aggregates over those rows are inflated to match. Add FILTER (LANG(?name) = "en") when the data has language-tagged literals and you only want one language. The workbook drills this repeatedly.
OPTIONAL is a left join, not an existence check. OPTIONAL { ?ninja sensemaking:hasJutsu ?j } keeps the ninja in the result with ?j unbound if they have no jutsu. But if they have two jutsu, they appear twice — once per jutsu. Row multiplication is the most common OPTIONAL surprise. Use GROUP_CONCAT or restructure to avoid it.
Blank nodes are opaque. Blank node identifiers (_:b0) in query results cannot be referenced in subsequent queries. If your data uses blank nodes for intermediate structure (anonymous Employment events, reification nodes), you will see them in DESCRIBE output but cannot construct further patterns from them without going through their connecting predicates first.
Plain SPARQL does not see OWL inferences. If your ontology declares sensemaking:rivalOf as owl:SymmetricProperty, a plain SPARQL query against the raw data sees only explicitly asserted triples. The symmetric inverse is not there unless you either materialize it with CONSTRUCT, enable an OWL-aware reasoner, or add both directions to the data. Module 3 covers the reasoning layer. For now: what SPARQL sees is what is in the store, nothing more.
Endpoint availability is not guaranteed. Wikidata's public endpoint goes down, rate-limits heavy queries, and has occasional schema changes. Design applications to handle failures. For the curriculum, always have a local Fuseki fallback.
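The first trap in the list, the LANG tag, is easy to reproduce in a few lines of Python. The labels are invented apart from the Olympians pair quoted above, and each tuple stands for one schema:name triple with its language tag.

```python
# (resource, label, language tag): one tuple per schema:name triple.
LABELS = [
    ("ex:Olympians", "Olympians", "en"),
    ("ex:Olympians", "Caelestes", "la"),
    ("ex:Titans", "Titans", "en"),
    ("ex:Titans", "Titanes", "la"),
]


def select_names(labels, lang=None):
    """Rows for ?group schema:name ?name, optionally with a LANG filter."""
    return [
        (resource, label)
        for resource, label, tag in labels
        if lang is None or tag == lang  # FILTER (LANG(?name) = lang)
    ]


print(len(select_names(LABELS)))        # rows without the filter
print(len(select_names(LABELS, "en")))  # rows with the filter
```

There are two resources in the data, but the unfiltered query reports four rows. A COUNT over those rows would claim four groups unless it counts DISTINCT resources or the filter is applied first.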

07 · Next

What to do with this.

If you are working through the curriculum

The Sub-module 1.2 workbook (1-2-workbook-naruto.html or the mythology variant) has five queries that go beyond the 1.1 workbook: DESCRIBE, UNION, property paths, FILTER NOT EXISTS, and VALUES. Start Fuseki first, then work through each query card before opening the "Think about this" prompts. Prediction before execution is the habit that makes SPARQL stick.

If you want to go deeper now

DuCharme's Learning SPARQL chapters 1–2 cover the foundations with more examples than this page has room for. The W3C SPARQL 1.1 spec's overview section is denser but authoritative on edge cases. And the Wikidata query service helper UI has dozens of working example queries to study and modify.

08 · Resources

Curated links.

DuCharme — Learning SPARQL (2nd ed., 2013), Chapters 1–2. The canonical reference for the curriculum. Clear, example-driven, honest about edge cases. learningsparql.com has companion examples; the author's blog at bobdc.com tracks current SPARQL and KG developments.
W3C — SPARQL 1.1 Query Language specification. Authoritative on every edge case. Not a tutorial, but essential for when you hit something surprising. The Grammar section is also the clearest complete description of query syntax. w3.org/TR/sparql11-query/
Wikidata Query Service (query.wikidata.org). Public SPARQL endpoint over one of the largest open knowledge graphs. The built-in example library is the fastest way to learn Wikidata's naming conventions. Open an example, modify it, break it deliberately, fix it.
Allemang, Hendler & Gandon — Semantic Web for the Working Ontologist (3rd ed.), Chapters 1–3. Not a SPARQL tutorial, but provides the mental model that makes SPARQL make sense: why the triple is the only primitive, why open-world reasoning changes what queries mean.
Module 1 README — exercise list and primary project hook. If you have not yet started Exercise 1.3 (Resume Graph RDF slice), this is the moment. The SPARQL queries you have been running on Naruto and mythology data map directly to the three queries in the exercise — same patterns, your own resume data.