This blog post shows how RESTful web services can provide, and link to, provenance data for their exposed resources by using the PROV-AQ mechanism of HTTP Link headers. This is demonstrated by showing how to update a hello world REST service implemented with Java and JAX-RS 2.0 to provide these links.
The PROV-AQ HTTP mechanism is most easily explained by an example:
This request for http://example.com/resource.html returns some HTML, but also provides a Link: header that says that the provenance is located at http://example.com/resource-provenance. Within this file, the resource is known as the anchor http://example.com/resource rather than http://example.com/resource.html. The anchor URI can be omitted if it is the same as the one requested.
Link headers are specified by RFC 5988, which also defines standard relations like rel="previous". PROV-AQ uses rel="http://www.w3.org/ns/prov#has_provenance" to say that the linked resource has the provenance data for the requested resource. PROV-AQ also defines other relations for provenance query services and provenance pingback, which are not covered by this blog post.
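To make the header format concrete, here is a minimal Java sketch that assembles such a Link header value from the example URIs above. The class and method names (ProvLinkHeader, linkValue) are our own illustration and not part of PROV-AQ or of the project code; the sketch also assumes the URIs contain no characters that would need escaping inside the header.

```java
final class ProvLinkHeader {
    /** The PROV-AQ relation URI quoted in the text above. */
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    /**
     * Formats a Link header value that points at a provenance resource.
     * The anchor parameter is left out when anchor is null, i.e. when the
     * provenance describes the requested URI itself.
     */
    static String linkValue(String provenanceUri, String anchor) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(provenanceUri).append('>');
        sb.append("; rel=\"").append(HAS_PROVENANCE).append('"');
        if (anchor != null) {
            sb.append("; anchor=\"").append(anchor).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Header for the resource.html example, with an explicit anchor
        System.out.println("Link: " + linkValue(
                "http://example.com/resource-provenance",
                "http://example.com/resource"));
    }
}
```

Passing null as the anchor gives the shorter form, which is enough when the provenance describes exactly the URI that was requested.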
RESTful web services, or “Web APIs”, are popular ways of exposing structured data on the web, in addition to providing simple ways to programmatically access popular services such as Twitter, Dropbox and Google+. Using REST with RDF forms the foundation for Linked Data, which has grown to become a standardized way to expose and interrelate public datasets, such as from data.gov.uk. The web standardization consortium W3C and its provenance working group have published the PROV family of specifications for describing provenance data, in particular a PROV primer that introduces the PROV data model, and PROV-O, an OWL ontology for using the PROV data model in RDF.
In order to suggest a common way to locate such provenance data for a given web resource, the provenance working group has proposed the note PROV-AQ: Provenance Access and Query (PAQ). This note specifies how to locate provenance for a general HTTP resource, from within an HTML document, or from within an RDF representation. In this blog post we demonstrate the first of these, by using HTTP Link headers.
Provenance from a RESTful service in Java
There are two branches in this project:
- master – REST service that can say hello, and return provenance of greeting
- paq – REST service that also provides link between greeting and its provenance
Below we’ll assume you have checked out the master branch with git:
To compile/run, you will need Java and Maven:
The base URI should be http://localhost:8080/paq/ unless you modify the port with
Check that the HelloWorld REST service is working using your favourite HTTP client (e.g. a browser, or curl in a new terminal window):
You may replace Alice with any name, as long as it is URI-escaped:
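If you are building such URIs from Java rather than typing them by hand, the escaping can be sketched with the JDK alone. This assumes Java 10 or later (for the URLEncoder.encode overload that takes a Charset), and the helper name escapeSegment is ours, not something from the project:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PathEscape {
    /** Percent-encodes a name so it can be used as a single URI path segment. */
    static String escapeSegment(String name) {
        // URLEncoder targets HTML form encoding, which writes a space as '+'.
        // In a path segment a space has to be %20, so we patch that up afterwards.
        // A literal '+' in the input has already become %2B at this point.
        return URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println("http://localhost:8080/paq/hello/" + escapeSegment("Alice Smith"));
    }
}
```

This encodes a few more characters than strictly necessary (for instance ~), which is harmless in a path segment.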
The code for this resource is quite straightforward. From
Note that we used the -i parameter above to verify that the correct media-type text/provenance-notation was returned.
This PROV-N trace is generated by HelloWorld.helloProvenance() by filling in the URIs and name in the template src/main/resources/provTemplate.txt. A more detailed provenance trace might include things like timestamps and details about who provided the name, and might, through content negotiation, be provided in different representations such as PROV-O and PROV-XML.
Our provenance method is a bit more complicated than hello(), as it generates the absolute URIs for the greeting resource (depending on the name parameter) and then builds the PROV-N trace, here using a simple MessageFormat template:
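To illustrate the MessageFormat approach on its own, here is a small self-contained sketch. The template below is a simplified, hypothetical stand-in and not the contents of provTemplate.txt; the greeting text, the hello prefix and the class name ProvTemplateSketch are likewise our assumptions.

```java
import java.text.MessageFormat;

final class ProvTemplateSketch {
    // Hypothetical PROV-N-style template (NOT the project's provTemplate.txt).
    // {0} = base URI of the greeting resources, {1} = the name.
    // MessageFormat treats single quotes and curly braces specially,
    // so the template deliberately avoids them outside the placeholders.
    static final String TEMPLATE = String.join("\n",
            "document",
            "  prefix hello <{0}>",
            "  entity(hello:{1}, [ prov:value=\"Hello, {1}\" ])",
            "endDocument");

    /** Fills the base URI and name into the template. */
    static String provenance(String helloBase, String name) {
        return MessageFormat.format(TEMPLATE, helloBase, name);
    }

    public static void main(String[] args) {
        System.out.println(provenance("http://localhost:8080/paq/hello/", "Alice"));
    }
}
```

Note that an argument such as {1} can be used more than once in the same template, which is handy when the name appears both in the identifier and in the greeting text.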
Providing links to the provenance
A RESTful client that has requested http://localhost:8080/paq/hello/Alice will not magically know that there is a provenance trace at http://localhost:8080/paq/provenance/hello/Alice. The URI for the provenance resource could just as well have been, say, http://localhost:8080/about/history/1337.
As we described above, PROV-AQ says that a resource accessed by HTTP can describe its provenance trace by adding a Link: header with the relation "http://www.w3.org/ns/prov#has_provenance". So in our case, this can be achieved with:
In order to provide the RESTful links we will need to insert Link: headers in the hello() response. As we need to return both the greeting and HTTP headers, we change our return type to Response:
We’ll inject the same @Context UriInfo ui parameter as in the provenance method, in order to find the absolute URIs needed for the provenance link:
and then build a new Link instance
This uses the fixed URI for the provenance relation:
Finally we include the new Link: header by adding provLink to the response builder before returning:
The PROV-AQ-enhanced version of hello() should then look like:
Finding the provenance links
If you have not followed the tutorial above, make sure you check out the paq branch from https://github.com/stain/paq to include the PROV-AQ Link headers. If you are still running the web server from above, stop it with Ctrl-C.
Now change to the paq branch and restart the server:
Now retrieving the hello world resource with -i should show us the new Link: header:
To verify that we generated the absolute URIs correctly above, we follow the link:
Let’s try a hackish shell script to extract this URL:
Note, the above will not work if the Link header spans multiple lines, which would be legal according to HTTP 1.1 and RFC 5988.
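For comparison, a slightly more forgiving extraction can be sketched in Java with regular expressions. This is our own illustration, not part of the project: the class name ProvLinkFinder and its method are hypothetical, it ignores the anchor parameter, and it is a sketch rather than a complete RFC 5988 parser. It does cope with several Link headers, several links in one header, and whitespace (including line breaks) between parameters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProvLinkFinder {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    // One link-value: <target> followed by all of its ;name=value parameters
    private static final Pattern LINK = Pattern.compile(
            "<([^>]*)>((?:\\s*;\\s*[\\w*-]+\\s*(?:=\\s*(?:\"[^\"]*\"|[^\\s;,\"]+))?)*)");
    // One parameter: ;name="quoted value" or ;name=token
    private static final Pattern PARAM = Pattern.compile(
            ";\\s*([\\w*-]+)\\s*(?:=\\s*(?:\"([^\"]*)\"|([^\\s;,\"]+)))?");

    /** Returns the target of the first link whose rel includes has_provenance. */
    static Optional<String> provenanceUri(List<String> linkHeaders) {
        // Several Link headers are equivalent to one comma-separated header
        Matcher link = LINK.matcher(String.join(", ", linkHeaders));
        while (link.find()) {
            Matcher param = PARAM.matcher(link.group(2));
            while (param.find()) {
                if (!param.group(1).equalsIgnoreCase("rel")) {
                    continue;
                }
                String value = param.group(2) != null ? param.group(2) : param.group(3);
                // rel may list several space-separated relation types
                if (value != null
                        && Arrays.asList(value.trim().split("\\s+")).contains(HAS_PROVENANCE)) {
                    return Optional.of(link.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
                "<http://localhost:8080/paq/provenance/hello/Alice>; rel=\"" + HAS_PROVENANCE + "\"");
        System.out.println(provenanceUri(headers).orElse("no provenance link"));
    }
}
```

With the JDK's java.net.http.HttpClient (Java 11 or later), the input list could come from response.headers().allValues("Link").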
Running this against the Alice helloworld URI should look up the PROV-AQ header, download the provenance, and then generate an SVG diagram using
An example of this diagram for Alice, converted to PNG: