Locating provenance for a RESTful web service

This blog post shows how RESTful web services can provide, and link to, provenance data for their exposed resources by using the PROV-AQ mechanism of HTTP Link headers. We demonstrate this by updating a Hello World REST service, implemented in Java with JAX-RS 2.0, to provide these links.

The PROV-AQ HTTP mechanism is most easily explained by an example. A request for http://example.com/resource.html returns some HTML, but the response also includes a Link: header such as:

  Link: <http://example.com/resource-provenance>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="http://example.com/resource"

This header says that the provenance is located at http://example.com/resource-provenance. Within that provenance document, the resource is known by the anchor http://example.com/resource rather than http://example.com/resource.html. The anchor URI can be omitted if it is the same as the requested URI.

Link headers are specified by RFC 5988, which also defines standard relations like rel="previous". PROV-AQ uses rel="http://www.w3.org/ns/prov#has_provenance" to say that the linked resource holds the provenance data for the requested resource. PROV-AQ also defines other relations, for provenance query services and provenance pingback, which are not covered by this blog post.
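As a rough illustration of that header syntax, here is a minimal Java sketch that formats a PROV-AQ Link header value from a provenance URI and an optional anchor. The class ProvAqLink and its method are hypothetical names for this sketch, and it assumes the URIs are already absolute and correctly escaped.

```java
import java.net.URI;

/** Hypothetical helper: builds a PROV-AQ Link header value in RFC 5988 syntax. */
final class ProvAqLink {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    /**
     * @param provenance URI of the provenance document
     * @param requested  URI the client actually requested
     * @param anchor     URI the resource is known by inside the provenance, or null
     */
    static String headerValue(URI provenance, URI requested, URI anchor) {
        StringBuilder sb = new StringBuilder()
                .append('<').append(provenance).append(">; rel=\"")
                .append(HAS_PROVENANCE).append('"');
        // The anchor parameter may be omitted when it equals the requested URI
        if (anchor != null && !anchor.equals(requested)) {
            sb.append("; anchor=\"").append(anchor).append('"');
        }
        return sb.toString();
    }
}
```

With the example URIs above, headerValue(resource-provenance, resource.html, resource) yields exactly the Link: value shown earlier, while passing the requested URI as the anchor drops the anchor parameter.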

Background

RESTful web services, or “Web APIs”, are a popular way of exposing structured data on the web, and also provide simple programmatic access to popular services such as Twitter, Dropbox and Google+. Using REST with RDF forms the foundation for Linked Data, which has grown to become a standardized way to expose and interrelate public datasets, such as those from data.gov.uk. The web standardization consortium W3C and its Provenance Working Group have published the PROV family of specifications for describing provenance data, in particular the PROV Primer, which introduces the PROV data model, and PROV-O, an OWL ontology for using the PROV data model in RDF.

In order to suggest a common way to locate such provenance data for a given web resource, the Provenance Working Group has proposed the note PROV-AQ: Provenance Access and Query (PAQ). This note specifies how to locate provenance for a resource accessed by HTTP, from within an HTML document, or from within an RDF representation. In this blog post we demonstrate the first of these, using HTTP Link headers.

Provenance from a RESTful service in Java

The GitHub project https://github.com/stain/paq/ contains a simple Hello World service implemented in Java using JAX-RS 2.0, backed by Apache CXF.

There are two branches in this project:

  • master – REST service that can say hello and return the provenance of the greeting
  • paq – REST service that also provides a link between the greeting and its provenance

Below we’ll assume you have checked out the master branch with git:

To compile/run, you will need Java and Maven:

The base URI should be http://localhost:8080/paq/ unless you change the port with mvn -Djetty.port=9999.

Check that the HelloWorld REST service is working using your favourite HTTP client (e.g. a browser, or curl in a new terminal window):

You may replace Alice with any name, as long as it is URI escaped:
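If you build such URIs from Java, note that java.net.URLEncoder targets HTML form encoding, where a space becomes +, which is not what a URI path expects. A minimal sketch of escaping a name as a single path segment, assuming UTF-8 and the default base URI; GreetingUris is a hypothetical helper, not part of the paq project:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper for building greeting URIs. */
final class GreetingUris {
    /** Percent-encodes a name (as UTF-8) for use as one URI path segment. */
    static String pathSegment(String name) {
        // URLEncoder encodes '/' and a literal '+' (as %2F and %2B), but turns
        // spaces into '+', which is only a space in form data; switch to %20.
        return URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Resolves hello/{name} against the service base URI, e.g. http://localhost:8080/paq/ */
    static URI helloUri(URI base, String name) {
        return base.resolve("hello/" + pathSegment(name));
    }
}
```

For example, helloUri(URI.create("http://localhost:8080/paq/"), "Alice Smith") gives http://localhost:8080/paq/hello/Alice%20Smith.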

The code for this resource is quite straightforward. From HelloWorld.hello():

Provenance resource

This example service provides provenance using the PROV-N format:

Note that we used the -i parameter above to verify that the correct media-type text/provenance-notation was returned.

This provenance says that the resource http://localhost:8080/paq/hello/Alice was derived from a name with value “Alice”, and made by the (web) service http://localhost:8080/paq/hello.

This PROV-N trace is generated by HelloWorld.helloProvenance() by filling in the URIs and name in the template src/main/resources/provTemplate.txt. A more detailed provenance trace might include things like timestamps and details about who provided the name, and might, through content negotiation, be provided in different representations such as PROV-O and PROV-XML.

Our provenance method is a bit more complicated than hello(), as it generates the absolute URIs for the greeting resource (depending on the name parameter) and then builds the PROV-N trace, here using a simple MessageFormat template:
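As a hedged sketch only, a PROV-N template filled with java.text.MessageFormat could look like the following. The template text, the paq: prefix, the paq:name/{1} entity and the use of wasAttributedTo for "made by" are all assumptions for illustration, not the contents of provTemplate.txt. MessageFormat treats braces and single quotes specially, so the template avoids apostrophes.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.text.MessageFormat;

/** Hypothetical sketch: fills a PROV-N template with MessageFormat. */
final class ProvTemplate {
    // Assumed template, NOT the project's provTemplate.txt. {0} = service base URI,
    // {1} = URI-escaped name, {2} = name as a PROV-N string literal.
    static final String TEMPLATE = String.join("\n",
            "document",
            "  prefix paq <{0}>",
            "  entity(paq:hello/{1})",
            "  entity(paq:name/{1}, [prov:value=\"{2}\"])",
            "  agent(paq:hello)",
            "  wasDerivedFrom(paq:hello/{1}, paq:name/{1})",
            "  wasAttributedTo(paq:hello/{1}, paq:hello)",
            "endDocument",
            "");

    static String provenance(URI base, String name) {
        // Percent-encode the name so it fits both a URI path and a PROV-N local name
        String escaped = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        // Escape backslashes and double quotes inside the PROV-N string literal
        String literal = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return MessageFormat.format(TEMPLATE, base, escaped, literal);
    }
}
```

With base http://localhost:8080/paq/ and name "Alice", the entity paq:hello/Alice expands to http://localhost:8080/paq/hello/Alice and paq:hello to http://localhost:8080/paq/hello, matching the trace described above.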

Providing links to the provenance

A RESTful client that has requested http://localhost:8080/paq/hello/Alice will not magically know that there is a provenance trace at http://localhost:8080/paq/provenance/hello/Alice; the URI for the provenance resource could just as well have been, say, http://localhost:8080/about/history/1337.

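As described above, PROV-AQ says that a resource accessed by HTTP can point to its provenance trace by adding a Link: header with the relation "http://www.w3.org/ns/prov#has_provenance". So in our case, the response for http://localhost:8080/paq/hello/Alice should include:

  Link: <http://localhost:8080/paq/provenance/hello/Alice>; rel="http://www.w3.org/ns/prov#has_provenance"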
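In the JAX-RS service, UriInfo supplies the absolute URIs. Outside JAX-RS, the same mapping from greeting URI to provenance URI can be sketched with plain java.net.URI, assuming that provenance resources mirror the greeting paths under provenance/ as in the URIs above. ProvenanceUris is a hypothetical helper, not the project's code:

```java
import java.net.URI;

/** Hypothetical helper mirroring the service's URI layout with plain java.net.URI. */
final class ProvenanceUris {
    /**
     * Maps a resource under the service base URI to its provenance URI, e.g.
     * http://localhost:8080/paq/hello/Alice -> http://localhost:8080/paq/provenance/hello/Alice
     */
    static URI provenanceFor(URI base, URI resource) {
        // relativize() returns the resource unchanged if it is not under base
        URI relative = base.relativize(resource);
        if (!resource.isAbsolute() || relative.isAbsolute()) {
            throw new IllegalArgumentException(resource + " is not under " + base);
        }
        // Use the raw (still percent-encoded) path so escaped names survive
        return base.resolve("provenance/" + relative.getRawPath());
    }
}
```

Keeping the raw path matters for names like "Alice Smith": the provenance URI then ends in hello/Alice%20Smith rather than an unescaped space.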

In order to provide the RESTful links, we need to insert Link: headers into the hello() response. As we need to return both the greeting and HTTP headers, we change the return type to a Response:

We’ll inject the same @Context UriInfo ui parameter as in helloProvenance() in order to find the absolute URIs needed for the provenance link:

and then build a new Link instance, provLink:

This uses the fixed URI for the provenance relation:

Finally, we include the new Link: header by adding provLink to the response builder before returning:

The prov-aq-enhanced version of hello() should then look like:

You may check out the paq branch to see the prov-aq-enhanced HelloWorld.java.
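For readers without a JAX-RS stack at hand, the same technique can be sketched with the JDK's built-in com.sun.net.httpserver server instead of CXF. This is not the code from the paq branch: the class name, the greeting text and the use of the Host header to build absolute URIs are assumptions, and only the greeting resource is served (not the provenance resource).

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: serves /paq/hello/{name} and adds a PROV-AQ Link header. */
final class PaqHelloServer {
    static final String PREFIX = "/paq/hello/";
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    /** Starts the server; pass port 0 to let the OS pick a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(PREFIX, PaqHelloServer::hello);
        server.start();
        return server;
    }

    private static void hello(HttpExchange exchange) throws IOException {
        URI requestUri = exchange.getRequestURI();
        String rawName = requestUri.getRawPath().substring(PREFIX.length()); // still percent-encoded
        String name = requestUri.getPath().substring(PREFIX.length());       // decoded, for the greeting
        if (name.isEmpty()) {
            exchange.sendResponseHeaders(404, -1); // -1: no response body
            exchange.close();
            return;
        }
        // Absolute URIs, built from the Host header the client sent (an assumption of this sketch)
        String host = exchange.getRequestHeaders().getFirst("Host");
        URI self = URI.create("http://" + (host != null ? host : "localhost") + requestUri.getRawPath());
        URI provenance = self.resolve("/paq/provenance/hello/" + rawName);

        byte[] body = ("Hello, " + name + "!\n").getBytes(StandardCharsets.UTF_8); // hypothetical greeting
        exchange.getResponseHeaders().add("Content-Type", "text/plain; charset=UTF-8");
        exchange.getResponseHeaders().add("Link", "<" + provenance + ">; rel=\"" + HAS_PROVENANCE + "\"");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```

Started with PaqHelloServer.start(8080), a request for http://localhost:8080/paq/hello/Alice should carry a Link: header pointing at http://localhost:8080/paq/provenance/hello/Alice, the same header the JAX-RS version produces for this URI layout.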

Finding the provenance links

If you have not followed the tutorial above, make sure you check out the paq branch from https://github.com/stain/paq to include the PROV-AQ Link headers. If you are still running the web server from above, stop it with Ctrl-C.

Now change to the paq branch and restart the server:

Now retrieving the hello world resource with -i should show us the new Link: header:

To verify that we generated our absolute URIs correctly, we follow the link:

Let’s try some hackish shell scripting to extract this URL:

Note that the above will not work if the Link header spans multiple lines, which would be legal according to HTTP 1.1 and RFC 5988.
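A client written in Java can avoid some of that fragility by parsing the header value properly. The sketch below is a minimal, assumption-laden RFC 5988 parser: it handles several comma-separated links, quoted parameter values that contain commas or semicolons, and space-separated rel lists, and it resolves relative targets against the request URI, but it ignores the anchor parameter and other edge cases. ProvLinkFinder is a hypothetical name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal RFC 5988 Link header parser: just enough to find PROV-AQ has_provenance targets. */
final class ProvLinkFinder {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    /** Returns the provenance URIs in one Link header value, resolved against the request URI. */
    static List<URI> findProvenance(String header, URI requestUri) {
        List<URI> found = new ArrayList<>();
        int n = header.length();
        int i = 0;
        while (true) {
            int open = header.indexOf('<', i);
            int close = open < 0 ? -1 : header.indexOf('>', open);
            if (close < 0) break; // no further well-formed link-value
            String target = header.substring(open + 1, close).trim();
            i = close + 1;
            String rel = null;
            // Parse "; name=value" parameters up to the next ',' (or the end)
            while (true) {
                while (i < n && Character.isWhitespace(header.charAt(i))) i++;
                if (i >= n || header.charAt(i) != ';') break;
                int start = ++i;
                while (i < n && "=;,".indexOf(header.charAt(i)) < 0) i++;
                String name = header.substring(start, i).trim().toLowerCase(Locale.ROOT);
                StringBuilder value = new StringBuilder();
                if (i < n && header.charAt(i) == '=') {
                    i++;
                    while (i < n && Character.isWhitespace(header.charAt(i))) i++;
                    if (i < n && header.charAt(i) == '"') {
                        // quoted-string: may contain ',' and ';', plus backslash escapes
                        for (i++; i < n && header.charAt(i) != '"'; i++) {
                            if (header.charAt(i) == '\\' && i + 1 < n) i++;
                            value.append(header.charAt(i));
                        }
                        i++; // skip the closing quote
                    } else {
                        while (i < n && ";,".indexOf(header.charAt(i)) < 0) value.append(header.charAt(i++));
                    }
                }
                // RFC 5988: occurrences of rel after the first one must be ignored
                if (name.equals("rel") && rel == null) rel = value.toString().trim();
            }
            if (rel != null) {
                // rel may hold several space-separated relation types
                for (String type : rel.split("\\s+")) {
                    if (type.equalsIgnoreCase(HAS_PROVENANCE)) {
                        found.add(requestUri.resolve(target));
                        break;
                    }
                }
            }
        }
        return found;
    }
}
```

With java.net.http.HttpClient, a response may carry several Link headers; response.headers().allValues("Link") returns all of them, so call findProvenance on each value.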

If you have ProvToolbox installed, you can also generate a diagram. The shell script paq2svg.sh below assumes that toolbox-0.1.3-release.zip was unzipped into $HOME/software/provToolbox:

Running this against the Alice Hello World URI should look up the PROV-AQ Link header, download the provenance, and then generate an SVG diagram using provconvert:

An example of this diagram for Alice, converted to PNG:
