Locating provenance for a RESTful web service

This blog post shows how RESTful web services can provide, and link to, provenance data for their exposed resources by using the PROV-AQ mechanism of HTTP Link headers. This is demonstrated by showing how to update a hello world REST service implemented with Java and JAX-RS 2.0 to provide these links.

The PROV-AQ HTTP mechanism is most easily explained by an example:

GET http://example.com/resource.html HTTP/1.1
Accept: text/html

HTTP/1.1 200 OK
Content-type: text/html
Link: <http://example.com/resource-provenance>;
  rel="http://www.w3.org/ns/prov#has_provenance";
  anchor="http://example.com/resource"

<html>
<!-- … -->
</html>

This request for http://example.com/resource.html returns some HTML, but also provides a Link: header saying that the provenance is located at http://example.com/resource-provenance. Within that provenance, the resource is identified by the anchor URI http://example.com/resource rather than http://example.com/resource.html. The anchor parameter can be omitted if it is the same as the requested URI.

Link headers are specified by RFC 5988, which also defines standard relations like rel="previous". PROV-AQ uses rel="http://www.w3.org/ns/prov#has_provenance" to say that the linked resource has the provenance data for the requested resource. PROV-AQ also defines other relations for provenance query services and provenance pingback, which are not covered by this blog post.
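
To make the roles of the link target, rel and anchor concrete, here is a minimal client-side sketch in plain Java. The class ProvenanceLink and its fields are hypothetical names made up for this illustration (they are not part of PROV-AQ or of the example project), and the regular expressions only cover the simple header shapes shown in this post, not every corner of RFC 5988:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: interprets a Link header value as a PROV-AQ provenance link. */
final class ProvenanceLink {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    // One link: <target> followed by everything up to the next "<" (its parameters).
    // Limitation: a "<" inside a quoted parameter value would confuse this.
    private static final Pattern LINK = Pattern.compile("<([^>]*)>([^<]*)");
    // One parameter: ;name="quoted value" or ;name=token
    private static final Pattern PARAM =
            Pattern.compile(";\\s*([\\w*-]+)\\s*=\\s*(?:\"([^\"]*)\"|([^;,\\s]*))");

    final URI subject;     // resource the provenance is about (the anchor, if given)
    final URI provenance;  // where the provenance can be retrieved

    private ProvenanceLink(URI subject, URI provenance) {
        this.subject = subject;
        this.provenance = provenance;
    }

    /** Returns the first has_provenance link in the header value, if any. */
    static Optional<ProvenanceLink> parse(URI requestUri, String headerValue) {
        Matcher link = LINK.matcher(headerValue);
        while (link.find()) {
            String rel = "";
            String anchor = null;
            Matcher param = PARAM.matcher(link.group(2));
            while (param.find()) {
                String value = param.group(2) != null ? param.group(2) : param.group(3);
                if (param.group(1).equalsIgnoreCase("rel")) rel = value;
                if (param.group(1).equalsIgnoreCase("anchor")) anchor = value;
            }
            // rel may hold several space-separated relation types
            if (Arrays.asList(rel.trim().split("\\s+")).contains(HAS_PROVENANCE)) {
                // No anchor means the provenance is about the requested URI itself
                URI subject = anchor == null ? requestUri : requestUri.resolve(anchor);
                URI target = requestUri.resolve(link.group(1).trim());
                return Optional.of(new ProvenanceLink(subject, target));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://example.com/resource.html");
        String header = "<http://example.com/resource-provenance>; "
                + "rel=\"http://www.w3.org/ns/prov#has_provenance\"; "
                + "anchor=\"http://example.com/resource\"";
        ProvenanceLink found = parse(requested, header).get();
        System.out.println("Provenance of " + found.subject + " is at " + found.provenance);
    }
}
```

Relative targets and anchors are resolved against the requested URI, so the sketch should also cope with a service that sends a relative link target.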

Background

RESTful web services, or “Web APIs”, are a popular way of exposing structured data on the web, in addition to providing simple ways to programmatically access popular services such as Twitter, Dropbox and Google+. Using REST with RDF forms the foundation for Linked Data, which has grown to become a standardized way to expose and interrelate public datasets, such as those from data.gov.uk. The web standardization consortium W3C and its provenance working group have published the PROV family of specifications for describing provenance data, in particular a PROV primer that introduces the PROV data model, and PROV-O, an OWL ontology for using the PROV data model in RDF.

In order to suggest a common way to locate such provenance data for a given web resource, the provenance working group has proposed the note PROV-AQ: Provenance access and query (PAQ). This note specifies how to locate provenance for a general HTTP resource, from within an HTML document, or from within an RDF representation. In this blog post we demonstrate the first of these, by using HTTP Link headers.

Provenance from a RESTful service in Java

The GitHub project https://github.com/stain/paq/ contains a simple Hello World service implemented in Java using JAX-RS 2.0, backed by the CXF library.

There are two branches in this project:

  • master – REST service that can say hello, and return provenance of greeting
  • paq – REST service that also provides link between greeting and its provenance

Below we’ll assume you have checked out the master branch with git:

stain@ralph-ubuntu:~/src$ git clone https://github.com/stain/paq.git
Cloning into 'paq'...
remote: Counting objects: 127, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 127 (delta 26), reused 125 (delta 24)
Receiving objects: 100% (127/127), 35.72 KiB, done.
Resolving deltas: 100% (26/26), done.

To compile/run, you will need Java and Maven:

stain@ralph-ubuntu:~/src/paq$ mvn clean jetty:run
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Example PROV-AQ usage 0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
(..)
INFO: Root WebApplicationContext: initialization completed in 2462 ms
2013-03-27 16:22:04.566:INFO::Started SelectChannelConnector@0.0.0.0:8080
[INFO] Started Jetty Server
[INFO] Starting scanner at interval of 10 seconds.

The base URI should be http://localhost:8080/paq/ unless you change the port, for example with mvn -Djetty.port=9999 jetty:run.

Check that the HelloWorld REST service is working using your favourite HTTP client (e.g. a browser, or curl in a new terminal window):

stain@ralph-ubuntu:~/src/paq$ curl http://localhost:8080/paq/hello/Alice
Hello, Alice

You may replace Alice with any name, as long as it is URI-escaped:

stain@ralph-ubuntu:~/src/paq$ curl http://localhost:8080/paq/hello/Joe%20Bloggs
Hello, Joe Bloggs
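
If you are calling the service from Java rather than curl, the escaping can be left to java.net.URI. The sketch below is only an illustration: the class HelloUri is a hypothetical helper, and the host and port are the defaults assumed in this post.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch only: builds the greeting URI for a name, escaping it where needed. */
final class HelloUri {
    static URI forName(String name) {
        try {
            // The multi-argument constructor percent-encodes characters that are
            // illegal in a path, such as spaces. Caveat: "/" is legal in a path
            // and is therefore NOT escaped, so names containing "/" need more care.
            return new URI("http", "localhost:8080", "/paq/hello/" + name, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI for name: " + name, e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/paq/hello/Joe%20Bloggs
        System.out.println(forName("Joe Bloggs").toASCIIString());
    }
}
```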

The code for this resource is quite straightforward. From HelloWorld.hello():

@GET
@Path("hello/{name}")
@Produces("text/plain")
public String hello(@PathParam("name") String name) {
    String greeting = "Hello, " + name + "\n";
    return greeting;
}

Provenance resource

This example service provides provenance, using the PROV-N format:

stain@ralph-ubuntu:~/src/paq$ curl -i http://localhost:8080/paq/provenance/hello/Alice
HTTP/1.1 200 OK
Content-Type: text/provenance-notation
Date: Wed, 27 Mar 2013 16:27:14 GMT
Content-Length: 305
Server: Jetty(6.1.26)

document
prefix hello <http://localhost:8080/paq/hello/>
prefix app <http://localhost:8080/paq/>
entity(hello:Alice)
wasDerivedFrom(hello:Alice, name)
entity(name, [ prov:value="Alice" ])
agent(app:hello, [ prov:type=prov:SoftwareAgent ])
wasAttributedTo(hello:Alice, app:hello)
endDocument

Note that we used the -i parameter above to verify that the correct media-type text/provenance-notation was returned.

This provenance says that the resource http://localhost:8080/paq/hello/Alice was derived from a name with value “Alice”, and made by the (web) service http://localhost:8080/paq/hello.

This PROV-N trace is generated by HelloWorld.helloProvenance() by filling in the URIs and name in the template src/main/resources/provTemplate.txt. A more detailed provenance trace might include things like timestamps and details about who provided the name, and might, through content negotiation, be provided in different representations such as PROV-O and PROV-XML.

Our provenance method is a bit more complicated than hello() as it generates the absolute URIs for the greeting resource (depending on the name parameter) and then builds the PROV-N trace, here using a simple MessageFormat template:

@GET
@Path("provenance/hello/{name}")
@Produces("text/provenance-notation")
public String helloProvenance(@PathParam("name") String name,
        @Context UriInfo ui) throws IOException {
    // Get our absolute URI
    // See http://cxf.apache.org/docs/jax-rs-basics.html#JAX-RSBasics-URIcalculationusingUriInfoandUriBuilder
    UriBuilder appUri = ui.getBaseUriBuilder();
    // Absolute URIs for resources we are to give provenance about
    URI helloURI = appUri.path(getClass(), "hello").build(name);
    // Prepare prefixes for PROV-N qualified names
    // (appUri now also carries the hello/{name} path, so step up one level)
    URI appURI = appUri.build("").resolve("../");
    URI helloPrefix = helloURI.resolve("./");
    // The PROV-N qualified name for our /hello/{name} resource
    String helloEntity = "hello:" + helloPrefix.relativize(helloURI);
    // Simple PROV-N trace, see <http://www.w3.org/TR/prov-n/>
    // Here this is done in a naive way by loading a template
    // from src/main/resources and doing a string-replace to insert
    // our URIs.
    String template = IOUtils.toString(getClass().getResourceAsStream("/provTemplate.txt"));
    String prov = MessageFormat.format(template,
            helloPrefix, appURI, helloEntity, name);
    // Note: PROV-N should be built using, say, the PROV Toolbox
    // rather than this naive template approach!
    return prov;
}
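
The URI arithmetic and the template step can be tried outside the web service with nothing but the JDK. In the sketch below the class ProvTemplateDemo is hypothetical, and TEMPLATE is only a guess at what provTemplate.txt might contain, reconstructed from the PROV-N output shown earlier; the real file is not reproduced in this post.

```java
import java.net.URI;
import java.text.MessageFormat;

/** Sketch only: derives the PROV-N prefixes from a greeting URI and fills a template. */
final class ProvTemplateDemo {
    // Assumed stand-in for src/main/resources/provTemplate.txt.
    // {0} = hello prefix, {1} = app prefix, {2} = qualified entity name, {3} = name
    static final String TEMPLATE =
          "document\n"
        + "prefix hello <{0}>\n"
        + "prefix app <{1}>\n"
        + "entity({2})\n"
        + "wasDerivedFrom({2}, name)\n"
        + "entity(name, [ prov:value=\"{3}\" ])\n"
        + "agent(app:hello, [ prov:type=prov:SoftwareAgent ])\n"
        + "wasAttributedTo({2}, app:hello)\n"
        + "endDocument\n";

    static String provenanceFor(URI helloUri, String name) {
        URI helloPrefix = helloUri.resolve("./");   // e.g. .../paq/hello/
        URI appPrefix = helloUri.resolve("../");    // e.g. .../paq/
        // relativize() strips the prefix again, leaving just the last path segment
        String helloEntity = "hello:" + helloPrefix.relativize(helloUri);
        return MessageFormat.format(TEMPLATE, helloPrefix, appPrefix, helloEntity, name);
    }

    public static void main(String[] args) {
        URI helloUri = URI.create("http://localhost:8080/paq/hello/Alice");
        System.out.print(provenanceFor(helloUri, "Alice"));
    }
}
```

One thing to keep in mind with MessageFormat is that single quotes and curly braces are special characters in the pattern, so a template containing a literal apostrophe would need it doubled. This is one more reason the comment in the method above recommends a proper PROV library over string templates.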

Providing links to the provenance

A RESTful client that has requested http://localhost:8080/paq/hello/Alice will not magically know that there is a provenance trace at http://localhost:8080/paq/provenance/hello/Alice – the URI for the provenance resource could just as well have been, say, http://localhost:8080/about/history/1337.

As described above, PROV-AQ says that a resource accessed over HTTP can point to its provenance trace by adding a Link: header with the relation "http://www.w3.org/ns/prov#has_provenance". So in our case, this can be achieved with:

Link: <http://localhost:8080/paq/provenance/hello/Alice>; rel="http://www.w3.org/ns/prov#has_provenance"

In order to provide the RESTful links we will need to insert Link: headers in the hello() response. As we need to return both the greeting and HTTP headers, we change the return type to Response:

@GET
@Path("hello/{name}")
@Produces("text/plain")
public Response hello(@PathParam("name") String name) {
    String greeting = "Hello, " + name + "\n";
    ResponseBuilder responseBuilder = Response.ok().entity(greeting);
    return responseBuilder.build();
}

We’ll inject the same @Context UriInfo ui parameter as in helloProvenance() in order to find the absolute URI needed for the provenance link:

public Response hello(@PathParam("name") String name, @Context UriInfo ui) {
    URI provUri = ui.getBaseUriBuilder().path(getClass(), "helloProvenance").build(name);

and then build a new Link instance, provLink:

Link provLink = Link.fromUri(provUri).rel(HAS_PROVENANCE).build();

This uses the fixed URI for the provenance relation:

private static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

Finally, we include the new Link: header by adding provLink to the response builder before returning:

return responseBuilder.header(HttpHeaders.LINK, provLink).build();

The PROV-AQ-enhanced version of hello() should then look like:

private static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

@GET
@Path("hello/{name}")
@Produces("text/plain")
public Response hello(@PathParam("name") String name, @Context UriInfo ui) {
    // TODO: Could have used Link.fromResourceMethod instead,
    // but it seems to return wrong URI in current CXF
    URI provUri = ui.getBaseUriBuilder().path(getClass(), "helloProvenance").build(name);
    Link provLink = Link.fromUri(provUri).rel(HAS_PROVENANCE).build();
    String greeting = "Hello, " + name + "\n";
    ResponseBuilder responseBuilder = Response.ok().entity(greeting);
    return responseBuilder.header(HttpHeaders.LINK, provLink).build();
}

You may check out the paq branch to see the PROV-AQ-enhanced HelloWorld.java.
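
If your service is not built on JAX-RS 2.0, the header value is simple enough to put together by hand. The following is a sketch under that assumption; the class LinkHeaderValue and its method are hypothetical names, and the output follows the quoted style of the HTTP example at the top of this post:

```java
import java.net.URI;

/** Sketch only: serialises a PROV-AQ provenance link as a Link header value. */
final class LinkHeaderValue {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    /**
     * @param provenance where the provenance can be retrieved
     * @param anchor     the resource the provenance is about,
     *                   or null if that is simply the requested URI
     */
    static String hasProvenance(URI provenance, URI anchor) {
        StringBuilder value = new StringBuilder();
        value.append('<').append(provenance.toASCIIString()).append('>');
        value.append("; rel=\"").append(HAS_PROVENANCE).append('"');
        if (anchor != null) {
            value.append("; anchor=\"").append(anchor.toASCIIString()).append('"');
        }
        return value.toString();
    }

    public static void main(String[] args) {
        URI prov = URI.create("http://localhost:8080/paq/provenance/hello/Alice");
        System.out.println("Link: " + hasProvenance(prov, null));
    }
}
```

Taking java.net.URI arguments rather than strings means characters such as > or " cannot sneak into the header value, as they are not legal in a URI.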

Finding the provenance links

If you have not followed the tutorial above, make sure you check out the paq branch from https://github.com/stain/paq to include the PROV-AQ Link headers. If you are still running the web server from above, stop it with Ctrl-C.

Now change to the paq branch and restart the server:

stain@ralph-ubuntu:~/src/paq$ git checkout paq
Switched to branch 'paq'
stain@ralph-ubuntu:~/src/paq$ mvn clean jetty:run
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Example PROV-AQ usage 0.1-SNAPSHOT
(..)
2013-03-27 16:56:29.565:INFO::Started SelectChannelConnector@0.0.0.0:8080
[INFO] Started Jetty Server
[INFO] Starting scanner at interval of 10 seconds.

Now retrieving the hello world resource with -i should show us the new Link: header:

stain@ralph-ubuntu:~/src/paq$ curl -i http://localhost:8080/paq/hello/Alice
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Wed, 27 Mar 2013 16:59:46 GMT
Link: <http://localhost:8080/paq/provenance/hello/Alice>;rel=http://www.w3.org/ns/prov#has_provenance
Content-Length: 13
Server: Jetty(6.1.26)

Hello, Alice

To verify that we generated the absolute URI correctly above, we follow the link:

stain@ralph-ubuntu:~/src/paq$ curl http://localhost:8080/paq/provenance/hello/Alice
document
prefix hello <http://localhost:8080/paq/hello/>
prefix app <http://localhost:8080/paq/>
entity(hello:Alice)
wasDerivedFrom(hello:Alice, name)
entity(name, [ prov:value="Alice" ])
agent(app:hello, [ prov:type=prov:SoftwareAgent ])
wasAttributedTo(hello:Alice, app:hello)
endDocument

Let’s try a hackish shell pipeline to extract this URL:

stain@ralph-ubuntu:~/src/paq$ curl -s -I http://localhost:8080/paq/hello/Alice |
    grep '^Link:.*has_provenance' | sed 's/.*<//' | sed 's/>.*//'
http://localhost:8080/paq/provenance/hello/Alice

Note that the above will not work if the Link header spans multiple lines, which would be legal according to HTTP 1.1 and RFC 5988.
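
A somewhat less fragile approach is to let an HTTP library deliver the header values and only then look for the relation. Below is a sketch using the java.net.http client that ships with Java 11 and later. The class ProvenanceLocator and its methods are hypothetical, and the matching is still deliberately crude (like the grep above, it only checks that the relation URI appears among the link's parameters):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: finds the provenance URI advertised for a resource. */
final class ProvenanceLocator {
    static final String HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance";

    // One link: <target> followed by its parameters, up to the next "<"
    private static final Pattern LINK = Pattern.compile("<([^>]*)>([^<]*)");

    /** Picks the first has_provenance target out of all Link header values. */
    static Optional<URI> fromLinkHeaders(URI requestUri, List<String> linkHeaderValues) {
        for (String value : linkHeaderValues) {
            Matcher link = LINK.matcher(value);
            while (link.find()) {
                // Crude check; a stricter client would parse the rel parameter
                if (link.group(2).contains(HAS_PROVENANCE)) {
                    return Optional.of(requestUri.resolve(link.group(1).trim()));
                }
            }
        }
        return Optional.empty();
    }

    /** Sends a HEAD request (like curl -I) and inspects the Link headers. */
    static Optional<URI> locate(HttpClient client, URI resource)
            throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(resource)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = client.send(head, HttpResponse.BodyHandlers.discarding());
        return fromLinkHeaders(resource, response.headers().allValues("Link"));
    }

    public static void main(String[] args) throws Exception {
        URI resource = URI.create(args.length > 0 ? args[0]
                : "http://localhost:8080/paq/hello/Alice");
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        Optional<URI> provUri = locate(client, resource);
        if (!provUri.isPresent()) {
            System.err.println("No provenance link found for " + resource);
            return;
        }
        // Follow the link and print the provenance trace
        HttpRequest get = HttpRequest.newBuilder(provUri.get())
                .header("Accept", "text/provenance-notation")
                .build();
        System.out.print(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

Unlike the sed pipeline, this handles several links in one header value and several Link headers in one response, and it resolves relative link targets against the requested URI.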

If we have ProvToolbox installed, we can generate a diagram. The shell script paq2svg.sh below assumes that toolbox-0.1.3-release.zip was unzipped into $HOME/software/provToolbox:

#!/bin/bash
set -e
PROVTOOLBOX="$HOME/software/provToolbox"
PROVCONVERT="$PROVTOOLBOX/bin/provconvert"
# Extract the first has_provenance link target from the response headers
provUri=$(curl -s -I "$1" | grep '^Link:.*has_provenance' | sed 's/.*<//' | sed 's/>.*//' | head -n 1)
# Download the provenance trace
provn="/tmp/$$.provn"
curl -s -o "$provn" "$provUri"
# Convert it to an SVG diagram
svg="/tmp/$$.svg"
cd "$PROVTOOLBOX"
"$PROVCONVERT" -infile "$provn" -outfile "$svg"
echo "Created $svg"

Running this against the Alice helloworld URI should look up the PROV-AQ header, download the provenance, and then generate an SVG diagram using provconvert:

stain@ralph-ubuntu:~/src/paq$ ./paq2svg.sh http://localhost:8080/paq/hello/Alice
InteropFramework run() -> {hello=http://localhost:8080/paq/hello/, app=http://localhost:8080/paq/}
log4j:WARN No appenders could be found for logger (org.openprovenance.prov.interop.InteropFramework).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
ProvToDot role no label
ProvToDot role no label (by default)
Created /tmp/21903.svg

An example of this diagram for Alice, converted to PNG:
