Installing ProvToolbox on macOS

ProvToolbox is a useful command line tool for validating and visualizing PROV documents, but unfortunately it can be a bit of a challenge to install on Windows and on macOS because of its dependency requirements.

This post suggests three step-by-step methods of installing ProvToolbox on your Mac – you should follow the method you feel most comfortable with, but can try the other methods in case of problems.

Continue reading “Installing ProvToolbox on macOS”

Installing ProvToolbox in Windows

While there are several tools available for validating and visualizing PROV, ProvToolbox is perhaps the most useful for validating PROV-N syntax. However, the normal releases do not run on Windows because of an operating system restriction on command-line and folder path length.

We have suggested a fix, but while we wait for that, here we describe a patch build that should work on Windows. We also show how to install dependencies: Java for executing ProvToolbox, and Graphviz for visualization. (See also macOS install).

Continue reading “Installing ProvToolbox in Windows”

Multiple agents sharing roles

Consider this question from a student writing provenance for a group exercise:

Do we need to assign everyone in the group a specific role since in our group we found that for many of the tasks, everyone worked together to complete it?

MSc Student in Understanding Data and their Environment, University of Manchester, 2020

This blog post explores the different PROV patterns that could describe this scenario.
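As a minimal sketch of one such pattern (not necessarily the one the full post settles on), the Turtle below gives each student their own qualified association with the activity, but points every association at the same role, so nobody needs a distinct role of their own. All names here (:writeReport, :alice, :bob, :writer, :group1 and the example.com namespace) are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://example.com/> .

# Hypothetical group task: both students took part in it
:writeReport a prov:Activity ;
    prov:wasAssociatedWith :alice, :bob ;
    # One qualified association per student, all sharing the same role
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent :alice ;
        prov:hadRole :writer
    ], [
        a prov:Association ;
        prov:agent :bob ;
        prov:hadRole :writer
    ] .

:writer a prov:Role .

# Optionally, record that the students acted for the group as a whole
:group1 a prov:Organization .
:alice a prov:Person ; prov:actedOnBehalfOf :group1 .
:bob a prov:Person ; prov:actedOnBehalfOf :group1 .
```

Because both associations use the same prov:hadRole value, asking who acted as :writer returns both students, which records that the work was shared rather than split between them.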

Continue reading “Multiple agents sharing roles”

PAV Ontology paper highly accessed


Our recent paper about the PAV ontology has been classified as highly accessed by the Journal of Biomedical Semantics, with more than 1097 views in the two months since it was published, and an Altmetric score of 12.

The PAV ontology provides a lightweight approach to record typical Provenance, Authoring and Versioning information, and builds upon existing standards like PROV-O and DC Terms.

Our previous Practical Provenance post gives a brief overview of PAV, but you might also want to explore these links for more details:

Resources that change state

The PROV working group received a question from Mike:

My understanding is that an entity referenced in a PROV bundle (e.g. via wasGeneratedBy) must be in the bundle…but I do not wish to duplicate entity definitions throughout my bundles. My entities are long lived and will exist in multiple bundles.

So let's say I have a resource for alarms which contains a list of all alarms my company monitors. If I turn off the alarm at alarm/1, my understanding is that in PROV a new entity is created for the new state of alarm/1. But in my actual data store, I don’t create a new record, I just toggle a flag.

So there is a disconnect between how my PROV looks and how my data looks. This is by design is my understanding. So I would have a new entity in my prov for the alarm/1 in the new state which is a specialization of alarm/1, yes?

Ultimately, I want to display all of the provenance for alarm/1 so I can see its history from creation to invalidation. Am I going about this the wrong way?

Here is my reply (slightly revised for this post). My examples use the Turtle syntax and PROV-O, but are also applicable to other serializations of PROV, like PROV-XML or PROV-JSON.
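A minimal Turtle sketch of the pattern Mike describes, assuming hypothetical IRIs under example.com: the long-lived record alarm/1 stays a single entity, and each state it passes through is a separate entity linked back to it with prov:specializationOf. Collecting every entity that is a specialization of alarm/1 then gives its history, without changing how the data store itself records the toggle.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://example.com/> .

# The long-lived resource, as stored in the alarms list
<http://example.com/alarm/1> a prov:Entity .

# The state of alarm/1 while it was switched on
:alarm1-on a prov:Entity ;
    prov:specializationOf <http://example.com/alarm/1> ;
    prov:wasGeneratedBy :switchOn ;
    prov:wasInvalidatedBy :switchOff .

# The state after the flag was toggled off
:alarm1-off a prov:Entity ;
    prov:specializationOf <http://example.com/alarm/1> ;
    prov:wasRevisionOf :alarm1-on ;
    prov:wasGeneratedBy :switchOff .

:switchOn a prov:Activity .

# Toggling the flag is an activity that used the old state
:switchOff a prov:Activity ;
    prov:used :alarm1-on .
```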

Continue reading “Resources that change state”

PROV released as W3C Recommendations

The Provenance Working Group was chartered to develop a framework for interchanging provenance on the Web. The Working Group has now published the PROV Family of Documents as W3C Recommendations, along with corresponding supporting notes. You can find a complete list of the documents in the PROV Overview Note. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core.

@prefix prov: <http://www.w3.org/ns/prov#> .
<#quote> prov:wasQuotedFrom <> .

This means the PROV data model and specifications are now official W3C Recommendations, and can be used as a stable platform for expressing and exploring provenance data across the web.

Practically speaking, this blog recommends that you start with the PROV primer, followed by the tutorial, and then PROV-O for Linked Data/RDF/OWL (or alternatively PROV-XML for XML or PROV-JSON for JSON). For a deeper understanding and the definitions of the PROV concepts, see the PROV data model (PROV-DM).
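For readers starting with PROV-O, here is a minimal Turtle sketch of the three core classes, Entity, Activity and Agent, and the starting-point properties that connect them. The resource names and the example.com namespace are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <http://example.com/> .

# An entity (a chart) and where it came from
:chart a prov:Entity ;
    prov:wasGeneratedBy :plotting ;
    prov:wasDerivedFrom :dataset ;
    prov:wasAttributedTo :alice .

:dataset a prov:Entity .

# The activity that produced the chart, and who carried it out
:plotting a prov:Activity ;
    prov:used :dataset ;
    prov:wasAssociatedWith :alice .

:alice a prov:Agent, prov:Person .
```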

Locating provenance for a RESTful web service

This blog post shows how RESTful web services can provide, and link to, provenance data for their exposed resources by using the PROV-AQ mechanism of HTTP Link headers. This is demonstrated by showing how to update a hello world REST service implemented with Java and JAX-RS 2.0 to provide these links.

The PROV-AQ HTTP mechanism is most easily explained by an example:

Accept: text/html

HTTP/1.1 200 OK
Content-type: text/html
Link: <…>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="…"

<!-- … -->

This request returns some HTML, but the response also provides a Link: header that says where the provenance is located. Within that provenance file, the resource is identified by the anchor URI rather than by the requested URI. The anchor can be omitted if it is the same as the requested URI.

Link headers are specified by RFC 5988, which also defines standard relations like rel="previous". PROV-AQ uses rel="http://www.w3.org/ns/prov#has_provenance" to say that the linked resource has the provenance data for the requested resource. PROV-AQ also defines other relations for provenance query services and provenance pingback, which are not covered by this blog post.
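To illustrate the anchor, here is a minimal sketch of what the provenance file named in the Link header might contain, in Turtle. All URIs are hypothetical: <http://example.com/hello#greeting> plays the role of the anchor, and it, not the requested URL, is the subject the provenance statements are about.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# The anchor URI from the Link header identifies the resource here
<http://example.com/hello#greeting> a prov:Entity ;
    prov:wasGeneratedBy <http://example.com/hello#sayHello> ;
    prov:wasAttributedTo <http://example.com/hello#service> .

# The request handling that produced the greeting
<http://example.com/hello#sayHello> a prov:Activity ;
    prov:wasAssociatedWith <http://example.com/hello#service> .

# The hello world service itself
<http://example.com/hello#service> a prov:SoftwareAgent .
```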

Continue reading “Locating provenance for a RESTful web service”

Recording authorship, curation and digital creation with the PAV ontology

PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web-based systems: contributors, authors, curators and digital artifact creators. The ontology also provides terms for tracking provenance of digital entities that are published on the web and then accessed, transformed and consumed.

PAV version 2.1.1 was released on 2013-03-27, making PAV an extension of the W3C provenance ontology PROV-O, thus enabling interoperability between PAV and PROV-compliant tools such as ProvToolbox.



Note: PAV does not define any classes, and the PAV properties do not put any explicit restrictions on their domains or ranges. Therefore the classes above, like “another resource”, are only for illustration of typical use. The diagram above does not show data properties attached to resources, like pav:createdOn.


Here’s an example of using PAV:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pav: <http://purl.org/pav/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix : <> .

# :page is a placeholder name for the described resource
:page
    pav:createdBy :alice ;
    pav:createdWith :wordpress ;
    pav:importedFrom <…> ;
    pav:importedBy :csv2html ;
    pav:authoredBy :bob ;
    pav:curatedBy :charlie ;
    pav:authoredOn "2012-12-24T15:15:15Z"^^xsd:dateTime ;
    pav:importedOn "2013-03-27T10:06:17Z"^^xsd:dateTime .

:alice foaf:name "Alice" .
:bob foaf:name "Bob" .
:charlie foaf:name "Charlie" .

# Software agents that created and imported the content
:csv2html a prov:SoftwareAgent ;
    foaf:homepage <…> .
:wordpress a prov:SoftwareAgent ;
    foaf:homepage <…> .


Continue reading “Recording authorship, curation and digital creation with the PAV ontology”

Tutorial on the W3C PROV family of specifications

Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history.

The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains.

Paolo Missier, Khalid Belhajjame and James Cheney gave a tutorial at the EDBT conference on 2013-03-20 in Genova, Italy. The tutorial provided an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.

Tutorial material

The tutorial is in three parts, each about 30 minutes long, and consists of the following material:

There is also a short paper describing the motivation, structure and content of the tutorial, published in the EDBT’13 proceedings: The W3C PROV family of specifications for modelling provenance metadata, Paolo Missier, Khalid Belhajjame, and James Cheney