Installing ProvToolbox on macOS

ProvToolbox is a useful command line tool for validating and visualizing PROV documents, but unfortunately it can be a bit of a challenge to install on Windows and on macOS because of its dependency requirements.

This post suggests three step-by-step methods of installing ProvToolbox on your Mac – you should follow the method you feel most comfortable with, but can try the other methods in case of problems.

Table of contents

  1. Overview of requirements
    1. Software packaging for macOS
  2. Conda
    1. Installing Graphviz and OpenJDK with Conda
  3. HomeBrew
    1. Installing Graphviz with HomeBrew
    2. Installing OpenJDK with HomeBrew
  4. Installing manually
    1. Installing AdoptOpenJDK manually
    2. Installing Graphviz manually
  5. Installing ProvToolbox
    1. Using ProvToolbox from VSCode

Overview of requirements

As of 2020-08, ProvToolbox 0.9.5 is the latest release, which requires:

  • Java JRE 9 or later. We’ll show how to install OpenJDK 11 which is the closest Long-Term Support release.
  • Graphviz 2.x (for visualization)

There is also an outdated installer of ProvToolbox for macOS, which is currently not recommended. In this guide we’ll show how to install ProvToolbox 0.9.5 from ZIP along with its requirements.

Warning: This guide has been developed for Intel-based Macs. If you are using the new ARM64 Macs, you will either need to use compatibility modes or install/compile these individual dependencies manually.

Software packaging for macOS

Traditionally, software installations on macOS are either drag-and-drop Application bundles (as from the Mac App Store or dmg disk images) or wizard-based pkg installation packages, which can modify the operating system.

Both of these approaches require the software to bundle all its dependencies, or the user to carefully install matching dependencies in order. As many open source software packages, particularly for the command line and programming, rely heavily on other software libraries, this traditional approach can become cumbersome or fragile over time.

As an alternative, several software packaging initiatives have evolved that work with macOS, including:

  • Conda
    Initially centered around supporting multiple Python binaries and Jupyter Notebook, the Anaconda repository provides a large selection of pre-compiled open source software tools and libraries, particularly for data science and academic users. Also available for Windows and Linux. Conda can activate different environments which is useful for reproducibility or if installing multiple versions of the same software.
  • HomeBrew
    A recipe-based repository of pre-compiled software, including most general-purpose open source software you will want to install in macOS, and also useful for adding newer versions of software already installed in macOS. Usually fast and straightforward to use, as long as HomeBrew itself installs correctly and packages are kept up to date.
  • MacPorts
    Similar to HomeBrew, providing a large selection of general open source packages. Compiles from source code locally, which can be time-consuming and requires installation of build tools like Xcode. Useful for experimenting with bleeding-edge versions of software.

These systems work similarly to Linux distributions like Ubuntu or Debian, in that installing a particular package will also install its dependencies. However, the packages and binaries are installed under a dedicated folder, typically under /usr/local or /Users/alice/miniconda3 within the user home directory. This alternative root will have traditional POSIX folders corresponding to their operating system counterparts under /usr, like bin/, lib/ and share/, allowing the package system to rely on runtimes like Python independently of what Apple may provide in that particular macOS version.

These packaging systems therefore typically modify the system variable PATH for the current user, so that the tools installed can take preference over the commands included with the operating system.

Note: You should not need to use sudo to install software within the chosen packaging system.

Warning: If you are already using programming languages like Java, Python or Ruby, be aware that after installing a packaging system according to this guide, your PATH may pick up a different version of runtimes/compilers, which may require you to reinstall any additional libraries you use. To check which path is resolved, use the shell built-in type:

$ type python
python is /home/stain/miniconda3/bin/python
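
The same check can be scripted for all the commands this guide relies on. Below is a minimal Python sketch, assuming Python 3 is available (for instance from the Conda base environment). It uses shutil.which, which only searches the PATH directories in order, so unlike type it ignores shell aliases, functions and built-ins; the command list is our own choice.

```python
import shutil

# Commands used in this guide; provconvert only resolves once ProvToolbox is installed.
COMMANDS = ["java", "dot", "provconvert"]


def resolve_commands(commands):
    """Map each command name to the executable found on PATH, or None if missing.

    shutil.which walks the PATH directories in order and returns the first
    match, which is the executable the shell would run (aliases aside).
    """
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, path in resolve_commands(COMMANDS).items():
        print(f"{name:12} {path or 'NOT FOUND on PATH'}")
```

If, say, java resolves to a different folder than the package manager you just set up, that is the PATH ordering issue described in the warning above.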


Conda

We found that using the package manager Conda gave the most consistent results for installing the dependencies of ProvToolbox. The large selection of packages in Anaconda repository is also useful for data science purposes, such as using Jupyter Notebook or R.

While there are multiple ways to install Conda on macOS, we here show the Miniconda method using the Terminal.

First open the Terminal application from Applications/Utilities.

To install Miniconda, run these two commands:

curl -O


As the installation finishes, make sure you answer yes to allow conda init to modify your PATH.

Here the Conda base environment has been installed under /Users/testuser/miniconda3; however, we need to start a new terminal to activate it.

Note: If you are on macOS Catalina or newer, the default shell is now zsh. Substituting your own username for testuser, you may need to run the equivalent of:

source /Users/testuser/miniconda3/bin/activate

conda init zsh

Installing Graphviz and OpenJDK with Conda

First, in a new terminal window, check that the conda command is working by searching for Graphviz:

conda search graphviz

We can install graphviz and OpenJDK 11 at the same time using:

conda install graphviz openjdk=11

After Conda has resolved dependencies, answer y to install.

After download and installation has finished, verify both GraphViz and Java work:

dot -V
java -version
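
If you want to check the Java requirement programmatically, the following Python sketch parses the banner printed by java -version and compares the major version with the Java 9 minimum stated above. It assumes java is on the PATH; note that java -version writes its banner to standard error, and that Java 8 and earlier report their version as 1.x. The function names are our own.

```python
import re
import subprocess

MIN_JAVA = 9  # ProvToolbox 0.9.5 needs Java 9 or later


def java_major_version(version_output):
    """Extract the major version from `java -version` output.

    Handles the legacy "1.8.0_261" scheme (Java 8 and earlier) as well as
    the "11.0.8" scheme used from Java 9 onwards.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # "1.8.0" -> 8
    return first


def installed_java_major():
    # The banner goes to stderr, so capture that rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

For example, installed_java_major() >= MIN_JAVA should be true once OpenJDK 11 is first on your PATH.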

You can now skip to the section on installing ProvToolbox.


HomeBrew

HomeBrew is a popular package management system for macOS that can help with installing pre-built open source software. However, installing and using HomeBrew itself is not always trivial. This section is provided as an alternative to the Conda method above.

Warning: In our testing we found HomeBrew did not work on an older macOS 10.11. If you are using the newest macOS version on compatible hardware, you are free to try this approach, which can be useful later as HomeBrew provides a convenient way to install many other data science tools, e.g. R, LaTeX and Snakemake.

Installing HomeBrew

First open the Terminal application, found under Applications/Utilities.

Following the instructions on the HomeBrew home page, paste this command line on a single line to start installing, providing your user password for administrator rights:

/bin/bash -c "$(curl -fsSL"

The defaults for installing are usually fine.

Note: If you get a warning about an old macOS version, HomeBrew may not work well on your machine (as we found in our testing). Instead, try the Conda method described above.

After installing HomeBrew you may get a warning about a shallow clone; this can be ignored unless you are developing your own brew recipes.

Installing GraphViz with Homebrew

To activate brew, it should be sufficient to start a new Terminal window. Test this by running:

brew search graphviz

If brew finds the package, you are ready to install:

brew install graphviz

On a good day the above should complete and you would be able to verify the installation of graphviz:

dot -V

However, on our test machine we got a stack trace error indicating a bug in Homebrew itself. This is why we do not recommend the Homebrew method for older Macs.

Installing OpenJDK with HomeBrew

Warning: This section has not been tested.

AdoptOpenJDK is a community effort for packaging binary installers/packages of the open source Java implementation OpenJDK, avoiding restrictive licenses and registration requirements.

To install OpenJDK 11 from AdoptOpenJDK with HomeBrew try:

brew tap AdoptOpenJDK/openjdk

brew cask install 

Verify the Java version:

java -version

If both Graphviz and Java are working, you can now skip to the section on installing ProvToolbox.

Installing manually

Installing AdoptOpenJDK manually


While using a packaging system can help you keep your OpenJDK install updated, if you were unable to use Conda or HomeBrew to install Java, as a fallback it is also possible to download the standalone AdoptOpenJDK installer.

From the AdoptOpenJDK website, make sure you select:

  • Version: Open JDK 11 (LTS)
  • JVM: HotSpot
  • OS: macOS
  • Architecture: x64

If these options are not available, select them from the Other Platforms page. Download the pkg installer of the JRE.

Walk through the installer and use the default settings.

Now open a new Terminal from Applications/Utilities.

The installer should have adjusted your PATH. To check the installed version, run:

java -version

If you don’t get the correct Java version, you will need to adjust your PATH and/or JAVA_HOME environment variables.

Installing Graphviz manually

Compiling and installing Graphviz from source code is a non-trivial task on macOS. Some outdated pkg installers of Graphviz exist, but we have not tested these. A recent blog post details how Graphviz can be compiled using brew dependencies, but compared to brew install graphviz described above, this should only be needed for PDF support. If you already use MacPorts, try sudo port install graphviz.

Installing ProvToolbox

Following the ProvToolbox install instructions for “Other Platforms”, download the

Opening the ZIP file with Archive Utility will unzip it to your Downloads directory. From there, move the ProvToolbox directory to your Applications folder.

Next we will add the provconvert command line tool to your PATH. First open the Terminal from Applications/Utilities.

Become the root user and carefully run:

sudo -i

cd /etc/paths.d/

echo /Applications/ProvToolbox/bin > ProvToolbox


Tip: You can use the Tab button to auto-complete the paths.

To activate the new PATH, either restart Terminal or log out of macOS and in again. Now verify with:

provconvert -version
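
A new Terminal is needed because, on macOS, login shells run the path_helper utility, which builds PATH from /etc/paths followed by the lines of each file in /etc/paths.d. The Python sketch below approximates that behaviour so you can see where /Applications/ProvToolbox/bin ends up. It is an assumption-laden simplification: files are taken in sorted name order, and the real utility also merges in entries already present in PATH, which this sketch leaves out. The demo only writes to a temporary directory and never touches /etc.

```python
import tempfile
from pathlib import Path


def build_path(etc_dir):
    """Approximate how path_helper assembles PATH from <etc_dir>/paths and <etc_dir>/paths.d/*.

    Blank lines and duplicate entries are skipped; the order of first appearance is kept.
    """
    etc = Path(etc_dir)
    sources = [etc / "paths"]
    paths_d = etc / "paths.d"
    if paths_d.is_dir():
        sources += sorted(p for p in paths_d.iterdir() if p.is_file())
    entries = []
    for source in sources:
        if not source.is_file():
            continue
        for line in source.read_text().splitlines():
            line = line.strip()
            if line and line not in entries:
                entries.append(line)
    return ":".join(entries)


def demo():
    """Recreate the layout in a scratch directory, including the ProvToolbox file created above."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "paths").write_text("/usr/local/bin\n/usr/bin\n/bin\n")
        (Path(tmp) / "paths.d").mkdir()
        (Path(tmp) / "paths.d" / "ProvToolbox").write_text("/Applications/ProvToolbox/bin\n")
        return build_path(tmp)


if __name__ == "__main__":
    print(build_path("/etc"))  # read-only look at the real configuration
```

The ProvToolbox entry is appended after the system folders, so it will not shadow any existing command.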

You can now start validating and visualizing PROV-N with ProvToolbox.

Tip: Convert to png and use the open command to preview the diagram:

You can use the commands cd and ls to change directory and list directories in the Terminal. If you are unfamiliar with navigating the shell, you may find it easiest to save the provn files directly in your home directory.
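
The conversion tip above can also be scripted. A minimal Python sketch, assuming provconvert is on the PATH as set up above: it builds the same -infile/-outfile command used elsewhere in this guide, runs it, and on macOS opens the result with the open command. We rely on provconvert choosing the image format from the -outfile extension, as in the alice.png example; convert_and_preview is a hypothetical helper name.

```python
import subprocess
import sys
from pathlib import Path


def provconvert_command(provn_file, image_format="png"):
    """Return the provconvert argument list and the output path for a PROV-N file."""
    source = Path(provn_file)
    target = source.with_suffix("." + image_format)  # alice.provn -> alice.png
    return ["provconvert", "-infile", str(source), "-outfile", str(target)], target


def convert_and_preview(provn_file):
    command, target = provconvert_command(provn_file)
    # check=True raises CalledProcessError if provconvert exits with a non-zero status.
    subprocess.run(command, check=True)
    if sys.platform == "darwin":
        subprocess.run(["open", str(target)], check=True)  # preview on macOS
    return target
```

For instance, convert_and_preview("alice.provn") run from the folder containing that file would produce and open alice.png.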

Using ProvToolbox from VSCode

While you may use an editor like TextEdit (in Applications) bundled with macOS for creating PROV-N files, you may find an editor like VSCode more convenient, particularly as it allows opening an embedded terminal. After installing, try View -> Terminal in the menu.

If you convert to PNG, you can preview the diagram within VSCode.

Installing ProvToolbox in Windows

While there are several tools available for validating and visualizing PROV, ProvToolbox is perhaps the most useful for validating PROV-N syntax. However, the normal releases do not run on Windows due to an operating system restriction on command line and folder path length.

We have suggested a fix; while we wait for it to be accepted, below we describe a patched build that should work on Windows. You will also need Java for executing ProvToolbox, and Graphviz for visualization.

Install Java for Windows

You will need Java JRE 9 or later to run ProvToolbox 0.9.5, so below we show how to install JRE 11 LTS on Windows. Do not install from as that provides the older Java 8, which unfortunately does not work with this ProvToolbox release.

Unfortunately there are quite a few alternatives for installing Java in Windows:

Oracle provides installers of JDK 11 for Windows x64, but they are distributed under a restrictive license that you or your organization may have reservations about. These are however straightforward to install if you only want to run ProvToolbox for personal or development use.

The official open source release of OpenJDK 11 for Windows does not provide an installer, and is fairly large. If you choose to install it, you will need to modify your PATH system environment variable manually, depending on where you extract the jdk-11 folder.

Alternatively RedHat provide OpenJDK installers based on the open source JDK, we would recommend jre- MSI or newer. The MSI packages are installable by double-click in newer versions of Windows. You will need to create a RedHat account to download.

Finally, AdoptOpenJDK is a community project providing user-friendly open source builds of Java. We found this option to be the easiest to install in Windows, as they provide an MSI installer of the smaller JRE distribution and do not require registration. However, their website takes some navigating, as they provide many alternatives.

Installing AdoptOpenJDK

We recommend this open source option for installing Java for Windows users.

From the AdoptOpenJDK website, select Other Platforms, which gives the full list of downloads. Make sure you select:

  • Version: Open JDK 11 (LTS)
  • JVM: HotSpot
  • Operating System: Windows
  • Architecture: x64 (64-bit Windows)
    • If you use 32-bit Windows, say on an older or smaller machine, try x86 instead

Download the JRE as an MSI installer (the ZIP file does not include an installer).

Before you open the MSI you may need to change the Windows Store settings to allow installing applications. From the start menu type Add or Remove Programs to open the Settings pane for Choose where to get apps. Select Anywhere.

Now double-click the MSI file from the Download folder and walk through the Installer. Enable the “Set JAVA_HOME environment variable” option.

Now open the Command Prompt window.

Run java -version to check that OpenJDK 11 is now installed on the PATH.

Install ProvToolbox (patched)

Now we will install ProvToolbox 0.9.5 including the suggested fix, so that it runs on Windows. Until the fix has been accepted, we’ll use the ProvToolbox 0.9.5 for Windows patch release from the fork

Download and, after opening, click Extract All:

Shorten the destination path so that the folder is extracted to your home directory, e.g. C:\Users\stain

You should now have a folder ProvToolbox in your home directory.

You should now be able to run ProvToolbox\bin\provconvert from a Command Prompt opened in your home directory.

Note that if you change the directory with cd you will need to modify the path to provconvert. To avoid that we’ll add it to the System Environment Variable PATH. In the Start menu, type PATH and select Settings -> Edit the system environment variables.

Click Environment Variables, then under User variables select PATH and click Edit. (If not found, click New and set Variable Name as PATH.) Add a new line (or set Variable Value), where you can use Browse to navigate to the full path of the ProvToolbox\bin folder.

After applying the settings with OK you will need to close and restart the Command Prompt window. This time provconvert -version should work from any path.

Install Graphviz for Windows

ProvToolbox can also generate visualizations, but for this we need to install the open source tool Graphviz.

Following the link for downloading Stable Windows Install Packages, we are unfortunately thrown into an undocumented directory listing. We found that the cmake/Release/x64 installer worked well on 64-bit Windows (for older 32-bit Windows, try Win32).

You will need to allow the unsigned package to install.

When running the install wizard for Graphviz, make sure you enable the option to add Graphviz to the system PATH for the current user.

Close and restart the Command Prompt window. This time dot -V (note the capital V) should work.

Unfortunately the Graphviz installer does not initialize the plugins for graphical formats. To fix this, open another Command Prompt, this time running as Administrator.

In this window, run dot -c to initialize the GraphViz plugins.

Installing Visual Studio Code

VSCode is a free and lightweight text editor which is useful for writing *.provn files. The User Installer for 64-bit Windows should work for most users.

Note: After installing VSCode you will need to Restart Windows to pick up the updated PATH for ProvToolbox.

Running provconvert from VSCode

Remember to save PROV-N files (example) with the extension .provn so that ProvToolbox can recognize the file type. The first time you may need to select No Extension and add .provn yourself to the filename.

In VSCode’s menu, select Terminal -> New Terminal; an embedded command prompt should open in the correct directory. Check with dir *.provn that the file is present with the correct name.

You should now be able to convert the provn file to a PNG image, in our example using the command:

provconvert -infile alice.provn -outfile alice.png

You will be able to navigate and open the PNG image within VSCode. See validating and visualisation for details.

If you have made a syntax error in the PROV-N file, provconvert reports errors by line number, which you can locate within the VSCode editor. Note that VSCode has no native PROV-N support, so its Problems tab is currently unable to detect these errors.

Remember an error early in the file, such as a broken prefix, or a missing ), may cause phantom errors to be reported later in the file.

Attribution vs association

A valid question when writing provenance in responsibility view and process view is:

Should we attribute contributors from entities, isn’t that what the activities are showing?

Especially when roles are already given in the associations, it may seem unnecessary to also declare wasAttributedTo statements.

It is true that you can conclude from:

wasAttributedTo(ex:entity, ex:agent)

then there was some activity X such that:

wasGeneratedBy(ex:entity, X)
wasAssociatedWith(X, ex:agent)

This conclusion follows from the constraint on agents and the definition of wasAttributedTo.

However, you can’t conclude the opposite direction: someone might be associated with an activity but not be responsible for a particular entity generated by that activity.

Imagine three family members writing Christmas cards:


wasAssociatedWith(ex:WritingCards, ex:Alice, -, [prov:role='ex:Writing'])
wasAssociatedWith(ex:WritingCards, ex:Bob, -, [prov:role='ex:Writing'])
wasAssociatedWith(ex:WritingCards, ex:Charlie, -, [prov:role='ex:Addressing'])


wasGeneratedBy(ex:cardToEve, ex:WritingCards)
wasGeneratedBy(ex:cardToFrank, ex:WritingCards)
wasGeneratedBy(ex:cardToMalory, ex:WritingCards)

Now from this we may want to attribute the cards to the individual agents. While we can perhaps assume Charlie has written the address on all the cards, we don’t know whether it was Alice or Bob who wrote the particular card going to Frank.

We can therefore add more specific attributions:

wasAttributedTo(ex:cardToEve, ex:Alice)
wasAttributedTo(ex:cardToFrank, ex:Alice)
wasAttributedTo(ex:cardToMalory, ex:Bob)

.. and for completeness:

wasAttributedTo(ex:cardToEve, ex:Charlie)
wasAttributedTo(ex:cardToFrank, ex:Charlie)
wasAttributedTo(ex:cardToMalory, ex:Charlie)
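
To make the two directions concrete, the Python sketch below encodes the Christmas card statements as plain dictionaries (not a PROV library) and checks both: every attribution should be backed by an association between that agent and the card's generating activity, while the associations alone leave open who is responsible for which card. It assumes each card has exactly one generating activity, as in this example; the function names are hypothetical.

```python
# The Christmas card statements as plain Python data (not a PROV library).
associations = {  # activity -> {agent: role}
    "ex:WritingCards": {
        "ex:Alice": "ex:Writing",
        "ex:Bob": "ex:Writing",
        "ex:Charlie": "ex:Addressing",
    },
}
generations = {  # entity -> its (assumed single) generating activity
    "ex:cardToEve": "ex:WritingCards",
    "ex:cardToFrank": "ex:WritingCards",
    "ex:cardToMalory": "ex:WritingCards",
}
attributions = [  # (entity, agent) pairs from the wasAttributedTo statements
    ("ex:cardToEve", "ex:Alice"),
    ("ex:cardToFrank", "ex:Alice"),
    ("ex:cardToMalory", "ex:Bob"),
    ("ex:cardToEve", "ex:Charlie"),
    ("ex:cardToFrank", "ex:Charlie"),
    ("ex:cardToMalory", "ex:Charlie"),
]


def unsupported_attributions(attributions, generations, associations):
    """Attributions whose agent is not associated with the entity's generating activity."""
    return [
        (entity, agent)
        for entity, agent in attributions
        if agent not in associations.get(generations.get(entity), {})
    ]


def associated_but_not_attributed(attributions, generations, associations):
    """(entity, agent) pairs where the agent took part in the generating activity
    but has no attribution: the direction that cannot be inferred."""
    attributed = set(attributions)
    return sorted(
        (entity, agent)
        for entity, activity in generations.items()
        for agent in associations.get(activity, {})
        if (entity, agent) not in attributed
    )
```

With these statements, unsupported_attributions returns an empty list, while associated_but_not_attributed lists Bob for the cards to Eve and Frank, and Alice for the card to Malory: exactly the information only the attributions could provide.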

This highlights that responsibility view and process view can complement each other.

It is part of our modelling to decide what is considered “work height” in terms of attributions. Perhaps we should not consider Charlie putting on addresses as part of writing the cards, if what mattered was the personalized message, not how the cards were posted.

It is also possible to assign prov:type to wasAttributedTo relations, e.g. ex:authorship, but this can easily overlap with the roles in associations with the corresponding activity. This is perhaps more useful when the activity is not explicitly declared, or when we want to declare explicit contributor roles using metadata models like CRediT.

Multiple agents sharing roles

Assuming the task of writing provenance for a student group exercise, consider the question:

Do we need to assign everyone in the group a specific role since in our group we found that for many of the tasks, everyone worked together to complete it?

If you all worked together without distinguishing roles, then you can either assign the same role to each agent (showing you shared that role) or not have any roles.

prefix ex
prefix terms
activity(ex:writingCards, [prov:type='terms:CardWriting'])
wasAssociatedWith(ex:writingCards, ex:Alice, -, [prov:role='terms:Writing'])
wasAssociatedWith(ex:writingCards, ex:Bob, -, [prov:role='terms:Writing'])

While the role terms:Writing above does not really tell us much, as the activity is already of type terms:CardWriting, there is a distinction: leaving out the role says “We don’t know / didn’t say what work they did”, while an explicit role says “They did the same work, but we have not broken it down further”.

In a way a role is a short-circuiting device to avoid having infinite recursion into very tiny activities – it allows multiple things to happen in the same activity.

Let’s say Charlie helped with putting addresses on the envelopes. In our simple one-activity modelling, we can associate them with a separate role:

wasAssociatedWith(ex:Charlie, ex:writingCards, 

Now we know who to blame if the cards end up at the North Pole!

It is a modelling question to use a single activity with multiple agents associated, potentially with different roles; or to use multiple activities with simpler associations, perhaps not needing roles. One thing to keep in mind is that more activities will also need more entities to represent the different states, e.g. ex:unwrittenCard1, ex:writtenCard1, ex:addressedCard1.


Rather than associating multiple agents with a single activity, it is possible to make a new agent that represents the whole group, in which case we choose not to break it down to individual engagement.

For instance, this would be correct when recording a board room decision, where the group of Alice, Bob and Charlie together decided or stated something that they may not individually have agreed with or be responsible for.

PROV does not itself define how to specify the make-up of groups – but
it has agent types prov:Person, prov:Organization and
prov:SoftwareAgent that you can assign using prov:type.

A student group might not sound like an organization, but it is the
closest match of the three. We also see that there is an equivalent type Organization in the vocabulary, with looser sub-types including PerformingGroup (a band or orchestra), Project (someone working towards a shared goal) or even CollegeOrUniversity (the University as a whole).

While PROV literature does not cover well how to use external vocabularies like, we can take advantage of such types and attributes in PROV-N and express group membership using, adding a parentOrganization relation to the University for good measure.

prefix s <>
prefix ex <>

agent(ex:UoM, [prov:label="University of Manchester",

agent(ex:RedGroup, [prov:label="Red Group",

The keen reader will notice that the above example repeats some attribute names to express multiple relations of prov:type and s:member – both of these are unordered.


As an alternative, PROV has the concept of Collections.

Although collections are mainly used for describing compound entities, such as a dataset containing samples, as agents can concurrently be entities
they could also be used for showing group members.

Tip: Treating an agent as an entity can be of interest if your provenance needs to show its evolution over time, for instance changes to group membership. In this case, note that you will need multiple agent identifiers depending on which “version” of the group was associated with particular activities across time, or use specialization to make a “timeless” entity.

Using prov:Collection would perhaps feel more PROV-native than the approach above, but it does not give us additional attributes like parentOrganization:

prefix ex <>

agent(ex:Red, [prov:type='prov:Organization'])
entity(ex:Red, [prov:type='prov:Collection'])

hadMember(ex:Red, group:Alice)
hadMember(ex:Red, group:Bob)

This choice between vocabularies is I’m afraid part of data modelling — there will never be a single perfect metadata model, and so we have to either pick the one that is most useful, or find a way to combine both approaches. (Although verbose, the above should work both ways)

Sometimes two metadata models can be in conflict in terms of granularity, in which case you should use different identifiers, in some cases even different documents.

What are good PROV-N prefixes?

Most examples of PROV-N use example prefixes like:

prefix ex <>
prefix exg <>

These example domains are explicitly reserved globally for all kinds of examples and training material, and deliberately do not have any content, advertisement or affiliations.

Assume you are writing the provenance of a student group exercise, should you be using the prefix/namespace ex and to define agents/entities/relationship and your own attribute types?

You can keep using, but ideally you should define your
own namespace based on a Web URL you “control” or “own”. This would make your Linked Data identifiers globally unique.

Students at The University of Manchester can publish their own home pages, and luckily most universities still provide a similar facility. Our students can search up themselves on and find for instance that they are which means they could make any sub-page under that directory.

Now the page does not need to exist – this blog is not about making Web pages – it is just important that it could exist at that address – that means it is that person’s identifier space.

So we could imagine that Alice made two files under her directory:

    Imagine this was a page defining new terms to be used in PROV
    Imagine this page was the group exercise as text (which provenance is described by PROV), or the PROV-N file itself (self-describing)

In real life you would also have to deal with file extensions like .html or server configuration for content types, see Tim Berners-Lee’s Cool URIs don’t change for further considerations.

As above it is considered good practice to separate the prefix/namespace for new roles/attributes that you define vs specific agents/entities in one particular provenance trace. The idea being that the general terms could be reused in multiple provenance documents with the same meaning.

So let’s map in the above documents as namespaces in PROV-N:

prefix terms <>
prefix group <>

The prefix string is usually a short name somewhat matching the address; it just needs to be unique and consistent within the same PROV document. Other documents can freely map the same namespace to a different prefix.

A diligent reader might notice the # at the end of the namespace – this “fragment” is in Web documents used to indicate a subsection or heading within the same file.

This is so that terms:student expands to
rather than end with alice.davidson/termsstudent – even if we have not made the pages now we would not want to make a separate page for each word.

In some cases separate pages are desirable, in which case the namespace
ends in / to mark a directory – for instance s:Person becomes as have decided their term list is too big to keep in a single page.
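
Because prefix expansion is plain string concatenation, the trailing # or / matters. The Python sketch below shows this with hypothetical namespaces on the reserved example.com and example.org domains standing in for real addresses; only the prov namespace is the real PROV one, and the expand function is our own illustration rather than part of any PROV library.

```python
# Hypothetical namespaces on the reserved example domains; only "prov" is the real PROV namespace.
PREFIXES = {
    "terms": "http://example.com/~alice/terms#",  # '#': all terms in a single page
    "group": "http://example.com/~alice/group#",
    "s": "http://example.org/vocab/",  # '/': one page per term (hypothetical vocabulary)
    "prov": "http://www.w3.org/ns/prov#",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand a PROV-N qualified name such as 'terms:student' into a full URI.

    Expansion is simply the namespace followed by the local name.
    """
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {qualified_name!r}")
    return prefixes[prefix] + local
```

Here expand("terms:student") gives http://example.com/~alice/terms#student; dropping the trailing # from the terms namespace would instead give .../termsstudent, the problem described above.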

Now we can use these prefixes for identifying
agents/entities/activities, as well as using our own attributes, roles
and types.

agent(group:Alice, [prov:type='prov:Person',

wasAssociatedWith(group:Alice, group:studying,

It is customary that attributes start with lower case after :, while
types/roles start with Capital – but this is just stylistic.

As for entity identifiers, your terms and roles should not have spaces or special characters in them, as they must combine with the namespace into a valid URI; use e.g. camel case favouriteDay or underscore favourite_day.
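
If you start from free-text labels, a small helper can turn them into such names. Below is a hypothetical Python sketch that keeps only ASCII letters and digits and joins the words in camel case or with underscores, optionally capitalizing the first letter for types and roles. It is deliberately conservative and does not implement the full PROV-N grammar for local names; note that it also lowercases acronyms.

```python
import re


def to_local_name(label, underscore=False, capitalize=False):
    """Turn a free-text label into a local name usable after a prefix.

    "favourite day" -> "favouriteDay" (or "favourite_day" with underscore=True);
    capitalize=True gives the Capitalized style customary for types and roles.
    """
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", label)]
    if not words:
        raise ValueError("label has no usable characters")
    if underscore:
        name = "_".join(words)
    else:
        name = words[0] + "".join(w.capitalize() for w in words[1:])
    if capitalize:
        name = name[0].upper() + name[1:]
    return name
```

For example, the freehand role "Writing carefully" discussed next would become terms:WritingCarefully with capitalize=True.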

It is possible in PROV-N (but not easily in other PROV syntaxes) to use freehand roles/types strings like prov:role="Writing carefully" – these are kind of anonymous and cannot be assumed to mean the same thing across multiple PROV documents.

In many cases there is no suitable URI yet, in which case using a temporary namespace like is perfectly valid as a working example, just be aware of the risk of other people having the same namespace idea!

Validating and visualising PROV

One of the advantages of W3C PROV having a common data model is that it can be serialized, or written out, in multiple file formats. The PROV family of W3C specifications describes the mappings PROV-XML and PROV-O (which, being based on OWL2, itself has multiple serializations for Linked Data, including the RDF formats Turtle and JSON-LD).

In addition to these standard approaches we also have PROV-JSON and PROV-JSONLD, which can be well suited for Web applications. All of these can in theory be mapped to each other through the common PROV Data Model and the use of URIs as Linked Data global identifiers.


PROV also specifies its own language, PROV-N, a text-based file format that most closely represents the PROV Data Model. This representation is used by the PROV Primer to explain the PROV types (entity/agent/activity) and their relationships (e.g. wasAttributedTo). For example:

  document
  prefix ex <>
  prefix s <>
  entity(ex:regionList)
  entity(ex:dataset, [ prov:type='s:Dataset' ])
  activity(ex:composing, [ prov:type='ex:Composing',
     prov:label="Composing region and data" ])
  entity(ex:composition)
  agent(ex:derek,
       [ prov:type='prov:Person', s:givenName="Derek",
         s:email="" ])

  used(ex:composing, ex:dataset, 2011-11-16T16:00:00)
  used(ex:composing, ex:regionList, -)
  wasGeneratedBy(ex:composition, ex:composing, 2011-11-16T16:45:00)
  wasAssociatedWith(ex:composing, ex:derek, -)
  wasAttributedTo(ex:composition, ex:derek)
  endDocument


The above PROV-N can be rendered as a diagram.

Let’s go through the PROV-N line by line:

  • prefix maps ex to the URI namespace starting with
    • PROV identifiers like ex:dataset can be expanded to a Linked Data global identifier (which in an ideal world would describe or perhaps let you download the dataset)
    • External vocabularies like can be reused, e.g. the property 's:givenName' expands with the prefix s: to form the URI
    • prefix prov <> is implicit, and is the internal namespace for PROV types and attributes.
    • Tip: It is possible to declare default <> after which ex:regionList can be shortened to regionList, however it is recommended to always use explicit prefixes to ease reuse and combination of PROV-N files.
  • entity(ex:regionList) declares the existence of an entity with that identifier. It can thereafter be used in relationships expecting an entity.
  • The entity ex:dataset is similarly declared, but is also assigned a more specific type, s:Dataset, from the external vocabulary.
  • The activity ex:composing is typed using an ad-hoc type ex:Composing from our own namespace, and also has a prov:label string attribute giving a more descriptive label.
  • The agent identified as ex:derek is described with attributes from the s: vocabulary.
  • Relationships like used go backwards in time:
    • The activity ex:composing used the pre-existing entity ex:dataset.
    • The usage happened on 16 Nov 2011 at 16:00 (given in ISO 8601 date-time format).
  • The second usage, this time of ex:regionList, has a placeholder - indicating that the required PROV-N argument for date-time is unknown.
  • The relationship wasGeneratedBy also points backwards in time: the new entity ex:composition was generated by the activity ex:composing some time later, at 16:45.
  • wasAssociatedWith indicates that our agent Derek took part in the ex:composing activity; its placeholder - stands for the plan argument, as no plan is given.
  • wasAttributedTo says Derek was (at least partially) involved in generating the composition.

Some subtleties about PROV-N worth mentioning:

  • Like other Linked Data representations, PROV has an open world assumption, meaning that statements given may be a partial description of the actual provenance.
    • Additional statements carrying new knowledge can always be added, as long as they don’t break semantic constraints.
  • Statements can be listed in any order
    • By convention, statements are listed in chronological (partial) order from old to new, so that the last lines of the PROV-N represent the newest events.
  • Entities, Agents and Activities should be explicitly declared as such.
    • By convention, declarations can be grouped together towards the top (as in the example above).
    • Alternatively, a declaration can be listed just before the first reference to its identifier in other statements.
    • If a relationship is not showing up in a PROV visualization, check that the identifiers it refers to are correctly declared.
  • Identifiers are globally unique according to the prefix mapping to URI namespaces.
    • Using such example namespaces is fine for examples/prototypes/training, but risks collisions if PROV graphs are combined.
    • To encourage Linked Data, at a minimum use a namespace based on the address of a human-readable page, with # appended.
    • For instance PROV entities described within this blog post could use:
      prefix pp <>
  • An entity can’t concurrently be an activity.
    • However, an agent could concurrently be an entity or an activity.
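Because missing declarations are a common reason for relationships not showing up, a rough check can be scripted. The following Python sketch is a deliberately naive, line-based scan: it assumes one statement per line and no brackets or commas inside attribute strings. It is not a PROV-N parser, and a real tool such as ProvToolbox should remain the authority.

```python
import re
from typing import Set

DECLARATIONS = {"entity", "activity", "agent"}
# One statement per line, e.g. used(ex:composing, ex:dataset, -)
STATEMENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")
# Prefixed identifier such as ex:dataset (timestamps start with a digit, so they are skipped)
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[\w.-]+$")


def undeclared_identifiers(provn: str) -> Set[str]:
    """Identifiers referenced in relations but never declared as entity/activity/agent."""
    declared: Set[str] = set()
    referenced: Set[str] = set()
    for line in provn.splitlines():
        match = STATEMENT.match(line)
        if not match:
            continue  # prefix lines, document/endDocument, comments, blank lines
        keyword, args = match.groups()
        args = re.sub(r"\[.*\]", "", args)  # drop the attribute list
        args = args.split(";")[-1]           # drop an optional statement identifier
        idents = [a.strip() for a in args.split(",") if QNAME.match(a.strip())]
        if keyword in DECLARATIONS and idents:
            declared.add(idents[0])
        else:
            referenced.update(idents)
    return referenced - declared


# Hypothetical fragment where ex:composition was never declared as an entity
EXAMPLE = """\
document
  prefix ex <http://example.com/ns#>
  entity(ex:dataset)
  activity(ex:composing, [prov:label="composing a chart"])
  used(ex:composing, ex:dataset, 2011-11-16T16:00:00)
  wasGeneratedBy(ex:composition, ex:composing, 2011-11-16T16:45:00)
endDocument
"""
print(undeclared_identifiers(EXAMPLE))  # {'ex:composition'}
```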

Two immediate questions arise when faced with this “new” syntax and language for provenance:

  1. How can we validate its syntax and the correct use of PROV types and arguments to PROV relations?
  2. How can we convert from/to PROV-N and file formats that are more accessible programmatically, such as PROV-JSONLD or PROV-O in Turtle?

PROV tooling

A list of PROV supporting tools and libraries maintained by KCL includes ProvToolbox (Java), Prov Python and ProvJS. These libraries can be used by developers for generating or consuming PROV from within a programmatic environment like Jupyter Notebook or a data management application.

In addition there are graphical tools for PROV editing, validation, conversion and visualization described below:

PROV-N editor

The PROV-N Editor is an online text editor that provides syntax highlighting and autocomplete for PROV-N, and is useful for beginners new to PROV-N.

Screenshot of the PROV-N Editor as of 2020-11-13

Note that the starting example PROV-N aims to be fairly complete, including advanced use of nested bundle … endBundle blocks, // comments and deliberately invalid statements (shown in red).

We recommend starting with a simpler example in the PROV-N Editor, and using copy-paste to save the PROV-N locally to a file with a text editor like Visual Studio Code (which unfortunately does not have syntax highlighting for PROV-N):

PROV-N example from above edited in VSCodium

Note: The file extension for PROV-N is .provn, but you may use .provn.txt to ensure it opens in a text editor. Do not edit PROV-N in a word processor like Microsoft Word, as its binary format .docx (actually a structured ZIP archive of XML files) is not parseable by PROV tools; in addition, word processors may provide unhelpful assistance such as changing straight quotes (") into curly quotes (“ ”), which are not part of PROV-N syntax.
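If a PROV-N file has been through a word processor, a small script can spot the curly quotes. This Python sketch (our own helper, not part of any PROV tool) reports the line and column of each typographic quote and can replace them with straight ones; it cannot tell whether the result is then valid PROV-N.

```python
from typing import List, Tuple

# Typographic quotes that word processors substitute for straight ones
SMART_TO_STRAIGHT = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_smart_quotes(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for each curly quote, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_TO_STRAIGHT:
                hits.append((line_no, col, ch))
    return hits


def straighten(text: str) -> str:
    """Replace curly quotes with their straight ASCII equivalents."""
    return text.translate(str.maketrans(SMART_TO_STRAIGHT))


sample = 'entity(ex:regionList)\nagent(ex:derek, [prov:label=\u201cDerek\u201d])'
print(find_smart_quotes(sample))  # [(2, 29, '“'), (2, 35, '”')]
print(straighten(sample).splitlines()[1])  # agent(ex:derek, [prov:label="Derek"])
```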

Validating PROV

Although the PROV-N Editor does syntax highlighting and can detect glaring mistakes such as invalid file comments, it does not inspect more deeply to detect mistakes such as missing commas, mismatched parentheses, or wrong or missing arguments to PROV relations. You may also accidentally have added logically inconsistent statements, such as:

  prefix ex <>

  wasDerivedFrom(ex:results, ex:data)
  wasDerivedFrom(ex:data, ex:interviews)
  wasDerivedFrom(ex:interviews, ex:results)

While the above “scruffy” PROV-N file is syntactically valid, and each of the statements is OK semantically, as a whole we seem to have added a semantic violation of causality: an entity can't be derived from entities that do not yet exist. An attempt to draw the above as a diagram will show an endless loop of derivations.

To ensure your PROV-N is both syntactically valid and semantically consistent, it is best to use a PROV validator.
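To illustrate what such a causality check involves, here is a simplified Python sketch that looks for loops among wasDerivedFrom statements using a depth-first search. It covers only this one kind of violation; the PROV constraints also consider generation, usage and invalidation events, so treat it as an illustration rather than a replacement for a validator.

```python
from typing import Dict, List, Optional, Tuple

# wasDerivedFrom(generated, used) pairs from the "scruffy" example above
SCRUFFY = [
    ("ex:results", "ex:data"),
    ("ex:data", "ex:interviews"),
    ("ex:interviews", "ex:results"),
]


def find_derivation_cycle(derivations: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Depth-first search for a loop of derivations; returns the loop or None."""
    graph: Dict[str, List[str]] = {}
    for generated, used in derivations:
        graph.setdefault(generated, []).append(used)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for source in graph.get(node, []):
            if state.get(source, UNVISITED) == IN_PROGRESS:
                # Back edge: the path from `source` to here closes a loop
                return path[path.index(source):] + [source]
            if state.get(source, UNVISITED) == UNVISITED:
                cycle = visit(source)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


print(find_derivation_cycle(SCRUFFY))
# ['ex:results', 'ex:data', 'ex:interviews', 'ex:results']
```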

PROV Validator

The PROV Validator supports PROV-N; remember to select the correct syntax, especially when pasting text rather than uploading a file with the correct extension.

The checks performed by the PROV Validator mainly focus on semantic constraints such as correct typing and ensuring provenance goes backwards in time without any causality loops (e.g. you can’t be your own grandparent).

Unfortunately, we have found that the PROV Validator service occasionally does not detect syntactic PROV-N errors. For instance, if we delete the placeholder argument ,- from the second used statement above, it is silently accepted by this validator, even though the timestamp argument is required by the PROV-N definition of used. When there are syntax errors, the user is not told the line numbers where they might be.

Therefore we also recommend using the PROV Toolbox command line tool to validate the PROV-N syntax before using the PROV Validator.

PROV Toolbox

The PROV Toolbox is a Java library for consuming and generating PROV, but it also includes a versatile command line tool that supports:

  • Validation
  • Conversion
  • Merging
  • Visualization
  • Generate PROV from templates

See PROV Toolbox tutorials for further information.

Installing PROV Toolbox

To use the command line tool, the PROV Toolbox must be installed locally on a desktop/laptop computer.

Installation requirements lists what is needed for compiling and development. For the command line tool we’ve found it is sufficient to have:

Binary packages of PROV Toolbox are provided for Linux (RedHat/CentOS, Debian/Ubuntu) and macOS, although they are not always up to date.

Note: Installing Java and PROV Toolbox on Windows requires a series of steps that are detailed separately.

After installing or unzipping to a subdirectory you should be able to run its provconvert or bin/provconvert command:

(base) stain@biggie:~/software/ProvToolbox$ bin/provconvert -help
usage: provconvert [-allexpanded] [-bindformat <string>] [-bindings
       <file>] [-bindver <int>] [-builder] [-compare <file>] [-config]
       [-debug] [-flatten] [-formats] [-generator <string>] [-genorder]
       [-help] [-index] [-infile <file>] [-informat <string>] [-layout
       <string>] [-location <location>] [-log2prov <file>] [-merge <file>]
       [-namespaces <file>] [-outcompare <file>] [-outfile <file>]
       [-outformat <string>] [-package <package>] [-template <string>]
       [-templatebuilder <file>] [-title <string>] [-verbose] [-version]
 -allexpanded,--allexpanded                  In template expansion,
                                             generate term if all
                                             variables are bound.
 -bindformat,--bindformat <string>           specify the format of the
 -bindings,--bindings <file>                 use given file as bindings
                                             for template expansion
                                             (template is provided as
 -bindver,--bindver <int>                    bindings version
 -builder,--builder                          template builder
 -compare,--compare <file>                   compare with given file
 -config,--config                            get configuration
 -debug,--debug                              print debugging information
 -flatten,--flatten                          flatten all bundles in a
                                             single document (to used with
                                             -index option or -merge
 -formats,--formats                          list supported formats
 -generator,--generator <string>             graph generator
 -genorder,--genorder                        In template expansion,
                                             generate order attribute. By
                                             default does not.
 -help,--help                                print this message
 -index,--index                              index all elements and edges
                                             of a document, merging them
                                             where appropriate
 -infile,--infile <file>                     use given file as input
 -informat,--informat <string>               specify the format of the
 -layout,--layout <string>                   dot layout: circo, dot
                                             (default), fdp, neato, osage,
                                             sfdp, twopi
 -location,--location <location>             location of where the
                                             template resource is to be
                                             found at runtime
 -log2prov,--log2prov <file>                 fully qualified ClassName of
                                             initialiser in jar file
 -merge,--merge <file>                       merge all documents (listed
                                             in file argument) into a
                                             single document
 -namespaces,--namespaces <file>             use given file as declaration
                                             of prefix namespaces
 -outcompare,--outcompare <file>             output file for log of
 -outfile,--outfile <file>                   use given file as output
 -outformat,--outformat <string>             specify the format of the
 -package,--package <package>                package in which bindings
                                             bean class is generated
 -template,--template <string>               template name, used to create
                                             bindings bean class name
 -templatebuilder,--templatebuilder <file>   template builder
 -title,--title <string>                     document title
 -verbose,--verbose                          be verbose
 -version,--version                          print the version information
                                             and exit

Here is an example of converting from PROV-N to RDF Turtle:

(base) stain@biggie:~/software/ProvToolbox$ bin/provconvert -infile test.provn -outfile test.ttl

The example output is valid RDF and uses the same prefixes in a different notation. (This kind of output can be loaded into triple stores like Jena Fuseki for further queries.)

Note that, as is usual for UNIX-like tools, provconvert produces no output when the conversion succeeds. We can therefore use provconvert for validation, even if we do not need the translated file. If the PROV-N has syntax errors, these are reported as:

(base) stain@biggie:~/software/ProvToolbox$ bin/provconvert -infile test.provn -outfile test.ttl
13:46:42,100  WARN Utility:35 - test.provn line 12:34 mismatched input ')' expecting ','

This tells us that at line 12, position 34, the PROV-N parser expected an additional argument (the - placeholder) instead of the closing parenthesis ).
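If you want to validate many files from a script, provconvert can be wrapped. The following Python sketch runs the same -infile/-outfile command as above via subprocess and extracts line/column positions from messages shaped like the WARN line shown. That message pattern, the bin/provconvert path and treating any output as a failure are assumptions based on this single example, not documented ProvToolbox behaviour.

```python
import re
import subprocess
from typing import List, Tuple

# Assumed message shape, based only on the example WARN line above
LOCATION = re.compile(r"(?P<file>\S+) line (?P<line>\d+):(?P<col>\d+) (?P<msg>.*)")


def parse_errors(output: str) -> List[Tuple[str, int, int, str]]:
    """Extract (file, line, column, message) from provconvert log output."""
    errors = []
    for text in output.splitlines():
        match = LOCATION.search(text)
        if match:
            errors.append((match["file"], int(match["line"]),
                           int(match["col"]), match["msg"]))
    return errors


def validate(provn: str, provconvert: str = "bin/provconvert") -> Tuple[bool, str]:
    """Convert to Turtle only to check syntax; silence is taken as success."""
    result = subprocess.run(
        [provconvert, "-infile", provn, "-outfile", provn + ".ttl"],
        capture_output=True, text=True,
    )
    output = result.stdout + result.stderr
    return result.returncode == 0 and not output.strip(), output


log = "13:46:42,100  WARN Utility:35 - test.provn line 12:34 mismatched input ')' expecting ','"
print(parse_errors(log))  # [('test.provn', 12, 34, "mismatched input ')' expecting ','")]
```

Calling validate("test.provn") requires a local ProvToolbox installation; parse_errors can be tried on its own as shown.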

If you have installed Graphviz dot you can also make SVG or PNG images:

bin/provconvert -infile test.provn -outfile test.svg

Note that on Windows you would need to modify the PATH system variable for Graphviz to work; see installing PROV Toolbox for Windows.

PROV Store

PROV Store allows uploading of PROV documents, conversion and visualization. It is recommended to edit and validate PROV-N files with the methods listed above before uploading, as the PROV Store can be stricter about compliance with the PROV standards.

There seems to be a bug where email notifications are not sent when registering, so use the big “Register for free account” button, which lets you straight in. Hack: for a second registration, if the email link has not been received, make a username like fred14 and add +14 to your email address.

Tracking versions with PAV

The PAV ontology specializes the W3C PROV-O standard to give a lightweight approach to recording details about a resource, giving its Provenance, Authorship and Versioning. Our paper on PAV explores all of these aspects in detail. In this blog post we would like to discuss Versioning as modelled by PAV.

Versioning is commonly used for software releases (e.g. Windows 8.1, Firefox 26, Python 3.3.2), but increasingly also for datasets and documents. For the purpose of provenance, a version number allows the declaration of the current state of a resource, which can be cross-checked against release notes and used for references, for instance to indicate which particular version of a dataset was used in producing an analysis report.

Versions in PAV are quite straightforward. For our working example, let’s look at the official releases of the PAV ontology itself. Note that PAV is intended for describing any kind of web resource (e.g. documents, datasets, diagrams), not just ontologies, but we’ll use this example as it allows us to explore versioning both from a document and a technical perspective.

Version numbers

So as an example, some versions of the PAV 2.x series (skipping patch versions for now):


The property pav:version gives a human-readable version string. Note that there are no particular requirements on this string; we could just as well have labelled the versions “red”, “blue” and “green”.

Semantic versioning

Rather than arbitrary version strings, a numeric major.minor.patch version number following semantic versioning rules is a bit easier to understand, and comes with explicit promises that help predict backward and forward compatibility. What classifies as a major/minor/patch change really depends on the nature of a resource and its role, and although these rules are written for software they also apply well to a range of resources. For instance:

  • Changing the font of the Coca-Cola logo would mean a new major version, e.g. from 1.1.5 to 2.0.0
  • Adding a new paragraph to a legal document means incrementing the minor version, e.g. from 2.2.1 to 2.3.0
  • Fixing grammar in a chemistry lab report would increment the patch version, e.g. from 2.4.0 to 2.4.1
    • Changing a single chemical symbol in a formula would however be a minor increment (changing the reaction), e.g. from 2.4.1 to 2.5.0
  • In software, adding a new function to an API or a new command line option means incrementing the minor version, e.g. from 2.5.0 to 2.6.0
  • For a web mail service, removing the “Reply To All” button would be a new major version (removes functionality), e.g. from 2.6.0 to 3.0.0
  • Removing a column from a dataset would usually mean incrementing the major version (as this could break functionality for anyone depending on that column), e.g. from 3.5.1 to 4.0.0
    • Adding more rows would be a minor change (as it would scientifically speaking be an updated dataset), e.g. from 4.0.0 to 4.1.0
    • Fixing a particular cell that was wrongly formatted as a number rather than a date would just be a patch change, e.g. from 4.1.0 to 4.1.1
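Once you have decided what kind of change it is, the increment rules themselves are mechanical. A small Python sketch (the function names are our own) reproducing some of the examples above:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into integers so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Increment a semantic version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.1.5", "major"))  # 2.0.0  (new logo font)
print(bump("2.2.1", "minor"))  # 2.3.0  (new paragraph)
print(bump("2.4.0", "patch"))  # 2.4.1  (grammar fix)
# Numeric comparison avoids the string-ordering trap where "2.10.0" < "2.9.0"
print(parse_version("2.10.0") > parse_version("2.9.0"))  # True
```

Deciding which kind of change you have made remains the hard part, as the list shows; the code only handles the arithmetic.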

Many resources, such as a regular home page or an Excel spreadsheet of expenses, do not have any formal versioning process and probably won’t really benefit much from semantic versioning. In that case the best options are often increasing numbers (“19”, “20”, “21”) or ISO-8601 date/time stamps (“2013-12-24”, “2013-12-28”, “2014-01-02 15:04:01Z”), both of which can easily be generated by software without needing any understanding of the nature of the change.

Making versions retrievable

In the figure above, each versioned resource has its own URI to allow you to retrieve that particular version. Although there is no requirement for such availability, it can be quite beneficial for several reasons, particularly combined with semantic versioning. For instance, the way we have deployed our ontology means that if you wanted to use PAV version 2.1 without any terms introduced in 2.2 or later, you can use its version URI to consistently download (or programmatically import) the ontology as it was in version 2.1.

(Side note: We deliberately have not versioned the PAV namespace, so pav:version expands to the same URI no matter which ontology version was loaded. To avoid misunderstandings, we removed the trailing / in the version URI from 2.1 onwards.)

Ordering previous versions

Now, a computer seeing these three resources would not know that they are ordered 2.0, 2.1, 2.2, or even that they are related at all. With PAV we can add the pav:previousVersion property:

PAV versions

Note how pav:previousVersion goes directly between the resources: in PAV the ‘previous version’ is not a free-standing tag separate from the resource, but an actual copy or snapshot of the versioned resource as it was in that state. This eventually forms a chain of versioned resources, here providing the lineage of version 2.3 through 2.2 and 2.1 to 2.0. In PAV, pav:previousVersion is meant to be used as a functional property (pointing at a single resource). This means that for any given resource only the immediately previous version is stated directly; any earlier versions can be found by following the chain.
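Following the chain is straightforward for a program. In this Python sketch the version identifiers are hypothetical short labels standing in for the versioned URIs. Each resource maps to at most one previous version, matching the functional use of pav:previousVersion.

```python
from typing import Dict, List

# Hypothetical identifiers; each key has at most one pav:previousVersion
PREVIOUS_VERSION: Dict[str, str] = {
    "pav-2.3": "pav-2.2",
    "pav-2.2": "pav-2.1",
    "pav-2.1": "pav-2.0",
}


def lineage(resource: str, previous: Dict[str, str]) -> List[str]:
    """Follow pav:previousVersion links from a resource back to the first version."""
    chain = [resource]
    while chain[-1] in previous:
        earlier = previous[chain[-1]]
        if earlier in chain:
            raise ValueError(f"version cycle involving {earlier!r}")
        chain.append(earlier)
    return chain


print(lineage("pav-2.3", PREVIOUS_VERSION))  # ['pav-2.3', 'pav-2.2', 'pav-2.1', 'pav-2.0']
```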

In the picture above I have pencilled in a PAV version 2.3 as a draft, to highlight that pav:previousVersion is purely a way to show the version lineage from a given resource, and is not as prescriptive as dcterms:replaces, which specifies a related resource that is supplanted, displaced, or superseded by the described resource. The authority to decide when a resource is ready to supersede its previous version is often separate from its version lineage. We’ll come back to the “current version” later in this blog post. Note that since making this figure, PAV 2.3 has actually been released. :-)

Providing provenance for each version

One advantage of having each versioned resource explicit, beyond being able to retrieve them, is that you can attach additional properties reflecting the state of each version. For instance, for a dataset, each version can have its own provenance of how it was prepared:

PAV dataset
Example of using PAV to version datasets, showing the provenance of each individual version. doi:10.6084/m9.figshare.894329

In this example, dataset-1.0.0.csv has been pav:importedFrom survey.xls, i.e. probably saved from Excel (the software can be specified using pav:createdWith). The Excel file was imported from an SPSS survey data file, but in addition had a pav:sourceAccessedAt the survey form (e.g. the creator looked up more descriptive column headers).

For dataset-1.1.0.csv we (as humans) can see the minor version has been incremented, and that it has a different provenance: this version was imported from dataset.xlsx, which has been pav:derivedFrom the earlier survey.xls (indicating that the spreadsheet has evolved significantly). The data was imported from a different survey2.spv (which might or might not be related to survey.spv), but still accessed the same surveyform.docx.

For dataset-2.0.0.csv the provenance is quite different: this time the scientist has simply used Survey Monkey rather than SPSS to manage their survey, and has published its exported CSV. Presumably this dataset is quite different in its structure, as it has gained a new major version to become 2.0.0. Note that if the content of the dataset (its knowledge) had significantly changed, e.g. the old dataset showed baby birth weights while the next dataset was a survey of pregnant mothers, their education levels and their baby’s birth weight, then the new dataset should rather be related with pav:derivedFrom.

Adding other PAV properties to relate agents to versions, such as pav:createdBy, pav:importedBy and pav:authoredBy, can be useful particularly to attribute different people involved with each release. 
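To show what such per-version provenance might look like as data, this Python sketch writes Turtle using the PAV namespace (http://purl.org/pav/). The relative file names come from the dataset example; the helper functions are our own, and a real application would more likely use an RDF library than string formatting.

```python
from typing import List, Tuple

PAV_PREFIX = "@prefix pav: <http://purl.org/pav/> ."


def iri(ref: str) -> str:
    """Turtle IRI reference; relative references resolve against the document base."""
    return f"<{ref}>"


def literal(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe(subject: str, props: List[Tuple[str, str]]) -> str:
    """One Turtle subject block, with predicate-object pairs separated by ';'."""
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in props)
    return f"{iri(subject)}\n    {body} ."


print(PAV_PREFIX)
print(describe("dataset-1.1.0.csv", [
    ("pav:version", literal("1.1.0")),
    ("pav:previousVersion", iri("dataset-1.0.0.csv")),
    ("pav:importedFrom", iri("dataset.xlsx")),
]))
print(describe("dataset.xlsx", [("pav:derivedFrom", iri("survey.xls"))]))
```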

Related work

While we have presented versioning with PAV, other vocabularies exist with alternative ways to model versions.

PROV-O revisions

In the W3C specification PROV-O, the term prov:wasRevisionOf can be used to relate versions:

A revision is a derivation for which the resulting entity is a revised version of some original. The implication here is that the resulting entity contains substantial content from the original. Revision is a particular case of derivation.

While at first prov:wasRevisionOf seems to achieve the same as pav:previousVersion, the PROV definition focuses on revision as a form of derivation. As the dataset example above showed, versions are not necessarily related through simple derivations, but can have their own provenance. It is unclear whether prov:wasRevisionOf might also be used to give shortcuts to older versions, whereas pav:previousVersion should only be used towards the directly previous version. The PAV property also recommends giving the human-readable pav:version.

We do however acknowledge that most common use of prov:wasRevisionOf is very similar to pav:previousVersion, and have therefore mapped pav:previousVersion as a subproperty of prov:wasRevisionOf. Although this also indirectly means a PAV previous version is related with a PROV derivation, the definition of prov:wasDerivedFrom is intentionally quite wide and should also cover pav:previousVersion as an ‘update’:

A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

Our derivation subproperty pav:derivedFrom is again intentionally more specific, requiring a significant change in content, and thus can be used to clarify the level of change.

We made a mapping to PROV-O which explains the rationale for each PAV subproperty.
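Mapping PAV properties as subproperties means a generic PROV consumer can still understand PAV data after RDFS reasoning. The Python sketch below applies the subproperty rule to plain (subject, predicate, object) tuples; the table lists only relationships stated in this post, using prefixed names rather than full URIs, and is not the complete PAV-to-PROV mapping.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Subproperty links as described in this post (prefixed names, not full URIs)
SUBPROPERTY_OF: Dict[str, Set[str]] = {
    "pav:previousVersion": {"prov:wasRevisionOf"},
    "prov:wasRevisionOf": {"prov:wasDerivedFrom"},
    "pav:derivedFrom": {"prov:wasDerivedFrom"},
}


def entail_subproperties(triples: Set[Triple]) -> Set[Triple]:
    """Apply the RDFS rule: if (s p o) and p subPropertyOf q, then (s q o)."""
    inferred = set(triples)
    pending = list(triples)
    while pending:
        s, p, o = pending.pop()
        for super_p in SUBPROPERTY_OF.get(p, set()):
            triple = (s, super_p, o)
            if triple not in inferred:
                inferred.add(triple)
                pending.append(triple)  # the superproperty may have its own parents
    return inferred


facts = {("dataset-1.1.0.csv", "pav:previousVersion", "dataset-1.0.0.csv")}
for triple in sorted(entail_subproperties(facts)):
    print(triple)
```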

Qualified revisions

One interesting aspect of PROV-O is the ability to qualify relations. prov:wasRevisionOf (and therefore also pav:previousVersion) can be qualified using prov:qualifiedRevision. For instance we could expand the relation between dataset 2.0.0 and 1.1.0 to explain why we had to change the major version:

prov:qualifiedRevision can be used to detail pav:previousVersion, here explaining the changes to the dataset using rdfs:comment. Note that this figure does not show the qualified link prov:entity from the revision to dataset-1.1.0.csv.

Note that it will often be difficult to assign a retrievable URI for the revision itself, unless some kind of versioning system (like GitHub or Google Code) provides a way to link to the change or revision itself.

This kind of qualification pattern can also be used for other PAV properties that have PROV superproperties, such as prov:qualifiedDerivation on pav:importedFrom, or prov:qualifiedAttribution on pav:authoredBy; however, in many cases it might be better to expand the change by relating entities to PROV activities.

DC Terms

The Dublin Core Terms is a well-established and popular vocabulary to provide bibliographic records, particularly for document-like resources. As its focus is on human-readable bibliographies rather than provenance, there is not necessarily a ‘backwards in time’ lineage when using DC Terms relations. These DC Terms properties can be used for describing versions of resources:

  • dcterms:replaces – A related resource that is supplanted, displaced, or superseded by the described resource. As mentioned before, this is similar to pav:previousVersion, but adds a stamp of authority as the older version is superseded or displaced. So for instance if our dataset-2.0.0.csv was experimental and not really a good replacement for 1.1.0 (say we really wanted to include eye colour), then dcterms:replaces would not be appropriate until there was a new “official version” – which might not be until 2.1.3. The inverse, dcterms:isReplacedBy, can be used as a forward pointing property to indicate that a resource is no longer current.
  • dcterms:isVersionOf – A related resource of which the described resource is a version, edition, or adaptation. Changes in version imply substantive changes in content rather than differences in format. This property is quite wide, in that it could cover any kind of adaptation, like the Romeo+Juliet movie being a version of the Shakespeare theatre play Romeo and Juliet.
    In provenance terms, such adaptations are normally covered by prov:wasDerivedFrom (the movie was based on the theatre play) or prov:alternateOf (the movie as an alternate of a theatre performance), while differences in abstraction levels (e.g. the DVD vs. the movie in general) are covered with prov:specializationOf and FRBR-like abstraction models. Additionally, pav:previousVersion does not normally cover substantive changes in content; those should be described using pav:derivedFrom.
  • dcterms:hasVersion – A related resource that is a version, edition, or adaptation of the described resource. This is the inverse of dcterms:isVersionOf, but also suffers from sometimes being used as a kind of prov:qualifiedRevision pointing at a free-standing revision resource (as in our dataset example above), or as a more hierarchical unversioned-to-versioned relationship (prov:generalizationOf). Even within the DC Terms history there seems to be a confusing mix of dcterms:hasVersion and dcterms:replaces that hints at hierarchical use, but also makes resources have themselves as versions.

PAV has a mapping to DC Terms (available as SKOS) which explains how the two vocabularies could be aligned; however, we have not included the versioning part of this mapping in the formal OWL ontology for the above reasons.

Schema.org is a set of terms that has grown to be amongst the most popular vocabularies for describing web resources, partially because of its usage by Google, Yahoo and Bing. Terms we identified as related to versioning are:

  • schema:version – The version of the CreativeWork embodied by a specified resource. This can be seen as a more specific version of pav:version; the biggest difference is that schema:version is typed to be a schema:Number, and so might not cover versions like “1.5.2” or “2014-01-05”.
  • schema:isBasedOnUrl – A resource that was used in the creation of this resource. This term can be repeated for multiple sources. This is more of a loose provenance term which could be seen to cover all of pav:sourceAccessedAt, pav:importedFrom, pav:retrievedFrom, prov:wasDerivedFrom and prov:wasInfluencedBy.
  • schema:successorOf – A pointer from a newer variant of a product to its previous, often discontinued predecessor. While this description is similar to pav:previousVersion and dcterms:replaces, the term seems to only be used from/to schema:ProductModel, which would not cover web resources that are not product sheets. The same applies to its inverse schema:predecessorOf.
  • schema:isVariantOf – A pointer to a base product from which this product is a variant. It is safe to infer that the variant inherits all product features from the base model, unless defined locally. This property, also only used from/to schema:ProductModel, is a specialization of dcterms:isVersionOf and prov:specializationOf.

Organize the versions

In PAV 2.3 we added three additional properties for versioning:

Earlier versions

pav:hasEarlierVersion points to any earlier version, not just the directly previous version. This is a transitive super-property of pav:previousVersion, which means you can build a linear chain of previous versions from which all the earlier versions are implied. (Importantly, pav:previousVersion is NOT transitive.) For simplicity there is no inverse property for the later version, as we think an earlier version shouldn’t make “future” declarations; rather, the newer version should indicate its earlier version (following the direction of provenance).
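Because pav:hasEarlierVersion is transitive, the implied earlier versions can be computed from a chain of pav:previousVersion statements. A Python sketch using hypothetical version labels:

```python
from typing import Dict, Set, Tuple

# Hypothetical labels; pav:previousVersion links only to the directly previous version
PREVIOUS_VERSION: Dict[str, str] = {
    "pav-2.3": "pav-2.2",
    "pav-2.2": "pav-2.1",
    "pav-2.1": "pav-2.0",
}


def has_earlier_version(previous: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Transitive closure of pav:previousVersion, i.e. implied pav:hasEarlierVersion pairs."""
    pairs = set()
    for resource in previous:
        current = resource
        seen = {resource}
        # Walk back along the chain, guarding against accidental cycles
        while current in previous and previous[current] not in seen:
            current = previous[current]
            seen.add(current)
            pairs.add((resource, current))
    return pairs


pairs = has_earlier_version(PREVIOUS_VERSION)
print(len(pairs))  # 6
print(("pav-2.3", "pav-2.1") in pairs)  # True: implied, though not a previousVersion link
```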

PAV versions - hasEarlierVersion

Has a version (snapshots)

pav:hasVersion is a specialization of dcterms:hasVersion – which formalizes that this property is for hierarchical versioning:

PAV versions - hasVersion

This shows how the unversioned resource is a more general entity that spans across the multiple snapshots; therefore pav:hasVersion is also a subproperty of prov:generalizationOf, indicating the hierarchical nature of the entities describing the same thing with different (time) characteristics.

Note that unlike dcterms:hasVersion, pav:hasVersion goes to a snapshot – the version should be retrievable at its URI, so it would usually not be good taste to use pav:hasVersion to a revision info page that does not include the page as it was in that version.

However for Software Releases, using GitHub release pages as versions is probably a good idea.

Current version

While these snapshots should contain pav:previousVersion between them to provide a version lineage, it is often useful to declare what is the current version. So we have also pav:hasCurrentVersion:


Thus pav:hasCurrentVersion is useful to provide a permalink for a dynamic page. Often this is what people have meant by a more functional use of dcterms:hasVersion, pointing to a single current snapshot, where older snapshots would have dcterms:isVersionOf backlinks. While that pattern might have been used, it is not formally defined as such by DC Terms.

As pav:hasCurrentVersion specializes pav:hasVersion, you don’t need to duplicate that relation for the current version. Note that the current version is not necessarily the latest version – there could be a newer version (e.g. a draft or release candidate) which is not yet official – as exemplified above with PAV 2.3 as a draft. (Note that since making this figure, PAV 2.3.1 has been released.)

Here we can see that there’s a “future” PAV version that may or may not later become the pav:hasCurrentVersion (it is in fact now the current version). This is typical of software development, where you often have alpha versions and release candidates.

It can be useful to have third-party “versions” (e.g. forks in software development), for which you would not find an official pav:hasVersion statement from the original resource. In this case you should add a prov:specializationOf backlink and a pav:derivedFrom statement pointing to the version you forked.

Hierarchies all the way down

There is nothing preventing you from also using pav:hasVersion to define deeper hierarchies, e.g. for software using semantic versioning:

But this raises some challenges with pav:previousVersion, pav:hasCurrentVersion and pav:version.

I would suggest this pattern for representing semantic versioning hierarchically:

<; pav:hasCurrentVersion <; ;
pav:version "2.1.0" .
<; pav:hasCurrentVersion <; ;
pav:previousVersion <; ;
pav:version "2.1.0" .
<; pav:hasCurrentVersion <; ;
pav:previousVersion <; ;
pav:version "2.1.0" .
<; pav:version "2.1.0" ;
pav:previousVersion <; .

As pav:hasCurrentVersion should point to the permalink snapshot in a functional way, it would be confusing to also include its “current version” as “2” and “2.1”. So I suggest always letting it point to the “deepest” version. The pav:version of the intermediaries should show the latest version of their pav:hasCurrentVersion, not a generic “2” or “2.1”. (You can use rdfs:label to say “2.1”.)

For the ‘abandoned’ versions, pav:hasCurrentVersion and pav:version would be the latest one within their level:

<; pav:hasCurrentVersion <; ;
pav:previousVersion <; ;
pav:version "2.0.1" .
<; pav:version "2.0.1" ;
pav:previousVersion <; .
<; pav:version "2.0.0" ;
pav:previousVersion <; .

Note that software often has patch updates on “older” maintenance branches – e.g. it could be that the current 1.2 version is 1.2.9 even though v2.0.0 was derived from 1.2.3.

If you want to describe merges across these branches, then you would probably need to add additional pav:derivedFrom statements.
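The suggestion to point each intermediate level at its deepest, latest release can be computed from a list of releases. This Python sketch assumes the list contains only official releases (drafts and release candidates should not become the current version) and uses hypothetical version strings in the style of the examples above. For each level, the value is what both pav:hasCurrentVersion and pav:version would refer to.

```python
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key so that 2.10.0 sorts after 2.9.0."""
    return tuple(int(part) for part in version.split("."))


def current_versions(releases: List[str]) -> Dict[str, str]:
    """Map each intermediate level ("2", "2.0", ...) to its latest full release."""
    latest: Dict[str, str] = {}
    for release in releases:
        parts = release.split(".")
        for depth in range(1, len(parts)):
            level = ".".join(parts[:depth])
            if level not in latest or version_key(release) > version_key(latest[level]):
                latest[level] = release
    return latest


print(current_versions(["1.2.3", "2.0.0", "2.0.1", "2.1.0"]))
# {'1': '1.2.3', '1.2': '1.2.3', '2': '2.1.0', '2.0': '2.0.1', '2.1': '2.1.0'}
```

Note how the abandoned 2.0 level still points at 2.0.1, its latest release within that level, while 2 follows the newer 2.1 branch.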

PAV Ontology paper highly accessed


Our recent paper about the PAV ontology has been classified as highly accessed by Journal of Biomedical Semantics, with more than 1097 views since it was published two months ago, with an Altmetric score of 12.

The PAV ontology provides a lightweight approach to record typical Provenance, Authorship and Versioning information, and builds upon existing standards like PROV-O and DC Terms.

Our previous Practical Provenance post gives a brief overview of PAV, but you might also want to explore these links for more details:

Resources that change state

The PROV working group received a question from Mike:

My understanding is that an entity referenced in a PROV bundle (e.g. via wasGeneratedBy) must be in the bundle… but I do not wish to duplicate entity definitions throughout my bundles. My entities are long lived and will exist in multiple bundles.

So let's say I have a resource for alarms which contains a list of all alarms my company monitors. If I turn off the alarm at alarm/1, my understanding is that in PROV a new entity is created for the new state of alarm/1. But in my actual data store, I don’t create a new record, I just toggle a flag.

So there is a disconnect between how my PROV looks and how my data looks. My understanding is that this is by design. So I would have a new entity in my PROV for alarm/1 in the new state, which is a specialization of alarm/1, yes?

Ultimately, I want to display all of the provenance for alarm/1 so I can see its history from creation to invalidation. Am I going about this the wrong way?
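To make the question concrete, here is a minimal sketch of the modelling Mike describes (one entity per state, each a prov:specializationOf the long-lived alarm/1). The URIs and the activity are hypothetical, and this illustrates the question rather than the reply.

```turtle
@base <http://example.com/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# The long-lived resource, matching the single record in the data store
<alarm/1> a prov:Entity .

# The "on" state, ended when the flag is toggled
<alarm/1/state/on> a prov:Entity ;
    prov:specializationOf <alarm/1> ;
    prov:wasInvalidatedBy <activity/turn-off-alarm-1> .

# Toggling the flag, modelled as an activity
<activity/turn-off-alarm-1> a prov:Activity ;
    prov:used <alarm/1/state/on> .

# The "off" state: a new entity, also a specialization of alarm/1
<alarm/1/state/off> a prov:Entity ;
    prov:specializationOf <alarm/1> ;
    prov:wasRevisionOf <alarm/1/state/on> ;
    prov:wasGeneratedBy <activity/turn-off-alarm-1> .
```

Collecting everything that is a prov:specializationOf `<alarm/1>`, and following the activities that generated and invalidated each state, would then give the history Mike wants to display.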

Here is my reply (slightly revised for this post). My examples use the Turtle syntax and PROV-O, but are also applicable to other serializations of PROV, like PROV-XML or PROV-JSON.


PROV released as W3C Recommendations

The Provenance Working Group was chartered to develop a framework for interchanging provenance on the Web. The Working Group has now published the PROV Family of Documents as W3C Recommendations, along with corresponding supporting notes. You can find a complete list of the documents in the PROV Overview Note. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML. In addition, it provides definitions for accessing provenance information, validating it, and mapping to Dublin Core. Learn more about the Semantic Web.

@prefix prov: <> .
<#quote> prov:wasQuotedFrom <> .
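A fuller version of that snippet might also type the quote as a prov:Entity; the source URI below is a placeholder, not the address of the actual announcement.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# The announcement text quoted in this post
<#quote> a prov:Entity ;
    # Placeholder source URI, for illustration only
    prov:wasQuotedFrom <http://example.com/prov-recommendation-announcement> .
```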

This means the PROV data model and specifications are now released as official W3C Recommendations, and can be used as a stable platform for expressing and exploring provenance data across the web.

Practically speaking, this blog would recommend you start with the PROV primer, followed by the tutorial and then PROV-O for Linked Data/RDF/OWL (or alternatively PROV-XML for XML or PROV-JSON for JSON). For a deeper understanding and definition of the PROV concepts, see the PROV data model (PROV-DM).
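For a first taste of PROV-O before reading the primer, here is a minimal sketch of its three core classes (prov:Entity, prov:Activity, prov:Agent) and the relations between them; all URIs are hypothetical.

```turtle
@base <http://example.com/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# Entity: the thing whose provenance we describe
<blog-post> a prov:Entity ;
    prov:wasGeneratedBy <writing> ;
    prov:wasAttributedTo <alice> ;
    prov:wasDerivedFrom <primer-notes> .

# Activity: what produced it, and what it used
<writing> a prov:Activity ;
    prov:used <primer-notes> ;
    prov:wasAssociatedWith <alice> .

# Agent: who bears responsibility
<alice> a prov:Agent .

<primer-notes> a prov:Entity .
```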