We have suggested a fix, but while we wait for that, below we describe a patched build that should work on Windows. You will also need Java to run ProvToolbox, and Graphviz for visualization.
Install Java for Windows
You will need Java JRE 9 or later to run ProvToolbox 0.9.5, so below we show how to install JRE 11 LTS on Windows. Do not install from java.com, as that provides the older Java 8, which unfortunately does not work with this ProvToolbox release.
Unfortunately there are quite a few alternatives for installing Java on Windows:
Oracle provides installers of JDK 11 for Windows x64; however, they are distributed under a restrictive license that you or your organization may have reservations about. They are nevertheless straightforward to install if you only want to run ProvToolbox for personal or development use.
The official open source release of OpenJDK 11 for Windows does not provide an installer, and is fairly large. If you choose to install it, you will need to modify your PATH system environment variable manually, depending on where you extract the jdk-11 folder.
Finally, AdoptOpenJDK is a community project that provides user-friendly open source builds of Java. We found this option to be the easiest to install on Windows, as they provide an MSI installer of the smaller JRE distribution and do not require registration. However, the website takes some help to navigate, as it offers many alternatives.
We recommend this open source option for installing Java for Windows users.
If you use 32-bit Windows, say on an older or smaller machine, try x86 instead.
Download the JRE as an MSI installer (the ZIP file does not include an installer).
Before you open the MSI you may need to change the Windows Store settings to allow installing applications. From the start menu type Add or Remove Programs to open the Settings pane for Choose where to get apps. Select Anywhere.
Now double-click the MSI file from the Download folder and walk through the Installer. Enable the “Set JAVA_HOME environment variable” option.
Now open the Command Prompt window:
Run java -version to check that you now have OpenJDK 11 installed on the PATH.
Shorten the extraction path so that the folder ends up in your home directory, e.g. C:\Users\stain
You should now have a folder ProvToolbox in your home directory:
You should now be able to run ProvToolbox\bin\provconvert from the Command Prompt open in your home directory:
Note that if you change the directory with cd you will need to modify the path to provconvert. To avoid that we’ll add it to the System Environment Variable PATH. In the Start menu, type PATH and select Settings -> Edit the system environment variables.
Click System Environment, then under User variables select PATH and click Edit. (If not found, click New and set Variable Name as PATH). Add a new line (or set Variable Value) where you can use Browse to navigate to find the full path of the ProvToolbox\bin folder:
After applying the settings with OK you will need to close and restart the Command Prompt window. This time provconvert -version should work from any path.
Install Graphviz for Windows
ProvToolbox can also generate visualizations, but for this we need to install the open source tool Graphviz.
You will need to allow the unsigned package to install.
When running the install wizard for GraphViz, make sure you enable the option to add GraphViz to the system PATH for the current user:
Close and restart the Command Prompt window. This time dot -V (note the capital V) should work:
Unfortunately the installer of GraphViz does not initialize the plugins for graphical formats. To fix this we need to open another Command Prompt, but running as Administrator:
In this window, run dot -c to initialize the GraphViz plugins.
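As a cross-check of the PATH changes described above, the following minimal sketch reports which of the three commands can be found on the PATH. It assumes Python is available, which this post does not otherwise require, and it only looks the commands up; it does not run them.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path on the PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The three commands used in this post: Java, ProvToolbox and GraphViz.
    for name, path in find_tools(["java", "provconvert", "dot"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If one of the tools is reported as not found, restart the Command Prompt (or Windows) so that the updated PATH is picked up, then try again.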
Installing Visual Studio Code
VSCode is a free and lightweight text editor which is useful for writing *.provn files. The User Installer for 64-bit Windows should work for most users.
Note: After installing VSCode you will need to Restart Windows to pick up the updated PATH for ProvToolbox.
Running provconvert from VSCode
Remember to save PROV-N files (example) with the extension .provn so that ProvToolbox can recognize the file type. The first time you may need to select No Extension and add .provn yourself to the filename.
In VSCode’s menu, select Terminal -> New Terminal; an embedded command prompt should open in the correct directory. Check with dir *.provn that the file is present with the correct name.
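The same check as dir *.provn can be sketched in Python, which also makes it easy to spot files that an editor has saved with an extra extension. This is only an illustration; the function names are our own and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def list_provn_files(directory="."):
    """Return the names of files ending in .provn, similar to `dir *.provn`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".provn"
    )


def list_misnamed_files(directory="."):
    """Return names that contain .provn but end in another extension,
    e.g. a file that an editor saved as something.provn.txt."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file()
        and ".provn" in [s.lower() for s in p.suffixes]
        and p.suffix.lower() != ".provn"
    )


if __name__ == "__main__":
    print("PROV-N files:", list_provn_files())
    print("Possibly misnamed:", list_misnamed_files())
```

For a folder holding cards.provn and draft.provn.txt, the first function would list only cards.provn and the second only draft.provn.txt.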
You should now be able to convert the provn file to a PNG image, in our example using the command:
If you have made a syntactical error in the PROV-N file, then provconvert reports errors by line number, which you can locate in the VSCode editor. Note that VSCode has no native PROV-N support, so its Problems tab is currently unable to detect these errors.
Remember that an error early in the file, such as a broken prefix or a missing ), may cause phantom errors to be reported later in the file.
Now from this we may want to give the cards attribution to the individual agents, and while we can perhaps assume Charlie has written the address on all the cards, we don’t know if it was Alice or Bob who wrote the particular card going to Frank.
This highlights that the responsibility view and the process view can complement each other.
It is part of our modelling to decide what is considered “work height” in terms of attributions. Perhaps we should not consider Charlie putting on addresses as part of writing the cards, if what mattered was the personalized message, not how the cards were posted.
It is possible to also assign prov:type to wasAttributedTo relations, e.g. ex:authorship, but this can easily become overlapping with the roles in associations with the corresponding activity. This is perhaps more useful when the activity is not explicitly declared, or we want to declare explicit contributor roles using metadata models like CRediT.
While the role terms:Writing above does not really tell us much, as the activity is already of type terms:CardWriting, the distinction is roughly that omitting the role says “We don’t know / didn’t say what work they did”, while an explicit role says “They did the same work, but we have not broken it down further”.
In a way a role is a short-circuiting device to avoid having infinite recursion into very tiny activities – it allows multiple things to happen in the same activity.
Let’s say Charlie helped with putting addresses on the envelopes. In our simple one-activity modelling we can associate them with a separate role:
Now we know who to blame if the cards end up at the North Pole!
It is a modelling question to use a single activity with multiple agents associated, potentially with different roles; or to use multiple activities with simpler associations, perhaps not needing roles. One thing to keep in mind is that more activities will also need more entities to represent the different states, e.g. ex:unwrittenCard1, ex:writtenCard1, ex:addressedCard1.
Rather than associating multiple agents with a single activity, it is possible to make a new agent that represents the whole group, in which case it is a choice not to break it down to individual engagement.
For instance, this would be correct when recording a board room decision, where the group of Alice, Bob and Charlie together decided/stated something they may not have agreed with, or be responsible for, individually.
PROV does not itself define how to specify the make-up of groups – but it has agent types prov:Person, prov:Organization and prov:SoftwareAgent that you can assign using prov:type.
A student group might not sound like an organization, but it is the closest match of the three. We also see that there is an equivalent type Organization in the schema.org vocabulary, with looser sub-types including PerformingGroup (a band or orchestra), Project (someone working towards a shared goal) or even CollegeOrUniversity (the University as a whole).
While PROV literature does not cover well how to use external vocabularies like schema.org, we can take advantage of such types and attributes in PROV-N and express group membership using schema.org/member, adding a parentOrganization relation to the University for good measure.
prefix s <http://schema.org/>
prefix ex <http://example.com/>
agent(ex:UoM, [prov:label="University of Manchester",
agent(ex:RedGroup, [prov:label="Red Group",
The keen reader will notice that the above example repeats some attribute names to express multiple relations of prov:type and s:member – both of these are unordered.
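To illustrate what “repeated and unordered” means in practice, here is a minimal Python sketch that treats attributes as (name, value) pairs and groups them into sets. The member identifiers ex:alice and ex:bob and the exact type values are hypothetical, chosen only for this illustration; this is not how any PROV library represents attributes internally.

```python
from collections import defaultdict


def group_attributes(pairs):
    """Collect (name, value) pairs into a mapping of name -> set of values."""
    grouped = defaultdict(set)
    for name, value in pairs:
        grouped[name].add(value)
    return dict(grouped)


# Hypothetical attribute list for a group agent; the values are made up.
red_group = [
    ("prov:type", "prov:Organization"),
    ("prov:type", "s:Organization"),
    ("s:member", "ex:alice"),
    ("s:member", "ex:bob"),
]

# Listing the same attributes in a different order describes the same agent.
print(group_attributes(red_group) == group_attributes(reversed(red_group)))  # prints True
```

Because each attribute name maps to a set, repeating prov:type or s:member adds another value rather than overwriting the previous one.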
As an alternative, PROV has the concept of Collections.
Although collections are mainly used for describing compound entities, such as a dataset containing samples, as agents can concurrently be entities they could also be used for showing group members.
Tip: Treating an agent as an entity can be of interest if your provenance needs to show its evolution over time, for instance changes to group membership. In this case, note that you will need multiple agent identifiers depending on which “version” of the group was associated with particular activities across time, or use specialization to make a “timeless” entity.
Using prov:Collection would perhaps feel more PROV-native than the schema.org use above, but without any additional attributes like parentOrganization:
This choice between vocabularies is, I’m afraid, part of data modelling: there will never be a single perfect metadata model, so we have to either pick the one that is most useful or find a way to combine both approaches. (Although verbose, the above should work both ways.)
Sometimes two metadata models can be in conflict in terms of granularity, in which case you should use different identifiers, in some cases even different documents.
Most examples of PROV-N use example prefixes like:
prefix ex <http://example.com/>
prefix exg <http://example.org/government>
These example domains are explicitly reserved globally for all kinds of examples and training material, and deliberately do not have any content, advertisement or affiliations.
Assume you are writing the provenance of a student group exercise. Should you be using the prefix/namespace ex and example.org to define agents/entities/relationships and your own attribute types?
You can keep using example.org, but ideally you should define your own namespace based on a Web URL you “control” or “own”. This would make your Linked Data identifiers globally unique.
Students at The University of Manchester can publish their own home pages, and luckily most universities still provide a similar facility. Our students can look themselves up on https://personalpages.manchester.ac.uk/ and find, for instance, that they are http://personalpages.manchester.ac.uk/postgrad/alice.davidson/ which means they could make any sub-page under that directory.
Now the page does not need to exist – this blog is not about making Web pages – it is just important that it could exist at that address – that means it is that person’s identifier space.
So we could imagine that Alice made two files under her directory:
http://personalpages.manchester.ac.uk/postgrad/alice.davidson/terms Imagine this was a page defining new terms to be used in PROV
http://personalpages.manchester.ac.uk/postgrad/alice.davidson/groupExercise Imagine this page was the group exercise as text (whose provenance is described by PROV), or the PROV-N file itself (self-describing)
In real life you would also have to deal with file extensions like .html or server configuration for content types, see Tim Berners-Lee’s Cool URIs don’t change for further considerations.
As above it is considered good practice to separate the prefix/namespace for new roles/attributes that you define vs specific agents/entities in one particular provenance trace. The idea being that the general terms could be reused in multiple provenance documents with the same meaning.
So let’s map in the above documents as namespaces in PROV-N:
prefix terms <http://personalpages.manchester.ac.uk/postgrad/alice.davidson/terms#>
prefix group <http://personalpages.manchester.ac.uk/postgrad/alice.davidson/groupExercise#>
The prefix string is usually a short name somewhat matching the address; it just needs to be unique and consistent within the same PROV document. Other documents can freely map the same namespace to a different prefix.
A diligent reader might notice the # at the end of the namespace – this “fragment” is in Web documents used to indicate a subsection or heading within the same file.
This is so that terms:student expands to <http://personalpages.manchester.ac.uk/postgrad/alice.davidson/terms#student> rather than ending with alice.davidson/termsstudent. Even though we have not made the pages yet, we would not want to make a separate page for each word.
In some cases separate pages are desirable, in which case the namespace ends in / to mark a directory – for instance s:Person becomes http://schema.org/Person as schema.org have decided their term list is too big to keep in a single page.
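The expansion rule described above is plain string concatenation of namespace and local name. The following Python sketch shows it for the prefixes used in this post; the function name expand is our own, and real PROV tools do this internally as part of parsing.

```python
PREFIXES = {
    "terms": "http://personalpages.manchester.ac.uk/postgrad/alice.davidson/terms#",
    "group": "http://personalpages.manchester.ac.uk/postgrad/alice.davidson/groupExercise#",
    "s": "http://schema.org/",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand a prefix:local name to a full URI by concatenation."""
    prefix, separator, local = qualified_name.partition(":")
    if not separator:
        raise ValueError(f"no prefix in {qualified_name!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("terms:student"))
    print(expand("s:Person"))
```

Since only the namespace matters, a second document that maps the same terms namespace to a different prefix, say t, would expand t:student to the same URI as terms:student here.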
Now we can use these prefixes for identifying agents/entities/activities, as well as using our own attributes, roles and types.
It is customary that attributes start with lower case after :, while types/roles start with Capital – but this is just stylistic.
As for entity identifiers, your terms and roles should not have spaces or special characters in them, as they must be combinable with the namespace into a valid URI, e.g. camel case favouriteDay or underscore favourite_day.
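As a rough aid for picking names, the sketch below checks a local name against a deliberately conservative pattern (letters, digits and underscore, not starting with a digit) and turns a free-text phrase into camel case. The pattern is our own assumption and is stricter than what PROV-N actually permits; it is meant as a safe subset, not as the official grammar.

```python
import re

# Conservative assumption: letters, digits and underscore, not starting with a digit.
SAFE_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_local_name(name):
    """True if the name fits the conservative pattern above."""
    return SAFE_LOCAL_NAME.fullmatch(name) is not None


def to_camel_case(phrase):
    """Turn a phrase like 'favourite day' into 'favouriteDay'."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    if not words:
        raise ValueError("phrase contains no letters or digits")
    first, rest = words[0], words[1:]
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


if __name__ == "__main__":
    for candidate in ["favouriteDay", "favourite_day", "favourite day"]:
        print(candidate, is_safe_local_name(candidate))
    print(to_camel_case("favourite day"))
```

By the usual convention, a type or role would then get its first letter capitalized, while an attribute keeps the lower-case start.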
It is possible in PROV-N (but not easily in other PROV syntaxes) to use freehand roles/types strings like prov:role="Writing carefully" – these are kind of anonymous and cannot be assumed to mean the same thing across multiple PROV documents.
In many cases there is no suitable URI yet, in which case using a temporary namespace like http://example.com/yourthing is perfectly valid as a working example, just be aware of the risk of other people having the same namespace idea!
One of the advantages of W3C PROV having a common data model is that it can be serialized, or written out, in multiple file formats. The PROV family of W3C specifications describes mappings to PROV-XML and PROV-O (which, being based on OWL2, itself has multiple serializations for Linked Data, including the RDF formats Turtle and JSON-LD).
In addition to these standard approaches we also have PROV-JSON and PROV-JSONLD, which could be well-suited for Web applications. All of these can in theory be mapped to each other through the common PROV Data Model and the use of URIs as Linked Data global identifiers.
PROV also specifies its own language, PROV-N, a text-based file format that most closely represents the PROV Data Model. This representation is used by the PROV Primer to explain the PROV types (entity/agent/activity) and their relationships (e.g. wasAttributedTo). For example:
prefix prov <http://www.w3.org/ns/prov#> is implicit, and is the internal namespace for PROV types and attributes.
Tip: It is possible to declare default <http://example.com/> after which ex:regionList can be shortened to regionList, however it is recommended to always use explicit prefixes to ease reuse and combination of PROV-N files.
entity(ex:regionList) declares the existence of an entity with that identifier. It can thereafter be used in relationships expecting an entity.
The entity ex:dataset is similarly declared, but also assigning a more specific type, using http://schema.org/Dataset from the external vocabulary.
The activity ex:composition is typed using an ad-hoc type ex:Composing from our own namespace, but also adds a string attribute to give a more descriptive label.
In addition there are graphical tools for PROV editing, validation, conversion and visualization described below:
The PROV-N Editor is an online text editor that provides syntax highlighting and autocomplete for PROV-N, and is useful for beginners new to PROV-N.
Note that the starting example PROV-N aims to be somewhat complete, including advanced features such as nested bundle … endBundle blocks, // comments and deliberately invalid statements (shown in red).
We recommend starting with a simpler example in the PROV-N Editor, and using copy-paste to save the PROV-N locally to a file with a text editor like Visual Studio Code (which unfortunately does not have syntax highlighting for PROV-N):
Note: The file extension for PROV-N is .provn, but you may use .provn.txt to ensure it opens in a text editor. Do not edit PROV-N in a word processor like Microsoft Word, as its binary format .docx (actually a structured ZIP archive of XML files) cannot be parsed by PROV tools; in addition, word processors may provide unhelpful assistance such as changing "quotes" to “curly quotes”, which are not part of PROV-N syntax.
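If a PROV-N file has passed through a word processor anyway, the curly quotes can be located with a few lines of Python. This is a minimal sketch with function names of our own choosing, and the sample statement is hypothetical; note that straightening also affects any curly apostrophes inside label text.

```python
# Map typographic quotes to the plain ASCII quotes that PROV-N expects.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def find_curly_quotes(text):
    """Return (line, column, character) for each curly quote, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, character in enumerate(line, start=1):
            if character in CURLY_QUOTES:
                hits.append((line_no, column, character))
    return hits


def straighten_quotes(text):
    """Replace every curly quote with its straight equivalent."""
    return text.translate(str.maketrans(CURLY_QUOTES))


if __name__ == "__main__":
    # Hypothetical statement with quotes mangled by a word processor.
    sample = "entity(ex:card, [prov:label=\u201cCard\u201d])"
    print(find_curly_quotes(sample))
    print(straighten_quotes(sample))
```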
Although the PROV-N editor does syntax highlighting and can detect glaring mistakes such as invalid file comments, it does not do deeper inspection to detect mistakes such as missing commas, mismatched parentheses, or wrong or missing arguments to PROV relations. You may also accidentally have added logically inconsistent statements, such as:
While the above “scruffy” PROV-N file is syntactically valid, and each of the statements are OK semantically, as a whole we seem to have added a semantic violation of causality; an entity can’t be generated from entities not yet existing. An attempt to draw the above as a diagram will show an endless loop of derivations:
To ensure your PROV-N is both syntactically valid and semantically consistent, it is best to use a PROV validator.
The openprovenance.org PROV validator can support PROV-N; remember to tick the correct syntax, especially when pasting rather than uploading a file with the correct extension.
The checks performed by the PROV Validator mainly focus on semantic constraints such as correct typing and ensuring provenance goes backwards in time without any causality loops (e.g. you can’t be your own grandparent).
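To give an intuition for the causality-loop check, here is a minimal Python sketch that looks for a cycle among derivations, each given as a (derived entity, source entity) pair in the argument order of wasDerivedFrom. It is not how the PROV Validator is implemented and it covers only this one kind of inconsistency; the entity identifiers are hypothetical.

```python
def find_derivation_cycle(derivations):
    """Return a list of entities forming a derivation loop, or None.

    derivations: iterable of (derived, source) pairs.
    """
    graph = {}
    for derived, source in derivations:
        graph.setdefault(derived, []).append(source)
        graph.setdefault(source, [])

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    path = []

    def visit(node):
        # Depth-first search; meeting a node that is still in progress means a loop.
        state[node] = in_progress
        path.append(node)
        for source in graph[node]:
            if state[source] == in_progress:
                return path[path.index(source):] + [source]
            if state[source] == unvisited:
                cycle = visit(source)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in graph:
        if state[node] == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical derivations: a from b, b from c, and c from a again.
    loop = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:c", "ex:a")]
    print(find_derivation_cycle(loop))
```

A proper validator checks many more constraints, so a result of None from this sketch says nothing about overall validity.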
Unfortunately we have found that the PROV Validator service occasionally does not detect syntactic PROV-N errors; for instance, if we delete the placeholder argument ,- from the wasGeneratedBy statement above, it is silently accepted by this validator, even though the timestamp is required by the PROV-N definition of used. If there are syntactic errors, the user is not given the line numbers of where the error might be.
Therefore we also recommend using the PROV Toolbox command line tool to validate the PROV-N syntax before using the PROV Validator.
The PROV Toolbox is a Java library for consuming and generating PROV, but it also includes a versatile command line tool that can do:
After installing or unzipping to a subdirectory you should be able to run its provconvert or bin/provconvert command:
(base) stain@biggie:~/software/ProvToolbox$ bin/provconvert -help
usage: provconvert [-allexpanded] [-bindformat <string>] [-bindings
<file>] [-bindver <int>] [-builder] [-compare <file>] [-config]
[-debug] [-flatten] [-formats] [-generator <string>] [-genorder]
[-help] [-index] [-infile <file>] [-informat <string>] [-layout
<string>] [-location <location>] [-log2prov <file>] [-merge <file>]
[-namespaces <file>] [-outcompare <file>] [-outfile <file>]
[-outformat <string>] [-package <package>] [-template <string>]
[-templatebuilder <file>] [-title <string>] [-verbose] [-version]
-allexpanded,--allexpanded In template expansion,
generate term if all
variables are bound.
-bindformat,--bindformat <string> specify the format of the
-bindings,--bindings <file> use given file as bindings
for template expansion
(template is provided as
-bindver,--bindver <int> bindings version
-builder,--builder template builder
-compare,--compare <file> compare with given file
-config,--config get configuration
-debug,--debug print debugging information
-flatten,--flatten flatten all bundles in a
single document (to used with
-index option or -merge
-formats,--formats list supported formats
-generator,--generator <string> graph generator
-genorder,--genorder In template expansion,
generate order attribute. By
default does not.
-help,--help print this message
-index,--index index all elements and edges
of a document, merging them
-infile,--infile <file> use given file as input
-informat,--informat <string> specify the format of the
-layout,--layout <string> dot layout: circo, dot
(default), fdp, neato, osage,
-location,--location <location> location of where the
template resource is to be
found at runtime
-log2prov,--log2prov <file> fully qualified ClassName of
initialiser in jar file
-merge,--merge <file> merge all documents (listed
in file argument) into a
-namespaces,--namespaces <file> use given file as declaration
of prefix namespaces
-outcompare,--outcompare <file> output file for log of
-outfile,--outfile <file> use given file as output
-outformat,--outformat <string> specify the format of the
-package,--package <package> package in which bindings
bean class is generated
-template,--template <string> template name, used to create
bindings bean class name
-templatebuilder,--templatebuilder <file> template builder
-title,--title <string> document title
-verbose,--verbose be verbose
-version,--version print the version information
Here is an example of converting from provn to RDF Turtle.
The example output is valid RDF and uses the same prefixes in a different notation. (This kind of output can be loaded in Triple stores like Jena Fuseki for further queries).
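If you want to script such conversions, a minimal Python wrapper could look like the sketch below. It uses only the -infile and -outfile options from the help text above and assumes that provconvert picks the formats from the file extensions; the file names are hypothetical, and on Windows you may need to pass the full path of the provconvert launcher as the executable.

```python
import subprocess


def provconvert_command(infile, outfile, executable="provconvert"):
    """Build the argument list from the -infile and -outfile options."""
    return [executable, "-infile", infile, "-outfile", outfile]


def run_provconvert(infile, outfile, executable="provconvert"):
    """Run the conversion and return whatever the tool printed.

    Assumption of this sketch: an empty result means the conversion went
    through, and any text is worth showing to the user.
    """
    completed = subprocess.run(
        provconvert_command(infile, outfile, executable),
        capture_output=True,
        text=True,
        check=False,
    )
    return (completed.stdout + completed.stderr).strip()


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(provconvert_command("example.provn", "example.ttl"))
```

Calling run_provconvert("example.provn", "example.ttl") would then run the tool and hand back any messages for inspection.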
Note that as a UNIX-like tool, no output from provconvert means the conversion was successful. We can use provconvert for validation, even if we do not need the translated file. If the provn has syntax errors, this will be reported as:
Note that on Windows you would need to modify the PATH system variable for GraphViz to work, see installing PROV Toolbox for Windows.
PROV Store allows uploading of PROV documents, conversion and visualization. It is recommended to edit and validate PROV-N files with the methods listed above before uploading, as the PROV Store can be pickier about compliance with the PROV standards.
There seems to be a bug where email notifications are not sent when registering, so use the big “Register for free account” on https://openprovenance.org/store/account/signup/ which lets you straight in. Hack: For a second registration, if the email link has not been received, make a username like fred14 and add +14 to your email address: firstname.lastname+14@example.org
This means the PROV data model and specifications are released as official recommendations, and can be used as a stable platform for expressing and exploring provenance data across the web.
Practically speaking, this blog would recommend you start with the PROV primer, followed by the tutorial and then PROV-O for LinkedData/RDF/OWL (alternatively PROV-XML for XML, or PROV-JSON for JSON). For deeper understanding and definition of the PROV concepts, see the PROV data model.