IATI Consultations Archive

Live discussions and consultations can be found at discuss.iatistandard.org.

Track updates to files/data using the IATI Registry

How does the Registry know when records have changed?

  • The Registry runs a nightly process that fetches every XML file registered on the registry from its URL on the web.
  • It then processes these files to see whether they differ from the records the registry currently holds for them.
  • The archiver script extracts some information from the file contents (number of activities and last updated date), which is stored as metadata on the registry. It also generates a hash, after removing the generated-datetime tags from the original XML file, and stores it as metadata too. If the hash has changed since the last check, the record is updated, even if the number of activities or the dates have not.
  • Remember that, apart from the contents of the remote file, the file metadata held on the registry can also be modified via the web interface or the CSV import feature.
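
The hash comparison described in the list above can be sketched in Python. This is only an illustration, not the archiver's actual code. It assumes that generated-datetime appears as an XML attribute, that removing it with a regular expression is adequate, and that the hash is SHA-1 (the registry's algorithm is not stated here). The function names are hypothetical.

```python
import hashlib
import re

# Assumption: generated-datetime is an attribute such as
# <iati-activities generated-datetime="2012-05-01T00:00:00">.
GENERATED_DATETIME = re.compile(
    rb"""\s+generated-datetime\s*=\s*(?:"[^"]*"|'[^']*')"""
)


def content_hash(xml_bytes: bytes) -> str:
    """Hash the file contents with every generated-datetime attribute removed."""
    stripped = GENERATED_DATETIME.sub(b"", xml_bytes)
    # Assumption: SHA-1; the registry's real algorithm may differ.
    return hashlib.sha1(stripped).hexdigest()


def file_has_changed(xml_bytes: bytes, stored_hash: str) -> bool:
    """Return True if the file no longer matches the previously stored hash."""
    return content_hash(xml_bytes) != stored_hash
```

Two downloads of the same file that differ only in their generated-datetime value produce the same hash under this sketch, so they would not be reported as a change.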

You can find out what has changed by:

  • Using the CKAN API (CKAN is the software that runs the registry)
  • Following ATOM feeds


Using the CKAN API

CKAN provides a powerful search API that gives developers building third-party applications access to all the necessary metadata. Two generations of the search API are available: versions 1 and 2, and version 3.

Versions 1 and 2

Full documentation for these versions can be found here:
http://docs.ckan.org/en/ckan-1.5.1/api.html#search-api

Endpoints:
http://iatiregistry.org/api/search/dataset (Defaults to version 1)
http://iatiregistry.org/api/1/search/dataset
http://iatiregistry.org/api/2/search/dataset

Versions 1 and 2 of the API support GET requests for retrieving metadata from the registry records. The only difference between them is that version 2 returns ids instead of entity names.

NB: These versions are no longer being developed. Although they will not be deprecated in the near future, we recommend using the version 3 API described in the next section.

By default, the API returns a JSON object with count and results properties; results holds the names of the matched datasets (or their ids if using version 2).

{
   "count": 0,
   "results": [
       "theglobalfund-nic",
       "theglobalfund-slv"
   ]
}

If you include the all_fields=true parameter, you will get all available fields for each record.

Here is an example query that returns the first 20 records modified since 1st May 2012, ordered by most recent first (note that you must specify the full date in ISO format):

http://iatiregistry.org/api/search/dataset?q=metadata_modified:[2012-05-01T00:00:00.000Z%20TO%20NOW]&sort=metadata_modified%20desc&rows=20&all_fields=true
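
Escaping a query like this by hand is error-prone, so it can help to build the URL in code. The sketch below uses only the Python standard library; the function name modified_since_url is hypothetical, and the parameters are the ones used in the example URL above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://iatiregistry.org/api/search/dataset"


def modified_since_url(since_iso: str, rows: int = 20) -> str:
    """Build a version 1 search URL for records modified since `since_iso`."""
    params = {
        "q": f"metadata_modified:[{since_iso} TO NOW]",
        "sort": "metadata_modified desc",
        "rows": rows,
        "all_fields": "true",
    }
    # urlencode escapes the brackets and colons and writes spaces as "+",
    # which is decoded the same way as %20 in a query string.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(modified_since_url("2012-05-01T00:00:00.000Z"))
```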

Version 3

Version 3 of the API (also known as the action API) is more powerful and supports advanced features such as faceting.

The endpoint for this version is:
http://iatiregistry.org/api/action/package_search


Full documentation can be found here:
http://docs.ckan.org/en/ckan-1.5.1/apiv3.html

The CKAN version the IATI registry runs on (1.5.1) requires all requests to this API to be POST requests. Responses are JSON dictionaries with the following structure:

{
   "help": null,
   "result": {
       "count": 0,
       "facets": {},
       "results": [...]
   }
}

  • count is the total number of records for the current query, even if only some of them are returned.
  • facets shows the aggregated counts for a given field (described in more detail below).
  • results is a list of dataset dictionaries (i.e. registry records).
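
Because count is the total for the query rather than the number of records returned, a client can use it to work out how many requests are needed to fetch everything. The sketch below generates one payload per page. It assumes the search accepts a start offset parameter, as later CKAN versions document; this is not confirmed for 1.5.1. The function name is hypothetical.

```python
def page_payloads(query: str, total: int, rows: int = 20):
    """Yield one request payload per page needed to cover `total` records."""
    for start in range(0, total, rows):
        yield {
            "q": query,
            "sort": "metadata_modified desc",
            "rows": rows,
            # Assumption: "start" is accepted as the result offset.
            "start": start,
        }
```

For a query whose count is 57 and a page size of 20, this yields three payloads with offsets 0, 20 and 40.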


Each dataset has the following fields (fields that are not relevant here are omitted):

{
    "id": "fa2dd0e1-f3e7-42e7-8f91-8d606331f1dd",
    "name": "theglobalfund-nic",
    "title": "The Global Fund Activity File for Nicaragua",
    "isopen": true,
    "license_id": "other-at",
    "state": "active",
    "groups": [
        {
            "id": "78f3df0a-d32e-43c1-80a3-6fcb31c4198b",
            "name": "theglobalfund",
            "state": "active",
            "title": "The Global Fund to Fight AIDS, Tuberculosis and Malaria",
            ...
        }
    ],
    "extras": [
        {
            "id": "896e48bf-b48f-438f-ab54-b4c7cf9f5ac4",
            "state": "active",
            "key": "activity_count",
            "value": "7",
            ...
        },
        ...
    ],
    "resources": [
        {
            "id": "d7da0f48-c2ca-4239-8337-1b03976617fe",
            "format": "IATI-XML",
            "hash": "d683eb9aa1d49671cc1db896b294bc013b543f92",
            "last_modified": "2012-03-05T02:42:15.847287",
            "mimetype": "text/xml",
            "size": "324761",
            "state": "active",
            "url": "http://portfolio.theglobalfund.org/en/IATI/Activities?countryCode=NIC",
            ...
        }
    ],
    ...
}
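
The extras arrive as a list of key/value dictionaries, which is awkward to query directly. A minimal sketch of flattening them and picking out the fields most useful for tracking a record might look like this. The function names are hypothetical, and the sketch assumes the first resource is the IATI file.

```python
def extras_as_dict(dataset: dict) -> dict:
    """Collapse the list of extras into a plain key/value mapping."""
    return {extra["key"]: extra["value"] for extra in dataset.get("extras", [])}


def summarise(dataset: dict) -> dict:
    """Pick out the fields most useful for tracking a registry record."""
    extras = extras_as_dict(dataset)
    resources = dataset.get("resources", [])
    # Assumption: the first resource is the IATI XML file.
    first = resources[0] if resources else {}
    return {
        "name": dataset.get("name"),
        "publishers": [group["name"] for group in dataset.get("groups", [])],
        "activity_count": extras.get("activity_count"),
        "url": first.get("url"),
        "hash": first.get("hash"),
    }
```

Note that extras values are strings in the response, so activity_count would need converting before any arithmetic.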

Here are some example queries using this version of the API (the contents of the payload.json file are shown below the query):

Show the first 20 records, ordered by the modification date, most recent first:
curl -X POST -d @payload.json http://iatiregistry.org/api/action/package_search

{
   "q": "*:*",
   "sort": "metadata_modified desc",
   "rows": 20
}
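
The same request can be made from Python instead of curl. The sketch below uses only the standard library. The function names are hypothetical, and it assumes the endpoint accepts the JSON body exactly as curl -d sends it. No Content-Type header is set explicitly, so urllib falls back to the same form-encoded default that curl -d uses.

```python
import json
import urllib.request

ACTION_ENDPOINT = "http://iatiregistry.org/api/action/package_search"


def build_search_request(payload: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the JSON payload as its body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(ACTION_ENDPOINT, data=body, method="POST")


def package_search(payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the "result" dictionary of the response."""
    request = build_search_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["result"]


# Payload equivalent to the payload.json shown above.
latest = {"q": "*:*", "sort": "metadata_modified desc", "rows": 20}
request = build_search_request(latest)
```

Calling package_search(latest) performs the network request; build_search_request is kept separate so the payload can be inspected without contacting the registry.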

Show the first 20 records that were modified since the 1st May 2012, ordered by the modification date, most recent first:
curl -X POST -d @payload.json http://iatiregistry.org/api/action/package_search

{
   "q": "metadata_modified:[2012-05-01T00:00:00.000Z TO NOW]",
   "sort": "metadata_modified desc",
   "rows": 20
}

Show the first 20 records that were published by the Catholic Agency for Overseas Development and were modified since the 1st May 2012, ordered by the modification date, most recent first. Also return a list of aggregate counts for each country:
curl -X POST -d @payload.json http://iatiregistry.org/api/action/package_search

{
   "q": "metadata_modified:[2012-05-01T00:00:00.000Z TO NOW] AND groups:cafod",
   "sort": "metadata_modified desc",
   "rows": 20,
   "facet": "true",
   "facet.field": "country"
}

In this case, the response will look like:

{
   "help": null,
   "result": {
       "count": 57,
       "facets": {
           "country": {
               "289": 1,
               "298": 3,
               "AF": 1,
               "BD": 1,
               ...
           }
       },
       "results": [...]
   }
}
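
Facet counts come back as an unordered mapping from value to count. A small helper, sketched below with a hypothetical name, ranks them so the most common values come first. It takes the "result" dictionary from a response like the one above.

```python
def top_facet_values(result: dict, field: str, limit: int = 5) -> list:
    """Return (value, count) pairs for one facet field, largest count first."""
    counts = result.get("facets", {}).get(field, {})
    # Sort by descending count, then by value so ties have a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

A field that was not requested as a facet simply yields an empty list.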

CKAN Fields Glossary

The dataset dictionary in the response contains several date fields, plus a hash. The following explains what they mean:
Metadata Modified
metadata_modified (top level):
This is updated whenever the metadata held on the registry (e.g. title, country, etc.) *or* the actual remote file is updated. This is the recommended field to use when tracking changes to the registry.

Metadata Created
metadata_created:
The date this record was created on the registry.

Hash code
resources.0.hash:
The hash is calculated from the actual contents of the activity file after removing all generated-datetime tags.

NB: Within IATI XML, the generated-datetime tag is required so that users know when the XML file was generated. Some providers serve their files via their own API, so this value often changes even when the rest of the data has not. Because generated-datetime is removed from the data before hashing, the hash should be a reliable indicator of whether the file has changed.

Data Updated
extras.data_updated:
The date of the most recent activity in the remote file (i.e. the most recent value for the XPath "iati-activity/@last-updated-datetime").

NB: As data quality varies between data providers, we cannot always rely on this value being present in the files.

Record Updated
extras.record_updated:
The date the record was updated on the registry. This should be the same value as "metadata_modified", but in some cases the two differ. This is a known bug that will be fixed.

Last Modified
resources.0.last_modified:
This is used internally and should not be relied upon. It is intended to show the date the resource (i.e. the remote file) was modified.
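
Putting the glossary to work, a client that keeps the previous copy of each dataset dictionary can tell apart a change to the remote file from a change to the registry metadata only. The sketch below is one possible approach under that assumption. The function name and the three labels it returns are hypothetical.

```python
def classify_change(previous: dict, current: dict) -> str:
    """Compare two snapshots of the same dataset dictionary."""

    def first_hash(dataset: dict):
        # resources.0.hash, or None when the record has no resources.
        resources = dataset.get("resources") or [{}]
        return resources[0].get("hash")

    if first_hash(previous) != first_hash(current):
        return "file changed"
    if previous.get("metadata_modified") != current.get("metadata_modified"):
        return "metadata changed"
    return "unchanged"
```

The hash is checked first because metadata_modified is also updated when the remote file changes, so on its own it cannot distinguish the two cases.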


Atom feeds

Atom feeds are available from the registry with information about files that have changed. They are provided as a convenient wrapper to the search API, but they don’t offer more information than the API, and developers wanting full control should consider using the API.

There are different feeds available:

Changes across the whole registry:
http://iatiregistry.org/feed/registry.atom

Changes on a particular publisher:
http://iatiregistry.org/feed/publisher/dfid.atom

Changes on a particular country or region (e.g. AO):
http://iatiregistry.org/feed/country/AO.atom

Changes on a particular organisation type:
http://iatiregistry.org/feed/organisation_type/21.atom

Changes on a custom query:
http://iatiregistry.org/feed/custom.atom?groups=dfid&country=289

Changes on a particular dataset:
http://iatiregistry.org/package/history/dfid-289?format=atom&days=7
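
Any Atom library can read these feeds. As a minimal sketch using only the Python standard library, the function below (a hypothetical name) extracts the title, updated date and first link of each entry. It assumes the feeds use the standard Atom namespace; the exact elements the registry includes in each entry are not documented here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(atom_xml) -> list:
    """Return one dict per <entry> with its title, updated date and first link."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        entries.append({
            "title": entry.findtext(f"{ATOM}title"),
            "updated": entry.findtext(f"{ATOM}updated"),
            "link": link.get("href") if link is not None else None,
        })
    return entries
```

The function accepts the feed body as text or bytes, so the content downloaded from any of the feed URLs above can be passed in directly.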

Thanks to Adria, Dan and Alexandru for helping to get this guidance together. Any questions? Please ask.
