IATI Consultations Archive

Live discussions and consultations can be found at discuss.iatistandard.org.

Toolsets to make IATI more reusable

I've recently been talking with several people (Steve Davenport, Owen Barder, Tim Davies, Neil Fantom) about toolsets to help people make better use of IATI datasets. While XML is an extremely powerful format for holding complex structured and hierarchical data, it has a fairly high barrier to entry for people trying to do simple things with the data.

The issues

  1. Developers need an advanced knowledge of XSL, or a combination of programming and XPath, to pull out even simple lists of data ("give me all the project titles and IDs in a list"). Pulling out more complex information ("give me all transactions from projects currently active") is more challenging still.
  2. Many developers work best with simple tabular data in CSV format, rather than the hierarchy supported by XML. Common toolsets (such as Google Spreadsheets) prefer CSV.
  3. For more advanced queries, it may be simpler to import the IATI XML into a database (something like SQLite or MySQL).
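To give a feel for the first issue, here is a minimal sketch of the "project titles and IDs" request using Python's standard library. The sample data is made up, and the element names (iati-activity, iati-identifier, title) are assumed from the IATI activity schema, so treat it as an illustration rather than a reference implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of an IATI activity file.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>XX-1-0001</iati-identifier>
    <title>Rural water supply</title>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-1-0002</iati-identifier>
    <title>Primary education</title>
  </iati-activity>
</iati-activities>"""


def list_titles(xml_text):
    """Return (identifier, title) pairs for every activity in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for activity in root.iter("iati-activity"):
        identifier = (activity.findtext("iati-identifier") or "").strip()
        title = activity.find("title")
        # itertext() also collects text held in child elements of <title>.
        text = "".join(title.itertext()).strip() if title is not None else ""
        pairs.append((identifier, text))
    return pairs


for identifier, title in list_titles(SAMPLE_XML):
    print(identifier, title)
```

Even this simple case needs a parser and some knowledge of the element names, which is the barrier described above.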

Possible toolsets

To address these issues, how about the following toolsets?

CSV converter

  • Provides CSV outputs for basic information on a dataset: for example, a list of activities with total budget/spend figures, or a list of all transactions by activity.
  • Would allow direct import into Google Apps for charting and other analysis.
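As a rough sketch of what the "transactions by activity" output could look like, the following flattens each transaction into one CSV row. The sample data, the column names and the function name transactions_to_csv are all hypothetical; the element and attribute names (transaction-type code, transaction-date iso-date, value currency) are assumed from the IATI activity schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of an IATI activity file.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>XX-1-0001</iati-identifier>
    <transaction>
      <transaction-type code="D"/>
      <transaction-date iso-date="2012-01-15"/>
      <value currency="USD">1000</value>
    </transaction>
    <transaction>
      <transaction-type code="E"/>
      <transaction-date iso-date="2012-03-01"/>
      <value currency="USD">250.50</value>
    </transaction>
  </iati-activity>
</iati-activities>"""


def transactions_to_csv(xml_text):
    """Return CSV text with one row per transaction."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["activity_id", "type", "date", "currency", "value"])
    for activity in root.iter("iati-activity"):
        activity_id = (activity.findtext("iati-identifier") or "").strip()
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            tx_date = tx.find("transaction-date")
            value = tx.find("value")
            # Missing elements become empty cells rather than errors.
            writer.writerow([
                activity_id,
                tx_type.get("code", "") if tx_type is not None else "",
                tx_date.get("iso-date", "") if tx_date is not None else "",
                value.get("currency", "") if value is not None else "",
                (value.text or "").strip() if value is not None else "",
            ])
    return out.getvalue()


print(transactions_to_csv(SAMPLE_XML))
```

The resulting text could be saved to a file and imported into a spreadsheet as it is.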

List provider

  • Similar to CSV converter but outputs unordered lists for use within web pages.
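A list provider could be little more than the following sketch, which turns activity titles into an HTML unordered list. The sample data and the function name activities_to_ul are hypothetical, and the element names are assumed from the IATI activity schema.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of an IATI activity file.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>XX-1-0001</iati-identifier>
    <title>Rural water supply</title>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-1-0002</iati-identifier>
    <title>Health &amp; nutrition</title>
  </iati-activity>
</iati-activities>"""


def activities_to_ul(xml_text):
    """Return an HTML <ul> with one <li> per activity title."""
    root = ET.fromstring(xml_text)
    items = []
    for activity in root.iter("iati-activity"):
        title = activity.find("title")
        text = "".join(title.itertext()).strip() if title is not None else ""
        # Escape the text so characters such as & and < are safe in a web page.
        items.append("  <li>{}</li>".format(html.escape(text)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(activities_to_ul(SAMPLE_XML))
```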

Database import tools

  • Import IATI structured data into a SQLite/MySQL database for further querying.
  • Common structure for IATI activity files to allow sharing of queries across donors.
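Here is one possible shape for the database import, sketched with SQLite so it runs without a server. The two-table layout (activities, transactions) is purely an assumption for illustration, not an agreed common structure, and the sample data is made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape of an IATI activity file.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>XX-1-0001</iati-identifier>
    <title>Rural water supply</title>
    <transaction><transaction-type code="D"/><value>1000</value></transaction>
    <transaction><transaction-type code="E"/><value>250.50</value></transaction>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-1-0002</iati-identifier>
    <title>Primary education</title>
    <transaction><transaction-type code="D"/><value>400</value></transaction>
  </iati-activity>
</iati-activities>"""


def load_into_sqlite(xml_text, conn):
    """Load activities and their transactions into two (hypothetical) tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activities (id TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(activity_id TEXT REFERENCES activities(id), type TEXT, value REAL)"
    )
    root = ET.fromstring(xml_text)
    for activity in root.iter("iati-activity"):
        activity_id = (activity.findtext("iati-identifier") or "").strip()
        title = (activity.findtext("title") or "").strip()
        conn.execute(
            "INSERT OR REPLACE INTO activities VALUES (?, ?)", (activity_id, title)
        )
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            code = tx_type.get("code") if tx_type is not None else None
            raw = (tx.findtext("value") or "").strip()
            amount = float(raw) if raw else None
            conn.execute(
                "INSERT INTO transactions VALUES (?, ?, ?)",
                (activity_id, code, amount),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_into_sqlite(SAMPLE_XML, conn)

# Once loaded, the "more advanced" questions become ordinary SQL.
totals = conn.execute(
    "SELECT activity_id, SUM(value) FROM transactions "
    "GROUP BY activity_id ORDER BY activity_id"
).fetchall()
print(totals)
```

If different donors' files were loaded into the same table layout, a query like the one above could be shared between them unchanged.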

What other ideas do you have?

7 Comments

  • Mitesh Shah

    These guys didn't think about any of what you have mentioned above. They just made a complex standard that even publishers have difficulty following. Many of the columns, for example, will not contain any data because they don't pertain to the data being published. XML is great for developers transporting data from one tool to another, but it just sucks in this case. No researchers or non-technical people are able to use XML for research purposes or in other software. I am really pissed: as a developer, I don't even know how to get an aggregated list of all the URLs the registry is holding. I am not going to manually click on every single link!

    They recently released an XML to CSV converter tool, which does a half-decent job of converting the XML and takes ages in some instances. Have you tried to import the CSV data into a SQL database? Look at the number of columns. It's mind-boggling, and some of them contain no data.

    No API? Really?

    From my point of view, the IATI standard needs to be thoroughly restructured and simplified before going on a marketing spree to get more organisations to publish their data.

  • David Carpenter

    Hi Mitesh

    The registry is an instance of CKAN software. It has an API, so it's quite straightforward to get a list of all the URLs on the registry. Documentation on how to use the CKAN API is here: http://docs.ckan.org/en/latest/api.html

    There is sample code lying around in various places on/via our wiki:

    http://wiki.iatistandard.org/tools/toolkit/sync_tools
    http://wiki.iatistandard.org/tools/iati-registry-refresher/start 

    There are also various third-party implementations of data aggregators that offer APIs onto the data. Check out:

    AidView - uses a backend eXistdb with an API  - http://wiki.iatistandard.org/tools/aidview/start
    Openaid IATI Parser and API - http://wiki.iatistandard.org/tools/oipa_v2/start 
    IATI Explorer - http://iatiexplorer.org/ 

    I think the OP's original suggestion that "Developers need an advanced knowledge of ..." is a little over the top. There are great XML parsers in many programming languages, and the code for the 'preview' tool on an XML file is more or less a single file (see http://wiki.iatistandard.org/tools/show_my_iati_data/start ).

    I think that everything I have cited here is available as free software, so there are examples and opportunities to dive in and have a go.

    Finally, I would add, that the data is complicated because it is trying to tell a story about complicated issues, but your feedback and opinions are appreciated. There is ongoing development of the standard, and if you would like to get involved in that process you can start here: http://wiki.iatistandard.org/standard/start

    All the best
    David 

  • Mitesh Shah

    Hi David,

    Thank you for the information you provided above. Is the API information listed on the IATI website? I ask because I couldn't find it, though I may have overlooked it. If it isn't there, it would be really helpful for others if this information were posted under the developers section. I used your IATI Registry Refresher - https://github.com/caprenter/IATI-Registry-Refresher. The only problems are that it takes too long to get the entire list of URLs and that downloading the XML files with curl takes a long time. I then use the IATI XSLT transformations to convert the XML to CSV - https://github.com/aidinfolabs/IATI-XSLT. Once in CSV, I merge all the CSV files into one and insert the merged CSV into a MySQL database. The whole process currently takes over 30 minutes.

    I am looking for a quick way to access the API. I will look into the other APIs you mentioned. If you know of any quicker ways to download the XML files, do let me know.

  • David Carpenter

    Hi Mitesh

    Yeah, the registry refresher works but is a bit slow! Another approach I have used is to grab all the URLs and then use wget to fetch them fast. The refresher uses cURL because I thought fetching each file as a stream was more reliable (but it may not be!). Also, the call I use for the CKAN API is slow. This one is faster, but you need to get 2 pages of results:

    http://iatiregistry.org/api/search/dataset?fl=name,download_url,metadata_modified,groups,id,data_updated&offset=0&limit=1000

    The offset and limit parameters are used to get your 'pages' of results; the fl parameter is a fields request, so you might only want download_url. This returns a JSON file.
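To make the paging concrete, here is a minimal sketch that builds one search URL per page and pulls download_url out of a page of results. The function names are hypothetical, and the response shape ({"count": ..., "results": [...]}) is an assumption about what this search call returns, so check it against a real response before relying on it. Nothing here touches the network; the sample page is made up.

```python
import json
from urllib.parse import urlencode

# Search endpoint quoted in the post above.
BASE_URL = "http://iatiregistry.org/api/search/dataset"


def page_urls(total, limit=1000, fields=("download_url",)):
    """Build one search URL per page of `limit` results."""
    urls = []
    for offset in range(0, total, limit):
        query = urlencode(
            {"fl": ",".join(fields), "offset": offset, "limit": limit}
        )
        urls.append(BASE_URL + "?" + query)
    return urls


def download_urls(response_text):
    """Extract non-empty download_url values from one page of JSON.

    Assumes the page looks like {"count": N, "results": [{...}, ...]}.
    """
    payload = json.loads(response_text)
    return [
        row["download_url"]
        for row in payload.get("results", [])
        if row.get("download_url")
    ]


# Hypothetical page of results, for illustration only.
SAMPLE_PAGE = json.dumps({
    "count": 2,
    "results": [
        {"download_url": "http://example.org/a.xml"},
        {"download_url": ""},
    ],
})

for url in page_urls(1500):
    print(url)
print(download_urls(SAMPLE_PAGE))
```

With 1500 datasets and a limit of 1000, this yields the two pages mentioned above (offset=0 and offset=1000).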

    I have some code I'm using to check for changes on the registry that I need to get onto GitHub. I'll do that this week.

    Are you aware of the iati-technical mailing list? It might be worth joining if you haven't already. It would be interesting to know what you are doing with all the data once you've got it as CSV - there may be other people trying to do similar things.

    People are always reviewing our documentation, so I'll add this conversation to their workload and see if we can get it improved. Hope that helps. Get in touch if you need any further clarification.

  • Mitesh Shah

    Is there any new information, or are there any new tools, for accessing the IATI Registry via a faster API, similar to what the World Bank is doing?

  • David Carpenter

    Hi Mitesh

    I think all of the projects listed above are still going since my last post. 

    A new development is an API being built by the Open Knowledge Foundation (the people that run the IATI Registry). It's not ready yet, but you can follow its progress here:
    https://github.com/okfn/iati-datastore 

    The other place to look for new developments is http://wiki.iatistandard.org/tools/start

    The IATI Registry Data Analyzer: http://wiki.iatistandard.org/tools/iati_registry_data_analyzer/start is new since my previous post. 

    Hope that helps.

  • Essi Lindstedt

    I've had success using OpenRefine (http://openrefine.org/download.html). It still takes a lot of work to get the data organised so it's easily searchable, but it's a good start.

Article is closed for comments.