IATI Consultations Archive

Live discussions and consultations can be found at discuss.iatistandard.org.

General : Standardise multi-lingual text fields

Discussions at the TAG in Montreal moved forward on ensuring that the standard takes a comprehensive and consistent approach to reporting names, text and code descriptions in multiple languages.

This is a placeholder for work on this proposal, which is being prepared by Owen Scott.

15 Comments

  • 0
    Mark Brough

    Agree in principle. It should be possible to use multiple languages for more fields, where the data cannot simply be translated by using standard codelists.

  • 0
    Owen Scott

    Proposal

    Sometimes the element/text() value of an element adds explanatory value beyond element/@code. For instance, take an organization that uses its own sector codes ("RO") and reports each sector in two languages:

    <sector vocabulary="RO" xml:lang="EN">Water and Sanitation</sector>  <!--water and sanitation-->

    <sector vocabulary="RO" xml:lang="FR">Eau et assainissement</sector> <!-- water and sanitation-->

    This is ambiguous because a parser cannot distinguish it from the also-valid case of an organization that, for some reason, reports two distinct sectors in different languages (which the standard does not forbid).

    <sector vocabulary="RO" xml:lang="EN">Water and sanitation</sector> <!--water and sanitation-->

    <sector vocabulary="RO" xml:lang="FR">Santé</sector> <!--health-->

    To eliminate this ambiguity, I suggest the following pattern instead:

    <sector vocabulary="RO">

        <text xml:lang="EN">Water and sanitation</text> <!--water and sanitation-->

        <text xml:lang="FR">Eau et assainissement</text> <!--water and sanitation-->

    </sector>

    By comparison, the other case would be:

    <sector vocabulary="RO">

        <text xml:lang="EN">Water and sanitation</text> <!--water and sanitation-->

    </sector>

    <sector vocabulary="RO">

        <text xml:lang="FR">Santé</text> <!--health-->

    </sector>

    This would make it clear when a translation for a single code is being provided and when two distinct sector codes are intended.

    For overall consistency within the standard, I would suggest that this pattern of exploded-out element text be applied to all elements in the IATI Standard.
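
A minimal sketch of how a parser might read the nested pattern proposed above, assuming Python's standard xml.etree.ElementTree module. The sample document and the helper name sector_labels are illustrative assumptions, not part of the standard. Each <sector> element yields exactly one dictionary of translations, so a translated sector and two distinct sectors can no longer be confused.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative sample: one sector with two translations, one sector with a single label.
SAMPLE = """
<iati-activity>
  <sector vocabulary="RO">
    <text xml:lang="EN">Water and sanitation</text>
    <text xml:lang="FR">Eau et assainissement</text>
  </sector>
  <sector vocabulary="RO">
    <text xml:lang="FR">Santé</text>
  </sector>
</iati-activity>
"""


def sector_labels(activity):
    """Return one {language: label} dict per <sector> element (hypothetical helper)."""
    return [
        {t.get(XML_LANG): (t.text or "").strip() for t in sector.findall("text")}
        for sector in activity.findall("sector")
    ]


sectors = sector_labels(ET.fromstring(SAMPLE))
for labels in sectors:
    print(labels)
```

The number of dictionaries returned equals the number of distinct sectors, whatever languages are used inside each one.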

  • 0
    Ben Webb

    For consistency, should the same pattern of exploded-out element text be applied to all textual elements in the codelist XML? - https://github.com/IATI/IATI-Codelists/blob/master/xml/ActivityStatus.xml

  • 0
    Owen Scott

    That makes sense to me. It would give us the scaffolding for multilingual code lists, and keep the pattern more consistent for those writing IATI-related parsers.

  • 0
    Owen Scott

    Perhaps only tangentially related (i.e. not related to multilingual text, but related to parsing ambiguity): what about other areas of parsing ambiguity? For instance, say I have a project that works in two sectors: Health and Water Supply. I can legally do this:

    <!--xml (1)-->
    <sector vocabulary="COFOG" code="07"/> <!--health-->
    <sector vocabulary="DAC" code="140"/> <!--water and sanitation-->

    or
    <!--xml (2)-->
    <sector vocabulary="DAC" code="121"/> <!--health-->
    <sector vocabulary="DAC" code="140"/> <!--water and sanitation-->
    <sector vocabulary="COFOG" code="07"/> <!--health-->
    <sector vocabulary="COFOG" code="06.3"/> <!--water and sanitation-->

    Now take two UI approaches: (a) displays the sectors for a single vocabulary only, and (b) displays all sectors provided. For XML (1), approach (a) would display either health or water but not both, despite the project's focus on health *and* water. For XML (2), approach (a) would display both sectors, but approach (b) would display all four sector elements, essentially giving the user the same information twice. The ultimate result is ambiguity, which means different UIs will give different views onto the same data. This also discourages reporting with multiple sector classifications, and makes using percentages a lot harder.

    To me a good alternative would be to add a higher-level element, so for the two cases above:

    <!--xml (1)-->
    <sector-group>
        <sector vocabulary="COFOG" code="07"/> <!--health-->
        <sector vocabulary="DAC" code="140"/> <!--water and sanitation-->
    </sector-group>

    <!--xml (2)-->
    <sector-group>
        <sector vocabulary="DAC" code="121"/> <!--health-->
        <sector vocabulary="DAC" code="140"/> <!--water and sanitation-->
    </sector-group>

    <sector-group>
        <sector vocabulary="COFOG" code="07"/> <!--health-->
        <sector vocabulary="COFOG" code="06.3"/> <!--water and sanitation-->
    </sector-group>

    I would see similar applications in other parts of the standard that suffer from similar ambiguity (for parsers). Any thoughts?
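
A rough sketch of how a UI might consume the proposed <sector-group> element, assuming Python's standard xml.etree.ElementTree module. The <sector-group> element is only a suggestion in this thread, and the function name sectors_to_display and its fallback rule (use the first group when no group matches the preferred vocabulary) are assumptions made for illustration. The idea is that each group is a complete classification of the activity, so a UI shows exactly one group and never repeats the same sector.

```python
import xml.etree.ElementTree as ET

# Illustrative sample corresponding to "xml (2)" above.
SAMPLE = """
<iati-activity>
  <sector-group>
    <sector vocabulary="DAC" code="121"/>
    <sector vocabulary="DAC" code="140"/>
  </sector-group>
  <sector-group>
    <sector vocabulary="COFOG" code="07"/>
    <sector vocabulary="COFOG" code="06.3"/>
  </sector-group>
</iati-activity>
"""


def sectors_to_display(activity, preferred_vocabulary=None):
    """Return (vocabulary, code) pairs from exactly one sector-group.

    Hypothetical rule: prefer a group that uses only the preferred
    vocabulary, otherwise fall back to the first group.
    """
    groups = activity.findall("sector-group")
    if not groups:
        return []
    chosen = groups[0]
    for group in groups:
        vocabularies = {s.get("vocabulary") for s in group.findall("sector")}
        if vocabularies == {preferred_vocabulary}:
            chosen = group
            break
    return [(s.get("vocabulary"), s.get("code")) for s in chosen.findall("sector")]


activity = ET.fromstring(SAMPLE)
print(sectors_to_display(activity))
print(sectors_to_display(activity, "COFOG"))
```

With a mixed-vocabulary group such as "xml (1)", the same function returns both sectors, because the group is displayed as a whole.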

  • 0
    Ben Webb

    Your suggestion for sectors does look less ambiguous. I can't think immediately of any similar elements, but I've not looked thoroughly.

    With regard to your languages proposal, must the text element have the xml:lang attribute, or is the default language used if it's missing?

  • 0
    Owen Scott

    For the text element, I'd defer to you, but my recommendation would be that the default language is used if there is no xml:lang attribute. If two text elements within the same parent have the same language (even if one of them gets it implicitly from the default), that should be considered invalid. Does that make sense?

    For sectors, I will look through the standard and see about any other elements which might be similar. This is definitely the kind of light-touch thing that doesn't really change the pre-publication data structure for publishers, just the final step of XML production, but would add a tonne of simplicity when it comes to deciding how to parse, display, and interpret information. 
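
The validation rule recommended above could be checked roughly as follows, assuming Python's standard xml.etree.ElementTree module. The function name duplicate_languages is hypothetical, the default language is passed in as a plain argument, and comparing language codes case-insensitively is an added assumption rather than something stated in this thread.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def duplicate_languages(parent, default_lang):
    """Return the languages used by more than one <text> child of parent.

    A <text> element without xml:lang inherits default_lang, so it can
    clash with a sibling that states the same language explicitly.
    """
    seen, duplicates = set(), set()
    for text in parent.findall("text"):
        lang = (text.get(XML_LANG) or default_lang).lower()
        if lang in seen:
            duplicates.add(lang)
        seen.add(lang)
    return sorted(duplicates)


valid = ET.fromstring(
    '<title><text>English Title</text><text xml:lang="fr">Titre français</text></title>'
)
invalid = ET.fromstring(
    '<title><text>English Title</text><text xml:lang="en">Another title</text></title>'
)

print(duplicate_languages(valid, "en"))
print(duplicate_languages(invalid, "en"))
```

An empty result means the parent passes the rule; any language listed appears more than once and would make the parent invalid.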

  • 0
    Ben Webb

    That makes sense, thanks.

  • 0
    Bill Anderson

    Iteration 1: Proposal 1 - scrapping text of purely code elements

    This proposal is to remove the ability to supply multilingual text between the element tags on elements that require the use of codelists. In these cases the code itself should be seen as unambiguous and authoritative, and the codelists available on iatistandard.org should be upgraded to supply all necessary translations of both code names and descriptions.

    This will apply to the following elements:

    • Activity Standard
      • activity-status
      • activity-scope
      • policy-marker
      • collaboration-type
      • default-finance-type
      • default-flow-type
      • default-aid-type
      • default-tied-status
      • recipient-region
      • recipient-country
      • transaction/transaction-type
      • transaction/flow-type
      • transaction/aid-type
      • transaction/finance-type
      • transaction/tied-status
      • transaction/disbursement-channel
      • document-link/category
      • document-link/language
      • related-activity
      • crs-add/loan-terms/repayment-type
      • crs-add/loan-terms/repayment-plan
      • location/exactness
      • location/location-id
    • Organisation Standard
      • recipient-country
      • file-format
      • document-category
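
Under Proposal 1, a consumer would resolve the display text from the code rather than from element text. The following is only a sketch of that lookup in Python: the translation table is a hypothetical in-memory stand-in with placeholder values (it is not taken from the published codelists), and the function name code_name and its English fallback are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation table with placeholder values. Real names and
# translations would come from the codelists published on iatistandard.org.
ACTIVITY_STATUS_NAMES = {
    "2": {"en": "Implementation", "fr": "Mise en œuvre"},
}


def code_name(codelist, code, lang, fallback="en"):
    """Look up the name for a code in the requested language.

    Falls back to the fallback language when no translation exists.
    """
    names = codelist.get(code)
    if names is None:
        raise KeyError(f"unknown code: {code}")
    return names.get(lang, names[fallback])


# The element carries only the code; no text between the tags.
element = ET.fromstring('<activity-status code="2"/>')
print(code_name(ACTIVITY_STATUS_NAMES, element.get("code"), "fr"))
```
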
  • 0
    Bill Anderson

    Iteration 1: Proposal 2: Include nested, multi-lingual text elements for all elements containing free text

    • Example 1
      • Current Usage
        • <title xml:lang="en">English Title</title>
        • <title xml:lang="fr">Titre français</title>
      • Proposed Change
        • <title>
        •    <text xml:lang="en">English Title</text>
        •    <text xml:lang="fr">Titre français</text>
        • </title>
    • Example 2
      • Current Usage
        • <sector vocabulary="RO" xml:lang="en" code="456">Water and sanitation</sector>
        • <sector vocabulary="RO" xml:lang="fr" code="456">Eau et assainissement</sector>
      • Proposed Change
        • <sector vocabulary="99" code="456">
        •     <text xml:lang="en">Water and sanitation</text>
        •     <text xml:lang="fr">Eau et assainissement</text>
        • </sector>

    Elements to be modified are:

    • Activity Standard
      • title
      • description
      • contact-info/organisation
      • contact-info/person-name
      • contact-info/job-title
      • contact-info/mailing-address
      • location/name
      • location/description
      • location/activity-description
      • country-budget-items/budget-item/description
      • transaction/description
      • document-link/title
      • conditions/condition
      • result/title
      • result/description
      • result/indicator/title
      • result/indicator/description
      • result/indicator/baseline/comment
      • result/indicator/period/target/comment
      • result/indicator/period/actual/comment
    • Organisation Standard
      • reporting-org
      • iati-organisation
      • document-link/title
  • 0
    Yohanna Loucheur

    Someone here just remarked that the list of elements to be modified at the Activity level should include "participating organization" fields, to enable publishing the org name in different languages. 

    "Reporting-org" does appear in the Organisation Standard list, but not in the Activity list.

  • 0
    Bill Anderson
  • 0
    Bill Anderson

    We have changed the name of the <text> element to <narrative> to avoid confusion with the XML text() node.

    Therefore...

    • Example 1
      • Current Usage
        • <title xml:lang="en">English Title</title>
        • <title xml:lang="fr">Titre français</title>
      • Proposed Change
        • <title>
        •    <narrative xml:lang="en">English Title</narrative>
        •    <narrative xml:lang="fr">Titre français</narrative>
        • </title>
    • Example 2
      • Current Usage
        • <sector vocabulary="RO" xml:lang="en" code="456">Water and sanitation</sector>
        • <sector vocabulary="RO" xml:lang="fr" code="456">Eau et assainissement</sector>
      • Proposed Change
        • <sector vocabulary="99" code="456">
        •     <narrative xml:lang="en">Water and sanitation</narrative>
        •     <narrative xml:lang="fr">Eau et assainissement</narrative>
        • </sector>
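
For publishers or tool authors converting existing data, the change in Example 1 could be sketched in Python roughly as follows, assuming the standard xml.etree.ElementTree module. The function name merge_titles is hypothetical, and the sketch assumes that all sibling <title> elements of an activity are translations of the same title.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def merge_titles(activity):
    """Replace sibling <title xml:lang="..."> elements with a single
    <title> that holds one <narrative> child per language."""
    old_titles = activity.findall("title")
    if not old_titles:
        return
    merged = ET.Element("title")
    for old in old_titles:
        narrative = ET.SubElement(merged, "narrative")
        lang = old.get(XML_LANG)
        if lang is not None:
            narrative.set(XML_LANG, lang)
        narrative.text = old.text
    # Put the merged element where the first old title was.
    position = list(activity).index(old_titles[0])
    for old in old_titles:
        activity.remove(old)
    activity.insert(position, merged)


activity = ET.fromstring(
    "<iati-activity>"
    '<title xml:lang="en">English Title</title>'
    '<title xml:lang="fr">Titre français</title>'
    "</iati-activity>"
)
merge_titles(activity)
print(ET.tostring(activity, encoding="unicode"))
```

The same approach would not be safe for repeatable elements such as sector, where siblings may be distinct values rather than translations; those would need to be grouped by vocabulary and code first.
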
  • 0
    Martin Akerman

    The UN Working Group on Transparency has agreed to request that we keep the explicit text associated with the reporting-org, recipient-organisation and participating-org elements.

  • 0
    IATI Tech Team

    Organisation identifiers are regarded by the schema as references, not codes (@ref as opposed to @code), and text options for all organisation-related elements have NOT been removed.
