SDMX Users Forum

Author Topic: SDMX codes do not allow spaces and dots  (Read 13665 times)

Erik van Ingen - FAO

SDMX codes do not allow spaces and dots
« on: December 22, 2009, 03:39:14 AM »

Currently the specs do say this:
IDType: IDType provides a type which is used for restricting the characters in codes and IDs throughout all SDMX-ML messages. Valid characters include A-Z, a-z, @, 0-9, _, -, $.

This means dots and spaces are excluded. For fish species, for instance, we use the Latin names, which are unique and serve as both code and id. For geographical layers, we use codes with dots to indicate different levels.

What to do? Should I replace them with underscores? Or wait for the next version of SDMX, which may or may not allow dots and spaces?
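
To make the underscore option concrete, here is a minimal Python sketch. It assumes the allowed set is exactly the characters listed in the IDType description above; the function names and the sample values are only illustrative and not part of any SDMX tool.

```python
import re

# Characters listed in the IDType description: A-Z, a-z, @, 0-9, _, -, $
ID_TYPE = re.compile(r"[A-Za-z0-9@_\-$]+")
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9@_\-$]")


def is_valid_id(value: str) -> bool:
    """Return True if every character of value is in the allowed set."""
    return ID_TYPE.fullmatch(value) is not None


def to_id(value: str, replacement: str = "_") -> str:
    """Replace each character outside the allowed set with replacement."""
    return NOT_ALLOWED.sub(replacement, value)


print(is_valid_id("Thunnus albacares"))  # False (contains a space)
print(to_id("Thunnus albacares"))        # Thunnus_albacares
print(to_id("1.2.3"))                    # 1_2_3
```

One caveat with this approach: two different source values can collapse to the same id (for example "a.b" and "a b" both become "a_b"), so uniqueness would need to be re-checked after the replacement.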

Duraid Abbas - UNESCO IS

Re: SDMX codes do not allow spaces and dots
« Reply #1 on: December 22, 2009, 05:30:03 AM »

Is there a reason why you're using the same value for both the id and the name?

ID values should be more succinct. For example, for countries the id should be something like the ISO code, and the name should contain the full country name. Maybe you should do the same for the fish species code list.

Here is an example of the country list:

<structure:CodeList id="CL_REF_AREA" agencyID="MY_AGENCY">
  <structure:Name xml:lang="en">Reference Area</structure:Name>
  <structure:Code value="EU">
    <structure:Description xml:lang="en">European Union</structure:Description>
  </structure:Code>
  <structure:Code value="AT">
    <structure:Description xml:lang="en">Austria</structure:Description>
  </structure:Code>
  <structure:Code value="BE">
    <structure:Description xml:lang="en">Belgium</structure:Description>
  </structure:Code>
  <structure:Code value="DE">
    <structure:Description xml:lang="en">Germany</structure:Description>
  </structure:Code>
  <structure:Code value="DK">
....

San Cannon - FRB

Re: SDMX codes do not allow spaces and dots
« Reply #2 on: December 22, 2009, 10:53:38 AM »

We're facing a similar issue in moving from SDMX 1.0 to SDMX 2.0. We have very good reasons for wanting to use dots in our codes, but the standards folks had their own good reason for excluding the dot (which they have told me but I can't seem to remember). So we are using the underscore instead, as it is unlikely that the ability to use a dot (or any of the other special characters we wanted to use) will be back in the next version. I can't speak for the developers themselves, but my read is that we aren't getting the dot back.

As for why the code and text might be the same: there are good reasons why that might be desirable. We have a lengthy list of units (300+), and I find arbitrary numbers for the codes to be less than useless, so we tried to give the codes some mnemonic meaning: they were stripped-down versions of the text, so that users could know immediately what the unit was without having to look it up in a code list. This scheme has become more problematic with the loss of several delimiter characters for the code value.

Re: SDMX codes do not allow spaces and dots
« Reply #3 on: December 29, 2009, 06:14:22 AM »

The restriction on the IDType allows for the value to comply with the URN specification - and the reason that "." is not allowed to appear is that SDMX is using this as a separator in the URN syntax. Every identifiable artifact is technically fully identified by its full URN or its components.
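
The separator conflict can be illustrated with a small Python sketch. It assumes a deliberately simplified identifier layout (agency, code list and code joined by dots), which is not the actual SDMX URN syntax; `build_urn`, `split_urn` and the `CL_GEO` code list are hypothetical names used only for illustration.

```python
from typing import List


def build_urn(agency: str, codelist: str, code: str) -> str:
    """Join the components with "." (simplified layout, not real SDMX syntax)."""
    return ".".join([agency, codelist, code])


def split_urn(urn: str) -> List[str]:
    """Recover the components by splitting on the "." separator."""
    return urn.split(".")


# An id without dots round-trips into exactly three components.
print(split_urn(build_urn("MY_AGENCY", "CL_REF_AREA", "AT")))
# ['MY_AGENCY', 'CL_REF_AREA', 'AT']

# An id containing dots is split too, so the components can no longer
# be told apart from the levels inside the code itself.
print(split_urn(build_urn("MY_AGENCY", "CL_GEO", "1.2.3")))
# ['MY_AGENCY', 'CL_GEO', '1', '2', '3']
```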

As for the code values, this has been raised as an issue, and it was suggested that a separate code value be allowed in addition to the code id (for display purposes). This code value would be much less restricted, allowing spaces, dots, etc.

Xavier Sosnovsky - ECB

Re: SDMX codes do not allow spaces and dots
« Reply #4 on: January 05, 2010, 06:42:56 AM »

The restriction on the IDType allows for the value to comply with the URN specification

Hi J,

I have two comments regarding this approach:
- Identifiable artefacts have (among others) a urn attribute. Would that not be the appropriate place for storing the URN (rather than the id attribute)? Of course, nothing should prevent someone from using a URN as the id for an artefact, but why limit the set of valid characters for the id attribute to those allowed by the URN syntax, when there is already a urn attribute that can (should?) be used for that purpose [Edited: clarified by Ken & Pascal posts below]?
- The IDType is also used for the ID element in the message header. In a RESTful scenario, where all the parameters for a query are stored in the URL, it might be seen as a good option to use the URL for the query as the ID for the returned message (for example, if users have questions about the data they downloaded, it's very easy to see the query they performed to get the data, as this would be displayed as the message ID. This information could be very helpful for debugging purposes). However, this is not possible, as characters allowed in the URL are not allowed in the IDType. Is there any reason why the id in the message header needs to be restricted to the characters allowed in the URN syntax?

Many thanks,

Xavier
« Last Edit: January 13, 2010, 07:12:26 AM by Xavier Sosnovsky - ECB »

kenagross

Re: SDMX codes do not allow spaces and dots
« Reply #5 on: January 06, 2010, 04:51:45 AM »

Xavier,

My understanding of registries is limited; however, I think the issue is not storing the URN in the id attribute, but using the id attribute value to construct a URN that identifies the object. The URN is constructed with dots to separate the levels of the URN hierarchy. If this is what you meant, and I misunderstood you, I apologize.

If you are saying, however, that we should store a URN in the urn attribute for an object, and leave the code or message id free to use whatever value we choose, I would agree with that. The issue here would be that the object identifier in the URN would differ from the id attribute value (again because the id attribute value would contain a dot, and the URN would not).

Ken


Pascal Heus - Metadata Technology

Re: SDMX codes do not allow spaces and dots
« Reply #6 on: January 12, 2010, 07:15:45 AM »

Xavier:
The @id is a component of the URN (constructed from the agency/version/id), and therefore its value must comply with the URN syntax.
In the latest version of the DDI3 specifications, we have introduced a repeatable UserId element that can be attached to any Identifiable and allows for custom identifiers (organization- or implementation-specific).
best
*P

Xavier Sosnovsky - ECB

Re: SDMX codes do not allow spaces and dots
« Reply #7 on: January 13, 2010, 07:11:27 AM »

Ken, Pascal: Thanks for the input and clarification :).

Actually, the problem we reported a while ago was not with the id of identifiable artefacts (the ids we use for our structural metadata are compliant with the IDType syntax) but with the ID element of the message Header (comment 2 in my initial post). Is there any reason why the message ID needs to comply with the URN syntax? We find it to be too restrictive, especially in a RESTful web service scenario, where it would make a lot of sense to use the query URL as message ID...
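
To show what gets in the way, here is a minimal Python sketch that lists the characters of a query URL falling outside the IDType set (A-Z, a-z, @, 0-9, _, -, $). The URL and the function name are made up for illustration and do not refer to an actual web service.

```python
import re
from typing import List

# Anything that is not one of the characters allowed by IDType.
DISALLOWED = re.compile(r"[^A-Za-z0-9@_\-$]")


def rejected_characters(candidate: str) -> List[str]:
    """Return the distinct characters outside the IDType set, sorted."""
    return sorted(set(DISALLOWED.findall(candidate)))


# Hypothetical query URL, used here as a candidate message ID.
url = "http://example.org/data/MY_FLOW?startPeriod=2009"
print(rejected_characters(url))
# ['.', '/', ':', '=', '?']
```

Every structural character of the URL is rejected, so the query URL cannot be used as the message ID without some form of encoding.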

Thanks!
