ForwardCompatibilityAndScientificDataExchange
Context: At the TagMeeting1 we discussed the requirement that TAG standards be fully forward and backward compatible as part of a bioscience data architecture. Backward compatible means that new software can read old documents; forward compatible means that old software can read any newer version of the data. The problematic issue is forward compatibility. Technically, document formats that achieve forward compatibility can be created both in W3C XML Schema and with RDF/OWL technologies, the latter being more flexible and often more intuitive. Although forward compatibility may in many situations be a desirable feature, it occurs to me that we have not properly discussed the exact expected behaviour and the trade-offs.
Forward compatibility may mean that a data structure is defined in such a way that all future requirements are already anticipated. This incurs no cost, but is highly unlikely to be achieved.
The kind of forward compatibility discussed at the TagMeeting1 is that old software evaluates only the previously defined data structures in newer documents, ignoring any data defined in the newer, unknown format. This is more realistic, but I believe its desirability depends on the use case.
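The "ignore what you do not know" behaviour can be illustrated with a minimal sketch. It assumes a hypothetical specimen record in plain XML; the element names (taxon, locality, identificationCertainty) and the function read_known are invented for illustration and do not come from any TAG standard.

```python
import xml.etree.ElementTree as ET

# Elements the hypothetical "old" client was written against (assumed names).
KNOWN_ELEMENTS = {"taxon", "locality"}

# A newer document that adds an element the old client has never seen.
NEWER_DOCUMENT = (
    "<specimen>"
    "<taxon>Quercus robur</taxon>"
    "<locality>Berlin</locality>"
    "<identificationCertainty>doubtful</identificationCertainty>"
    "</specimen>"
)


def read_known(xml_text):
    """Return only the child elements this client version understands."""
    root = ET.fromstring(xml_text)
    # Unknown elements are skipped silently: this is the forward
    # compatibility strategy described above.
    return {child.tag: child.text for child in root if child.tag in KNOWN_ELEMENTS}


record = read_known(NEWER_DOCUMENT)
print(record)
```

In this sketch the old client reads the newer document without error, but the qualifier that the identification is doubtful never reaches the consumer.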
In an aggregator/indexing/search use case this is probably desirable, since the results are usually not viewed as scientific data, but as "tentatively proposed" search results. Having forward compatibility here means software installations are more stable.
In contrast, in a scientific data exchange situation this can be highly undesirable. Imagine that Adobe made the Acrobat PDF Reader software forward compatible in this sense. Documents in future versions of PDF would be displayed in older software and would look fine, but part of the data would be missing, without the consumer noticing. This would certainly undermine trust in the data format. PDF has gained its acceptance, despite all its shortcomings for digital exchange, because it is known to mirror faithfully the reliability of paper-printed information exchange.
Note that in the scenarios discussed the XML data format (W3C XML Schema, or RDF/OWL limited to a bounded description defined by a single provider) would always contain the complete data. However, I understand that human users should not be exposed to raw XML, and that access should be provided through software clients. With current technology (the inability to communicate methods together with new data structures) it seems unlikely to me that clients can be written that appropriately communicate the content of newly added features to humans.
Clients could with little effort warn users that the data they see or base their analysis upon may be misleading, because the client understands it only incompletely. I believe, however, that it will be very hard to determine when partial information should be displayed and when it will actually lead to entirely wrong conclusions. The difficulty of managing this, for both software and human consumers, may easily undermine trust in the entire class of data sources.
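A minimal sketch of such a warning follows. It again assumes a hypothetical XML specimen record; the element names and the function read_with_warning are invented for illustration. The sketch shows only the easy part (detecting that something was not understood), not the hard part of deciding whether the partial view is still safe to use.

```python
import xml.etree.ElementTree as ET

# Elements the hypothetical client understands (assumed names).
KNOWN_ELEMENTS = {"taxon", "locality"}

# A newer document containing one element unknown to this client.
NEWER_DOCUMENT = (
    "<specimen>"
    "<taxon>Quercus robur</taxon>"
    "<locality>Berlin</locality>"
    "<identificationCertainty>doubtful</identificationCertainty>"
    "</specimen>"
)


def read_with_warning(xml_text):
    """Return the understood data, the ignored element names, and a warning."""
    root = ET.fromstring(xml_text)
    known, ignored = {}, []
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            known[child.tag] = child.text
        else:
            ignored.append(child.tag)
    warning = None
    if ignored:
        # The client can name what it skipped, but it cannot judge
        # how much the skipped data changes the meaning of the rest.
        warning = (
            "This record contains data this client does not understand ("
            + ", ".join(ignored)
            + "); what is shown may be incomplete or misleading."
        )
    return known, ignored, warning


known, ignored, warning = read_with_warning(NEWER_DOCUMENT)
if warning:
    print(warning)
print(known)
```

The client can report that identificationCertainty was skipped, but it has no way of knowing that this particular element casts doubt on the taxon name it does display.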
--
GregorHagedorn - 24 Apr 2006