Graphical design component

r6 - 13 Apr 2006 - 16:41:41 - RogerHyam

TagMeeting1ReportDraft

High Level Vision of TDWG Architecture

  • Biodiversity data will be modeled as a graph of identifiable objects.
  • The semantics of these objects will be encoded in a series of shared ontologies.
  • These ontologies will be related to each other through, at a minimum, shared Base and Core ontologies.
  • A series of interfaces/protocols will specify how services on the network will expose objects.
  • These interfaces/protocols will preferably be adopted from existing technologies but created by TDWG if necessary.
  • Standards will define how objects should be serialized for exchange over the network.

Data Not in 'Identifiable Objects'

  • Are we interested in sharing data that is not part of an identifiable object?
  • Yes: For querying. Results of a query (SQL for example) are not identifiable objects.

Objects

  • Objects are identifiable and machine readable.
  • Objects should be semantically rich but can have opaque binary components.
  • The result of resolving a GUID will be an object. There needs to be a simple way of identifying the type of these objects.
  • The XML Schema specification does not require a schema location to be supplied. In TDWG, all XML objects must have a schema location that resolves to a standard schema against which the object will validate.
  • TDWG must permanently host schemas that are considered standards but application providers may host their own schemas if they need to.
  • All TDWG objects in RDF/S must have at least one rdf:type property.
  • Libraries need to handle RDF embedded in XHTML.
  • Need to check that RDDL does not break libraries if we use it; if it does not, recommend its adoption later.
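
The schema location requirement above could be checked mechanically. The following is a minimal sketch, assuming the object arrives as an XML string; the helper names, the sample element and the example.org-style namespace and schema URLs are illustrative assumptions, not agreed TDWG names.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, which owns the schemaLocation attribute.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace: schema URL} pairs declared on the root element."""
    root = ET.fromstring(xml_text)
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = raw.split()
    # xsi:schemaLocation is a whitespace-separated list of namespace/URL pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def has_schema_location(xml_text):
    """True if the root element declares at least one namespace/URL pair."""
    return bool(schema_locations(xml_text))


# Hypothetical object: the namespace and schema URL are made up for illustration.
SAMPLE = """<Specimen xmlns="http://ns.tdwg.org/example/specimen"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.tdwg.org/example/specimen http://ns.tdwg.org/example/specimen.xsd"/>"""

print(has_schema_location(SAMPLE))
```

This only checks that a location is declared; resolving the URL and validating the object against the schema would need a schema-aware library.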

How do we define objects 1 - XML Schema.

  • Object structure must be defined as a top-level element (current schemas would have to be modified).
  • Top-level elements that define objects should be typed by global complexTypes; this allows automated tools to build binding code.
  • (This could be done on 'global element' variants of existing schemas.)
  • Whatever these top-level objects are, they must have a GUID attribute.
  • A standard pointer structure must be adopted to reference objects.

How do we define objects 2 - Semantic Web Technologies.

  • How do we define the standard? - An object is an instance of a class in the ontology.
  • Objects should be bounded by Concise Bounded Descriptions and identified by a GUID.
  • Anyone can make assertions about a resource. The definitive form is the one that is returned when the GUID is resolved.
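
To make the idea of bounding an object concrete, here is a minimal sketch of computing a Concise Bounded Description over an in-memory list of triples. It assumes triples are plain Python tuples and that blank nodes are marked with a hypothetical `BNode` type; it follows blank-node objects recursively but omits the reification part of the full definition. All identifiers and property names in the sample data are invented.

```python
from collections import deque


class BNode(str):
    """Marker type for blank nodes; plain str values stand for URIs or literals."""


def concise_bounded_description(triples, start):
    """Collect statements about `start`, following blank-node objects recursively."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            # Only blank nodes are expanded; named resources bound the description.
            if isinstance(obj, BNode) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return result


# Hypothetical data for illustration only.
GUID = "urn:lsid:example.org:specimens:123"
B1 = BNode("_:b1")
TRIPLES = [
    (GUID, "rdf:type", "ex:Specimen"),
    (GUID, "ex:collector", B1),
    (B1, "ex:name", "A. Collector"),
    (GUID, "ex:heldBy", "urn:lsid:example.org:institutions:9"),
    ("urn:lsid:example.org:institutions:9", "ex:name", "Example Herbarium"),
]

for triple in concise_bounded_description(TRIPLES, GUID):
    print(triple)
```

The institution's own statements are left out of the specimen's description because the institution is identified by its own GUID and would be resolved separately.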

Data Modeling

  • This is key to integration of TDWG standards.
  • UML accompanied by natural language description will be used to define a data model for TDWG.

Recommendations

  • Natural language descriptions accompanied by UML class diagrams will be used to define the data model.
  • There will be three levels in the ontology:
    • Base = Abstract base class and properties for all TDWG objects (e.g. GUID, title, etc.).
    • Core = Extends Base to define classes and properties that are common to multiple domains.
    • DomainOntologies = Concrete classes for use.
  • One of TAG's roles is to ensure redundancy does not creep into new standards/ontologies.
  • Classes and properties within the Base and Core ontologies will have a status attribute that indicates their level of stability/adoption.
  • Recommend all subgroups present data modeling in this way.
  • Recommend all data models should extend the Base and Core ontologies and make use of existing ontologies.
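
The three levels can be pictured as a small class hierarchy. This is a rough sketch only: the placement of `Agent` in Core and `Institution` in a domain ontology, the `code` field and the `STATUS` values are assumptions made for illustration, not decisions recorded at the meeting.

```python
from dataclasses import dataclass


@dataclass
class BaseObject:
    """Base level: properties shared by all objects (e.g. GUID, title)."""
    guid: str
    title: str

    STATUS = "stable"  # hypothetical stability/adoption marker

    def __post_init__(self):
        # The Base class is abstract: only subclasses may be instantiated.
        if type(self) is BaseObject:
            raise TypeError("BaseObject is abstract; use a concrete domain class")


@dataclass
class Agent(BaseObject):
    """Core level (assumed): a class common to multiple domains."""

    STATUS = "draft"  # hypothetical


@dataclass
class Institution(Agent):
    """Domain level (assumed): a concrete class for use."""
    code: str = ""


herbarium = Institution(
    guid="urn:lsid:example.org:institutions:9",  # made-up identifier
    title="Example Herbarium",
    code="EX",
)
print(herbarium.guid, herbarium.title, Agent.STATUS)
```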

Risks

  • Exchange of UML diagrams other than as pictures may be problematic because of interoperability issues between UML tools.
  • Although there are numerous commercial UML tools available the open source / free tools appear to be of lower quality.
  • It may be desirable to use modeling constructs that are not supported by UML.

Actions

  • Jessie's group to coordinate development of a non-normative 'first-pass' ontology from existing schemas and make recommendations for proceeding with the Base and Core ontologies.
  • Multiplicity relationships may be key to identifying primary objects.
  • Jessie's group to come up with recommendations for conversion of UML to semantic web and XML Schema representations.

Things we are worried about

  • Changing ontologies through time: how do we minimise cost and maximise interoperability? Work through ontology change scenarios as a thought experiment. This needs to be implementation specific.
  • Integration with GIS, SIS and other systems. Can we express it as GML?
  • Distributed queries will be difficult. Identify the typical types of distributed query within our community.
  • Evaluate caching: how suitable are the technologies for building big caches or thematic caches?

  • Building thematic caches as we think this is a primary use case.
    1. Take 2 or 3 provider-instances and make them work with the green box ontologies.
    2. Build thematic caches

GUIDs

  • There is a clear line between classes and instances (vocabulary and data), but this line will be in different places depending on the application. Some people may consider taxon concepts, descriptive terms, etc. to be classes.

  • We can talk about data and we can talk about vocabularies and we are referring to different sides of this line without specifying where the line is.

  • There are certain things for which LSIDs are not appropriate. It would be legal to use them as RDF resource identifiers for controlled vocabularies and as XML Schema locations, but we would have to extend existing software libraries to do this, which is not desirable.
  • We recommend that LSIDs are not used for controlled vocabularies.
  • LSIDs should refer to instances.
  • LSIDs should be limited to URIs, not IRIs, at the moment.
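
As an illustration of what an LSID looks like to software, the sketch below splits one into its parts using the general form `urn:lsid:authority:namespace:object[:revision]` and rejects non-ASCII input in line with the URI-only recommendation. The `Lsid` type and `parse_lsid` function are hypothetical names, and the check is deliberately simplified: it does not validate the characters allowed within each part.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID string into its components, raising ValueError if malformed."""
    if not text.isascii():
        raise ValueError("non-ASCII characters found: this would be an IRI, not a URI")
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected urn:lsid:authority:namespace:object[:revision]")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:'")
    if not all(parts[2:5]):
        raise ValueError("authority, namespace and object must not be empty")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Made-up identifier for illustration.
print(parse_lsid("urn:lsid:example.org:specimens:123"))
```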

Exploration of Primary Objects

List of Primary TDWG Objects. Results of a discussion around what the primary objects might be and where they are defined in the existing schemas. Later the decision was taken to do this formally, from the bottom up, by looking for primary objects within schemas.

  • Specimen - ABCD + DarwinCore
  • GeoEvent - ABCD + DarwinCore
  • TaxonConcept - TCS + (ABCD) + (SDD)
  • TaxonName - ABCD + DarwinCore + TCS + ...
  • Agent...
    • Institution - ABCD + NCD + DarwinCore + SDD
    • Collection - NCD + ABCD + DarwinCore
    • Person - UBIF
    • Service
  • DescriptiveTerm (Character, CharacterState, Modifier...) - ABCD + SDD
  • PublicationCitation - everyone has it.
  • Description - SDD
    • SummaryData - SDD
  • Observation - ABCD, DarwinCore in obs group.
    • Identification - ABCD, DarwinCore
    • Measurement - SDD
  • IdentificationKey - SDD
  • MediaObject - ABCD + UBIF
  • Methodology - SDD
  • MolecularStuff
  • Phylogenies
  • Classification

Naming Problems

  • It is generally confusing for non-technical people to differentiate between protocols and data formats; we should help them by using clear, consistent naming.
  • We are not consistent in the way we name things or in the way we use names.
  • DiGIR is a protocol but DiGIR2 is an application.
  • TAPIR is a protocol but is usually discussed as if it were an application.
  • Suggest we use the words 'Format', 'Protocol' and 'Application' whenever a term is first mentioned or when it may clarify things, e.g. TAPIR Protocol, DarwinCore Format, DiGIR Protocol, DiGIR PHP Application.

DiGIR2 => ?? PyWrapper and TAPIR

  • Interfaces?

Namespace conventions

  • http://ns.tdwg.org//
  • All lower case.
  • Only alphanumeric characters plus '-'.
  • There will be http://ns.tdwg.org/tag/core and http://ns.tdwg.org/tag/base.
  • Same URL convention used for namespaces and schemaLocation
  • Debate over what should be at the namespace URL.
  • Action: survey current namespaces and probably set them up as re-directs.
  • Directory listings will be supported in namespace directories.
  • Ricardo to look into RDDL
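
The conventions above lend themselves to a simple automated check. The sketch below is one possible reading: it accepts any URL under http://ns.tdwg.org/ whose path segments contain only lower-case letters, digits and '-'. The function name and the decision to reject a trailing slash are assumptions.

```python
import re

# One or more path segments, each made of lower-case letters, digits or '-'.
NAMESPACE_PATTERN = re.compile(r"http://ns\.tdwg\.org(/[a-z0-9-]+)+")


def is_valid_namespace(url):
    """True if the URL follows the lower-case, alphanumeric-plus-hyphen convention."""
    return NAMESPACE_PATTERN.fullmatch(url) is not None


for candidate in [
    "http://ns.tdwg.org/tag/core",
    "http://ns.tdwg.org/tag/base",
    "http://ns.tdwg.org/TAG/Core",     # upper case is not allowed
    "http://ns.tdwg.org/tag/core_v1",  # '_' is not allowed
]:
    print(candidate, is_valid_namespace(candidate))
```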

High priority

  • Urgent need to adopt an existing harvesting protocol that could be used alongside LSID resolution to facilitate new data providers.

Near Term Advice to Implementors

  • Do not recommend short term replacement of existing technologies as their potential replacements are not mature.
  • Recommend that any new deployments or changes to deployments address the need for migration to GUID based technologies in the near future.

Benefits of GUIDs

Recommend that the GUIDs group issue a clear set of benefits for the take-up of GUIDs (or people won't do it):
  • Linking by resolution.
  • Merging
  • Stamping Ownership
  • Tracking/Reporting usage
  • Deduplication: Object equivalence reduces to LSID equivalence.
  • Language independence
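
The deduplication benefit can be shown in a few lines: once every record carries an LSID, removing duplicates becomes a lookup on that identifier rather than a field-by-field comparison. This sketch assumes records are dictionaries with a hypothetical `lsid` key and treats two LSIDs as equivalent only when the strings are identical; any normalization rules would need to come from the LSID specification.

```python
def deduplicate(records):
    """Keep the first record seen for each LSID."""
    unique = {}
    for record in records:
        # Assumption: exact string equality stands in for LSID equivalence.
        unique.setdefault(record["lsid"], record)
    return list(unique.values())


# Made-up records, e.g. the same specimen harvested from two providers.
RECORDS = [
    {"lsid": "urn:lsid:example.org:specimens:123", "source": "provider-a"},
    {"lsid": "urn:lsid:example.org:specimens:123", "source": "provider-b"},
    {"lsid": "urn:lsid:example.org:specimens:456", "source": "provider-a"},
]

print(len(deduplicate(RECORDS)))
```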

Benefits of Semantic Web Way

  • Relationships 'come for free'
  • It appears logical to do it at the same time as GUIDs, as the technologies fit so well, especially as RDF is the recommended return type for the LSID getMetadata() call.

What is going to happen by October?

  • Jessie's group will present work on Thematic things and ontology.
  • Core ontology 'alpha' version
  • Results of LSID authority implementations.

Linking Topics
