twiki/data/DarwinCore/WebHome.txt

%META:TOPICINFO{author="JohnWieczorek" date="1255103929" format="1.1" reprev="1.46" version="1.46"}%
---+!! Historical <noautolink>%WEB%</noautolink> wiki site. Deprecated.
------

*Note*: These wiki pages are kept for historical purposes. They *do not* reflect the content of the current standard, which can be found at:

* http://rs.tdwg.org/dwc/index.htm


---++ Table of Contents

---++ Definition
The Darwin Core (sometimes abbreviated as !DwC) is a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections. Extensions to the Darwin Core provide a mechanism to share additional information, which may be discipline-specific, or beyond the commonly agreed upon scope of the Darwin Core itself.

   * [[BackgroundAndHistory][Background and History]]
      * [[DarwinCoreVersions][Darwin Core Versions]]
   * [[DesignAndPurpose][Design and Purpose]]
   * [[Darwin Core and Extensions]] (Concept Lists)
      * [[DarwinCoreDraftStandard][Darwin Core]]
      * [[CuratorialExtension][Curatorial Extension]]
      * [[GeospatialExtension][Geospatial Extension]]
      * [[PaleontologyElement][Paleontology Extension]]
      * [[InteractionExtension][Interaction Extension]]
   * Darwin Core Group Task Group
      * [[DwCInterestGroupCharter][Draft Charter]]
   * [[PendingIssues][Pending Issues]]
   * [[CircaNewsgroupsDiscussions][GBIF Circa Discussion (archive 29 Sep 2005 to 09 Jun 2006)]]
   * [[http://digir.net][Distributed Generic Information Retrieval]] (!DiGIR)

---++ Using this site 
If you are new to wikis, learn how to use one in Main.ThreeEasySteps. This wiki is open to anyone interested in the topic, but you must be a registered user to edit pages and add content. Register at the TWiki.TWikiRegistration page.

This site is meant to provide access to everything anyone needs to know about the Darwin Core. If you find these pages lacking in any respect, please feel free to contribute comments at the bottom of any of these pages or to contact any of the leads for the Darwin Core Interest Group (see list below) with your concerns.

For more information, please contact any of the interest group's active leads:
   * Stanley Blum (sblum-at-calacademy-dot-org) 
   * John Wieczorek (tuco-at-berkeley-dot-edu)
   * Renato De Giovanni (renato-at-cria-dot-org-dot-br)
   * Stephen Long (diomedea-at-berkeley-dot-edu), Documentation
   * Ricardo Pereira (ricardo-at-tdwg-dot-org), System Administrator

------
---++ Comments 
Use the space below to make comments about this page. - Main.JohnWieczorek - 24 Aug 2006

------
------

%ICON{bubble}%
Subject: *Advancement of !DarwinCore on TDWG standards track postponed*     Posted by: Stan Blum (sblum-at-calacademy-dot-org)

DATE:	 Thu, 29 Sep 2005 21:02:20 +0000 (UTC)
ID: dhhkos$fdk$1@afroditi.gbif.org

[Apologies for cross-posting. I sent this earlier to the main TDWG list, but I realized later (thanks John) that some people on this list (!DwCRev) might not be subscribed to the main TDWG list. If you know of other lists with a high proportion of people interested in !DarwinCore, please let me know. -Stan ]

A surprise at the recent TDWG annual meeting was the withdrawal of !DarwinCore2 from the ballot of standards up for recommendation. This was done because recent changes need more time for review and explanation; it does NOT mean that support for the !DarwinCore has been withdrawn, nor does it mean that we have to wait until next year to finalize the schema. With some diligent work over the next two to three months, we can finalize the !DarwinCore2, test it, and begin broader deployment. If you are a current or potential user of the !DarwinCore, please note that additional work needs to be done and your participation would be appreciated. The longer version of what transpired in St. Petersburg follows below.

On July 10th, more than 60 days before the annual meeting as prescribed by the TDWG bylaws, John Wieczorek and I posted a revised version of the !DarwinCore2 on the review web site ([[http://darwincore.calacademy.org/Documentation/DarwinCore2Draft_v1-4_HTML]]). This draft incorporated the advisable changes proposed since the draft of October 2004. We then informed the TDWG Executive Committee that the !DarwinCore2 was ready to be considered for recommendation as a full TDWG standard.

On September 12th at the annual meeting, conveners of the respective subgroups presented their proposed standards in a plenary session. I summarized the current state of the !DarwinCore2, the changes made since last year, and the rationale behind them. The most significant change was to remove the geospatial elements from the core and place them in a geospatial extension. We did this for two reasons:

1) to follow the emerging best practice of constructing schemas to be interoperable across domains; and 

2) to improve the stability of the core by making it smaller.

Unfortunately, this change caught many people unaware, and serious questions were raised in the discussion period. These questions and my responses to them are summarized here.

1) Geospatial elements are critical to many users of the !DarwinCore, so why remove them?
*Response*: There is broad agreement among data architects, within TDWG and beyond, that the best way to achieve interoperability across domains is to import external schemas (and thereby the elements they contain), for example GML, rather than redundantly defining conceptually equivalent elements in our own namespace, as we have done in the past. An alternative we did not discuss, but will consider in the coming weeks, is whether a better strategy would be to import the GML elements directly into the core rather than putting them into an extension. In any case, we are certain that the !DarwinCore, or application schemas based on the !DarwinCore, will import their geospatial elements from GML.

2) These critical geospatial elements are now relegated to a specification that is not at the same level of maturity as the !DarwinCore. Is this a good thing to do?
*Response*: This will be inevitable if we develop our information domain by solidifying the areas with greatest commonality and defer more specialized elements to subsequent work by appropriate stakeholders.  [A point I didn't make at the meeting, but would like to make now is that GML is actually more mature, or at least more broadly deployed, than any TDWG standard. So the elements in the geospatial extension are more stable than the !DarwinCore, not less stable.]

3) Can the geospatial extension include the GML elements for lines and polygons as well as those for points?
*Response*: This will need further investigation, but would certainly be desirable in the long run.

Another cautionary marker appeared the next day among the contributed papers. Roger Hyam gave a presentation about managing change among interdependent schemas. He described a situation in which version dependencies among schemas could require them to be upgraded simultaneously, which effectively eliminates the main benefit of separating them in the first place. Gregor Hagedorn challenged the generality of Roger's point, saying that proper referencing of schemas could ensure the required flexibility. The issue was left open and obviously requires further study and resolution, preferably in the form of a recommendation from a group of technical architects to the groups constructing references among schemas.

As the deadline approached to open the vote on standards up for recommendation, both Adrian Rissone and Walter Berendsohn approached me (Stan) separately and asked me to withdraw the !DarwinCore from the ballot. Both of them said that people had expressed to them the opinion that the recent changes to the !DarwinCore were too large relative to the earlier draft, and too recent to warrant putting the schema up for recommendation as a standard. Although I believe the !DarwinCore would have received at least the required simple majority of votes, in the interest of broader consensus I agreed to withdraw it from the ballot. Therefore, at the beginning of the final session on Tuesday I "announced" that we had agreed to take the !DarwinCore off the ballot while the larger architectural issues were settled. I went on to explain that in my judgment the best course of action would include the following tasks:

1) convene technical experts to develop an explicit recommendation about how to incorporate elements from one schema into another (_i.e._, to address the issue Roger Hyam raised);

2) work with the various !DarwinCore user communities to determine the most appropriate allocation of elements to the core and its extensions, producing explicit lists of elements that would be available to each specialist community by constructing their schema from the core, GML, and their own extensions;

3) resolve these issues as quickly as possible (2-3 months), and fix the resulting draft as version 2.0;

4) develop explicit instructions for upgrading providers with !DarwinCore2 and begin testing.

At the end of the discussion I asked for vocal expressions of agreement or disagreement with withdrawing the !DarwinCore from the ballot. About 10 people voiced support for withdrawing it from the ballot, no one expressed disagreement, and most were silent (perhaps stunned). It was done, and now the work goes on. An architecture interest group is being established and will be announced on the main TDWG mailing list. Roger Hyam will lead that group until more formal arrangements are made.

Finally, I want to point out some problems we have had with group dynamics. I was too busy this last year to devote sufficient time to promoting discussion and developing consensus. Things are only going to get worse for me with the need to revise the general TDWG standards development process. Therefore, John Wieczorek and Renato De Giovanni are going to take over managing the !DarwinCore, though I will do my best to continue to participate.

I think passive publishing (_e.g._, via a web site or wiki) is ineffective if the tempo of contributions is episodic. People stop visiting the web site when activity drops off, and they don't come back unless something is pushed into their inbox. Therefore I would like to encourage stakeholders in the !DarwinCore to subscribe to the email list. Instructions can be found at [[http://circa.gbif.net/tdwg]]. We will do our best to respond to your concerns and keep the discussion moving.

Sincerely, Stan Blum

------
%ICON{bubble}% Subject: *Comments on Stan's message to the TDWG mailing list*  Posted by: Donald Hobern (dhobern-at-gbif-dot-org)

DATE: Thu, 29 Sep 2005 07:44:01 +0000 (UTC)
ID: dhg601$nu2$1@afroditi.gbif.org

Stan has just circulated an explanation of the postponement of voting on Darwin Core.  I have a couple of comments I would like to add.

1. I believe that there are excellent architectural reasons for supporting the separation of the geospatial elements from the core.  If we consider our data as a web of interrelated information that could be represented efficiently by a set of data objects, it seems clear that one of the key object classes of interest would be the Locality.  Separation of the geospatial component of !DwC into an extension allows us to be much more flexible in the future about how we model the relationship between a taxon occurrence and its locality.  Right now the debate is around replacing proprietary elements with GML replacements, but still largely seeing these as attributes of the occurrence.  In the future we may prefer to use the restricted Darwin Core and a much simpler extension schema which might contain little more than a single element serving to identify a Locality object served by some external gazetteer service.  It is only by making this (eminently logical) separation that we can really start to experiment seriously with new and possibly more powerful ways to relate our specimens and observations to the wider world of GIS.

2. In his message, Stan mentions &#8220;a situation in which version dependencies among schemas could require them to be upgraded simultaneously, which effectively eliminates the main benefit of separating them in the first place&#8221;.  This is certainly a general issue for us to debate in regard to TDWG standards in general, but it should be completely irrelevant here.  The beauty of the !DiGIR record-based approach to data exchange is that our record structures are effectively envelopes which can transfer data using concepts from any appropriate schema.  The schemas concerned do not need to be aware of each other at all.  I would not expect the geospatial extension in any way to reference !DwC or vice versa.

Donald Hobern

Programme Officer for Data Access and Database Interoperability

Global Biodiversity Information Facility Secretariat

Universitetsparken 15, DK-2100 Copenhagen, Denmark

Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
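
The "envelope" idea in point 2 above can be sketched in a few lines. This is only an illustration, assuming Python's standard =xml.etree.ElementTree= module. The Darwin Core namespace URI is the one quoted elsewhere in this thread, while the geospatial namespace URI, the =record= wrapper name, and the element names and values are placeholders invented for the example, not the real !DiGIR response format.

```python
import xml.etree.ElementTree as ET

# The Darwin Core namespace URI appears in this thread; the geospatial one
# is a made-up placeholder used only for illustration.
DWC = "http://digir.net/schema/conceptual/darwin/2003/1.0"
GEO = "http://example.org/hypothetical/geospatial"


def build_record(dwc_values, geo_values):
    """Wrap values from two unrelated schemas in a single record element."""
    record = ET.Element("record")  # hypothetical wrapper name
    for name, value in dwc_values.items():
        ET.SubElement(record, f"{{{DWC}}}{name}").text = value
    for name, value in geo_values.items():
        ET.SubElement(record, f"{{{GEO}}}{name}").text = value
    return record


def namespaces_used(record):
    """Return the set of namespace URIs carried by the record's children."""
    return {child.tag[1:].split("}")[0] for child in record}


# Element names and values here are illustrative assumptions.
record = build_record(
    {"ScientificName": "Puma concolor"},
    {"DecimalLatitude": "37.87", "DecimalLongitude": "-122.26"},
)
```

Neither namespace refers to the other in this sketch. Only the record that carries them brings the two together, which is the property Donald describes.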

------

%ICON{bubble}%  Subject: *Re: Comments on Stan's message to the TDWG mailing list*  Posted by: Hannu Saarenmaa (hsaarenmaa-at-gbig-dot-org)

DATE: Thu, 29 Sep 2005 08:31:46 +0000 (UTC)
ID: dhg8pi$pv2$1@afroditi.gbif.org

I can second that.  However, the problem, if any, is a lack of understanding of how these extensions of Darwin Core are managed in general.  That is, we need guidance on how they can be proposed, how they can be used in providers, and how they will be picked up by various portals.  I hope these issues will become clear with the new TAPIR implementations and packages.

I know that there is this page [[http://darwincore.calacademy.org/Extensions/]], which is good, but a lot more explanation is needed there, including on how to include GML elements as extensions.

Moreover, shall these extensions also be endorsed as TDWG standards?

Regards, Hannu

-------
-------

%ICON{bubble}% Subject: *modules vs extensions*  Posted by: Steven Ginzbarg (sginzbar-at-biology-dot-as-dot-ua-dot-edu)

DATE: Wed, 19 Oct 2005 17:37:52 +0000 (UTC)
ID: dj609g$dc6$1@afroditi.gbif.org

John Wieczorek wrote:
>"The basic idea is to create reusable schemas - schemas that can be used in more than one place, thereby promoting true standardization rather than re-creation based on a model. The intention is to create modules based on different classes of questions that one might want to ask of the underlying data."

An idea: What if instead of using extensions which both inherit elements and define new elements, there were modules which define elements but do not inherit them and community schemas which inherit elements but do not define them? There could be two classes of modules: Core modules, _e.g._ the Geospatial module and the !DarwinCore module, would be modules which all communities would inherit. Class modules _e.g._ the Curatorial module could be inherited by communities sharing the same type of data, in this case specimen data.

If botanists wanted a profile they could use to provide data to a portal specifically for botanical data they would first create a class module, the Botanical module, which would contain elements specific to botanical data. Then, if their portal would be dealing only with specimen data, they could create a schema that would inherit the elements of the core modules, the curatorial module and the botanical module. If both observational and specimen data would be provided to their portal, they could create a schema which would inherit the elements of the core modules, the observation/monitoring module, the curatorial module and the botanical module.

I first thought of having a third class of modules, community modules that would be the terminal "extensions". The problem is, how do you know when a module is terminal? The phycologists (algal names are also governed by the ICBN) might decide to set up their own portal. They would create a phycological module and write a schema which would inherit the elements of the core modules, the curatorial module, the botanical module and the phycological module. For each module the type of data handled by its elements would be clearly described. Examples could be given of which modules a community schema might want to inherit its elements from. However, the only requirement would be the inheritance of the core modules.
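
The split proposed above (modules define elements but inherit nothing; community schemas inherit elements but define nothing) can be modelled roughly as follows. This is a minimal sketch in Python rather than XML Schema. The class names, the =imports_core= helper, and every element name are assumptions chosen for illustration, not the actual concept lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Defines elements; never imports another module."""
    name: str
    elements: tuple


@dataclass(frozen=True)
class CommunitySchema:
    """Imports modules; never defines elements of its own."""
    name: str
    modules: tuple

    def elements(self):
        # The schema's vocabulary is exactly what its modules provide.
        return [e for m in self.modules for e in m.elements]


CORE_MODULE_NAMES = {"DarwinCore", "Geospatial"}


def imports_core(schema):
    """Check the single rule in the proposal: core modules are required."""
    return CORE_MODULE_NAMES <= {m.name for m in schema.modules}


# Element names below are placeholders, not the real concept lists.
darwin_core = Module("DarwinCore", ("ScientificName", "CatalogNumber"))
geospatial = Module("Geospatial", ("DecimalLatitude", "DecimalLongitude"))
curatorial = Module("Curatorial", ("Preparations",))
botanical = Module("Botanical", ("Phenology",))
phycological = Module("Phycological", ("Habitat",))

botany_specimens = CommunitySchema(
    "BotanySpecimens", (darwin_core, geospatial, curatorial, botanical)
)
phycology_specimens = CommunitySchema(
    "PhycologySpecimens",
    (darwin_core, geospatial, curatorial, botanical, phycological),
)
```

Because the phycological schema lists its modules itself, it can reuse the botanical module without being tied to whatever else the botanical community schema happens to import.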

------

%ICON{bubble}% Subject:	*Re: modules vs extensions*   Posted by: Renato De Giovanni (renato-at-cria-dot-org-dot-br)

DATE: Wed, 19 Oct 2005 20:19:58 +0000 (UTC)
ID: dj69pe$ktv$1@afroditi.gbif.org

Steven,

I agree with your comments, but I think there's a naming problem here. The current "geospatial extension" does not inherit anything from !DarwinCore, so actually it should be better called "geospatial module" as you said, or something else. It could definitely be used in conjunction with other sets of concepts completely different from !DarwinCore.

However, the curatorial extension seems to be a real !DarwinCore extension (or would it make sense to use its concepts without using !DarwinCore?). If that's true, I've just noticed that the curatorial XML Schema lacks an "import" statement to indicate that.

When you say "community schema", if I understood well you mean a schema that simply imports a set of schemas which are related to a specific community. So for instance, if a member of that community wants to configure a data provider, she could simply choose that community schema and the software should be able to automatically present data mappings associated to all underlying schemas. If that's what you meant, I think it's perfectly valid.

Regards, -- Renato

------

%ICON{bubble}% Subject:	*Re: Re: modules vs extensions*   Posted by: Steven Ginzbarg (sginzbar-at-biology-dot-as-dot-ua-dot-edu)

DATE: Wed, 19 Oct 2005 22:34:51 +0000 (UTC)
ID: dj6hmb$s4v$1@afroditi.gbif.org
 
Renato De Giovanni wrote: 
>"... the curatorial extension seems to be a real !DarwinCore extension (or would it make sense to use its concepts without using !DarwinCore?)."

No, it wouldn't make sense. It wouldn't make sense to use it without the Geospatial module either. I suggested that the Darwin Core module and the Geospatial module be considered core modules which would be required to be inherited. The inheritance would be directly by a community schema rather than second hand because they were inherited by the Curatorial module. Just as certain elements in a profile are not nullable, certain modules, the core modules, Geospatial and !DarwinCore, could be required to be inherited. 

>"If that's true, I've just noticed that the curatorial XML Schema lacks an "import" statement to indicate that."

The current Curatorial XML Schema, v.1.4 does have an import statement, with namespace =http://digir.net/schema/protocol/2003/1.0= and schemaLocation =http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd=. I'm suggesting that it not have one.

>"When you say 'community schema', if I understood well you mean a schema that simply imports a set of schemas which are related to a specific community. So for instance, if a member of that community wants to configure a data provider, she could simply choose that community schema and the software should be able to automatically present data mappings associated to all underlying schemas.  If that's what you meant, I think it's perfectly valid."

That is what I meant. In addition, the community schema would only import modules and would not define any additional elements. For instance, if I wanted to set up a portal for botanical data, I would first create a botanical module containing the new elements needed to supplement those of existing modules. Then I would create a community schema which only imported modules, _e.g._ Geospatial, !DarwinCore, Curatorial, and Botanical. If I had defined the supplemental fields and imported modules in the same community schema, a phycological portal that wanted to inherit the supplemental botanical fields would be forced to also inherit the same modules as the botanical schema did. I think keeping element definitions and import of modules separate will allow for greater flexibility.

Steve

------

%ICON{bubble}% Subject:	 *Re: Re: Re: modules vs extensions*   Posted by: Renato De Giovanni (renato-at-cria-dot-org-dot-br)

DATE: Thu, 20 Oct 2005 18:26:23 +0000 (UTC)
ID: dj8ngf$upl$1@afroditi.gbif.org

Hi Steve,

The import statement you pasted refers to the !DiGIR schema, not !DarwinCore. Importing from !DarwinCore would be a way of indicating the inheritance I mentioned.

So community schemas would actually be a kind of data profile for specific groups. Even if those schemas need to be re-versioned after any change in one of the imported modules, they could definitely help to keep things organized and make life easier for data providers. But data providers would still be free to pick the modules they want individually and build the puzzle from scratch, which is also OK, I think.

Regards,

Renato
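
The distinction Renato draws (an import of the !DiGIR protocol schema is not an import of !DarwinCore) can be checked mechanically. The sketch below uses Python's standard =xml.etree.ElementTree=. The two namespace URIs and the schemaLocation are the ones quoted in this thread, but the XSD fragment itself is a hypothetical stand-in, not the actual Curatorial schema.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
DIGIR_NS = "http://digir.net/schema/protocol/2003/1.0"
DARWIN_NS = "http://digir.net/schema/conceptual/darwin/2003/1.0"

# Hypothetical stand-in for an extension schema; only the import matters here.
EXTENSION_XSD = f"""
<xsd:schema xmlns:xsd="{XSD}">
  <xsd:import namespace="{DIGIR_NS}"
      schemaLocation="http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd"/>
</xsd:schema>
""".strip()


def imported_namespaces(xsd_text):
    """List the namespace of every top-level xsd:import in a schema."""
    root = ET.fromstring(xsd_text)
    return [imp.get("namespace") for imp in root.findall(f"{{{XSD}}}import")]


def imports_darwin_core(xsd_text):
    """True only if the schema explicitly imports the Darwin Core namespace."""
    return DARWIN_NS in imported_namespaces(xsd_text)
```

For the fragment above, =imports_darwin_core= returns false: the schema imports something, but not the namespace that would signal a dependency on !DarwinCore.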

------

%ICON{bubble}% Subject:	 *Re: Re: Re: Re: modules vs extensions*   Posted by: Steven Ginzbarg (sginzbar-at-biology-dot-as-dot-ua-dot-edu)

DATE: Thu, 20 Oct 2005 19:42:55 +0000 (UTC)
ID: dj8rvv$2h3$1@afroditi.gbif.org

Renato wrote: 
>"The import statement you pasted [from the Curatorial Extension] refers to the !DiGIR schema, not !DarwinCore. Importing from !DarwinCore would be a way of indicating the inheritance I mentioned."

There may be community schemas that import the Observation/Monitoring module and do not import the Curatorial module. To ensure that these community schemas also received the core elements (!DarwinCore and Geospatial modules), they would have to be imported by the Observation/Monitoring module as well as by the Curatorial module.

I suggested that only the !DarwinCore and Geospatial modules be required. If these are imported by the Curatorial and Observation/Monitoring modules rather than directly by community schemas then the Curatorial and Observation/Monitoring modules would become a new class of module, one of which would be required. If a community schema wanted to support both specimen and observational data they would import both the Curatorial and Observation/Monitoring modules and would end up with two copies of the core elements.

Steve
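
The "two copies of the core elements" problem Steve describes can be reproduced with a toy import graph. This is a sketch only: the dictionaries model which module imports which, and the names =NESTED=, =FLAT=, =resolve= and =duplicates= are invented for the illustration.

```python
def resolve(name, imports):
    """Return every module reached from `name`, duplicates included."""
    reached = []
    for dep in imports.get(name, []):
        reached.append(dep)
        reached.extend(resolve(dep, imports))
    return reached


def duplicates(name, imports):
    """Modules that arrive more than once when `name` is resolved."""
    reached = resolve(name, imports)
    return sorted({m for m in reached if reached.count(m) > 1})


# Core modules imported indirectly, by the class modules.
NESTED = {
    "Community": ["Curatorial", "ObservationMonitoring"],
    "Curatorial": ["DarwinCore", "Geospatial"],
    "ObservationMonitoring": ["DarwinCore", "Geospatial"],
}

# Core modules imported directly by the community schema; modules import nothing.
FLAT = {
    "Community": [
        "DarwinCore",
        "Geospatial",
        "Curatorial",
        "ObservationMonitoring",
    ],
}
```

With =NESTED=, both core modules arrive twice. With =FLAT=, each module arrives exactly once.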

------
%ICON{bubble}% Subject:	 *RE: Re: Re: Re: modules vs extensions*   Posted by: Donald Hobern (dhobern-at-gbif-dot-org)

DATE: Thu, 20 Oct 2005 21:26:20 +0000 (UTC)
ID: dj921s$7e1$1@afroditi.gbif.org

Steve wrote:
> If a community schema wanted to support both specimen and&#8230; <snip>

Precisely. If we keep all modules at the same level, as proposed sets of attributes that applications may then choose to use, we avoid any problems with multiple imports (potentially of different versions) and with versioning in general.

Different communities then define the sets of modules that they wish to use for their data exchange.

The current use of Darwin Core follows exactly this model. Darwin Core is not a definition of a specimen record. It is the definition of a set of attributes which are useful when describing most specimens. The inclusion of these attributes and of other attributes from the various extensions (or better, modules) takes place in another block of schema definition when a format for the !DiGIR/TAPIR response record is defined.

To illustrate this, the Darwin Core 1.2 schema is at:  [[http://digir.sourceforge.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd]]

One popular schema that uses the Darwin Core 1.2 attributes to define a specimen record is at:  [[http://digir.sourceforge.net/schema/conceptual/darwin/full/2003/1.0/darwin2full.xsd]]

The beauty of !DiGIR/TAPIR (and of this type of XML data exchange in general) is that different communities can construct their own record definitions bringing together Darwin Core, any of the standard modules, and any locally-defined modules in any way that will meet their purposes. Their data providers will then be exposing a corresponding set of attributes. Other communities (which may use different record definitions and a different set of modules) will still be able to query these providers for whatever subset of the data interests them both.

Provided we build the tools correctly (and especially if we back them up with an ontology into which the attributes from the modules get defined), we will all be able to discover and use data from a bewildering range of sources. I believe that modularisation of content attributes is essential to making us really successful in developing biodiversity informatics and that it can open up really exciting networking possibilities for us all.

Donald Hobern

------

%ICON{bubble}% Subject:	 *Re: RE: Re: Re: Re: modules vs extensions*   Posted by: Steven Ginzbarg (sginzbar-at-biology-dot-as-dot-ua-dot-edu)

DATE: Fri, 21 Oct 2005 18:01:54 +0000 (UTC)
ID: djbaei$7i9$1@afroditi.gbif.org
 
Donald Hobern wrote:
>"The current use of Darwin Core follows exactly this model. Darwin Core is not a definition of a specimen record. It is the definition of a set of attributes which are useful when describing most specimens. The inclusion of these attributes and of other attributes from the various extensions (or better, modules) takes place in another block of schema definition when a format for the !DiGIR/TAPIR response record is defined... <snip>

The first schema is what I am calling a module. It defines elements but doesn't import any other modules. The second schema is what I am calling a community schema. It imports modules (in this case one module) and doesn't define any new elements. The community schema imports the module with namespace =http://digir.net/schema/conceptual/darwin/2003/1.0= and schemaLocation =http://digir.sourceforge.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd=.

It assigns the alias "darwin" to the imported namespace: =xmlns:darwin="http://digir.net/schema/conceptual/darwin/2003/1.0"=

The block of schema definition where the format for the !DiGIR/TAPIR response record is defined contains no references to newly described elements. It contains only references to elements in the imported module. These are preceded by the alias "darwin".

The separation of module import from element definition maximizes flexibility for community standards in selecting elements that meet the needs of their community. In this case all elements of the module are included in the format of the response record except for the !BoundingBox element.

The author of the community schema could have also omitted the element !ScientificName from the format of the response record. The author didn't do so because he/she knew that !ScientificName was not nullable. No import specification forced the inclusion of !ScientificName in the format of the response record. Similarly, I think that a simple statement that the !DarwinCore module and the Geospatial module are core modules which will be imported by all !DarwinCore community schemas will be sufficient without invoking import specifications to force their inclusion.

>"Provided we build the tools correctly (and especially if we back them up with an ontology into which the attributes from the modules get defined), we will all be able to discover and use data from a bewildering range of sources."

Can you explain what you have in mind by an ontology? Versioning?
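
To make the "alias" mechanism in this message concrete, here is a hedged sketch of a community schema that imports one module and refers to its elements through the =darwin= prefix. The namespace URI and schemaLocation are the ones quoted in this thread. The record wrapper and the particular elements referenced are assumptions, and the fragment is not a copy of =darwin2full.xsd=. The Python code uses only the standard library.

```python
import io
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
DARWIN_NS = "http://digir.net/schema/conceptual/darwin/2003/1.0"

# Hypothetical community schema: one import, and only prefixed references
# inside the record definition.
COMMUNITY_XSD = f"""
<xsd:schema xmlns:xsd="{XSD}" xmlns:darwin="{DARWIN_NS}">
  <xsd:import namespace="{DARWIN_NS}"
      schemaLocation="http://digir.sourceforge.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd"/>
  <xsd:element name="record">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="darwin:ScientificName"/>
        <xsd:element ref="darwin:CatalogNumber"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
""".strip()


def summarize(xsd_text):
    """Return (prefix map, locally named elements, referenced QNames)."""
    prefixes, named, referenced = {}, [], []
    source = io.BytesIO(xsd_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == f"{{{XSD}}}element":
            if payload.get("ref"):
                referenced.append(payload.get("ref"))
            elif payload.get("name"):
                named.append(payload.get("name"))
    return prefixes, named, referenced


prefixes, named, referenced = summarize(COMMUNITY_XSD)
```

The only locally named element is the record wrapper itself. Every content element is a reference that resolves, through the =darwin= prefix, to the imported module.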

------

%ICON{bubble}% Subject:	*Re: Re: RE: Re: Re: Re: modules vs extensions*     Posted by: Donald Hobern (dhobern-at-gbif-dot-org)

DATE: Fri, 21 Oct 2005 21:39:06 +0000 (UTC)
ID: djbn5q$k47$1@afroditi.gbif.org
 
I wrote:
>"Provided we build the tools correctly (and especially if we back them up with an ontology into which the attributes from the modules get defined), we will all be able to discover and use data from a bewildering range of sources." 

Steve replied: 
>"Can you explain what you have in mind by an ontology? Versioning?"

By an ontology I mean an external queryable system which provides machine-readable information about the classes of object of interest to our community and the relationships between them, as well as their attributes. Such a system would allow us to define that a _specimen_ belongs-to a _collection_ and that _specimens_ have an _identification_ and that an _identification_ can be realised as a Darwin Core !ScientificName (or some other representation). This allows software applications to reason about the relationships between data from different sources and determine the applicability of data for their needs. It would allow us to model the relationships between the objects of interest to different sub-communities. I hope this clarifies a little.

Donald
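
The relationships Donald lists can be written down as subject/predicate/object triples and queried. The sketch below is a toy stand-in for the "external queryable system" he describes, not a proposal for its design. The predicate spellings and the helper names =objects= and =reachable= are assumptions.

```python
# The three statements from the message above, as (subject, predicate, object).
TRIPLES = [
    ("specimen", "belongs-to", "collection"),
    ("specimen", "has", "identification"),
    ("identification", "realised-as", "DarwinCore:ScientificName"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Everything related to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reachable(start, triples=TRIPLES):
    """Everything reachable from `start` by following any relationship."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.append(o)
                stack.append(o)
    return seen
```

An application could use a query like =reachable("specimen")= to discover that specimen data can, via an identification, be expressed as a Darwin Core !ScientificName.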

------

%ICON{bubble}% Subject:	 *Re: modules vs extensions*     Posted by: Donald Hobern (dhobern-at-gbif-dot-org)

DATE: Wed, 19 Oct 2005 20:39:27 +0000 (UTC)
ID: dj6atu$llt$1@afroditi.gbif.org

Renato wrote: 
> &#8220;However, the curatorial extension seems to be a real !DarwinCore extension (or would it make sense to use its concepts without using !DarwinCore?).&#8221;

I firmly believe that we should leave all &#8220;extension&#8221; schemas as separate modules that allow providers to select the set that best meets their needs without having to worry about import relationships.  This is very much in the spirit of !DiGIR and TAPIR and will allow the greatest flexibility.  It allows for the possibility of picking up upgrades to Darwin Core without having to reversion the curatorial extension.  It also allows for the possibility that the curatorial extension may be used alongside alternate representations such as some of the ABCD elements.

I think that Steven Ginzbarg's idea is also important.  As Renato noted, the existing "extensions" are really modules in Steven's sense.  In the long run TDWG should develop a set of schemas of this type, but also publish documents and/or schemas that define recommended uses of these different schemas for different communities and different purposes.

Donald Hobern 

------

%ICON{bubble}% Subject:	 *Re: modules vs extensions*   Posted by: Renato De Giovanni (renato-at-cria-dot-org-dot-br)

DATE: Thu, 20 Oct 2005 17:28:34 +0000 (UTC)
ID: dj8k42$sff$1@afroditi.gbif.org

Hi Donald,

On 19 Oct 2005 at 22:39, dhobern@gbif.org wrote:
> I firmly believe that we should leave all extension schemas as... <snip>

That's a very important point. Inheritance relationships will force reversioning of all "sub modules" when a "module" changes.  So I tend to agree with you that loose coupling here could bring significant benefits.

Regards,
Renato
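
Renato's point about reversioning can be illustrated by walking an import graph backwards from the module that changed. This is a sketch under assumed data: the =COUPLED= and =LOOSE= graphs and the =must_reversion= helper are invented for the example and do not describe the actual schemas.

```python
def must_reversion(changed, imports):
    """Modules that directly or indirectly import `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for module, deps in imports.items():
            if current in deps and module not in affected:
                affected.add(module)
                frontier.append(module)
    return sorted(affected)


# Inheritance between modules: a change ripples through its dependents.
COUPLED = {
    "Curatorial": ["DarwinCore"],
    "Botanical": ["Curatorial"],
    "Geospatial": [],
}

# Loose coupling: modules stand alone and import nothing.
LOOSE = {
    "DarwinCore": [],
    "Curatorial": [],
    "Botanical": [],
    "Geospatial": [],
}
```

In the coupled graph a change to !DarwinCore forces new versions of both dependent modules. In the loose graph it forces none, and only the record definitions that combine the modules need to be reviewed.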

------
------
%ICON{bubble}% Subject:	 *RE: Cancellation of GBIF IG EG meeting - a few points*   Posted by: Patricia Mergen (p_mergen-at-yahoo-dot-com)

DATE: Mon, 31 Oct 2005 06:35:21 +0000 (UTC)
ID: dk4dv9$s83$1@afroditi.gbif.org

The change to include authors in !DwC 2 was on my suggestion. It was for compatibility with ABCD where the corresponding field, <noautolink>TaxonIdentified/ScientificName/FullScientificNameString</noautolink> has documentation:
Concatenated scientific name, preferably formed in accordance with a Code of Nomenclature, _i.e._ a monomial, binomial, or trinomial plus author(s) or author team(s) and - where relevant - year, or the name of a cultivar or cultivar group, or a hybrid formula, as fully as possible.

The rationale was that the atomized elements are available for searching, which leaves the !ScientificName field free to provide the verbatim taxonomic identification as completely as possible.

There has been the following subsequent discussion in the comments below the !DwC2 Documentation HTML table, [[http://darwincore.calacademy.org/Documentation/DarwinCore2Draft_v1-4_HTML]]: Scientific Name element.

Posted by munrodb at 2005-08-10 07:50 AM
The older versions of Darwin Core required the full name of the lowest level of the taxon only and did not include the author and year etc. as suggested in the current draft. I strongly recommend retaining the older definition (without author, year etc.). On our CBIF !DiGIR portal we hyperlink the !ScientificName field to our ITIS nomenclator, and the search will fail (we only search on the taxon name). As far as I have seen, most online nomenclators work this way. Adding the authorship will require a lot of rewriting of nomenclators. The authorship can be presented in a separate element, and it is very easy to concatenate the two together in applications.


##############

Re: !ScientificName

Posted by sginzbar at 2005-11-20 03:11 PM

If the parts of the name are provided, can the portal concatenate them into a new searchable field !ScientificNameWithoutAuthors? If not, then perhaps two elements are needed: the current !ScientificName renamed !TaxonomicIdentificationVerbatim which would correspond to ABCD 2.06 !FullScientificNameString and a !ScientificNameWithoutAuthors element.
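
As an illustration of the concatenation asked about above, here is a minimal sketch, not part of the original thread. It assumes a provider supplies the atomized elements named elsewhere in this discussion (Genus, !SpecificEpithet, !InfraspecificRank, !InfraspecificEpithet, !AuthorYearOfScientificName); the function name =build_names= and its parameters are hypothetical, and no portal is known to implement it this way.

```python
def build_names(genus, specific_epithet, infraspecific_rank="",
                infraspecific_epithet="", author_year=""):
    """Return (name_without_authors, full_name) built from atomized parts.

    Empty or whitespace-only parts are skipped, so a plain binomial works too.
    """
    parts = [genus, specific_epithet, infraspecific_rank, infraspecific_epithet]
    without_authors = " ".join(p.strip() for p in parts if p and p.strip())
    # Append the authorship only when one was supplied.
    full_name = f"{without_authors} {author_year.strip()}".strip()
    return without_authors, full_name


bare, full = build_names("Quercus", "agrifolia", "var.", "oxyadenia",
                         "(Torr.) J.T. Howell")
print(bare)  # Quercus agrifolia var. oxyadenia
print(full)  # Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell
```

Under these assumptions the searchable form and the full form are both derived from the same parts, which is why the atomized elements are the safer thing to exchange.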

###############

From: charlie Lapham lapham@scrtc.com
Sent: Thursday, June 08, 2006 12:24 PM
To: Ginzbarg, Steve
Subject: FW: !ScientificName

I&#8217;m forwarding a copy of a dialog I have going with Larry Speers and Arthur Chapman of GBIF. It has to do with what I consider excessive flexibility in the !DwC !ScientificName field.

Charles J. Lapham

16 Winn School Rd

Glasgow, KY 42141

(270)-646-4060

lapham@scrtc.com

################

From: Larry Speers lspeers@gbif.org
Sent: Thursday, June 08, 2006 2:59 AM
To: 'Charlie Lapham'
Subject: RE: !ScientificName

Charlie-

Thanks for the explanation. Both Arthur and I agree you have raised some very interesting points and I will forward your query to the !DwC team and to the data interoperability group here at the secretariat for comment. ...

Larry Speers

Senior Programme Officer

Digitization of Natural History Collections

Global Biodiversity Information Facility

Universitetsparken 15,

2100 Copenhagen Ø

Denmark

Tel: +45 35 32 14 75

Fax: +45 35 32 14 80

Email: lspeers@gbif.org

###########################

From: charlie Lapham lapham@scrtc.com
Sent: 07 June 2006 17:59
To: 'Larry Speers'
Subject: RE: !ScientificName

The only required taxonomic field in !DwC is the !ScientificName. Atomized data may, or may not, be provided. I agree with Arthur it should be and SERNEC will be doing this, but lots of folks aren&#8217;t.

My question is: Should this !ScientificName string include the author, or not, or does it even matter?

I am building a global IUCN T&E taxa table so folks can know what to dummy up for the net. Arthur Chapman&#8217;s how to dummy data up for the net project needs to have a companion what to dummy up list to go with it. ... As things sit now we will need taxon without author (IUCN format), taxon with single author (ITIS format) and taxon with multiple authors (Plants format) to match what your providers are providing as you illustrated in your example. We will be linking taxon and country (ISO 3166 list) to set a rare taxa flag. ...

Providers want to be able to use what they have which varies with the institution. TDWG is currently on their side and their definition is wide open. If we don&#8217;t restrict some of these options, the current complications regarding taxon searches will become permanent. ... I thought !ScientificName was being slightly restricted when the GBIF data cleaner (proposed, as I recall), was flagging taxon with author as an error in one of the examples Arthur Chapman used.

Charles J. Lapham

##########################

From: Larry Speers lspeers@gbif.org
Sent: Wednesday, June 07, 2006 7:16 AM
To: lapham@scrtc.com
Subject: !ScientificName

Charles-

Dr. Edwards asked me to respond to your query concerning &#8216;best practices&#8217; as to representing scientific names in the !DwC. I apologize for the delayed reply but I wanted to check with Arthur and the !DwC team to make sure we are on the same page.

I&#8217;m not too sure just what you are referring to as how the workshop at the SPNHC disagrees with the current !DwC documentation.

As Arthur responded to my query:  &#8220;From my point of view and what I advocate in the Data Cleaning (and as given in Albuquerque at SPNHC) one should always atomize the taxonomic information into separate <noautolink>Genus/Species/Rank/Infraspecies/Author</noautolink> fields etc. wherever possible. As I see DWC2 &#8211; the Scientific Name Field as given in the original enquiry is a concatenation of the other fields - _e.g._ Family, Genus, !SpecificEpithet, !InfraspecificRank, !InfraspecificEpithet, !AuthorYearOfScientificName, etc. Because there is this atomization - I don't see !DwC2 in any way in conflict with what I am advocating - just the opposite - it supports it. I would always advocate both storing information and exchanging it in atomized form wherever possible. Any exchange in concatenated form would be additional information.&#8221;

There was a change between !DwC 1 and !DwC 2 with the inclusion of the authority information, but I have not spoken to the developers as to the reasons for this change.  When I posed your question to Dave Remsen from UBIO he was wondering if you were referring to author information associated with different levels of the hierarchy.

As he pointed out:

&#8220;Many non-animal subspecific combinations cite authorship for both species and subspecific epithets and in this case separation of name and authority results in a loss of the species-level authority. Do plant people have a problem with this loss of information when it's passed?

ITIS refers to _Quercus agrifolia_ var. _oxyadenia_ (Torr.) J.T. Howell

USDA plants refers to _Quercus agrifolia_ Née var. _oxyadenia_ (Torr.) J.T. Howell.

ITIS refers to _Acacia angustissima_ var. _cuspidata_ (Schlecht.) L. Benson

USDA plants refers to _Acacia angustissima_ (P. Mill.) Kuntze var. _cuspidata_ (Schlecht.) L. Benson  

This variation is accepted in !DwC2. &#8230;

Sincerely, 
Larry Speers
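
The ITIS and USDA Plants strings quoted above differ only in authorship, which is what makes matching on !ScientificName fragile. The following is a rough sketch added for illustration, not something proposed in the thread. The function =strip_authorship= and the marker list are hypothetical, and the heuristic is deliberately naive: it assumes the string starts with a genus and a specific epithet, and it would misread an author abbreviation such as "L. f." as a rank marker.

```python
# Rank markers that introduce an infraspecific epithet (an illustrative subset).
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f."}


def strip_authorship(name):
    """Reduce a botanical name string to genus, epithet, and any infraspecific part."""
    tokens = name.split()
    if len(tokens) < 2:
        return name.strip()
    bare = tokens[:2]  # assumed to be genus and specific epithet
    # Look for the first rank marker that is followed by another token.
    for i, tok in enumerate(tokens[2:-1], start=2):
        if tok in RANK_MARKERS:
            bare += [tok, tokens[i + 1]]
            break
    return " ".join(bare)


itis = "Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell"
usda = "Quercus agrifolia Née var. oxyadenia (Torr.) J.T. Howell"
print(strip_authorship(itis))  # Quercus agrifolia var. oxyadenia
print(strip_authorship(usda))  # Quercus agrifolia var. oxyadenia
```

Both forms reduce to the same author-free name, but the species-level authority (Née) is discarded along the way, which is the loss of information raised in the quoted message.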

###############################

From: charlie Lapham lapham@scrtc.com
Sent: Monday, June 05, 2006 4:38 PM
To: gbif@gbif.org
Subject: !ScientificName

Is the best practice to include the author in the scientific name or is it best not to include the
author in the name?

The !DwC documentation at calacademy.org disagrees with Arthur Chapman&#8217;s error checking
presentation at SPNHC.

Charles J. Lapham
------
---++Subject:	 Separating Spatial Data
Subject	 RE: Separating spatial data
 FROM	 "Mary Barkworth" <Mary@biology.usu.edu>
 DATE	 Sun, 2 Oct 2005 12:49:01 +0000 (UTC)
 ID	 <dhokvt$3ru$1@afroditi.gbif.org>

Thank you. As someone who is really longing to be able to look at co-occurrences of various critturs (seed dispersal critturs) and plants, I am very interested in the spatial data. What I also want to do, to the extent that it is feasible, is modify the specimen database (we use a modification of IK) so that we start capturing the information people are looking for now even if we have to modify its export for GBIF down the road. I know that there are some fields that we must add but our top priority has been taking care of a more urgent matter. But there is no way we want to go back through the specimens to add information if we can start putting it in now. Steve, because we have modified IK, we have not installed the latest upgrades for fear we would then lose our own changes. This may mean that we are not storing information that other users of IK are storing. Our biggest problem was revising it to allow for data entry from multiple sites. At least, I think that has been our biggest problem.

Russell: Please would you send Steve and Charlie a copy of IK as we use it - without data and unlocked? I tried sending one to Charlie this summer but it was locked so he could not get into it. You might add a note as to what you have changed and what you are working on.

Mary
-------
---+++ Re: Separating spatial data
Subject	 Re: Separating spatial data
 FROM	 "WIECZOREK, John R." tuco@berkeley.edu
 DATE	 Sun, 2 Oct 2005 00:11:23 +0000 (UTC)
 ID	 dhn8jb$nus$1@afroditi.gbif.org

Steven, Mary, and all,

There are a number of reasons for separating the geospatial components from the Darwin Core. Some of them Mary already guessed.

The basic idea is to create reusable schemas - schemas that can be used in more than one place, thereby promoting true standardization rather than re-creation based on a model. The intention is to create modules based on different classes of questions that one might want to ask of the underlying data. The curatorial extension to the Darwin Core (CuratorialExtension), for example, was an attempt to create a module for information of interest to curators and those interested in the physical specimens without clogging the Darwin Core with concepts of limited interest. In this early extension proposal, the proposed elements were conceptually related, but the scope of interest in the concepts was limited to one aspect of the broader discipline of biodiversity informatics - specimen curation.

The geospatial extension (GeospatialElement) also consists of a set of related concepts. A major difference between this extension and the curatorial one is the fact that the geospatial extension represents a set of concepts that is of nearly universal interest - the representation of place in a way that can be used analytically. Some of these concepts (Latitude and Longitude) aren't new - they've been in the Core since the Species Analyst days. The removal of the coordinate information from the core caused a stir, and ultimately caused the retraction of the Darwin Core 2 as a proposed standard at this time. This is largely due to new thinking about standards and schema construction, which may be at odds with the mixed purpose and design goals of the Darwin Core:

"The Darwin Core is a specification of data concepts and structure intended to support the retrieval and integration of primary data that documents the occurrence of organisms in space and time and the occurrence of organisms in biological collections." (from DesignAndPurpose)

The basic problem is whether the Darwin Core should be that one minimalist answer to our common interests (albeit incomplete for everyone) or the basis for building areas of common interest. The struggle is reflected in the desire on one hand to have everything important in the Core itself, and on the other hand to have everything built from solidly constructed building blocks that can be used wherever they may be needed.

What I've represented here is not definitive, it's just one line of current thinking about conceptual architecture and how best to build and maintain libraries of schemas that represent our concept standards. TDWG will soon assemble an Architecture Group to discover and recommend a best practice in this regard. One clear outcome of the latest TDWG meeting is the need not only for this recommendation, but also for clear and concise documentation for the use and reuse of conceptual schemas so that special interests can present their special data well and share their common data broadly.

A final note for Mary, then, is that all of this talk of architecture is likely to have less of an impact on those providing data than on those who figure out how. Actually, that is part of the point. If we get the architecture right, then the standards should be easier to share and maintain, and providers will have to mess around less to keep their systems running in perpetuity. At least that's where I like to ground myself in the grand scheme.

I hope that helps.

John

On Sat, 1 Oct 2005 23:15:11 +0200 (CEST)

sginzbar@biology.as.ua.edu wrote:

>> -----Original Message-----
>> From: Mary Barkworth [Mary@biology.usu.edu]
>> Sent: Saturday, October 01, 2005 11:38 AM
>> To: Ginzbarg, Steve
>> Subject: RE: [HERBARIA] new collaborative website
>>
>> What is the reason for
>> separating spatial data from record data? Concordance with
>> ABCD or visions of automating the process of georeferencing?
> At this point it's too technical for me to fully understand, but the idea is to write the
> !GeoSpatial schema in Geography Markup Language (GML), see
> http://en.wikipedia.org/wiki/Geography_Markup_Language, a standard used outside of !DwC and ABCD.
> A GML schema wraps geospatial properties in a geospatial object that could be served by a Web
> Feature Service (WFS), see http://en.wikipedia.org/wiki/Web_Feature_Service.
> If I understood what this meant I could use less jargon. Sorry.
>
>> If the latter, I trust the existing record - which may - and
>> only may - be more accurate - will not be affected.
>> Mary
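
To make the GML remark in the quoted message a little more concrete, here is a small sketch added for illustration only. It wraps a latitude/longitude pair in a GML point element using the Python standard library. It is not the !GeoSpatial extension schema itself; the function name =point_to_gml=, the =EPSG:4326= reference system label, and the latitude-then-longitude order are assumptions made for the example (the correct axis order depends on how the coordinate reference system is defined).

```python
import xml.etree.ElementTree as ET

# Namespace used by GML; registering it makes the output use the "gml:" prefix.
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def point_to_gml(latitude, longitude, srs_name="EPSG:4326"):
    """Serialize one coordinate pair as a gml:Point element (illustrative only)."""
    point = ET.Element(f"{{{GML_NS}}}Point", {"srsName": srs_name})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    # Assumed order: latitude first, then longitude.
    pos.text = f"{latitude} {longitude}"
    return ET.tostring(point, encoding="unicode")


print(point_to_gml(41.74, -111.81))
```

The point of such a wrapper is that the coordinates travel as one geospatial object that generic GML-aware tools can read, rather than as two unrelated text fields.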
------

------