Graphical design component

r3 - 21 Aug 2007 - 04:14:55 - BobMorrisYou are here: TWiki >  InvasiveSpecies Web > AgadirMeeting > TerminologyWorkingGroup > SampleDefinitions

Some Sample Definitions

(by Kevin Thiele)

Source       K.R. Thiele   CBD   Thomson 1991 (from GISP)   GISP
Term       Alien   Invasive   Introduced   Native   Adventive   Cryptogenic   Alien   Invasive   Invasive
Origin                                        
    Outside polygon of interest   Yes   Yes   Yes       Yes       Yes   Yes   Yes
    Inside polygon of interest               Yes                    
    Not relevant                                    
    Not known                       Yes            
Human agent in range change?                                        
    Yes   Yes   Yes           Yes               Yes
    No                                    
    Not relevant           Yes               Yes   Yes    
    Not known                       Yes            
    Not applicable           Yes                        
Impact                                        
    Has negative impact (at least sometimes)                                    
    Does not have negative impact                                    
    Not relevant   Yes   Yes   Yes   Yes   Yes   Yes   Yes   Yes   Yes
    Not known                                    
Establishment                                        
    Established       Yes       Yes       Yes       Yes   Yes
    Not established                                    
    Not relevant   Yes       Yes       Yes       Yes        
    Not known                                    
Geographic Range                                        
    Widespread and/or spreading       Yes                       Yes   Yes
    Not widespread, not spreading                   Yes                
    Not relevant   Yes       Yes   Yes       Yes   Yes        
    Not known                                    
Ecological Range                                        
    Restricted to human-modified habitats                                    
    Occurs in natural habitats       Yes                           Yes
    Not relevant   Yes       Yes   Yes   Yes   Yes   Yes   Yes    
    Not known                                    


Commentary

BobMorris wrote this commentary on Kevin's scheme in email to AnnieSimpson and KevinThiele.

Kevin's spreadsheet is a representation of the first part of an ontology, namely the terminology. It's enough for humans, but machines usually need a little more to make good use of the ontology in deciding whether two databases use of a term are comparable for various purposes. Nevertheless, specialists could and should start with it because it is well thought out, easy to understand, and at least as much as will ultimately be needed to resolve terminological differences between databases. Note that there are in fact four ontologies illustrated here: "K.R. Thiele", "CBD" , "Thomson 1991 (from GISP)" and "GISP". The point of Kevin's scheme is to make them comparable, as much as possible.

His spreadsheet expresses "concepts" (e.g. 'invasive', 'alien', etc.), by telling the values they have for a small set of properties ('Origin', 'Impact', ...). For each property, there are a small set of possible values to be used in these descriptions. Kevin's hope, which is probably very reasonable, is that with enough discussion, the discussants will agree that a small list of properties and values are expressive enough that most data providers can express the list of concepts coded in their own databases using just those properties and values the community agrees upon. Thus,rather than requiring the community of data providers (and consumers) to agree on a a single set of concepts, they instead agree on a set of terms that are used to express concepts. In turn, application writers can decide for themselves how to rationalize comparisons of those concepts. I might write an application that enshrines a notion like "If two concepts agree on the majority of their property-value pairs, the program will treat them as the same". You might write an application that is more conservative and enshrines a requirement that two concepts must agree on 80% of their property-value pairs before considering them the same concept. Depending on the ultimate representation of these property-value lists, more sophisticated machine reasoning is possible than just conclusions of the form "For my purposes I can assume that CBD 'invasive' is equivalent to Thiele 'alien'+'invasive'".

I understand that Kevin believes his property-value terms are a start, not an end, for discussion among the specialists and other interested parties, but regardless of this there are a few technical issues about how and where to represent such ontologies that the biologists shouldn't take a position about. In particular, there are arguments about whether the property-value terms belong in the schema or in some external document, and whether particular concept lists (the "definitions part of a database") belong in the data to be exchanged or in external documents. Those debates should not hinder specialists from discussing Kevin's notions and initially representing them in a spreadsheet as he has. The most important hidden technical issue is the extent to which proliferating terms also proliferates computation time unacceptably. What one desires is that if the terminology is increased N-fold, the time that applications need to compare data from different databases increases at most N-fold. Alas, it is easy in the ontology game for proliferation of terminology to become exponential in computation. For example, 10 times as many terms and concepts could lead not to 10 times as much computation, but 2 to the 10th power---1024 times as much computation. So there will always be a tension between adequacy of expression and small size---the two central disiderata of ontologies. There is a little bit of mathematical ontology theory that allows one to tell whether you are in this exponential situation, but I have the impression that for the few interesting ontologies coded in other ecology projects, the situation is, sadly, exponential and the advice is to try to limit terminology size if possible.

The good news about all this is that it is a well explored and very current topic (it is presently high on the radar of the TDWG Architecture Group (TAG), and also of the GBIF Data Access and Data Integration (DADI) group, which are debating various standards to adopt for ontology. The bad news is that ontology tools are very sophisticated, but not very suitable for other than those who make a living crafting ontologies. The challenge to the GISIN technical group will be to make a representation that is nimble enough to get somethng going in a short timeframe, yet limber enough to slip out and replace if TDWG decides on standards for ontologies. I'm pretty sure this is possible if we find that some large fraction of the target databases can get along with ontologies that are, oh, say, significantly smaller than the databases themselves, and significantly fewer in number than the number of databases!

In summary, my opinion is that initially, people should not be debating the concepts ("Invasive", "alien", ...) yet , but rather whether the primitive properties and values Kevin provides are powerful enough to express most of the concepts represented by the databases whose data would form part of GISIN.

An interesting document for that discussion might also be the Massachusetts Criteria for Evaluating Non-Native Plant Species for Invasiveness pointed out by JenniferForman?, who will shortly post some other things that comprise what amount to ontologies.

-- BobMorris - 15 Mar 2006

Edit | Attach | Printable | Backlinks: Web, All Webs | History: r3 < r2 < r1 | More topic actions
 
Back to TDWG Homepage TDWG Wiki > InvasiveSpecies
This site is powered by the TWiki collaboration platform

Valid XHTML 1.0 Transitional
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback