TDWG Phylogenetics Standards Workshop
The workshop took place on Oct 19, 2008, at the 2008 TDWG Annual Meeting in Fremantle (Perth), Australia.
Agenda
Workshop objectives, agenda, and initial participant list were published in advance and disseminated to all TDWG conference attendees 1 week prior to the event.
Workshop Notes
Attendees: 18 in total
Introductions
- Stan Blum: TDWG process
- Foster sharing of standards, data, and biodiversity informatics developments
- Process is fairly lightweight
- Put up (social) interfaces for other people to discover you and find out how to contribute
- Interest group is a group of people that have a shared set of problems that they would like to address.
- Key piece for a group to "declare itself" is a charter: what is it that the group aims to do.
- Then create Task Groups, each of which will also have a charter that says what the Task Group will do.
- Standards are essentially documents, specifications.
- Infrastructure in the form of mailing lists and wikis is provided.
- Ratification is initiated by the convener of the Task Group
- Submits the specification document to the Executive Committee
- Review manager is assigned and arranges review of the specification
- Public review follows after review recommendations are taken care of
- There is no voting process.
- Q: TDWG is reported to be very slow.
- The key is to keep pushing on the committees; communication is essential
- Having a process in place helps, compared with the previous annual pace
- Q: how is participating in TDWG better than just putting stuff up on SourceForge?
- There's a lot of experience within TDWG that relates to many of the data, types of data, and problems we would be dealing with.
- Can take advantage of the participants/experts in other groups.
- Interface and communication between biologists and information technologists, which is what TDWG is well set up for
- Q: what is a good (or a bad) charter?
- Charters and the process were instituted only 2 years ago. The experience and lessons learned from this are limited at this point.
- Charters ideally are updated at least once annually.
Session I
- David Kidd:
- proper representation of geographical areas
- position of observed and inferred nodes, branch paths
- vicariance-related metadata
- branches can be segmented, using different methods, representing for example time, or shortest distance (which in projections isn't necessarily a straight line)
- paleocontinental reconstruction methods and/or parameters and simulation metadata needed
- data from stratigraphy can have (dating) errors associated with it
- Pyramid of standards: space (geographic, GML), place (ecological, EML), time (stratigraphic), form (taxonomic): shouldn't a phylogenetic standard be in the middle of all of this?
- Rick Ree:
- where does X (extant, or ancestral taxon) live?
- where has it been found or collected?
- expert opinion (monographs, floras)
- describing the geographic range
- quantitative: lat/long, grid cell values
- qualitative: geopolitical units
- predictive: ecological niche, model?
- inspiration from use cases: historical biogeography, ancestral range estimation
- take advantage of analogy from standards and ontologies developed for characters?
- geographic range as an emergent trait of a taxon
- take standards that have already been developed (OGC geospatial standards) and look at them through the lens of phylogenetics
- TDWG work can consist of recommending certain ways to apply an external standard
- One of the first questions could be to determine whether we can exchange the biogeographical data and reconstructions (e.g. from DIVA and LAGRANGE) that we already have
- Greg Riccardi: Morphbank
- 230,000 images at present, several TBs of space
- Capture and track the data that support phylogenetic inference based on characters
- Morphbank objects (images, collections, etc) have external links: specimen, sequence
- Morphbank is being used by external tools, for example MX, as the underlying image store
- Character state annotation: "sort a bale of plants" metaphor for images
- Character definitions in tree data files are typically much too short and limited to be used for searching databases such as Morphbank
- Linking images to anatomy ontology terms: ideally have an outline of the part linked, not just the whole image or a pointer within the image
- Defining characters and states by using ontology terms: capturing, and linking from states
- would also enable the ability to infer the relatedness of characters
- Q: Can we have ontologies that are based on phylogeny? It is impossible otherwise to simply combine different morphological ontologies.
- Q: what is the role of ontologies for informatics standards? -> standardization of metadata (property) meaning
- There is no good way currently to link various annotations in various media about a digital or collection object
Session II
- Rutger Vos: NeXML format
- Chris Zmasek: PhyloXML, phylogenomics
- using phylogenies for functional inference
- Q: library support? -> Forester, BioPerl
- Hilmar Lapp: PhyloDB, BioSQL, PhyloWS
- Nico Cellinese: Phylogenetic nomenclature
- Define names not based on organismal traits but based on phylogenetic relationships
Breakout Groups:
The breakout groups were determined from a 45-minute group discussion and whiteboarding of suggestions, followed by self-assignment to groups.
- Phyloinformatic Web services (Bill P., Rutger, Cindy)
- data services: which data or metadata are needed from providers
- data demands that ask for portals (such as GBIF, EOL)
- crosstalk & provenance between providers
- scope recommendations, workflows, use cases
- Metadata standardization & Ontology (David, Chris, Peter, Aaron, Bill, James)
- metadata uses, properties, semantics
- reuse possibilities for other TDWG standards
- expressing the domain model independent of technology
- Deposition to repositories
- incentives and standards to increase deposition rates
- reporting requirements to enable repurposing
Report-out from the groups
Group 1) Bill P.
- divided tasks between "tree decoration" and "tree delivery"
- tree decoration:
- coevolution (food web, pollinator, host-parasite): for a given set of hosts, give me the tree of parasites
- computational: calculate divergence times or ages
- use the tree as input, get it out decorated with certain metadata
- tree delivery
- types are parameter query and computational query
- results in ID list
- given an ID, the desired view (what elements are to be returned) and format, return the object(s)
- ability to say, give me all trees that contain human, but only those that are about apes
- ability to give a scope of the desired trees
- ability to dump data, be alerted to updates (e.g., RSS feed)
- different levels of hierarchy of objects: analyses, matrices, trees
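The two-step "tree delivery" pattern reported by Group 1 (a query that results in an ID list, then retrieval of an object by ID with a chosen view and format) can be sketched roughly as follows. This is only an illustration against an in-memory store; the names `TreeRecord`, `STORE`, `find_trees`, and `get_tree`, the `scope` field, and the example trees are all assumptions made for this sketch and were not specified at the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical stored tree: ID, Newick string, taxa, and study scope."""
    tree_id: str
    newick: str
    taxa: frozenset
    scope: str  # what the tree is "about", e.g. "apes"


# Hypothetical in-memory store standing in for a tree provider.
STORE = {
    "T1": TreeRecord("T1", "((human,chimp),gorilla);",
                     frozenset({"human", "chimp", "gorilla"}), "apes"),
    "T2": TreeRecord("T2", "((human,mouse),chicken);",
                     frozenset({"human", "mouse", "chicken"}), "vertebrates"),
    "T3": TreeRecord("T3", "((chimp,bonobo),gorilla);",
                     frozenset({"chimp", "bonobo", "gorilla"}), "apes"),
}


def find_trees(contains=None, scope=None):
    """Parameter query: return only the IDs of matching trees."""
    ids = []
    for rec in STORE.values():
        if contains is not None and contains not in rec.taxa:
            continue
        if scope is not None and rec.scope != scope:
            continue
        ids.append(rec.tree_id)
    return sorted(ids)


def get_tree(tree_id, view="tree", fmt="newick"):
    """Given an ID, a desired view, and a format, return the object."""
    rec = STORE[tree_id]
    if view == "taxa":
        return sorted(rec.taxa)
    if view == "tree" and fmt == "newick":
        return rec.newick
    raise ValueError(f"unsupported view/format: {view}/{fmt}")


# "All trees that contain human, but only those that are about apes"
ape_trees_with_human = find_trees(contains="human", scope="apes")
first_tree = get_tree(ape_trees_with_human[0])
```

Separating the query (IDs only) from the retrieval (view and format) keeps the query response small and lets a client decide later which elements of each object it actually needs.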
Group 2+3) David
- attributes about what makes up a phylogeny, provenance, where data came from and what type, papers possibly associated, parameters used, when and where was it done
- branch length, support values, need multiple of these
- tree to tree relationships
- breaking branches into segments
- sets of nodes, within and between trees, and sets of trees, relationship between nodes (e.g., homologs, host-parasite)
- attributes of nodes (type of node, species or gene, taxon concept, area, more than one object per node)
- gene trees are a big application of trees
- talk to geospatial group to learn about their objects and standards
- talk to technical architecture group to learn more about ontologies
- started with an exercise that needs to be followed up with
- there seems to be a task group developing ontologies for this area
- formulating the elements and use cases would create momentum
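As a starting point for formulating the elements, the items listed by Group 2+3 (multiple branch lengths and support values, segmented branches, several objects per node, relationships between nodes in different trees) could be written down as a small technology-neutral data model. The sketch below is one possible reading of those notes, not an agreed model; all class and field names (`Segment`, `Node`, `NodeLink`, `segment_total`) and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One piece of a segmented branch, measured by a named method."""
    method: str    # e.g. "time" or "shortest_distance"
    length: float


@dataclass
class Node:
    """A tree node plus the branch that leads to it from its parent."""
    node_id: str
    node_type: str                                 # e.g. "species" or "gene"
    parent_id: Optional[str] = None
    lengths: dict = field(default_factory=dict)    # several branch lengths, keyed by unit
    supports: dict = field(default_factory=dict)   # several support values, keyed by kind
    segments: list = field(default_factory=list)   # branch broken into segments
    objects: list = field(default_factory=list)    # more than one object per node


@dataclass
class NodeLink:
    """Relationship between two nodes, possibly in different trees."""
    source: tuple    # (tree_id, node_id)
    target: tuple    # (tree_id, node_id)
    relation: str    # e.g. "homolog" or "host-parasite"


def segment_total(node, method):
    """Sum the lengths of a node's branch segments for one method."""
    return sum(s.length for s in node.segments if s.method == method)


# Hypothetical example values, for illustration only.
host = Node(
    node_id="n1",
    node_type="species",
    lengths={"substitutions": 0.12, "time_myr": 6.5},
    supports={"bootstrap": 98, "posterior": 1.0},
    segments=[Segment("time", 2.5), Segment("time", 4.0)],
    objects=["taxon-concept-A", "area-X"],
)
parasite = Node(node_id="p1", node_type="species")
link = NodeLink(("host_tree", host.node_id),
                ("parasite_tree", parasite.node_id),
                "host-parasite")
```

Keying lengths and support values by name, rather than giving each branch a single number, is one simple way to meet the "need multiple of these" requirement without committing to any particular file format.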
Interest Group charter
- Keep it simple
- encouragement to list core members
- implementation is the proof, further those efforts
- intersection, don't over-think, what are the core elements
--
HilmarLapp - 31 Oct 2008