- Bulk access to occurrence records, including access to sensitive records not visible to the public but required for species distribution modelling purposes.
- New data quality control methods, and error reporting
- Ingestion of feedback and additional locality information
Bulk access
We have developed services that allow downloading of occurrence records:
Both of these services give bulk access. The downloadfromDB service returns a wider range of metadata associated with each record, whereas downloadFromIndex returns a subset of the data that is typically of use to researchers and anyone modelling species distributions (including scientific name, latitude and longitude).
The latter was developed to support Edgar's need for faster bulk downloads. To support the Edgar project's need to maintain a separate local cache of the data, we have also developed a service so that deleted records can be tracked:
When records are ingested by the Atlas, a UUID is issued against the record, and the Atlas keeps track of the properties within that record that make it unique. This is typically an ID of some sort, but it may be a combination of lat/long, date and species name, for example.
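To make the idea concrete, here is a minimal sketch of how a stable UUID could be tied to a record's unique key, and how a local cache like Edgar's could drop records reported as deleted. This is not the Atlas's actual implementation: the names `UuidRegistry`, `composite_key` and `prune_deleted` are hypothetical, and the coordinate rounding and name normalisation are assumptions made for illustration.

```python
import uuid
from typing import Dict, Iterable, Tuple

# Hypothetical names throughout; this is an illustration, not the Atlas's code.
UniqueKey = Tuple[str, ...]


class UuidRegistry:
    """Issues one stable UUID per unique key."""

    def __init__(self) -> None:
        self._by_key: Dict[UniqueKey, str] = {}

    def uuid_for(self, key: UniqueKey) -> str:
        # Reuse the UUID if this key has been seen before, so that
        # re-ingesting the same record does not give it a new identity.
        if key not in self._by_key:
            self._by_key[key] = str(uuid.uuid4())
        return self._by_key[key]


def composite_key(lat: float, lon: float, date: str, species: str) -> UniqueKey:
    """Build a key from lat/long, date and species name.

    Assumption: coordinates are rounded to 6 decimal places and the name is
    trimmed and lower-cased so trivially different spellings share a key.
    """
    return (f"{lat:.6f}", f"{lon:.6f}", date, species.strip().lower())


def prune_deleted(cache: Dict[str, dict], deleted_uuids: Iterable[str]) -> int:
    """Remove deleted records from a local cache keyed by UUID.

    Returns the number of records actually removed.
    """
    removed = 0
    for record_uuid in deleted_uuids:
        if cache.pop(record_uuid, None) is not None:
            removed += 1
    return removed
```

A record that arrives with its own ID would use a one-element key such as `("some-id",)`, while a record without one would fall back to `composite_key(...)`. In both cases the registry hands back the same UUID on every re-ingestion.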
Lists of species can be retrieved from services listed here:
The service http://bie.ala.org.au/search.json is currently in use by Edgar to retrieve a list of species which then drives the data harvesting. Edgar is also making use of services to retrieve LSIDs, the identifier the ALA is using for a taxon.
Data quality
As part of the AP30 project, the ALA has developed some data quality methods, and exposed the outputs to aid the modelling work in Edgar.
Here's an example record in the ALA:
http://biocache.ala.org.au/occurrence/b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09
The details for this record are here (in JSON format):
http://biocache.ala.org.au/ws/outlier/record/b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09
http://biocache.ala.org.au/ws/outlierInfo/urn:lsid:biodiversity.org.au:afd.taxon:0c139726-2add-4abe-a714-df67b1d4b814.json
The tests use 5 environmental layers that have been chosen for their suitability for testing with this algorithm. There is a write-up of this work available here.
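For anyone scripting against these services, the two JSON URLs above follow a simple pattern: one is keyed by the record UUID and the other by the taxon LSID. The sketch below builds those URLs and fetches the JSON. The function names are hypothetical, the URL patterns are inferred only from the two examples above, and it assumes the services are still reachable at these addresses. The structure of the returned JSON is not described here, so the sketch does not try to interpret it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address taken from the example URLs above.
BIOCACHE_WS = "http://biocache.ala.org.au/ws"


def outlier_record_url(record_uuid: str) -> str:
    """URL of the outlier details for a single occurrence record."""
    return f"{BIOCACHE_WS}/outlier/record/{quote(record_uuid, safe='')}"


def outlier_info_url(taxon_lsid: str) -> str:
    """URL of the outlier information for a taxon, identified by its LSID."""
    # Colons are left unescaped, matching the example URL above.
    return f"{BIOCACHE_WS}/outlierInfo/{quote(taxon_lsid, safe=':')}.json"


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode the body as JSON (no interpretation of fields)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(outlier_record_url("b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09"))` would request the same document as the first link above.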
Is there an official definition of 'occurrence data'? - not readily seeing it in the wikipedia :)
Huh, 'reverse jackknife' that's interesting :) thanks for sharing :)
Spatial outliers are really interesting, thanks for the link to the notes. Would be nice to see a shorter summary post explaining the subjectivity of outliers and how the dev and scientists are making this subjective algorithm work.
Cheers David.
"Occurrence" is a widely used term in biodiversity informatics that covers specimen collections and observations of species. A record will typically consist of (at least) 4 components:
1) What - a species or subspecies, or a higher level taxon if identification to species level isn't possible
2) Where - typically geographical coordinates, but could be a locality description
3) When - a single date or date range
4) Whom - who recorded the observation or collected the specimen.
I agree on the subjectivity of the outliers. Essentially the approach here is to allow researchers to filter out records marked by the ALA as outliers if they want, but not to prevent access to any of the data. Hence researchers can use our outlier annotations on the records if they think these meet their needs and/or they agree with the techniques used.