- Bulk access to occurrence records, including access to sensitive records not visible to the public but required for species distribution modelling purposes.
- New data quality control methods and error reporting
- Ingestion of feedback and additional locality information
Bulk access
We have developed services that allow downloading of occurrence records:
Both of these services give bulk access. The downloadFromDB service gives a wider range of metadata associated with each record, whereas downloadFromIndex gives the subset of the data that is typically of use to researchers and anyone modelling species distributions (including scientific name, latitude and longitude).
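As a rough illustration, the sketch below shows how a client might request a bulk download from the index-based service. The exact endpoint path (occurrences/index/download) and parameter names (q, fields, reasonTypeId) are assumptions here and should be checked against the current biocache web service documentation.

```python
import requests

# Assumed base URL and endpoint for the index-based bulk download service;
# confirm the exact path and parameters against the biocache-ws documentation.
BIOCACHE_WS = "http://biocache.ala.org.au/ws"

def download_occurrences(scientific_name, out_path="occurrences.zip"):
    """Request a bulk download of occurrence records for one species."""
    params = {
        "q": 'taxon_name:"%s"' % scientific_name,                      # query term (assumed field name)
        "fields": "scientificName,decimalLatitude,decimalLongitude",   # subset useful for modelling
        "reasonTypeId": 4,                                              # download reason code (assumed)
    }
    resp = requests.get(BIOCACHE_WS + "/occurrences/index/download",
                        params=params, stream=True, timeout=300)
    resp.raise_for_status()
    with open(out_path, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=65536):
            fh.write(chunk)
    return out_path

if __name__ == "__main__":
    download_occurrences("Litoria caerulea")
```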
The downloadFromIndex service was developed to support Edgar's need for faster bulk downloads. To support the Edgar project's need to maintain a separate local cache of the data, we have also developed a service so that deleted records can be tracked:
When records are ingested by the Atlas a UUID is issued against the record, and the Atlas keeps track of the properties within that record that make it unique. This is typically an identifier of some sort, but it may be, for example, a combination of lat/long, date and species name.
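A minimal sketch of how a local cache keyed by these UUIDs might be kept in step with the Atlas is shown below. It assumes a deleted-records endpoint of the form /occurrence/deleted?date=YYYY-MM-DD that returns a JSON list of UUIDs; both the path and the response shape are assumptions and should be verified against the live service.

```python
import sqlite3
import requests

BIOCACHE_WS = "http://biocache.ala.org.au/ws"

def sync_deletions(db_path, since_date):
    """Remove records from a local cache that the Atlas has deleted since `since_date`.

    The endpoint path and its response format (a JSON list of UUIDs) are
    assumptions for illustration only.
    """
    resp = requests.get(BIOCACHE_WS + "/occurrence/deleted",
                        params={"date": since_date}, timeout=60)
    resp.raise_for_status()
    deleted_uuids = resp.json()  # assumed shape: ["uuid-1", "uuid-2", ...]

    conn = sqlite3.connect(db_path)
    with conn:  # one transaction for all deletions
        conn.executemany(
            "DELETE FROM occurrences WHERE uuid = ?",
            [(u,) for u in deleted_uuids],
        )
    conn.close()
    return len(deleted_uuids)
```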
Lists of species can be retrieved from services listed here:
The service http://bie.ala.org.au/search.json is currently in use by Edgar to retrieve a list of species, which then drives the data harvesting. Edgar also makes use of services to retrieve LSIDs, the identifier the ALA uses for a taxon.
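The sketch below shows one way a client might use search.json to look up the LSID for a scientific name. The `q` parameter and the searchResults/results/guid structure of the response are assumptions based on typical usage of this service, not a documented contract.

```python
import requests

BIE_SEARCH = "http://bie.ala.org.au/search.json"

def find_lsid(scientific_name):
    """Look up the ALA LSID (taxon identifier) for a scientific name.

    The query parameter name and the response structure used here are
    assumptions; verify them against the live BIE service.
    """
    resp = requests.get(BIE_SEARCH, params={"q": scientific_name}, timeout=30)
    resp.raise_for_status()
    results = resp.json().get("searchResults", {}).get("results", [])
    for match in results:
        if match.get("name", "").lower() == scientific_name.lower():
            return match.get("guid")  # the LSID, e.g. urn:lsid:biodiversity.org.au:...
    return None

print(find_lsid("Litoria caerulea"))
```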
Data quality
As part of the AP30 project, the ALA has developed new data quality methods and exposed the outputs to aid the modelling work in Edgar.
Here's an example record in the ALA:
http://biocache.ala.org.au/occurrence/b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09
The details for this record are here (in JSON format):
http://biocache.ala.org.au/ws/outlier/record/b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09
http://biocache.ala.org.au/ws/outlierInfo/urn:lsid:biodiversity.org.au:afd.taxon:0c139726-2add-4abe-a714-df67b1d4b814.json
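As a quick illustration, the sketch below fetches the outlier-test details for the example record using the /outlier/record/{uuid} service shown above. The structure of the returned JSON is not assumed here; the raw document is simply returned for inspection.

```python
import requests

BIOCACHE_WS = "http://biocache.ala.org.au/ws"

def outlier_details(record_uuid):
    """Fetch the outlier-test details for a single occurrence record."""
    resp = requests.get(BIOCACHE_WS + "/outlier/record/" + record_uuid, timeout=30)
    resp.raise_for_status()
    return resp.json()

details = outlier_details("b07bbac2-22d7-4c8a-8d61-4be1ab9e0d09")
print(details)
```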
The tests use five environmental layers that have been chosen for their suitability for use with this algorithm. There is a write-up of this work available here.