ANDS funded collaboration between the Atlas of Living Australia and James Cook University

Wednesday, August 22, 2012

KEY TECHNOLOGIES and FEATURES

The stack we are putting together to support the AP30 requirements includes:

- Apache Cassandra database. The database will house the full record details and will store the results of duplicate and outlier detection. It will also persist the record vettings provided by Edgar.
- Apache SOLR search indexes. These indexes will support the searching capabilities required by Edgar.
- A processing chain implemented in Scala. This will include the algorithms for detecting duplicate records and environmental outliers. This custom code will then update the search indexes so that Edgar can filter out duplicates and outliers (a query sketch follows this list), improving the quality of the model and reducing the number of records to be vetted. The code for this component is available in the Google Code repository: http://code.google.com/p/ala-portal/
- Java Spring MVC web services. These web services will provide the interface for the Edgar project to download snapshots of data for modelling and vetting purposes. They will also provide a write interface for expert users to submit vettings of bird records. An additional important functional requirement is for Edgar to be able to query for record deletions, which occur periodically when the ALA harvests from data providers; services will be developed to allow Edgar to keep track of these deletions. The code for this component is also available at http://code.google.com/p/ala-portal/
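As an illustration of how Edgar might use those index flags, here is a minimal Scala sketch that queries a SOLR-backed occurrence search and filters out flagged records. The endpoint path and the duplicate/outlier field names are assumptions for illustration, not the real index schema:

    import java.net.URLEncoder
    import scala.io.Source

    object FilteredSnapshot {
      def main(args: Array[String]): Unit = {
        // Assumed search endpoint; only the vetting-service URL is confirmed elsewhere.
        val base = "http://biocache.ala.org.au/ws/occurrences/search"
        val params = Seq(
          "q"  -> "Dacelo novaeguineae",
          "fq" -> "duplicate:false", // assumed flag written by the duplicate detector
          "fq" -> "outlier:false"    // assumed flag written by the outlier detector
        )
        val qs = params
          .map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }
          .mkString("&")
        // Print the raw JSON search response.
        println(Source.fromURL(s"$base?$qs").mkString)
      }
    }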
Labels: andsApps, andsArchitecture, andsFeatures, andsFunctions, andsTechnology, andsTools, ap30, Cassandra, DIISRTE, Edgar, fundedByAustralianNationalDataService, GoogleCode, JCU, JSON, Scala, SOLR
Tuesday, July 31, 2012
PROJECT OUTPUTS #3 - VETTING SERVICES
The ALA has a web service that accepts a POST request with a JSON body containing the information to support a vetting.

The URL for the service is:
http://biocache.ala.org.au/ws/assertions/query/add

Example JSON for the POST body:
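A plausible body might look like the following, where every field name is an assumption for illustration rather than the service's confirmed schema:

    {
      "apiKey": "YOUR-API-KEY",
      "speciesName": "Dacelo novaeguineae",
      "area": "POLYGON((144.0 -38.0,146.0 -38.0,146.0 -36.0,144.0 -36.0,144.0 -38.0))",
      "status": "invalid",
      "comment": "Outside the known range of this species",
      "userName": "Example Vetter",
      "userEmail": "vetter@example.org"
    }

The species name and WKT area correspond to the two validation checks described below; the apiKey is checked first.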
Validating the supplied information

When the JSON body is invalid, an HTTP Bad Request (400) will be returned.

When an invalid apiKey, or no apiKey, is provided, an HTTP Forbidden (403) will be returned.

Otherwise the supplied information will be validated in two ways: first to ensure that the species name exists in the current system, and then to ensure that the area is in valid WKT format. If either of these checks fails, an HTTP Bad Request (400) is returned with a message indicating the issue.
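A minimal Scala sketch of a client that submits a vetting and distinguishes these error cases; the endpoint and status codes come from the description above, while the JSON field names are the same illustrative assumptions used earlier:

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets

    object SubmitVetting {
      def main(args: Array[String]): Unit = {
        // Field names are illustrative assumptions, not the confirmed schema.
        val body =
          """{"apiKey":"YOUR-API-KEY",
            |"speciesName":"Dacelo novaeguineae",
            |"area":"POLYGON((144.0 -38.0,146.0 -38.0,146.0 -36.0,144.0 -36.0,144.0 -38.0))",
            |"status":"invalid",
            |"comment":"Outside the known range of this species"}""".stripMargin

        val conn = new URL("http://biocache.ala.org.au/ws/assertions/query/add")
          .openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        val out = conn.getOutputStream
        out.write(body.getBytes(StandardCharsets.UTF_8))
        out.close()

        // The two error cases documented above, plus a catch-all.
        conn.getResponseCode match {
          case 400  => println("Bad Request: malformed JSON, unknown species name, or invalid WKT")
          case 403  => println("Forbidden: missing or invalid apiKey")
          case code => println(s"HTTP $code")
        }
        conn.disconnect()
      }
    }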
Insert/Update

When inserting a new vetting, a first loaded date is populated. This date is never updated. The purpose of this date is to provide a context for “historic” vettings. In the future the ALA may provide additional QAs around records that appear in a “historic” region after the vetting's first loaded date.
Each vetting that is posted to the web service will be stored in the database in raw JSON. Other fields populated include a last modified date and a query. The query will be constructed using the species name and the WKT for the area. This query will be used to identify the records that are considered part of the vetting.
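A sketch of how such a query might be assembled from a stored vetting; the q and wkt parameter names and the taxon_name field are assumptions for illustration:

    import java.net.URLEncoder

    object VettingQuery {
      // Builds a search query from a vetting's species name and WKT area.
      def apply(speciesName: String, areaWkt: String): String = {
        def enc(s: String) = URLEncoder.encode(s, "UTF-8")
        s"q=${enc("taxon_name:\"" + speciesName + "\"")}&wkt=${enc(areaWkt)}"
      }

      def main(args: Array[String]): Unit =
        println(VettingQuery("Dacelo novaeguineae",
          "POLYGON((144.0 -38.0,146.0 -38.0,146.0 -36.0,144.0 -36.0,144.0 -38.0))"))
    }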
Deleting

When a “delete” is issued against an existing vetting, it is marked as deleted in the database. It is not physically deleted until the action has filtered through to the ALA data. A “deleted” assertion can NOT be resurrected.

Applying Vettings to ALA Data
The exact process that will be used to apply the vettings to the ALA data is not yet known. It will be a batch process, run nightly, that updates records in bulk based on the queries that were generated for each vetting.

New or updated vettings that have not yet been pushed through to the ALA data will be applied to all records that satisfy the query. Old vettings will be applied only to records that have been inserted or modified since the previous batch process.
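In outline, the nightly batch could behave like the following Scala sketch, where every type and function name is an assumption about the planned process rather than the ALA implementation:

    object NightlyBatch {
      // All names here are illustrative assumptions about the planned batch.
      case class Vetting(id: String, query: String, pushedThrough: Boolean)

      def run(vettings: Seq[Vetting],
              previousRun: java.util.Date,
              findAll: String => Seq[String],
              findModifiedSince: (String, java.util.Date) => Seq[String],
              applyVetting: (Vetting, String) => Unit): Unit =
        vettings.foreach { v =>
          val targets =
            if (!v.pushedThrough) findAll(v.query) // new/updated: every matching record
            else findModifiedSince(v.query, previousRun) // old: only records changed since the last run
          targets.foreach(record => applyVetting(v, record))
        }
    }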