Zaffar Sadiq Mohamed-Ghouse

Modelling Spatial Variation of Data Quality in Databases

University
University of Melbourne
Supervisor (Academic)
Dr Matt Duckham, University of Melbourne
Supervisor (Industry)
Geoff Lawford, Geoscience Australia
Employment
Business Development, Research and International Relations, CRC for Spatial Information
Thesis Abstract

The spatial data community relies on the quality of its data. This research investigates new ways of storing and retrieving spatial data quality (SDQ) information in databases. Given the importance of features and of sub-feature variation, three data quality models of spatial variation in quality have been identified and defined: per-feature, feature-independent and feature-hybrid. In the per-feature model, quality information is stored against each feature. In the feature-independent model, quality information is stored independently of the features. The feature-hybrid model is derived from a combination of the other two. Each model differs in its representational and querying capabilities, and no single model is superior in all respects for storing and retrieving spatially varying quality. Hence, an integrated data model called RDBMS for Spatial Variation in Quality (RSVQ) was developed by integrating the per-feature, feature-independent and feature-hybrid data quality models. The RSVQ data model provides a flexible representation of SDQ, which can be stored alongside individual features or parts of features in the database, or as an independent spatial data layer. The thesis reports on how the Oracle 10g spatial RDBMS was used to implement this model. An investigation into different querying mechanisms resulted in the development of a new WITHQUALITY keyword as an extension to SQL. The WITHQUALITY keyword is designed to perform automatic query optimization, which leads to faster retrieval of quality information than existing query mechanisms. A user interface built using Oracle Forms 10g enables the user to perform single and multiple queries and to convert between models (for example, per-feature to feature-independent). The evaluation, which includes an industry case study, shows how these techniques can improve the spatial data community’s ability to represent and record data quality information.
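
The difference between the three models of spatial variation can be illustrated with a small sketch. This is not the RSVQ implementation or the thesis's Oracle schema; it is a minimal Python illustration under stated assumptions. Axis-aligned boxes stand in for real geometries, `accuracy_m` is a hypothetical quality attribute, and the fallback rule used for the feature-hybrid case (use a quality region where one covers the point, otherwise the feature's own value) is one plausible reading of "a combination of the other two models", not a rule taken from the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: an axis-aligned box (xmin, ymin, xmax, ymax) stands in for geometry.
Box = Tuple[float, float, float, float]


def contains(box: Box, x: float, y: float) -> bool:
    """Return True if the point (x, y) lies inside or on the edge of the box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


@dataclass
class Feature:
    name: str
    extent: Box
    accuracy_m: Optional[float] = None  # hypothetical per-feature quality value


@dataclass
class QualityRegion:
    extent: Box
    accuracy_m: float  # hypothetical quality value for this region


def per_feature_quality(feature: Feature, x: float, y: float) -> Optional[float]:
    """Per-feature: a single quality value stored against the whole feature."""
    return feature.accuracy_m if contains(feature.extent, x, y) else None


def feature_independent_quality(
    layer: List[QualityRegion], x: float, y: float
) -> Optional[float]:
    """Feature-independent: quality is looked up by location in a separate layer."""
    for region in layer:
        if contains(region.extent, x, y):
            return region.accuracy_m
    return None


def hybrid_quality(
    feature: Feature, layer: List[QualityRegion], x: float, y: float
) -> Optional[float]:
    """Feature-hybrid (assumed rule): prefer the quality layer inside the
    feature, and fall back to the feature's own value where no region applies."""
    if not contains(feature.extent, x, y):
        return None
    local = feature_independent_quality(layer, x, y)
    return local if local is not None else feature.accuracy_m


road = Feature("road", (0.0, 0.0, 10.0, 1.0), accuracy_m=5.0)
quality_layer = [QualityRegion((0.0, 0.0, 4.0, 1.0), accuracy_m=2.0)]
```

With this toy data, a point on the western part of the road returns the feature's single value under the per-feature model but the region's value under the other two, which is the kind of sub-feature variation the per-feature model alone cannot represent.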