Anna Boin

Exposing Uncertainty: Communicating Spatial Data Quality via the Internet

University
University of Melbourne
Supervisor (Academic)
Dr Gary Hunter, University of Melbourne et al
Supervisor (Industry)
Duncan Brooks & Susan Brown, Dept of Environment and Primary Industries
Projects
mysite
Employment
Business Analyst at Geomatic Technologies, a 43pl member
Thesis Abstract

After almost 30 years of theorizing about spatial data quality, there has been very little real-world empirical research conducted into how consumers actually determine whether or not data is suitable for them. Yet spatial databases are now accessible to members of the general public who have little formal training in the related quality issues. The theorizing has led to a data quality component in metadata standards and various studies have investigated ways to visualize these quality metrics. There is little evidence, however, indicating that the visualizations successfully communicate uncertainty. As a result, the creation and maintenance of metadata statements require the time and resources of data providers when they may not even benefit consumers of the data.

Given the shortage of practically-derived variables for experimenting with consumers’ perspectives, this research has employed qualitative, exploratory methods that are consistent with user-centred design (UCD). Furthermore, fitness for use was embraced as a subjective phenomenon because it is judged ‘as seen by the user’. The research design therefore consisted of an understanding stage followed by a verification stage.

The understanding stage investigated spatial data consumer goals, actions, perceptions, and terminology using (1) feedback emails, and (2) semi-structured interviews with consumers of varying backgrounds and contrasting uses for data. This established that the consumers had two major goals: to determine the data content, then to use the data. Perceptions of quality were thus a by-product of these overt goals and occurred as a result of using the data or having contact with people who had used the data. Other aspects that would affect whether a dataset was ‘suitable’ or ‘good enough’ were the perceived authoritativeness of the data provider, and the window to the dataset, namely, whether the information interaction was such that the dataset was described in understandable language, able to be found, and accessible in a timely manner. In this way previous use of the data affected future perceptions of fitness for use because consumers used established reputations as one way to determine fitness for use.

The verification stage validated the findings so far both practically and theoretically. The practical component consisted of the creation of a prototype that aimed to bring aspects that helped consumers determine quality outside the Internet environment into the Internet environment as part of obtaining a dataset. The creation process included consultation with a data provider to ensure the information was relatively easy to generate. The prototype was then reviewed by several prospective consumers from various backgrounds. Overall, the aspects of the prototype that aimed to manage the consumers’ expectations, as part of describing the data content, yielded the most positive results. In this way, quality was portrayed as part of a quick, three-sentence description of data. On the other hand, there was a strong lack of interest in the illustrated, single-screen page describing various manifestations of error and accuracies, even though the link to this page had prime position as part of the three-sentence description. In fact the consumers who had repeatedly used spatial data before, and discovered errors, explained that (apart from basic indicators of positional accuracy, namely resolution, scale, or contour interval) they had no interest in learning about error and accuracy from the data provider; they would rather work it out for themselves.

From a theoretical perspective, these attitudes were then matched with some explanatory theories from outside spatial data research. The first theory, sensemaking, was an alternative to rational decision making theory. It suggested that rather than making a decision (about suitability of data for instance) at one point in time, sense is made as part of ongoing experiences that are affected by perceived cues and by taking action. In short, fitness for use is discovered through use. The second major theory was about trust in e-commerce and asserted that trust of Internet information is separate to the information itself. Instead, trust is established through the identity (or reputation) of the data provider and aspects of personal experience, including the presence of an online community of consumers.

Consequently, this research suggests that the way to more effectively communicate spatial data quality over the Internet is to concisely express data quality as part of the definition of the data content. If more nuances of quality need to be reported then either: let communities of users do so (and examples are included where this was already occurring); or communicate quality implicitly as part of use, that is, make the data of the features themselves appear inherently inaccurate. Overall, any quality information needs to be concise and en route to data users’ goals because, with so much information available today, there is a severe shortage of attention for it.