Elizabeth-Kate Gulland

Spatial Semantic Search with Online Agents and Weighted Ontologies

WA Curtin E K Gulland
Curtin University
Supervisor (Academic)
Em Prof Geoff West & Dr Simon Moncrieff, Curtin University
Data analyst at RAC
Thesis Abstract

Information retrieval (IR) is a process of matching an information need to the contents of one or more data resources. This basic concept holds true for all types of search, including Web and spatial searches. Difficulties arise at two points: determining the true information need from a user request, and matching this need to data content.

These issues arise partly because both user and data source have contexts that affect how they interpret information. The goal of this thesis is to improve the relevance of online search results, particularly for retrieval of online spatial datasets, by automatically extracting and comparing user and data contexts.

Two aspects of context are of particular interest within this research: terminology and location. Their impact is particularly noticeable in text queries, such as those made familiar by search engines including Google, Bing and Yahoo. Simple text-entry interfaces are easier to learn and use than more complex forms that allow users to explicitly specify context about a query, but they leave that context implicit.

The flexibility of natural human languages makes textual mismatches between intent and results a common problem. For example, consider the paired terms 'vaccinate'/'inoculate', which refer to the same concept, and 'Uluru'/'Ayers Rock', which refer to the same place. A key term search for "inoculated at Ayers Rock" would not find data described as "vaccinated at Uluru".
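
The mismatch can be illustrated with a minimal Python sketch. The synonym table and both helper functions are hypothetical, written by hand for this example only; they are not taken from the thesis, which aims to derive such links automatically.

```python
# Hypothetical, hand-written synonym table used only for illustration.
SYNONYMS = {
    "inoculated": {"vaccinated"},
    "ayers rock": {"uluru"},
}


def exact_match(query_terms, description):
    """True only if every query term appears literally in the description."""
    text = description.lower()
    return all(term in text for term in query_terms)


def expanded_match(query_terms, description):
    """True if every query term, or one of its listed synonyms, appears."""
    text = description.lower()
    return all(
        any(variant in text for variant in {term} | SYNONYMS.get(term, set()))
        for term in query_terms
    )


query = ["inoculated", "ayers rock"]
record = "Vaccinated at Uluru"

print(exact_match(query, record))     # False: no literal term overlap
print(expanded_match(query, record))  # True: both terms match via synonyms
```

A fixed synonym table like this one treats every link as equally strong and ignores context, which is the limitation that weighted, contextualised ontologies are intended to address.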

Existing methods applied to the search synonym problem include natural language processing (NLP) techniques and Semantic Web ontologies. NLP can be used to compare text document contents without relying upon exact term matches, and ontologies record meaning as networks of linked concepts, such as terms.

This research aims to semantically index datasets during a search, by using local textual and spatial context in addition to crowdsourced and other public natural language resources to estimate dataset relevance to an information request. For this purpose, methods were explored for the automatic production of contextualised weighted ontologies of terms and spatial proximity measures. A framework for local software search agents with public search interfaces was also explored.

This research contributes to contextual search in four ways: 1) developing automated methods to produce a weighted ontology of terms from local content and crowdsourced text resources, 2) adapting queries to rank text matches by using a weighted ontology of terms, 3) developing automated spatial proximity measures to compare user and data locations, and 4) encapsulating data content and context within online search agents so that they can respond with ranked results to queries from sources with no prior knowledge of the data.
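
As a rough illustration of contributions 2 and 3, the sketch below ranks records by combining a weighted term match with a spatial proximity score. Everything in it is an assumption made for the example: the ontology weights, the exponential distance decay with its `scale_km` parameter, the equal-weight linear combination, and the sample records and coordinates. The abstract does not specify the measures actually used in the thesis.

```python
import math

# Hypothetical weighted ontology: term -> {related term: weight in [0, 1]}.
ONTOLOGY = {
    "inoculated": {"vaccinated": 0.9, "immunised": 0.8},
}


def term_score(term, text):
    """Best weight with which `term` or a related term occurs in `text`.

    Single-word terms only; multi-word place names are not handled here.
    """
    words = text.lower().split()
    if term in words:
        return 1.0
    related = ONTOLOGY.get(term, {})
    return max((w for t, w in related.items() if t in words), default=0.0)


def text_score(query_terms, text):
    """Mean of the per-term scores."""
    return sum(term_score(t, text) for t in query_terms) / len(query_terms)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def proximity_score(user_loc, data_loc, scale_km=100.0):
    """Assumed decay: 1.0 at zero distance, approaching 0 as distance grows."""
    return math.exp(-haversine_km(*user_loc, *data_loc) / scale_km)


def relevance(query_terms, user_loc, record, text_weight=0.5):
    """Assumed linear combination of text and spatial scores."""
    return (text_weight * text_score(query_terms, record["text"])
            + (1 - text_weight) * proximity_score(user_loc, record["loc"]))


# Illustrative records with approximate coordinates (lat, lon).
records = [
    {"text": "Children vaccinated at clinic", "loc": (-25.34, 131.04)},
    {"text": "Children inoculated at clinic", "loc": (-31.95, 115.86)},
    {"text": "Road maintenance schedule", "loc": (-25.34, 131.04)},
]
user_loc = (-25.35, 131.03)

ranked = sorted(records,
                key=lambda r: relevance(["inoculated"], user_loc, r),
                reverse=True)
for r in ranked:
    print(round(relevance(["inoculated"], user_loc, r), 3), r["text"])
```

With these assumed weights, a nearby record that matches only through a related term outranks a distant record with an exact term match, which is the kind of trade-off a combined textual and spatial ranking has to make explicit.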

The overall goal is to automatically produce a confidence level in the relevance of a data source and its records to a user query, so that multiple results can be ranked and compared in response to a request for information.