"Expanding the Role of Metadata in the Integration and Analysis of Multi-dimensional Data"

Gilberto Z. Pastorello, Post-Doctoral Fellow, University of Alberta, Canada

December 10th (Monday), 11:00am
Harold Frank Hall (HFH) 1132

Data discovery and integration rely on adequate metadata availability and completeness. However, creating and maintaining metadata is time-consuming and is often poorly addressed or skipped altogether. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Plant eco-physiology monitoring using in-situ and remote optical sensing is an example of such a domain. Data in this area are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized.

Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations; field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle); data processing choices (e.g., methods for gap filling, filtering and aggregation of data); quality assurance and checking details; and documentation of sources, ownership and licensing of data. Each of these aspects is important as metadata for search and discovery, but they can also be used in a more active fashion.

If each of these aspects is also understood as an extra dimension, it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include zeroing in on data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., using different calibrations for data collected under different sky conditions). More interesting scenarios involve navigation and visualization of data sets based on these “extra” dimensions, as well as partitioning of data sets to highlight the relevant parts of the data to be made available for exchange.
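The two simple examples above can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the `Reading` record, the `protocol` and `sky` fields, the protocol names, and the calibration gains are all hypothetical, chosen only to show metadata being used to select and process data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    wavelength_nm: float
    reflectance: float
    protocol: str  # hypothetical metadata: field sampling protocol
    sky: str       # hypothetical metadata: sky condition at acquisition


# Hypothetical calibration gains, one per sky condition.
CALIBRATION_GAIN = {"clear": 1.00, "overcast": 1.05}


def select(readings, **metadata):
    """Keep only readings whose metadata matches every given key/value."""
    return [
        r for r in readings
        if all(getattr(r, key) == value for key, value in metadata.items())
    ]


def calibrate(readings):
    """Scale each reading by the gain assigned to its sky condition."""
    return [
        replace(r, reflectance=r.reflectance * CALIBRATION_GAIN[r.sky])
        for r in readings
    ]


readings = [
    Reading(550.0, 0.10, "transect-A", "clear"),
    Reading(550.0, 0.20, "transect-A", "overcast"),
    Reading(550.0, 0.30, "grid-B", "clear"),
]

# Zero in on one sampling protocol, then calibrate per sky condition.
selected = select(readings, protocol="transect-A")
calibrated = calibrate(selected)
```

Here the metadata fields act like any other coordinate: selection slices along the protocol "dimension", and calibration applies a different operation along the sky-condition "dimension".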

We present a flexible metadata representation model that takes advantage of multi-dimensional data structures to translate metadata types into data dimensions, effectively reshaping data sets according to available metadata. With that, metadata is tightly integrated into the acquisition-to-exchange cycle, allowing for more focused exploration of data sets while also increasing the value of, and incentives for, keeping good metadata.
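As a rough illustration of what translating metadata types into data dimensions could look like, the sketch below regroups flat records along whichever metadata fields are chosen as dimensions. It is only an assumption about the general idea, not the model presented in the talk; the `reshape` helper and the `sensor` and `qa` fields are hypothetical.

```python
from collections import defaultdict


def reshape(records, dims):
    """Group record values by the tuple of their values along `dims`."""
    cube = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        cube[key].append(rec["value"])
    return dict(cube)


# Hypothetical flat records: "time" is a conventional dimension, while
# "sensor" and "qa" would normally live only in the metadata.
records = [
    {"time": "t0", "sensor": "S1", "qa": "passed", "value": 0.41},
    {"time": "t1", "sensor": "S1", "qa": "failed", "value": 0.07},
    {"time": "t0", "sensor": "S2", "qa": "passed", "value": 0.39},
]

# The same data set, reshaped along different metadata dimensions.
by_qa = reshape(records, dims=("qa",))
by_sensor_qa = reshape(records, dims=("sensor", "qa"))
```

Changing `dims` reshapes the same records without touching the values, which is the sense in which available metadata determines the shape of the data set.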

About Gilberto Z. Pastorello:

Gilberto Z. Pastorello received his PhD (2008), MSc (2005), and BSc (2003), all in Computer Science, from the University of Campinas, Brazil. He has held two Post-Doctoral Fellow positions (2009-2011 and 2011-2012) at the University of Alberta, Canada, focusing on applied Computer Science for scientific data management in the Environmental Monitoring domain, and helping build data/metadata management portals and modelling/processing tools. His main research interests are in multi-dimensional and multi-scale data management, scientific data life-cycle management, use of metadata and data annotations to aid analysis and visualization of data, and Web-based systems for data analytics.

Hosted by: B. S. Manjunath