e-Science 2008 4th IEEE International Conference on e-Science

Exhibits, Demos & Posters

Schema-Independent and Schema-Friendly Scientific Metadata Management


  • Scott Jensen, Department of Computer Science, Indiana University
  • Beth Plale, Department of Computer Science, Indiana University


Computational science is creating a deluge of data, and the automated capture and cataloging of detailed descriptive metadata has been recognized as necessary to realize the fullest reuse of this data. Scientific communities as varied as marine sciences, meteorology, astronomy, and the social sciences have developed detailed domain-specific XML metadata schemas to capture metadata about data products. While the semantic content of the metadata cataloged by each domain differs substantially, a review of scientific schemas reveals that they share one trait: each is composed of independent higher-level concepts. It is this construction from independent concepts that differentiates scientific metadata from other data expressed in XML.

Current domain-independent metadata catalogs use a relational database to store domain-specific metadata as name/value pairs. However, given the complexity of scientific metadata schemas, neither querying name/value pairs nor reconstructing XML from them in response to metadata searches is efficient. Alternatives for storing schema-based XML tightly couple the XML metadata schema to the database schema. Our work investigates the viability of maintaining a loose coupling between the metadata schema and the database schema, allowing for an easily adaptable framework. This is done by partitioning the metadata schema based on the independent concepts it contains. With this partitioning, metadata can be captured using domain-specific metadata schemas, and the domain-specific XML can be efficiently reconstructed in response to complex queries.
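
The partitioning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the element names, the in-memory `store` and `index` dictionaries (standing in for database tables), and the choice to treat each top-level child as one independent concept are all assumptions made for illustration. Each concept is kept as an intact XML fragment for reconstruction, while its leaf values are also indexed for querying.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata document; all element names are invented for illustration.
DOC = """<metadata>
  <citation><title>Storm run 42</title><creator>A. Researcher</creator></citation>
  <spatial><west>-98.5</west><east>-96.0</east></spatial>
  <temporal><begin>2008-05-01</begin><end>2008-05-02</end></temporal>
</metadata>"""


def partition(doc_id, xml_text, store, index):
    """Split a document into its top-level concepts (an assumed partitioning rule).

    store: doc_id -> list of (position, concept name, XML fragment)
    index: (concept name, leaf name, value) -> set of doc_ids
    """
    root = ET.fromstring(xml_text)
    for position, concept in enumerate(root):
        # Keep the concept as an intact fragment so it never has to be
        # rebuilt from individual name/value pairs.
        fragment = ET.tostring(concept, encoding="unicode").strip()
        store[doc_id].append((position, concept.tag, fragment))
        # Index leaf values separately so queries do not parse fragments.
        for leaf in concept.iter():
            if len(leaf) == 0 and leaf.text and leaf.text.strip():
                index[(concept.tag, leaf.tag, leaf.text.strip())].add(doc_id)


def reconstruct(doc_id, store, root_tag="metadata"):
    """Rebuild the document by concatenating stored fragments in order."""
    parts = [fragment for _, _, fragment in sorted(store[doc_id])]
    return "<%s>%s</%s>" % (root_tag, "".join(parts), root_tag)


store = defaultdict(list)
index = defaultdict(set)
partition("d1", DOC, store, index)

# Query by concept and leaf value, then rebuild the matching document.
matches = index[("spatial", "west", "-98.5")]
rebuilt = reconstruct("d1", store)
```

In this sketch the storage code refers only to whatever top-level concepts a document happens to contain, so a different domain schema could be loaded without changing `store` or `index`, which is one way to read the loose coupling described above.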

