Ed Begley is a computer scientist in the Building and Fire Research Laboratory at NIST. His research has included extensive work in materials informatics and from 1999 to 2002 he was the project leader of the international effort to develop MatML, an extensible markup language for the management and exchange of materials property data.
Cynthia Howard-Reed is an environmental engineer in the Indoor Air Quality and Ventilation Group at NIST. Before coming to NIST, she completed a post-doc with the National Exposure Research Laboratory of the U.S. Environmental Protection Agency. Her research interests include experimental and computational studies of contaminant transport in residential buildings.
The Application of MatML to Contaminant Emissions Data
Standard data formats are generally employed because they save time and money by promoting interoperability, that is, by facilitating data exchange and preserving information for re-use. Such formats are available for a wide variety of business, scientific, and technological domains. MatML,1 for example, is an extensible markup language (XML)2 used for the management and exchange of materials property data.
During the October 2004 ASTM Committee D22 Conference on Indoor Emissions Testing, MatML was introduced as a format suitable for contaminant source emission rate data. This article describes how these data are composed into MatML documents and provides guidance to anyone who might want to use MatML to manage and exchange other forms of materials property data that are at present stored in legacy relational databases. Data on volatile organic compound (VOC) source emission rates are used to illustrate the method.
Most indoor air quality, or IAQ, models that estimate building contaminant concentrations require the user to provide data related to contaminant source strengths and other contaminant transport mechanisms. Many of the required model inputs are available in the literature; however, these data have generally not been compiled in a readily accessible source, thereby requiring users to furnish their own model input parameters. To facilitate the IAQ modeling process and allow for assessment of data quality and completeness, the National Institute of Standards and Technology (NIST) is exploring two ways to store and manage published model input data. These approaches may be viewed as complementary and reflective of the general need to link data aggregations with similar content but different storage formats. The first method uses a relational database management system, known as an RDBMS, to compile model input data into searchable databases. The second effort employs MatML to demonstrate how 1) the RDBMS data may be alternatively stored in a repository of searchable MatML documents; 2) MatML might be used as an exchange mechanism between disparate storage formats, especially in the context of providing a convenient way to access data via the Internet for any IAQ model; and 3) value may be added to the data by using an extensible stylesheet language transformation, or XSLT,3 to provide online links to the U.S. Census Bureau’s North American Industry Classification System (NAICS) and the NIST Chemistry WebBook via the NAICS and Chemical Abstracts Service number (CASN) specifications contained in the MatML documents.
A well-designed database stores a collection of information in a readily accessible format and also allows for the assessment of data quality, trends in observations, and data gaps. While there is an abundance of VOC source emission rate data available in the literature and other sources, no comprehensive database exists. There are, however, several abridged databases, two of which were used by NIST to provide the foundation for a good database design.
The first database, assembled by the National Research Council of Canada (NRC),4 includes VOC emissions data from tests conducted in their Indoor Environment Program laboratory chambers. This database represents a collection of data from a single testing facility and contains product manufacturer information, emissions testing conditions, chemical information, emission factors, emission profiles, and comments. Its design was used as a basis to build a database of emission rates from the published literature.
The second database, a collection of peer-reviewed source emission rates, was compiled by the Indoor Environment Management Branch of the U.S. Environmental Protection Agency (EPA).5 The data were stored in a spreadsheet containing information regarding emission source classification, emission testing conditions, chemical information and analytical methods, emission factors, emission modeling parameters if available, and comments.
Drawing from the existing NRC and EPA database designs, as well as the recommended data reporting requirements from several testing guides,6, 7, 8, 9 a VOC source emission rate database was created. The new design includes nine tables: emission rate category (Category), type of material within category (Types), literature reference (Reference), material properties (Material), contaminant properties (Property), environmental test conditions (Testcond), material test conditions (ETest), source model equation (Equation), and contaminant emission rate factors (Contaminant). The tables are linked using “one-to-many” relationships (Figure 1). Populating this database with available records highlighted a further consideration: the need for a consistent data reporting format. A constrained format serves to ensure the availability of all parameters necessary for the purposes of IAQ modeling and other analyses.
To maximize availability, emission rate data should be accessible on the Web and in a format that can be understood by any indoor air quality modeling program. Until recently, most data have been distributed on the Web using hypertext markup language (HTML), which only specifies how data are to be formatted for display and conveys no description of the data themselves. XML was developed in response to that situation so that communities could define their own domain-specific markup languages for data management and exchange and thereby permit efficient parsing and interpretation of those data via software. The XML developed to manage and exchange materials information is called MatML.
Initiated in October 1999 by NIST, the development of MatML was an international collaboration of industry, academia, government laboratories, and standards organizations. The effort leveraged a number of pertinent materials data resources including several ASTM guides (Table 1). Following its publication,10 MatML was transferred to a commercial development group, which has been conducting trial applications and pursuing certification through the Organization for the Advancement of Structured Information Standards (OASIS).11
Since MatML was designed to address any materials property data, it should be applicable to IAQ data as a means of managing and exchanging contaminant source emission rates. Because of its standard format, this representation of the data could be utilized by any IAQ computer model and is ideally suited for distribution on the Web.
Mapping the RDBMS Format into MatML
The mapping is illustrated using an artifact, a mapping table, and a listing of a MatML document.
The artifact (Table 2), derived from an inner join of the database tables using their ID fields, is an aid for understanding the mapping (Table 3) and the corresponding MatML document (Listing 1). Each record of the join contains the data associated with a single contaminant. An alternative join associating all contaminants for a single material could have been created, but the single-contaminant approach was chosen to illustrate the mapping in the clearest possible manner.
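The one-record-per-contaminant join can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: it uses three of the nine tables, and the column names, key layout, and sample values are placeholders invented for the example, not the actual NIST schema or data.

```python
import sqlite3

def build_artifact_rows():
    """Inner-join hypothetical Material, Contaminant, and Reference tables on ID fields."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Reference   (RefID INTEGER PRIMARY KEY, Citation TEXT);
        CREATE TABLE Material    (MatID INTEGER PRIMARY KEY, RefID INTEGER, Name TEXT);
        CREATE TABLE Contaminant (ConID INTEGER PRIMARY KEY, MatID INTEGER,
                                  Name TEXT, EmissionFactor REAL);
        -- One reference has many materials; one material has many contaminants.
        INSERT INTO Reference VALUES (1, 'placeholder citation');
        INSERT INTO Material VALUES (1, 1, 'material A'), (2, 1, 'material B');
        INSERT INTO Contaminant VALUES
            (1, 1, 'compound X', 1.5),
            (2, 1, 'compound Y', 0.25),
            (3, 2, 'compound X', 4.0);
    """)
    # Each result row carries the data for exactly one contaminant.
    rows = con.execute("""
        SELECT m.Name, c.Name, c.EmissionFactor, r.Citation
        FROM Contaminant AS c
        INNER JOIN Material  AS m ON c.MatID = m.MatID
        INNER JOIN Reference AS r ON m.RefID = r.RefID
        ORDER BY c.ConID
    """).fetchall()
    con.close()
    return rows

artifact_rows = build_artifact_rows()
for row in artifact_rows:
    print(row)
```

Because the join is driven from the contaminant table, a material with two contaminants appears in two records, which matches the single-contaminant layout described above.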
Using MatML requires an understanding of the language’s vocabulary (tags, formally called elements) and grammar (order of tags). A thorough discussion of MatML is beyond the scope of this article; interested readers should consult the documentation contained in the MatML schema.12 The following high-level overview is provided as an aid for interpreting the mapping.
• The root element for a MatML document is named MatML_Doc, which contains one or more Material elements and a Metadata element.
• The Material element contains a description of the material and its properties and is compartmentalized as follows:
* BulkDetails describes the bulk material;
* ComponentDetails describes each component of the bulk material, which, in this mapping, is a volatile organic compound;
* Graphs encodes two-dimensional graphics, which are not needed in this mapping;
* Glossary contains definitions of terms found in the document, which also is not needed in this mapping.
• The Metadata element contains information pertinent to a material encoded within the MatML document and is compartmentalized as follows:
* AuthorityDetails describes an authority;
* DataSourceDetails describes a data source;
* MeasurementTechniqueDetails describes a measurement technique;
* ParameterDetails describes a parameter;
* PropertyDetails describes a property;
* SourceDetails describes the source of a component;
* SpecimenDetails describes a sample specimen;
* TestConditionsDetails describes the test conditions.
The mapping relates the table fields to the MatML tags. ID fields, fields containing no data, and other fields only useful to the database designers have not been mapped. The shorthand used for the MatML tags is read hierarchically. “Material/BulkDetails/Class,” for example, indicates the Class element within the BulkDetails element within the Material element. All tags are contained within the MatML_Doc element, which, for ease of reading, is not redundantly listed in the mapping.
The listing presents the MatML document resulting from the mapping using the data from the artifact. The document was generated by a program written to produce a repository of MatML documents from the inner join of the database tables using the mapping’s rules.
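A mapping-driven generator of this kind might look roughly like the following sketch, which is not the program used to produce the listing. The database field names and sample values are hypothetical; the tag paths follow the article's slash shorthand, read relative to MatML_Doc. The output is not validated against the MatML schema, which also constrains element order and required children.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a mapping table: database field -> MatML tag shorthand.
# The field names are invented for illustration.
MAPPING = {
    "material_name": "Material/BulkDetails/Name",
    "material_class": "Material/BulkDetails/Class",
    "contaminant_name": "Material/ComponentDetails/Name",
}

def ensure_path(root, path):
    """Walk a slash-separated tag path below root, creating missing elements."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)          # look for an existing direct child
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node

def record_to_matml(record):
    """Build a MatML_Doc element from one joined record (a dict of field values)."""
    doc = ET.Element("MatML_Doc")
    for field, path in MAPPING.items():
        if record.get(field) is not None:   # fields containing no data are skipped
            ensure_path(doc, path).text = str(record[field])
    return doc

# "internal_id" has no entry in MAPPING, so it never reaches the document.
record = {"material_name": "material A", "material_class": "class A",
          "contaminant_name": "compound X", "internal_id": 17}
doc = record_to_matml(record)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Keeping the rules in a plain dictionary means the mapping can be revised without touching the generator. The sketch assumes one contaminant per record, as in the artifact; a join with several contaminants per material would need one ComponentDetails element per contaminant rather than a single reused one.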
As with all extensible markup languages, MatML separates information content from its display format. This feature essentially renders the data “future-proof,” since documents are written using simple ASCII text rather than a proprietary encoding technique. Conversion to whatever data storage formats arise in the future, therefore, becomes a straightforward exercise. Moreover, MatML documents may be processed according to need, such as extracting specific data and formatting them for import into IAQ computer modeling software, formatting the data for publication in e-journals, or formatting the data for Web browser display.
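As one example of processing according to need, the sketch below pulls material and component names out of a small MatML-style document and flattens them into CSV text that a modeling tool could import. The sample document and the chosen columns are assumptions made for illustration; a real MatML document carries far more detail.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative, heavily simplified MatML-style document (placeholder values).
SAMPLE = """<MatML_Doc>
  <Material>
    <BulkDetails><Name>material A</Name></BulkDetails>
    <ComponentDetails><Name>compound X</Name></ComponentDetails>
    <ComponentDetails><Name>compound Y</Name></ComponentDetails>
  </Material>
</MatML_Doc>"""

def matml_to_csv(xml_text):
    """Flatten material/component names from a MatML-style document into CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["material", "component"])
    for material in root.findall("Material"):
        material_name = material.findtext("BulkDetails/Name", default="")
        # One output row per component of the bulk material.
        for component in material.findall("ComponentDetails"):
            writer.writerow([material_name, component.findtext("Name", default="")])
    return out.getvalue()

csv_text = matml_to_csv(SAMPLE)
print(csv_text)
```

The same parsed tree could just as easily feed an HTML or journal-ready formatter, since the document holds content only and leaves presentation to the consumer.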
As an illustration (Figure 2), the data contained in the listing were rendered for browser display using an XSLT with processing rules for also adding value to the original data; the NAICS and CASN specifications were identified and appropriate hyperlinks were created within the displayed document to the U.S. Census Bureau’s NAICS (Figure 3) and NIST Chemistry WebBook (Figure 4), respectively.
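The article's rendering used XSLT; the following Python sketch is a stand-in that shows only the value-adding step, turning specification values into hyperlinks. It assumes that each specification is encoded as a Specification element whose authority attribute is "NAICS" or "CASN", which may differ from the actual listing. The URL templates are placeholders and must be replaced with the real lookup addresses of the Census Bureau and the NIST Chemistry WebBook. The sample codes are illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape

# Placeholder URL templates (assumptions): substitute the real lookup URLs.
LINK_TEMPLATES = {
    "NAICS": "https://example.org/naics/{value}",
    "CASN": "https://example.org/webbook/{value}",
}

# Illustrative document; the authority-attribute encoding is an assumption.
SAMPLE = """<MatML_Doc>
  <Material>
    <BulkDetails>
      <Name>material A</Name>
      <Specification authority="NAICS">321219</Specification>
    </BulkDetails>
    <ComponentDetails>
      <Name>formaldehyde</Name>
      <Specification authority="CASN">50-00-0</Specification>
    </ComponentDetails>
  </Material>
</MatML_Doc>"""

def specification_links(xml_text):
    """Return one HTML anchor per Specification whose authority has a template."""
    root = ET.fromstring(xml_text)
    links = []
    for spec in root.iter("Specification"):      # document order
        template = LINK_TEMPLATES.get(spec.get("authority"))
        value = (spec.text or "").strip()
        if template and value:
            url = template.format(value=value)
            links.append(f'<a href="{escape(url)}">{escape(value)}</a>')
    return links

links = specification_links(SAMPLE)
for link in links:
    print(link)
```

Specifications with an unrecognized authority are left alone, so new link targets can be added by extending the template table.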
The ultimate goal of NIST’s efforts to store VOC emission rate data is to help promote the standardization needed for consistency and reliability in reporting, accessing, and manipulating these data. MatML provides a suitable format for data management and exchange, especially on the Web, and associated XML technologies offer powerful tools for streamlining data access by any IAQ model. In the near future, user-oriented applications such as an editor will be developed that will simplify the preparation of MatML documents and lead to the compilation of searchable document repositories. These repositories, in turn, may provide broad and easy accessibility to pertinent materials information for communities with a specific interest, including indoor air quality and material selection based on a range of environmental impacts.
References
4 Zhang, J.S.; Shaw, C.Y.; Sander, D.; Zhu, J.P.; Huang, Y. MEDB-IAQ: A Material Emission Database and Single-Zone IAQ Simulation Program - A Tool for Building Designers, Engineers and Managers. National Research Council Canada. 1999.
5 U.S. EPA. Sources of Indoor Air Emissions. U.S. Environmental Protection Agency. 1999.
6 ASTM. Standard Practice for Full-Scale Chamber Determination of Volatile Organic Emissions from Indoor Materials/Products. D 6670-01. American Society for Testing and Materials. 2001.
7 ASTM. Standard Guide for Small-Scale Environmental Chamber Determinations of Organic Emissions from Indoor Materials/Products. D 5116-97. American Society for Testing and Materials. 1997.
8 European Communities. European Concerted Action Indoor Air Quality & Its Impact on Man (EUR 13593), Guideline for the Characterization of Volatile Organic Compounds Emitted from Indoor Materials and Products Using Small Test Chambers. Report No. 8. COST Project 613. Luxembourg: Office for Publications of the European Communities. 1991.
9 Matthews, T.G. Atmospheric Environment. 1987, 21, 321-329.
10 Begley, E.F. MatML Version 3.0 Schema. NISTIR 6939. National Institute of Standards and Technology, Gaithersburg, MD. 2003.