[Top] [Prev] [Next]

3.2 The Scientific Data Set Data Model

The scientific data set, or SDS, is a group of data structures used to store and describe multidimensional arrays of scientific data. Refer to Figure 3a for a graphical overview of the SD data set. Note that in this chapter the terms SDS, SD data set, and data set are used interchangeably; the terms SDS array and array are also used interchangeably.

A scientific data set consists of required and optional components, which will be discussed in the following subsections.

FIGURE 3a - The Contents of a Scientific Data Set

3.2.1 Required SDS Components

Every SDS must contain the following components: an SDS array, a name, a data type, and the dimensions of the SDS, which are actually the dimensions of the SDS array.

SDS Array

An SDS array is a multidimensional data structure that serves as the core structure of an SDS. This is the primary data component of the SDS model and can be compressed (refer to Section 3.5.2 on page 32 for a description of SDS compression) and/or stored in external files (refer the Section 3.5.3.3 on page 36 for a description of external SDS storage). Users of netCDF should note that SDS arrays are conceptually equivalent to variables in the netCDF data model1.

An SDS has an index and a reference number associated with it. The index is a non-negative integer that describes the relative position of the data set in the file. A valid index ranges from 0 to the total number of data sets in the file minus 1. The reference number is a unique positive integer assigned to the data set by the SD interface when the data set is created. Various SD interface routines can be used to obtain an SDS index or reference number depending on the available information about the SDS. The index can also be determined if the sequence in which the data sets are created in the file is known.

In the SD interface, an SDS identifier uniquely identifies a data set within the file. The identifier is created by the SD interface access routines when a new SDS is created or an existing one is selected. The identifier is then used by other SD interface routines to access the SDS until the access to this SDS is terminated. For an existing data set, the index of the data set can be used to obtain the identifier. Refer to Section 3.4.1 on page 25 for a description of the SD interface routine that creates SDSs and assigns identifiers to them.

SDS Name

The name of an SDS can be provided by the calling program, or is set to "DataSet" by the HDF library at the creation of the SDS. The name consists of case-sensitive alphanumeric characters, is assigned only when the data set is created, and cannot be changed. SDS names do not have to be unique within a file, but their uniqueness makes it easy to semantically distinguish among data sets in the file.

Data Type

The data contained in an SDS array has a data type associated with it. The standard data types supported by the SD interface include 32- and 64-bit floating-point numbers, 8-, 16- and 32-bit signed integers, 8-, 16- and 32-bit unsigned integers, and 8-bit characters. The SD interface also allows the creation of SD data sets consisting of data elements of non-standard lengths (1 to 32 bits). See Section 3.7.6 on page 42 for more information.

Dimensions

SDS dimensions specify the shape and size of an SDS array. The number of dimensions of an array is referred to as the rank of the array. Each dimension has an index and an identifier assigned to it. A dimension also has a size and may have a name associated with it.

A dimension identifier is a positive number uniquely assigned to the dimension by the library. This dimension identifier can be retrieved via an SD interface routine. Refer to Section 3.8.1 on page 43 for a description of how to obtain dimension identifiers.

A dimension index is a non-negative number that describes the ordinal location of a dimension among others in a data set. In other words, when an SDS dimension is created, an index number is associated with it and is one greater than the index associated with the last created dimension that belongs to the same data set. The dimension index is convenient in a sequential search or when the position of the dimension among other dimensions in the SDS is known.

Names can optionally be assigned to dimensions, however, dimension names are not treated in the same way as SDS array names. For example, if a name assigned to a dimension was previously assigned to another dimension the SD interface treats both dimensions as the same data component and any changes made to one will be reflected in the other.

The size of a dimension is a positive integer. Also, one dimension of an SDS array can be assigned the predefined size SD_UNLIMITED (or 0). This dimension is referred to as an unlimited dimension, which, as the name suggests, can grow to any length. Refer to Section 3.5.1.3 on page 31 for more information on unlimited dimensions.

3.2.2 Optional SDS Components

There are three types of optional SDS components: user-defined attributes, predefined attributes, and dimension scales. These optional components are only created when specifically requested by the calling program.

Attributes describe the nature and/or the intended usage of the file, data set, or dimension they are attached to. Attributes have a name and value which contains one or more data entries of the same data type. Thus, in addition to name and value, the data type and number of values are specified when the attribute is created.

User-Defined Attributes

User-defined attributes are defined by the calling program and contain auxiliary information about a file, SDS array, or dimension. They are more fully described in Section 3.9 on page 49.

Predefined Attributes

Predefined attributes have reserved names and, in some cases, predefined data types and/or number of data entries. Predefined attributes are useful because they establish conventions that applications can depend on. They are further described in Section 3.10 on page 53.

Dimension Scales

A dimension scale is a sequence of numbers placed along a dimension to demarcate intervals along it. Dimension scales are described in Section 3.8.4 on page 46.

3.2.3 Annotations and the SD Data Model

In the past, annotations were supported in the SD interface to allow the HDF user to attach descriptive information (called metadata) to a data set. With the expansion of the SD interface to include user-defined attributes, the use of annotations to describe metadata should be eliminated. Metadata once stored as an annotation is now more conveniently stored as an attribute. However, to ensure backward compatibility with scientific data sets and applications relying on annotations, the AN annotation interface, described in Chapter 10, Annotations (AN API) can be used to annotate SDSs.

There is no cross-compatibility between attributes and annotations; creating one does not automatically create the other.



[Top] [Prev] [Next]

1 netCDF-3 User's Guide for C (June 5, 1997), Section 7, http://www.unidata.ucar.edu/packages/netcdf/guidec/.

hdfhelp@ncsa.uiuc.edu
HDF User's Guide - 07/21/98, NCSA HDF Development Group.