Exporting IDFS data


written by Carrie A. Gonzalez
cgonzalez@swri.org

Last updated: 05/29/2002

Table Of Contents

  1. Overview
  2. Data Parameter Definition
    1. Data Source
    2. Data Items
      1. Data Attributes
      2. Data Binning
  3. Packaging of Data
    1. File Format and Average Scheme
  4. Time Definition
  5. Exporting the Data
  6. Saving the Definitions
  7. Variable Naming Conventions
  8. XML File Layout and Tag Definitions

Overview

exportIDFS is a program which is used to extract and export data that has been stored in the Instrument Description File System (IDFS) format. The IDFS format is a data storage format that is designed to be general enough to handle the majority of scientific data sets. These data sets include raw telemetry, processed data, simulation data and theoretical data. IDFS data sources are defined as either scalar instruments or vector instruments. A scalar instrument returns singular data quantities that are dependent only upon time and position. A vector instrument returns one-dimensional data quantities that have a functional dependence on a single variable, which in IDFS terminology is called the scanning variable.

The exportIDFS program can be invoked in one of two modes: (1) interactive mode or (2) batch mode. In interactive mode, the program utilizes a GUI-based definition session to define the data items to be exported. Once this definition session has been completed, the selected data parameters can then be exported to the selected file format. To invoke the program in interactive mode, type exportIDFS at the command line.

In batch mode, the interactive GUI-based definition session is bypassed and the requested data is immediately exported based upon information contained in the named layout file. To invoke the program in batch mode, type exportIDFS -FName filename at the command line. The argument filename is the name of the layout file that is to be utilized during the current export session. Note that the name of the layout file does not include the .EXP extension, which is appended to the filename provided by the user during the GUI-based definition session. If the named layout file does not exist, an error is displayed and processing terminates.

For a complete list of arguments that can be utilized by SDDAS applications that support batch mode processing, the user is referred to the SDDAS Applications Batch Interface document. The user should be aware that the exportIDFS program utilizes only the layout filename, beginning time, ending time and graphics device number command line options. If any of the other command line options are specified, a message is displayed stating that the specified option is not utilized for the current export session. The exportIDFS program utilizes the graphics device number command line option differently than the other SDDAS applications since there is no graphics output associated with exportIDFS. Instead, this option selects the file format to which the data is to be exported (CDF, netCDF, IDFS or XML).
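
As a rough illustration of the batch-mode invocation described above, the following Python sketch assembles the command line for a given layout file. The helper names build_batch_command and run_batch_export are hypothetical, and the sketch assumes that the exportIDFS executable is on the search path. Only the -FName option is taken from this document; the other batch options are described in the SDDAS Applications Batch Interface document and are not shown.

```python
import subprocess


def build_batch_command(layout_name: str) -> list[str]:
    """Return the argument list for a batch-mode export (hypothetical helper).

    The layout name is given without the .EXP extension, so a trailing
    ".EXP" is stripped if the caller supplied it.
    """
    if layout_name.endswith(".EXP"):
        layout_name = layout_name[: -len(".EXP")]
    if not layout_name:
        raise ValueError("a layout file name is required")
    return ["exportIDFS", "-FName", layout_name]


def run_batch_export(layout_name: str) -> int:
    """Run exportIDFS in batch mode and return the process exit status.

    Assumes the exportIDFS executable can be found on the search path.
    """
    return subprocess.run(build_batch_command(layout_name)).returncode
```

For example, build_batch_command("orbit12.EXP") and build_batch_command("orbit12") both yield the arguments exportIDFS -FName orbit12, where orbit12 is a made-up layout name.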

Data Parameter Definition

In order to export IDFS data, an IDFS source must first be selected. This is achieved by selecting the "Data Source" button. Once a valid IDFS source has been selected, the "Data Items" button becomes visible. At this point, the data items to be exported from the selected IDFS source must be defined. At least one data item must be selected; otherwise, an error message will be displayed when the "Export" action is selected.

In most cases, the data items to be exported are referred to as IDFS sensors. An IDFS sensor is defined as a primary data source returned by the virtual instrument in question. However, in some cases, the data items may be SCF output variables. The SCF (Science Computation Formulation) system provides for the creation of new data products from an existing primary data set (IDFS). In some cases, these derived products may be dependent upon values returned from a single instrument; in other cases, the derived products are dependent upon values taken from many instruments. For a more in-depth explanation of the SCF system, the user is referred to the paper entitled "The Science Computation Formulation System".

When the "Data Items" button is selected, the Data Sources GUI is displayed. On this GUI resides a list which indicates the data items to be exported. Initially, this list is empty. To add a data item to the list, the pull-down Insertion menu is utilized. Once the position for the data item to be added has been determined, the actual data item must be defined using the Data Attributes GUI. This GUI is automatically invoked when a new item is added to the list. The parameter selection is defaulted to the first data item defined for the IDFS source based upon information contained in the PIDF file. If changes to any of the attributes for a specific data item need to be made, the data item should be selected from the list and the "Attributes" button should be activated in order to invoke the Data Attributes GUI.

Once the data item has been selected, binning information must be provided for the data item. The binning information is defaulted based upon information contained in the PIDF file for the selected data item. For IDFS data items, all sensors are binned using the same binning scheme; therefore, the first data item definition on the list defines the binning scheme that will be used by all exported IDFS sensors. For SCF output variables, each data item is binned uniquely. Therefore, if three SCF output variables are selected for export, three unique binning schemes are utilized by the data acquisition software. The user need not concern themselves with this information unless a change in binning schemes is desired, which can be achieved by selecting the "Binning" button.

The exportIDFS program allows for the exportation of IDFS sensor or SCF output variable data, but not any combination of the two data sources. The first data item definition on the list determines if IDFS or SCF data products are to be exported. If both IDFS and SCF data items are needed, separate instances of the exportIDFS program must be run.
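
Because IDFS sensors and SCF output variables cannot be mixed in one session, a list of wanted data items has to be divided into separate runs. The sketch below shows one way to plan that split. The function name plan_sessions and the (kind, name) representation of a data item are assumptions made for illustration; they are not part of exportIDFS.

```python
def plan_sessions(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group data items by kind so that each group can be exported by its
    own exportIDFS session.

    Each item is a (kind, name) pair, where kind is "IDFS" or "SCF".
    The order of the items within each group is preserved.
    """
    sessions: dict[str, list[str]] = {}
    for kind, name in items:
        if kind not in ("IDFS", "SCF"):
            raise ValueError(f"unknown data item kind: {kind!r}")
        sessions.setdefault(kind, []).append(name)
    return sessions


# The item names below are placeholders chosen for the example.
wanted = [("IDFS", "Retard/Pk 3"), ("SCF", "Density"), ("IDFS", "Retard/Pk 4")]
for kind, names in plan_sessions(wanted).items():
    print(kind, names)
```

A result with two keys means that two separate instances of the program are needed, one for each kind of data.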

Packaging of Data

Once data items to be exported have been selected, the file format and averaging scheme for the data must be chosen. This is achieved by selecting the "Data Packaging" button. Currently, IDFS and SCF data can be exported to one of four file formats:

  1. Common Data Format (CDF)
  2. Unidata Network Common Data Form (netCDF)
  3. IDFS
  4. XML

The default scenario for the exportIDFS program is to export data in the CDF file format.

In addition to the file format, the user must also specify the averaging scheme to utilize for each exported data sample. The user may specify either a sample average or a time average. With a sample average, the user specifies the number of data samples (sweeps) to average together for each exported data sample. With a time average, the user specifies the amount of time to be acquired for each exported data sample. When time average is selected, the time specified is converted to sweeps using the maximum temporal resolution allowed by the selected virtual instrument. If the result is a non-integer value, the number of sweeps acquired is determined by adding the integer component to the ceiling of the accumulated fractional component. If no averaging is required, that is, if a sweep-by-sweep dump is to be performed, a sample average with the number of samples set to one should be defined. This is the default scenario for the exportIDFS program.
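
The conversion from a time average to a number of sweeps can be sketched as follows. This is only one reading of the rule stated above: it assumes that any fractional sweep left over for a single exported sample is rounded up to one whole sweep, and it does not model how a fraction might be carried from one exported sample to the next. The function name and the use of integer milliseconds (chosen to avoid floating-point round-off) are assumptions.

```python
def sweeps_for_time_average(average_ms: int, sweep_ms: int) -> int:
    """Return the number of sweeps needed to cover average_ms.

    average_ms is the requested averaging time and sweep_ms is the time
    taken by one sweep at the instrument's maximum temporal resolution,
    both in whole milliseconds.
    """
    if average_ms <= 0 or sweep_ms <= 0:
        raise ValueError("both durations must be positive")
    whole, remainder = divmod(average_ms, sweep_ms)
    # Integer component plus the ceiling of the fractional component.
    return whole + (1 if remainder else 0)
```

Under this reading, a 10 second average over 2.5 second sweeps needs 4 sweeps, and a 10 second average over 3 second sweeps also needs 4 sweeps (3 whole sweeps plus the fraction rounded up).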

Time Definition

Before the exportIDFS program can export the data, one final piece of information must be defined. This information is the time range for which the data is to be acquired. This is achieved by selecting the "Time" button.

Exporting the Data

Once the IDFS source, data items, file format and time range have been defined, the selected data items can then be exported to the selected file format. To export the data, select the pull-down Action menu from the main menubar and select the Export option. Upon activation, the local database is checked to see if the requested data files are online. If data for the requested time range is not online, the missing data is promoted to the local disk. Once the data has been placed online, the datafiles are opened, the data is extracted and exported to the selected file format. Data will continue to be processed until the user-requested end time has been reached or until an error condition is raised. When an error condition is encountered, a message is displayed, the partially created file is purged and processing terminates. Upon completion of the export task, successful or unsuccessful, any promoted IDFS data files are removed from the local disk.

When the exportIDFS program is run in interactive mode, a check is made to see if the file to be generated already exists in the current working directory. If it does, the user will be asked if they wish to overwrite the data file. If the user answers yes, the file is removed and an attempt is made to create a new file. If the user answers no, the current request for data exportation is aborted. When run in batch mode, no query is made; the file is removed and an attempt is made to create a new file.
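
The overwrite rule above can be summarized in a few lines of Python. This is a sketch of the decision logic only, not the program's actual code; the function name prepare_output_file and the confirm callback (standing in for the interactive query) are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional


def prepare_output_file(
    path: Path,
    interactive: bool,
    confirm: Optional[Callable[[Path], bool]] = None,
) -> bool:
    """Return True if the export may proceed, False if it is aborted.

    In interactive mode an existing file is removed only after the user
    agrees; in batch mode it is removed without a query.
    """
    if not path.exists():
        return True
    if interactive:
        if confirm is None:
            raise ValueError("interactive mode needs a confirm callback")
        if not confirm(path):
            return False  # user declined: abort the export request
    path.unlink()  # remove the old file so a new one can be created
    return True
```

In batch mode the callback is never consulted, which matches the behavior described above: the existing file is simply replaced.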

Since the exportIDFS program has the potential to generate large data files, a clean-up mechanism is utilized. Whether or not the clean-up mechanism is invoked depends upon the user running the exportIDFS program. If a ".guest" file exists in the user's home directory, the data file will be scheduled for removal 30 minutes after the data file has been closed. The user will be informed of this situation. If a ".guest" file does not exist in the user's home directory, the generated data files will be left untouched. This scheme was designed for those sites that set up a public guest account through which outside users are given access to the named local system. The contents of the ".guest" file are not important; only the existence of the file is checked.
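
A site administrator who wants to predict whether an account is subject to the clean-up mechanism can apply the same test. The sketch below mirrors the rule described above; the function name removal_delay_minutes is hypothetical, and how the program actually schedules the removal is not covered here.

```python
from pathlib import Path
from typing import Optional

GUEST_MARKER = ".guest"      # marker file name given in this document
REMOVAL_DELAY_MINUTES = 30   # delay after the data file has been closed


def removal_delay_minutes(home: Optional[Path] = None) -> Optional[int]:
    """Return the removal delay in minutes, or None if files are kept.

    Only the existence of the marker file matters; its contents are
    never read.
    """
    home = Path.home() if home is None else home
    if (home / GUEST_MARKER).exists():
        return REMOVAL_DELAY_MINUTES
    return None
```

Calling removal_delay_minutes() with no argument checks the home directory of the current user.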

For IDFS sensor data, the exportIDFS program will export the selected data items, data quality information, any secondary data sources selected for exportation and the time range associated with each data sample. If the IDFS source returns instrument status values, this information is also exported. Instrument status values utilize a separate time tag, which will be written to the file. All of this information is considered record-variant since the values change from data sample to data sample. If the selected IDFS source is a vector-instrument, the scan values which correspond to the returned data bins are written to the file. The center scan values and the band-width values for each data bin are written once to the file since these values remain constant.

For SCF output variables, the exportIDFS program will export the selected data items along with the time range associated with each data sample. This information is considered record-variant since the values change from data sample to data sample. If the SCF output variable returns non-scalar data, the scan values which correspond to the returned data bins are written to the file. The center scan values and the band-width values for each data bin are written once to the file since these values remain constant. Unlike IDFS sensors, each SCF output variable can bin data uniquely. Therefore, if three non-scalar SCF output variables are selected for export, three unique binning schemes are utilized by the data acquisition software and thus, three sets of center and band-width values are written to the file.

Saving the Definitions

Once all the information has been defined, the information may be saved to a layout file for future retrieval. This is achieved by selecting the pull-down File menu and selecting the Save As option. The information defined is not saved by the program unless the user explicitly does so. Note that when providing the name of the layout file, do not specify the .EXP extension. The exportIDFS program automatically appends the .EXP extension to the name of the layout file upon creation of the file.

Variable Naming Conventions

In exporting the data to the various formats, every attempt was made to provide descriptive names for the variables contained within the resultant files. The netCDF format was the most restrictive with respect to variable name definition, followed by CDF; the IDFS and XML formats were the least restrictive.

Due to the limitations / restrictions of the various formats, the following conventions are followed:

  1. netCDF - Since variables should begin with a letter and be composed of letters, digits and underscores, the convention that is used is to name the variable by its "data type" followed by the ordinal value of its definition. For example, SENSOR2 implies that the variable is IDFS sensor (primary) data for the second data item selected for exportation. A more descriptive name for the variable is provided in the variable attribute long_name, which is not restricted to just letters, digits and underscores. For example, the string "Retard/Pk 3" is a value assigned to the long_name attribute for one IDFS sensor data source.

    Instrument status values, or MODE data, are pertinent to the instrument as a whole, not to any one sensor definition. For netCDF, the naming convention utilized for the instrument status values is MODEx, where x represents the mode definition number, starting with zero. This convention was selected since a mapping variable is provided for each instrument status value defined. This mapping variable is an array of ASCII strings that describe what the value for the mode represents. There should be one definition for each possible value for the mode (3 bits = 8 definitions). For example, MODE1 is a status value defined to have two states - 0 and 1. There is also a mapping variable called MODE1_key, which has 2 entries, "Low Bias" and "High Bias". Therefore, when MODE1 returns a value of 0, the instrument is in Low Bias mode. It was decided that it would be easier to match numbers than it would be to match names since the user would first have to determine what the names were for each of the instrument status values.

  2. CDF - In creating CDF files, it was determined that variable names must be unique; that is, two variables by the same name cannot co-exist within a CDF file. With the exportIDFS program, the user may select the same data item more than once. In addition, even if different data parameters are selected, the exportIDFS program will export data quality information and possibly secondary data sources for the selected IDFS data sources. These data quantities will have the same variable name for each of the selected primary data items. In order to guarantee unique variable names, the convention that is used for CDF processing is to precede the variable name by the ordinal value of its definition. For example, "1_Retard/Pk 3" is the variable name assigned to the first data item selected for exportation, "1_Data Quality" is the variable name for the data quality value for the first data item selected, and "1_Scan Voltage Steps (Cal.)" is the variable name for the calibration data associated with the first data item selected. The label (Cal.) is appended to better identify the source of the data product.

    For CDF, the naming convention utilized for the instrument status values is MODEx_descriptive name, where x represents the mode definition number, starting with zero. This convention was selected since a mapping global attribute is provided for each instrument status value defined. This mapping variable is an array of ASCII strings that describe what the value for the mode represents. There should be one definition for each possible value for the mode (3 bits = 8 definitions). For example, "MODE1_Retard Sweep Range" is the second status value (MODE1) defined for the IDFS data source of interest. The name defined for this instrument status value is "Retard Sweep Range". This instrument status value has two defined states - 0 and 1. There is also a mapping variable called "MODE1_KEY", which has 2 entries, "Low Bias" and "High Bias". Therefore, when "MODE1_Retard Sweep Range" returns a value of 0, the instrument is in Low Bias mode. It was decided that it would be easier to match numbers (MODEx_) than it would be to match names since the user would first have to determine what the names were for each of the instrument status values.

  3. IDFS - The IDFS export format is simply an ASCII file which contains the selected IDFS data items, along with data quality information, secondary data sources and any instrument status values. The layout of the file is such that for each sweep of data, timetags are reported, the selected IDFS data item (sensor data) is outputted, followed by the Data Quality value and other secondary data products associated with the selected IDFS data item. This pattern of sensor data, data quality and secondary data is repeated for each selected IDFS data item. The instrument status values are then written, as they pertain to the instrument as a whole. For SCF output variables, the layout of the file is such that for each sweep of data, timetags are reported along with the selected SCF data item(s).

    Exporting data to IDFS resulted in the most descriptive names since the file is simply ASCII text - a dump of labels and values. The variable names are outputted as they are defined for the IDFS and SCF data products. For example, "Retard/Pk 3" is the variable name assigned to the first data item selected for exportation, "Data Quality" is the variable name for the data quality value for the first data item selected, and "Scan Voltage Steps (Cal.)" is the variable name for the calibration data associated with the first data item selected. The label (Cal.) is appended to better identify the source of the data product. The instrument status variable name "Retard Sweep Range" is outputted to the ASCII file. In addition, there are variables reported to indicate the number of selected IDFS data items, the number of calibration sets defined and the number of instrument status values defined in order to process the data in a self-describing way.

  4. XML - For XML, there is no "variable" name as utilized by the other file formats. The data is simply tagged to identify the data value (numeric or character) being exported. One attribute of the data being exported is the name of the data item and is identified by the XML tag Name. Just like the IDFS file format option, XML outputs the names as they are defined for the IDFS and SCF data products in the PIDF file. For example, "Retard/Pk 3" is the value assigned to the Name tag for the first data item selected for exportation, "Data Quality" is the value assigned to the Name tag for the data quality value for the first data item selected and "Scan Voltage Steps" is the value assigned to the Name tag for the calibration data associated with the first data item selected. Since calibration data is identified within the XML file using the tag Calibration, there is no need to append the label (Cal.) to the Name value, which is done for the IDFS file format option. The value "Retard Sweep Range" is assigned to the Name tag for the instrument status value within the XML file.
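
The netCDF and CDF naming conventions listed above can be restated as a few small Python functions. This is a sketch for readers who need to locate variables in an exported file; the function names are hypothetical, and the example labels are the ones used in this section.

```python
def netcdf_sensor_name(ordinal: int) -> str:
    """netCDF name for the n-th selected IDFS sensor (counted from 1)."""
    return f"SENSOR{ordinal}"


def netcdf_mode_names(mode_number: int) -> tuple[str, str]:
    """netCDF status variable and its mapping variable (modes count from 0)."""
    return f"MODE{mode_number}", f"MODE{mode_number}_key"


def cdf_item_name(ordinal: int, label: str) -> str:
    """CDF name: the ordinal of the data item definition, then the label."""
    return f"{ordinal}_{label}"


def cdf_mode_name(mode_number: int, label: str) -> str:
    """CDF name for an instrument status value (modes count from 0)."""
    return f"MODE{mode_number}_{label}"


def describe_mode(value: int, key: list[str]) -> str:
    """Translate a status value using the strings of its mapping entry."""
    return key[value]


print(netcdf_sensor_name(2))
print(cdf_item_name(1, "Scan Voltage Steps (Cal.)"))
print(cdf_mode_name(1, "Retard Sweep Range"))
print(describe_mode(0, ["Low Bias", "High Bias"]))
```

Note that data items are numbered from one while instrument status values are numbered from zero, so MODE1 is the second status value defined.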

XML File Layout and Tag Definitions

    When the user selects XML as the file format for IDFS sensor data, the file generated is simply an ASCII file which contains the selected IDFS sensors, along with data quality information, any secondary data sources selected for the selected IDFS data items and any instrument status values, all identified using XML tags. The data is basically blocked or grouped together in the following manner:

    All of the data blocks identified above may or may not be contained within the XML file created, based upon the IDFS data source selected. For scalar IDFS data sources, there is no scan information; therefore, the Scan Block information is not pertinent and is not included in the XML file generated. If the IDFS data source does not define any data quality or instrument status values in the PIDF file, there is no Data Quality or Mode information to be written to the XML file. The Pitch Angle, Start Azimuthal Angle, Stop Azimuthal Angle and Calibration blocks pertain to secondary data sources and therefore are written to the XML file if the secondary data source is applicable for the selected IDFS data source and if the user selected the secondary data source for exportation.

    When the user selects XML as the file format for SCF data items, the file generated is simply an ASCII file which contains the selected SCF data parameters, all identified using XML tags. The data is basically blocked or grouped together in the following manner:

    All of the data blocks identified above may or may not be contained within the XML file created, based upon the SCF output variables selected. Unlike IDFS data sources which are uniform in rank, SCF output variables can be a mixture of scalar and multi-dimensional data (1-D up to 10-D). If the selected SCF output variable has a scan variable associated with it, the Scan Block information is included in the XML file generated and a Scan Index value is placed within the Data Item block to link the data with the scan information. This is done for each SCF output variable that has scan information defined; therefore, there may be multiple Scan Blocks contained within the XML file.

    The following table identifies the tags which are utilized by the exportIDFS program for the XML file format option:

    XML Tag                Pertinent to  Meaning
    ---------------------  ------------  -------
    Idfs_Parameters        IDFS          token which identifies the data as IDFS data item(s)
    Scf_Parameters         SCF           token which identifies the data as SCF output variable(s)
    Scan                   IDFS and SCF  token which groups together information that is associated with the scan variable for the data item(s) being exported
    Scan_Unit              IDFS and SCF  token which describes the units that the values are expressed in for the scan variable
    Scan_Length            IDFS and SCF  token which defines the number of values returned for the scan variable
    Center_Scan            IDFS and SCF  token which identifies the center scan values associated with the data items being exported
    Scan_Low               IDFS and SCF  token which identifies the lower scan edge values for the scan range associated with the data items being exported
    Scan_High              IDFS and SCF  token which identifies the upper scan edge values for the scan range associated with the data items being exported
    Scan_Block_Index       SCF           token which represents a scan block identifier number. This number is used to link the exported SCF output variable(s) with any scan information pertinent to the data item in question.
    Data_Set               IDFS and SCF  token which groups together information that is associated with each exported data sample
    Number                 IDFS and SCF  token which gives the exported data sample number, with numbering starting at zero (like a record counter)
    Start_Time             IDFS and SCF  token which defines the start time for the exported data sample
    Stop_Time              IDFS and SCF  token which defines the stop time for the exported data sample
    Data_Item              SCF           token which groups together information that pertains to each selected SCF output variable
    Scan_Index             SCF           token which represents an index value (link) to the scan block information that is pertinent to the SCF output variable named in the Data_Item block in which the token appears
    Sensor                 IDFS          token which groups together information that pertains to each selected IDFS data item
    Data_Quality           IDFS          token which identifies the data as the data quality value associated with the Sensor named in the Sensor block in which the token appears
    Start_Azimuthal_Angle  IDFS          token which identifies the data as the start azimuthal angle data associated with the Sensor named in the Sensor block in which the token appears. The start azimuthal angle values are always returned as values between 0 and 360 degrees.
    Stop_Azimuthal_Angle   IDFS          token which identifies the data as the stop azimuthal angle data associated with the Sensor named in the Sensor block in which the token appears. The stop azimuthal angle values could be negative or could be greater than 360 degrees. The stop azimuthal angle values are computed by adding the degrees covered by the accumulation time of each sample to the start azimuthal angle values.
    Pitch_Angle            IDFS          token which identifies the data as the pitch angle data associated with the Sensor named in the Sensor block in which the token appears
    Calibration            IDFS          token which identifies the data as the calibration data associated with the Sensor named in the Sensor block in which the token appears. Unlike Data_Quality, Start_Azimuthal_Angle, Stop_Azimuthal_Angle, and Pitch_Angle, there will be one Calibration block defined for each calibration data set defined for the virtual instrument (IDFS data source) in question.
    Mode_Start_Time        IDFS          token which defines the start time for the instrument status data associated with the exported data sample
    Mode_Stop_Time         IDFS          token which defines the stop time for the instrument status data associated with the exported data sample
    Mode                   IDFS          token which identifies the data as the instrument status or mode data. The instrument status data is defined for the virtual instrument (IDFS data source) in question; therefore, this data type is not associated with any particular sensor.
    Name                   IDFS and SCF  token which identifies or gives a name to the data parameter being exported
    Unit                   IDFS and SCF  token which describes the units that the data values are expressed in for the data parameter being exported
    Data_Length            IDFS and SCF  token which defines the number of data values returned for the data parameter being exported
    Values                 IDFS and SCF  token which identifies the actual data values that are being returned for the data parameter being exported
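
To show how the tags might be consumed, the following Python sketch reads sensor values from a small, hand-written XML fragment using the standard xml.etree.ElementTree module. The fragment is hypothetical: the nesting of the elements, the unit string and the whitespace-separated encoding of Values are assumptions made for illustration and should be checked against a file actually produced by exportIDFS. Only the tag names come from the table above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real file layout may differ.
SAMPLE = """\
<Idfs_Parameters>
  <Data_Set>
    <Number>0</Number>
    <Sensor>
      <Name>Retard/Pk 3</Name>
      <Unit>counts</Unit>
      <Data_Length>3</Data_Length>
      <Values>1.0 2.5 4.0</Values>
    </Sensor>
  </Data_Set>
</Idfs_Parameters>
"""


def read_sensors(xml_text: str) -> list[tuple[int, str, list[float]]]:
    """Return (sample number, sensor name, values) for every Sensor block."""
    root = ET.fromstring(xml_text)
    rows = []
    for data_set in root.iter("Data_Set"):
        number = int(data_set.findtext("Number", default="0"))
        for sensor in data_set.iter("Sensor"):
            # findtext looks at direct children only, so a Name nested in
            # a secondary block would not be mistaken for the sensor name.
            name = sensor.findtext("Name", default="").strip()
            values = [float(v) for v in sensor.findtext("Values", default="").split()]
            rows.append((number, name, values))
    return rows


print(read_sensors(SAMPLE))
```

The same approach should extend to SCF files by iterating over Data_Item blocks instead of Sensor blocks, again assuming a comparable nesting.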

Two XSLT stylesheets have been developed as examples of how to extract the data from the XML-formatted file. Both examples generate HTML code to display the data in tabular format. The first stylesheet, IDFS.xsl, can be used to process exported IDFS sensor data. The second stylesheet, SCF.xsl, can be used to process exported SCF output variables.

To test these two stylesheets, an XSLT processor was needed. The principal role of an XSLT processor is to apply an XSLT stylesheet to an XML source file and produce a result "document". The XSLT processor utilized for the testing of the stylesheets was Saxon. Saxon is an open source XSLT processor developed by Michael Kay. It is a Java application and can be run directly from the command prompt; no web server or browser is required. The HTML source generated by the stylesheets is simply directed to standard output by Saxon. At the command line, standard output was redirected to a file and that file was viewed through a browser for validation. The user is referred to the write-up for Instant Saxon for more information on this XSLT processor.

The remainder of this document gives an in-depth explanation of the options that appear on the various GUIs utilized by the exportIDFS program.


Data Source

The user must select a project, satellite, experiment, instrument and virtual instrument from which data is to be extracted and exported. To change any of the selected options, click on the buttons on the right hand side. Note that all lineage information under the branch being changed is no longer applicable and must be re-selected. When the IDFS data source is changed, any previous data item definitions are deleted from the list and must be re-defined.

Data Items

To add a data item to the list, the pull-down Insertion menu is utilized. The menu options indicate the position within the list at which the current data item definition is to be inserted. These options include:

  1. After
  2. Before
  3. First
  4. Last

The first two options, After and Before, indicate a position that is relative to the highlighted entry on the list. The new parameter definition is placed either after or before the current position in the list, respectively. The last two options, First and Last, indicate an absolute position on the list; that is, the new parameter definition is placed either at the beginning of the list or at the end of the list, respectively. These options are meaningful only for a non-empty list; the first data item definition is always placed at the beginning of the list, regardless of the option selected. The position is utilized when the data is extracted and processed; that is, the data is processed in the order in which it exists on the list.
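
The effect of the four insertion options can be modeled with an ordinary Python list. This is a sketch of the positioning rule only; the function name insert_item and the use of a zero-based index for the highlighted entry are assumptions.

```python
def insert_item(items: list[str], new_item: str, option: str, highlighted: int = 0) -> int:
    """Insert new_item into items in place and return the index it received.

    option is one of "After", "Before", "First" or "Last"; highlighted is
    the zero-based index of the highlighted entry and is used only by the
    two relative options.
    """
    if option not in ("After", "Before", "First", "Last"):
        raise ValueError(f"unknown insertion option: {option!r}")
    if not items:
        position = 0  # the first definition always starts the list
    elif option == "First":
        position = 0
    elif option == "Last":
        position = len(items)
    else:
        if not 0 <= highlighted < len(items):
            raise IndexError("highlighted entry is not on the list")
        position = highlighted if option == "Before" else highlighted + 1
    items.insert(position, new_item)
    return position
```

Starting from an empty list, inserting "A" with any option gives ["A"]; inserting "B" After entry 0 gives ["A", "B"]; inserting "C" Before entry 1 gives ["A", "C", "B"]. The data would then be processed in that order.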

To delete a data item from the list, the pull-down Removal menu is utilized. Currently, this pull-down menu contains just one option.

When this option is selected, the highlighted entry on the list is removed from the list. If no entry is highlighted, no action is taken.

The "Attributes" button invokes the Data Attributes GUI. The "Binning" button invokes the Bins GUI.

Data Attributes

The primary data sources returned by the selected IDFS source are presented in two lists entitled Sensor Group and Sensor. The PIDF file utilizes these two groupings to allow an additional level of subdivision within the primary data sources. This scheme is useful when the IDFS source contains a large number of primary data sources representing a diverse set of measurements.

Data Packaging

CDF Global Attributes

When the CDF file format is selected, a CDF file is created which contains the requested data items and meta data. The meta data is comprised of global-scope attributes that provide information about the data set as an entity. Some of the required global attributes have been selected for potential modification by the user. The values for these global-scope attributes are defaulted by the exportIDFS program. The user need not concern themselves with this information unless a change in the meta data is desired. A brief explanation of the options is given below. In all cases where a list is utilized, the list of options that are selectable are defined according to CDF documentation.

Time

In order to set the time values, enter the values in the boxes that appear next to the time component being set or use the increment / decrement arrows. The stop time must be greater than the start time. The time is initially set to the current time. By Julian convention, January 1 is day 1.
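
The two rules above (day-of-year numbering that starts at 1, and a stop time later than the start time) can be checked with the Python standard library. The sketch assumes that a time is entered as year, day of year, hour, minute and second; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def make_time(year: int, day_of_year: int, hour: int = 0,
              minute: int = 0, second: int = 0) -> datetime:
    """Build a datetime from a year and a day of year (January 1 is day 1)."""
    if not (0 <= hour < 24 and 0 <= minute < 60 and 0 <= second < 60):
        raise ValueError("hour, minute or second is out of range")
    moment = datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1, hours=hour, minutes=minute, seconds=second
    )
    if day_of_year < 1 or moment.year != year:
        raise ValueError("day of year is out of range for this year")
    return moment


def check_time_range(start: datetime, stop: datetime) -> None:
    """Raise ValueError unless the stop time is later than the start time."""
    if stop <= start:
        raise ValueError("the stop time must be greater than the start time")


start = make_time(2002, 149)           # day 149 of 2002 is May 29
stop = make_time(2002, 149, hour=6)
check_time_range(start, stop)
print(start, stop)
```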

File Button

Action

Exports the selected IDFS data items or SCF output variables for the selected time range.