Binning IDFS data

Overview
Binning Schemes
Manipulation of Data Buffers
1. Data Gaps
2. Missing Data

Overview

IDFS data sets are classified as either scalar or vector data sets. A scalar data set returns scalar quantities; that is, data values that are dependent only upon time and position. A vector data set returns one-dimensional data items that have a functional dependence on a single variable, which in IDFS terminology is called the scan variable. The length of the vector is dictated by the IDFS data source. The instrument returning the vector data set may return data values that span the whole scan range or may return data values that encompass a subset of the scan range. Since the number of data values returned can change from sweep to sweep, a data binning scheme was defined. The binning scheme is utilized when IDFS data sweeps are averaged together. The binning scheme defines the size and spacing of the data buffers that are returned by the IDFS averaging routines.

The Bins GUI presents the options that define the binning scheme for the selected IDFS data source. The values for these options are initialized using the binning information specified in the PIDF file for the selected data parameter. The options on this GUI apply to instruments that return vector data sets. For instruments that return scalar quantities, there is no binning scheme to define since only one value is returned; the data buffers therefore consist of a single data bin.

Binning Schemes

Data bins can be created using one of two formats, either Fixed or Variable. With a Fixed format, the element swp_len from the VIDF file is used to determine the number of data bins. The number of bins cannot be changed in this format, but the spacing of the bins can.

With a Variable format, the following information must be defined:

the number of bins to create
the spacing of the bins
the start value associated with the first bin
the stop value associated with the last bin
the storage scheme to use for the data
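
As a minimal sketch, the first four settings above could be turned into a list of center scan values as follows. The function name bin_centers and its parameters are hypothetical, and the rule that centers are evenly spaced (in the scan value for Linear, in its base-10 logarithm for Log) is an assumption; this document does not state how the IDFS routines place the centers between the start and stop values.

```python
import math


def bin_centers(num_bins, start_center, stop_center, spacing="linear"):
    """Return num_bins center scan values from start_center to stop_center.

    Assumption: centers are evenly spaced in the scan value ("linear") or
    in log10 of the scan value ("log").
    """
    if num_bins < 1:
        raise ValueError("number of bins must be greater than or equal to one")
    if spacing not in ("linear", "log"):
        raise ValueError("spacing must be 'linear' or 'log'")
    if num_bins == 1:
        return [float(start_center)]
    if spacing == "linear":
        step = (stop_center - start_center) / (num_bins - 1)
        return [start_center + i * step for i in range(num_bins)]
    if start_center <= 0 or stop_center <= 0:
        raise ValueError("log spacing requires positive scan values")
    lo, hi = math.log10(start_center), math.log10(stop_center)
    step = (hi - lo) / (num_bins - 1)
    return [10 ** (lo + i * step) for i in range(num_bins)]


linear_centers = bin_centers(5, 0.0, 8.0, "linear")
log_centers = bin_centers(3, 1.0, 100.0, "log")
```

Here linear_centers runs from 0 to 8 in equal steps, while log_centers spans two decades with one center per decade.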

The figure below shows how these values interact for Variable-format bins.

[Bins Figure]

Manipulation of Data Buffers

For those applications that allow data buffer manipulations, the Bins GUI also contains options that specify how data gaps are to be handled and how missing data values within a data buffer are to be resolved.

The remainder of this document gives an in-depth explanation of the options that appear on the Bins GUI.

Binning Formats

Bin Calculation defines the format used to determine the number of bins to utilize for the binning of the data. The formats defined are:

Fixed
Variable

Number Of Bins

# of Bins defines the number of bins to utilize for the binning of the data. This value must be greater than or equal to one. This option is applicable only to the Variable Format binning scheme.

Bin Spacing for Variable Format

Bin Spacing defines the spacing for the data bins for Variable-format bins. The spacing options defined are:

Linear
Log

Linear spacing defines a scheme where the lower (upper) edge of a bin's band is determined by subtracting (adding) one-half of the difference between two successive center values from (to) the center value. Log spacing uses the same algorithm, applied to the logarithms of the center values.
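
The edge rule described above can be sketched as follows. The function name bin_edges is hypothetical. Two details are assumptions that this document does not specify: the first and last bins reuse the spacing to their single neighbor, and Log spacing uses base-10 logarithms.

```python
import math


def bin_edges(centers, spacing="linear"):
    """Return a (lower, upper) edge pair for each center value.

    Each edge lies half the distance to the neighboring center away from
    the bin's own center; for "log" the arithmetic is done on log10 values.
    """
    if len(centers) < 2:
        raise ValueError("at least two center values are needed")
    if spacing not in ("linear", "log"):
        raise ValueError("spacing must be 'linear' or 'log'")
    use_log = spacing == "log"
    c = [math.log10(v) for v in centers] if use_log else list(centers)
    last = len(c) - 1
    edges = []
    for i in range(len(c)):
        # Assumption: end bins mirror the spacing to their only neighbor.
        below = c[i] - c[i - 1] if i > 0 else c[i + 1] - c[i]
        above = c[i + 1] - c[i] if i < last else c[i] - c[i - 1]
        lo, hi = c[i] - below / 2, c[i] + above / 2
        edges.append((10 ** lo, 10 ** hi) if use_log else (lo, hi))
    return edges


linear_edges = bin_edges([1.0, 2.0, 4.0], "linear")
log_edges = bin_edges([1.0, 10.0, 100.0], "log")
```

With unevenly spaced centers such as 1, 2, 4, the middle bin is asymmetric about its center because its lower and upper neighbors are at different distances.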

Center of Start Bin Value

Center of Start Bin defines the center scan value associated with the first bin. This option is applicable only to the Variable Format binning scheme.

Center of Stop Bin Value

Center of Stop Bin defines the center scan value associated with the last bin. This option is applicable only to the Variable Format binning scheme.

Data Storage

Input Storage defines the storage scheme for the binning of the data. This option is applicable only to the Variable Format binning scheme. The storage schemes defined are:

Point
Band

The data in a vector data set are taken as a function of a scan variable (M). If M is allowed to vary over the individual measurement period or if M actually represents a bandwidth, then each element in the vector can be considered to have been accumulated within the interval M - delta1 to M + delta2. Vector data is binned by M.

If Point storage is selected, the data is stored by the scan value associated with the data. If the scan value is located between the upper and lower edge values of a given bin, the data value is placed only in this bin.

If Band storage is selected, the data is placed in all bins which are fully or partially contained within the range M - delta1 to M + delta2. The data is multiplied by the percentage of the bin covered by the range before the data is placed into the bin.
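
The difference between the two storage schemes can be sketched as follows. The function names store_point and store_band are hypothetical. The sketch assumes that values are accumulated into the bins by addition, that bin ranges are half-open (lower edge included, upper edge excluded), and that the "percentage of the bin covered" is the overlap length divided by the bin width; none of these details is confirmed by this document.

```python
def store_point(bins, edges, scan_value, data_value):
    """Point storage: add data_value to the single bin containing scan_value.

    Returns the index of the bin used, or None if no bin contains the value.
    """
    for i, (lo, hi) in enumerate(edges):
        if lo <= scan_value < hi:  # assumption: half-open bin ranges
            bins[i] += data_value
            return i
    return None


def store_band(bins, edges, m, delta1, delta2, data_value):
    """Band storage: spread data_value over every bin the band overlaps.

    Each contribution is weighted by the fraction of that bin covered by
    the range m - delta1 to m + delta2.
    """
    band_lo, band_hi = m - delta1, m + delta2
    for i, (lo, hi) in enumerate(edges):
        overlap = min(hi, band_hi) - max(lo, band_lo)
        if overlap > 0 and hi > lo:
            bins[i] += data_value * overlap / (hi - lo)


edges = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]

point_bins = [0.0, 0.0, 0.0]
point_index = store_point(point_bins, edges, 3.0, 10.0)

band_bins = [0.0, 0.0, 0.0]
store_band(band_bins, edges, 3.0, 2.0, 1.0, 8.0)  # band covers 1.0 to 4.0
```

In the Band case the range 1.0 to 4.0 covers half of the first bin and all of the second, so the first bin receives half of the data value and the second receives the full value.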

Bin Spacing for Fixed Format

Bin Spacing defines the spacing for the data bins for Fixed-format bins. The spacing options defined are:

Zero
Linear
Log
Variable

Each data bin is associated with a range (band) that covers a subset of the scan range defined for the vector data set. Zero spacing defines a scheme where the lower edge of the band is the same as the upper edge of the band; that is, both band edge values are the same as the center value.

Variable width spacing defines a scheme which makes use of tables defined in the VIDF file to create the center scan values and the band correction values. These band correction values are applied to the center scan values in order to calculate the band width values.

Data Gaps

Bins are filled with data until the acquisition period has been completed. If the instrument returns only a subset of values for each data sweep, bins that contained data in one acquisition period may be empty in the next. The Interleave Missing Data option is used to preserve the contents of the bins from one acquisition period to the next. When enabled, the previous contents of the data bins are left in place and only those bins for which current data has been retrieved are updated; otherwise, all data bins are zeroed out prior to each data acquisition period. The following figure illustrates this concept.

[Interleave Figure]
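
The effect of the Interleave Missing Data option can also be sketched in code. The function names and the representation of a sweep as a mapping from bin index to value are hypothetical and chosen only for illustration.

```python
def start_acquisition(previous_bins, interleave):
    """Return the bin buffer to use at the start of a new acquisition period.

    With interleave enabled the previous contents are carried over;
    otherwise every bin is zeroed.
    """
    if interleave:
        return list(previous_bins)
    return [0.0] * len(previous_bins)


def apply_sweep(bins, sweep):
    """Update only the bins for which the current sweep returned data."""
    for index, value in sweep.items():
        bins[index] = value
    return bins


previous = [1.0, 2.0, 3.0, 4.0]
sweep = {0: 9.0, 1: 8.0}  # the instrument returned only the first two bins

interleaved = apply_sweep(start_acquisition(previous, True), sweep)
zeroed = apply_sweep(start_acquisition(previous, False), sweep)
```

With interleaving the last two bins keep their earlier values; without it they are reset to zero.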

Missing Data Values

When a subset of the scan range is returned by the vector data set, some of the data bins may be empty. Manipulation of the empty data bins can be specified using the Fill Missing Bins By option. The schemes include:

No Fill
Linear Row/Col
Linear Col/Row
Constant Row/Col
Constant Col/Row
2-D Least Squares Fit

When filling in missing data values, be aware that for some of the instruments, the binning of the data occurs within a two-dimensional set of bins. In this 2-D binning matrix, the columns represent the data bins and the rows represent phi or azimuthal bins. If the sensor measurements are independent of phi, the binning of the data is only one-dimensional; otherwise, the binning is two-dimensional.

For 1-D and 2-D binning, missing or unfilled bins can be filled in one of two ways. With Linear, values are linearly interpolated across the holes using the values defined at the adjacent bins. With Constant, the data in the adjacent bins is projected inward across the area of missing bins, with the two projections meeting in the center of the gap. For 2-D binning, the fill can occur either first along the columns and then along the rows (Col/Row), or first along the rows and then along the columns (Row/Col).
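
A minimal 1-D sketch of the Linear and Constant fills is given below. The function name fill_gaps is hypothetical and empty bins are represented by None. Two choices are assumptions: interpolation is done in bin index rather than in scan value, and for a gap with an odd number of bins the exact center bin takes the value from the left neighbor. For 2-D binning the same routine could be applied along each row and then each column, or the reverse.

```python
def fill_gaps(line, method="linear"):
    """Fill runs of empty (None) bins that lie between two filled bins.

    Empty bins before the first or after the last filled bin are left as-is.
    """
    if method not in ("linear", "constant"):
        raise ValueError("method must be 'linear' or 'constant'")
    out = list(line)
    filled = [i for i, v in enumerate(line) if v is not None]
    for left, right in zip(filled, filled[1:]):
        for i in range(left + 1, right):
            if method == "linear":
                t = (i - left) / (right - left)
                out[i] = line[left] + t * (line[right] - line[left])
            else:
                # Assumption: a tie at the exact center goes to the left value.
                nearer_left = (i - left) <= (right - i)
                out[i] = line[left] if nearer_left else line[right]
    return out


linear_fill = fill_gaps([1.0, None, None, 4.0], "linear")
constant_fill = fill_gaps([1.0, None, None, None, 5.0], "constant")
edge_fill = fill_gaps([None, 2.0, None, 4.0, None], "linear")
```

Note that edge_fill still has empty bins at both ends, since those bins have a filled neighbor on one side only.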

In addition to the fill method, some additional information is needed. Cyclic indicates whether the data is cyclic with respect to angle values. Bin Projection indicates whether the data is to be projected into the unfilled bins that lie below the first and above the last data bin found to contain data. Need Filled specifies the number of filled data bins required within a column or row before the missing data bins along that same column or row are filled in. The minimum value allowed is three.
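
As a rough sketch, the Need Filled and Bin Projection settings could be applied to a single row or column as shown below. The function names are hypothetical and empty bins are represented by None. The reading of Bin Projection as copying the first and last filled values outward into the empty end bins is an assumption, and the Cyclic setting is not modeled here.

```python
def enough_filled(line, need_filled=3):
    """Return True if the row or column holds at least need_filled values."""
    if need_filled < 3:
        raise ValueError("Need Filled must be at least three")
    return sum(v is not None for v in line) >= need_filled


def project_ends(line):
    """Copy the first and last filled values outward into empty end bins.

    Assumption: this is what Bin Projection does; interior gaps are untouched.
    """
    filled = [i for i, v in enumerate(line) if v is not None]
    if not filled:
        return list(line)
    first, last = filled[0], filled[-1]
    out = list(line)
    out[:first] = [line[first]] * first
    out[last + 1:] = [line[last]] * (len(line) - last - 1)
    return out


sparse_ok = enough_filled([None, 1.0, 2.0, None])
dense_ok = enough_filled([1.0, None, 2.0, 3.0])
projected = project_ends([None, None, 2.0, None, 5.0, None])
```

In this example the sparse line has only two filled bins and so would not be filled, while the projected line has its end bins populated and its interior gap left for the selected fill method.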

The 2-D Least Squares Fit fill method is selectable only for 2-D data binning. If this method is selected, a Tension value must be specified. This value indicates the weight factor for the data. This value must be given in terms of r^-tension. In addition, the Order of the fit must be specified. Note that when the Order value is modified, the Need Filled value is also updated since the minimum value when using a 2-D Least Squares Fit is order + 2.

written by Carrie A. Gonzalez
CGonzalez@swri.org
