The DB System
The DB system is the collective name for the programs and supporting structures used to access and manipulate data formatted in CPD2 style. Information on the programs and variables used with data formatted in CPD3 style is available in the CPD3 command summary below.
For solutions to common tasks see the CPD2 FAQ and CPD3 FAQ.
Data in the DB system is exchanged in the form of records consisting of variables. CPD3 uses a different internal approach to representing data, although many of the CPD2 names of records and variables can still be used (CPD3 data structures).
Data Access
- data.avg - Average, fill, and/or rectangularize data.
- data.consolidate - Access data by individual variables.
- data.consolidate.station - Access data by individual variables from multiple stations.
- data.get - The primary interface used to access data.
- xt2 - Shorthand data extraction.
Intensive Data Generation
- data.aggregate.ccnfit - Calculate CCN power law fits.
- data.aggregate.humidograph - Generate humidograph f(RH) curve fits.
- data.aggregate.intensives - Generate intensive parameter records.
- data.sd.cumulative.calc - Calculate cumulative integral properties from a size distribution.
- data.sd.optical.calc - Calculate optical scattering/absorption from a size distribution.
Data Export
- data.export - Reformat and rectangularize data for use in other programs.
- data.aggregate.ebas - Generate NASA-Ames formatted data for WDCA/EBAS.
- data.aggregate.cpd1 - Generate cpd1 style records.
- data.aggregate.ames - Generate NASA-Ames formatted data.
- data.aggregate.narsto - Generate NARSTO formatted data.
- data.demultiplex - Extract records from a multi-record multiplexed stream.
- data.multiplex - Multiplex data from multiple sources into a single stream.
Secondary Data Interfaces
- data.confhistory - Generate a report of CPD2 conf changes.
- data.confget - Fetch the CPD2 conf for a given time.
- data.coverage - Generate a report with estimated data coverage.
- data.coverage.plots - Generate monthly coverage plots.
- data.coverage.wdca - Generate a report of WDCA submitted coverage.
- data.faultreport - Generate a report about detected data faults.
- data.flagrepass - Flag data as needing to be repassed.
- data.flight - Get flight information (start and end).
- data.lost - Mark data as permanently unavailable.
- data.status - Generate a report of data processing status.
Comment System
Comments are arbitrary strings associated with an archive, a type, and a time range. Most types are allowed to overlap. Non-overlapping comments are generated automatically when data is (re)processed and passed.
- data.comments.get - Retrieve comments for a station.
- data.comments.modify - Modify station comments.
Segmentation System
Segments are single line strings associated with a type and time range. Segments are non-overlapping.
- data.instruments.get - View filter for instrument information segments.
- data.instruments.locate - Generate a location report for instruments.
- data.segmentation.avg - Generate averaged data based on segments.
- data.segmentation.get - Retrieve segments for a station.
- data.segmentation.modify - Modify station segments.
- data.segmentation.neph - Generate neph error segments.
- data.segmentation.spikefilter - Generate segments based on a spike filter.
Data Storage Interfaces
- data.archive.get - Get data from a DB archive.
- data.archive.put - Update data in a DB archive.
- data.newstation - Create required structures for a new station.
- data.newstation.sqldb - Create required tables for SQL data export.
Editing and Correction System
- data.edit.contamfilter - Remove contaminated values from data.
- data.edit.dilution - Generate or apply dilution corrections to data.
- data.edit.duplicate - Duplicate an instrument in the data stream.
- data.edit.get - Generate edited data.
- data.edit.mentor.edcum - Synchronize the edit database with the ed_cum file.
- data.edit.mentor.generate - Generate suggested mentor edits.
- data.edit.mentor.generate.nepherrors - Generate suggested mentor edits based on neph errors.
- data.edit.mentor.generate.spikefilter - Generate suggested mentor edits based on spike detection.
- data.edit.mentor.get - Get defined mentor edits.
- data.edit.mentor.modify - Modify mentor edits.
- data.edit.mergewx - Merge external MET data into the data stream.
- data.edit.psap_recalculatetr - Recalculate transmittances at higher precision.
- data.edit.wl - Adjust scattering or absorption data to different wavelengths.
- data.wx.pass - Pass MET data into the clean archive.
- data.ozone.pass - Pass ozone data into the clean archive.
- data.aethalometer.pass - Pass aethalometer data into the clean archive.
Correction Chain
These programs are normally called by data.edit.corr but they can function as part of an external pipeline, provided they are called with the correct arguments.
- data.edit.corr.aeth_calc_abs - Calculate absorption for the Aethalometer.
- data.edit.corr.aeth_schmid_coen - Apply the Schmid/Coen correction to Aethalometer data.
- data.edit.corr.aeth_stp - Adjust aethalometer data to STP.
- data.edit.corr.aeth_weingartner - Apply the Weingartner correction to Aethalometer data.
- data.edit.corr.cut_blank - Enforce data blanking after cut switches.
- data.edit.corr.field - Apply a general Perl expression to the data.
- data.edit.corr.neph_stp - Correct neph data to STP.
- data.edit.corr.neph_trunc - Apply the neph truncation correction.
- data.edit.corr.ozone_env - Add ozone envelope information.
- data.edit.corr.psap_bond - Apply the Bond correction to PSAP data.
- data.edit.corr.psap_cal - Apply a flow change or spot size correction to PSAP data.
- data.edit.corr.psap_oldflags - Handle flags for translated PSAP data.
- data.edit.corr.psap_weiss_apply - Apply the Weiss correction to PSAP data.
- data.edit.corr.psap_weiss_undo - Undo the Weiss correction to PSAP data.
- data.edit.corr.scale - Apply a scale, offset, or polynomial to fields in data.
Internal
These programs are not usable outside of being called by data.edit.get.
- data.edit.addflagsvariable - Add a flags variable to data.
- data.edit.corr - Apply the correction chain to the data stream.
- data.edit.cut_normalize - Ensure equal numbers of samples on both cut sizes.
- data.edit.mentor - Apply mentor edits to the data stream.
- data.edit.standard_corr - Apply a simple standard correction sequence.
System Utilities
- data.aggregate.allstations - Run aggregation jobs for all stations.
- data.aggregate.cpd1ftp - Update cpd1 style zip files on the FTP server.
- data.aggregate.cpd1qtr - Update cpd1 style zip files of raw interval data.
- data.aggregate.nephraw - Generate raw TSI neph data.
- data.aggregate.nilu.archive - Upload archived data to NILU for EBAS import.
- data.aggregate.nilu.nrt - Upload NRT data to the NILU archive.
- data.aggregate.ozonestats - Generate ozone statistical envelopes.
- data.aggregate.publishnotify - Send an email when data is ready to be published.
- data.aggregate.upload - Generate and upload data to remote archives.
- data.aggregate.sqldb - Update data in an SQL database.
- data.aggregate.station - Run aggregation jobs for a given station.
- data.aggregate.tostnnew - Generate data for stnnew to process.
- data.avg.psap_bond - Average PSAP data from transmittances, with the Bond correction applied after averaging.
- data.cache.clean - Clean the cache for a given station.
- data.cache.cleanall - Clean the caches for all stations.
- data.cache.invalidate - Invalidate cached data for a station.
- data.ebas.get - Import data from the EBAS archive.
- data.flight.process - Run flight processing for a station.
- data.forallstations - Run a handler for all stations.
- data.fortimebins - Run a handler for time bins within a range.
- data.legacy.cpd1 - Converter for CPD1 style data to CPD2 style.
- data.legacy.cpd1nk - Converter for CPD1 spancheck data to CPD2 style.
- data.legacy.frh - Converter for CPD1 f(RH) fit data to CPD2 style.
- data.legacy.get - Access and convert legacy data.
- data.legacy.stripchart - Converter for old stripchart transcriptions to CPD2 style.
- data.localdata.get - Access locally stored station data.
- data.lowpass - Apply a digital filter to data.
- data.plots.update - Generate/update plots for a station.
- data.process.allmail - Mail notices for all stations.
- data.process.allstations - Process new data for all stations.
- data.process.ingest.aeth - Ingest raw Aethalometer data.
- data.process.ingest.avg_psap3w - Ingest raw PSAP-3W data and average the output.
- data.process.ingest.raw_clap3w - Ingest raw data from the CLAP-3W and output CPD2 style data.
- data.process.ingest.tsi_neph - Ingest raw data from the TSI neph and output CPD2 style data.
- data.process.mail - Mail notices for a given station.
- data.process.mail.checkclean - Generate a report on out-of-date clean data.
- data.process.mail.cpdconfchanges - Generate a report for cpd.conf changes.
- data.process.mail.getcomments - Extract monitor comments into a report.
- data.process.mail.nephstatus - Report any new neph status events.
- data.process.mail.notifyloss - Report data loss.
- data.process.mail.psapstatus - Report any new PSAP/CLAP status events.
- data.process.new.printspanchecks - Report the results of any spanchecks.
- data.process.psap_int - Use PSAP/CLAP intensities/transmittance to generate lower frequency absorption data.
- data.process.psap_recalculate - Recalculate PSAP parameters for CTS.
- data.process.upload - Upload pending data.
- data.process.station - Process new station data.
- data.rrneph.get - Get RR neph data corrected for zeros from the TSI neph.
- data.sd.smps.get - Convert raw SMPS data into CPD2 format.
- data.sd.sems.get - Convert raw SEMS data into CPD2 format.
- data.sd.semsnew.get - Convert raw new SEMS data into CPD2 format.
- data.segmentation.allstations - Update segmentation for all stations.
- data.shift - Shift time and change stations of input data.
- data.update.avg - Update the pre-generated clean averages.
- data.update.daily - Run required daily tasks on the DB system.
- data.update.weekly - Run required weekly tasks on the DB system.
- data.wdca.ames.convert - Convert NASA-Ames data from the WDCA to CPD2 style.
- data.wx.get - Consolidate weather data from various input sources.
System Synchronization
These programs are found in $DB/bin/sync.
- data.sync.lockserver - Remote lock server to coordinate locks for synchronization.
- data.sync.station - Synchronize a station to or from a remote DB instance.
Configuration Files
- aggregate.conf - Controls the programs run by data.aggregate.station.
- ames/ingest.conf - Controls NASA-Ames ingest done by data.wdca.ames.convert.
- caching.conf - Controls data caching done by data.get.
- ccnfit.conf - Controls CCN fitting done by data.aggregate.ccnfit.
- comments.conf - Controls the behavior of system comments managed by data.comments.modify.
- contaminate.conf - Controls which variables are affected (generally not averaged) by contamination, see data.avg.
- corr.conf - Controls the correction chain programs for data.edit.corr.
- cpd2.send.conf - Controls CPD2 data sending and retrieval by data.localdata.get.
- cts.conf - Controls the CTS PSAP correction algorithm for data.edit.psap_cts.
- diagnostics.mail.conf - Defines the email recipients of the internal diagnostic results.
- dilution.conf - Defines the dilution setup used by data.edit.dilution.
- ebas.import.conf - Defines the import procedure to access data from EBAS with data.ebas.get.
- edit.filters.conf - Defines the editing and correction chain invoked by data.edit.get.
- faultreport.conf - Defines the rules checked by data.faultreport.
- get.sources.conf - Defines the available data sources.
- humidograph.conf - Controls the processing done by data.aggregate.humidograph.
- ignorecut.conf - Controls which variables are not affected by cut size (generally when averaging), see data.avg.
- instruments.conf - Defines the instruments at a station, used by various processing programs.
- intensives.conf - Defines the intensive records generated by data.aggregate.intensives.
- legacy.conf - Controls the legacy sources and times used by data.legacy.get.
- legacyheaders.conf - Defines headers for legacy data generated by data.legacy.cpd1.
- mentorgenerate.conf - Defines the programs called by data.edit.mentor.generate.
- niluarchive.conf - Controls data uploaded to the NILU WDCA archive by data.aggregate.nilu.archive.
- notify.conf - Defines who to send email to for various station events.
- nrt.conf - Controls the data sent by data.aggregate.nilu.nrt.
- plots.conf - Controls the overview plots generated by data.plots.update.
- upload.conf - Controls data uploaded by data.aggregate.upload.
- usernotify.conf - Defines per-user notification email lists.
- records.conf - Defines CPD1 and CPD2 records. Used in wildcard lookups (data.consolidate) and conversions (data.legacy.cpd1 and data.aggregate.cpd1).
- spikefilter.conf - Defines a spike filter used by data.segmentation.spikefilter.
- sqldb.conf - Controls data export to a MySQL database, used by data.aggregate.sqldb and data.newstation.sqldb.
- standard_corr.conf - Controls the corrections applied by data.edit.standard_corr.
- tostnnew.conf - Defines what incoming CPD2 logged data to export to the legacy stnnew processing chain.
- wx.conf - Defines the sources of data used by data.wx.get.
Configuration Directories
A configuration directory is a directory either in the default location, $DB/etc/$DIRECTORY, or in the station-specific one at $DB/etc/$STATION/$DIRECTORY, where "$DIRECTORY" is one of the names below. The exact override behavior varies with the specific directory, but generally the files in the station-specific directory take precedence over those in the default.
- ames - Controls the data generated by data.aggregate.ames.
- ebas - Controls the data generated by data.aggregate.ebas.
- narsto - Controls the data generated by data.aggregate.narsto.
- plots - Contains the XML files used by CPX2 when called by data.plots.update.
R Programs
These are programs that interface with R.
- cdf_plot - Generate a CDF plot for a single variable from one or more stations.
- filter_allan - Generate Allan plots and statistics for a filter based absorption instrument.
- normal_stats - Generate normal distribution related statistics about variables from a single station.
- quantile_stats - Generate quantile statistics for set of stations and variables.
See also the examples in /aer/prg/r/examples for programs that are not directly callable but can be copied and modified for other usage.
Other
- generate_psap_spots - Generate PSAP spot segmentation from cpd.ini.
- cnvt_Xme - Generate Xme MET data from me_ CPD1 files.
- cnvt_neph - Generate CPD2 data from raw TSI 3563 Neph logs.
- cnvt_iso - Convert timestamps in CSV files to ISO 8601 date-times.
- cpx2 - CPX2 data viewer.
- nephstat2 - Generate statistics for a run of data for a neph.
- colreorder - Reorder columns of CSV data based on their header names.
- timestamp_photo - Add EXIF timestamps to photo names.
- tdrecalc - (Re)calculate T and RH in a data stream.
- ingest_gmd_met_cr1000 - Documentation for the CR1000 ingest for GML met data.
CPD3 command summary
The internal structure of the database used by CPD3 is completely different from that used by CPD2, but many of the commands used by users are similar. In general, database commands for CPD3 begin with da., whereas the commands for CPD2 begin with data..
Complete documentation for CPD3 database commands will be published later. These pages are intended to get a new user of CPD3 started.
Documentation for all CPD3 database commands is available by using the --help argument to the command. Detailed information on a command argument is available using --help=arg, where arg is the name of the argument. For example, da.get --help=variables returns an explanation of the variables bareword argument. This is important information that belongs in the FAQ, but here it is.
The "variables" bareword argument sets the list of variables to read from the archive. This can be either a direct list of variables (such as "T_S11") or an alias for multiple variables (such as "S11a"). In the single-variable form, regular expressions may be used, as long as they match the entire variable name. The list is delimited by commas. If a specification includes a colon, then the part before the colon is treated as an override to the default archive. If it includes two colons, then the first field is treated as the station, the second as the archive, and the third as the actual variable specification. If it includes three or more colons, then the first three fields are treated as above and the trailing components specify flavor (e.g. PM1 or PM10) restrictions. Each restriction consists of an optional prefix and the flavor name. With no prefix, the flavor is required. With the prefix "!", the flavor is excluded; the results will never include it. With the prefix "=", the flavor is added to the list of flavors that any match must have exactly. For example:
- "T_S11" specifies the variable in the "default" station (set either explicitly on the command line or inferred from the current directory) and the "default" archive (either the "raw" archive or the one set on the command line).
- "raw:T_S1[12]" specifies the variables T_S11 and T_S12 from the raw archive on the "default" station.
- "brw:raw:S11a" specifies the "S11a" alias record for the station "brw" and the "raw" archive.
- ":avgh:T_S11:pm1:!stddev" specifies T_S11 from the "default" station hourly averages, restricted to only PM1 data but excluding any calculated standard deviations.
The string "everything" can also be used to retrieve all available variables.
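The colon-delimited grammar above can be made concrete with a small parser. This is a hypothetical sketch for illustration only; it is not the real da.get parser, and the `VariableSpec` type and field names are invented here. It follows the rules as stated: zero colons give a bare variable, one colon gives archive:variable, two give station:archive:variable, and any further fields are flavor restrictions with the "!" and "=" prefixes.

```python
# Hypothetical sketch of the da.get "variables" bareword grammar described
# above. VariableSpec and parse_spec are illustrative names, not the real API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VariableSpec:
    station: Optional[str] = None   # None means the "default" station
    archive: Optional[str] = None   # None means the "default" archive
    variable: str = ""              # variable name, alias, or regex
    required: List[str] = field(default_factory=list)  # no prefix
    excluded: List[str] = field(default_factory=list)  # "!" prefix
    exact: List[str] = field(default_factory=list)     # "=" prefix

def parse_spec(text: str) -> VariableSpec:
    parts = text.split(":")
    spec = VariableSpec()
    if len(parts) == 1:                    # "T_S11"
        spec.variable = parts[0]
    elif len(parts) == 2:                  # "raw:T_S1[12]"
        spec.archive, spec.variable = parts[0], parts[1]
    else:                                  # "brw:raw:S11a" or with flavors
        spec.station = parts[0] or None    # empty field -> default station
        spec.archive = parts[1] or None    # empty field -> default archive
        spec.variable = parts[2]
        for flavor in parts[3:]:           # trailing flavor restrictions
            if flavor.startswith("!"):
                spec.excluded.append(flavor[1:])
            elif flavor.startswith("="):
                spec.exact.append(flavor[1:])
            else:
                spec.required.append(flavor)
    return spec

def parse_variables(argument: str) -> List[VariableSpec]:
    """Parse the full comma-delimited variable list."""
    return [parse_spec(s) for s in argument.split(",")]
```

Running `parse_spec(":avgh:T_S11:pm1:!stddev")` yields the default station, the "avgh" archive, variable "T_S11", "pm1" required, and "stddev" excluded, matching the worked example above.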
Data Access
- da.get - Primary command to get data from an archive.
- da.avg - Generate averaged data.
- da.export - Convert CPD3 data to an external data format.
- da.select - Select data to get, with fine control over details.
- da.where - Select data based on conditional tests.
- cpd3messagelog - Get entries from the on-line message log.
- da.generate.edited - Generate edited data for a station, allowing use of different editing profiles to specify different sets of data or methods of generating edited data.