aggregate.conf
The configuration file used by data.aggregate.station. There are two versions. The first is located in $DB/etc/$STATION/aggregate.$STATION.conf, or in the default location $DB/etc/aggregate.conf. The second, used for slow mode, is located in $DB/etc/$STATION/aggregate.slow.$STATION.conf, or in the default location $DB/etc/aggregate.slow.conf. If the station-specific file does not exist, the default one is used.
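The lookup order above can be sketched as follows. This is a minimal illustration, not the data.aggregate.station code: the function name `config_path` and its `slow` parameter are hypothetical, and the sketch assumes the station directory name is the same station code used in the file name.

```python
import os


def config_path(db: str, station: str, slow: bool = False) -> str:
    """Return the aggregate configuration file to use for a station.

    Hypothetical helper: prefers $DB/etc/$STATION/<name>.$STATION.conf and
    falls back to $DB/etc/<name>.conf when the station-specific file is absent.
    """
    name = "aggregate.slow" if slow else "aggregate"
    specific = os.path.join(db, "etc", station, f"{name}.{station}.conf")
    if os.path.exists(specific):
        return specific
    # No station-specific file, so use the default one.
    return os.path.join(db, "etc", f"{name}.conf")
```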
The programs defined by the configuration file are called when a data source reports a change to its contents (including newly available data). They are called on intervals specified by the alignment and length. For example, a program with a one-day alignment and a one-day length is called once for each day (aligned to UTC midnight) in the entire update range. If the update range includes partial segments at either end, the program is called for the full aligned interval. In this example, if the update ended at UTC noon, the program would be called from UTC midnight of the ending day to UTC midnight of the next day.
Format
Lines beginning with '#' are treated as comments.
ARCHIVE,ALIGNMENT,LENGTH,FLAGS,PROGRAM
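A line in this format can be read with a small parser such as the one below. This is a sketch, not the tool's own parser: the document does not say whether PROGRAM may contain commas, so the sketch assumes it may and splits only on the first four commas; stripping whitespace around each field is also an assumption. `AggregateRule` and `parse_line` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class AggregateRule(NamedTuple):
    archive: str
    alignment: str
    length: Optional[str]  # None when the LENGTH field is left blank
    flags: List[str]
    program: str


def parse_line(line: str) -> Optional[AggregateRule]:
    """Parse one aggregate.conf line; return None for comments and blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    # maxsplit=4 keeps any commas inside PROGRAM intact (an assumption).
    fields = [f.strip() for f in line.strip().split(",", 4)]
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields: {line!r}")
    archive, alignment, length, flags, program = fields
    # FLAGS may be separated by spaces or semicolons.
    flag_list = [f for f in re.split(r"[;\s]+", flags) if f]
    return AggregateRule(archive, alignment, length or None, flag_list, program)
```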
ARCHIVE
The source to monitor for updates. These roughly correspond to the general data archives. Acceptable values are:
- raw - Raw data as added to the CPD2 archive storage format (does not include data processed by the stnnew legacy chain).
- rawnrt - Raw data, checked much more frequently. Unlike raw, this also includes data processed by stnnew.
- clean - Clean (passed) data.
- avgH - Hourly averaged clean data.
ALIGNMENT
The time alignment specifier. It consists of either an absolute alignment with an optional offset, or a time length (see below). Absolute alignments have an implicit length that is used if LENGTH is not set.
The available absolute alignments are:
- week - Weekly with Monday being the first day of the week.
- month - Monthly.
- quarter - Standard quarterly (starts at DOY 1, 91, 182, and 274).
- year - Yearly.
Both the offset and the time length use the syntax given under Time Length Specification below. After an absolute alignment, the offset is introduced by either “+” to add it to the alignment base or “-” to subtract it. When a time length is used as the alignment, it is interpreted as a positive offset from the day boundary.
Some examples:
- week+1d - Weekly, starting on Tuesday.
- month+12:00:00 - Monthly, starting at UTC noon.
- year-1h - Yearly, starting at 23:00:00 UTC on December 31 (one hour before the year boundary).
- 1d - Daily.
- 6h - Every six hours.
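How an absolute alignment and an offset combine can be sketched as below. This is an illustration, not the tool's implementation: it covers only week (Monday start), month and year, takes the offset as an already-parsed `timedelta`, and uses hypothetical function names. Shifting the time by the offset before aligning moves every segment boundary by the same amount, so the returned segment always contains the given time.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def alignment_base(t: datetime, unit: str) -> datetime:
    """Start (UTC midnight) of the week, month or year containing t."""
    day = t.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "week":
        # Monday is the first day of the week (weekday() == 0).
        return day - timedelta(days=day.weekday())
    if unit == "month":
        return day.replace(day=1)
    if unit == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported alignment: {unit}")


def next_base(base: datetime, unit: str) -> datetime:
    """The following boundary; the gap to it is the implicit segment length."""
    if unit == "week":
        return base + timedelta(days=7)
    if unit == "month":
        # December rolls over into January of the next year.
        return base.replace(year=base.year + base.month // 12,
                            month=base.month % 12 + 1)
    if unit == "year":
        return base.replace(year=base.year + 1)
    raise ValueError(f"unsupported alignment: {unit}")


def aligned_segment(t: datetime, unit: str,
                    offset: timedelta = timedelta(0)) -> Tuple[datetime, datetime]:
    """Segment [start, end) containing t for an alignment such as "week+1d"."""
    base = alignment_base(t - offset, unit)
    return base + offset, next_base(base, unit) + offset
```

For example, noon on Tuesday 2007-04-03 under “week+1d” falls in the segment from 2007-04-03 00:00 to 2007-04-10 00:00 UTC, and under “year-1h” the segment containing 2006-12-31 23:30 runs from 2006-12-31 23:00 to 2007-12-31 23:00.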
LENGTH
The time length used to segment program calls. This is optional when using an absolute alignment as above (that is, “week” has an implicit length of seven days and “month” has the length of the month in question) and required when the alignment is given as a time length.
Although it is permitted to specify a length that leaves gaps between the aggregate segment calls, doing so is generally not advisable. Similarly, a length longer than the real length of the aligned segments causes multiple calls to cover the same data range. In most cases, therefore, the length should be omitted for absolute alignments and set equal to the alignment length otherwise: an alignment of “month” should leave it blank, while one of “1d” should set it to “1d”.
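The effect of a mismatched length can be made concrete by counting how many calls cover a given instant. This sketch assumes that, for an alignment given as a time length, segments are [k·step, k·step + length) in epoch seconds; `calls_covering` is a hypothetical name. A count of 0 means the instant falls in a gap, and a count above 1 means overlapping calls.

```python
def calls_covering(t: int, step: int, length: int) -> int:
    """Number of segments [k*step, k*step + length) that contain time t."""
    # k*step <= t            ->  k <= t // step
    # t < k*step + length    ->  k >= (t - length) // step + 1
    k_max = t // step
    k_min = (t - length) // step + 1
    return max(0, k_max - k_min + 1)
```

With a “1d” alignment, a “1d” length covers every instant exactly once, a “2d” length covers each instant twice, and a “12h” length leaves the second half of each day uncovered.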
FLAGS
A list of zero or more of the following flags, separated by spaces or semicolons:
- complete - Do not run this program until a segment has been completed. For example, a week segment is not run until data from the following week has been seen, rather than potentially each time data within that week is seen.
- nice - Run the handler with “nice -n 10 …”, that is, at lower than normal priority. Recommended for long-running, computationally intensive jobs (for example, generating the entire station time series).
- excludehost:pattern - Do not run this handler on any host that matches the given Perl regular expression, wrapped as “/^$pattern$/i” (anchored at both ends and case-insensitive).
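The excludehost check can be approximated in Python as below. This is only an approximation: Python's `re` and Perl regular expressions agree for simple patterns but differ in places, and the document does not say whether the short or fully qualified host name is compared. The function name `host_excluded` is hypothetical. The pattern is interpolated without extra grouping to mirror the documented Perl wrapping.

```python
import re
from typing import List


def host_excluded(flags: List[str], hostname: str) -> bool:
    """True if any excludehost:pattern flag matches hostname.

    Mirrors the Perl match /^$pattern$/i using Python's re module.
    """
    prefix = "excludehost:"
    for flag in flags:
        if flag.startswith(prefix):
            pattern = flag[len(prefix):]
            if re.search(f"^{pattern}$", hostname, re.IGNORECASE):
                return True
    return False
```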
PROGRAM
The program is called as specified, with four additional arguments appended to the end of its command line. These are, in order, the station code, the start of the segment, the end of the segment, and the archive.
For example, a handler listed as “do.stuff --arg1 --arg2” in the configuration file might be called as “do.stuff --arg1 --arg2 brw 1175558400 1175644800 raw”.
Additionally, the environment variables STATION, START, END, and ARCHIVE are set before the handler is called, and they are expanded if they appear in the handler's command line. For example, a handler of “do.stuff --start=$START” would have “--start=$START” expanded to “--start=1175558400”.
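The invocation described above can be sketched as follows. The start and end values in the example appear to be Unix epoch seconds (1175558400 is 2007-04-03 00:00:00 UTC). The sketch assumes shell-style `$NAME` expansion and word splitting, which it imitates with `string.Template` and `shlex`; it also applies the nice flag from the FLAGS section. `build_invocation` is a hypothetical name, and the real tool may expand variables differently.

```python
import shlex
from string import Template
from typing import Dict, List, Tuple


def build_invocation(program: str, station: str, start: int, end: int,
                     archive: str, nice: bool = False) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for one handler call."""
    env = {"STATION": station, "START": str(start),
           "END": str(end), "ARCHIVE": archive}
    # Expand $STATION, $START, $END and $ARCHIVE in the configured command line.
    expanded = Template(program).safe_substitute(env)
    # Append station, segment start, segment end and archive, in that order.
    argv = shlex.split(expanded) + [station, str(start), str(end), archive]
    if nice:
        argv = ["nice", "-n", "10"] + argv
    return argv, env
```

Running the result would then be a matter of something like `subprocess.run(argv, env={**os.environ, **extra_env})`, so the handler sees the variables in its environment as well as in its arguments.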
Time Length Specification
A time length consists of either a decimal number and a multiplier or an absolute length specifier.
The available multipliers are:
- s - Seconds.
- m - Minutes.
- h - Hours.
- d - Days.
- w - Weeks (7 days, unaligned).
An example number and multiplier might be “3.5h” for 3.5 hours or 12600 seconds.
An absolute time length specifier is of the form:
[[[d:]h:]m:]s
Where “d” is the integer number of days, “h” is the integer number of hours, “m” is the integer number of minutes, and “s” is the integer number of seconds. For example “1:5:30” translates to 1 hour, 5 minutes and 30 seconds or 3930 seconds.
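Both time length forms can be converted to seconds with a parser along these lines. It is a sketch of the rules stated above, not the tool's own parser; it assumes the multipliers are lowercase only, and `parse_time_length` is a hypothetical name.

```python
import re

# Seconds per multiplier; "w" is 7 days with no week alignment.
MULTIPLIERS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}


def parse_time_length(spec: str) -> float:
    """Convert a time length specification to seconds."""
    spec = spec.strip()
    # Form 1: decimal number followed by a multiplier, e.g. "3.5h".
    m = re.fullmatch(r"(\d+(?:\.\d*)?|\.\d+)([smhdw])", spec)
    if m:
        return float(m.group(1)) * MULTIPLIERS[m.group(2)]
    # Form 2: [[[d:]h:]m:]s with integer fields, e.g. "1:5:30".
    if re.fullmatch(r"\d+(?::\d+){0,3}", spec):
        weights = [1, 60, 3600, 86400]  # s, m, h, d, counted from the right
        fields = [int(x) for x in spec.split(":")]
        return float(sum(v * w for v, w in zip(reversed(fields), weights)))
    raise ValueError(f"unrecognized time length: {spec!r}")
```

Note that in the absolute form the rightmost field is always seconds, so “12:00” means 12 minutes, while noon needs “12:00:00” (or “12h”).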