Next: Archiving Up: ALMA Software Science Requirements: Previous: Operator Interface

Data Pipeline

Goals

The main motivation for processing ALMA data in quasi-real time is to optimize the scientific efficiency of the array. The instrument will be dynamically scheduled (Section 4), so an evaluation of the data quality, based on the visibilities, must be available very soon after the data are taken, in order to allow switching projects if the current one is not matched to the actual observing conditions. Although readily available information on the array behavior can be obtained by monitoring atmospheric data (such as water content and its fluctuations), more valuable information can be obtained by monitoring the atmospheric phase itself (using phase calibrators). Finally, it is important to be able to determine quite soon whether a project's goals are being attained; a natural first step is to calibrate the data as completely as feasible and to evaluate the quality of that calibration. Because the instantaneous u,v coverage is reasonably good, pipeline data processing can include not only calibration but also imaging using, if possible, the best solution to the inversion problem: diagnostic tools could eventually select between competing methods. Another motivation is to make the instrument more accessible to first-time users by producing images in a quasi-automated mode.

The required data quality level will be specified by the astronomers in their proposals, so that the output of the data pipeline can be used to decide when a project is completed. The criterion can be, for example, a certain rms noise level at a certain spatial resolution, a dynamic range, the achieved angular resolution, or possibly the translation of these into more technical specifications such as rms phase uncertainties on calibrators, bandpass calibration accuracy, tolerance on sidelobe levels in the synthesized beam, etc.
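As an illustration only, the completion test described above could be sketched as follows. The class and field names (`QualityGoal`, `QualityReport`, `project_completed`) and the units are assumptions made for this sketch and are not defined anywhere in the requirements; the choice that a project with no stated goal is never declared complete automatically is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityGoal:
    """Targets stated in the proposal (hypothetical field names and units)."""
    max_rms_noise_mjy: Optional[float] = None   # noise ceiling, mJy/beam
    min_dynamic_range: Optional[float] = None   # floor on peak / rms
    max_beam_arcsec: Optional[float] = None     # angular resolution ceiling


@dataclass
class QualityReport:
    """Values measured by the pipeline on the current image."""
    rms_noise_mjy: float
    peak_mjy: float
    beam_arcsec: float

    @property
    def dynamic_range(self) -> float:
        return self.peak_mjy / self.rms_noise_mjy


def project_completed(goal: QualityGoal, report: QualityReport) -> bool:
    """Return True when at least one goal is stated and all stated goals are met."""
    checks = []
    if goal.max_rms_noise_mjy is not None:
        checks.append(report.rms_noise_mjy <= goal.max_rms_noise_mjy)
    if goal.min_dynamic_range is not None:
        checks.append(report.dynamic_range >= goal.min_dynamic_range)
    if goal.max_beam_arcsec is not None:
        checks.append(report.beam_arcsec <= goal.max_beam_arcsec)
    # Assumption: with no stated goal, completion is left to a human decision.
    return bool(checks) and all(checks)
```

The more technical specifications mentioned above (rms phase on calibrators, bandpass accuracy, sidelobe tolerance) would fit the same pattern as additional optional fields.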

The pipeline must be able to process systematically nearly all of the measurements obtained with the array in a fully automated procedure. Its output will constitute a data archive with rather homogeneous properties. However, the quality of this output will not necessarily be optimal: human intervention will often be required to enhance it. These final results should also be archived, but in a separate database of reduced data. Comparison between these two databases can be very useful to optimize the observing and data reduction procedures. These two databases, being of much more modest size than the raw database, will be much more easily manageable and accessible through the Internet. This should maximize the use of ALMA observations, e.g. for the preparation of new projects by the proposing astronomers or, when the data become publicly accessible, for direct scientific use in an astronomical context different from that of the original proposal.

Pipeline data processing will also enhance the efficiency of interactive observing, either by the astronomer if so requested in the proposal, or by the staff during technical time. The data pipeline will give the astronomer the possibility not only of adjusting the observing strategy according to the results in quasi-real time, but also of running projects more efficiently, in line with their scientific objectives. High-level specifications could actually be given during Phase 2 of the proposal submission procedure. To illustrate this, let us consider a proposal with the following requirements: some wide region of the sky must be imaged in the continuum in the 1.2 mm window; this wide-field imaging must reach a certain specified rms sensitivity; all compact sources found above 5 $\sigma$ in that image have to be imaged in pointed mode, one field per source, at higher frequency, again down to a certain sensitivity limit, such that their spatial morphology can be investigated. Such high-level specifications imply the need for high-level measurement tools as part of the data pipeline, such as a source extractor to blindly find sources and their positions; in this case the observing procedure will include several observing modes (mosaicing, pointed observations, multi-frequency) and it will be set dynamically, on the basis of results obtained by the data pipeline during the sequence of observations.
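To make the source extraction step of this example concrete, here is a minimal sketch of a blind search for pixels above 5 $\sigma$. It is not a description of any planned ALMA tool: the function name `find_sources`, the representation of the image as a list of rows, and the assumption that the rms is already known are all choices made for this illustration. Connected pixels above the threshold are grouped into islands, and the peak of each island is reported as the source position.

```python
from typing import List, Tuple

Image = List[List[float]]


def find_sources(image: Image, rms: float,
                 nsigma: float = 5.0) -> List[Tuple[int, int, float]]:
    """Return (row, col, peak) for each island of pixels above nsigma * rms."""
    threshold = nsigma * rms
    nrows, ncols = len(image), len(image[0])
    seen = [[False] * ncols for _ in range(nrows)]
    sources = []
    for r0 in range(nrows):
        for c0 in range(ncols):
            if seen[r0][c0] or image[r0][c0] <= threshold:
                continue
            # Flood-fill the island of 4-connected pixels, tracking its peak.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            peak = (r0, c0, image[r0][c0])
            while stack:
                r, c = stack.pop()
                if image[r][c] > peak[2]:
                    peak = (r, c, image[r][c])
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < nrows and 0 <= cc < ncols
                            and not seen[rr][cc]
                            and image[rr][cc] > threshold):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            sources.append(peak)
    return sources
```

Each returned position would then become one pointed field in the follow-up observations at higher frequency. A real extractor would also have to estimate the noise from the image itself and fit source shapes, which this sketch does not attempt.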

Requirements

Functionalities

Array Calibration

The basic components of the array calibration are the baseline calibration and the pointing and focus determinations. Their results must be fed back as soon as possible so that the real-time system can take them into account. The operator must have the flexibility to suspend a sequence of activities, such as those described in a data reduction procedure, and to resume from the state at which the activity was suspended (plus any manual modifications). He or she may change, for example, to another pointing source or phase calibrator. These actions may have effects on the ongoing observing procedure. It must be possible to modify the level of interaction at any time, from fully automated continuous processing, to prompts requesting inputs with sensible visible defaults, up to prompts with no default (i.e. fully manual).
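The three levels of interaction listed above can be pictured with the following sketch. The names `Interaction` and `resolve`, and the convention that an empty reply accepts the shown default, are assumptions made for this illustration; the `ask` callback merely stands in for whatever operator prompt the real system would provide.

```python
from enum import Enum
from typing import Callable, Optional


class Interaction(Enum):
    AUTOMATIC = "automatic"          # never prompt, always use the default
    PROMPT_WITH_DEFAULT = "default"  # prompt, showing a default that may be accepted
    MANUAL = "manual"                # prompt with no default offered


def resolve(name: str, default: str, level: Interaction,
            ask: Callable[[str, Optional[str]], str]) -> str:
    """Pick a parameter value according to the current interaction level.

    `ask(name, shown_default)` returns the operator's reply; an empty reply
    means "accept the shown default" (an assumption of this sketch).
    """
    if level is Interaction.AUTOMATIC:
        return default
    if level is Interaction.PROMPT_WITH_DEFAULT:
        reply = ask(name, default)
        return reply or default
    reply = ask(name, None)
    if not reply:
        raise ValueError(f"{name}: a value is required in manual mode")
    return reply
```

Because the level is passed on every call, it can be changed at any time between two steps of a procedure, as the requirement asks.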

Calibration of Interferometer Data

The basic components here are the phase and amplitude calibrations and their interpolation between calibrator observations. Results (e.g. rms phases and seeing information) must be fed back both to the scheduler and to the observing processes. The pipeline must also be able to self-calibrate the data when possible.
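A minimal sketch of the interpolation step, assuming a simple linear interpolation of the calibrator phase between two bracketing calibrator scans. The requirements do not prescribe an interpolation scheme, so this choice and the function names are illustrative only. The phase difference is wrapped before interpolating, which is valid only if the true phase drifts by less than $\pi$ between the two scans.

```python
import math


def wrap(phase: float) -> float:
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def interpolate_phase(t: float, t0: float, phi0: float,
                      t1: float, phi1: float) -> float:
    """Linearly interpolate the calibrator phase at time t, with t0 <= t <= t1.

    The difference phi1 - phi0 is wrapped first, so the interpolation
    follows the shorter way around the circle.
    """
    frac = (t - t0) / (t1 - t0)
    return wrap(phi0 + frac * wrap(phi1 - phi0))


def correct_phase(measured: float, t: float, t0: float, phi0: float,
                  t1: float, phi1: float) -> float:
    """Subtract the interpolated calibrator phase from a target phase."""
    return wrap(measured - interpolate_phase(t, t0, phi0, t1, phi1))
```

The scatter of the calibrator phases around such an interpolation is one possible source of the rms phase values that are fed back to the scheduler.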

Calibration of the Total Power measurements

For continuum projects the data pipeline must subtract the atmospheric contribution, in a way that depends on the actual observing mode. For line data it must subtract measurements obtained on an OFF position if needed and normalize by the gains to scale the data into temperature units; it must also subtract spectral baselines. The pipeline must be able to grid the data for imaging. It must display the results at the various stages. If these total power measurements are obtained by a sub-array while the other antennas are used for the cross correlations on the same target, the two calibrations should be able to proceed in parallel, so that both data sets are ready to be combined when the imaging stage begins.
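The line-data steps above (OFF subtraction, scaling into temperature units, baseline subtraction) can be sketched as follows. The single scale factor `kelvin_per_count`, the first-order baseline, and the function names are simplifying assumptions made for this illustration; an actual calibration would depend on the observing mode and on the receiver.

```python
from typing import List, Sequence


def calibrate_spectrum(on: Sequence[float], off: Sequence[float],
                       kelvin_per_count: float) -> List[float]:
    """Subtract the OFF spectrum and scale the difference into kelvin."""
    return [(a - b) * kelvin_per_count for a, b in zip(on, off)]


def subtract_linear_baseline(spectrum: Sequence[float],
                             line_free: Sequence[int]) -> List[float]:
    """Fit a straight line to the line-free channels and subtract it.

    `line_free` lists the channel indices assumed to contain no line
    emission; the fit is an ordinary least-squares fit of order one.
    """
    n = len(line_free)
    mean_x = sum(line_free) / n
    mean_y = sum(spectrum[i] for i in line_free) / n
    sxx = sum((i - mean_x) ** 2 for i in line_free)
    sxy = sum((i - mean_x) * (spectrum[i] - mean_y) for i in line_free)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * i) for i, y in enumerate(spectrum)]
```

For example, `subtract_linear_baseline(calibrate_spectrum(on, off, 0.5), [0, 1, 3, 4])` treats channel 2 as the only channel that may contain line emission.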

Imaging

The pipeline must produce continuum and/or line images of the calibrated data obtained so far. These images must be visualized, interactively in the case of interactive observations. The pipeline should also be able to compare redundant data (obtained simultaneously or not) to better assess the data quality. It must be possible to feed these interactive measurements back to the scheduler or to the observing process, if relevant. The images should be deconvolved using the most appropriate algorithm; it is desirable to allow several algorithms to compete in the case of complex images for which there is no guarantee of a single optimum algorithm. The imaging pipeline must be able to produce images that include zero and short spacings. In any case it must return information about the robustness of the results in those cases where a unique method is not available.

Interaction with other Actors

The pipeline interacts with a number of actors in the system. It also plays an important role in the sequence diagram for the array activity. The following actors interact with the data pipeline:

The user(s), such as the PI/CoIs, who will most frequently be in their home institute. If they want to go beyond the information automatically provided by the scheduler, they may inspect the output of the pipeline. Their control over pipeline input parameters may be specified at the proposal stage. Fully interactive observations should be well justified because they have a strong impact on the dynamic scheduling, with a risk of lower overall efficiency.

The operators, the astronomers on duty, and the engineers close to the array will use the pipeline as one of the tools to control the behavior of the system in general and to check that the automated procedures lead to a rational sequence of events. Because of their expertise they should have the privilege to modify some of the pipeline parameter specifications initially provided by the user.

Databases: the array will automatically feed the raw data into a low-level database. At the same time it will feed the pipeline with these raw data. The pipeline will output its results into one or two databases of much lower volume: the first one with results obtained systematically by fully automated pipeline procedures, and a second one used when manual interaction has been necessary. Manual interaction should proceed on a parallel system to avoid breaking the fully automated procedure, in order to guarantee the homogeneity of the database obtained with fully automated processing.

Other Software Functions: The pipeline will interact with a number of other software functions, such as:


Robert Lucas
2000-05-29