
Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements



Some late comments.

> A. Two fundamentally new aspects of ALMA are the integrated archive and
>    the pipeline, therefore the impact of requirements on these two areas
>    should be considered.  In particular the Pipeline will be the most
>    critical aspect of ALMA given that we envision both an effective
>    dynamically scheduled observatory with prompt user feedback mechanism
>    and a scientifically viable archive.

I think this statement is a bit overdone. A decent dynamic scheduler would
be possible just by taking environmental factors into account. Similarly, I
think even a raw data archive would be viable, although of course we want to
be able to attract non-traditional observers.

>     1.0-R1 The Pipelines shall be able to process all data coming from
>    the array. It must not constitute a bottle-neck in the data flow,
>    meaning that several occurrences of the same pipeline shall be able
>    to run in parallel if necessary.

I would add something like: Some projects will have unusually high data
rates or processing requirements. These will require processing outside of
the ALMA system and will be flagged appropriately so they are not processed
by the ALMA pipeline.

>     1.0-R2 All corrections applied shall be recorded so that any step can
>    be reversed and redone if needed.

I agree with previous comments that this is easier said than done.

>     2.0-R1 The Calibration Pipeline shall be activated after each scan has
>            been observed.

Mightn't you want to do this more often for some observations, e.g. for an
OVRO-style pointing scan where you want to do a calculation on each point of
the triangle (if I remember correctly)? Similarly, you might want to do
something after each raster line during holography. Maybe "observation"
rather than scan?

>     2.0-R2 The Calibration Pipeline may also be re-invoked at any time
>    with updated parameters or improved data.  The results should not
>    immediately overwrite old results so comparison is possible
>    before adopting the new calibration.  There will need to
>    be a method for validation and acceptance of calibration
>    updates.

In general, do we want to keep old calibrations "forever" and merely "mark"
the current set?

>     R2.1 apply the atmospheric calibration to the data
Does this mean WVR? If so it is probably applied by the online system before
the calibration pipeline.

>     R3.1 compute the phase rms on the scan timescale
scan->observation?

>        2.2-R4 For the pointing and focus measurements, the fitting results
>               should be automatically stored in the telescope
>               parameter file if the fitting error is less than
>               the user/

It would seem dangerous to let a user-specified threshold determine what
is accepted as the current system values for things like pointing and
focus. (Do users even want to know about these things?)

>     4.0-R1 The Science Pipeline shall be activated after completion of a
>    session.

I don't think this is right. It activates after a breakpoint if the user has
requested feedback, after all observations for a source have completed, or
when the program completes. We don't want to have to needlessly repeat the
nonlinear parts.

> 4.1-R3 The Science Pipeline shall check and correct the flux scale by
>        using observations of source of known fluxes. Any effect due to
>        the source being resolved shall be taken into account.

It seems like the second part of this is really an offline requirement.

> 4.1-R4 The Science Pipeline shall compute images for each frequency
>                channel, as well as for the continuum emission:

Does the user have an option not to image some channels, e.g. "edge"
channels (to keep within data rate parameters, for example)?

> 4.1-R5 The images shall be deconvolved using the most appropriate
>        algorithm. In case of a complex image, it should be possible to
>        have several algorithms running in parallel, the best
>        (according to criteria TBD) image being eventually selected.

This will lead to an inhomogeneous archive, and determining "best" by some
automated procedure may not be easy. We have to decide if we're producing a
"reference" image or trying to produce "the best" image.

>     ? maybe Total power from detectors

If in fact it is not saved with the correlation data, do we normally throw
it away, considering it only a debugging tool?

> Should these have some prefix to indicate that they are for Offline, like
> "O-1.0" etc.?

Yes (or embed the section number).

> 1.1-R3 The performance of the package should be quantifiable and
>        commensurate with the data processing requirements of
>        ALMA output at a given time.  This should be benchmarked
>        (e.g. "AIPSmarks") and reproduce accurately results for
>        a fiducial set of reduction tasks.

We could be more explicit here, i.e. take a few fiducial problems and say
that the performance should be greater than some value. I think it's also
important to say that the package must be able to cope with data sizes much
larger than main memory (however it chooses to do it).
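
As a rough illustration of coping with data larger than main memory, here
is a minimal sketch that computes a statistic by streaming a file in
fixed-size chunks. The function name, the raw float64 file layout, and the
chunk size are assumptions made for the example, not part of any ALMA
package.

```python
import array

def chunked_mean(path, chunk_items=65536):
    """Mean of a file of native-endian float64 values, read chunk by chunk.

    Only one chunk is held in memory at a time, so the file may be much
    larger than RAM. Assumes the file size is a multiple of 8 bytes.
    """
    itemsize = array.array("d").itemsize
    total = 0.0
    count = 0
    with open(path, "rb") as fh:
        while True:
            raw = fh.read(chunk_items * itemsize)
            if not raw:
                break
            chunk = array.array("d")
            chunk.frombytes(raw)
            total += sum(chunk)
            count += len(chunk)
    if count == 0:
        raise ValueError("file contains no samples")
    return total / count
```

A fiducial benchmark could then time calls like this on a data set of
agreed size and compare the result against the required figure.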

> 2.1-R3 Multitasking for all interfaces should be available where
>        appropriate.

A bit vague, maybe: It must be possible to run one or more long-running
calculations in the "background." While background tasks are running, normal
interactive activities must be possible.
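
A minimal sketch of that behaviour, using Python's standard
concurrent.futures module. The task function is a hypothetical stand-in
for a long calculation; a real package would choose its own mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def long_running_task(n_steps):
    """Hypothetical stand-in for a long calculation: sleeps, then counts."""
    done = 0
    for _ in range(n_steps):
        time.sleep(0.01)
        done += 1
    return done

executor = ThreadPoolExecutor(max_workers=2)
job = executor.submit(long_running_task, 20)   # returns immediately

# The "interactive" session is free to do other work in the meantime.
interactive_result = sum(range(10))

background_result = job.result()   # block only when the answer is needed
executor.shutdown()
```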

This brings up the subject of locking:

The package must support locking data files so that there is no possibility
of one process corrupting a file that is also being written to by another
process. The model should be: "one writer, multiple readers."
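
One way to sketch the "one writer, multiple readers" model is with
advisory file locks. The example below assumes a Unix system (it uses
fcntl.flock) and hypothetical helper names; advisory locks only protect
against processes that also take the lock.

```python
import fcntl

def _open_locked(path, mode, lock):
    """Open path and take a non-blocking flock; close the file on failure."""
    fh = open(path, mode)
    try:
        fcntl.flock(fh, lock | fcntl.LOCK_NB)
    except OSError:
        fh.close()
        raise
    return fh

def open_for_reading(path):
    """Shared lock: any number of readers may hold it at the same time."""
    return _open_locked(path, "rb", fcntl.LOCK_SH)

def open_for_writing(path):
    """Exclusive lock: fails while any reader or other writer holds a lock."""
    return _open_locked(path, "r+b", fcntl.LOCK_EX)
```

Closing the returned file object releases its lock. A second writer, or a
writer arriving while readers are active, gets a BlockingIOError instead
of silently corrupting the file.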

>         2.1-R6 Multiple levels of "undo" should be supported for all
>        tasks.
Again, hard. Some operations can be undone readily, others can't (e.g., if
you want to be able to undo a deconvolution you probably have to keep a copy
of the original image!).
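
To make the cost concrete, here is a minimal sketch of multi-level undo
done the brute-force way, by keeping a full copy before every operation.
The class and method names are hypothetical.

```python
class UndoableImage:
    """Image (here just a flat list of pixels) with snapshot-based undo."""

    def __init__(self, pixels):
        self.pixels = list(pixels)
        self._history = []   # one full copy per applied operation

    def apply(self, operation):
        """Apply operation(pixels) -> new pixels, saving the old state first."""
        self._history.append(list(self.pixels))
        self.pixels = list(operation(self.pixels))

    def undo(self):
        """Restore the state before the most recent operation."""
        if not self._history:
            raise IndexError("nothing to undo")
        self.pixels = self._history.pop()
```

Every undo level costs one extra copy of the image, which is exactly the
storage problem for something like a deconvolution.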

> 2.3-R4 All functionality of the CLI must also be available in GUI
>        mode.
Not realistic IMO (unless a CLI typein window counts!)

>         2.3-R5 A graphical data-flow oriented (IDL style) tool assembler
>        would be desirable, perhaps as an advanced GUI for later
>        development.
These are cute in principle, but they don't seem to be used much in
practice.

> 2.3-R3 The CLI should have command-line recall and editing
Name completion? Minimum match?
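
Minimum match could be sketched as follows: an abbreviation is accepted
when it is a prefix of exactly one command. The command names are made up
for the example.

```python
def resolve_command(typed, commands):
    """Return the unique command that `typed` abbreviates, else raise."""
    if typed in commands:          # an exact name always wins
        return typed
    matches = sorted(c for c in commands if c.startswith(typed))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"unknown command: {typed!r}")
    raise LookupError(f"ambiguous command {typed!r}: {', '.join(matches)}")

COMMANDS = ["clean", "clip", "calibrate", "contour"]  # hypothetical task names
```

Here "cle" resolves to "clean", while "cl" is rejected as ambiguous
between "clean" and "clip".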

> 2.3-R4 All functionality of the GUI must also be available in CLI
>        mode.
This direction I believe!

> 2.4-R1 Must have basic programming facilities such as:
IMO in a scientific command language whole-array arithmetic/processing is at
least desirable.

> 2.5-R2 There should be a variety of help levels and documentation
> [...]
>        These should be in printer-friendly formats.
Does this mean no native HTML?

> 3.1-R8 Comprehensive and understandable processing history information
>        for the data must be maintained and be exportable
What does exportable mean? Just that it's written into COMMENT cards in a
FITS file or something more complicated?
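
For the simple interpretation, a minimal sketch of writing history
entries as FITS commentary cards. FITS header cards are 80 characters
with the keyword in columns 1-8; the function name and the sample entry
are assumptions, and HISTORY is used here rather than COMMENT.

```python
import textwrap

CARD_LEN = 80
TEXT_LEN = CARD_LEN - 8   # keyword occupies columns 1-8

def history_cards(entry):
    """Split one processing-history entry into 80-character HISTORY cards."""
    lines = textwrap.wrap(entry, width=TEXT_LEN) or [""]
    return [("HISTORY " + line).ljust(CARD_LEN) for line in lines]

cards = history_cards(
    "applied bandpass solution from scan 12; flagged antenna 7 for time "
    "range 03:10 to 03:25; cleaned with 500 iterations"
)
```

Anything richer than this (machine-readable parameters, replayable
scripts) would need a format beyond free-text cards.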

> 3.1-R10 When sorting or indexing is desirable for performance
>        enhancement, this should be carried out in a manner
>        transparent to the user.
I personally prefer to manually "purge" rather than having semi-intelligent
garbage collecting turn on at some random point (usually just when I want to
do something else).

> 3.3-R1 I/O of data must not be a bottleneck for processing, especially
>        for pipeline use.  This is especially true if the native format
>        of the package is not used and filling/conversion is
>        necessary.
I think this is really a pipeline requirement. (Of course there are
low-FLOPS operations in the offline package where I/O will be the
bottleneck!) Again, rather than subjective statements like this, I think
some objective tests/times would be better.

>     3.6 Images and other Data Products
Not having to transpose cubes is nice.

> 3.7-R2 Imaging data in standard formats from astronomical instruments
>        at different wavelengths should be importable, with the
>        ability to combine these with ALMA data where appropriate.
>        This should be through a set of widely used formats.

Be more explicit about what you mean by combine. I assume you mean that for
each pixel

    output = f(input1, input2, ...)

where f consists of the usual mathematical and logical functions, taking
into account blanking.

Blanking support should also be in the requirements: To prevent bad pixel
values from propagating through calculations blanking must be supported.
Usually any calculation that produces a pixel operating on a set of pixels
at least one of which is blanked will result in a blanked output pixel. It
is desirable that blanks not be destructive (the original pixel value is
retained), and it be possible to turn on and off different blanking ("mask")
levels.
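
A minimal sketch of per-pixel combination with non-destructive,
switchable blanking. The class, the two mask levels, and the use of None
for a blanked output pixel are all assumptions made for the example.

```python
EDGE = 0b01      # hypothetical mask level: edge pixels
LOW_SNR = 0b10   # hypothetical mask level: low signal-to-noise

class MaskedImage:
    """A flat list of pixel values plus one integer of mask bits per pixel.

    Blanking only sets bits in `flags`; the original values are retained.
    """
    def __init__(self, values, flags=None):
        self.values = list(values)
        self.flags = list(flags) if flags is not None else [0] * len(self.values)

def combine(f, images, active_levels):
    """Apply f pixel by pixel; the output pixel is blanked (None) wherever
    any input pixel is blanked at one of the currently active mask levels."""
    n = len(images[0].values)
    if any(len(img.values) != n for img in images):
        raise ValueError("images must have the same number of pixels")
    out = []
    for i in range(n):
        if any(img.flags[i] & active_levels for img in images):
            out.append(None)
        else:
            out.append(f(*(img.values[i] for img in images)))
    return out
```

Turning a mask level off is just a matter of leaving its bit out of
active_levels; the underlying pixel values are never overwritten.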

> 4.1-R1 The package must be able to reliably handle all of the proposed
>        and future ALMA calibration modes, including but not exclusive
>        to temperature controlled loads, semi-transparent vanes,
>        apex calibration systems, WVR data, noise injection,
>        fast-switching calibration transfer, planetary observations.

Several or all of these are more requirements for the online
system/calibration pipeline.

>         4.1-R7 Data editing, calibration, and display of calibration
Besides interactive editing, what about automatic editing?

> 4.2-R4 Redundancy (e.g. same or crossing baselines) should be used
Do we have enough redundant baselines to make this relevant?

> 4.4-R1 Individual data points must be associated with pointing
>        center information, and one must have the ability to
>        deal with complex scanning strategies.
What does this mean? Just that it can be gridded, or something else?

> 6.5-R2 The ability to collapse or integrate over sub-dimensions
>        of data cubes in order to form "moments" is required.
Add: Interactive and automatic facilities (windowing, S/N based blanking,
...) to avoid degrading the S/N in the moment calculations must be provided.
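
As an illustration of S/N-based blanking in a moment calculation, here
is a minimal sketch for a single spectrum. The function name, the simple
threshold rule, and the default cut of 3 sigma are assumptions; real
windowing would usually be more careful.

```python
def moments(velocities, intensities, channel_width, sigma, snr_cut=3.0):
    """Moment 0 (integrated intensity) and moment 1 (intensity-weighted
    mean velocity) using only channels brighter than snr_cut * sigma."""
    kept = [(v, i) for v, i in zip(velocities, intensities)
            if i > snr_cut * sigma]
    if not kept:
        return None, None          # whole spectrum blanked
    flux = sum(i for _, i in kept)
    m0 = flux * channel_width
    m1 = sum(v * i for v, i in kept) / flux
    return m0, m1

# Noise-only channels at the ends are excluded from both sums.
m0, m1 = moments([-2.0, -1.0, 0.0, 1.0, 2.0],
                 [0.1, 4.0, 8.0, 4.0, -0.2],
                 channel_width=1.0, sigma=1.0)
```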

>         7.2-R3 Both contour plots with variously colored and styled lines
>        and false color maps should be possible, it should also be
>        possible to produce RGB overlays (i.e. one layer gets
>        assigned intensity scales of red, another one of green,
>        and one of blue).
While useful, Hue/Intensity/Saturation is probably the more interesting
color "space" to be able to do this in.
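
A minimal sketch of the idea, mapping one layer to hue and another to
intensity. It uses the HSV model from Python's standard colorsys module
as a stand-in for Hue/Intensity/Saturation; the function name and the
red-to-blue hue range are assumptions.

```python
import colorsys

def hue_intensity_pixel(hue_value, intensity_value, saturation=1.0):
    """Map two normalised (0..1) layers to one RGB pixel.

    hue_value picks the colour (0 -> red, 1 -> blue); intensity_value
    sets the brightness. Inputs are clipped to the range 0..1.
    """
    # Scale hue to 0..2/3 so the scale stops at blue instead of wrapping to red.
    h = min(max(hue_value, 0.0), 1.0) * (2.0 / 3.0)
    v = min(max(intensity_value, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, saturation, v)
```

For example, a velocity field could drive hue_value while the integrated
intensity drives intensity_value, so faint regions fade to black
regardless of their velocity.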

Cheers,
Brian