Tue Apr 16 19:09:39 METDST 2002 ---001-------------------------------------------------------------------------- (JoeSchwarz) They are attached. I also solicited input from Andreas Wicenec of the ESO DMD Archive group, and have incorporated his comments. ---002-------------------------------------------------------------------------- (JohnBenson) ---003-------------------------------------------------------------------------- (MasatoshiOhishi) General Comments: In the ITU (International Telecommunication Union), we discuss possible frequency allocations for Satellite Operators in the 40 GHz range and around 94 GHz. I think we need to consider how to monitor and mitigate interference due to space-to-earth transmissions from satellites. Real-time monitoring is necessary to avoid unwanted emissions from satellites. Individual Comments for sections 3.6, 3.7, 4.2 and 4.3 (as was suggested by Morita-san): ---004-------------------------------------------------------------------------- (PrebenGrosbol) ---005-------------------------------------------------------------------------- (RayPlante) Hi Robert, As you know, Gianni Raffi invited me to comment on the Pipeline Processing and Archiving sections of the ALMA SSR; you'll find my comments below. Unfortunately, my first comment is about a requirement that is (I think) missing, so I feel compelled to break from the requested format in Part 1. Part 2 refers to existing requirements. I hope my discussion is not too lengthy to be helpful. Your committee's response need not be as detailed. cheers, Ray 1. User Access to Software I feel an explicit requirement is necessary about the usefulness of data in the archive to end users (astronomers); something like, "Astronomers shall have [free?] access to software necessary for processing all data products available from the archive." This may seem obvious (hopefully, I didn't overlook such a statement). 
It is certainly implied by other requirements in sections 1, 3, and 4; however, I think it's important to spell this out as I think it affects how these other requirements are met (and the cost of doing so). Some of my other comments below hinge on an implicit assumption of this nature. You probably should go even further. Here are some possible riders: * "In particular, the astronomers should have access to all software necessary for duplicating or redoing the [reversible] processing carried out by the pipeline" * "...on their local computing platforms" * the data reduction scripts are locally runnable To the extent that these are true, they should be expressed explicitly. The extent to which they need not be true, that should be stated as well--for example, * restriction: limited to data products in the Observational Archive. * possible exceptions: Atmospheric Model, Quick Look operations * products made available by astronomers (e.g. final images) resulting from local processing are exempt from this requirement. * some projects may require high performance computing to perform the processing; performance requirements (execution time, memory, disk space, etc.) should not be considered when meeting this requirement for such projects. 2. Comments on Existing Requirements ---006-------------------------------------------------------------------------- (TadafumiTakata) ---007-------------------------------------------------------------------------- (TimCornwell) Apologies for being late. I hope that you can consider my suggestions. I appreciate being asked to review these requirements. I found it helpful for our own work here on end-to-end processing of data from NRAO telescopes. I have a few global concerns: - The distinction between the various pipelines is insufficiently motivated. - The goal of processing all standard modes should be qualified by the attachment of quality measures. 
- There is a tendency to be overly prescriptive in describing how the reduction is to be done. I think that our best practices will evolve as ALMA comes online and the requirements should reflect this. - I think that the distinction between science and technical archives is not worth making, and would probably lead to problems. ---008-------------------------------------------------------------------------- (TonyWillis) Here are my review comments on the current version of the ALMA Software Science Requirements and Use Cases document. I apologize for the brevity of these comments. Unfortunately I've had a lot of distractions over the last couple of weeks. Section 3 - Requirements General comment: It might be useful to ensure that requirements in each section are ordered by priority - i.e. all priority 0 requirements are listed first in each section, then priority 2 requirements, etc. This would help with project planning, software estimation, etc. Section 3.2.1 (2.1-R1) The Technical Interface mentioned here and given Priority 0 is not described further anywhere in this document. I assume that the detailed requirements for this interface will appear in some other (Engineering?) document? Section 4 - Use Cases My main concerns with this document have to do with the Use Cases. The Use Cases would appear to serve two purposes: firstly to define the interactions between users and software, and secondly, to expand on and define more explicitly the software requirements that are given in Section 3. However, the Use Cases have obviously been prepared separately from the Section 3 Requirements and have been written by a number of different authors. Consequently, the exact nature of the primary and secondary roles / actors are not defined in a consistent way. Also, given the way many of the Use Cases are written, it is sometimes not easy to understand which Section 3 tools they refer to. An example of a well-specified Use Case is that in Section 4.7.2 'SetupMultifield'. 
Here from the Roles/Actors section onward, it is made clear that a user is interacting with the Observing Tool, and we get a good idea of what the Observing Tool should do and how it should behave in this situation. On the other hand, in the Use Case given in section 4.7.1 'SetupSingleField' we are not explicitly told which Tool the User is interacting with, only which file systems are involved - the Line Catalog, Source Catalog etc. However, something must be acting as the interface between the user and the file systems. This use case also contains references to 'the system' doing this or that. The use case should be specific - which 'system' is the author referring to? I assume that it's the Observing Tool, but this should be made explicit. I think the whole Use Cases section should be edited in a consistent manner so that at least in the 'Roles' section it is made clear which of the major tools defined in section 3 is being used. Also, vague references to things like 'the system' should be replaced by explicit references to e.g. the Observing Tool or the Real-time Calibration Pipeline. By doing so, the exact nature and function of some software components will be better defined. For example, an item which surfaces in the Use Cases, e.g. 4.2.3 Validate Observing Programme, is the concept of a validator. Is the validator a component of the Observing Tool? I would assume so, but, for example, there's nothing in the Section 3 Requirements that states that the Observing Tool will perform validations of various types. The Validator also surfaces in Section 4.2.4 (Submit Phase I Observing Proposal) where it becomes part of something called the ALMA Proposal Handling System. So, is the validator a separate system which must interface to both the Observing Tool and the Proposal Handling System? (This Use Case (4.2.4) also mentions something called a Proposal Tool - is this part of the Observing Tool, or something separate? 
Ah ha - in the later Use Case 4.2.6 it is explicitly mentioned that the Proposal Tool is part of the Observing Tool!) I don't believe that either the Proposal Handling System or the Proposal Tool is mentioned anywhere in Section 3. Until a thorough editing of the Use Cases is done to achieve consistency, the exact amount of functionality required in the Tools defined in Section 3 is hard to quantify. Consequently, it presently seems difficult to estimate the number of people required to design and build, for example, the Observing Tool, over some given time period, because a detailed specification of the components and functionality of the Observing Tool remains elusive. I note that Use case priorities are defined in terms of 'Critical' or 'Major'. These terms do not appear to be defined anywhere in the document, whereas Priorities 0 through 3 are defined explicitly at the beginning of section 3. However, since the Use Cases are ultimately going to be interfaced to tools whose priorities are defined in Section 3, it's not clear to me that Use Cases should be assigned priorities. ---009-------------------------------------------------------------------------- (WimBrouw) Comments on Pipeline part of ALMA SSR and Use cases -- version 4 Introduction Add in the introduction that the Science calibration and Imaging will be a sub-set of the off-line data reduction package on the one hand; and that the real-time calibration will be available in the off-line data processing package. Add that pipeline processing will have parallel streams, with a mixture of synchronous and asynchronous operation. Question (in principle for Use cases): will the measured WVR be archived? If not, calibration based on WVR should not be included in off-line package. ---010-------------------------------------------------------------------------- (PrebenGrosbol) p.2 0.0 Change Record On the front page the document has Revision 3 while in the Change Record it has Revision 4. 
Consistency would be good. ---011-------------------------------------------------------------------------- (JoeSchwarz) p. 24, 4.0-R10 "elected" --> "selected" ---012-------------------------------------------------------------------------- (JoeSchwarz) p. 26, 3.6.1 "NOte" --> "Note" "Occurrence" is misspelled as "occurence" everywhere in this section 3.6.1.1 "make results available to the dynamic scheduler" -- this assumes a certain division of the system into items that include a "dynamic scheduler." Since things like preliminary phase rms will be used to adjust target and calibrator dwell and cycle times, the requirement as expressed would not address this use. In general, it would be better if the authors indicated how they want quantities used, rather than to which (as yet undefined) "part" of the system they should be sent. Are baseline and holography calibrations really "(quasi) real time"? Robert Lucas indicated to the software group that baseline calibrations might be deferrable to a time after scientific observations corresponding to the (new) antenna configuration had been completed--in some cases. Again, "make results available to the sequencer" doesn't say what these results are to be used for. "Sequencer" is not a meaningful concept at the requirements stage. In the same vein, requiring Quick Look Operations to be done during and after an "observing session" doesn't give developers the information they need to make these operations useful. Who is to look at the results? When? What kinds of corrective actions should be made possible by these results? ---013-------------------------------------------------------------------------- (MasatoshiOhishi) p.26 3.6.1.1 Real-time Calibration Operations In Telescope / Array Calibrations, although I could be wrong, do we really need to reduce the holography data ? See also subsection 3.6.3. 
---014-------------------------------------------------------------------------- (TimCornwell) Page 26, 3.6.1.1 Pipeline operations The distinction between Science Calibration and Science Imaging is too fine to be worth building in at this level. While calibration and imaging used to be considered as separate items in a waterfall model, a more modern view is that the two are intertwined via the concept of self-calibration. I am similarly dubious about the distinction between real-time calibration and quick look operations. I'd recommend the distinction be between Real-time and Post-observing Processing. ---015-------------------------------------------------------------------------- (WimBrouw) p26 3.6.1.1: Most of the following details in this section are valid for all types of operations. Instead of saying 'relates to interferometric', why not have an indicator to exclude the few items that are not valid for single dish or light-bucket operations? ---016-------------------------------------------------------------------------- (WimBrouw) p26 3.6.1.1 RT cal operations - Some of these are 'real' RT, i.e. they have to be done before observing can proceed, since errors will remove information (like focus; delay; offset pointing); others are not (holographics; baseline tests) - The Atmospheric model is RT; some of the astronomical ones could be (atmospheric calibration if that is the one based on the atmospheric model); others are definitely not (bandpass calibration; preliminary phase and ampl cal) - Is the 'preliminary phase rms' purely based on atmospheric information? If not, what is its purpose for the dynamic scheduler (since the phase noise will depend on baseline; ampl; atmosphere, ...)? - Can it be indicated which ones are 'await result RT', and which are not? - Indicate that this processing should include, but not be limited to the list given - Should results be made available to sequencer (and/or scheduler) or should they be archived, and be available through archive? 
---017-------------------------------------------------------------------------- (WimBrouw) p26 3.6.1.1 QL operations - Occurrence: after observing session, and during observing session at tbd intervals - Monitoring-- done through archive access of cal data? - Quick calib -- apply should be before resample/integration; also what is the calibration done here -- oh, I see apply only; not do. Maybe some terminology to indicate active and passive use of the verb 'to calibrate' should be used. Maybe use here 'correction' rather than 'calibration' - Quick imaging: rather than 'no deconvolution', I would think 'subset of data' is more important (specified beforehand e.g. for spectral line to integrate channels with expected result) - Display tools: display 'information about' current observations. Display of monitoring information and non-imaged data (e.g. baseline-time amplitude images) and statistical extractions are more important than 'images' (especially for non-SNR=100 single sources) ---018-------------------------------------------------------------------------- (JoeSchwarz) p. 27, Science Imaging Operations Accessing the archive for previous observations, producing and archiving deconvolved images are expensive to do "after completion of observing session." My understanding from the SSR discussions was that these would be done, *at most* (and this is what is implied by requirement 3.1-R13) after breakpoints and (of course) when the project (or at least observations of the target) is complete. This requirement is also in conflict with Use Case 4.5.3 on p. 73. 3.6.1.3 It seems as though "instrumental" calibrations were called "telescope" calibrations in 3.6.1.1. The change in terminology is confusing. Although bandpass calibrations do not require a time interpolation, and can therefore be "immediately derived and stored, to be applied to all following observations," 3.6.1.1 says it's "derived or applied" after completion of observing sessions. 
This seems inconsistent to me, and is another example of how the requirements can get confused when they contain implementation considerations instead of real requirements. ---019-------------------------------------------------------------------------- (TadafumiTakata) p27 3.6.1.1 Pipeline Operation Science Imaging Operation Comment Accurate positional calibration may be needed in producing deconvolved images, which are most popular data for general archive users. They have to include coordinate information like FITS WCS headers which can be interpreted by image browsers to be provided to ALMA users etc. (Please ignore this comment if it is already included in the pointing and/or instrument calibration before producing science images.) ---020-------------------------------------------------------------------------- (TimCornwell) Page 27, 3.6.1.3 Calibration I don't think that one can draw a line between items requiring interpolation and those not. All items must benefit from interpolation (or modeling in time and space). Another important and related factor is quality control: one needs access to a time-series to allow discovery of errant values. The key point here is whether one must wait for all relevant values before making a prediction of calibration values at some point in time and space. ---021-------------------------------------------------------------------------- (WimBrouw) p27 3.6.1.1 Science cal operations (science correction(?)) - Why after completion of session? Why not also require option during session. I could think of a program; and hence an observing session, observing many individual fields as snapshots for getting to know some parameters of a series of 'sources'. 
Run it in between - derive and/or apply: bandpass (not a curve according to other places but fixed; a session could have many bands and places in bands; and hence many bandpasses) - what is 'final' phase and ampl calibration: cannot be precise enough in pipeline with 'limited or no deconvolution and selfcal'. Or is 'final' as far as the pipeline is concerned? - flux scale: is that derived here? ---022-------------------------------------------------------------------------- (WimBrouw) p 27 3.6.1.1 Science imaging - should 'produce temperature-calibrated visibilities' be in previous step? (btw why say 'uv tables'? That is implementation) - is 'deconvolved' true here? If so, there should be an extensive (self-)calibration step in science calib operations to be able to derive that. ---023-------------------------------------------------------------------------- (WimBrouw) p27 3.6.1.2 - astronomical 'source': a project can have many sources (list) or a large (mosaiced) field. Mention that source here is not the normal single, limited source. ---024-------------------------------------------------------------------------- (WimBrouw) p27 3.6.1.3 - instrumental: indicate which ones (as mentioned above) are part of the RT-wait set; and which ones are not - no time interpolation: I am not sure it is a good idea to give these examples as 'time-invariant' calibrations. There are quite a few schemes where it is easily possible to have to interpolate e.g. bandpass (filter poles can be very T sensitive; if receivers are going to apply diurnal Doppler); bandpass could also have a phase error. I would leave it open. Also: bandpass and pol are non time-critical. Also, indicate that the lists include, but are not limited to, the items given. ---025-------------------------------------------------------------------------- (WimBrouw) p28. Finally ... average of observed...' I.e. leave out 'the', and indicate that the average will be done outside the pipeline operations (there could be contaminated channels, e.g.) 
---026-------------------------------------------------------------------------- (JoeSchwarz) p. 29, 3.6.2-R2 What does "data-driven" mean here? 3.6.2-R3 The meaning of, and need for, "templates" isn't clear. Isn't this more of a design issue for the people doing the observing preparation tools? 6.2-R4 I understand a "pipeline" as something that, once started, runs automatically. In contrast, "tools" are usually instruments for doing some task that one wants to direct interactively. "Automatic flagging tools" sounds like a contradiction to me. I think you mean that certain data should be flagged automatically. 6.2-R6 This requirement is unreasonable, in my opinion. You can't repeat a series of previous operations and you can't resort to a copy of the dataset at an intermediate state. "Sufficient recording..." then means that results of every step have to be saved -- but not as a "copy of the dataset". Maybe I misunderstand, but I think that a straightforward interpretation of what's written here will produce a very heavy and expensive system. I think the kinds of "steps" that can be reversed and redone should be spelled out in more detail. 6.2-R9 I don't understand. Doesn't the quantity to be calibrated determine whether it's baseline- or antenna-based? (Maybe this is a stupid question, but the Aperture Synthesis Summer School isn't until after the deadline for comments.) ---027-------------------------------------------------------------------------- (MasatoshiOhishi) p. 29 3.6.2 Pipeline general requirements In 6.2-R4, the degree of interference should be added as a condition to flag data. 
---028-------------------------------------------------------------------------- (PrebenGrosbol) p.29 6.2-R3 'through readable and comprehensible data reduction scripts' Sounds good but I think there are additional points, namely: a) if observing scripts exist it would be better to use the same control language for both observing and reduction scripts, and b) in order not to be too dependent on a specific pipeline engine the script language must be independent of any specific processing system (ref. Reduction Blocks for the VLT). That is, the script should provide the flow control and specify the reduction tasks to be executed while the engine should just execute these tasks as best it can. Further, there is a verification issue (like for observing scripts) if users are allowed to change them. ---029-------------------------------------------------------------------------- (PrebenGrosbol) p.29 6.2-R6 'Sufficient recording ... shall be carried out so that any step can be reversed' is a very general requirement. Some processing is difficult to reverse and often it is simpler to save some intermediate results. Recording all reduction steps is clearly needed but already covered by 6.2-R5. ---030-------------------------------------------------------------------------- (PrebenGrosbol) p.29 3.6.2 Pipeline General Requirements I am missing a requirement that the engine used for pipeline processing should be well separated from the ALMA system, i.e. it should be possible to replace the pipeline engine without any significant change in the ALMA software. Although this partly is a design concern since one would not like to have the ALMA software depend on an alien, uncontrolled package, it is also a science concern. One would like to enable anybody to contribute useful pipeline modules to ALMA and not limit them to software written in a specific system. Further, with a lifetime of decades for ALMA it is likely that it will outlast one or several pipeline engines. 
---031-------------------------------------------------------------------------- (RayPlante) p. 29, 3.6.2, 6.2-R6: (Reversibility of processing) I think the general aim of this requirement is necessary: we want the ability to remove or substitute any part of the processing. In particular, we want to be able to redo it with small modifications to the parameters. However, the wording of this requirement concerns me when I consider its strict application to complex processing; some clarification may be helpful. This requirement places a corresponding requirement on the off-line software (assuming point 1 above), the data formats used, and on what products must be stored in the archive. Should one be able to back up an arbitrary number of steps in processing ("without resorting to a copy at an intermediate state")? What constitutes a step? Suppose a target source has been self-calibrated with multiple loops; does reversibility mean that we have to keep the generated gain table and image models for each loop? Taken to its extreme, this requirement could be pretty costly: * if all datasets are required for step-wise reversibility, then the archive must organize and label these products in a way that they make sense to the user. * the data formats must retain sufficient information for backing up. * for every processing step implemented, its reversal must also be implemented. The cost is greater if this capability must be available to the end astronomer (see 1 above). If we only mean to apply this requirement to the extent already supported by current packages, we may be okay here. If we only require needing to make one step backward at any given step in the chain, we might be okay. The cost is of particular concern if it's an unnecessary one. Thus, I'm curious as to the intent of the clause "without...resorting to a copy of the dataset at the intermediate step." 
Redoing processing by going back to an intermediate product is often the easiest way to "back out" (as opposed to reversing multiple steps in turn). If the processing script is available (which is required) and the assumption from 1. (above) is true, then this is straightforward. This requirement could be clarified in the context of the definition of ranked data products in the archive. For example: Level 0 -- raw data: has no/certain/all real-time calibration operations applied Level 1 -- calibrated data: has science calibration applied Level 2 -- deconvolved images: has Science imaging operations applied Level 3 -- final, user-supplied (locally processed) data products Then, say, if one starts with the data products at a given level and applies the scripts that produce the next level stepwise, it should then be possible to step backwards at any point in the chain. By the way, defining ranked data products will help the user understand what they need from the archive. They can easily choose what level of processing they want to accept and know what products they, therefore, need to retrieve. ---032-------------------------------------------------------------------------- (RayPlante) p. 29, 3.6.2, 6.2-R8: "Sequencer" -- you probably need an entry in the Glossary for this. I was a little unclear about what it does. According to my pdf reader, this is the first use of this term in the document; although, its function may be made clear in the Use Cases (which I did not study as well). ---033-------------------------------------------------------------------------- (TadafumiTakata) p29 6.2-R1.1 Comment Is there any need to have feedback (like result images and so on) from this operation? If so, some tools to help the process feed the result back into the ALMA archive would be necessary. 
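As a rough illustration of the ranked data products suggested in comment 031, the levels could be modelled as an ordered enumeration, so that the scripts needed to move from one level to the next follow from the ordering. This is only a sketch: the names `ProductLevel` and `levels_to_produce` are hypothetical and do not come from the SSR.

```python
from enum import IntEnum


class ProductLevel(IntEnum):
    """Ranked archive data products, following the levels listed in comment 031."""
    RAW = 0                 # real-time calibration applied to an extent still to be decided
    CALIBRATED = 1          # science calibration applied
    DECONVOLVED_IMAGES = 2  # science imaging operations applied
    USER_SUPPLIED = 3       # final, locally processed products


def levels_to_produce(have: ProductLevel, want: ProductLevel) -> list:
    """Return the levels that must be generated, in order, to get from `have` to `want`."""
    if want < have:
        raise ValueError("cannot derive a lower level from a higher one")
    return [ProductLevel(n) for n in range(have + 1, want + 1)]
```

Under these assumptions, an astronomer holding Level 0 data who wants Level 2 images would need the Level 1 and Level 2 scripts, in that order.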
---034-------------------------------------------------------------------------- (TimCornwell) Page 29, Pipeline General Requirements I think that the goal of being able to process all data from the array in standard modes is too ambitious, certainly so for a telescope yet to be built. For the EVLA, VLBA, and GBT pipelines, we are planning to attach a quality measure to pipeline results. This adds an important level of qualification. A division might be: Meets all quantified scientific goals; Meets some quantified scientific goals; Meets none of the quantified scientific goals. A priority 2 requirement could be added that the pipeline must process all data from all standard modes to the highest level of quality. The priority 1 requirement is to process all standard modes and attach a quality measure. The requirement that the pipeline not constitute a bottleneck is a fine sentiment but I'm not sure who it is directed to? - TAC, operations? ---035-------------------------------------------------------------------------- (TimCornwell) Page 29, 6.2-R2: Some default parameters (cellsize, field of view, calibration methods, etc) should come from a standards database that is under change control. ---036-------------------------------------------------------------------------- (TimCornwell) Page 29, 6.2-R3: The second sentence is an unnecessary implementation detail. For NRAO pipelines, we are generating scripts from production rules encapsulated as make files. This is arguably superior to templates. In any event, it's not necessary to state how the scripts will be generated. ---037-------------------------------------------------------------------------- (TimCornwell) Page 29, 6.2-R6: Redoing is not the same as reversing. I'd recommend removing the reversing part. To redo, one simply needs checkpoints. To reverse, one needs much more. I think this requirement is close to requiring the ability to undo operations, which is very expensive. 
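The difference between redoing and reversing in comment 037 can be sketched as follows. Redoing only needs the state saved before the step that is to be changed; no inverse of any step is ever computed. The function names and the list-of-numbers stand-in for a dataset are assumptions made for illustration, not part of any ALMA or NRAO package.

```python
import copy


def run_with_checkpoints(raw, steps):
    """Apply each (name, func) step in order and return every saved state.

    states[0] is the raw input; states[i] is the data after step i.
    """
    states = [copy.deepcopy(raw)]
    for _name, func in steps:
        # Each step works on a copy, so earlier checkpoints are never modified.
        states.append(func(copy.deepcopy(states[-1])))
    return states


def redo_from(states, steps, index, new_func):
    """Redo the chain from step `index` (0-based) with a replacement function.

    Processing restarts from the checkpoint saved just before that step;
    nothing is reversed.
    """
    new_steps = list(steps)
    new_steps[index] = (steps[index][0], new_func)
    kept = states[: index + 1]
    tail = run_with_checkpoints(kept[-1], new_steps[index:])
    return kept + tail[1:], new_steps
```

Reversing, by contrast, would require an inverse for every step, which is the cost the comment warns about.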
---038-------------------------------------------------------------------------- (TimCornwell) Page 29, 6.2-R9: I think this requirement is not necessary. It is true that antenna-based calibration is better than baseline calibration for effects that really are antenna-based. However, it almost certainly is true that physical modeling of antenna phases by e.g. a time- and space-parameterized phase screen is superior to antenna-based calibration. Perhaps the requirement should state that "Best practices" must be followed in calibration? ---039-------------------------------------------------------------------------- (WimBrouw) p29 6.2-R1.1 This holds for science data; I would suggest that the pipeline always handles 'active calibrator' data ---040-------------------------------------------------------------------------- (WimBrouw) p29 R3: shall operate through 'automatically generated' readable ... Drop the second part (implementation) ---041-------------------------------------------------------------------------- (WimBrouw) p29 R4: 'step' is undefined; say: the pipeline shall include automatic flagging of data. Do not use 'discard' (bypass? not use). ---042-------------------------------------------------------------------------- (WimBrouw) p29 add: R4.1: Flagging should be multi-level to enable selective re-use of automatically flagged data; or the automatic flagging must be reversible in the off-line stage using identical algorithms ---043-------------------------------------------------------------------------- (WimBrouw) p29 R5: is 'output' archiving (should be)? ---044-------------------------------------------------------------------------- (WimBrouw) p29 R6: yes; but add : '.. reversed and redone, taking into account the order of operations and the fact that some of them are non-commutative, and have to be 'done and redone' ...' ---045-------------------------------------------------------------------------- (WimBrouw) p29 R7 and R2 are mutually exclusive. 
Maybe R7 could be rephrased slightly? ---046-------------------------------------------------------------------------- (WimBrouw) p29 6.2-R9: talking about Ampl and phase only? R9: could you give one or more examples where baseline calibration (of what) is required? ---047-------------------------------------------------------------------------- (JoeSchwarz) p. 30, 3.6.3.1 "Astronomical Calibration: Atmospheric Model" -- this terminology is inconsistent with that of 3.6.1.1, where "Atmospheric Model" and "Astronomical Calibration" are listed as separate categories. ---048-------------------------------------------------------------------------- (MasatoshiOhishi) p. 30 3.6.3 Real-time Calibration Operations In 6.3-R1 it says, "The real-time Calibration Operations shall be activated AFTER each observations." However, on page 26, in subsection 3.6.1.1, it says, "Real-time Calibration Operations occur (quasi) real time." I feel these are inconsistent. ---049-------------------------------------------------------------------------- (PrebenGrosbol) p.30 6.3-R1 '... after each observation' It sounds too strong to me. I would have expected that the calibration Operations are done only after calibration observations. Normally, calibrations would be done after each science observation which then would trigger the calibration operation but the current requirement is stronger. ---050-------------------------------------------------------------------------- (PrebenGrosbol) p.30 6.3.2-R2 'convert the raw data into temperatures, or, alternatively, store the conversion factor' There seems to be no reason to give an alternative - either one or the other. I would prefer the latter since this would store the raw, acquired data with the factor and not apply it. ---051-------------------------------------------------------------------------- (RayPlante) p. 30, 3.6.3, 6.3-R2: "Whenever the results ... allows to identify" ==> "Whenever the results ... 
allow one to identify" ---052-------------------------------------------------------------------------- (TadafumiTakata) p30 6.3.1-R3 Comment The resulting parameters of the model calculation should be accessible to users (including astronomers) via "The Data Extractor Tool" or similar, for evaluating how the data are affected by these parameters. (I think it is already included in environmental information extraction.) ---053-------------------------------------------------------------------------- (TimCornwell) Page 30, 6.3-R2: I think this should be priority 2. It's going to be very hard to do from the real-time calibration. An equivalent goal for the post-observing processing is possible and should be priority 1. ---054-------------------------------------------------------------------------- (TimCornwell) Page 30, 6.3.2-R3: This seems too prescriptive. What is described is the current best practice for mm arrays. This may not be the best practice for ALMA. Does one want to bind ALMA to work this way? ---055-------------------------------------------------------------------------- (WimBrouw) p30 6.3-R2 '...after?? ..' RT calib is necessary before some data can be taken (atmospheric model). Certainly an observing session should also end with one. If you do flag, multi-level flagging is even more important. ---056-------------------------------------------------------------------------- (WimBrouw) p30 6.3.1-R1 ...The prediction will be based on measured atmospheric data, including, but not limited to: ---057-------------------------------------------------------------------------- (WimBrouw) p30 R1: what is 'line-of-sight'?: on a per antenna basis (what I would assume, certainly for higher frequencies and longer available baselines); and does it assume some isotropy and/or weighting over the HPBW FOV? ---058-------------------------------------------------------------------------- (WimBrouw) p30 6.3.1-R3 already covered in R1? If not, what is different?
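[Editorial illustration, not part of any reviewer's comment.] The multi-level flagging asked for in comments 042 and 055 could be sketched roughly as below. The flag levels (ONLINE, PIPELINE, USER) and the helper name `usable` are hypothetical and chosen only for illustration; the SSR does not define them. The point is that each flag records its origin, so automatically set flags can be ignored selectively later without being lost.

```python
from enum import IntFlag


class FlagLevel(IntFlag):
    """Hypothetical flag levels, one bit per origin of the flag."""
    NONE = 0
    ONLINE = 1     # assumed: set by the real-time system
    PIPELINE = 2   # assumed: set by automatic pipeline flagging
    USER = 4       # assumed: set manually in off-line reduction


def usable(flags, ignore=FlagLevel.NONE):
    """Return True for each sample that has no flag left once the
    levels in `ignore` are masked out. Nothing is deleted, so the
    choice is reversible."""
    mask = ~int(ignore)
    return [(int(f) & mask) == 0 for f in flags]


# Four samples: clean, pipeline-flagged, pipeline+user-flagged, online-flagged.
flags = [
    FlagLevel.NONE,
    FlagLevel.PIPELINE,
    FlagLevel.PIPELINE | FlagLevel.USER,
    FlagLevel.ONLINE,
]

strict = usable(flags)                              # honour every flag
relaxed = usable(flags, ignore=FlagLevel.PIPELINE)  # re-use pipeline-flagged data
```

Because the flags are kept as independent bits rather than a single good/bad marker, the off-line stage can revisit the automatic decisions while still honouring online and manual flags.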
---059-------------------------------------------------------------------------- (WimBrouw) p30 6.3.2-R1: 'store the results'? or 'archive the results' ---060-------------------------------------------------------------------------- (WimBrouw) p30 R1.*: a time-constant indication is necessary here. It is a large variety of items (and again the list is probably not complete over the lifetime of the instrument). I think 1 and 3 are per observation normally (or even longer). R2 could be fast varying?? (probably depends on length of observation as well); R4: is that in bore-direction (it is really long-term to determine correctly). The polarised antenna pattern is something measured annually maybe. ---061-------------------------------------------------------------------------- (WimBrouw) p30 6.3.2-R2: alternative is incorrect. Tsys should always be stored (=archived). The alternative is the deferment of the conversion, not the storage. ---062-------------------------------------------------------------------------- (JoeSchwarz) p. 31, 6.3.2-R3.4 "...passed to the Dynamic Scheduler..." Again, it is difficult to understand how this data is to be used. Scheduling decisions are typically made on the timescale of an SB (nominally 15 minutes-1/2 hour, in order to take account also of the time needed to bring a new receiver band to a ready state). So what's needed here? Averages? Instantaneous values? Predictions of how these values will evolve over the next 1/2 hour? ---063-------------------------------------------------------------------------- (JoeSchwarz) p. 31, 6.3.4-R2, How are the baseline calibrations to be used by the "Sequencer"? ---064-------------------------------------------------------------------------- (MasatoshiOhishi) p. 31 3.6.3.4 Telescope / Array Calibration In 6.3.4-R4, do we really need to reduce the holography data in real time? ---065-------------------------------------------------------------------------- (PrebenGrosbol) p.31 6.3.3-R1 '...
and pass the results to' To me, 'pass' suggests something active, i.e. the pipeline explicitly sends the information. I would prefer 'made available' only. ---066-------------------------------------------------------------------------- (PrebenGrosbol) p.31 6.3.3-R2 Same comment as for 6.3.2-R2 ---067-------------------------------------------------------------------------- (PrebenGrosbol) p.31 6.3.4-R1 'must be passed or ...' Same comment as for 6.3.3-R1 ---068-------------------------------------------------------------------------- (TimCornwell) Page 31, 6.3.4-R2: "baselines" is jargon. I'd change this to "antenna locations". ---069-------------------------------------------------------------------------- (WimBrouw) p31 6.3.2-R3.3 : How can you have different parameters for the correction that is done on a fast basis before integration over samples? ---070-------------------------------------------------------------------------- (WimBrouw) p31 R3.4: what do you gain by doing this operation per baseline?? ---071-------------------------------------------------------------------------- (WimBrouw) p31 6.3.3: Drop; or indicate again the earlier comment ---072-------------------------------------------------------------------------- (WimBrouw) p31 6.3.4-R1: why explicitly made available to sequencer (earlier comment). Let sequencer determine what it needs: just archive/log it. ---073-------------------------------------------------------------------------- (WimBrouw) p31 6.3.4-R1.1: indicate (or in a more general comment) if it handles on offset pointing here (as I would assume). ---074-------------------------------------------------------------------------- (WimBrouw) p31 6.3.4-R4: should pipeline handle this? Why?
Maybe make it more generic: ' The pipeline should be able to accommodate plug-ins for the handling of special observations like holography; absolute pointing model generations; baseline determination etc' ---075-------------------------------------------------------------------------- (JoeSchwarz) p.32, 6.3.4-R5 Is the derivation of the primary beam properties from planets, etc., really a "real-time" operation? ---076-------------------------------------------------------------------------- (JoeSchwarz) p. 32, 6.3.4-R6 "...passed or made available to the Sequencer..." Once again, we need to know how these results are to be used, and on what timescale. Even if we accept that there will be a "Sequencer" in the system, we have no "Sequencer Requirements" chapter (and I hope we don't write one!), so we need to know *what* is to be done with pointing, focus, skydip, etc. Are they to be used to correct the control system's pointing model so that the antenna points where it's supposed to, or rather to correct the data already taken, or both...? ---077-------------------------------------------------------------------------- (PrebenGrosbol) p.32 6.3.4-R6 'must be passed or ...' Same comment as for 6.3.3-R1 ---078-------------------------------------------------------------------------- (WimBrouw) p32 6.3.4-R5: drop the 'and aperture efficiency' Add 'squint' to the list; or just make it 'generic beam parameters like ...' Should this also be a plug-in? I assume this is not done on a daily basis. ---079-------------------------------------------------------------------------- (WimBrouw) p32 R6: drop ---080-------------------------------------------------------------------------- (JoeSchwarz) p. 33, 6.4-R2 This seems too vague to me. How much data should be made available to the PI's over the Internet? When? In near real-time? Suppose the PI's aren't available or are asleep?
It's entirely unclear how this data is to be used, e.g., whether we're talking about letting the PI play operator from his/her institute in Europe or America or just check on what an image looks like from time to time... ---081-------------------------------------------------------------------------- (PrebenGrosbol) p.33 6.4-R1 'shall be activated automatically after each ...' That is after each observation which may be too much. I would weaken the statement and not make it mandatory but configurable. ---082-------------------------------------------------------------------------- (PrebenGrosbol) p.33 6.4-R2 '..., via the Internet' It sounds nice but it may give a bandwidth problem if lots of people try to get Quick Look data like images. ---083-------------------------------------------------------------------------- (WimBrouw) p33: 6.4.1-R3.4: integrated? Since the session will contain 'simple scans' in frequency and/or position (mosaicing..) difficult to define exactly here. Just say 'indication of above over session' or so. R3.5: noise per pointing? ---084-------------------------------------------------------------------------- (WimBrouw) p33 3.6.4.2/3: See it as an indicator of what should be done at the minimum. A good list. Maybe add to R3: 'TAKING INTO ACCOUNT FLAGGING' ---085-------------------------------------------------------------------------- (JoeSchwarz) p. 34, 6.4.2-R4 "Mosaic and self-calibration projects shall be supported." We need much more information than this to understand this requirement, and hence to know whether we have fulfilled it! What kind of Quick Look Operations are appropriate here? ---086-------------------------------------------------------------------------- (MasatoshiOhishi) p. 34 3.6.4.3 Data Processing: Single Dish Data In 6.4.3-R1.1, "on/off " could be replaced by "position switch".
---087-------------------------------------------------------------------------- (TimCornwell) Page 34, 6.4.2-R3: The mention of the Fourier transform is too prescriptive, and is unnecessary. It is too prescriptive because it is entirely possible that we might end up using some efficient linear algebra technique to make images quickly. It would be better to require something along the lines that any image be available in a time comparable to a typical observing block. ---088-------------------------------------------------------------------------- (TimCornwell) Page 34, 6.4.2-R4: More explanation is needed: does the Quick Look pipeline just process an observing block or all relevant observing blocks? E.g. in a mosaicing experiment, does the QL pipeline just image the last patch of the sky observed? This is also a problem for self-calibration. ---089-------------------------------------------------------------------------- (TimCornwell) Page 34, 6.4.2-R5. This is too prescriptive and probably wrong: one is better off doing this in the visibility plane. I'd make this a best practice requirement. ---090-------------------------------------------------------------------------- (WimBrouw) P34 6.4.2-R4: mosaic yes; self-calibration not as stated (see R3, and many earlier remarks about 'no or limited'). Should be limited to make it a non-bottleneck. ---091-------------------------------------------------------------------------- (WimBrouw) P34 R5: Compare to 'clean-beam'?? I do not get this at all. Maybe state: 'For ... the pipeline shall use the data to produce an estimate of the seeing.' (and pointing??) ---092-------------------------------------------------------------------------- (WimBrouw) p34 6.3.4-R1: a question: no statement about 'supported modes' is made for synthesis. Switching, OTF etc could be there as well. Should be added somewhere? ---093-------------------------------------------------------------------------- (JoeSchwarz) p.
36, 6.5-R1, R2 The term "session" is used too imprecisely here. Isn't it possible that some data preceding and following a session will be relevant? If, for example, a baseline calibration was done by the project that was executing 5 minutes before the current one, do we really want to repeat it? I imagine that there is other calibration data coming from outside the "session" that could be useful, too. I would also change "shall find in the Archive all data..." to "shall use all data...". Where this data comes from (whether it's in the Archive, or cached in memory, or whatever) is irrelevant from a requirements point of view. ---094-------------------------------------------------------------------------- (MasatoshiOhishi) p. 36 3.6.5.2 Single Dish Data In 6.5.2-R1.1, "on/off " could be replaced by "position switch". ---095-------------------------------------------------------------------------- (TimCornwell) Page 36, 6.5-R2: Can and should the Science Calibration Pipeline make use of historical observations which are not part of the project? ---096-------------------------------------------------------------------------- (WimBrouw) p36 6.5.1-R1: The way the bandpass and time-variations are given here does not indicate that the requirement could be for multiple bandpasses (depending on type of session) and time-frequency variations. ---097-------------------------------------------------------------------------- (WimBrouw) p36 R2: should the pipeline 'observe' such a source if none available?? ---098-------------------------------------------------------------------------- (WimBrouw) p36 R3: why not drop R3.1 and make it just amplitude and phase corrected for the appropriate frequency ('scaling' could be too simple depending on how the delay is done as a mixture of time-delay and phase rotation)? ---099-------------------------------------------------------------------------- (JoeSchwarz) p. 38, 6.6-R1 "subproject" is not defined anywhere in this document.
---100-------------------------------------------------------------------------- (JoeSchwarz) p. 38, 6.6-R4 Whether the Internet is an appropriate delivery medium for Science Imaging results regardless of their size seems to me debatable. If a PI has been waiting for six months for a project to complete (because, for example, it requires different antenna configurations), why can't he/she wait a few more days for the delivery of a DVD? I think that this decision should be made in view of a) how urgent the project is (a basis for coordinated or follow-up observations?); b) how much data is involved (Alma will produce 180 Tb/year, so it's not impossible that we might occasionally have datasets that are significant fractions of a Terabyte); and c) what the real capacity, cost and reliability of the Internet is in the Alma epoch. ---101-------------------------------------------------------------------------- (JoeSchwarz) p. 38, 6.6.1-R1, R2 "...find in the Archive..." Again, where it comes from isn't relevant. ---102-------------------------------------------------------------------------- (JoeSchwarz) p. 38, 6.6.1-R3, R4 If the flux scales are different, what is to be done? What are the possible consequences of "direct comparison of the redundant data"? ---103-------------------------------------------------------------------------- (JoeSchwarz) p. 38, 6.6.1-R7 I suggest that having "several [deconvolution] algorithms running in parallel" is better done as an offline, rather than an online task. If it *must* be done as part of the production pipeline processing, I don't see how it can be met as a "priority 1" requirement, which, as Page 12 of this document defines it, means: "Must be there for Interim Science period, when the system is commissioned to produce meaningful science results." ---104-------------------------------------------------------------------------- (MasatoshiOhishi) p. 
38 3.6.6.1 Interferometric Data In 6.6.1-R7, it would be worthwhile to add a new CLEAN algorithm, the Wavelet-CLEAN, developed by a Japanese group. It would also be very important to decide how we judge the image quality. Which is the best? ---105-------------------------------------------------------------------------- (PrebenGrosbol) p.38 6.6-R4 '..., via the Internet' Same comment as for 6.4-R2 ---106-------------------------------------------------------------------------- (TimCornwell) Page 38, 3.6.6 Science Imaging Operations There is a potential problem in all pipeline processed images: the provenance of each scientific result must be knowable and straightforward. If a virtual observatory is to make use of ALMA results, there must be an ALMA standard product that was produced in some standard way without strange choices for e.g. cellsize, field of view made by the observer. By this logic, one is forced to produce at least two results: the "standard product", and that required by the observer. In many cases, the observer may just ask for the standard product. This whole aspect of the scientific imaging pipeline must be clarified. It affects a large number of the requirements following. My specific recommendation would be that two products are produced: the standard product defined by ALMA, and the observer's product, defined by the observer. Another question that must be resolved is whether results are processed only before insertion into the archive or also on exit from the archive (e.g. triggered if best practices have changed). This is a hard question to answer. This is discussed in requirements below but I think it needs to be thought through a bit more. Finally, is the observer free to use the pipeline repeatedly or is the processing limited to that specified in the observing setup? If the former, how is time allocated? I think the latter is therefore preferred.
---107-------------------------------------------------------------------------- (TimCornwell) Page 38, 6.6-R2 In many cases, the final product could be a linear mosaic instead of a deconvolved image. I'd therefore remove the second sentence. ---108-------------------------------------------------------------------------- (TimCornwell) Page 38, 6.6-R3 Can relevant non-proprietary data from other projects be included? ---109-------------------------------------------------------------------------- (TimCornwell) Page 38, 6.6.1-R3 Phase centers and polarization frames must also be checked. ---110-------------------------------------------------------------------------- (TimCornwell) Page 38, 6.6.1-R6, 6.6.1-R7. If ALMA works as well as expected, the extra images requested here will be unnecessary: for example, the various weightings should be less divergent than for the VLA. Similarly for the deconvolved images. Also there is a combinatorial explosion possible (weighting x deconvolution x other parameters). I'd recommend that the Standard Product be just one image. ---111-------------------------------------------------------------------------- (WimBrouw) p38 6.6.1-R5: There is no 'continuum measurement' ---112-------------------------------------------------------------------------- (WimBrouw) p38 R2: 'compatible'? ---113-------------------------------------------------------------------------- (WimBrouw) p38 R5: 'appropriate'? I would think that the data is used to produce the required image cube; not the other way around. ---114-------------------------------------------------------------------------- (TimCornwell) Page 39, 6.6.1-R9 This is really a call for more development of automated methods for identifying and removing continuum. In the absence of any known method, one cannot require that it be used!
---115-------------------------------------------------------------------------- (WimBrouw) p39 6.6.1-R7.3: Add: 'model fitting and data subtraction' ---116-------------------------------------------------------------------------- (WimBrouw) p39 R8: do not specify both the domains. Why? I would think that image-plane subtraction is a non-option in general. ---117-------------------------------------------------------------------------- (JoeSchwarz) p. 40, Section 3.7 in general It is likely to be very difficult to construct a satisfactory (User-) Archive without more information about its intended use. It seems to me that statements like "all data taken by the array is archived" are somewhat irresponsible. Certainly, we don't want to lose anything important, but this shouldn't be used as an excuse to not think about what should be in and what should be left out. It's not just that storage space is wasted, but that packing the Archive with data that's not needed makes it more difficult to manage and organize the data that *is* needed. ---118-------------------------------------------------------------------------- (JoeSchwarz) p. 40, 7.1-R2 The distinction between "observational" and "technical" archives is never exploited and is often blurred (scripts, for example, seem to be included in both). What is needed is specification of what uses will be made of what kind of data. Then the Archive designers can decide where best to put it to facilitate these uses. ---119-------------------------------------------------------------------------- (JoeSchwarz) p. 40, 7.2-R2 "The observational archive shall also include as header data... technical data..." So, again, why have we defined a "technical" archive that is to hold "all technical data"? ---120-------------------------------------------------------------------------- (JoeSchwarz) p. 40, 7.2-R5 "...extract the database information for efficient data search from the header..." What does this mean? 
That certain information should be indexed? Why not specify what kind of performance is desired (i.e., what "efficient data search" means) and let the Archive designers worry about how to get it. ---121-------------------------------------------------------------------------- (JohnBenson) p.40 7.0-R1 I think you should also archive the calibrated/flagged data from the pipeline. This is probably what the observers will want distributed to them anyway. ---122-------------------------------------------------------------------------- (JohnBenson) p. 40 7.0-R2 In order for dynamic scheduling and automated pipeline processing to work, the system needs to have a quantitative, parameterized description of the observer's scientific goals. Things like sensitivity limits, image fidelity requirements, maybe spectral range and resolution in km/sec... ---123-------------------------------------------------------------------------- (MasatoshiOhishi) p. 40 3.7.2 Observational Archive In 7.2-R2, item 4, wrong reference to 1.3-R2? It might refer to 2.3-R2 on page 17. ---124-------------------------------------------------------------------------- (PrebenGrosbol) p.40 7.2-R1 '... shall include raw data, header information, ...' All these items should be in the archive but I was missing things like catalogs of line data or calibration sources. Where are they going to be? ---125-------------------------------------------------------------------------- (TadafumiTakata) p40 7.1-R1 Comment The archive system should enable astronomers and engineers to know the status of data handling. It is like a "supervisor of data handling". On the management side, it is very useful to know where the data is at a given time (e.g. in pipeline processing, in archiving, or on the way to an RSC), especially when there is trouble in data handling. The most important role of the archive system is to let users know everything about all data.
In a distributed database environment such as ALMA, which has several copies of the database in each RSC and so on, tracing a problem in data handling is very complicated work, and this supervisory function (like the FedEx management system) may be useful at various ALMA sites. On the user side, astronomers (observers and/or support astronomers, observation operators) may want to know the status of their observational data, such as which step of pipeline processing their data are in. As an astronomical requirement, a trace of dataset processing would be helpful for users. ---126-------------------------------------------------------------------------- (TadafumiTakata) p40 7.2-R4 Comment The submission of offline-reduced data by users should be performed using a user-friendly GUI, and the submission process should be able to link submitted data to the original raw data or dataset, for effective use of these data by archive users. ---127-------------------------------------------------------------------------- (TimCornwell) Page 40, 7.2-R2 Throughout this section, the phrase "header information" is used. This seems to indicate to me that every item retrieved must have the extensive information attached. In the first sentence here, I'd just remove "as header data". ---128-------------------------------------------------------------------------- (TimCornwell) Page 40, 7.2-R3 I cannot think of any reason why one would want to let the user make an irreversible choice like this. I'd remove the last sentence "This may be overridden.". ---129-------------------------------------------------------------------------- (TimCornwell) Page 40, 7.2-R5 I have no idea what this means! ---130-------------------------------------------------------------------------- (WimBrouw) p40 3.7.1: I would suggest to have the VO requirements R1 and R3 (7.5) in the introduction.
They are essential requirements from the science point of view ---131-------------------------------------------------------------------------- (WimBrouw) p40 7.1-R2.1 : add: Access and interface to the two archives should be compatible R2.2 : add: Searching in the archive should be of O(1), or O(ln N) at most. ---132-------------------------------------------------------------------------- (WimBrouw) p40 7.2-R3: I think it is wrong in principle to let the user (I suppose the observer here) decide what should be archived. The value of an archive lies in having information of which it is a priori unknown if it will ever be used. Either always archive both (preferred IMO) or let the ALMA operations at some stage decide which one will be archived from then on ---133-------------------------------------------------------------------------- (WimBrouw) p40 7.2-R4: This issue raises the questions of archive management (which should maybe be raised in use cases) and of 'super-headers', 'linking' etc. In cases of final products and/or papers, would it be advantageous to have the headers of observations used point to these outcomes (and the outcome of course point to the observations)? Both would probably be best served by having a 'supra' or 'virtual' header describing the sum of observations used; and the results (with, if at all feasible, a pointer from low-level to high level (or linked-list)). Note that one observation can be part of many super headers. Probably this requirement is one for the introduction. ---134-------------------------------------------------------------------------- (WimBrouw) p40 7.2-R5: Like I mentioned above, I think efficiency should be stated as a general requirement, and not give a 'solution'. I could easily imagine that there will be sub-headers (e.g. log or linkage to others, or because knowledge increases) which should be searchable.
Drop this one, and replace it with: "All information describing the observation should be accessible through the header." This ensures that the archive will be as coherent as possible (and is also updatable easily at a later stage). ---135-------------------------------------------------------------------------- (WimBrouw) p40 7.2-R6: why not: All information shall be archived in SOC/OSF as soon as it is available in coherent units. This will enable headers; logs; raw data in pieces; pipeline parts (like individual channel images) to be off-loaded as soon as possible ---136-------------------------------------------------------------------------- (JoeSchwarz) p. 41, 7.2-R8 Shouldn't this be the responsibility of the Regional Centers? Why should there be an *additional* access point? ---137-------------------------------------------------------------------------- (JoeSchwarz) p. 41, 7.2-R9 This requirement can be dropped if each Regional Center has a full copy of the Archive. Backup of this amount of data is a *very* expensive exercise. ---138-------------------------------------------------------------------------- (JoeSchwarz) p. 41, 7.2-R11 Who decides what the "goal" of a scan is? Can't it serve multiple purposes? If an "expert" writes a series of low-level commands to define his/her observing procedure, how can the system figure out what his/her intention was? ---139-------------------------------------------------------------------------- (JoeSchwarz) p. 41, 7.2-R13 If we should sometimes store images and generate them on-the-fly at other times, then I conclude that uniformity is not a requirement for the Alma Archive. This could pose a problem for survey-type research (i.e., to make sure that you're getting a uniform sample, you'd have to reprocess all the images that you want to include). Is this really what's wanted? ---140-------------------------------------------------------------------------- (JoeSchwarz) p.
41, 7.2-R14.2 Does this mean that, for example, an e-mail should be sent to anyone who has ever used the Archive every time calibration procedures change? This sounds a little like spamming. ---141-------------------------------------------------------------------------- (JohnBenson) p. 41 7.0-R8 I think the archive should be accessible through web-tool GUIs on the internet. Essentially all information in the archive catalog tables (your header data I think) should be accessible to any qualified user. Our goal with the NRAO E2E Archive is to build web-tools that allow a user a wide set of queries, and allow selection and FTP downloading of archived data. I think the FTP downloading will be very popular for a substantial fraction of observing programs. The rest will have to be distributed on some recording media. ---142-------------------------------------------------------------------------- (MasatoshiOhishi) p. 41 3.7.2 Observational Archive In 7.2-R9, only one backup for the archive? It would be useful to back up the principal archive in ALL RSCs. ---143-------------------------------------------------------------------------- (PrebenGrosbol) p.41 7.2-R13.4 'Image must always be archived if the pipeline cannot ...' This requirement just re-states a special case of 7.2-R13.3. ---144-------------------------------------------------------------------------- (PrebenGrosbol) p.41 7.2-R14.1 '... and provide the most up-to-date calibration.' It sounds like only the most recent calibrations are in the archive. I would think that all calibrations are there but by default you only get the latest. ---145-------------------------------------------------------------------------- (TimCornwell) Page 41, 3.7.3 Technical archive The concept of a technical archive is troublesome. There are different roles (observer, engineer, operator) that access the archive but I do not think that this should mean that there are different archives.
I would rather see one archive that can be filled to analysis packages in different ways: e.g. some tables to engineers, most information but perhaps sub-sampled to analysis programs. In the AIPS++ MeasurementSet, we have subtables for environmental data (the WEATHER subtable) in the expectation that the observer might choose to flag all data where WIND_VELOCITY>20m/s. Similarly the observer might wish to see the operator log book. Hence the trend that I see is to move away from separate archives but allow filling programs to choose to fill different information from the archive in a context-dependent way. ---146-------------------------------------------------------------------------- (WimBrouw) p41 7.2-R7: There may be several SHADOW archives .. ---147-------------------------------------------------------------------------- (WimBrouw) p41 7.2-R9: One shadow archive shall act as a backup, and be shadowed on a continuous and complete basis Note: the shadowing could be done through a tape or other medium, not necessarily net. ---148-------------------------------------------------------------------------- (WimBrouw) p41 7.2-R10: does R6 not already imply this (i.e. how can you archive data if you have no access to archive?) ---149-------------------------------------------------------------------------- (WimBrouw) p41 7.2-R11: should be 2-way. I.e. header should know what the data (or data piece) represents (cf. R5, either old or proposed form) ---150-------------------------------------------------------------------------- (WimBrouw) p41 7.2-R12: certainly at the start of the ALMA observations (and even much later) it will be completely unknown which technical/environment data could/should be used at some (maybe future) time to improve the quality of the reduced data. Limiting it at this stage to 'which is necessary to make off-line analysis' and 'if not present in header' is too constricting.
Why not: 'The archive shall provide, in the observation header, the appropriate links to all technical data available for the observation period. This link is in addition to the set of technical data that is provided in the header.'
---151-------------------------------------------------------------------------- (WimBrouw)
p41 7.2-R13.5: Add: On-the-fly re-imaging should be available for saved images as well in special circumstances. This is to be able to redo bad calibration, or to compare with a newer observation done with a different calibration scheme.
---152-------------------------------------------------------------------------- (WimBrouw)
p41 7.2-R14: Is this talking about calibration 'DATA' or 'PROCEDURES' or both? Does 'transparently' mean that all the maps in the archive are invalidated and/or automatically redone? Or just a message/code to any user of a stored map from before the change?
---153-------------------------------------------------------------------------- (WimBrouw)
p41 7.2-R14.3 '... which should always use the 'standard' ...' is undefined. The standard will be different for images retrieved from storage than for images made at the same time but re-imaged OTF. I suggest that if a stored image that has been reflagged is retrieved, it should always be re-imaged OTF.
---154-------------------------------------------------------------------------- (WimBrouw)
p41 7.3-R1: replace 'recorded' by 'measured'. Replace the part after ',' by: Each item shall contain a time-stamp. The reason is that you should not prescribe the archive organisation: a more hierarchical or other structure could be better than just a single-threaded time series. (Even by requesting a time-stamp you request certain hardware characteristics of the information gathering devices.)
---155-------------------------------------------------------------------------- (WimBrouw)
p41 7.3-R2: Always dangerous to give an inclusive list as requirements.
At the least say:
- all measured environmental data
- the water vapor radiometric raw data (at ~1s timescale)
- all monitored data
I would exclude the 'derived' pathlength. This is model dependent (and hence will vary with time), but it is also cheaper to recalculate (processing is always faster than reading data, up to quite some amount of processing), and it will save storage space.
---156-------------------------------------------------------------------------- (JoeSchwarz)
p. 42, 7.3-R3 The "high- and low-level scripts" that we're asked to save in the Technical Archive were already saved in the Observational Archive in 7.2-R2. This reinforces the need to be explicit about what is to be *done* with the data, not *where* it is to go. Asking for the "monitor data" is pretty vague and open-ended. It would be good to think about what's really needed here.
---157-------------------------------------------------------------------------- (JoeSchwarz)
p. 42, 7.3-R3 & R4 Things like electronic log books and records of manual operations should be available in the Archive, but it should be clear that it's not an Archival task to provide the user interfaces to enter this data.
---158-------------------------------------------------------------------------- (JoeSchwarz)
p. 42, 7.4-R1 says that the Archive Search Tool should be a GUI, 7.4-R2 states that the Data Extractor Tool should use it as a front end, 7.4-R9 says that 'the Data Extractor Tool shall use the Search Tool'.... This is a clear contradiction! Which tool is using which other tool? Should a programmatic tool (the Data Extractor Tool) use a GUI (the Archive Search Tool)?
---159-------------------------------------------------------------------------- (JoeSchwarz)
p. 42, 7.4-R5 "Two interfaces" are asked for, but the differences between them are never described.
---160-------------------------------------------------------------------------- (JohnBenson)
p.42 7.0-R10 You might add 'project' and 'observer' to your list of search criteria. 'Molecular transition' is a good idea; I hadn't thought of that one for the E2E Archive, and I'll use it.
---161-------------------------------------------------------------------------- (PrebenGrosbol)
p.42 7.3-R3 'The archive shall record all high- and low-level scripts' It does not seem reasonable to ask the archive to record scripts; it should rather store them. It must be the Dispatcher/Sequence which records the scripts in the archive.
---162-------------------------------------------------------------------------- (TadafumiTakata)
p42 7.4-R6 Comment: Especially in the technical archive, it is very important to provide flexible data searching using any kind of keyword and header information. (R6 may be essential, especially during the development and first-light phases.)
---163-------------------------------------------------------------------------- (TimCornwell)
Page 42, 7.4-R1 A CLI must be available to interrogate the archive from scripts. In addition, one will also want a web service equivalent.
---164-------------------------------------------------------------------------- (WimBrouw)
p42 7.3-R4: make it priority 1: especially at the start of operations the notes can be very helpful in disentangling any problem or error.
---165-------------------------------------------------------------------------- (WimBrouw)
p42 7.4-R2: ... searching the OBSERVATION database ...
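Sketch for comments 160 and 163: a script-level query of archive headers by 'project', 'observer' and molecular transition might look roughly like the following. This is only an illustration under stated assumptions: the table obs_header, its columns, the sample rows and the search() helper are all hypothetical, and SQLite merely stands in for whatever database technology the Archive eventually uses.

```python
import sqlite3

# Hypothetical header catalogue: table and column names are illustrative
# assumptions, not the ALMA archive schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs_header ("
    " obs_id TEXT PRIMARY KEY, project TEXT, observer TEXT,"
    " transition TEXT, start_time TEXT)"
)
conn.executemany(
    "INSERT INTO obs_header VALUES (?, ?, ?, ?, ?)",
    [
        ("obs-001", "P100", "observer_a", "CO J=1-0", "2002-03-01T04:00:00"),
        ("obs-002", "P100", "observer_b", "HCN J=1-0", "2002-03-02T05:30:00"),
        ("obs-003", "P200", "observer_a", "CO J=1-0", "2002-03-03T06:15:00"),
    ],
)


def search(conn, **criteria):
    """Return obs_ids whose header matches every given column=value pair."""
    allowed = {"project", "observer", "transition"}
    unknown = set(criteria) - allowed
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    # Column names come from the allowed set; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1"
    rows = conn.execute(
        f"SELECT obs_id FROM obs_header WHERE {where} ORDER BY obs_id",
        list(criteria.values()),
    )
    return [row[0] for row in rows]


print(search(conn, project="P100"))
print(search(conn, observer="observer_a", transition="CO J=1-0"))
```

Because the query is an ordinary function call, the same helper could sit behind a command-line wrapper or a web service; which of those the Archive provides is left open here.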
---166-------------------------------------------------------------------------- (WimBrouw)
p42 7.4-R4: priority 2 (then make the cookbook priority 1): use feedback to finalise help
---167-------------------------------------------------------------------------- (WimBrouw)
p42 7.4-R5: Do you mean: The AST shall have two interfaces: one mainly for astronomical product production; one mainly for technician use (e.g. for quality monitoring and error tracing)? I believe that data from both archives can (and should) be used for both astronomical and technical purposes. The interfaces for the two purposes should certainly be different, but not the underlying available data.
---168-------------------------------------------------------------------------- (WimBrouw)
p42 7.4-R6: 'The search criteria shall include all the information in the observation headers, including any information pointed to in sub-headers. Search criteria based on a combination of search fields should be possible (e.g. time * bandwidth). They shall include (but not be limited to), e.g.:
- your list; but: is integration time the total or the per-sample value? Why not have a product of bandwidth and time (mostly for continuum)? There are coupled items: configuration and resolution; frequency and resolution, ...
---169-------------------------------------------------------------------------- (WimBrouw)
p42 7.4-R6.1: Add: The user interface should be able to understand an SQL-type language with expressions between fields. (This also takes care of, and should be combined with, R7.)
---170-------------------------------------------------------------------------- (JoeSchwarz)
p. 43, 7.4-R8 The intent of showing "query statements" isn't clear. We don't know what database technology we will use for the Archive; I assume the user just wants to know what the search criteria were and to be able to modify them.
---171-------------------------------------------------------------------------- (JoeSchwarz)
p. 43, 7.4-R12: This is impossible to provide for a general archive user. We must allow for some (possibly short) delay. The Scheduling Process and ALMA operations must have top priority at this stage. If we still think about something like a 'Fast Store Archive' and a separate general Archive, then the upload of the data to the general archive will have to be asynchronous and might be delayed in case of peak load on the fast store.
---172-------------------------------------------------------------------------- (JoeSchwarz)
p. 43, 7.4-R13 How can a Data Extractor provide links? Or is the Data Extractor really a GUI?
---173-------------------------------------------------------------------------- (MasatoshiOhishi)
p. 43 3.7.4 User Interface In relation to 7.4-R13, it would be useful to have a hyperlink to published papers (e.g. in ADS, if available) that used the relevant data.
---174-------------------------------------------------------------------------- (PrebenGrosbol)
p.43 7.4-R14 '... request to get the file like' The requirement should be more specific and not just give one example. Also this requirement seems to contradict 7.4-R9, which states that the Search Tool shall be used.
---175-------------------------------------------------------------------------- (RayPlante)
p. 43, 3.7.4, 7.4-R13 Wording seems a little funny. I think you mean that the Tool, when displaying project/data product descriptions, should be able to include hyperlinks that retrieve related images or catalog data from external archives.
---176-------------------------------------------------------------------------- (RayPlante)
p. 43, 3.7.4, 7.4-R14: I didn't quite understand what this meant. Does it mean...
a. the Tool should respond to a URL-encoded HST archive search query? (I don't think so.)
b. that the Tool should accept URL-encoded search queries for retrieving data? This implies that the tool is a web service (as opposed to a client application that connects to the archive via the web).
Personally, I don't think it is sensible to retrieve big data products via a search query, as it's too easy for the query to return much more data than you want (or not enough). Instead, associate each data product with a unique identifier, which is then retrievable via a unique URL. Search queries can then return these URLs along with other metadata; the Extractor (or user) can then select what data should actually be downloaded. This is the model used with the BIMA archive. The client application DaRT allows the user to download a list of URLs all in one shot.
---177-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R10: Is this meant as: 'A preview image of the image produced from the selected data shall be made available before transfer of the final image-cube'? I think this requirement is not meant that way, but rather that a small image is meant to be produced from the selected data. Small in field? Integrated over spectrum? Low resolution? Taking only every 10th datapoint? Whatever way you do it, for it to give any indication of correctness, a full calibration and imaging must be done. Maybe it would be better to change this to something like: Before transmitting the selected data, an image of the distribution of the datapoints in the Fourier domain will be transmitted (or maybe a PTF) at some centre frequency.
---178-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R10.1: cannot be done: the data selection will not correspond with the pipeline result. The only way to do this is to add all the individual pipeline-result images (will they always be scaled the same?) based on the data selected.
---179-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R10.2: same m.m.
---180-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R11: is that not inherent in 7.2-R6?
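Sketch for comment 176: the model described there, in which a search returns only metadata plus one URL per uniquely identified data product and the user then chooses what to fetch, could be outlined as below. Everything named here is an assumption made for illustration (the base URL, the Product record, the CATALOG entries, and both helper functions); it is not the BIMA archive or DaRT interface.

```python
from dataclasses import dataclass

# Hypothetical base URL, for illustration only.
BASE_URL = "https://archive.example.org/products/"


@dataclass(frozen=True)
class Product:
    product_id: str  # unique identifier of one data product
    project: str
    size_mb: float

    @property
    def url(self) -> str:
        # One identifier maps to exactly one retrieval URL.
        return BASE_URL + self.product_id


# Made-up catalogue entries.
CATALOG = [
    Product("uid-0001", "P100", 120.0),
    Product("uid-0002", "P100", 4000.0),
    Product("uid-0003", "P100", 80.0),
    Product("uid-0004", "P200", 50.0),
]


def search(project):
    """Return metadata and URL for each matching product; no data is moved."""
    return [
        {"id": p.product_id, "size_mb": p.size_mb, "url": p.url}
        for p in CATALOG
        if p.project == project
    ]


def select_for_download(results, max_total_mb):
    """Keep each result that still fits in the remaining size budget."""
    chosen, total = [], 0.0
    for r in results:
        if total + r["size_mb"] <= max_total_mb:
            chosen.append(r["url"])
            total += r["size_mb"]
    return chosen


results = search("P100")
for url in select_for_download(results, max_total_mb=500):
    print(url)
```

The size-budget rule is just one possible selection policy; the point is that the search step and the download step are separate, so an over-broad query costs only a longer list of URLs.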
---181-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R13: replace 'hyperlink' with 'persistent link information' (hyperlinks have the tendency to become invalid; by using more modern ideas (e.g. XML links) this can be largely overcome: leave it to the archive designers to come up with a solution)
---182-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R14: should not be in the SSR as such. Rather: The DET should be able to accept (properly verified) web-based extraction requests
---183-------------------------------------------------------------------------- (WimBrouw)
p43 7.4-R15: The DET must be invokable from the Offline .. (I suppose you do not want to limit it to the Offline package only: it must be stand-alone as well (and R14))
---184-------------------------------------------------------------------------- (JoeSchwarz)
p. 44, 7.5-R2.1 I believe there was an ASAC decision to exclude source extraction as part of the ALMA project responsibility. If this is right, what are the "ALMA Catalogues", and who will produce them?
---185-------------------------------------------------------------------------- (JoeSchwarz)
p. 44, 7.5-R3 This requirement says basically, "We don't know what the VO requirements are, but you must meet them, and moreover as Priority 1 [when Interim Science Ops begin]." This will be an exceptionally hard requirement to meet.
---186-------------------------------------------------------------------------- (MasatoshiOhishi)
p. 44 3.7.5 Relationship with the VO Projects As one of the leaders of the Japanese Virtual Observatory Project, I very much appreciate seeing this section. The important points are to guarantee data quality, including providing reliable information about it, and to provide a network-transparent interface to connect to each VO via, for example, the Globus Toolkit.
---187-------------------------------------------------------------------------- (PrebenGrosbol)
p.44 7.4-R19 '... secure access to proprietary data' Security is good and needed, but I would also have expected some more general requirements on access, such as that all users of the archive should be identified by user-id or something similar.
---188-------------------------------------------------------------------------- (TimCornwell)
Page 44, 3.7.5 Relationship with Virtual Observatory Projects There are several blank checks being signed here! There is no definition yet of what it means to meet requirements for VO access, so the project cannot take on an obligation to support such access. I would remove this section (I am in favor of VOs but not of signing blank checks).
---189-------------------------------------------------------------------------- (WimBrouw)
p44 7.4-R17: If an archive user requests a DISK file... in A user-accessible directory. The user will be informed about the estimated transfer time. An email message can be requested at the end of the transfer.
---190-------------------------------------------------------------------------- (WimBrouw)
p44 7.4-R19: is DET or AST meant here (R9 says that the DET uses the AST)? Or is it meant that login is only necessary to get proprietary data?
---191-------------------------------------------------------------------------- (WimBrouw)
p44 7.5-R1/R3: see earlier
---192-------------------------------------------------------------------------- (WimBrouw)
p44 7.5-R2: Whatever ... to provide:
- ALMA catalogs (i.e. archived images)
- image quality information for archived images
- data quality information for selected observations
---193-------------------------------------------------------------------------- (MasatoshiOhishi)
p. 64 4.3.1 Schedule Scheduling Blocks How do we handle "sudden phenomena" such as gamma-ray burst sources? This would be related to 4.3.2 "Dispatch SB".
---194-------------------------------------------------------------------------- (PrebenGrosbol)
p.74 4.5.3 Process Science Data '$Date 2001/05/03 13:50:32 $' Just as a final remark: this makes me lose my trust in configuration control. The Use Case seems to have been updated but not the modification date!
---195-------------------------------------------------------------------------- (WimBrouw)
p77 4.6.2: The goal .... There are proprietary ... always available for everyone. This is stated too sloppily in view of earlier comments in the archiving part that all search information should be available in the header. This could e.g. be the max/min flux (as it should be, I think), integrated flux, SNR, binning period for variable phenomena, source list coordinates, etc., which could be indicators of the astronomical results. Maybe 'header' should be replaced by 'pre-observational header information' or some other restriction.
2. Does the user provide all of these? Some of them? As an example?
exception Course 1: PART OF requested..
Last issue mentioned (pipeline output): is already completely defined in the requirements
wnb 2002/03/18
_____oOo_____