The ALICE Offline Software Framework
The data production of the LHC experiments (about 10 - 15 PB per year) is at a new scale compared to any previous experiment. In ALICE, an average Pb-Pb event has a size of about 13.75 MB; an average p-p event is about 1.1 MB. For a standard running year, of the order of 10⁹ p-p events and 10⁸ Pb-Pb events are expected, yielding a total raw data volume of about 2.5 PB. The data taken with cosmics in 2008 amount to about 300 TB; two thirds were taken in so-called global runs with several participating subdetectors, a condition similar to real data-taking. The average size of the reconstruction output is 3 MB for a Pb-Pb event and 40 kB for a p-p event. It includes only the high-level information needed for user analysis, for example the event-vertex position, reconstructed track parameters, and PID information.

The computing resources required for the reconstruction and analysis of the raw data, as well as for the production of the simulated events needed to understand the data, exceed the computing power of single institutes and even of centers like CERN. Therefore, institutes that are part of the collaboration also provide storage and computing resources; at present 80 centers contribute to ALICE’s computing resources. The distribution of the data for reconstruction and analysis cannot be performed manually, which led to the need for an automated system. The GRID concept was identified as the solution. ALICE uses the ALICE Environment (AliEn) system as a user interface to a GRID composed of ALICE-specific services that are part of the AliEn framework and of basic services of the Grid middleware installed at the different sites. A dedicated framework called AliRoot enables the simulation and reconstruction of ALICE events; it is also the basis for any analysis performed on the data.
Dataflow
The raw data taken by the subdetectors have to be processed before they become available in the form of reconstructed events for further analysis. This happens in several stages, as illustrated in Figure 1. Data originating from the subdetectors (denoted by 1 in Figure 1) are processed by the Local Data Concentrators (LDCs); global events are then built by the Global Data Collectors (GDCs) (2). The so-called publish agent registers the assembled events in the AliEn system (3) and ships them to the CERN computing center, where they are stored first on disks (4) and then permanently on tapes (5) by the CASTOR system. During data-taking the subdetectors also produce conditions data that are relevant for the calibration of individual detector signals. Conditions data provide information about the detector status and environmental variables during data-taking: inactive and noisy channel maps, distributions describing the response of a channel, temperatures and pressures in a detector, and the detector configuration. Many of the conditions data could in principle be calculated from the raw data and extracted offline after data-taking. However, such an approach would require an additional pass over the raw data before the reconstruction, demanding computing resources that are not available. Therefore, conditions data are already extracted during data-taking. They are produced by special programs that process the raw data stream and extract the needed values. These programs work in the realm of DAQ, DCS (Detector Control System), and HLT and store their output on so-called File eXchange Servers (FXS) (6-8 in Figure 1).
Figure 1: Global view of ALICE's data flow
A dedicated program called the Shuttle collects these outputs and makes them available to the reconstruction. Furthermore, it retrieves information about the run from the ECS logbook (9) and collects continuously monitored values that are written by DCS into the DCS Archive (10). After processing the data, the Shuttle registers the produced conditions files in AliEn (11) and stores the data in CASTOR (12). With the registration of the raw and conditions data the transition from the online to the offline environment has taken place. Online denotes all actions and programs that have to run in real time. Offline processing is the subsequent step, such as event reconstruction, which is executed on Worker Nodes (WNs) of GRID sites located around the globe.
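To illustrate how these conditions data are later consumed offline, the following ROOT macro sketch retrieves a conditions object through the Offline Conditions Database (OCDB) interface of AliRoot, in the way the reconstruction does. The storage URI and the entry path are placeholders, and the class and method names are quoted from memory, so they should be checked against the installed AliRoot version.

    // Minimal sketch (assumed interface, not production code): fetch a conditions
    // object for a given run, as the reconstruction does before processing the
    // corresponding raw data. Requires the AliRoot libraries to be loaded.
    void getConditionsSketch(Int_t run = 12345)
    {
      AliCDBManager *cdb = AliCDBManager::Instance();
      cdb->SetDefaultStorage("alien://folder=/alice/data/2010/OCDB"); // placeholder storage URI
      cdb->SetRun(run);                                               // selects the validity interval

      // "TPC/Calib/Temperature" is only an example path; each detector defines its own entries
      AliCDBEntry *entry = cdb->Get("TPC/Calib/Temperature");
      if (entry) {
        TObject *payload = entry->GetObject(); // the calibration object made available via the Shuttle
        payload->Print();
      }
    }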
The ALICE Analysis Framework on the GRID: ALICE Environment (AliEn)
The GRID paradigm implies the unification of the resources of distributed computing centers, in particular computing power and storage, to provide them to users all over the world. It allows computing centers to offer their resources to a wider community and thus lets large collaborations share resources. The huge amount of data produced by the ALICE detector (∼ 2 PB per year) makes automated procedures for the (software) reconstruction of the events and for the first steps of the analysis almost unavoidable, with a correspondingly large demand for computing resources. The worldwide distributed GRID facilities were designed to provide both the computing power and the disk space needed to face the LHC software challenge; hence the need for GRID-oriented analysis code. One of the main advantages of using the GRID is the possibility to analyze a large data set by splitting an analysis job into many “clone” subjobs running in parallel on different computing nodes. The ALICE VO (Virtual Organization) is made of more than 80 sites distributed worldwide (Figure 2 is a snapshot of the sites in Europe).
Figure 2: A snapshot of the ALICE VO sites in Europe. A green circle indicates that jobs are running on the site, while red and yellow circles indicate sites with problems.
Each site is composed of many WNs, which are the physical machines where the software programs run. The Storage Element (SE) is responsible for managing the physical files in the site and for providing an interface to mass storage. The Computing Element (CE) service is an interface to the local (WN) batch system and manages the computing resources of the site. The ALICE Collaboration has developed AliEn as an implementation of the distributed computing infrastructure needed to simulate, reconstruct, and analyze the data of the experiment. AliEn provides the two key elements needed for large-scale distributed data processing: a global file system (the catalogue) for data storage and the possibility to execute jobs in a distributed environment. The analysis software, i.e. the user code and the AliRoot libraries (or PAR files, in case a development version of the code is not deployed on the GRID), needed by each subjob must be specified in a JDL (Job Description Language) file, together with the data sample and the way to split it. The data sample is specified through an XML (eXtensible Markup Language) collection file which contains a list of the Logical File Names (LFNs, the entries in the catalogue) of the files to be processed.
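In practice, the JDL and the XML collection are rarely written by hand; in AliRoot they are typically generated by the AliEn analysis plugin. The following sketch shows the kind of configuration involved, assuming the AliAnalysisAlien class; all paths, package versions, run numbers, and splitting parameters are placeholders, and the exact method names may differ between AliRoot versions.

    // Sketch of a GRID analysis configuration using the AliEn plugin (AliAnalysisAlien).
    // All paths, versions, and run numbers below are placeholders.
    AliAnalysisAlien *CreateAlienHandlerSketch()
    {
      AliAnalysisAlien *plugin = new AliAnalysisAlien();
      plugin->SetRunMode("full");                          // generate JDL and XML collection, then submit
      plugin->SetAliROOTVersion("v4-21-05-AN");            // AliRoot package deployed on the GRID (placeholder)
      plugin->SetGridDataDir("/alice/data/2010/LHC10b");   // catalogue directory of the input data (placeholder)
      plugin->SetDataPattern("*ESDs/pass2/*ESDs.root");    // which files of each run to use (placeholder)
      plugin->AddRunNumber(117220);                        // placeholder run number
      plugin->SetSplitMaxInputFileNumber(50);              // controls how the job is split into subjobs
      plugin->SetGridWorkingDir("myAnalysis");             // working directory in the user's catalogue space
      plugin->SetGridOutputDir("output");                  // subdirectory for the subjob output
      plugin->SetAnalysisMacro("MyAnalysis.C");            // macro shipped with the JDL (placeholder name)
      return plugin;
    }

When run in this mode, the plugin generates the JDL and the XML collection described above and submits the corresponding subjobs to the ALICE VO.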
The AliRoot Framework
AliRoot is the offline framework for simulation, alignment, calibration, reconstruction, visualization, quality assurance, and analysis of experimental and simulated data. It is based on the ROOT framework. Most of the code is written in C++, with some parts in Fortran that are wrapped inside C++ code. The AliRoot development started in 1998 and the framework has been used extensively for the optimization of the experiment’s design. It has been used for large-scale productions, so-called Physics Data Challenges (PDCs), where millions of events are produced. These have been used to estimate the physics performance of ALICE. Such events are also used to develop analysis procedures and to estimate the associated systematic errors, as is done in this thesis. Finally, AliRoot is used to reconstruct the events that occurred in the detector. For event simulation the framework provides the following functionality (a minimal steering sketch is given after the list):
- Event generation. A collision is simulated by an event generator that is interfaced with AliRoot (e.g. Pythia, Phojet, or HIJING); this step produces the kinematics tree containing the full information about the generated particles (type, momentum, charge, production process, originating particle, and decay products).
- Transport. The particles are propagated through the detector material which is modeled as realistically as possible. In this process, particles can interact with matter, decay, and create additional particles. Naturally, these particles have to be propagated through the detector as well. The total number of particles after the transport is significantly larger than the number of particles created in the initial generation step. During this process all interactions of particles with sensitive detector parts are recorded as hits that contain the position, time, and energy deposit of the respective interaction. Furthermore, track references that can be used to follow a track’s trajectory, mainly needed for the debugging of the reconstruction algorithms, are stored. Programs that perform the transport and are interfaced with AliRoot are Geant3, Geant4, and Fluka.
- Digitization. If a particle produced a signal in a sensitive part (hit), the corresponding digital output of the detector is stored as a summable digit taking into account the detector’s response function. Possible noise is then added to the summable digit and it is stored as a digit. Summable digits allow events to be merged without duplication of noise. In the last step, the data is stored in the specific hardware format of the detector (raw data).
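In AliRoot these three steps are steered by a short ROOT macro; the sketch below (assuming the AliSimulation steering class, with placeholder detector lists and file names) illustrates the idea. Event generation and transport are configured separately in a Config.C macro, so only the digitization output and the number of events appear here; the interface may differ between AliRoot versions.

    // Sketch of a simulation steering macro (often called sim.C).
    // Event generation and transport are defined in a separate Config.C;
    // here only the number of events and the digitization output are chosen.
    void simSketch(Int_t nEvents = 10)
    {
      AliSimulation sim;
      sim.SetMakeSDigits("TPC TRD TOF");       // detectors for which summable digits are produced (placeholder list)
      sim.SetMakeDigits("TPC TRD TOF");        // convert summable digits (plus noise) into digits
      sim.SetWriteRawData("ALL", "raw.root");  // store the result in the detector-specific raw format
      sim.Run(nEvents);                        // generation, transport, and digitization for nEvents collisions
    }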
At this stage the simulated raw data correspond to the signals that would be produced by an interaction of the same kind in the real detector. The subsequent reconstruction is identical for simulated and real events. It consists of the following steps:
- Cluster finding. Particles that interact with the detector usually leave a signal in several adjacent detecting elements or in several time bins of the detector. In this step these signals are combined to form clusters. This allows the exact position or time of the traversing particle to be determined and reduces the effect of random noise. Overlapping signals from several particles in a single cluster are unfolded. This step is performed separately for each subdetector; due to the different nature of the subdetectors, the implementations vary significantly.
- Track reconstruction. The clusters are combined to form tracks that allow the track curvature and energy loss to be calculated, with the aim of determining the associated momentum and particle identity. The tracking is both a global task and an individual procedure per detector. The global central-barrel tracking starts from track seeds in the TPC, which are found by combining information from a few outermost pad rows under the assumption that the track originated from the primary vertex. Tracks are then followed inwards using a procedure called the Kalman filter (see the generic filter equations after this list): in each step the track, i.e. the track parameters and the covariance matrix, is propagated to the next pad row. The covariance matrix is updated by adding a noise term that represents the information loss due to stochastic processes such as multiple scattering and energy-loss fluctuations. If a cluster is found that matches the track, it is added to the track, updating the track parameters and the covariance matrix. Afterwards the same procedure is repeated by starting the seeding closer to the collision point. In a final step all clusters already associated to tracks are removed and the procedure is repeated without requiring that the seeds point to the primary vertex. The result, the so-called TPC-only tracks to which only TPC information contributed, is saved in the reconstruction output. Subsequently, these tracks are complemented with information from the ITS, TRD, and TOF, as well as HMPID and the charged-particle veto of PHOS if the track is in their acceptance, which produces so-called global tracks. Tracks can also be formed out of information from the ITS alone. Tracks are represented by the parameters y, z, sinφ, tanθ and 1/pt.
- Primary-vertex reconstruction. Various pieces of information are used to find the primary-vertex position of the interaction. Examples, each of which is sufficient to produce a vertex position, are clusters in the SPD, tracks in the TPC, and global tracks. When a vertex position is found, the tracks are constrained to it: the vertex position is used as an additional point in the estimate of the track parameters. The TPC-only tracks are constrained with the vertex position found with TPC-only tracks, while the global tracks are constrained with the vertex position found with global tracks. Of course this constraint is only applied to tracks that actually pass near the vertex.
- Secondary-vertex reconstruction. Tracks are combined to find secondary vertices in order to reconstruct decayed particles like Λ → pπ− and photon conversions. For this purpose, opposite-sign tracks that originate sufficiently far away from the primary vertex are combined. If the distance of closest approach and the topology of the two tracks are consistent with a decay, the pair is accepted as a potential secondary vertex.
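For reference, the Kalman-filter step used in the track following can be written in its generic textbook form (this is not the ALICE-specific implementation; the symbols are introduced here only for illustration). Let x_{k-1} and C_{k-1} be the five track parameters and their covariance matrix at pad row k-1, F_k the propagation to pad row k, Q_k the process-noise term accounting for multiple scattering and energy-loss fluctuations, m_k the measured cluster position with covariance V_k, and H the projection of the track parameters onto the measured coordinates. Then

\[
\begin{aligned}
\text{prediction:}\quad & \tilde{x}_k = F_k\,x_{k-1}, \qquad \tilde{C}_k = F_k\,C_{k-1}\,F_k^{T} + Q_k,\\
\text{update:}\quad & K_k = \tilde{C}_k H^{T}\bigl(H \tilde{C}_k H^{T} + V_k\bigr)^{-1},\\
& x_k = \tilde{x}_k + K_k\bigl(m_k - H\,\tilde{x}_k\bigr), \qquad C_k = \bigl(1 - K_k H\bigr)\,\tilde{C}_k .
\end{aligned}
\]

The noise term Q_k inflates the covariance at every propagation step, while the update with the cluster measurement shrinks it again; this is exactly the behaviour described in the track-reconstruction item above.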
The output of the reconstruction is called Event-Summary Data (ESD), which contains only high-level information such as the position of the event vertex, parameters of reconstructed charged particles together with their PID information, positions of secondary-vertex candidates, parameters of particles reconstructed in the calorimeters, and integrated signals of some subdetectors. These data are further reduced to the Analysis-Object Data (AOD) format. These smaller-sized objects contain only the information needed for the analysis. Therefore, the transformation procedure may already contain a part of the analysis algorithm, for example the track selection. Several AODs, focusing on different physics studies, can be created for a given event.
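To give an idea of how the ESD is used, the following ROOT macro sketch loops over the reconstructed tracks of each event directly from the ESD tree. The file name "AliESDs.root" and the tree name "esdTree" follow the usual AliRoot conventions, but the class and method names are quoted from memory and should be checked against the installed AliRoot version.

    // Sketch: read reconstructed tracks from an ESD file outside the analysis framework.
    // Requires the AliRoot libraries to be loaded.
    void readESDSketch(const char *fileName = "AliESDs.root")
    {
      TFile *file = TFile::Open(fileName);
      TTree *tree = (TTree *)file->Get("esdTree");   // one AliESDEvent per tree entry
      AliESDEvent *esd = new AliESDEvent();
      esd->ReadFromTree(tree);                       // connect the event object to the tree branches

      for (Long64_t i = 0; i < tree->GetEntries(); i++) {
        tree->GetEntry(i);
        Double_t ptSum = 0.;
        for (Int_t t = 0; t < esd->GetNumberOfTracks(); t++) {
          AliESDtrack *track = esd->GetTrack(t);     // reconstructed track with PID information attached
          if (track) ptSum += track->Pt();           // transverse momentum from the track parameters
        }
        printf("Event %lld: %d tracks, vertex z = %.2f cm, <pt> ~ %.2f GeV/c\n",
               i, esd->GetNumberOfTracks(), esd->GetPrimaryVertex()->GetZ(),
               esd->GetNumberOfTracks() > 0 ? ptSum / esd->GetNumberOfTracks() : 0.);
      }
    }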
The Resonance (RSN) Package
The Resonance Package is a set of tools built in AliRoot to help the user implement his or her resonance analysis. Its tool set is quite sophisticated, yet it is easy to use and allows the analyzer to define his or her own cuts in a transparent way, i.e. independently of the particle under analysis. More details on the RSN Package are given in the RSN Package Tutorial.