# Automated Parallel Calculation of Collaborative Statistical Models

Dr. Wouter Verkerke

Nikhef

Dr. Patrick Bos

eScience Research Engineer

The recent discovery of the Higgs boson in 2012 by the ATLAS and CMS experiments at the Large Hadron Collider (LHC) at CERN, Geneva, is a prime example of the success of large-scale statistical data analysis in particle physics. At the LHC, approximately 10 petabytes of data are recorded in every year of data taking. The scientific goal of examining proton-proton collisions is to explore whether previously unseen particles are produced in these collisions, whose presence may be indicative of previously unconfirmed or unknown fundamental physics.

As the sought-after particles may decay in a multitude of ways, and their decay products are buried among the hundreds of particles produced in each collision, constructing proof of the existence of these particles requires an exhaustive analysis of collision data. The final statistical evidence combines the results of the analysis of dozens of partial data samples that each isolate a signature of interest or measure an important background or nuisance parameter.

**Collaborative statistical modelling**

In recent years the concept of collaborative statistical modelling has emerged, where detailed statistical models of measurements performed by independent teams of scientists are combined a posteriori without loss of detail. The preferred tool to do this, RooFit, allows users to build probability models from expression trees of C++ objects that can be recursively composed into descriptive models of arbitrary complexity.
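The idea of recursively composing a probability model from an expression tree can be illustrated with a small toy example. The sketch below is written in Python for brevity and does not use RooFit or its API. The class names (`Pdf`, `Gaussian`, `Uniform`, `Mixture`) and all parameter values are assumptions made up for illustration, and the Gaussian is not renormalised to the fit range as a real tool would do.

```python
import math
from dataclasses import dataclass


class Pdf:
    """Base node of the expression tree: anything that returns a density at x."""

    def density(self, x: float) -> float:
        raise NotImplementedError


@dataclass
class Gaussian(Pdf):
    """Leaf node: Gaussian density (normalised over the whole real line)."""
    mean: float
    sigma: float

    def density(self, x: float) -> float:
        z = (x - self.mean) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))


@dataclass
class Uniform(Pdf):
    """Leaf node: flat density on [low, high], zero elsewhere."""
    low: float
    high: float

    def density(self, x: float) -> float:
        return 1.0 / (self.high - self.low) if self.low <= x <= self.high else 0.0


@dataclass
class Mixture(Pdf):
    """Composite node: fraction * first + (1 - fraction) * second."""
    fraction: float
    first: Pdf
    second: Pdf

    def density(self, x: float) -> float:
        return (self.fraction * self.first.density(x)
                + (1.0 - self.fraction) * self.second.density(x))


# A hypothetical "peak on a flat background" model ...
signal_plus_background = Mixture(0.2, Gaussian(125.0, 2.0), Uniform(100.0, 160.0))
# ... which is itself reused as one component of a larger model.
combined = Mixture(0.5, signal_plus_background, Gaussian(110.0, 5.0))
```

Because every node exposes the same `density` method, a composite model can be passed anywhere a simple one is accepted, which is what makes models from different teams combinable in principle.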

**Computational performance is a limiting issue**

With the emergence of ever more complex models, computational performance is now becoming a limiting issue. The work in this project aims to introduce eScience techniques to improve computational performance: vectorization and parallelization of calculations will lead to significant improvements in performance, while new structures to represent the combined data will simplify the process of building joint models for heterogeneous datasets.
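One common way to parallelize a likelihood calculation is to split the data into chunks, evaluate a partial sum per chunk, and add the results. The sketch below shows only that decomposition and should not be read as the project's actual implementation. The function names, the Gaussian model, and the worker count are assumptions chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian_nll(data, mean, sigma):
    """Negative log-likelihood of `data` under a Gaussian(mean, sigma)."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(0.5 * ((x - mean) / sigma) ** 2 + norm for x in data)


def parallel_nll(data, mean, sigma, n_workers=4):
    """Split the data into chunks, evaluate each partial sum in a worker, add them up."""
    chunk = max(1, math.ceil(len(data) / n_workers))
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = pool.map(lambda part: gaussian_nll(part, mean, sigma), chunks)
        return sum(partial_sums)
```

The decomposition works because the log-likelihood is a sum over independent events, so the partial sums can be computed in any order. In standard CPython, threads do not speed up pure-Python arithmetic, so this version only demonstrates the structure. A real speed-up would need separate processes or compiled, vectorized code.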

**Useable in lateral directions**

With much improved scalability of computational efficiency, the developed software can also become useable in lateral directions such as spectral CT image reconstruction.

*Image: CMS Doomsday at the CERN LHC by solarnu - https://www.flickr.com/photos/solarnu/2078532845*