Testing future infrastructure designs for efficiency in simulation

Research conducted by the Karlsruhe Institute of Technology (KIT).


In our work on Research Area II, which focuses on data lakes, distributed data, and caching, we selected and deployed a flexible, high-performance, and accurate simulation tool. This effort also laid the groundwork for practical research into innovative and realistic computing infrastructures.

When planning the future computing infrastructure for High Energy Physics (HEP), we considered a range of complex requirements, including:

  • Analysis, reconstruction, and simulation on the Worldwide LHC Computing Grid (WLCG).
  • A dynamic, distributed infrastructure with many components, varying workloads, data access patterns, fluctuating resource availability, and complex scheduling systems.
  • Limited resources for infrastructure modernization.

After thoroughly analyzing these needs and constraints, we implemented a new simulation infrastructure. Previously, the MONARC simulation framework was used; we have now chosen the open-source SimGrid and WRENCH simulation frameworks instead.

  • SimGrid offers low-level simulation abstractions for distributed systems, modeling network, storage, and CPU resources with a fluid approach (illustrated by the sketch after this list).
  • WRENCH, built on SimGrid, provides high-level tools and services for defining and managing activities within the simulation.
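To illustrate what a fluid model means in practice: rather than simulating individual packets or instructions, concurrent activities share a resource's capacity and progress at fractional rates until they complete. The following self-contained Python sketch is a conceptual toy of fair CPU sharing, not the SimGrid implementation or API:

```python
# Illustrative sketch of a "fluid" CPU model: concurrent tasks share the
# machine's flop rate equally and advance continuously until they finish.
# This is a conceptual toy, not how SimGrid is implemented.

def simulate_fluid_cpu(task_flops, cpu_speed_flops_per_s):
    """Return (task_index, finish_time) pairs for tasks all started at t=0."""
    remaining = dict(enumerate(task_flops))   # task id -> flops left
    now, finished = 0.0, []
    while remaining:
        share = cpu_speed_flops_per_s / len(remaining)  # fair sharing
        # The next event is the completion of the task with the least work left.
        next_id = min(remaining, key=remaining.get)
        dt = remaining[next_id] / share
        now += dt
        # Advance every task by the work done during dt, then drop the finisher.
        for tid in remaining:
            remaining[tid] -= share * dt
        finished.append((next_id, now))
        del remaining[next_id]
    return finished

if __name__ == "__main__":
    # Three jobs of 1, 2 and 4 Gflop on a 1 Gflop/s core finish at t = 3, 5, 7 s.
    print(simulate_fluid_cpu([1e9, 2e9, 4e9], 1e9))
```

SimGrid applies comparable fluid models to network links and storage as well, which is what keeps simulations of large platforms tractable.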

We have also implemented HEP-specific adaptations, such as job, dataset, and workflow models, data streaming logic, and service management capabilities. These extensions are available as open-source code.
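As an illustration of what such models capture, the sketch below defines minimal job and dataset structures in Python. The class and field names are hypothetical, chosen for this description; they are not the types used in the actual open-source extensions.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A hypothetical input dataset made of files with known sizes (bytes)."""
    name: str
    file_sizes: dict[str, int] = field(default_factory=dict)

    @property
    def total_bytes(self) -> int:
        return sum(self.file_sizes.values())

@dataclass
class Job:
    """A hypothetical HEP job: compute work, memory footprint, I/O volumes."""
    name: str
    flops: float            # total operations to execute
    memory_bytes: int       # required RAM
    inputs: list[Dataset]   # datasets to be streamed or read from a cache
    output_bytes: int       # size of the produced output

if __name__ == "__main__":
    ds = Dataset("AOD_run2018", {"file_0.root": 2_000_000_000})
    job = Job("analysis_0", flops=5e11, memory_bytes=2_000_000_000,
              inputs=[ds], output_bytes=50_000_000)
    print(job.name, "reads", sum(d.total_bytes for d in job.inputs), "bytes")
```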

To implement this simulation infrastructure, the following steps were completed:

  1. Define job workloads, including operations, memory requirements, input datasets, and output sizes.
  2. Specify the platform parameters, such as CPU speed, RAM, disk space, and network topology (a sketch of such workload and platform descriptions follows this list).
  3. Deploy storage systems and allocate files.
  4. Run the simulation: jobs are scheduled, inputs are streamed and cached, and cache management is handled dynamically (see the cache sketch after this list).
  5. Calibrate the simulator by adjusting its free parameters until it reproduces real-world system behavior.
  6. Validate the simulator by comparing its output with measured data, for example job execution times as a function of how much of the input is already cached (a calibration and validation sketch follows this list).
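Steps 1 and 2 boil down to machine-readable workload and platform descriptions. The following sketch shows one possible shape for such descriptions; the keys, units, and values are illustrative examples, not the simulator's actual configuration format.

```python
import json

# Hypothetical workload description: one entry per job template (step 1).
workload = {
    "jobs": [
        {"name": "analysis", "flops": 5e11, "memory_GB": 2,
         "input_dataset": "AOD_run2018", "input_GB": 20, "output_GB": 0.05},
    ]
}

# Hypothetical platform description: hosts, storage and links (step 2).
platform = {
    "hosts": [
        {"name": "worker01", "cores": 8, "speed_Gflops": 10, "ram_GB": 32},
    ],
    "storage": [
        {"name": "cache01", "host": "worker01", "capacity_TB": 2},
    ],
    "links": [
        {"name": "wan", "bandwidth_Gbps": 10, "latency_ms": 20},
    ],
}

if __name__ == "__main__":
    # In practice such descriptions would live in files next to the simulator.
    print(json.dumps({"workload": workload, "platform": platform}, indent=2))
```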
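Step 4 is where data access dominates: each input file is either already cached near the worker or has to be streamed over the network, and the cache contents evolve as jobs run. A minimal sketch of such cache accounting, here with a simple least-recently-used eviction policy, could look as follows (illustrative only and deliberately simplified compared to the actual simulation):

```python
from collections import OrderedDict

class Cache:
    """Toy byte-limited cache with least-recently-used eviction."""
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.files = OrderedDict()              # file name -> size in bytes

    def used(self) -> int:
        return sum(self.files.values())

    def access(self, name: str, size: int) -> bool:
        """Return True on a cache hit; on a miss, stream and insert the file."""
        if name in self.files:
            self.files.move_to_end(name)        # refresh LRU order
            return True
        while self.files and self.used() + size > self.capacity:
            self.files.popitem(last=False)      # evict least recently used
        if size <= self.capacity:
            self.files[name] = size
        return False

def job_walltime(flops, cpu_speed, input_files, cache, net_bandwidth):
    """Compute time plus streaming time for every input that misses the cache."""
    streamed = sum(size for name, size in input_files
                   if not cache.access(name, size))
    return flops / cpu_speed + streamed / net_bandwidth

if __name__ == "__main__":
    cache = Cache(capacity_bytes=10 * 10**9)
    files = [("file_0.root", 2 * 10**9), ("file_1.root", 2 * 10**9)]
    cold = job_walltime(5e11, 1e10, files, cache, net_bandwidth=1.25e9)
    warm = job_walltime(5e11, 1e10, files, cache, net_bandwidth=1.25e9)
    print(f"cold run: {cold:.1f}s, warm run: {warm:.1f}s")
```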
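For steps 5 and 6, the free parameters of the simulator are tuned until simulated walltimes reproduce measured ones, and the remaining discrepancy serves as a validation measure. The sketch below fits two hypothetical parameters (an effective streaming bandwidth and a per-job overhead) to invented measurements using a simple grid search; all names and numbers are made up for illustration.

```python
# Hypothetical measurements: (cached fraction of input, observed walltime in s).
measured = [(0.0, 83.0), (0.5, 67.0), (1.0, 51.0)]

def simulated_walltime(cached_fraction, bandwidth_GBps, overhead_s,
                       flops=5e11, cpu_speed=1e10, input_GB=40.0):
    """Toy model: compute time + streaming time for the non-cached bytes."""
    streamed_GB = (1.0 - cached_fraction) * input_GB
    return flops / cpu_speed + streamed_GB / bandwidth_GBps + overhead_s

def calibrate():
    """Grid-search the free parameters minimising the squared residuals (step 5)."""
    best = None
    for bw in [0.5, 1.0, 1.25, 2.0, 5.0]:          # candidate bandwidths, GB/s
        for overhead in [0.0, 1.0, 2.0, 5.0]:      # candidate per-job overheads, s
            err = sum((simulated_walltime(f, bw, overhead) - t) ** 2
                      for f, t in measured)
            if best is None or err < best[0]:
                best = (err, bw, overhead)
    return best

if __name__ == "__main__":
    err, bw, overhead = calibrate()
    # Step 6: report how far the calibrated simulation is from the measurements.
    for f, t in measured:
        sim = simulated_walltime(f, bw, overhead)
        print(f"cached={f:.1f}  measured={t:.1f}s  simulated={sim:.1f}s")
    print(f"calibrated bandwidth={bw} GB/s, overhead={overhead}s, sq. error={err:.1f}")
```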

This work represents a significant step forward in developing future-ready computing infrastructures for HEP, combining flexibility, efficiency, and accuracy in a simulation environment.

Future work

Looking ahead, we plan to further enhance our simulation capabilities. We will focus on speeding up simulations, developing surrogate models, automating the calibration and validation processes, and adding uncertainty estimates for parameters, calibration, and models.

Additionally, if needed, we will develop simulations that track energy consumption, helping us better understand and optimize resource usage.

Another key goal is to create a data-aware scheduling system that intelligently accounts for cached data, making our simulations even more efficient and responsive.
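One simple way to express such data-awareness is a placement score that prefers sites already holding a large fraction of a job's input data. The sketch below illustrates the idea with hypothetical site and file names; it is not a design of the planned scheduler.

```python
def cached_fraction(job_inputs, site_cache):
    """Fraction of the job's input bytes already present at a site."""
    total = sum(size for _, size in job_inputs)
    hit = sum(size for name, size in job_inputs if name in site_cache)
    return hit / total if total else 1.0

def choose_site(job_inputs, sites):
    """Pick the site with the most cached input, breaking ties by free slots."""
    return max(sites,
               key=lambda s: (cached_fraction(job_inputs, s["cache"]),
                              s["free_slots"]))

if __name__ == "__main__":
    job_inputs = [("file_0.root", 2 * 10**9), ("file_1.root", 2 * 10**9)]
    sites = [
        {"name": "siteA", "cache": {"file_0.root"}, "free_slots": 4},
        {"name": "siteB", "cache": set(), "free_slots": 16},
    ]
    print("schedule on:", choose_site(job_inputs, sites)["name"])
```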
