Caching evaluation at the PHYSnet cluster

Research conducted by the University of Hamburg.

During TA2, we explored and deployed advanced data caching technologies, in particular dynamic data caches located close to newly integrated CPU resources.

We made significant progress in setting up HEP-specific software on the PHYSnet cluster, a resource shared by all institutes of the physics faculty.

The cluster offers several pools and queues for different applications, with some parts reserved for specific project groups. Thanks to containerization, the PHYSnet cluster can now run HEP workflows.

We deployed a small-scale test setup on PHYSnet, consisting of a dedicated HTCondor instance and drones that are submitted as long-running jobs to the local SGE batch system. The CernVM File System (CernVM-FS) is mounted in user space via cvmfsexec, the container images are taken from /cvmfs/unpacked.cern.ch, and all components run without elevated privileges.
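To make the layering of such a drone concrete, here is a minimal Python sketch that assembles the command a drone job could run inside its SGE slot: cvmfsexec first mounts CernVM-FS in user space, then a container runtime starts HTCondor from an image on /cvmfs/unpacked.cern.ch. It assumes Apptainer as the container runtime (the text does not say which one is used) and `condor_master -f` as the payload. The repository list, the image path and the helper `drone_command` are hypothetical placeholders, not the actual PHYSnet configuration; only the `cvmfsexec REPO ... -- COMMAND` and `apptainer exec IMAGE COMMAND` calling conventions come from the tools themselves.

```python
import shlex

# Hypothetical placeholders, not the real PHYSnet configuration.
CVMFS_REPOS = ["unpacked.cern.ch"]  # repositories cvmfsexec should mount
IMAGE = "/cvmfs/unpacked.cern.ch/<registry>/<image>:<tag>"  # placeholder image path
PAYLOAD = ["condor_master", "-f"]  # run the HTCondor master in the foreground


def drone_command(repos, image, payload):
    """Return the argv of one drone (hypothetical helper).

    Outer layer: cvmfsexec mounts the given CVMFS repositories without root.
    Inner layer: apptainer runs the payload inside an image served from CVMFS.
    """
    inner = ["apptainer", "exec", image, *payload]
    # cvmfsexec calling convention: cvmfsexec REPO [REPO ...] -- COMMAND
    return ["cvmfsexec", *repos, "--", *inner]


argv = drone_command(CVMFS_REPOS, IMAGE, PAYLOAD)
print(shlex.join(argv))  # shell-quoted line, e.g. for an SGE job script
```

Wrapping this line in an SGE job script and submitting it with `qsub` as a long-running job is deliberately left out, since the queue names and resource requests are site-specific.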

Current installation of PHYSnet

In preparation for a cache-enabled installation, we identified several workflows for evaluating caching performance:

  • Customized production of standardized, ready-calibrated n-tuples (such as the CMS NanoAOD format)
  • Object calibration, including jet energy scale and resolution
  • Fully orchestrated analysis workflows using modern tools and data formats such as Awkward Array and Apache Arrow

Modern orchestrated workflows benefit greatly from caching, which makes caching an essential step towards optimizing the performance and efficiency of complex HEP analyses.
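One way to reason about which of these workflows profit from a cache is to replay a file-access trace through a simple cache model. The sketch below uses a generic least-recently-used (LRU) policy chosen purely for illustration; it is not XCache's actual eviction algorithm, and the trace and file sizes are synthetic, not PHYSnet measurements.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, sizes, capacity):
    """Replay a file-access trace through an LRU cache of the given capacity
    (same unit as `sizes`) and return the fraction of accesses served from cache."""
    cache = OrderedDict()  # file name -> size, least recently used first
    used = 0
    hits = 0
    for name in trace:
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
            continue
        size = sizes[name]
        if size > capacity:
            continue  # can never be cached; always read remotely
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
        cache[name] = size
        used += size
    return hits / len(trace) if trace else 0.0


# Synthetic trace: an analysis reading the same four 1 GB files in three passes.
files = ["f1", "f2", "f3", "f4"]
sizes = {f: 1 for f in files}  # sizes in GB
trace = files * 3
print(lru_hit_ratio(trace, sizes, capacity=5))  # working set fits: 8 of 12 reads hit
print(lru_hit_ratio(trace, sizes, capacity=3))  # working set too large: no hits at all
```

The second case shows why cache sizing matters: with cyclic re-reads, an LRU cache that is even slightly smaller than the working set evicts each file just before it is needed again.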

Future work

We plan to extend the PHYSnet setup with dedicated hosts for essential services such as the HTCondor scheduler, scratch space, the cache, and monitoring. In addition, we aim to deploy the XCache service and conduct performance comparisons to guide further optimization.
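Before measuring, a back-of-the-envelope model can indicate what to expect from such comparisons. The sketch below assumes purely bandwidth-bound reads and ignores latency, concurrency and the cost of filling the cache; the bandwidths and data volume in the example are hypothetical, not PHYSnet or XCache figures.

```python
def expected_read_time(data_gb, hit_ratio, cache_gbps, remote_gbps):
    """Seconds needed to read `data_gb` GB if a fraction `hit_ratio` is served
    from the cache at `cache_gbps` GB/s and the rest remotely at `remote_gbps` GB/s."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return data_gb * (hit_ratio / cache_gbps + (1.0 - hit_ratio) / remote_gbps)


def speedup(data_gb, hit_ratio, cache_gbps, remote_gbps):
    """Uncached read time divided by the read time with the cache."""
    baseline = expected_read_time(data_gb, 0.0, cache_gbps, remote_gbps)
    return baseline / expected_read_time(data_gb, hit_ratio, cache_gbps, remote_gbps)


# Hypothetical numbers: 100 GB input, 2 GB/s local cache, 0.2 GB/s remote storage.
for h in (0.0, 0.5, 0.9):
    print(h, round(speedup(100, h, cache_gbps=2.0, remote_gbps=0.2), 2))
```

Because remote reads dominate the total time, the speedup stays modest until the hit ratio is high, which is the regime the measured comparisons would need to confirm or refute.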

Future installation of PHYSnet

As part of the FIDIUM project extension, our goals include:

  • advancing drone management with COBalD/TARDIS
  • developing a prototype for a federated dCache instance
  • integrating the cluster into the overlay batch system at NAF.
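COBalD manages drones by adjusting the demand for resources based on how well the existing ones are used. The following is a heavily simplified, standalone model of that feedback idea, loosely following the utilisation/allocation terminology of the COBalD documentation; the class `DronePool`, the function `adjust_demand`, the thresholds and the linear rule are hypothetical illustrations, not COBalD's API or its default behaviour.

```python
from dataclasses import dataclass


@dataclass
class DronePool:
    demand: float       # number of drones currently requested
    utilisation: float  # fraction of drone resources doing useful work, 0..1
    allocation: float   # fraction of drone resources claimed by jobs, 0..1


def adjust_demand(pool, low_utilisation=0.5, high_allocation=0.8, rate=1.0):
    """One step of a linear feedback rule (hypothetical thresholds): shrink the
    pool when drones sit idle, grow it when most of it is claimed, else keep it."""
    if pool.utilisation < low_utilisation:
        pool.demand = max(0.0, pool.demand - rate)
    elif pool.allocation > high_allocation:
        pool.demand += rate
    return pool.demand


busy = DronePool(demand=10.0, utilisation=0.9, allocation=0.95)
idle = DronePool(demand=10.0, utilisation=0.2, allocation=0.3)
print(adjust_demand(busy), adjust_demand(idle))
```

In a real deployment, TARDIS would translate such demand changes into drone submissions and drains on the batch system; that part is outside this sketch.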