PHYSnet cluster integration
Research conducted by the University of Hamburg.
In research area III, devoted to adaptation, testing, and optimization, we have set the following goals:
- deploy tools developed within FIDIUM to selected computing centers
- integrate them into the production and analysis environments of HEP experiments
- optimize them for the requirements of typical analysis workflows
PHYSnet cluster overview
The PHYSnet cluster provides computing resources shared by all institutes of the physics faculty. It comprises:
- heterogeneous hardware organized into multiple pools/queues for diverse applications: idefix.q, infinix.q, obelix.q, epyx.q, graphix.q
- partitions reserved for exclusive use by individual project groups, providing high flexibility for tailoring resources to individual or group use cases.
To use these resources for HEP workloads, they must be adapted using containerization technologies and integrated transparently into HEP-specific infrastructure.
Current setup vs ideal setup
The ideal setup we envisage is the transparent integration of compute resources from third-party sites into a single “overlay batch system”.
The current setup is a working small-scale deployment at PHYSnet used for testing. It includes:
- a small dedicated HTCondor instance, with the schedd running on a general-purpose “compile node” that also acts as the central manager
- drones submitted to the local SGE batch system as long-running jobs, with a startd running inside each drone and connecting to the other HTCondor daemons
- the CernVM File System (CVMFS) mounted in user space using cvmfsexec
- all components running without elevated privileges.
In the current setup, containers are handled as follows:
- container sources are unpacked images taken from /cvmfs/unpacked.cern.ch; the drones use the htcondor-wn image developed by KIT, while job containers use the standard CMS CentOS 7 image cc7-cms
- htcondor-wn provides the flexibility to reconfigure drones dynamically; ansible and condor-git-config are used to reconfigure HTCondor without restarting the container (see the drone launch sketch after this list).
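For illustration, the sketch below shows roughly what a drone payload submitted to SGE could do: mount the CVMFS repositories in user space with cvmfsexec and then start an unprivileged condor_master (which spawns the startd) inside the unpacked htcondor-wn container. The repository list, image path, and apptainer invocation are assumptions for illustration, not the exact PHYSnet configuration.

```python
#!/usr/bin/env python3
"""Illustrative drone payload for a long-running SGE job (a sketch, not the
actual PHYSnet scripts): mount CVMFS in user space with cvmfsexec, then start
an unprivileged condor_master inside the unpacked htcondor-wn container."""
import subprocess

# Assumption: these repositories are sufficient for the drone and job containers.
CVMFS_REPOS = ["cms.cern.ch", "unpacked.cern.ch"]

# Hypothetical location of the unpacked htcondor-wn image; the real path under
# /cvmfs/unpacked.cern.ch differs.
DRONE_IMAGE = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/EXAMPLE/htcondor-wn:latest"

def run_drone():
    """Run for the lifetime of the SGE job, without elevated privileges."""
    cmd = [
        "./cvmfsexec/cvmfsexec", *CVMFS_REPOS, "--",   # user-space CVMFS mounts
        "apptainer", "exec", "--cleanenv", "--bind", "/cvmfs",
        DRONE_IMAGE,
        "condor_master", "-f",   # foreground; spawns the startd that joins the pool
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_drone()
```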
The image below shows both the preferred and the current setup:
Workflows for first large-scale tests
Several workflows, listed below, have been studied in preparation for the first large-scale tests:
- simple file transfers from/to grid storage elements via the gfal2 libraries with X.509 authentication were tested; they work without problems and were used to benchmark file transfers to various grid sites (a minimal example is sketched after this list).
- typical EDM file processing with the CMS software framework CMSSW has been checked; precompiled user code can run inside drones using CMS-specific containers, and a balance between I/O-intensive (e.g. calibration) and CPU-intensive (e.g. analysis) tasks has been found.
- fully orchestrated workflows using modern columnar analysis tools were tried; Run-3 CMS analyses based on NanoAOD, in development at UHH, can be used for first studies, with workflow management tools (e.g. luigi/law) leveraging HTCondor for job submission.
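As a minimal illustration of the gfal2-based transfers mentioned above, the sketch below copies a single file to a grid storage element using the gfal2 Python bindings; the endpoint URL and paths are placeholders, and the X.509 proxy is assumed to be available via X509_USER_PROXY or the default proxy location.

```python
#!/usr/bin/env python3
"""Minimal gfal2-based file transfer sketch with X.509 authentication.
Endpoint URL and paths are placeholders."""
import gfal2

def copy_file(source, destination):
    ctx = gfal2.creat_context()
    params = ctx.transfer_parameters()
    params.overwrite = True   # replace an existing destination file
    params.timeout = 600      # transfer timeout in seconds
    ctx.filecopy(params, source, destination)

if __name__ == "__main__":
    # Copy a local file to a (placeholder) WebDAV door of a grid storage element.
    copy_file("file:///tmp/test.root",
              "davs://se.example.org:2880/store/user/test.root")
```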
Understanding resource needs for columnar analysis
The ttbar reconstruction use case is challenging due to the large jet combinatorics [O(3^Njet) possible assignments per event]. Hooks integrated into the workflow are needed to profile memory allocations (one possible hook is sketched below).
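One possible form of such a hook is sketched below: a decorator based on the standard-library tracemalloc module that records the peak memory allocated while a processing step runs. The function name and integration point are illustrative assumptions; the actual workflow may use different profiling tools.

```python
"""Sketch of a memory-profiling hook for a columnar processing step, based on
the standard-library tracemalloc module; function names are illustrative."""
import functools
import tracemalloc

def profile_allocations(func):
    """Report current and peak allocations made while the wrapped step runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: current={current / 1e6:.1f} MB, "
                  f"peak={peak / 1e6:.1f} MB")
    return wrapper

@profile_allocations
def reconstruct_ttbar(events):
    # Placeholder for the combinatorics-heavy jet-assignment step.
    ...
```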
Future work
Planned developments include:
- dedicated host(s) for essential services, such as the HTCondor scheduler, scratch space/cache, and monitoring
- a site-wide CVMFS installation.
Further development within the FIDIUM project can include:
- drone management with COBalD/TARDIS
- a prototype of a federated dCache instance
- integration into the overlay batch system at the NAF.
Note that work on the FIDIUM extension includes collaboration with other sites.
Future developments are summarized in the diagram below:
Automation with COBalD/TARDIS
The goal of the automation work with COBalD/TARDIS is to provide on-demand provisioning of resources based on cluster utilization metrics. A COBalD/TARDIS deployment provides:
- resource integration
- plugins that provide access to external services
- provisioning of resources to batch system users
- control of dynamic resource provisioning (illustrated schematically below).
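To illustrate the idea of demand-driven provisioning, the toy sketch below periodically compares a utilization metric against a threshold and submits additional drones when the cluster is busy. This is not the COBalD/TARDIS API or configuration; all names and thresholds are assumptions made for illustration only.

```python
"""Toy illustration of demand-driven drone provisioning; this is NOT the
COBalD/TARDIS API, which is configured via its own plugins and config files."""
import time

TARGET_UTILISATION = 0.9   # request more drones above this utilization (assumed)
MAX_DRONES = 50            # assumed site limit

def cluster_utilisation() -> float:
    """Placeholder: fraction of busy drone slots reported by the monitoring."""
    return 0.95

def submit_drone() -> None:
    """Placeholder: submit one additional drone to the underlying batch system."""
    print("submitting drone")

def control_loop(active_drones: int = 0) -> None:
    while True:
        if cluster_utilisation() > TARGET_UTILISATION and active_drones < MAX_DRONES:
            submit_drone()
            active_drones += 1
        time.sleep(60)   # re-evaluate the metrics once per minute
```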