Research Areas

The objectives of the project are divided into three interconnected topics. Research area I covers the development of tools and technologies for the integration and utilization of heterogeneous computing resources, while research area II further develops the integration of data lakes and the associated caches. Research area III deals with the adaptation and testing of the technologies developed in areas I and II on concrete target systems, as well as the evaluation of their performance in combined tests. Each area consists of several work packages, described below.

Research area I: Development of tools for the integration of heterogeneous resources
Coordinators: Manuel Giffels, Oliver Freyermuth

The goal of research area I is the results-oriented, collaborative, and cross-disciplinary development of tools, technologies, and structures for the integration and efficient use of heterogeneous computing resources. This area comprises two work packages:

    1. Access to and efficient integration of opportunistic resources
      • Further development and adaptation of the resource manager COBalD/TARDIS to future conditions
      • Development of dynamic control of job scheduling (e.g. consideration of data locality, I/O rates)
      • Automated scaling of peripheral services
      • Implementation of the "Compute Site in a Box" concept, which enables the use of resources at Tier-2 and Tier-3 centers with minimal additional administrative effort (full automation, scalability).
    2. Accounting and controlling of heterogeneous resources
      • Tools for accurately tracking and documenting reserved and actually used resources
      • Tools for continuous monitoring of usage efficiency.
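The accounting tools of work package 2 above must relate reserved to actually used resources. As a minimal illustration of the kind of metric involved (all class and field names here are illustrative, not the API of an existing tool), the following sketch computes a CPU-hour usage efficiency from simple accounting records:

```python
from dataclasses import dataclass

@dataclass
class ResourceRecord:
    """One accounting record: what a job reserved vs. what it actually used."""
    job_id: str
    reserved_cpu_hours: float
    used_cpu_hours: float

def usage_efficiency(records):
    """Fraction of reserved CPU hours that were actually used (0.0 if nothing was reserved)."""
    reserved = sum(r.reserved_cpu_hours for r in records)
    used = sum(r.used_cpu_hours for r in records)
    return used / reserved if reserved else 0.0

records = [
    ResourceRecord("job-1", reserved_cpu_hours=8.0, used_cpu_hours=6.0),
    ResourceRecord("job-2", reserved_cpu_hours=4.0, used_cpu_hours=4.0),
]
print(f"usage efficiency: {usage_efficiency(records):.0%}")
```

Continuous monitoring would feed such records in from the batch system and aggregate the metric per user, site, or time window.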

Research area II: Data lakes, distributed data, caching

Coordinators: Kilian Schwarz, André Brinkmann

The goal of research area II is the results-oriented, collaborative, and cross-disciplinary development of technologies for building efficient and intelligent federated data storage infrastructures. All solutions will be integrated into the international context and take into account the needs of large-scale experimental collaborations. This area comprises four work packages:

    1. Setup of a real-time data lake monitoring system
      • Logging the utilization of data lake components
      • Logging data access patterns.
    2. Development and further extension of technologies for data lake caching
      • Further development and consolidation of the data lake caching technologies
      • Efficient integration of dynamic data caches into the data lake and on CPU resources
      • Usage of parallel ad-hoc file systems as caches in HPC systems.
    3. Developing technologies for data lake data and workflow management
      • Replication and placement mechanisms
      • Demand-driven data management mechanisms
      • Efficient data access and adaptation to workload management systems.
    4. Creating data lake prototypes, technologies for QoS, and efficient connectivity
      • Construction of data lake prototypes
      • Efficient connectivity of users, centers, data sources, and computing infrastructures
      • Developing technologies for Quality of Service (QoS).
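A central question for the dynamic data caches of work package 2 above is which files to keep locally. As a minimal illustration of such a policy decision (the class and its interface are illustrative, not the API of an existing caching tool), the following sketch implements byte-limited least-recently-used (LRU) eviction, a common baseline policy:

```python
from collections import OrderedDict

class LRUFileCache:
    """Byte-limited file cache with least-recently-used eviction."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()  # file name -> size in bytes, oldest first

    def access(self, name, size):
        """Record an access; returns True on a cache hit, False on a miss."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return True
        # Miss: fetch into the cache, evicting least recently used files as needed.
        while self.files and sum(self.files.values()) + size > self.capacity:
            self.files.popitem(last=False)
        self.files[name] = size
        return False

cache = LRUFileCache(capacity_bytes=10)
cache.access("a.root", 4)   # miss
cache.access("b.root", 4)   # miss
cache.access("a.root", 4)   # hit, "a.root" becomes most recent
cache.access("c.root", 4)   # miss, evicts "b.root"
print(list(cache.files))    # ['a.root', 'c.root']
```

Production caches would additionally weigh access patterns from the monitoring of work package 1, which is one reason the two work packages are coupled.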

Research area III: Adaptation, testing and optimization of production and analysis environments

Coordinators: Christian Zeitnitz, Günter Duckeck

The goal of research area III is the successful testing of all components developed in research areas I and II and their integration into an overall system. This is an important precondition for the new system to be used for operating heterogeneous resources in the long run and to achieve wider worldwide distribution (e.g. at WLCG centers). To accomplish this, the tools under development must prove lean, robust, and effective in operation. Furthermore, their scalability and ability to meet the diverse requirements of research projects will be investigated. This area comprises three work packages:

    1. Integration, testing, optimization, and deployment of the developed services

      • Integration of the various components (e.g. workflow management, caching, accounting, resource management and monitoring systems)
      • Functional tests on selected centers
      • Integration into the production environment with experiments involved
      • After successful testing is completed, deployment of the complete solution to the available WLCG Tier centers, HPC centers, and cloud providers.
    2. Specific adaptation of services to complex workflows and usage of specific technologies for scientific data analysis
      • Optimization of specific workflows with high I/O load, memory requirements, GPU usage, etc.
      • Optimization for fast parallel analysis of large data sets with modern vector-based analysis algorithms.
    3. Support
      • Establishment of a cross-site support team that will assist centers and all interested parties with the solution installation, operation, and maintenance.
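Work package 2 above mentions fast parallel analysis of large data sets with modern vector-based algorithms. As a minimal illustration of the idea (synthetic data; all names and cut values are illustrative), the following sketch applies one vectorized selection and a histogramming step to a large array of events with NumPy, instead of looping over events one by one:

```python
import numpy as np

# Vector-based analysis operates on whole arrays at once rather than
# iterating over single events; here on synthetic "event" data.
rng = np.random.default_rng(seed=42)
pt = rng.exponential(scale=30.0, size=1_000_000)  # transverse momenta in GeV

selected = pt[pt > 25.0]              # one vectorized cut over all events
counts, edges = np.histogram(selected, bins=50, range=(25.0, 250.0))

print(f"{selected.size} of {pt.size} events pass the cut")
```

For workflows with high I/O load, the same pattern benefits directly from the data locality and caching work of research areas I and II, since the cut runs at the speed at which the input arrays can be delivered.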