Commit 0b69fe16 authored by Hector Mtz-Seara

Tests aurum3

parent 3cf36fcc
# aurum3 Gromacs acceptance tests

## Content description
- This repository contains the Gromacs tests, prepared as part of the aurum3 tender, that must be passed for the cluster to be accepted.

## Disclaimer
- The following tests might require some tuning by the provider to work on the purchased cluster. Typos or incorrect syntax in the provided templates and scripts do not exempt the provider from passing the tests.

## General definitions
- A test can contain several tasks, and each task may contain several jobs.
- A task is a collection of jobs with similar parameters, as defined in the test.
- A job is an independent parallel execution (e.g., using MPI and/or GPUs) that runs on its own unique set of nodes.

## Provided files
- Instructions for the tests (Intructions_gmx2024_tests.docx)
- Gromacs input files to perform the tests (*.tpr files)
- Example scripts showing how to run the tests (*.sh files)

## General considerations:
- All tests must pass without exception.
- All the tasks (i.e., groups of jobs) within a test must be submitted to the job scheduler consecutively.
- When executing a test, no time gaps between task submissions are acceptable beyond those introduced by the job scheduler when functioning normally.
- All the jobs in each task must run simultaneously, each on its own unique set of nodes.
- Small differences in the start times of the jobs within a task are acceptable when they are caused by the job scheduler functioning normally.
- Every job within a task must exceed the minimal performance requirement established for the task.
- It is the responsibility of the provider to adjust the minimal instructions provided in the **sample script** for each test so that it runs on the cluster and meets the target performance.
- The job launching scripts (run-slurm-gmx2024.sh and run-gpu-slurm-gmx2024.sh) called in the sample scripts have been validated for [Slurm](https://slurm.schedmd.com/) and use Gromacs installed with [Spack](https://spack.readthedocs.io/). It is the responsibility of the provider to adapt these scripts to the target cluster.
- If the tests are not performed by the client, their results must be provided for inspection to verify their validity.
- The tests must be performed using Gromacs 2024.3 or newer. Gromacs is free software distributed under the GNU LGPL v2.1 license and can be downloaded from the [Gromacs website](http://www.Gromacs.org/).

## General terms and abbreviations:
- Number of nodes and cores by node type
   - $c_{cpu}$ = Total number of cores in computational nodes
   - $c_{mem}$ = Total number of cores in big memory nodes
   - $c_{gpu}$ = Total number of cores in GPU nodes
   - $c_{biggpu}$ = Total number of cores in biggpu nodes
   - $N_{cpu}$ = Total number of computational nodes
   - $N_{mem}$ = Total number of big memory nodes
   - $N_{gpu}$ = Total number of GPU nodes
   - $N_{biggpu}$ = Total number of biggpu nodes
   - $N = N_{cpu}+N_{mem}+N_{gpu}+N_{biggpu}$ = Total number of nodes in the cluster
   - $N^{test}_{task,job}$ = Number of nodes to be used in a job for a given task belonging to a test
- Number of jobs required to run simultaneously within a task
   - $J_{cpu}^{max} = \lfloor N_{cpu}/N^{test}_{task,job} \rfloor$ (maximum number of simultaneous jobs in a task using computational nodes)
   - $J_{mem}^{max} = \lfloor N_{mem}/N^{test}_{task,job} \rfloor$ (maximum number of simultaneous jobs in a task using big memory nodes)
   - $J_{gpu}^{max} = \lfloor N_{gpu}/N^{test}_{task,job} \rfloor$ (maximum number of simultaneous jobs in a task using GPU nodes)
   - $J_{biggpu}^{max} = \lfloor N_{biggpu}/N^{test}_{task,job} \rfloor$ (maximum number of simultaneous jobs in a task using biggpu nodes)
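
Since all node counts are positive integers, the floor in these definitions is ordinary integer division. A minimal Bash sketch of $J^{max}$ for one node type, which also shows how many nodes stay unallocated; the function name `max_jobs` and the 100-node cluster in the example are hypothetical.

```bash
#!/bin/bash
# J^max = floor(N_type / N^test_{task,job})
# Bash integer division truncates toward zero, which equals the floor
# for the positive node counts used here.
max_jobs() {
    local n_type=$1      # nodes of this type (e.g. N_cpu)
    local n_per_job=$2   # nodes per job in this task
    echo $(( n_type / n_per_job ))
}

# Example with a hypothetical cluster of 100 cpu nodes
for n in 1 2 4 6 8 10 12 14 16 18 20; do
    j=$(max_jobs 100 "$n")
    idle=$(( 100 - j * n ))
    echo "nodes/job=${n} jobs=${j} unallocated=${idle}"
done
```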

## Template Slurm submission script for a 1-hour Gromacs job
```bash
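#!/bin/bash
# Job name, site-specific partition (adapt to the target cluster) and one full node
#SBATCH -J g2024
#SBATCH --partition=special_res
#SBATCH --nodes=1
# 36 MPI ranks per node: only 36 cores per node are used
#SBATCH --ntasks-per-node=36
# Do not share the node with other jobs
#SBATCH --exclusive

# PMIx plugin used by srun to launch the MPI ranks
export SLURM_MPI_TYPE=pmix_v3

# Load Gromacs 2024.3 from the site Spack installation
source /uochb/soft/a/spack/2024.1-gcc11.4-0.21.1/share/spack/setup-env.sh
spack load gromacs@2024.3

NAME="sys1_150k_gmx2024"
# Start the Gromacs run: -nsteps -1 removes the step limit and
# -maxh 1 stops the run cleanly after about 1 hour
srun gmx_mpi mdrun -deffnm "${NAME}" -nsteps -1 -v -maxh 1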
```
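
All jobs of a task must run at the same time, each on its own nodes, and the tasks of a test are submitted back to back. One possible way to do this with the template above is to submit it $J^{max}$ times per task, overriding `--nodes` on the `sbatch` command line (command-line options take precedence over `#SBATCH` directives) while `--exclusive` keeps jobs from sharing nodes. The sketch below only prints the commands; the script name `gmx_template.sh`, the `job1`, `job2`, ... directory layout and the function name are hypothetical.

```bash
#!/bin/bash
# Dry run: print the sbatch commands for one task.
#   $1 = nodes per job (N^test_task,job), $2 = number of jobs (J^max), $3 = run directory
submit_task() {
    local nodes=$1 jobs=$2 rundir=$3
    local i
    for (( i = 1; i <= jobs; i++ )); do
        # --exclusive keeps each job on its own set of nodes;
        # --chdir makes each job run in its own directory on the tested share.
        echo sbatch --nodes="${nodes}" --exclusive \
            --job-name="n${nodes}_j${i}" --chdir="${rundir}/job${i}" \
            gmx_template.sh
    done
}

# Hypothetical task: 3 simultaneous 2-node jobs on the storage1 share
submit_task 2 3 /storage1/scaling_test_cpu
```

For a real run, drop the `echo`, and create each job directory with a copy of the `.tpr` file beforehand, because all jobs use the same `-deffnm` name.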

## 1. Scaling test in the CPU nodes
- **Test name:** scaling_test_cpu
- **Nodes involved**: cpu computational nodes
- **Short description of the system**: Molecular dynamics simulation, in Gromacs, of a membrane interacting with a peripheral protein.
- **tpr file:** sys1_150k_gmx2024.tpr
- **Number of atoms:** 149261
- **Short description of the test**: Full occupation (36 cores max per node) of the computational nodes, using 1 or more nodes per job. A few nodes may remain unallocated if $N_{cpu}$ is not a multiple of $N^{\text{scaling\_test\_cpu}}_{task,job}$.
- **Number of tasks:** 11 x 5
- **Storage for tasks:** This test must be run on each of the 5 NFS shares: storage1, storage2, scratch1, scratch2 and cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:** $N^{\text{scaling\_test\_cpu}}_{\text{task,job}} = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]$ nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times 11 \text{ node counts} \times 5 \text{ storages} \approx 55 \text{ hours}$
- **Number of simultaneous jobs in each task:** $J_{cpu}^{max}$
- **Special test conditions:**
   - All tasks in the test must be run sequentially without interruption, so that over the course of the test some jobs use nodes placed in different sections of the cluster.
   - Only 36 cores can be used for each node in this test.
- **Reference performance of each job on the aurum cluster:**

2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz per node

| Cores (Nodes)   |  36 (1) |  72 (2) | 144 (4) | 216 (6) | 288 (8) | 360 (10) | 432 (12) | 504 (14) | 576 (16)| 648 (18)| 720 (20)|
| ------ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ns/day |  13 |  25 |  48 |  71 |  90 | 104 | 130 | 152 | 167 | 175 | 188 |

- **Minimal performance for each job:**
For the same number of nodes, with the number of cores per node restricted to 36, each job must achieve more than 80% of the performance obtained on our older aurum cluster (table above).
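
The 11 x 5 tasks are every combination of the 11 node counts and the 5 storage shares, run one after another, which gives the ~55 h total. A Bash sketch that enumerates them in one possible submission order; the loop order (all node counts on one share before moving to the next) and the `/share` mount points are assumptions to adapt.

```bash
#!/bin/bash
# Enumerate the tasks of scaling_test_cpu: every node count on every storage share.
NODE_COUNTS=(1 2 4 6 8 10 12 14 16 18 20)
STORAGES=(storage1 storage2 scratch1 scratch2 cryo2)   # NFS shares named in this test
HOURS_PER_TASK=1

n_tasks=0
for share in "${STORAGES[@]}"; do
    for nodes in "${NODE_COUNTS[@]}"; do
        n_tasks=$(( n_tasks + 1 ))
        # Hypothetical mount point; adjust to the real path of each share.
        echo "task ${n_tasks}: ${nodes} node(s) per job on /${share}"
    done
done
echo "total tasks: ${n_tasks}, approx. duration: $(( n_tasks * HOURS_PER_TASK )) h"
```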

## 2. Scaling test in the mem nodes
- **Test name:** scaling_test_mem
- **Nodes involved**: big memory cpu computational nodes
- **Short description of the system**: Molecular dynamics simulation, in Gromacs, of a membrane interacting with a peripheral protein.
- **tpr file:** sys1_150k_gmx2024.tpr
- **Number of atoms:** 149261
- **Short description of the test**: Full occupation (36 cores max per node) of the big memory nodes, using 1 or more nodes per job. A few nodes may remain unallocated if $N_{mem}$ is not a multiple of $N^{\text{scaling\_test\_mem}}_{task,job}$.
- **Number of tasks:** 2 x 5
- **Storage for tasks:** This test must be run on each of the 5 NFS shares: storage1, storage2, scratch1, scratch2 and cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:** $N^{\text{scaling\_test\_mem}}_{\text{task,job}} = [1, 2]$ nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times 2 \text{ node counts} \times 5 \text{ storages} \approx 10 \text{ hours}$
- **Number of simultaneous jobs in each task:** $J_{mem}^{max}$
- **Special test conditions:**
   - All tasks in the test must be run sequentially without interruption, so that over the course of the test some jobs use nodes placed in different sections of the cluster.
   - Only 36 cores can be used for each node in this test.
- **Reference performance of each job on the aurum cluster:**

2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz per node

| Cores (Nodes)   |  36 (1) |  72 (2) |
| ------ | --- | --- |
| ns/day |  13 |  25 |

- **Minimal performance for each job:**
For the same number of nodes, with the number of cores per node restricted to 36, each job must achieve more than 80% of the performance obtained on our older aurum cluster (table above).
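
The minimal performance is 80% of the aurum reference for the same node count; for this test that is 0.8 × 13 = 10.4 ns/day on 1 node and 0.8 × 25 = 20 ns/day on 2 nodes. Bash has no floating-point arithmetic, so this sketch delegates the comparison to `awk`; the function names and the measured value of 21.3 ns/day are hypothetical.

```bash
#!/bin/bash
# Minimum ns/day a job must exceed: 80% of the aurum reference value.
min_perf() {
    awk -v ref="$1" 'BEGIN { printf "%.1f\n", 0.8 * ref }'
}

# Exit status 0 if the measured ns/day exceeds 80% of the reference, 1 otherwise.
passes() {
    local measured=$1 reference=$2
    awk -v m="$measured" -v r="$reference" 'BEGIN { exit !(m > 0.8 * r) }'
}

# Reference values for scaling_test_mem (36 cores per node)
min_perf 13    # 1 node
min_perf 25    # 2 nodes
if passes 21.3 25; then echo "2-node job: PASS"; else echo "2-node job: FAIL"; fi
```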


## 3. Scaling test in the GPU nodes (CPU only)
- **Test name:** scaling_test_gpu
- **Nodes involved**: gpu computational nodes
- **Short description of the system**: Molecular dynamics simulation, in Gromacs, of a membrane interacting with a peripheral protein.
- **tpr file:** sys1_150k_gmx2024.tpr
- **Number of atoms:** 149261
- **Short description of the test**: Full occupation (36 cores max per node) of the GPU nodes, using CPU cores only and 1 or more nodes per job. A few nodes may remain unallocated if $N_{gpu}$ is not a multiple of $N^{\text{scaling\_test\_gpu}}_{task,job}$.
- **Number of tasks:** 11 x 5
- **Storage for tasks:** This test must be run on each of the 5 NFS shares: storage1, storage2, scratch1, scratch2 and cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:** $N^{\text{scaling\_test\_gpu}}_{\text{task,job}} = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]$ nodes.
- **Number of cores per node:** Only 36 cores can be used for each node in this test.
- **Total duration test:** $\sim 1 \text{ hour per task} \times 11 \text{ node counts} \times 5 \text{ storages} \approx 55 \text{ hours}$
- **Number of simultaneous jobs in each task:** $J_{gpu}^{max}$
- **Special test conditions:**
   - All tasks in the test must be run sequentially without interruption, so that over the course of the test some jobs use nodes placed in different sections of the cluster.
   - Only 36 cores can be used for each node in this test.
   - No GPUs may be used.
- **Reference performance of each job on the aurum cluster:**

2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz per node

| Cores (Nodes)   |  36 (1) |  72 (2) | 144 (4) | 216 (6) | 288 (8) | 360 (10) | 432 (12) | 504 (14) | 576 (16)| 648 (18)| 720 (20)|
| ------ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ns/day |  13 |  25 |  48 |  71 |  90 | 104 | 130 | 152 | 167 | 175 | 188 |

- **Minimal performance for each job:**
For the same number of nodes, with the number of cores per node restricted to 36, each job must achieve more than 80% of the performance obtained on our older aurum cluster (table above).
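
Because no GPU may be used in this test, one option is to force every offloadable part of the calculation onto the CPU with the `mdrun` task-assignment flags (`-nb`, `-pme`, `-bonded` and `-update`, all set to `cpu`) and to disable GPU detection through the `GMX_DISABLE_GPU_DETECTION` environment variable. This sketch only builds and prints the `srun` line of the template; the function name is hypothetical and the flags should be checked against the installed Gromacs version.

```bash
#!/bin/bash
# Build the mdrun command for a CPU-only run on a GPU node (printed, not executed).
NAME="sys1_150k_gmx2024"

cpu_only_mdrun() {
    local name=$1
    # Nonbonded, PME, bonded interactions and coordinate update forced onto the CPU
    echo srun gmx_mpi mdrun -deffnm "${name}" -nsteps -1 -v -maxh 1 \
        -nb cpu -pme cpu -bonded cpu -update cpu
}

# Ask Gromacs not to detect GPUs at all, so no device is touched
export GMX_DISABLE_GPU_DETECTION=1
cpu_only_mdrun "${NAME}"
```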



## 4. Scaling test in the biggpu nodes (CPU only)
- **Test name:** scaling_test_biggpu
- **Nodes involved**: biggpu computational nodes
- **Short description of the system**: Molecular dynamics simulation, in Gromacs, of a membrane interacting with a peripheral protein.
- **tpr file:** sys1_150k_gmx2024.tpr 
- **Number of atoms:** 149261
- **Short description of the test**: Full occupation (36 cores max per node) of the biggpu nodes, using CPU cores only and 1 or more nodes per job. A few nodes may remain unallocated if $N_{biggpu}$ is not a multiple of $N^{\text{scaling\_test\_biggpu}}_{task,job}$.
- **Number of tasks:** 5 x 5
- **Storage for tasks:** This test must be run on each of the 5 NFS shares: storage1, storage2, scratch1, scratch2 and cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:** $N^{\text{scaling\_test\_biggpu}}_{\text{task,job}} = [1, 2, 4, 6, 8]$ nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times 5 \text{ node counts} \times 5 \text{ storages} \approx 25 \text{ hours}$
- **Number of simultaneous jobs in each task:** $J_{biggpu}^{max}$
- **Special test conditions:**
   - All tasks in the test must be run sequentially without interruption, so that over the course of the test some jobs use nodes placed in different sections of the cluster.
   - Only 36 cores can be used for each node in this test.
   - No GPUs may be used.
- **Reference performance of each job on the aurum cluster:**

2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz per node

| Cores (Nodes)   |  36 (1) |  72 (2) | 144 (4) | 216 (6) | 288 (8) |
| ------          | ---     | ---     | ---     | ---     | ---     |
| ns/day          |  13     |  25     |      48 |      71 |      90 |

- **Minimal performance for each job:**
For the same number of nodes, with the number of cores per node restricted to 36, each job must achieve more than 80% of the performance obtained on our older aurum cluster (table above).
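
Each job of a task has to exceed the threshold individually, not just on average. Assuming the standard Gromacs log layout, where the final summary contains a line starting with `Performance:` whose first value is the ns/day figure, a sketch like the following checks every job log of a task; the function name and the log paths in the usage line are hypothetical.

```bash
#!/bin/bash
# Check that every job of a task exceeds 80% of the reference ns/day.
#   $1 = reference ns/day for this node count, remaining args = job log files
check_task() {
    local reference=$1; shift
    local log perf status=0
    for log in "$@"; do
        # Take the ns/day value from the last "Performance:" line of the log
        perf=$(awk '/^Performance:/ { p = $2 } END { print p }' "$log")
        if [ -n "$perf" ] && awk -v m="$perf" -v r="$reference" 'BEGIN { exit !(m > 0.8 * r) }'; then
            echo "PASS ${log}: ${perf} ns/day"
        else
            echo "FAIL ${log}: ${perf:-no result} ns/day"
            status=1
        fi
    done
    return "$status"
}

# Usage (hypothetical layout; 4 nodes per job on biggpu, reference 48 ns/day):
# check_task 48 job*/sys1_150k_gmx2024.log
```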


## 5. Cluster endurance test (no inter-node calculations)
- **Test name:** endurance_test_all
- **Short description of the system**: Molecular dynamics simulation, in Gromacs, of a membrane interacting with a peripheral protein.
- **Number of atoms:** 172742
- **Short description of the test:** Full occupation of all nodes (cpu, mem, gpu and biggpu), using 1 job per node in the cpu and mem nodes and n jobs per node (one per GPU) in the gpu and biggpu nodes. It is a stress test in which the whole cluster must work under full load for at least 23 h.
- **Number of tasks:** 1
- **Number of computational nodes per job in each task:** 1
- **Number of simultaneous jobs in each task:** $J_{cpu}^{max} + J_{mem}^{max} + J_{gpu}^{max} + J_{biggpu}^{max}$
- **Execution time per task:** 23 hours
- **Total duration test:** 23 hours
- **Special test conditions:**
   - All jobs must run simultaneously for the fixed period of 23 h without interruption.
   - All cores must be used; hyperthreading or a similar technology must be enabled when available.
   - In GPU-containing nodes, 1 job must run per GPU (n_jobs jobs per node), and the cores of the node must be divided equally among the n_jobs jobs.

- **Minimal performance for each job:**
    - jobs in cpu and mem nodes: 50 ns/day
    - jobs in gpu nodes: 130 ns/day 

The following reference performance values were obtained for comparison with this test:
-  39 ns/day were obtained using 48/96 cores/threads (AMD EPYC 9454 48-Core Processor)
- 161 ns/day were obtained using 8 cores (AMD EPYC 9454 48-Core Processor)  and 1x L40s graphic card.
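
On GPU nodes the endurance test runs one job per GPU, with the node's logical cores (hardware threads, when hyperthreading is on) divided equally among the jobs. A dry-run sketch that prints one `gmx mdrun` command per GPU, using `-ntomp` for the thread count, `-pin on` with `-pinoffset`/`-pinstride` to give each job a disjoint block of logical cores, and `-gpu_id` to select the device. It assumes a thread-MPI `gmx` build with one rank per job, a thread count that divides evenly, and a `gpu0`, `gpu1`, ... directory per job holding a hypothetical `endurance.tpr` (the endurance input file is not named here); the resulting core placement should be verified on the target nodes.

```bash
#!/bin/bash
# Print one mdrun command per GPU on a node, splitting the logical cores evenly.
#   $1 = logical cores (hardware threads) on the node, $2 = number of GPUs
split_gpu_node() {
    local threads=$1 n_gpus=$2
    local per_job=$(( threads / n_gpus ))
    local i
    for (( i = 0; i < n_gpus; i++ )); do
        # Each job gets a block of per_job logical cores starting at i*per_job and
        # one GPU; it runs in its own directory so output files do not collide.
        echo "(cd gpu${i} && gmx mdrun -deffnm endurance -ntmpi 1 -ntomp ${per_job}" \
            "-pin on -pinoffset $(( i * per_job )) -pinstride 1 -gpu_id ${i} -maxh 23) &"
    done
    echo "wait"
}

# Example: hypothetical node with 96 hardware threads and 4 GPUs
split_gpu_node 96 4
```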

sys1_150k_gmx2024.tpr (file added, 5.46 MiB)