Commit d2ac6e95 authored by Kiran K Telukunta

docs: README update

parent ef9f919a
+25 −25
# aurum3 gromacs acceptance tests


## Content decription
## Content description
- This repository contains information on the gromacs tests that, as part of the aurum3 tender, must be passed for the cluster to be accepted.

## Disclaimer
@@ -13,18 +13,18 @@
- A job is a single parallel execution (i.e., using mpi/gpu) that runs on its own unique set of nodes.

## Provided files
- Intructions for the tests  (Intructions_gmx2024_tests.docx) 
- Instructions for the tests (Instructions_gmx2024_tests.docx) 
- Gromacs input files to perform the tests (*.tpr files)

## General considerations:
- All test must successfully pass without exception.
- All tests must successfully pass without exception.
- All the tasks (i.e., group of jobs) within a test must be submitted to the job scheduler consecutively.
- When executing a test, no time gaps between the submission of tasks are acceptable beyond those produced by the job scheduler when functioning normally.
- All the jobs in each task must run simultaneously and use a unique set of nodes.
- Small differences in the starting execution times of the jobs within a task are acceptable when produced by the job scheduler when functioning normally.
- Each of the jobs within a task must exceed the minimal performance prerequisite established for the task.
- It is the responsibility of the provider to adjust the minimal instructions provided in the **sample script** for each test to run in the cluster in a way that fulfills the target performance.
- Job lanching scrips (run-slurm-gmx2024.sh and run-gpu-slurm-gmx2024.sh) called in the sample scripts have been validated for [Slurm](https://slurm.schedmd.com/) and use Gromacs installed using [Spack](https://spack.readthedocs.io/). It is the responsibility of the provider to adjust these scripts to the target cluster. 
- Job launching scripts (run-slurm-gmx2024.sh and run-gpu-slurm-gmx2024.sh) called in the sample scripts have been validated for [Slurm](https://slurm.schedmd.com/) and use Gromacs installed using [Spack](https://spack.readthedocs.io/). It is the responsibility of the provider to adjust these scripts to the target cluster. 
- If the tests are not performed by the client, their results must be provided for inspection to ensure their validity.
- The tests have to be performed using Gromacs 2024.3 or newer. The program is free software released under the LGPL v2.1 license and can be downloaded free of charge from the following [page](http://www.Gromacs.org/).
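
One possible way to check the minimal-performance condition for a finished job is sketched below. It assumes the Gromacs log contains the usual `Performance:` line whose first numeric column is ns/day; the function names `ns_per_day` and `meets_minimum`, the log file name, and the threshold in the usage comment are illustrative and not part of the tender.

```bash
#!/usr/bin/env bash

# Print the ns/day value from the "Performance:" line of a Gromacs log.
# Assumption: the line looks like "Performance:   62.500   0.384"
# (first number = ns/day, second = hour/ns). The last match wins.
ns_per_day() {
    awk '/^Performance:/ { value = $2 } END { if (value != "") print value }' "$1"
}

# Return success only if the measured ns/day strictly exceeds the minimum.
# Fails as well when no Performance line is found in the log.
meets_minimum() {
    local log_file=$1
    local minimum=$2
    local measured
    measured=$(ns_per_day "$log_file")
    [ -n "$measured" ] || return 1
    awk -v m="$measured" -v t="$minimum" 'BEGIN { exit !((m + 0) > (t + 0)) }'
}

# Hypothetical usage:
#   meets_minimum job_001.log 50 && echo "PASS" || echo "FAIL"
```

Because every job in a task has to exceed the threshold, such a check would be repeated for each job log of the task rather than applied to an average.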

@@ -41,10 +41,10 @@
   - $N = N_{cpu}+N_{mem}+N_{gpu}+N_{biggpu}$= Total number of nodes in the cluster
   - $N^{test}_{task,job}$ = Number of nodes to be used in a job for a given task belonging to a test
- Number of jobs required to run simultaneously within a task
   - $J_{cpu}^{max} = \text{floor interger part of } (N_{cpu}/N^{test}_{task,job})$ (Maximum number of simulatneous jobs in a task using computing nodes)
   - $J_{mem}^{max} = \text{floor interger part of } (N_{mem}/N^{test}_{task,job})$ (Maximum number of simulatneous jobs in a task using big memory nodes
   - $J_{gpu}^{max} = \text{floor interger part of } (N_{gpu}/N^{test}_{task,job})$ (Maximum number of simulatneous jobs in a task using gpu nodes)
   - $J_{biggpu}^{max} = \text{floor interger part of } (N_{gpu}/N^{test}_{task,job})$ (Maximum number of simulatneous jobs in a task using gpu nodes)
   - $J_{cpu}^{max} = \text{floor integer part of } (N_{cpu}/N^{test}_{task,job})$ (Maximum number of simultaneous jobs in a task using computing nodes)
   - $J_{mem}^{max} = \text{floor integer part of } (N_{mem}/N^{test}_{task,job})$ (Maximum number of simultaneous jobs in a task using big memory nodes)
   - $J_{gpu}^{max} = \text{floor integer part of } (N_{gpu}/N^{test}_{task,job})$ (Maximum number of simultaneous jobs in a task using gpu nodes)
   - $J_{biggpu}^{max} = \text{floor integer part of } (N_{biggpu}/N^{test}_{task,job})$ (Maximum number of simultaneous jobs in a task using biggpu nodes)
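
As a rough illustration of the floor rule in the list above, the sketch below computes the maximum number of simultaneous jobs with bash integer division. The function name `max_jobs` and the node counts (50 nodes, 6 nodes per job) are hypothetical values chosen for the example, not figures from the tender.

```bash
#!/usr/bin/env bash

# Maximum number of simultaneous jobs in a task: floor(N_type / N_job).
# Bash integer division truncates, which equals floor for non-negative values.
max_jobs() {
    local nodes_of_type=$1   # e.g. N_cpu (hypothetical value supplied by the caller)
    local nodes_per_job=$2   # nodes used by each job of the task
    echo $(( nodes_of_type / nodes_per_job ))
}

# Hypothetical example: 50 cpu nodes, 6 nodes per job.
n_jobs=$(max_jobs 50 6)
idle_nodes=$(( 50 - n_jobs * 6 ))
echo "simultaneous jobs: ${n_jobs}, unallocated nodes: ${idle_nodes}"
# prints "simultaneous jobs: 8, unallocated nodes: 2"
```

The leftover nodes in the example correspond to the unallocated nodes that appear when the node count is not a multiple of the nodes per job.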

## Template submission in slurm of a gromacs job for 1 hour
```bash
@@ -69,16 +69,16 @@ srun gmx_mpi mdrun -deffnm ${NAME} -nsteps -1 -v -maxh 1
```

## 1. Tests scaling in the cpu nodes
- **Test name:**: scaling_test_cpu
- **Test name:** scaling_test_cpu
- **Nodes involved**: cpu computational nodes
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a periferal protein system in Gromacs.
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a peripheral protein system in Gromacs.
- **tpr file:** sys1_150k_gmx2024.tpr 
- **Number of atoms:** 149261
- **Short description of the test**: Full (36 cores max) ocupation of the computational nodes using 1 or more nodes per job. There might be few unallocaded nodes if $N_{cpu}$ is not a multiple of $N^{\text{scaling\_test\_cpu}}_{task,job}$.
- **Short description of the test**: Full (36 cores max) occupation of the computational nodes using 1 or more nodes per job. There might be a few unallocated nodes if $N_{cpu}$ is not a multiple of $N^{\text{scaling\_test\_cpu}}_{task,job}$.
- **Number of tasks:** 11 x 5
- **Storage for tasks:** This test must try all 5 different NFS shares in storage1, storage2, scratch1,  scratch2, cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_cpu}}_{\text{task,job}}= [1, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20] Nodes.
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_cpu}}_{\text{task,job}}$= [1, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20] Nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times N^{\text{scaling\_test\_cpu}}_{\text{task,job}} \times 5 \text{ storages}$
- **Number of simultaneous jobs in each task:** $J_{cpu}^{max}$
- **Special test conditions:**
@@ -98,15 +98,15 @@ The results for the same number of nodes with the number of cores per node restr
## 2. Tests scaling in the mem nodes
- **Test name:** scaling_test_mem
- **Nodes involved**: big memory cpu computational nodes
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a periferal protein system in Gromacs.
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a peripheral protein system in Gromacs.
- **tpr file:** sys1_150k_gmx2024.tpr 
- **Number of atoms:** 149261
- **Short description of the test**: Full (36 cores max) ocupation of the computational nodes using 1 or more nodes per job. There might be few unallocaded nodes if $N_{mem}$ is not a multiple of $N^{\text{scaling\_test\_mem}}_{task,job}$.
- **Short description of the test**: Full (36 cores max) occupation of the computational nodes using 1 or more nodes per job. There might be a few unallocated nodes if $N_{mem}$ is not a multiple of $N^{\text{scaling\_test\_mem}}_{task,job}$.
- **Number of tasks:** 2 x 5
- **Storage for tasks:** This test must try all 5 different NFS shares in storage1, storage2, scratch1,  scratch2, cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_men}}_{\text{task,job}}= [1, 2] Nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times N^{\text{scaling\_test\_men}_{\text{task,job}} \times 5 \text{ storages}$
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_mem}}_{\text{task,job}}$= [1, 2] Nodes.
- **Total duration test:** $\sim 1 \text{ hour per task} \times N^{\text{scaling\_test\_mem}}_{\text{task,job}} \times 5 \text{ storages}$
- **Number of simultaneous jobs in each task:** $J_{mem}^{max}$
- **Special test conditions:**
   - All tasks in the test need to be run sequentially without interruption to ensure that some jobs eventually use nodes placed in different sections of the cluster.
@@ -126,14 +126,14 @@ The results for the same number of nodes with the number of cores per node restr
## 3. Tests scaling in the gpu nodes
- **Test name:** scaling_test_gpu
- **Nodes involved**: gpu computational nodes
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a periferal protein system in Gromacs.
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a peripheral protein system in Gromacs.
- **tpr file:** sys1_150k_gmx2024.tpr 
- **Number of atoms:** 149261
- **Short description of the test**: Full (36 cores max) ocupation of the computational nodes using 1 or more nodes per job. There might be few unallocaded nodes if $N_{gpu}$ is not a multiple of $N^{\text{scaling\_test\_gpu}}_{task,job}$.
- **Short description of the test**: Full (36 cores max) occupation of the computational nodes using 1 or more nodes per job. There might be a few unallocated nodes if $N_{gpu}$ is not a multiple of $N^{\text{scaling\_test\_gpu}}_{task,job}$.
- **Number of tasks:** 2 x 5
- **Storage for tasks:** This test must try all 5 different NFS shares in storage1, storage2, scratch1,  scratch2, cryo2.
- **Execution time per task:** 1 hour
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_gpu}}_{\text{task,job}}= [1, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20] Nodes.
- **Number of computational nodes per job in each task:**  $N^{\text{scaling\_test\_gpu}}_{\text{task,job}}$= [1, 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20] Nodes.
- **Number of cores per node:** Only 36 cores can be used for each node in this test 
- **Total duration test:** $\sim 1 \text{ hour per task} \times N^{\text{scaling\_test\_gpu}}_{\text{task,job}} \times 5 \text{ storages}$
- **Number of simultaneous jobs in each task:** $J_{gpu}^{max}$
@@ -157,10 +157,10 @@ The results for the same number of nodes with the number of cores per node restr
## 4. Tests scaling in the biggpu nodes (CPU only)
- **Test name:** scaling_test_biggpu
- **Nodes involved**: gpu computational nodes
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a periferal protein system in Gromacs.
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a peripheral protein system in Gromacs.
- **tpr file:** sys1_150k_gmx2024.tpr 
- **Number of atoms:** 149261
- **Short description of the test**: Full (36 cores max) ocupation of the computational nodes using 1 or more nodes per job. There might be few unallocaded nodes if $N_{biggpu}$ is not a multiple of $N^{\text{scaling\_test\_gpu}}_{task,job}$.
- **Short description of the test**: Full (36 cores max) occupation of the computational nodes using 1 or more nodes per job. There might be a few unallocated nodes if $N_{biggpu}$ is not a multiple of $N^{\text{scaling\_test\_biggpu}}_{task,job}$.
- **Number of tasks:** 5 x 5
- **Storage for tasks:** This test must try all 5 different NFS shares in storage1, storage2, scratch1,  scratch2, cryo2.
- **Execution time per task:** 1 hour
@@ -185,9 +185,9 @@ The results for the same number of nodes with the number of cores per node restr

## 5. Test endurance cluster - no internode calculations
- **Test name:** endurance_test_all
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a periferal protein system in Gromacs.
- **Short description of the system**:  Molecular dynamics simulation of membrane interacting with a peripheral protein system in Gromacs.
- **Number of atoms:** 172742
- **Short description of the test:** Full occupation of all nodes (cpu, mem, gpu, and biggpu) using 1 job per node in cpu and mem nodes, and n jobs per nodes in the gpu and biggpunodes. It is a stress test where all the cluster should work under full load for at least 23h.
- **Short description of the test:** Full occupation of all nodes (cpu, mem, gpu, and biggpu) using 1 job per node in cpu and mem nodes, and n jobs per node in the gpu and biggpu nodes. It is a stress test in which the whole cluster should work under full load for at least 23h.
- **Number of tasks:** 1
- **Number of computational nodes per job in each task:** 1
- **Number of simultaneous jobs in each task:** $J_{cpu}^{max} + J_{mem}^{max} + J_{gpu}^{max} + J_{biggpu}^{max}$
@@ -195,9 +195,9 @@ The results for the same number of nodes with the number of cores per node restr
- **Execution time per task:** 23 hours
- **Total duration test:** 23 hours
- **Special test conditions:**
   - All jobs must run simulatneously for the fixed period of 23h without interruptions.
   - All jobs must run simultaneously for the fixed period of 23h without interruptions.
   - All cores must be used; hyperthreading or similar technology must be activated when available.
   - In gpu containing nodes there should be 1 job running for each gpu (n_jobs). The cores in the node should be devided equally devided between the n_jobs.
   - In gpu containing nodes there should be 1 job running for each gpu (n_jobs). The cores in the node should be divided equally between the n_jobs.

- **Minimal performance for each job:**
    - jobs in cpu and mem nodes: 50 ns/day