LUMI @ CSC
This page only provides HiPACE++-specific instructions. For more information, please visit the LUMI documentation.
Log in with <yourid>@lumi.csc.fi.
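For example, assuming SSH public-key authentication is already set up for your CSC account, a login looks like this:
ssh <yourid>@lumi.csc.fi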
Running on GPU
Create a file profile.hipace and source it whenever you log in and want to work with HiPACE++:
# please set your project account
export proj=<your project id>
# required dependencies
module load LUMI
module load partition/G
module load PrgEnv-amd/8.3.3
module load rocm/5.2.3
module load buildtools/22.08
module load cray-hdf5/1.12.1.5
export MPICH_GPU_SUPPORT_ENABLED=1
# optimize ROCm compilation for MI250X
export AMREX_AMD_ARCH=gfx90a
# compiler environment hints
export CC=$(which cc)
export CXX=$(which CC)
export FC=$(which ftn)
Install HiPACE++ (the first time, and whenever you want the latest version):
source profile.hipace
mkdir -p $HOME/src # only the first time: create a src directory for the source code
git clone https://github.com/Hi-PACE/hipace.git $HOME/src/hipace # only the first time
cd $HOME/src/hipace
rm -rf build
cmake -S . -B build -DHiPACE_COMPUTE=HIP -DHiPACE_openpmd_mpi=OFF
cmake --build build -j 16
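To update an existing installation later on, a typical workflow is to pull the latest changes and rebuild (a sketch, assuming you have no local modifications in $HOME/src/hipace):
source profile.hipace
cd $HOME/src/hipace
git pull
rm -rf build
cmake -S . -B build -DHiPACE_COMPUTE=HIP -DHiPACE_openpmd_mpi=OFF
cmake --build build -j 16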
You can get familiar with the HiPACE++ input file format in our Get started section, to prepare an input file that suits your needs.
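The HiPACE++ repository also ships example input files that can serve as a starting point (assuming an examples directory at the repository root, as in recent versions; list it to see what is available):
ls $HOME/src/hipace/examples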
You can then create your directory in /project/project_<project id>, where you can put your input file and adapt the following submission script:
#!/bin/bash
# note: Slurm does not expand shell variables in #SBATCH lines, so fill in your project id directly
#SBATCH -A <project id>
#SBATCH -J hipace
#SBATCH -o %x-%j.out
#SBATCH -t 00:30:00
#SBATCH --partition=standard-g
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8
export MPICH_GPU_SUPPORT_ENABLED=1
# note (12-12-22)
# this environment setting is currently needed on LUMI to work around a
# known issue with Libfabric
#export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor
# note (9-2-22, OLCFDEV-1079)
# this environment setting is needed to prevent rocFFT from writing a cache in
# the home directory, which does not scale.
export ROCFFT_RTC_CACHE_PATH=/dev/null
export OMP_NUM_THREADS=1
# needed for runs with high nz (many longitudinal grid points).
# if too many MPI messages are sent, the hardware counters can overflow, see
# https://docs.nersc.gov/performance/network/
# export FI_CXI_RX_MATCH_MODE=hybrid
# setting correct CPU binding
# (see https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/)
cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu
CPU_BIND="map_cpu:49,57,17,25,1,9,33,41"
srun --cpu-bind=${CPU_BIND} ./select_gpu $HOME/src/hipace/build/bin/hipace inputs
rm -rf ./select_gpu
and use it to submit a simulation.
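For example, if you saved the script above as submit_hipace.sh (the file name is arbitrary) in your run directory, the simulation is submitted and monitored with:
sbatch submit_hipace.sh
squeue -u $USER
The standard output is written to hipace-<jobid>.out, following the #SBATCH -o pattern set above.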
Tip
Parallel simulations can be significantly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter hipace.comms_buffer_on_gpu = 1
must be set and the following environment variable must be exported in the job script:
export FI_MR_CACHE_MAX_COUNT=0
Note that using GPU-aware MPI may require more GPU memory.
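As a sketch of how this looks in practice, the relevant lines of the job script above would become the following (AMReX-based codes such as HiPACE++ also accept input parameters appended after the inputs file on the command line, which is used here for illustration; alternatively, set the parameter in the inputs file itself):
export FI_MR_CACHE_MAX_COUNT=0 # libfabric: disable caching
srun --cpu-bind=${CPU_BIND} ./select_gpu $HOME/src/hipace/build/bin/hipace inputs hipace.comms_buffer_on_gpu=1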