LUMI @ CSC ========== This page only provides HiPACE++ specific instructions. For more information please visit the `LUMI documentation `__. Log in with ``@lumi.csc.fi``. Running on GPU -------------- Create a file ``profile.hipace`` and ``source`` it whenever you log in and want to work with HiPACE++: .. code-block:: bash # please set your project account export proj= # required dependencies module load LUMI/25.03 module load partition/G module load PrgEnv-amd/8.6.0 module load rocm/6.3.4 module load buildtools/25.03 module load cray-hdf5/1.14.3.5 export MPICH_GPU_SUPPORT_ENABLED=1 # optimize ROCm compilation for MI250X export AMREX_AMD_ARCH=gfx90a # compiler environment hints export CC=$(which cc) export CXX=$(which CC) export FC=$(which ftn) Install HiPACE++ (the first time, and whenever you want the latest version): .. code-block:: bash source profile.hipace mkdir -p $HOME/src # only the first time, create a src directory to put the codes sources git clone https://github.com/Hi-PACE/hipace.git $HOME/src/hipace # only the first time cd $HOME/src/hipace rm -rf build cmake -S . -B build -DHiPACE_COMPUTE=HIP -DHiPACE_openpmd_mpi=OFF cmake --build build -j 16 You can get familiar with the HiPACE++ input file format in our :doc:`../../run/get_started` section, to prepare an input file that suits your needs. You can then create your directory in ``/project/project_``, where you can put your input file and adapt the following submission script: .. code-block:: bash #!/bin/bash #SBATCH -A $proj #SBATCH -J hipace #SBATCH -o %x-%j.out #SBATCH -t 00:30:00 #SBATCH --partition=standard-g #SBATCH --nodes=2 #SBATCH --ntasks-per-node=8 #SBATCH --gpus-per-node=8 export MPICH_GPU_SUPPORT_ENABLED=1 # note (12-12-22) # this environment setting is currently needed on LUMI to work-around a # known issue with Libfabric #export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching # or, less invasive: export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor # note (9-2-22, OLCFDEV-1079) # this environment setting is needed to avoid that rocFFT writes a cache in # the home directory, which does not scale. export ROCFFT_RTC_CACHE_PATH=/dev/null export OMP_NUM_THREADS=1 # needed for high nz runs. # if too many mpi messages are send, the hardware counters can overflow, see # https://docs.nersc.gov/performance/network/ # export FI_CXI_RX_MATCH_MODE=hybrid # setting correct CPU binding # (see https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/) cat << EOF > select_gpu #!/bin/bash export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID exec \$* EOF chmod +x ./select_gpu CPU_BIND="map_cpu:49,57,17,25,1,9,33,41" srun --cpu-bind=${CPU_BIND} ./select_gpu $HOME/src/hipace/build/bin/hipace inputs rm -rf ./select_gpu and use it to submit a simulation. .. tip:: Parallel simulations can be largely accelerated by using GPU-aware MPI. To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script: .. code-block:: bash export FI_MR_CACHE_MAX_COUNT=0 Note that using GPU-aware MPI may require more GPU memory.