High Performance Computing Smooth(ed) Particle Hydrodynamics
The successor of miluphcuda, targeting GPU clusters via CUDA-aware MPI.
This repository implements a multi-GPU SPH and N-body (via Barnes-Hut) algorithm using C++11 and CUDA-aware MPI. It combines proven parallelization strategies and available implementations with new ideas and parallelization strategies.
Repository content:
| Directory | Description |
|---|---|
| src/ & include/ | **actual multi-GPU SPH & Barnes-Hut implementation** |
| bin/ | binary to be executed, compile via make |
| build/ | build files, created by make |
| debug/ | debugging with gdb, lldb, cuda gdb (README) |
| config/ | config files for settings (.info) and material parameters (.cfg) (README) |
| testcases/ | test cases including Plummer and Sedov (README) |
| cluster/ | information to dispatch simulation on clusters using queuing systems (README) |
| postprocessing/ | postprocessing scripts (README) |
| H5Renderer/ | H5Renderer implementation: basic Renderer (2D) (README) |
| utilities/ | utilities e.g. counting lines of code (README) |
| doc/ | create Doxygen documentation (README) |
| documents/ | several documents including files for README, instructions, notes, ... |
| images/ | images for MD files, ... |
Implementation details
| Directory | File | Description |
|---|---|---|
| **./** | | include/ & src/ directory |
| | main.cpp | main: setting CUDA device, config parsing, loading parameters/settings, integrator selection, start of simulation |
| | *miluphpc.h/cpp* | abstract base class defining the largest recurring part of the simulation (right-hand side) and assorted high-level functionalities |
| | particles.cuh/cu | particle class (SoA) and reduced particle class: particle attributes like mass, position, velocity, density, ... |
| | particle_handler.h/cpp | handler class for particle class including memory allocation and copy mechanisms |
| | simulation_time.cuh/cu | simulation time class: start & end time, time step, ... |
| | *simulation_time_handler.cpp* | handler for simulation time class including memory allocation and copy mechanisms |
| | device_rhs.cuh/cu | CUDA kernels for resetting arrays/variables (in between right-hand sides) |
| | helper.cuh/cu | buffer class and sorting algorithms (based on CUDA cub) |
| | *helper_handler.h/cpp* | buffer class handler including memory allocation and copy mechanisms |
| subdomain_key_tree/ | | (parallel) tree related functionalities including tree construction and domain decomposition |
| | tree.cuh/cu | (local) tree class and CUDA kernels for tree construction |
| | tree_handler.h/cpp | (local) tree class handler including memory allocation and kernel execution wrapper |
| | *subdomain.cuh/cu* | (parallel) tree structures including domain decomposition, SFC keys, ... |
| | *subdomain_handler.h/cpp* | (parallel) tree handling including memory allocation and kernel execution |
| gravity/ | | gravity related functionalities according to the Barnes-Hut method |
| | gravity.cuh/cu | gravity related CUDA kernels according to the Barnes-Hut method |
| sph/ | | Smoothed Particle Hydrodynamics (SPH) related functionalities |
| | *kernel.cuh/cu* | SPH smoothing kernels |
| | kernel_handler.cuh/cu | SPH smoothing kernels wrapper |
| | *sph.cuh/cu* | fixed radius near neighbor search (FRNN) and multi-node SPH |
| | density.cuh/cu | SPH density |
| | *pressure.cuh/cu* | SPH pressure |
| | soundspeed.cuh/cu | SPH speed of sound |
| | *internal_forces.cuh/cu* | SPH internal forces |
| | *stress.cuh/cu* | SPH stress (not fully implemented yet) |
| | *viscosity.cuh/cu* | SPH viscosity (not fully implemented yet) |
| materials/ | | material attributes (as needed for SPH) |
| | material.cuh/cu | material attributes class |
| | material_handler.cuh/cpp | material attributes handler class including loading from *.cfg* file |
| integrator/ | | child classes for miluphpc implementing integrate() |
| | device_explicit_euler.cuh/cu | explicit Euler integrator device implementations |
| | *explicit_euler.h/cpp* | explicit Euler integrator logic and flow |
| | *device_leapfrog.cuh/cu* | leapfrog integrator device implementations |
| | *leapfrog.h/cpp* | leapfrog integrator logic and flow |
| | *device_predictor_corrector_euler.cuh/cu* | predictor-corrector Euler integrator device implementations |
| | *predictor_corrector_euler.h/cpp* | predictor-corrector Euler integrator logic and flow |
| processing/ | | removing particles that moved too far from the simulation center, ... |
| | kernels.cuh/cu | removing particles that moved too far from the simulation center based on a sphere/cuboid |
| utils/ | | C++ utilities like config parsing, logging, profiling, timing, ... |
| | config_parser.h/cpp | config parser based on cxxopts |
| | h5profiler.h/cpp | HDF5 profiler based on HighFive |
| | *logger.h/cpp* | Logger class and functionalities (taking MPI ranks into account) |
| | *timer.h/cpp* | timing events based on MPI timer |
| cuda_utils/ | | CUDA utilities including wrappers, execution policy and math kernels |
| | cuda_launcher.cuh/cu | CUDA Kernel wrapper and execution policy |
| | cuda_runtime.h/cpp | thin CUDA API wrapper |
| | cuda_utilities.cuh/cu | utilities for CUDA including simple kernels, assertions, ... |
| | linalg.cuh/cu | linear algebra CUDA kernels |
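
The table above mentions SFC (space-filling curve) keys and domain decomposition in subdomain.cuh/cu. As a rough illustration of the general idea, and not the repository's actual code, the following host-side C++ sketch computes a Morton (Z-order) key for a cell on a regular grid and maps a key to an MPI rank via sorted key ranges. The choice of the Morton curve and the names `mortonKey` and `keyToRank` are assumptions made for this example; the implementation may use a different curve and data layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleave the lowest `levels` bits of the integer
// cell coordinates (x, y, z) into one Morton (Z-order) key.
// Each tree level contributes 3 bits, which select one of 8 octants.
std::uint64_t mortonKey(std::uint32_t x, std::uint32_t y, std::uint32_t z, int levels) {
    assert(levels >= 0 && levels <= 21);  // 3 * 21 = 63 bits fit into 64 bits
    std::uint64_t key = 0;
    for (int level = levels - 1; level >= 0; --level) {
        const std::uint64_t octant = ((x >> level) & 1u)
                                   | (((y >> level) & 1u) << 1)
                                   | (((z >> level) & 1u) << 2);
        key = (key << 3) | octant;
    }
    return key;
}

// Hypothetical helper: `ranges` holds numProcesses + 1 ascending key
// boundaries; rank r owns all keys in [ranges[r], ranges[r + 1]).
std::size_t keyToRank(const std::vector<std::uint64_t>& ranges, std::uint64_t key) {
    assert(ranges.size() >= 2);
    assert(key >= ranges.front() && key < ranges.back());
    // First boundary strictly greater than the key closes the owning range.
    const auto it = std::upper_bound(ranges.begin(), ranges.end(), key);
    return static_cast<std::size_t>(it - ranges.begin()) - 1;
}
```

Because cells that are close along the curve tend to be close in space, cutting the sorted key sequence into contiguous ranges gives each process a spatially compact subdomain, and the range boundaries can be shifted for load balancing.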
For more information and instructions refer to Prerequisites.md
library | licence | usage | link |
---|---|---|---|
GNU | GPLv3+ | compiler | gnu.org |
OpenMPI | BSD 3-Clause | compiler, MPI Implementation | open-mpi.org |
CUDA | CUDA Toolkit End User License Agreement | compiler, CUDA Toolkit and API | developer.nvidia.com |
CUDA cub | BSD 3-Clause "New" or "Revised" License | device wide parallel primitives | github.com/NVIDIA/cub |
HDF5 | HDF5 License (BSD-Style) | parallel HDF5 for I/O operations | hdf5group.org |
HighFive | Boost Software License 1.0 | C++ wrapper for parallel HDF5 | github.com/BlueBrain/HighFive |
Boost | Boost Software License 1.0 | config file parsing, C++ wrapper for MPI | boost.org |
cxxopts | MIT license | command line argument parsing | github.com/jarro2783/cxxopts |
libconfig | LGPL-2.1 | material config parsing | github.io/libconfig |
- `make`: compile (the binary is placed in bin/)
- `make debug`: debug build
- `./debug/cuda_debug.sh`: debugging script
- `make single-precision`: single-precision build (default: double-precision)
- `mpirun -np <np> <binary> -n <#output files> -f <input hdf5 file> -C <config file> -m <material-config>`: run a simulation
  - `<binary>`: within bin/, e.g. bin/runner
  - `<input hdf5 file>`: appropriate HDF5 file as initial (particle) distribution
  - `<config file>`: configurations
  - `<material-config>`: material configurations
- `include/parameter.h`: preprocessor directives
- `make clean`, `make cleaner`: clean
- `make remake`: rebuild
Preprocessor directives: parameter.h
Input HDF5 file
Config file
Material config file
Command line arguments
`./bin/runner -h` prints the help text.
Code validation covers the correctness of simulations dispatched on one GPU and on multiple GPUs. Identical simulations run on one GPU and on multiple GPUs are not required to be bitwise identical. With a suitable choice of compiler flags, and depending on the architecture, bitwise identity is attainable in principle. Enforcing it generally costs performance, however, and is therefore not presupposed here. Among others, the three test cases listed below were used to validate the implementation.
In the visualizations of these test cases, each color represents one process and thus one GPU.
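
One general reason why single-GPU and multi-GPU runs need not agree bit for bit is that floating-point addition is not associative: splitting a sum across processes changes the order of operations and therefore the rounding. The following sketch illustrates this on the host with plain C++. The function names `sumSequential` and `sumInChunks` are hypothetical, and the chunked sum only mimics per-process partial sums followed by a reduction; it is not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right sum, as a single process would compute it.
double sumSequential(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;
    }
    return total;
}

// Sum split into `numChunks` contiguous chunks: each chunk is summed on its
// own (like a per-process partial sum), then the partial sums are combined.
double sumInChunks(const std::vector<double>& values, std::size_t numChunks) {
    assert(numChunks > 0);
    const std::size_t n = values.size();
    const std::size_t chunkSize = (n + numChunks - 1) / numChunks;  // ceil(n / numChunks)
    double total = 0.0;
    for (std::size_t begin = 0; begin < n; begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, n);
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i) {
            partial += values[i];
        }
        total += partial;
    }
    return total;
}
```

For the input `{1e100, 1.0, -1e100, 1.0}` in IEEE 754 double precision (without value-changing optimizations such as fast-math), the left-to-right sum is 1.0, whereas summing in two chunks gives 0.0, because each chunk absorbs its small term into the large one before the large terms cancel.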
Plummer
Taylor–von Neumann–Sedov blast wave
Boss-Bodenheimer: Isothermal collapse