Parallelisation of LAMMPS’ force calculations vs local forces

k-ART provides two main approaches for decreasing the wall-clock cost of computing forces:

Double parallelisation, which parallelises both the force calculation in LAMMPS and event generation
Local forces, which limits force calculations to a subset of atoms surrounding the generated events (details of the parameters are given in the guide, section: Using local forces for accelerating simulations of large systems)

In this section, we discuss the differences between the two approaches and when it is best to use each of them. Note that, although the two approaches are compatible in principle, technical issues currently prevent using them together.

k-ART basic parallelisation

By default, k-ART relies on parallelisation for launching events: independent events for a given topology can be launched in parallel. The resulting events are then analysed on the master node. This means that the number of cores should not be larger than the SEARCH_FREQUENCY parameter, as additional cores would not be used.
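The rule above can be checked with a few lines of Python before submitting a job. This is only a sketch: the function name `idle_cores` is hypothetical and not part of k-ART, and it assumes that at most SEARCH_FREQUENCY event searches can run at the same time, as described above.

```python
def idle_cores(n_cores: int, search_frequency: int) -> int:
    """Return how many k-ART cores would be left without an event to launch.

    Assumption (from the text above): no more than `search_frequency`
    event searches run concurrently, so extra cores stay unused.
    """
    if n_cores < 1 or search_frequency < 1:
        raise ValueError("n_cores and search_frequency must be positive")
    return max(0, n_cores - search_frequency)


if __name__ == "__main__":
    # 16 cores requested with SEARCH_FREQUENCY set to 10: 6 cores are wasted.
    print(idle_cores(16, 10))
```

A non-zero result suggests either lowering the number of k-ART cores or raising SEARCH_FREQUENCY.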

When to use local forces

Because of the limits on parallelising k-ART, strategies are needed to decrease the cost of a force evaluation. The simplest one is to use local forces. Since events are local in nature, typically affecting a few tens to a few hundred atoms, there is no need to systematically compute all forces in a simulation box containing, for example, 10 000 atoms.

This is why the local forces option is provided. With this option, forces are computed on only 1000 to 2000 atoms: the atoms able to move (500-1000 atoms) and the surrounding shell of fixed atoms, on which forces must be partially computed to ensure correct forces on the atoms that can move.

Within this approximation, the full cell is relaxed only once, after each step, to ensure that events are started from a globally relaxed configuration.
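The idea of a mobile region surrounded by a fixed shell can be illustrated with the short Python sketch below. It is not k-ART's implementation: the function `partition_atoms` and the radii `r_active` and `r_shell` are hypothetical names chosen for illustration, regions are assumed to be spheres around the event centre, and periodic boundary conditions are ignored. The actual parameters are those described in the guide section on local forces.

```python
import math


def partition_atoms(positions, center, r_active, r_shell):
    """Split atom indices into (active, shell, ignored) by distance to `center`.

    active : atoms allowed to move (distance <= r_active)
    shell  : fixed atoms kept so that forces on active atoms are correct
    ignored: atoms left out of the local force calculation

    Hypothetical sketch: spherical regions, no periodic boundaries.
    """
    if not 0 < r_active <= r_shell:
        raise ValueError("expected 0 < r_active <= r_shell")
    active, shell, ignored = [], [], []
    for index, position in enumerate(positions):
        distance = math.dist(position, center)
        if distance <= r_active:
            active.append(index)
        elif distance <= r_shell:
            shell.append(index)
        else:
            ignored.append(index)
    return active, shell, ignored


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(partition_atoms(atoms, (0.0, 0.0, 0.0), r_active=4.0, r_shell=8.0))
```

Only the atoms in the first two lists would enter the force calculation, which is where the saving on a large box comes from.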

When to use parallel calculations of forces

In some cases, it is preferable to compute forces on all atoms. When this force calculation is costly, it is then useful to parallelise it.

This is done simply by setting the NTRAVAILLEUR parameter to the number of cores you want to use for each LAMMPS force calculation. For example, suppose you wish to have 8 cores per force calculation with 10 k-ART cores (1 master + 9 workers launching events). In this case, you set, in `KMC.sh`:

setenv NTRAVAILLEUR   8

and make sure that you launch your code with

% mpirun -np 80 KMC.sh

Note that, in this case, 8 cores are reserved for LAMMPS for each of the k-ART nodes, including the master node, which calls LAMMPS only a few times during the process.

Because of that, it is preferable to first maximise the number of cores for k-ART and only then add parallelisation of the force calculations.
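The core count passed to mpirun is the product of the number of k-ART cores and NTRAVAILLEUR. The Python sketch below reproduces this arithmetic and one possible way of applying the advice above to a fixed core budget. The function names are hypothetical and not part of k-ART, and the splitting rule (fill k-ART cores up to SEARCH_FREQUENCY, then share what remains) is an assumption made for illustration.

```python
def mpi_ranks(n_kart_cores: int, ntravailleur: int) -> int:
    """Total number of MPI processes to request with `mpirun -np`."""
    return n_kart_cores * ntravailleur


def split_budget(total_cores: int, search_frequency: int):
    """Suggest (k-ART cores, NTRAVAILLEUR) for a given core budget.

    Assumed rule: first use as many k-ART cores as SEARCH_FREQUENCY allows,
    then give each of them an equal share of the remaining budget.
    """
    if total_cores < 1 or search_frequency < 1:
        raise ValueError("total_cores and search_frequency must be positive")
    n_kart_cores = min(total_cores, search_frequency)
    ntravailleur = max(1, total_cores // n_kart_cores)
    return n_kart_cores, ntravailleur


if __name__ == "__main__":
    # The example above: 10 k-ART cores with NTRAVAILLEUR = 8.
    print(mpi_ranks(10, 8))
    # An 80-core budget with SEARCH_FREQUENCY = 10.
    print(split_budget(80, 10))
```

With a budget smaller than SEARCH_FREQUENCY, the sketch assigns every core to k-ART and leaves NTRAVAILLEUR at 1, so that no cores are spent on parallel force calls before event generation is saturated.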

Careful: this can be costly and ill-advised!

Note that launching a single force calculation in parallel is computationally costly. This is why, for example, it makes no sense to use this approach for cheap force fields (EAM potentials, LJ, etc.): the cost of setting up the parallel calculation at each force call is greater than the gain in computing the force itself.

In that case, if the box is large, it is much more efficient to use local forces.

For expensive potentials (neural nets, ReaxFF, etc.), such parallelisation is generally useful, after first maximising the number of k-ART cores.

Can I mix both approaches (local forces + double parallelisation)?

Answer: No

While, in principle, it should be possible to mix both approaches, technical issues currently prevent their joint use. Correcting this limitation is on the list of tasks to be done.