# User guide

This section offers simple explanations for the most important
parameters defined in the `KMC.sh` file.

## Launching k-ART simulation
Before launching a k-ART simulation, it is necessary to determine the
length of the simulation, either in number of steps (`NBRE_KMC_STEPS` )
or in total time spent (`TOTAL_TIME`). The simulation will stop when the
first of these two occurrences is met. Default values for these two
quantities are large and it is recommended to fix at least one of the
two.

It is also necessary to set the `TEMPERATURE`. It is defined in Kelvin
(considering that internal energy units are in eV). There is no default
value and it must be set.

    setenv NBRE_KMC_STEPS   100   # Max number of KMC steps to be executed (def: 50 000)
    setenv TOTAL_TIME       1.0   # Maximum simulation time in seconds (def: 20 s)

    setenv TEMPERATURE      600.0 # Simulated temperature in kelvin (no def)

    setenv ELAPSED_TIME     0.0   # Elapsed time before first KMC step
                                  # (will be added to the sim. time)

Repeating the simulations, specially to find a bug, might be necessary.
For this, it is preferable to start with the same seed for the random
number generator. It can be fixed with the following command. When not
defined, it is extracted from the computer clock.

    setenv Random_seed  -1103   # Random number generator see (use only when debugging)

## Second level of parallelization

k-ART is now able to use two level of parallelization. the first one is
about distributing ARTnouveau search throughout a specific group of
submaster, and the second level is about computing force throughtout
each group of workers associated to a specific submaster. The
implementation is done in such a way that one has to define the
following variable.

    setenv   NTRAVAILLEUR     k

With *k* is the number of workers that will compute forces. Be careful
of the value that you might want to use because the number of submaster
is calculated base on the number of worker and the number of cores
asked. If this division is not a integer the code will stop.

## Describing the simulated system

The description of the simulated system includes, of course, its size,
number of species, box type and the force field used to describe it.

### Details of the system: atomic types and active/inactive species

    setenv NUMBER_ATOMS      1018    # Total number of atoms
    setenv NSPECIES          2       # NUmber of atomic species (default: 2)
    setenv ATOMIC_SYMBOLS  "Fe C:active Ni:inactive" # Atomic symbols of atoms used in the
                                                     # simulation, they must be given in
                                                     # the same order than in the potential.

    setenv ATOMIC_TYPE_FILE   'Atomic_types'  # List of atomic types (read at start of sim)
                                              # (def: Atomic_types)

We can set the atomic symbols using the the string `ATOMIC_SYMBOLS`. It
is set as a list with the names of each atomic symbol. Also we can say
if each atom type is *active* or *inactive*, that is, if to launch for
researching of events on it or not. We can type for example: "Fe\" (or
"Fe:active\", although by default all elements are active, so this is
not necessary) and if inactive: "Fe:inactive\". As another example, in
Si-H, the H could be declared *inactive* such topologies are never
centered on it as the relevant moves involve Si rebonding.
`ATOMIC_SYMBOLS`\" can be set to a string of a Maximum of 30 atomic
symbols.

If the list becomes large we can use the file *Atomic_types* which also
contains the name of the atomic species as well as whether they are
*active* or *inactive*. If neither of these variables are set, default
values are used. If both are used, the *Atomic_types* is used. Comments
can also added to this file by starting the line with \#. The format of
the *Atomic_types* file is the following (see the `Si-H` directory in
`EXAMPLES`):

      # Atomic name,       state (launch for search of events on it if active)
       Si active
       H  inactive

### Maintaining different subgroups of the same element as active or inactive

In some cases, it is useful to select as active only a subset of all
atoms of the same species. To do so, simply give different names to the
various subsets of the same element. For example, with three regions for
a block of silicon. In `KMC.sh` write:

      setenv NSPECIES              3     # The number of different atom types (default: 2)
      setenv ATOMIC_SYMBOLS        "Si Sit:inactive Sib:inactive"

For `lammps`, you will need to modify the potential. For example, for
Silicon:

      pair_style sw
      pair_coeff * * Si.sw Si Si Si

and define the masses

      1 28.0855
      2 28.0855
      3 28.0855

### Soft or strict activity

By default, the code uses so-called "soft activity", i.e. no event are
launched from *inactive* species but events can be recentered around
these atoms.

Setting `SOFT_ACTIVITY` to `.false.`, no event can be centered on
*inactive* atomic species, even if the initial deformation was made on
an *active* atom. This, however, can create problems when, for example,
the displacement of the interesting species involves the motion of
inactive species on the same scale. This setting can be made by
inserting the following into KMC.sh:

**Note: This option biases the kinetics and must be used carefully.**

A system might use both soft and strict activity for different atom
types. For example, if a big systems has many atoms that never will be
moved by events thoughout the whole simulation, these atoms can be
declared strictly inactive to not analyze their topologies and therefore
speed up the simulation's initialization time. If also certain atoms
won't yield relevant events, they should be declared softly inactive.\
This can be achieved by not using the `SOFT_ACTIVITY` option and having
an *Atomic_types* file as follows:

      # Atomic name,       state (launch for search of events on it if active)
       Si active
       H  inactive
       C  strictlyinactive

Now, C behaves as if under the option `SOFT_ACTIVITY .false.`, H as if
`SOFT_ACTIVITY` was `.true.`, and Si is active.

The keyword "strictlyinactive" does not work in `ATOMIC_SYMBOLS` yet.
This should be implemented in the future and, for better consistency,
the option `SOFT_ACTIVITY` should probably be removed.

### Box size and boundary conditions

Currently, k-ART allows orthorombic as well as triclinic boxes with
periodic conditions in all directions. The box size is defined in length
units and should be the full box size.

For a cubic box, simply use `SIMULATION_BOX`. For other boxes, it is
necessary to specify the box length for each direction separately. When
simulating a surface, for example, simply use a very large `Z_BOX`

    setenv SIMULATION_BOX    29.0638147  # Size of the simulation box (x, y and z)

or

    setenv  X_BOX  28.02    # Simulation box size in the x dimension
    setenv  Y_BOX  14.17    # Simulation box size in the y dimension
    setenv  Z_BOX  68.45    # Simulation box size in the z dimension

These parameters may also be defined in the k-ART input configuration
file. For a parallelipedic box, only one line is needed after the energy
line. The terms represent, respectively, the size of the box in the x, y
and z directions.

    total energy:        -2213.34826631256
         21.7200000000000     21.72000000000     21.7200000000000

However, when simulating a triclinic box, three lines are needed. The
triclinic parameters matrix is defined following the LAMMPS formalism.
The documentation can be found on the
[](http://lammps.sandia.gov/doc/Section_howto.html#triclinic-non-orthogonal-simulation-boxes)

    total energy :     -2213.34826631256
         21.7200000000000     1.000000000000     2.00000000000000
         0.00000000000000     21.72000000000     3.00000000000000
         0.00000000000000     0.000000000000     21.7200000000000

When using lammps (ENERGY_CALC set to LAM), these parameters are not
needed as the box size is already defined in the input of Lammps; so you
can safely comment these lines. If they are defined they will be
ignored. You may use the following format to input the boxe size to
LAMMPS in the LAMMPS input configuration file. For an orthorombic box,
the three lines after the number of atomic types line are needed. For a
triclinic box, the four lines, including the xy, xz and yz term, are
needed.

    511 atoms
    1    atom types
    0.0 21.72 xlo xhi
    0.0 21.72 ylo yhi
    0.0 21.72 zlo zhi
    1.0 0.0 0.0 xy xz yz

### Frozen boundary conditions

It is also possible to fix a subset of atoms, i.e., to prevent them from moving and keep their original coordinates. This can be used to impose an external deformation, to look at one surface, fixing the other surface, for example.  To allow for most flexibility, the code does not analyse the geometry of the system (surfaces or other) and simply requires a list of atoms that are prevented to move.

To imposed fixed boundaries first set the variable `FIXED_BC` to `TRUE`. 

By default, the list of fixed atoms must be placed in the file `Fixed_atoms`. This name can be changed with the envrionnment variable ``FIXED_ATOMS_FILE`:

    setenv FIXED_BC             .true.
    setenv FIXED_ATOMS_FILE     new_name.dat   # Name of file that contains the list of atoms fixed during the simulation.
                                               # This is optional, one can used "Fixed_atoms' which is the default.

The file must contain the atomic number, one atom by row, followed by `free`  or  `fixed`.  More specifically: the file must respect the following format

1. atomic id (a number) in the **first 8 colums**
2. followed by `fixed` or `free` **starting in column 9**

For example, to fix atoms 1 and 3, the file will contain (omit the first line, that shows colum position).

    12345678901234567890  # Column number, DO NOT PUT IN THE FILE
    1       fixed
    2       free
    3       fixed
    4       free
    5       free

Atoms that are not in the list are, by defaults, set to `free`, so it is sufficient, in general, to only list atoms that are fixed.


### Open boundary conditions

The kART code requires periodic boundary conditions. However, to set open boundary condition, it is sufficient to define a simulation box that exceeded the region occupied by the atoms. For example if you want to simulate a cluster with a 10 Å-radius,  you can define your box with a size of 50 or 100 Å, to ensure that there will be no interaction between images. 

## Choice of forcefield/neighbour lists

A few potentials are directly implemented in k-ART

1.  Stillinger-Weber with the original parameters (SWP) or the modified
    ones by Vink *et al.* for simulating amorphous silicon (SWA);

2.  The EDIP potential, still for silicon (EDP);

3.  The MEAM potential, implemented by Noël Jackse for Si/Au (MEA)

4.  The EAM potential used for simple metals (EAM).

5.  A mixture of modified Stillinger-Weber potential coupled with a
    Slater-Buckingham for H-H interactions. This is not great for H
    diffusion (SIH) (see Ref. [@mousseau90]).

Of those, only Stillinger-Weber is program both with a global and local
calculation of forces.

When those forcefields are not sufficient, it is possible to turn to
LAMMPS to access its vast set of forcefields. To do so, one has to
select the `LAM` option for `ENERGY_CALC` and to provide the appropriate
*LAMMPS* input file.

LAMMPS works in a number of different energy and length units. Even
though k-ART does not formally care, all default parameters are defined
for eV/Å set of units. If the potential is in kcal, it is possible to
change it directly at the k-ART level. Other options can be implemented
but are not at the moment.

        System                      Potential                  Command   Neighbours
  ------------------ ---------------------------------------- --------- ------------
       silicon                   Stillinger-Weber                SWP     TTL or PAR
                                   Modified SW                   SWA     TTL or PAR
                                       EDIP                      EDP        TTL
        metals                         EAM                       EAM        TTL
      Si/metals                        MEAM                      MEA        TTL
   silicon/hydrogen   Stillinger-Weber and Slater-Buckingham     SIH        TTL
       generic                        Lammps                     LAM        TTL

pt

    setenv ENERGY_CALC        SIH     # choose between EDP, SWP, SWA, SIH or LAM (Lammps)
    setenv FORCE_CALC         TTL     # choose TTL (total force) or PAR (partial force)

    setenv INPUT_LAMMPS_FILE  'in.lammps'  # LAMMPS input file when ENERGY_TYPE = LAM

    setenv UNITS_CONVERSION  metal    # Converts the energy and distance units
                                      # to ensure eV and Angstrom in kART

When using an internal potential, it is also necessary to parametrize
the neighboring list. These options are dependent on the type of
forcefield (global or local). Except for Stillinger-Weber, it is
therefore necessary to select : `TTL` or `VER`.

    setenv UPDATE_VER_NEIB   TTL      # TTL for total force or PAR for partial force
    setenv NEIB_CALC         ALL      # ALL or VER
    setenv UPDATE_TOPO       TTL      # TTL or PAR

    setenv PAR_DISP_THRESH2  0.00001  #max displacement squared which triggers an update of
                                      # neighbor list with VER (default: 0.00001)

## Managing output: options and file names

A number of options are possible for outputs, depending both on the amount of information you want and how you want to manage it. 

Here is the list along with brief description.

The following allow to follow the run itself. `PRINT_DETAILS`, for exemple, shows the details of activation and minimization for each attempted event. It is particularly useful when adjusting initial parameters.

If `PRINT_DETAILS = .true.`, then adding `MINSAD_DETAILS`, force the code to write in a director `MINDSAD_DIR` all saddle and mininimum files generated. While this later option generates a large number of files, it might be useful for understanding problems with some events. 

    setenv PRINT_DETAILS                 .true.  # Prints the details of activation and minimization steps in sortieproc (def: false)
    setenv MINSAD_DETAILS                .false.  # Prints the details of activation and minimization (def: false)


The  `STATISTICS` option generates a number of files describing the simulation, including : `Energies.dat`, that presents information about accepted events, including energies, barriers, rates and displacements; `Gen_ev.dat`, that gives informations about generic events searches and more at each step, `Spec_ev.dat`, that focuses on specific events, `selec_ev.dat` and `topo_stat.dat`, that provide information about selected events and topologies at each step of the KMC simulation.

    setenv STATISTICS                    .true.   # Write statistics about force and event calculation  (def: true)


### Event catalog - detailed events

The next one present informations about events per se. Generic and specific events, i.e. all details, are saved in a non-formatted file to save space (generic and specific are saved in a different file). It is possible, however, to also write these files in a text form. 


    setenv USE_TXT_EVENTFILE             .true.   # If true, writes each generic event in a different file (def: false)
    setenv OUTPUT_SPECIFIC               .true.   # If true, writes each specific configuration in a different file (def: false)
    setenv USE_TXT_SPECFILE              .true.   # If true, writes each specific events in a different file (def: false)

*NOTE*: THe option `USE_TXT_SPECFILE` is not yet operational. At the moment, specific files are only write to text format. 


It is possible to impose a name for the event files and directories (if USE_TXT_EVENTFILE is `TRUE`):
    setenv UFT_EVENTFILENAME       'events.uft'   # The name of the uft event file (default: events.uft)
    setenv EVENTS_DIR               'EVENT_DIR'   # The name of the directory where the event txt files are stored (default: EVENT_DIR)


To convert the catalog from one format (txt or utf) to the other, one may use the following commands in the `KMC.sh` file. *When this is done, the code converts the catalog and stops immediately, without any event search.*

    setenv CONVERT_TO_UFT          .false.        # IF true, the program will read the txt event files and create a new uft file
    setenv CONVERT_TO_TXT          .false.        # IF true, the program will read the uft file and copy the catalogue to txt files (use for vizualization) 


### Event catalog - overall properties

The succinct event catalog describing the general properties of events available at each step provide an information unique to k-ART. With it, it is possible to reconstruct the complete local landscape. By default, the output is written in a multiple text files. If a single file is wanted (to reduce the number of files, for example), set `SPLIT_EVENT_FILE` to `FALSE`. 

    setenv OUTPUT_CONFIG_EVENTS          .true.   # If true, creates an event catalog (all events available at each step - def: true)
    setenv SPLIT_EVENT_FILE              .false.  # If true, puts the event catalog in separate files for each event (def: true)

Note that `OUTPUT_CONFIG_EVENTS` must be `TRUE`for the second option to mean something.


The next option defines the completeness of the output configuraitons. By default, init-saddle-accepted configuration are written along the trajectory. It is also possible to save only minima along the pathway (set `ALLCONF_WITH_SADDLE`to `FALSE`):

    setenv ALLCONF_WITH_SADDLE           .true. # If true, write full configuration events including init-saddle-conf. (def: true)
    

Other outputs

    setenv OUTPUT_NEB_GEN_EVENT        .true.    # Can be useful
    setenv SAVE_FULL_EVENTS              .true.          # save full events during catalog and after one file per node


## Restarting

It is possible to restart and extend a simulation. This simulation can
start from the last point, using the catalogue already computed or
reconstruct one if there are problems with the one at hand.

It is also possible to start a *new* simulation using a previously
calculated catalog. In this case, one has only to start with
`RESTART_IMPORT` set to `.true.`

By default, all these options are set to `.false.`

In particularly large or complex systems building the initial catalog
can be expensive. Moreover, if the simulation is interrupted during the
construction of the catalog kART will have to restart it from nothing.
To have the possibility to build the catalog in multiple runs the
following parameter in the environment file must be set to true (in the
initial run).

        setenv  SAVE_SEARCH_LIST      .true.

This will write the formatted text file `search_list.txt`. To resume
building the catalog in a subsequent run the following parameter must be
set to true in the environment file:

        setenv RESUME_BUILD_CATALOG   .true.

## Basin parameters

KMC methods are limited in time by the lowest-energy barrier available
at any time. When this low-energy leads to a state with a significantly
lower energy, it belongs to the kinetics of the systems. However, these
low-energy barriers often connect non-diffusive states of similar
energy, i.e., states that do not lead to the structural evolution of the
system. Such states are called *flickers* and represent the bane of KMC
simulations.

In kinetic ART, we handle both types of events specifically.

### Low-energy barrier events

Very low-energy barriers leading to lower-energy states could be handled
directly, without going through a KMC choice. Indeed, these barriers
typically involve time scales of 10$^{-10}$ to 10$^{-13}$ s and so do
not influence the time scale for the problem typically studied with
k-ART. There is therefore no impact in applying them directly. The
advantage would be that doing so decreases considerably the cost of
handling these low barriers, allowing us to focus on the kinetics of
interest.

**Further improvements: very low-energy barriers.**

At the moment, the whole catalog of generic events is constructed before
applying a move to the lowest energy barrier (i.e., below the threshold
for real KMC). In some disordered systems, it could be preferable to
perform the moves as these low barriers are met and to update the
catalog only when no new very low-energy barrier are found.

This is yet another modification that we are planning for k-ART in the
near future.

### Flickers

There are many ways to handle these flickers. In k-ART, we use the
autoconstructing basin mean-rate method (bac-MRM), a modified version of
the mean-rate method proposed by Puchala *et al.*. [@puchala]. It is
described in some details in Ref. [@beland2011].

    setenv OSCILL_TREAT    BMRM    # choose between BMRM, TABU or NONE
    setenv MIN_SIG_BARRIER  0.1    # Max height of dir. and inv. barrier for an event
                                   # to be considered inside a basin
    setenv BASIN_LOCAL     .false. # If true, local basins are used (default: false)
    setenv BASIN_RADIUS    10.0    # If using local basin, radius of a local move

We have also implemented a *TABU* algorithm. However, for many systems,
this algorithm biases significantly the kinetics and we do not really
recommend it. A description of the method can be found in
Ref. [@vocks2005; @chubynsky2006].

The following environment parameter define its use.

    setenv OSCILL_TREAT   TABU       # choose between BMRM, TABU or NON
    setenv FLICKER_DIR   'FLICKERS'  # Directory where the TABU event files are stored
    setenv EVENTS_MEMORY   100       # Number of stored event TABU (default: 100)
    setenv BASIN_MEMORY    10        # Number of stored basins for TABU (default: 10)

### Manually adding basin events

In some cases, it is helpful to add events with high energy barriers to
a basin. For example, it can happen that a system flickers between two
states connected by a large $\Delta E$ because any other event has
significantly higher barriers. Treating this transition as a basin
solves this. When using BMRM, it is possible to load a file with the
default name `Event_basin` and the following structure:

       #  initial         saddle       final
          1247195         80279       1247195
       ...

The file name can be changed with

    setenv EVENT_BASIN_FILE       Event_basin # File that is read for additional basin events

Using this file means that events will be basin events that have energy
barriers below `MIN_SIG_BARRIER` or topologies from the list or both.
The reverse event to any list entry will automatically be considered.

### Basin auto analysis

Because it can be tedious to manually add events to the event basin
file, there is the basin auto analysis. Its parameters are

    setenv BAA_STEPS          3       # Number of steps analysed (default 0)
    setenv BAA_RADIUS         0.3     # Radius for atom to move in, in angstrom (default 0.3)
    setenv BAA_MONITORED    100       # Particle to be monitored for the analysis
                                      # (default NUMBER_ATOMS, i.e. the last atom)

The method remembers the last $N=$ `BAA_STEPS` positions of the atom
with identifier `BAA_MONITORED` and if all of these positions lie within
a sphere with radius `BAA_RADIUS`, the event last executed will be added
to the `EVENT_BASIN_FILE` file. If the parameter `BAA_STEPS` is not
given or identical to 0, the basin auto analysis is disabled.

## Topological identification of local geometries

K-ART relies on the assumption that we can identify the local geometry
around each atom in the system by looking at the topology. K-ART works
by generating the connectivity graph of a cluster of atoms centered on
the target atom. While k-ART has functions to deal when this one-to-one
correspondence between geometry and topology breaks down, it is
important to choose well the parameters used to construct the
connectivity graph. A topology label is associated with each
connectivity graph to identify it.

Two file names can be defined for the outfile statistics regarding
topologies. These files are not used for restarting.

      setenv TOPOLOGY_FILE  'Topologies'      # Store info about topologies
      setenv TOPO_STAT_FILE 'topos.list'      # Store statistics about topologies

### Defining graph size: `MAX_TOPO_CUTOFF`

The three parameters used defining the graph size are:

`MAX_TOPO_CUTOFF`, `TOPO_RADIUS` and `MAX_NODES_GRAPH`.

`MAX_TOPO_CUTOFF` defines the maximum distance for which 2 atoms are
considered connected. This value is the same for all atom types. A good
value of `MAX_TOPO_CUTOFF` usually can be found by looking at the radial
distribution function (RDF) of the system. The value should generally be
between the 1st and 2nd peaks. It is important that minute variations in
bond length do not affect the connectedness of two neighboring atoms. In
fact, this is good because the program will be able to identify the same
structures even if the actual bond lengths vary slightly. It is easier
to find this value in crystalline systems since the RDF has well defined
peaks. It is more difficult in amorphous systems, where there is no
clear separation between the peaks. Even so, choosing a value where the
RDF is a minimum between the 1st and 2nd peaks was found to be optimal.

In the case where k-ART cannot recover the geometry from a topology
label, it does what we call splitting the topology. In this case, the
original topology label is called the primary label and we associated
with it secondary labels by changing the cut-off used to connect atoms.
The parameter `MIN_TOPO_CUTOFF` determines the smallest cut-off value
used. Each time the topology is split, the cut-off will be reduced from
its original `MAX_TOPO_CUTOFF` by a distance of

(`MAX_TOPO_CUTOFF`-`MIN_TOPO_CUTOFF`)/10

and a new topology label is calculated. This is done until a different
topology label is found. The value of `MIN_TOPO_CUTOFF` should not
ideally be smaller that the minimum bonding distance observed but small
enough so that the connectivity distance changes enough to find new
topology labels.

### Improvement for alloys: Multiple `MAX_TOPO_CUTOFF`

You may also define multiple cutoff distances between atoms for
multi-atomic crystals.

In this case, these may be defined directly with both the
`MAX_TOPO_CUTOFF` and `MIN_TOPO_CUTOFF` in the environment file:

    setenv MAX_TOPO_CUTOFF       "2.0 3.2 2.5 2.7 2.9 3.1"
    setenv MIN_TOPO_CUTOFF       "1.8 2.9 2.1 2.5 2.6 2.7"

The cutoffs can also be placed in a separate file defined by the
environment variable `TOPO_CUTOFF_FILE`. If this variable exist, it
supersedes the `MAX_TOPO_CUTOFF` and `MIN_TOPO_CUTOFF` variable. In the
file, the first line represents the `MAX_TOPO_CUTOFF` values and the
second the `MIN_TOPO_CUTOFF` ones:

    "2.0 3.2 2.5 2.7 2.9 3.1"
    "1.8 2.9 2.1 2.5 2.6 2.7"

For a crystal with two atom types, the format is 1-1, 1-2, 2-2 cutoff
lengths. For a crystal with three atom types, as shown in the above
examples, the format is 1-1, 1-2, 1-3, 2-2, 2-3, 3-3 cutoff length. The
same logic applies for crystals with more atomic types. Note that with
both methods the cutoffs lengths must be defined between quotation
marks.

### Defining graph size: `TOPO_RADIUS`

The second important parameter is `TOPO_RADIUS`, which defines the
radius of the sphere centered on the target atom that is used to include
neighboring atoms in the graph. The larger the sphere, the more atoms
are included and the more complex the graph is. A too small value will
result in a graph that does not properly correlates with the local
geometry. A too large value will result in clusters that are too
specific and include atoms situated far away that do not affect the
transition states centered on the target atom. We find that in *c*-Si
and *a*-Si, a value of 6 Angstroms works well. Note that as the volume
of this sphere increases, so does the probability of having atoms near
the edge of the sphere. These atoms' positions can fluctuate in and out
of the surface and change the number of atoms in the graph, resulting a
multiple changes in the topology. Again, the program will work but will
be less efficient because for each new topology, a new series of
(redundant in this case) event searches are launched.

      setenv   TOPO_RADIUS      6.0   # by default, set to 6 Angst.
      setenv   MAX_TOPO_CUTOFF  2.8   # maximum length cutoff
      setenv   MIN_TOPO_CUTOFF  2.2   # Also sets a minimum distance

### Further improvements for alloys: `MAX_NODES_GRAPH`

The cut-off defining the limits of a graph used in the topological
classification is not ideal for an alloy. In this case, particularly
when the atomic size varies greatly, it would be preferable to use a
fixed number of atoms around the center.

This is possible with the command `MAX_NODES_GRAPH`, which imposes a
maximum number of nodes (atoms) to the graphical analysis. To use this
command, you first must set a cut-off size for `TOPO_RADIUS` that
corresponds to the correct size of a cluster containing only your
biggest atom (which would have the lowest number of atoms for a given
radius).

You can then use `MAX_NODES_GRAPH` to impose a maximum number of nodes
(atoms) for the graph.

      setenv   MAX_NODES_GRAPH  42

To optimize these parameters, it is best to first run kART on a system
with only the largest species and to identify the size of the graph
corresponding to `TOPO_RADIUS`. Look, for this, into the `sortieproc.0`,
which gives the cluster size (number of nodes) for the largest atoms.

    Fetching topology label for atom:      1( cluster size:  45)
     Topology has changed :       0-->  617416
     Assigning topoId      617416 to atom           1

Here, for example, setting `MAX_NODES_GRAPH` to 45 would ensure that
most clusters have a similar size, facilitating classification and
cataloguing

### The special case of random alloys: `UNIQUE_CRYST_STRUCT`

In the case of an alloy whose atoms are randomly distributed on the
lattice sites, without any short or long range order (solid solutions
for example), the crystallinity should not depend on the atom types but
only on the positions of the atoms. The default calculations in kART are
not performed this way but one can decide to make the crystallinity
check on atomic positions only by setting:

        setenv   UNIQUE_CRYST_STRUCT = .true.

By doing so, even if the topologies id found by kART do not correspond
to the `CRYST_TOPOID` given as a parameter, if they are structurally
equivalent they will be considered as crystalline and will be ignored.

### Leaving some topologies aside.

K-ART's performance is proportional to the number of different
topologies in the system and not necessarily with the number of atoms.
However, to avoid spending time on environments where no event is
expected, such as perfectly crystalline regions, it is possible to
remove these events from the catalog.

One way to do this is to include the file `Topo_ignore` in the
simulation directory. This file should contain the topology labels (one
per line) associated with region to be ignored.

The previous solution is applicable to neglect any type of topologies.
To avoid spending times on barely non-crystalline environments (i.e.
those with a defect at the edge), it is also possible to set, in the
environment file, up to five crystalline topologies along with a small
cutoff for ignoring them. Either or both of them can be left out if not
needed.

The use of many crystalline topologies is useful in the case of an
ordered compound, for example.

To get this label, it is necessary to run k-ART on a perfect crystal,
for example, with `TOPO_RADIUS` set to the same value of
`CRYST_TOPO_RADIUS`. Run for some seconds, and from *sortiproc.0* go to
line "`Listing topologies in system`\", and get the number (if system is
not perfect crystal, the most repeated number is the `CRYST_TOPOID`).
IMPORTANT: After this, reset `TOPO_RADIUS` to its previous value and
check that the crystalline topology is already removed by rerunning.
Then, in *sortiproc.0* at the end of "`Listing topologies in system`\",
look for "`topo has no events x`\". Verify that the number $x$
corresponds indeed to the crystalline topology (realize that $x$ is
different to the number set in `CRYST_TOPOID` as `TOPO_RADIUS` is
reset).

Note that all new topology labels are written in the *topos.list* and
Topologies files. The file *topo_stat.dat* keeps track of the statistics
associated with topology labels per KMC step.

### Leaving specific events aside

In some cases it is good idea to ignore some problematic events. There
are two ways to do this.

If an undesired events (sometimes an error) has popped in the catalog,
then it might be useful to remove it. This is achieved by creating a
filed called `Event_remove` that contains the list of all undesired
events. It has the following structure.

      # Event id (one per line)
        330236
        576443
        ...
        576443

where the numbers can be found, for example, in the `event_list_xx`
files.

Because it removes the events from a previously constructed catalog, it
needs the file .utf@ in order to work. If there is no previous catalog,
the code will crash.

Now, it is also possible prevent given events to be stored in the
catalog. Because the events ids are not uniquely defined, it is then
necessary to define the events in terms of their topological identifier
at the initial, the saddle and the final states. To allow for further
distinction (for example, a rare, but unrealistic events that involves a
jump to the second neighbour site but through the same saddle point), it
is also necessary to define a minimum displacement threshold. Any event
involving the same topologies but with a smaller displacement will not
be ignored.

The default file name is `Event_ignore` and has the following structure

       #  initial         saddle       final      min_dr
          1247195         80279       1247195      0.0
       ...

For both files, comments can be added to that file using the character
`#` at the beginning of line.

## Activated Event generation

For each topology label found, k-ART launches a series of ART nouveau
searches to find all the activated events. The events associated with a
topology label are called **generic** events. K-ART assumes that all the
atoms sharing the same topology label have the same **generic** events.
While the actual values of the activated energy and displacements can
change slightly atom to atom, they all have them.

The important non-ART nouveau parameters for the **generic** event
generation are `SEARCH_FREQUENCY` and `USE_LOG_SEARCH`. These control
how many event searches are launched per topology. If `USE_LOG_SEARCH`
is false, then this number is simply `SEARCH_FREQUENCY`. If
`USE_LOG_SEARCH` is set to `true` (default value), then the number of
searches launched are a function of how many times we have seen this
topology before. The number of searches is calculated every turn and is
equal to

$$\label{log_searches}
 max(SEARCH\_FREQ \times (1 + \log_{10} (COUNTS)) - ATTEMPTS, 0)
%SEARCH\_FREQ \times (1 + \log_{10} (COUNTS)) - ATTEMPTS
%a = b$$

where $COUNTS$ is the number of times the topology has been seen and
$ATTEMPTS$ is the previous number of event searches launched. This
logarithmic search function is used so that topologies seen more often
are searched more. Moreover, topologies are also searched multiple times
during a simulation (and not just when first found), albeit less and
less often.

All **generic** events found are written to a binary file called
*events.uft* by default. This file can be reused to restart a simulation
or to start a new one. The topology file *topos.list* also needs to be
provided. If the parameter `USE_TXT_EVENTFILE` is set to true, a folder
called *EVENT_DIR* will also be created and a text version of the event
files will be created during the simulation. Note that in the TOOLS
directory, the program *kART_uft2txt.f90* is provided. This stand-alone
program can convert an *events.uft* file to text format for
visualization purposes (see the *READ_ME.txt* file).

### Generic events

Generic events are stored in the catalog. They are used to construct, at
every step, the specific events that are put in the KMC tree.

Every time a new topology is found, a search is launched to associate
generic events with this topology. The number of searches is defined by
`SEARCH_FREQUENCY`.

As a topology is found more often, new searches are made to minimize the
probability of having missed an important barrier. This is done on a
logarithmic scale, i.e., the number of addition searches is given by the
logarithm of a topology's occurrence multiplied by `SEARCH_FREQUENCY`.

### Specific events

When arriving in a new minimum, k-ART first updates its list of
topologies and completes the list of generic events.

But default all generic events contributing to 99.99 % (1 in 10 000) of the rate are reconstructed for specific configurations, with the saddle
points fully relaxed. This ensures that elastic and geometric effects
are fully taken into account

It is possible to modify this proportion with the `SPECIFIC_THRESHOLD` paramter.

This one is set, by default, as :

    setenv SPECIFIC_THRESHOLD   99.99

To reconstruct as specific only events having a propability of 1 in 1000, for example, enter:

    setenv SPECIFIC_THRESHOLD 99,9


A number of environment variables relate to the reconstruction and
relaxation of specific events. In general, the default values can be
set.

However, for topologies with high symmetries (e.g. relaxed interstitial
in a crystalline environment), there can be problems with the mapping of
the events onto the system. For $n$-fold symmetry, the chances of
correct mapping are $1/n$ . If in the sortieproc.0 file there is "Mapping
\... failed", it can help to increase the parameter `REFINE_ATTEMPTS` to
for example $N = 50$, reducing the overall probability of mapping
failure $(1-1/n)^N \ll 1$. Increasing `REFINE_ATTEMPTS` comes at almost
no cost.

    setenv REFINE_ATTEMPTS  2   # Default value for this parameter


### Event analysis

## Using local forces for accelerating simulations of large systems

Because events are intrinsically local, it is not necessary to compute
the force on all atoms to generate generic events when the box is
sufficiently large, since these will be further refined, taking full
account of the complete system, when relevant.

In this case, kART can therefore compute forces only on a limited subset
of atoms, independently of system size, which can considerably
accelerated the generation of a catalog and even the relaxation of
specific events as described below.

In the next subsection, we describe the parameters associated with this
option. In the following one, its implementation.

### Selecting parameters

The option `LOCAL_FORCE` determines the use of local force. By default,
this option is turned down.

When this option is , the global simulation box is cut into cells of
width defined by `CELL_LENGTH`. The same length is used in all
directions, irrespective of the box size. In general, as explained below
this length should be of the order or larger than the potential cut-off.

The second option, `REGION_WIDTH`, defines the number of cells included
in the local region on *each side* of the central cell. For example, if
`REGION_WIDTH` is set to 2, the total length of the local region for
force calculations is 5 cells (2 on each side plus the central cell).

This allow the selection of a *spherical* region around the central atom
defined with a first *inner* region with a radius of ( 2\*REGION_WIDTH
-1)/2 followed by an *outer* shell of width REGION_WIDTH to
(2\*REGION_WIDTH+1)/2.

Since, the local region is set-up with open boundary conditions, as
shown in Fig. [\[fig:local\]] fig:local{reference-type="ref"
reference="fig:local"}, forces on the atoms in the outer shell cannot be
used. This is why, only atoms in the inner sphere are considered for
activation and relaxation. This explains why it is recommended that the
`CELL_LENGTH` be at least as long as the potential cut-off (or, at
least, sufficiently large to ensure that the local-region's boundary
does not affect the results.

If `REGION_WIDTH` is set to 2, as per default, this means that the inner
core, where atoms can move with local forces, is 3 cells large. When
elastic deformation is more long range, it might be necessary to
increase the region width.

Adapt these parameters if the saddle points found with local force are
systematically lost during the final global force convergence.

To check that what these parameters do, the outputs show the number of
atoms in the local region : `nat_inner`, `nat_outer` and
`nat_local = nat_inner + nat_outer` are printed every time a local
region is defined.

Similarly, the convergence to saddle point for specific events indicates
whether local or global forces are used, allowing a check on the
validity of the local region.

If there is little change between saddle points found after local and
global convergence, or if you do not need high accuracy on the saddle
point, you might disable global convergence at the saddle (global
convergence is still used at the final minimum). In this case, set
`GLOBAL_CONVERGENCE` to .false.@

[\[fig:local\]]{#fig:local label="fig:local"}

For large box, it might be difficult to converge the eigenvalue and
eigenvector surrouding a local deformation because of the way the
initial random vector in the Lanczos calculation is set up. To
circumvent this problem it might be useful to set `LOCAL_LANCOS` to
true. In this case, global forces are systematically used, but the
initial deformation for construction the Lanczos matrix is done on a
local region defined by the same parameters ass for `LOCAL_FORCE`.

### Implementation

When set-up, local forces are used for finding all generic events. In
LAMMPS, the local region is positionned in the same box as for the full
system and atoms are not recentered. Periodic boundary conditions
determined by the large box are used to ensure the continuity of the
local region, allowing the algorithm to work even when there are
surfaces and interfaces in the original box.

Convergence of specific events is also first done with local forces,
once the local force is below threshold, convergence is relaunched with
global forces, using the eigenvector obtained with local forces as seed.

## Breaking the kinetics : approach to get specific results

Kinetic ART implements a number of approach that break the correct
kinetics but allow to generate specific desired events. While, these
cases, the kinetics and the sequence of events may be wrong, it can
provide useful information difficult to obtain otherwise.

The following options are possible:

### Active/inactive species

A number of options are available to set some active and inactive species as described in detail here: https://kart-doc.readthedocs.io/en/latest/user_guide.html#details-of-the-system-atomic-types-and-active-inactive-species

Since this option is a very useful way to focus on the right events without too much bias, it is the most used one of this section.


### Adding additional conditions for event selection 

It is possible to program in specific conditions on events to be added to the catalog. This is useful, for example, when one is interested in specific type of mechanisms. For example, if one want to impose diffusion from the surface to the bulk, it is possible to assess, after each event is generated, whether this event is compatible with insertion, something that might not be possible to impose by the initial displacement or even looking at the saddle. 

Setting `ADDITIONAL_CONDITIONS` to `TRUE`  will force the code to call the `additional_conditions` subroutine (to be found in `src/lib`) after an event generated. It is then possible to perform additional tests to decide whether or not the event will be added to the catalog or not. 

The `additional_conditions.f90` file shows an example for an atom diffusing in the bulk. The first lines of the routines define positions and energies at the initial, saddle and final states for use on the various conditions. To prevent an event from being added to the catalog, set:

    success = .false.

Otherwise, if the event should be kepts, then set
    success = .true.


### Remove detailed balance

By default, all events are checked for connectivity. From the saddle point, the configuration is pushed forward and backward to converge both on the new (final) energy minimum and the original (initial) one.

Two parameters are important here:

    setenv CHECK_INI_SAD_CONNECTIVITY     .true. # Impose a check on the connectivity (detailed balance; def: .true.)
    setenv SADDLE_PUSH_PARAM              0.25   # Fraction of displacement for push back to initial point

`CHECK_INI_SAD_CONNECTIVITY` is a logical parameter than enforce the construction of a connected path between initial-saddle-final states. If this parameter is `TRUE`, then any event that is not fully reversible is not added to the catalogue. 

To reconstruct the pathway, the system is first pushed away from the saddle point along the direction of negative curvature. `SADDLE_PUSH_PARAM` determines the size of this push (typically betwen a fraction between 0.1 and 0.25 (10 to 25 %) of the displacement from the initial minimum to the saddle). If the reconstruction fails, this fraction is decreased automatically and a new attemp is made by the code to converge to the initial minimum.

### When not to impose detailed balance?

For some very rough landscape, it might be appropriate to set the `CHECK_INI_SAD_CONNECTIVITY` parameter to `FALSE`. In general, though, this is not recommended. 

### Bias the initial displacement

By default, the initial random displacement is selected from a complete hypersphere. It is also possible to bias the displacement by forcing the initial displacement to be in the negative z direction to favor motion into a surface which is normal to +z. **This option is only applied when the central atom belows to an active species**. This is down with :

    setenv  INITIAL_PUSH_DOWNWARDS   .true.   # Push in the negative z direction 
                                              # (xy anything) (default: .false.)

This should be used with caution as it will then apply to all atom types in the box. 

### Directed pathway

Finally, kART provides an option for constructed a pathway step by step.  Setting the `DIRECTED_PATHWAY` to `TRUE` allows the use to choose the next step among events available in the catalog.  This means that once the event catalog for a configuration is completed, the code will stop and wait for the user to indicate what event to select.

To do that first set 
    setenv DIRECTED_PATHWAY   .true.
in `KMC.sh`


Then launch the simulation. 

One the event catalog for the current state is completed, The code stops, providing the following instructions:

    KMC step no:           1 (           1 for this run)
    **************************************************************************
    Waiting for instructions regarding the next event, event number :     1
  
    The file, named, 'imposed_event', must have the following format: 
  
       event_number  AtomId  IniTopoId eventId  Spec_id  
  
   ****************************************************************************

The code then waits for a few seconds, attempting to read the file. If now file is created, then the code stops. 

To move to the next step, a file `imposed_event` much be created. With the information taken from either `Event_catalogue.dat`or from an event catalog in `EVLIST_DIR`. 

For example, to select the 4th event from 

    #TypeId    AtomId IniTopoId SadTopoId FinTopoId   eventId   Spec_id     barrier     inv_bar    asy_ener   true_rate   inisad_dr   inifin_dr   refined
    #******   *******   *******   *******   *******   *******   *******     *******   *********   *********   *********   *********   *********   *******
          1       331    236619   1253516   1108930    421298         1    2.351033    0.084875    2.266158   2.007E-11    1.610530    1.972890         F
          1       331    236619    624389    296339   1223290         2    2.854080    0.495831    2.358248   1.707E-16    1.627572    2.170290         F
          1       357    236619   1253516   1108930    421298         1    2.351033    0.084875    2.266158   2.007E-11    1.610530    1.972890         F
          1       357    236619    624389    296339   1223290         2    2.854080    0.495831    2.358248   1.707E-16    1.627572    2.170290         F
          1       363    236619   1253516   1108930    421298         1    2.351033    0.084875    2.266158   2.007E-11    1.610530    1.972890         F
          1       363    236619    624389    296339   1223290         2    2.854080    0.495831    2.358248   1.707E-16    1.627572    2.170290         F
          1       365    236619   1253516   1108930    421298         1    2.351033    0.084875    2.266158   2.007E-11    1.610530    1.972890         F

We create a file `imposed_event` with the line:
    1    357    236619 1223290         2 

Then relaunch the code, making sure to set :
   setenv RESTART_KMC           .true.