# 03 – Geometry optimization

In our first example, the positions of the atoms in the molecule, commonly called the geometry of the molecule, were given so that they are close to a local minimum of energy. However, this is generally not the case in our input data. Generally we have an initial idea of how the geometry of the molecule should be, but the distances and angles are not given precisely. In these situations, we can employ atomistic simulations to find the geometry that minimizes the total energy of the system. This process of varying the distances and angles of an atomistic geometry to find a local minimum of energy is commonly called relaxation or geometry optimization. Knowing that total energy is one of the output data of our simulation, we can use a rather simplistic (and not very intelligent) method that is to manually vary distances and angles between the atoms and to evaluate the total energy in function of these variables. Since we can have many distances and angles in a complex atomistic geometry, let’s perform that kind of simulation on the simplest molecule I can think of right now.

### Example 02 (H2 distance)

For the molecule H2, there is only one distance that we can vary to evaluate the minimum energy. Let’s create several atomistic simulations with different distances between the atoms of H. To create this variation, we set the position of an atom of H in 0.0 0.0 0.0, and another atom in the position 0.0 0.0 VAR_DISTANCE, where this keyword will be replaced by values between 0.50 and 1.2 Å, with steps of 0.05 Å. That means 15 different simulations. We can create a simple shell program (script) to do this. We could even use another programming language, but the task is so simple that it’s worth doing in shell. The program should create a loop with values from 0.50 to 1.20, create a folder for each value, copy the pseudopotential file and create a new input.fdf file that has the keyword VAR_DISTANCE with the new value for the distance between H. This last task can be done with the shell sed command, as shown in the code below.

```#!/bin/bash
# file: create_inputs.sh

for dist in \$(seq -w 0.50 0.05 1.201).  # Use 1.201 to include 1.20
do
mkdir dist_\$dist
cd dist_\$dist
cp ../H.gga.psf .
sed "s/VAR_DISTANCE/\$dist/g" ../input.fdf > input.fdf
cd ..
done```

Note that we use the value 1.201 for the end of the loop, rather than just 1.2. This was done to include the value 1.20. Because of the precision of “float” variables, there may be cases where we need to do this to include the last value. This is not necessary for integers.

After saving the program, we can execute it with the command:

`\$ bash create_inputs.sh`

Or as an executable program with a shebang in the first line of the code.

```\$ chmod +x create_inputs.sh
\$ ./create_inputs.sh```

This will create 15 folders with the input.fdf and pseudopotential files.

To perform the simulations, we can also create a simple shell program to do this task. An example program to perform these simulations by varying the distance is:

```#!/bin/bash
# file: run.sh

for dist in \$(seq -w 0.50 0.05 1.201)
do
cd dist_\$dist
siesta-omp input.fdf > job.out
cd ..
done```

The execution of this program can be done with

```\$ chmod +x run.sh
\$ ./run.sh &```

After the completion of all 15 simulations, we can analyze the data to find the distance that minimizes the energy. To do this analysis, we will also make a shell program to extract the total energies of each simulation, taking advantage of the previous shell programs. The get_energy.sh program to extract these total energies is shown below.

```#!/bin/bash
# file: get_energy.sh

for dist in \$(seq -w 0.50 0.05 1.201)
do
cd dist_\$dist
energy=`grep 'Total =' job.out | cut -d = -f 2`
echo \$dist \$energy
cd ..
done```

After saving the program, we can execute it.

```\$ chmod +x get_energy.sh
\$ ./get_energy.sh > energyXdistance.dat```

Now the energyXdistance.dat file contains the total energies for distances between 0.50 and 1.20 Å. Plotting this data using appropriate software (xmgrace, gnuplot, pyplot, …), we will find the result shown in Figure 1. The minimum energy is found around d = 0.8 Å. Figure 1. Total energy with H-H distance in a H2 molecule.

## Algorithms for energy minimization

For larger molecules, there are more distances and angles between atoms that must be taken into account. For example, for a water molecule (H2O) there are 9 degrees of freedom (3 coordinates for each atom). However, 3 of these degrees of freedom are related to the translation of the molecule, and another 3 degrees of freedom are angles for rigid rotation of the molecule. That leaves 3 internal degrees of freedom to vary to find the optimal geometry of H2O. These 3 degrees of freedoms can be identified as the two distances between H and O, and the angle formed between these HOH bonds. The total energy will be a function of 3 variables, forming a hypersurface in the space formed by these degrees of freedom.

Variations of these independent degrees of freedom can be quite cumbersome. However, we can employ algorithms to find minima of this total energy hypersurface. In the SIESTA program, there are at least 3 algorithms that can be employed to find the optimized geometry of a molecule. The implemented algorithms are: Conjugate Gradient (CG), Broyden, and Fast Inertial Relaxation Engine (FIRE). These algorithm are iterative, being performed in an external loop besides the self-consistent loop to solve the Kohm-Sham equations.

### Example 03 (Water)

To relax a simple molecule as H2O, we shall start from a non-relaxed atomistic geometry, as H2O with H-O distance of 1.0 Å, and H-O-H angle of 90°, as shown in Figure 2. We need to include a “new block” in our input file (input.fdf), we specifications of what algorithm we want to use (CG, Broyden or FIRE), the maximum number of steps (iterations) to minimize the forces, and the force tolerance. The atomistic geometry will be called relaxed if all force components acting in all atoms are smaller than the threshold specified in the input file (MD.MaxForceTol).

The “molecular dynamics” block to be include in the input.fdf is:

```# in file: input.fdf

# -- MOLECULAR DYNAMICS --

MD.TypeOfRun Broyden
MD.NumCGsteps 1000
MD.MaxForceTol 0.01 eV/Ang``` Figure 2. Initial and final H2O geometry in our example.

## Benchmarking algorithms

We can compare the performance of these three algorithms implemented in SIESTA. As can be seen in Figure 3, the Broyden algorithm was the most efficient to minimize the maximum components of the residual forces. Figure 3. Maximum forces evolution with number of steps for minimization.

Previous: 02 – First simulation with SIESTA

Next: 04 – Total-energy data analysis