Commit 8e08cfda authored by Matti Javanainen's avatar Matti Javanainen
Browse files

Permeation, issues with NGLview / 'lastfile' / insane remain

parent 15ba354e
Loading
Loading
Loading
Loading
+6 KiB

File added.

No diff preview for this file type.

+321 −0
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# Tutorial for alchemical free energy calculations using using AWH
## Case study: Extracting the oil-water partition coefficient for a small molecule

In this tutorial, we will use the accelerated weight histogram (AWH) method to calculate the partitioning free energy of a chosen small molecule between water and oil (cyclohexane). Partitioning is calculated as the difference in solvation free energies into the two solvents following the thermodynamic cycle. Here, an alchemical reaction coordinate is sampled using AWH, which provides an easy-to-use, computationally efficient, and fast-converging method to calculate PMFs along physical and alchemical reaction coordinates. The tutorial is performed using the coarse-grained Martini force field ([Martini web site](http://www.cgmartini.nl)) for speed, yet the same procedure is directly applicable to atomistic force fields.

## The workflow in a nutshell

* Choose a small molecule
* Insert the small molecule into water and cyclohexane
* Energy-minimize these two systems
* Run an AWH simulation where the small molecule appears and disappears many times
* Construct the PMF profile from the weight histogram and extract ΔG for both solvents
* Calculate the partititioning free energy ΔΔG based on the thermodynamic cycle

## Learning outcomes

* Set up alchemical simulations using GROMACS
* Understand the meaning of the key simulation parameters
* Understand the working principle of AWH
* Understand the outputs of AWH

## Alchemical transitions & Thermodynamic cycles

* Partitioning free energies could be calculated using equilibrium simulations in a two-phase solvent (water & oil), but sampling could be limited in case the solute is very hydrophobic/philic.
* Alternatively, we could use physical reaction coordinates to 'pull' the molecule from one phase to another and extract the PMF, but this typically leads to large hysteresis effects.
* In simulations, we can also perform alchemical transitions and exploit thermodynamic cycles.
* In an alchemical simulation, we make molecules disappear/appear, or change into other molecules.
* Here, we use the cycle below, in which the dim solutes refer to them being decoupled from the system, i.e. they do not interact with the solvent, but retain their internal interactions. Alternatively, this could be drawn as them being in vacuum.
* To estimate $ \Delta G^\mathrm{partitioning}_\mathrm{water-oil} $, we make a round around the circle, sum all the contributions. This must be equal to 0, since we returned to the starting point, and free energy is a state function (independent of the path).

$$
\Delta G_\mathrm{water}^\mathrm{solvation}+\Delta G_\mathrm{water-oil}^\mathrm{partitioning}-\Delta G_\mathrm{oil}^\mathrm{solvation}+\Delta G_\mathrm{decoupled}^\mathrm{partitioning}=0
$$

* Moreover, the term $\Delta G_\mathrm{decoupled}^\mathrm{partitioning}$, the partitioning free energy is 0.
* Thus, we can simplify to:

$$
\Delta G_\mathrm{water-oil}^\mathrm{partitioning}=\Delta G_\mathrm{oil}^\mathrm{solvation}-\Delta G_\mathrm{water}^\mathrm{solvation}.
$$

* This provides us with a simple way to calculate partitioning free energies based on the solvation energies into oil and water phases, which can be calculated using an alhemical simulations.

![title](cycle.png)

%% Cell type:markdown id: tags:

Load the required libraries (?)

%% Cell type:code id: tags:

``` python
%run init.py
```

%% Cell type:markdown id: tags:

First select a small molecule from the [Martini 3 small molecule library](https://github.com/ricalessandri/Martini3-small-molecules/). Then download the corresponding structure (.gro) file by changing the "TOLU" in the command below

%% Cell type:code id: tags:

``` python
%env MOLNAME=TOLU
!wget https://raw.githubusercontent.com/ricalessandri/Martini3-small-molecules/main/models/gros/${MOLNAME}.gro
```

%% Cell type:markdown id: tags:

## Generate two solvent boxes and insert the small molecule

* Let's first take a look at the gmx solvate tool and its options.

%% Cell type:code id: tags:

``` python
!gmx solvate -h
```

%% Cell type:markdown id: tags:

Look at the options. We can select the size of the simulation box with "-box". Alternatively, we can tell how many solvent molecules to insert with "-maxsol". Here, we will generate a cubic box with 800 solvent molecules. The radius needs to be adjusted to account for the large size of Martini water beads.

%% Cell type:code id: tags:

``` python
!gmx solvate -cp ${MOLNAME}.gro -cs W.gro -box 5 5 5 -radius 0.235 -maxsol 800 -o solvated_w.gro
```

%% Cell type:markdown id: tags:

Now we will generate the topology for the system with 1 solute molecule and 800 waters. We have a template file available, into which we will simply change the name of the chosen solute.

%% Cell type:code id: tags:

``` python
!sed "s/MOLECULE/$MOLNAME/" topol_w_template.top > topol_w.top
!cat topol_w.top
```

%% Cell type:markdown id: tags:

Let's take a look at the generated solvent system.

%% Cell type:code id: tags:

``` python
import MDAnalysis as mda # SHOULD THESE BE IN THE INIT?
import nglview as nv # SHOULD THESE BE IN THE INIT?
u = mda.Universe('solvated_w.gro')
view = nv.show_mdanalysis(u)
view.add_spacefill(lipid.value)
view.add_unitcell()
view
```

%% Cell type:markdown id: tags:

Now we'll energy-minimize and equilibrate the system so that we obtain a correct density before production (AWH) simulations.

%% Cell type:code id: tags:

``` python
!gmx grompp -f MDPs/minimization.mdp -c solvated_w.gro -p topol_w.top -o min_w.tpr
!gmx mdrun -deffnm min_w -v

!gmx grompp -f MDPs/equilibration.mdp -c min_w.gro -p topol_w.top -o eq_w.tpr
!gmx mdrun -deffnm eq_w -v
```

%% Cell type:markdown id: tags:

Let's see if the box size has converged. With isotropic pressure coupling all box vectors are equal (cubic box), so it is enough to look at the x component ("Box-X"), which can be extracted from the output files with "gmx energy". We then plot it.

%% Cell type:code id: tags:

``` python
!echo "Box-X" | gmx energy -f eq_w.edr -o box_w.xvg
import matplotlib.pyplot as plt
import numpy as np

box = np.loadtxt('box_w.xvg', comments=['#', '@'])
plt.plot(box[:, 0], box[:, 1])
plt.xlabel('Time (ps)')
plt.ylabel('Box edge (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

The system should look well converged, so we can proceed to the AWH simulation.

## AWH simulation

If we have an equilibrated system, we can proceed to the alchemical AWH run. Let's first take a look at the mdp file template. Some key parameters include:
* "free energy" turns on the alchemical calculation
* "couple-lambda0/1" set what is on/off at different lambda values
* "couple-moltype" is the name of the molecule to solvate. This will be changed in the next step
* "couple-intramol" sets whether bonded and intra-molecular non-bonded interactions are coupled.
* "init-lambda-state" sets the state where the simulation starts defined as the element of the vectors below
* "vdw/coul_lambdas" sets the scaling factors for LJ and electrostatic interactions for the different lambda states (here 30). The scaling parameters of the last one (0,0) correspond to a non-interacting molecule (couple-lambda0=none), and this is the initial state (init-lambda-state=30).
* "calc-lambda-neighbors" sets the number of neighbors for which dH will be calculated. 1 for typical BAR, -1 means all neighbors, in line with MBAR and AWH
* "sc_alpha/sigma/power" set the shape of the soft-core potential used to rid issues at lambda values close to 0 or 1
* AWH parameters here!

%% Cell type:code id: tags:

``` python
!cat MDPs/awh.mdp
```

%% Cell type:markdown id: tags:

Let's change the name of the molecule to couple in the mdp file to the chosen one.

%% Cell type:code id: tags:

``` python
!sed "s/TOCOUPLE/$MOLNAME/" MDPs/awh.mdp > awh.mdp
```

%% Cell type:markdown id: tags:

Now we'll generate the tpr and run the awh simulation.

%% Cell type:code id: tags:

``` python
!gmx grompp -f awh.mdp -c eq_w.gro -p topol_w.top -o awh_w
!gmx mdrun -deffnm awh_w -v
```

%% Cell type:markdown id: tags:

## Analyze the AWH run

Here, we run the tool "gmx awh" to extract the PMF and other profiles stored in the energy file. The tool outputs a file every "awh-nstout" steps (see mdp file), so we put them in a folder, find the last file, move it to the parent folder, and delete the others.

%% Cell type:code id: tags:

``` python
# THE LASTFILE DOES NOT WORK (GETS BRACKETS AROUND FILENAME)
#!mkdir OUTPUT
#!gmx awh -f awh_w.edr -s awh_w.tpr -more -o OUTPUT/awh
#lastfile = !ls -tr OUTPUT/*xvg | tail -n 1
#!echo {lastfile}
#!mv {lastfile} pmf_w.xvg
#!rm -rf OUTPUT
!cat pmf_w.xvg | awk '/ 0.000/ {print "The solvation free energy for water is " $2} /30.000/ {print "- " $2 " kJ/mol"}'
```

%% Cell type:markdown id: tags:

Now let's visualize the result. Let's first plot the resulting PMF and the coordinate bias. If the PMF does not have higher curvatures than what can be described by the bias generated using Gaussians, these should look identical (the bias is needed to obtain the target distribution = flat). Here, for uncharged molecules, the first 10 lambdas are equal, as they correspond to turning of the charges. The 20 consecutive ones correspond to turning of the Lennard-Jones interactions (van der Waals forces + steric repulsion). The solvation free energy is obtained as PMF($\lambda=1$) - PMF($\lambda=0$).

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 2])
plt.plot(pmf[:, 0], pmf[:, 1])
plt.legend(['Coordinate bias','PMF'])
plt.xlabel('$\lambda$')
plt.ylabel('PMF (kJ/mol)')
plt.show()
```

%% Cell type:markdown id: tags:

Let's now look at some other AWH outputs. The target distribution was set to be uniform, and depending on the simulation length, we will have better or worse convergence to it. The error in the target distribution can be checked from the output file.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 4],'b')
plt.plot(pmf[:, 0], pmf[:, 5],'r')
plt.legend(['Reference value distribution','Target ref value distribution'])
plt.xlabel('$\lambda$')
plt.ylabel('PMF (kJ/mol)')
plt.show()
```

%% Cell type:code id: tags:

``` python
!grep error pmf_w.xvg
```

%% Cell type:markdown id: tags:

Finally, we'll check the friction metric, which tells how easy it is to diffuse in the alchemical space along $\lambda$.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 6],'b')
plt.ylabel('Friction metric')
plt.xlabel('$\lambda$')
plt.show()
```

%% Cell type:markdown id: tags:

Now, we will finally extract the free energy of solvating the chosen solute in water.

%% Cell type:code id: tags:

``` python
!grep " 0.000" pmf_w.xvg | awk '{print "Solvation free energy in water is " $2 " kJ/mol"}'
```

%% Cell type:markdown id: tags:

Now we will repeat the calculation for hexadecane in one block.

%% Cell type:code id: tags:

``` python
!gmx solvate -cp ${MOLNAME}.gro -cs CHEX.gro -box 5 5 5 -maxsol 600 -o solvated_chex.gro
!sed "s/MOLECULE/$MOLNAME/" topol_chex_template.top > topol_chex.top
!gmx grompp -f MDPs/minimization.mdp -c solvated_chex.gro -p topol_chex.top -o min_chex.tpr
!gmx mdrun -deffnm min_chex -v
!gmx grompp -f MDPs/equilibration.mdp -c min_chex.gro -p topol_chex.top -o eq_chex.tpr
!gmx mdrun -deffnm eq_chex -v
!gmx grompp -f awh.mdp -c eq_chex.gro -p topol_chex.top -o awh_chex.tpr
!gmx mdrun -deffnm awh_chex -v
```

%% Cell type:code id: tags:

``` python
# THE LASTFILE DOES NOT WORK (GETS BRACKETS AROUND FILENAME)
#!mkdir OUTPUT
#!gmx awh -f awh_w.edr -s awh_w.tpr -more -o OUTPUT/awh
#lastfile = !ls -tr OUTPUT/*xvg | tail -n 1
#!echo {lastfile}
#!mv {lastfile} pmf_w.xvg
#!rm -rf OUTPUT
!cat pmf_chex.xvg | awk '/ 0.000/ {print "The solvation free energy for hexadecane is " $2} /30.000/ {print "- " $2 " kJ/mol"}'
```

%% Cell type:markdown id: tags:

# Final questions and Conclusions

Now go back to the free energy cycle in the beginning and calculate the partitioning free energy based on the solvaiton free energy values.

* Is the molecule hydrophobic or hydrophilic? Does the result make sense?
* Can you find an experimental reference value online? Does it agree with the computed one?
+141 −0
Original line number Diff line number Diff line
; STANDARD MD INPUT OPTIONS FOR MARTINI 3.x
; Updated 30 Jan 2017 by PCTS
;
; for use with GROMACS 5

; TIMESTEP IN MARTINI 
; Default timestep of 20 fs. 

integrator               = md
dt                       = 0.02
nsteps                   = 50000000 ; 1 microseconds
nstcomm                  = 100
comm-grps		 = 

nstxout                  = 0
nstvout                  = 0
nstfout                  = 0
nstlog                   = 50000
nstenergy                = 50000
nstxout-compressed       = 5000
compressed-x-precision   = 100
compressed-x-grps        = 

; NEIGHBOURLIST and MARTINI 
; To achieve faster simulations in combination with the Verlet-neighborlist
; scheme, Martini can be simulated with a straight cutoff. In order to 
; do so, the cutoff distance is reduced 1.1 nm. 
; Neighborlist length should be optimized depending on your hardware setup:
; updating ever 20 steps should be fine for classic systems, while updating
; every 30-40 steps might be better for GPU based systems.
; The Verlet neighborlist scheme will automatically choose a proper neighborlist
; length, based on a energy drift tolerance.
;
; Coulomb interactions can alternatively be treated using a reaction-field,
; giving slightly better properties.
; Please realize that electrostVatic interactions in the Martini model are 
; not considered to be very accurate to begin with, especially as the 
; screening in the system is set to be uniform across the system with 
; a screening constant of 15. When using PME, please make sure your 
; system properties are still reasonable.
;
; With the polarizable water model, the relative electrostatic screening 
; (epsilon_r) should have a value of 2.5, representative of a low-dielectric
; apolar solvent. The polarizable water itself will perform the explicit screening
; in aqueous environment. In this case, the use of PME is more realistic.


cutoff-scheme            = Verlet
nstlist                  = 20
ns_type                  = grid
pbc                      = xyz
verlet-buffer-tolerance  = 0.005

coulombtype              = reaction-field 
rcoulomb                 = 1.1
epsilon_r                = 15	; 2.5 (with polarizable water)
epsilon_rf               = 0
vdw_type                 = cutoff  
vdw-modifier             = Potential-shift-verlet
rvdw                     = 1.1

; MARTINI and TEMPERATURE/PRESSURE
; normal temperature and pressure coupling schemes can be used. 
; It is recommended to couple individual groups in your system separately.
; Good temperature control can be achieved with the velocity rescale (V-rescale)
; thermostat using a coupling constant of the order of 1 ps. Even better 
; temperature control can be achieved by reducing the temperature coupling 
; constant to 0.1 ps, although with such tight coupling (approaching 
; the time step) one can no longer speak of a weak-coupling scheme.
; We therefore recommend a coupling time constant of at least 0.5 ps.
; The Berendsen thermostat is less suited since it does not give
; a well described thermodynamic ensemble.
; 
; Pressure can be controlled with the Parrinello-Rahman barostat, 
; with a coupling constant in the range 4-8 ps and typical compressibility 
; in the order of 10e-4 - 10e-5 bar-1. Note that, for equilibration purposes, 
; the Berendsen barostat probably gives better results, as the Parrinello-
; Rahman is prone to oscillating behaviour. For bilayer systems the pressure 
; coupling should be done semiisotropic.

tcoupl                   = v-rescale 
tc-grps                  = LIPIDS SOLVENT
tau_t                    = 1.0   1.0
ref_t                    = 320   320
Pcoupl                   = Berendsen
Pcoupltype               = semiisotropic
tau_p                    = 4.0   ;parrinello-rahman is more stable with larger tau-p, DdJ, 20130422
compressibility          = 3e-4  0
ref_p                    = 1.0   1.0

gen_vel                  = yes
gen_temp                 = 330
gen_seed                 = 473529

; MARTINI and CONSTRAINTS 
; for ring systems and stiff bonds constraints are defined
; which are best handled using Lincs. 

constraints              = none 
constraint_algorithm     = Lincs

pull                     = yes
pull_ngroups             = 2
pull_ncoords             = 1

pull-print-ref-value     = yes
pull-nstxout             = 5000
pull-nstfout             = 0

pull_group1_name	 = LIPIDS
pull_group2_name         = PERMEANT

pull-group1-pbcatom	 = 1

pull_coord1_type		= external-potential
pull_coord1_potential_provider 	= AWH
pull_coord1_geometry           	= direction
pull_coord1_groups         	= 1 2
pull_coord1_dim            	= N N Y
pull_coord1_vec            	= 0 0 1
pull-coord1-init            	= 4

awh			   	= yes
awh-potential			= convolved
awh-nstout		   	= 50000
awh-nbias		   	= 1

awh1-ndim		   	= 1
awh1-target			= cutoff
awh1-target-cutoff		= 40
awh1-error-init			= 10
awh1-growth			= exp-linear

awh1-dim1-coord-provider	= pull
awh1-dim1-coord-index	   	= 1
awh1-dim1-start		   	= -4.5
awh1-dim1-end		   	= 4.5
awh1-dim1-force-constant   	= 1e8
awh1-dim1-diffusion	   	= 1e-5

pull-pbc-ref-prev-step-com 	= yes
+141 −0

File added.

Preview size limit exceeded, changes collapsed.

+101 −0

File added.

Preview size limit exceeded, changes collapsed.

Loading