Commit 63b28f1d authored by Matti Javanainen's avatar Matti Javanainen
Browse files

Unified the handling of the MOLNAME variable into Python variable

parent 9ef682cd
Loading
Loading
Loading
Loading
+164 −9
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# Tutorial for alchemical free energy calculations using using AWH
## Case study: Extracting the oil-water partition coefficient for a small molecule

In this tutorial, we will use the accelerated weight histogram (AWH) method to calculate the partitioning free energy of a chosen small molecule between water and oil (cyclohexane). Partitioning is calculated as the difference in solvation free energies into the two solvents following the thermodynamic cycle. Here, an alchemical reaction coordinate is sampled using AWH, which provides an easy-to-use, computationally efficient, and fast-converging method to calculate PMFs along physical and alchemical reaction coordinates. The tutorial is performed using the coarse-grained Martini force field ([Martini web site](http://www.cgmartini.nl)) for speed, yet the same procedure is directly applicable to atomistic force fields.

## The workflow in a nutshell

* Choose a small molecule
* Insert the small molecule into water and cyclohexane
* Energy-minimize these two systems
* Run an AWH simulation where the small molecule appears and disappears many times
* Construct the PMF profile from the weight histogram and extract ΔG for both solvents
* Calculate the partititioning free energy ΔΔG based on the thermodynamic cycle

## Learning outcomes

* Set up alchemical simulations using GROMACS
* Understand the meaning of the key simulation parameters
* Understand the working principle of AWH
* Understand the outputs of AWH

## Alchemical transitions & Thermodynamic cycles

* Partitioning free energies could be calculated using equilibrium simulations in a two-phase solvent (water & oil), but sampling could be limited in case the solute is very hydrophobic/philic.
* Alternatively, we could use physical reaction coordinates to 'pull' the molecule from one phase to another and extract the PMF, but this typically leads to large hysteresis effects.
* In simulations, we can also perform alchemical transitions and exploit thermodynamic cycles.
* In an alchemical simulation, we make molecules disappear/appear, or change into other molecules.
* Here, we use the cycle below, in which the dim solutes refer to them being decoupled from the system, i.e. they do not interact with the solvent, but retain their internal interactions. Alternatively, this could be drawn as them being in vacuum.
* To estimate $ \Delta G^\mathrm{partitioning}_\mathrm{water-oil} $, we make a round around the circle, sum all the contributions. This must be equal to 0, since we returned to the starting point, and free energy is a state function (independent of the path).

$$
\Delta G_\mathrm{water}^\mathrm{solvation}+\Delta G_\mathrm{water-oil}^\mathrm{partitioning}-\Delta G_\mathrm{oil}^\mathrm{solvation}+\Delta G_\mathrm{decoupled}^\mathrm{partitioning}=0
$$

* Moreover, the term $\Delta G_\mathrm{decoupled}^\mathrm{partitioning}$, the partitioning free energy is 0.
* Thus, we can simplify to:

$$
\Delta G_\mathrm{water-oil}^\mathrm{partitioning}=\Delta G_\mathrm{oil}^\mathrm{solvation}-\Delta G_\mathrm{water}^\mathrm{solvation}.
$$

* This provides us with a simple way to calculate partitioning free energies based on the solvation energies into oil and water phases, which can be calculated using an alhemical simulations.

![title](cycle.png)

%% Cell type:markdown id: tags:

Load the required libraries (?)

%% Cell type:code id: tags:

``` python
%run init.py
from ipywidgets import widgets
from IPython.display import display
import warnings
warnings.filterwarnings("ignore")

import MDAnalysis as mda
import nglview as nv
import matplotlib.pyplot as plt
import numpy as np
import wget
```

%% Output

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
Cell     In[7], line 7
          4 warnings.filterwarnings("ignore")
          6 import MDAnalysis as mda
    ----> 7 import nglview as nv
          8 import matplotlib.pyplot as plt
          9 import numpy as np
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/__init__.py:4
          1 import warnings
          3 # for doc
    ----> 4 from . import adaptor, datafiles, show, widget
          5 from ._version import get_versions
          6 from .adaptor import *
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/show.py:13
          3 from . import datafiles
          4 from .adaptor import (ASEStructure, ASETrajectory, BiopythonStructure,
          5                       FileStructure, HTMDTrajectory, IODataStructure,
          6                       IOTBXStructure, MDAnalysisTrajectory, MDTrajTrajectory,
       (...)
         11                       RdkitStructure,
         12                       TextStructure)
    ---> 13 from .widget import NGLWidget
         15 __all__ = [
         16     'demo',
         17     'show_pdbid',
       (...)
         40     'show_biopython',
         41 ]
         44 def show_pdbid(pdbid, **kwargs):
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/widget.py:19
         15 from traitlets import (Bool, CaselessStrEnum, Dict, Instance, Int, Integer,
         16                        List, Unicode, observe, validate)
         17 import traitlets
    ---> 19 from . import color, interpolate
         20 from .adaptor import Structure, Trajectory
         21 from .component import ComponentViewer
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/color.py:114
        110         else:
        111             raise ValueError(f"{obj} must be either list of list or string")
    --> 114 ColormakerRegistry = _ColormakerRegistry()
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/base.py:10, in _singleton.<locals>.getinstance()
          8 def getinstance():
          9     if cls not in instances:
    ---> 10         instances[cls] = cls()
         11     return instances[cls]
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/color.py:47, in _ColormakerRegistry.__init__(self, *args, **kwargs)
         45 try:
         46     get_ipython() # only display in notebook
    ---> 47     self._ipython_display_()
         48 except NameError:
         49     pass
File     ~/Library/Python/3.9/lib/python/site-packages/nglview/color.py:54, in _ColormakerRegistry._ipython_display_(self, **kwargs)
         52 if self._ready:
         53     return
    ---> 54 super()._ipython_display_(**kwargs)
    AttributeError: 'super' object has no attribute '_ipython_display_'

%% Cell type:markdown id: tags:

First select a small molecule from the [Martini 3 small molecule library](https://github.com/ricalessandri/Martini3-small-molecules/). Then download the corresponding structure (.gro) file by changing the "TOLU" in the command below

%% Cell type:code id: tags:

``` python
%env MOLNAME=TOLU
!wget https://raw.githubusercontent.com/ricalessandri/Martini3-small-molecules/main/models/gros/${MOLNAME}.gro
MOLNAME="TOLU"
!echo {MOLNAME}
```

%% Output

    TOLU

%% Cell type:code id: tags:

``` python
!wget https://raw.githubusercontent.com/ricalessandri/Martini3-small-molecules/main/models/gros/{MOLNAME}.gro
```

%% Cell type:markdown id: tags:

## Generate two solvent boxes and insert the small molecule

* Let's first take a look at the gmx solvate tool and its options.

%% Cell type:code id: tags:

``` python
!gmx solvate -h
```

%% Cell type:markdown id: tags:

Look at the options. We can select the size of the simulation box with "-box". Alternatively, we can tell how many solvent molecules to insert with "-maxsol". Here, we will generate a cubic box with 800 solvent molecules. The radius needs to be adjusted to account for the large size of Martini water beads.

%% Cell type:code id: tags:

``` python
!gmx solvate -cp ${MOLNAME}.gro -cs W.gro -box 5 5 5 -radius 0.235 -maxsol 800 -o solvated_w.gro
!gmx solvate -cp {MOLNAME}.gro -cs W.gro -box 5 5 5 -radius 0.235 -maxsol 800 -o solvated_w.gro
```

%% Output

                         :-) GROMACS - gmx solvate, 2021.3 (-:
    
                                GROMACS is written by:
         Andrey Alekseenko              Emile Apol              Rossen Apostolov
             Paul Bauer           Herman J.C. Berendsen           Par Bjelkmar
           Christian Blau           Viacheslav Bolnykh             Kevin Boyd
         Aldert van Buuren           Rudi van Drunen             Anton Feenstra
        Gilles Gouaillardet             Alan Gray               Gerrit Groenhof
           Anca Hamuraru            Vincent Hindriksen          M. Eric Irrgang
          Aleksei Iupinov           Christoph Junghans             Joe Jordan
        Dimitrios Karkoulis            Peter Kasson                Jiri Kraus
          Carsten Kutzner              Per Larsson              Justin A. Lemkul
           Viveca Lindahl            Magnus Lundborg             Erik Marklund
            Pascal Merz             Pieter Meulenhoff            Teemu Murtola
            Szilard Pall               Sander Pronk              Roland Schulz
           Michael Shirts            Alexey Shvetsov             Alfons Sijbers
           Peter Tieleman              Jon Vincent              Teemu Virolainen
         Christian Wennberg            Maarten Wolf              Artem Zhmurov
                               and the project leaders:
            Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
    
    Copyright (c) 1991-2000, University of Groningen, The Netherlands.
    Copyright (c) 2001-2019, The GROMACS development team at
    Uppsala University, Stockholm University and
    the Royal Institute of Technology, Sweden.
    check out http://www.gromacs.org for more information.
    
    GROMACS is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License
    as published by the Free Software Foundation; either version 2.1
    of the License, or (at your option) any later version.
    
    GROMACS:      gmx solvate, version 2021.3
    Executable:   /Users/javamatt/Software/gromacs21/bin/gmx
    Data prefix:  /Users/javamatt/Software/gromacs21
    Working dir:  /Users/javamatt/Library/CloudStorage/Dropbox/Springschool/spring_school2023_notebooks/AWH_partitioning
    Command line:
      gmx solvate -cp TOLU.gro -cs W.gro -box 5 5 5 -radius 0.235 -maxsol 800 -o solvated_w.gro
    
    
    -------------------------------------------------------
    Program:     gmx solvate, version 2021.3
    Source file: src/gromacs/commandline/cmdlineparser.cpp (line 275)
    Function:    void gmx::CommandLineParser::parse(int *, char **)
    
    Error in user input:
    Invalid command-line options
      In command-line option -cp
        File 'TOLU.gro' does not exist or is not accessible.
        The file could not be opened.
          Reason: No such file or directory
          (call to fopen() returned error code 2)
    
    For more information and tips for troubleshooting, please check the GROMACS
    website at http://www.gromacs.org/Documentation/Errors
    -------------------------------------------------------

%% Cell type:markdown id: tags:

Now we will generate the topology for the system with 1 solute molecule and 800 waters. We have a template file available, into which we will simply change the name of the chosen solute.

%% Cell type:code id: tags:

``` python
!sed "s/MOLECULE/$MOLNAME/" topol_w_template.top > topol_w.top
!sed "s/MOLECULE/{MOLNAME}/" topol_w_template.top > topol_w.top
!cat topol_w.top
```

%% Output

    #include "../TOP/martini_v3.0.0.itp
    #include "../TOP/martini_v3.0.0_solvents_v1.itp
    #include "../TOP/martini_v3.0.0_small_molecules_v1.itp
    
    [ system ]
    Chosen molecule in water
    
    [ molecules ]
    TOLU     1
    W	    800

%% Cell type:markdown id: tags:

Let's take a look at the generated solvent system.

%% Cell type:code id: tags:

``` python
import MDAnalysis as mda # SHOULD THESE BE IN THE INIT?
import nglview as nv # SHOULD THESE BE IN THE INIT?
u = mda.Universe('solvated_w.gro')
view = nv.show_mdanalysis(u)
view.add_spacefill(lipid.value)
view.add_unitcell()
view
```

%% Cell type:markdown id: tags:

Now we'll energy-minimize and equilibrate the system so that we obtain a correct density before production (AWH) simulations.

%% Cell type:code id: tags:

``` python
!gmx grompp -f MDPs/minimization.mdp -c solvated_w.gro -p topol_w.top -o min_w.tpr
!gmx mdrun -deffnm min_w -v

!gmx grompp -f MDPs/equilibration.mdp -c min_w.gro -p topol_w.top -o eq_w.tpr
!gmx mdrun -deffnm eq_w -v
```

%% Cell type:markdown id: tags:

Let's see if the box size has converged. With isotropic pressure coupling all box vectors are equal (cubic box), so it is enough to look at the x component ("Box-X"), which can be extracted from the output files with "gmx energy". We then plot it.

%% Cell type:code id: tags:

``` python
!echo "Box-X" | gmx energy -f eq_w.edr -o box_w.xvg
import matplotlib.pyplot as plt
import numpy as np

box = np.loadtxt('box_w.xvg', comments=['#', '@'])
plt.plot(box[:, 0], box[:, 1])
plt.xlabel('Time (ps)')
plt.ylabel('Box edge (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

The system should look well converged, so we can proceed to the AWH simulation.

## AWH simulation

If we have an equilibrated system, we can proceed to the alchemical AWH run. Let's first take a look at the mdp file template. Some key parameters include:
* "free energy" turns on the alchemical calculation
* "couple-lambda0/1" set what is on/off at different lambda values
* "couple-moltype" is the name of the molecule to solvate. This will be changed in the next step
* "couple-intramol" sets whether bonded and intra-molecular non-bonded interactions are coupled.
* "init-lambda-state" sets the state where the simulation starts defined as the element of the vectors below
* "vdw/coul_lambdas" sets the scaling factors for LJ and electrostatic interactions for the different lambda states (here 30). The scaling parameters of the last one (0,0) correspond to a non-interacting molecule (couple-lambda0=none), and this is the initial state (init-lambda-state=30).
* "calc-lambda-neighbors" sets the number of neighbors for which dH will be calculated. 1 for typical BAR, -1 means all neighbors, in line with MBAR and AWH
* "sc_alpha/sigma/power" set the shape of the soft-core potential used to rid issues at lambda values close to 0 or 1
* AWH parameters here!

%% Cell type:code id: tags:

``` python
!cat MDPs/awh.mdp
```

%% Cell type:markdown id: tags:

Let's change the name of the molecule to couple in the mdp file to the chosen one.

%% Cell type:code id: tags:

``` python
!sed "s/TOCOUPLE/$MOLNAME/" MDPs/awh.mdp > awh.mdp
!sed "s/TOCOUPLE/{MOLNAME}/" MDPs/awh.mdp > awh.mdp
```

%% Cell type:markdown id: tags:

Now we'll generate the tpr and run the awh simulation.

%% Cell type:code id: tags:

``` python
!gmx grompp -f awh.mdp -c eq_w.gro -p topol_w.top -o awh_w
!gmx mdrun -deffnm awh_w -v
```

%% Cell type:markdown id: tags:

## Analyze the AWH run

Here, we run the tool "gmx awh" to extract the PMF and other profiles stored in the energy file. The tool outputs a file every "awh-nstout" steps (see mdp file), so we put them in a folder, find the last file, move it to the parent folder, and delete the others.

%% Cell type:code id: tags:

``` python
# THE LASTFILE DOES NOT WORK (GETS BRACKETS AROUND FILENAME)
#!mkdir OUTPUT
#!gmx awh -f awh_w.edr -s awh_w.tpr -more -o OUTPUT/awh
#lastfile = !ls -tr OUTPUT/*xvg | tail -n 1
#!echo {lastfile}
#!mv {lastfile} pmf_w.xvg
#!rm -rf OUTPUT
!cat pmf_w.xvg | awk '/ 0.000/ {print "The solvation free energy for water is " $2} /30.000/ {print "- " $2 " kJ/mol"}'
```

%% Cell type:markdown id: tags:

Now let's visualize the result. Let's first plot the resulting PMF and the coordinate bias. If the PMF does not have higher curvatures than what can be described by the bias generated using Gaussians, these should look identical (the bias is needed to obtain the target distribution = flat). Here, for uncharged molecules, the first 10 lambdas are equal, as they correspond to turning of the charges. The 20 consecutive ones correspond to turning of the Lennard-Jones interactions (van der Waals forces + steric repulsion). The solvation free energy is obtained as PMF($\lambda=1$) - PMF($\lambda=0$).

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 2])
plt.plot(pmf[:, 0], pmf[:, 1])
plt.legend(['Coordinate bias','PMF'])
plt.xlabel('$\lambda$')
plt.ylabel('PMF (kJ/mol)')
plt.show()
```

%% Cell type:markdown id: tags:

Let's now look at some other AWH outputs. The target distribution was set to be uniform, and depending on the simulation length, we will have better or worse convergence to it. The error in the target distribution can be checked from the output file.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 4],'b')
plt.plot(pmf[:, 0], pmf[:, 5],'r')
plt.legend(['Reference value distribution','Target ref value distribution'])
plt.xlabel('$\lambda$')
plt.ylabel('PMF (kJ/mol)')
plt.show()
```

%% Cell type:code id: tags:

``` python
!grep error pmf_w.xvg
```

%% Cell type:markdown id: tags:

Finally, we'll check the friction metric, which tells how easy it is to diffuse in the alchemical space along $\lambda$.

%% Cell type:code id: tags:

``` python
import matplotlib.pyplot as plt
import numpy as np

pmf = np.loadtxt('pmf_w.xvg', comments=['#', '@'])
plt.plot(pmf[:, 0], pmf[:, 6],'b')
plt.ylabel('Friction metric')
plt.xlabel('$\lambda$')
plt.show()
```

%% Cell type:markdown id: tags:

Now, we will finally extract the free energy of solvating the chosen solute in water.

%% Cell type:code id: tags:

``` python
!grep " 0.000" pmf_w.xvg | awk '{print "Solvation free energy in water is " $2 " kJ/mol"}'
!cat pmf_w.xvg | awk '/ 0.000/ {print "The solvation free energy for water is " $2} /30.000/ {print "- " $2 " kJ/mol"}'
```

%% Output

    cat: pmf_chex.xvg: No such file or directory

%% Cell type:markdown id: tags:

Now we will repeat the calculation for hexadecane in one block.

%% Cell type:code id: tags:

``` python
!gmx solvate -cp ${MOLNAME}.gro -cs CHEX.gro -box 5 5 5 -maxsol 600 -o solvated_chex.gro
!sed "s/MOLECULE/$MOLNAME/" topol_chex_template.top > topol_chex.top
!gmx solvate -cp {MOLNAME}.gro -cs CHEX.gro -box 5 5 5 -maxsol 600 -o solvated_chex.gro
!sed "s/MOLECULE/{MOLNAME}/" topol_chex_template.top > topol_chex.top
!gmx grompp -f MDPs/minimization.mdp -c solvated_chex.gro -p topol_chex.top -o min_chex.tpr
!gmx mdrun -deffnm min_chex -v
!gmx grompp -f MDPs/equilibration.mdp -c min_chex.gro -p topol_chex.top -o eq_chex.tpr
!gmx mdrun -deffnm eq_chex -v
!gmx grompp -f awh.mdp -c eq_chex.gro -p topol_chex.top -o awh_chex.tpr
!gmx mdrun -deffnm awh_chex -v
```

%% Cell type:code id: tags:

``` python
# THE LASTFILE DOES NOT WORK (GETS BRACKETS AROUND FILENAME)
#!mkdir OUTPUT
#!gmx awh -f awh_w.edr -s awh_w.tpr -more -o OUTPUT/awh
#lastfile = !ls -tr OUTPUT/*xvg | tail -n 1
#!echo {lastfile}
#!mv {lastfile} pmf_w.xvg
#!rm -rf OUTPUT
!cat pmf_chex.xvg | awk '/ 0.000/ {print "The solvation free energy for hexadecane is " $2} /30.000/ {print "- " $2 " kJ/mol"}'
```

%% Cell type:markdown id: tags:

# Final questions and Conclusions

Now go back to the free energy cycle in the beginning and calculate the partitioning free energy based on the solvaiton free energy values.

* Is the molecule hydrophobic or hydrophilic? Does the result make sense?
* Can you find an experimental reference value online? Does it agree with the computed one?
+1 −1
Original line number Diff line number Diff line
%% Cell type:markdown id: tags:

# Tutorial for extracting the PMF of a lipid bilayer permeation using AWH

%% Cell type:markdown id: tags:

In this tutorial, we will use the accelerated weight histogram (AWH) method to calculate the potential of mean force of a chosen small molecule along the direction normal to the plane of the membrane, which estimates the free energy profile for bilayer permeation. The AWH method provides an easy-to-use, computationally efficient, and fast-converging method to calculate PMFs along various reaction coordinates. The tutorial is performed using the coarse-grained Martini force field ([Martini web site](http://www.cgmartini.nl)) for speed, yet the same procedure is directly applicable to atomistic force fields.

## The workflow in a nutshell

* Choose a small molecule
* Set up a coarse-grained lipid bilayer
* Insert the small molecule into the system
* Minimize & equilibrate
* Run an AWH simulation where the small molecule permeates the bilayer many times
* Construct the PMF profile from the weight histogram

## Learning outcomes

* Set up biased simulations using the pull code in GROMACS
* Understand the meaning of the key simulation parameters
* Understand the working principle of AWH
* Understand the outputs of AWH

%% Cell type:code id: tags:

``` python
import MDAnalysis as mda
import nglview as nv
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import widgets
from IPython.display import display
import warnings
warnings.filterwarnings("ignore")
import wget
```

%% Cell type:markdown id: tags:

First select a small molecule from the [Martini 3 small molecule library](https://github.com/ricalessandri/Martini3-small-molecules/). Then download the corresponding structure (.gro) file.

%% Cell type:code id: tags:

``` python
MOLNAME="TOLU"
!echo {MOLNAME}
```

%% Cell type:code id: tags:

``` python
!wget https://raw.githubusercontent.com/ricalessandri/Martini3-small-molecules/main/models/gros/{MOLNAME}.gro
```

%% Cell type:markdown id: tags:

## Generate the lipid bilayer

This is done using the [insane.py](https://doi.org/10.1021/acs.jctc.5b00209) tool. Let's see the options and select suitable ones.

%% Cell type:code id: tags:

``` python
!wget 'https://github.com/Tsjerk/Insane/releases/download/v1.0rc1/insane'
!chmod u+x insane
!python3 insane -h
```

%% Cell type:markdown id: tags:

Now, think what options to use to construct a lipid bilayer (bilayer.gro) and the corresponding draft for the topology file (topol.top) that contains a chosen composition (avoid charged lipids) in both leaflets. A suitable size is 8 x 8 x 12 nm^3. Use water as a solvent. A model is provided below, but feel free to change the composition.

%% Cell type:code id: tags:

``` python
!python3 insane -o bilayer.gro -p topol.top -box 8,8,12 -sol W:1 -l DOPC:0.5 -l DOPE:0.5
```

%% Cell type:markdown id: tags:

Let's take a look at the generated bilayer. It is extremely ordered, but will be relaxed quickly.

%% Cell type:code id: tags:

``` python
# TEST THIS ON PUHTI
u = mda.Universe('bilayer.gro')
view = nv.show_mdanalysis(u)
view.add_spacefill(lipid.value)
view.add_unitcell()
view
```

%% Cell type:markdown id: tags:

Let's take a look at the preliminary topology file to see what our gro file contains.

%% Cell type:code id: tags:

``` python
!cat topol.top
```

%% Cell type:markdown id: tags:

The topology refers to different moleculetypes (lipids + water), but the itp files are missing. We will include topologies for:
* General particle types (martini_v3.0.0.itp)
* Solvents (martini_v3.0.0_solvents_v1.itp)
* Small molecules (martini_v3.0.0_small_molecules_v1.itp)
* Phospholipids (martini_v3.0.0_phospholipids_v1.itp)

Either include these by hand into the beginning of topol.top, or run the dirty command below:

%% Cell type:code id: tags:

``` python
!printf '%s\n%s\n%s\n%s\n' "#include \"../TOP/martini_v3.0.0.itp\"" "#include \"../TOP/martini_v3.0.0_solvents_v1.itp\"" "#include \"../TOP/martini_v3.0.0_small_molecules_v1.itp\"" "#include \"../TOP/martini_v3.0.0_phospholipids_v1.itp\""  | cat - topol.top | grep -v "martini.itp" > temp
!mv temp topol.top
```

%% Cell type:markdown id: tags:

Check the updated topol.top file

%% Cell type:code id: tags:

``` python
!cat topol.top
```

%% Cell type:markdown id: tags:

## Add the permeant to the system (structure + topology)

First, we will include one molecule of the chosen type with a GROMACS command. You can take a look at the command options with the handle -h. Here, we add one molecule (-nmol 1) of the downloaded molecule (-ci). We try this so many times that the molecule certainly gets included (-try 100). The output file is called bilayer_permeant.gro (-o)

%% Cell type:code id: tags:

``` python
!gmx insert-molecules -f bilayer.gro -ci {MOLNAME}.gro -nmol 1 -o bilayer_permeant.gro -try 100 -ip positions.dat -dr 3 3 3
```

%% Cell type:markdown id: tags:

Now we will also include the permeant molecule in the topology file (topol.top), and also print the output to see the change.

%% Cell type:code id: tags:

``` python
!printf '%s\n' "$MOLNAME 1" >> topol.top
!cat topol.top
```

%% Cell type:markdown id: tags:

## Energy minimization and equilibration

Now we will run simple energy minimization followed by equilibration with a smaller time step and efficient pressure coupling to relax the bilayer.

%% Cell type:code id: tags:

``` python
!gmx grompp -f MDPs/minimization.mdp -c bilayer_permeant.gro -p topol.top -o em.tpr
```

%% Cell type:code id: tags:

``` python
!gmx mdrun -deffnm em -v -nt 15
```

%% Cell type:code id: tags:

``` python
!gmx grompp -f MDPs/equilibration.mdp -c em.gro -p topol.top -o eq.tpr
```

%% Cell type:code id: tags:

``` python
!gmx mdrun -deffnm eq -v -nt 15
```

%% Cell type:markdown id: tags:

Let's visualize the convergence of the bilayer by looking at the time evolution of the bilayer edge length. Due to semi-isotropic pressure coupling, the membrane remains square, so area per lipid can be obtained by diviving the square of this length by the number of lipids per leaflet.

%% Cell type:code id: tags:

``` python
!echo "Box-X" | gmx energy -f eq.edr -o box.xvg
```

%% Cell type:code id: tags:

``` python
energy = np.loadtxt('box.xvg', comments=['#', '@'])
plt.plot(energy[:, 0], energy[:, 1])
plt.xlabel('Time (ps)')
plt.ylabel('Box edge (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

Area per lipid values for phospholipids are typically 0.50-0.70 nm^2. Is your result in the right ballpark?

%% Cell type:markdown id: tags:

## AWH simulation

Next up, we will take a look at the pull and AWH options at the end of the mdp file, starting from "pull = yes".
We have 2 pull groups; the reference group of "LIPIDS" and the one for the permeant "PERMEANT", which will be changed to the correct molecule name in the next step. We then define one pull coordinate based on the distance between the centres of mass of these two groups. It acts along the direction of the z axis, and uses the external AWH bias.

We use a convolved AWH potential, which contains ... The target distribution of our AWH bias is 1-dimensional and cut off at 40 kJ/mol to avoid the sampmling of unlikely conformations. GROWTH?

The coordinate will be sampled between z values of -4.5 and 4.5, which spans the bilayer and reaches some 2 nanometers to the aqueous phase on both sides of the bilayer. As our simulation system was set to be 12 nm tall, periodicity is not an issue. The initial large force constant ensures rapid convergence of the initial XX state.

%% Cell type:code id: tags:

``` python
!cat MDPs/awh_template.mdp
```

%% Cell type:markdown id: tags:

Let's change the name of the permant in the mdp file.

%% Cell type:code id: tags:

``` python
!sed "s/PERMEANT/${MOLNAME}/g" MDPs/awh_template.mdp > awh.mdp
!sed "s/PERMEANT/{MOLNAME}/g" MDPs/awh_template.mdp > awh.mdp
```

%% Cell type:markdown id: tags:

Let's generate an index file that contains groups for lipids and solvent. This is needed for both the definition of the pull coordinate as well as for temperature coupling groups.

%% Cell type:code id: tags:

``` python
!gmx select -f eq.gro -on index.ndx -s eq.tpr -select "LIPIDS = not resname W and not resname $MOLNAME; SOLVENT = not LIPIDS; SOLVENT; LIPIDS; $MOLNAME;"
```

%% Cell type:markdown id: tags:

Now we will generate the tpr file and run the AWH simulation.

%% Cell type:code id: tags:

``` python
!gmx grompp -f awh.mdp -c eq.gro -p topol.top -n index.ndx -o awh
```

%% Cell type:code id: tags:

``` python
!gmx mdrun -deffnm awh -v -nt 15 -stepout 10000
```

%% Cell type:markdown id: tags:

## Analyze the AWH run

We will first check how the simulation has sampled different $\lambda$ values as a function of time. It should cover them all multiple times for convergence. Here, the simulation was very short for demonstration purposes, so this might not be the case for each chosen solute.

%% Cell type:code id: tags:

``` python
awh = np.loadtxt('awh.xvg', comments=['#', '@'])
plt.plot(awh[:, 0]/1000, awh[:, 1])
plt.ylabel('$z$ coordinate')
plt.xlabel('Time (nanoseconds)')
plt.show()
```

%% Cell type:markdown id: tags:

Let's also take a look at the histogram of the $z$ coordinate, which better visualizes the sampling.

%% Cell type:code id: tags:

``` python
plt.hist(awh[:,1], 50,
         histtype='bar',
         facecolor='b')
plt.xlabel('$z$ coordinate')
plt.ylabel('Occurrence')

plt.show()
```

%% Cell type:markdown id: tags:

Next, we run the tool "gmx awh" to extract the PMF and other profiles stored in the energy file. The tool outputs a file every "awh-nstout" steps (see mdp file), so we put them in a folder, find the last file, move it to the parent folder, and delete the others.

%% Cell type:code id: tags:

``` python
# This doesn't work yet :(
!mkdir OUTPUT
!gmx awh -f awh.edr -s awh.tpr -more -o OUTPUT/awh
lastfile = !ls -tr OUTPUT/*xvg | tail -n 1
!echo {lastfile}
!mv {lastfile} pmf.xvg
!rm -rf OUTPUT
```

%% Cell type:markdown id: tags:

Then we take a look at the AWH outputs. First the PMF and the coordinate bias, which should converge to the same profile given that the curvature of the PMF is not too high at some regions, in which case the bias potential cannot capture its shape.

%% Cell type:code id: tags:

``` python
awh = np.loadtxt('pmf.xvg', comments=['#', '@'])
plt.plot(awh[:, 0], awh[:, 1],'r-')
plt.plot(awh[:, 0], awh[:, 2],'b-')
plt.legend(['PMF','Coordinate bias'])
plt.ylabel('PMF (kJ/mol)')
plt.xlabel('$z$ coordinate (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

We can also check how well we sample the target distribution.

%% Cell type:code id: tags:

``` python
plt.plot(awh[:, 0], awh[:, 4],'r-')
plt.plot(awh[:, 0], awh[:, 5],'b-')
plt.legend(['Reference value distribution','Target ref. value distribution'])
plt.ylabel('PMF (kJ/mol)')
plt.xlabel('$z$ coordinate (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

Finally, let's look at the friction metric to see which regions have the highest friction for your permeant.

%% Cell type:code id: tags:

``` python
plt.plot(awh[:, 0], awh[:, 6],'r-')
plt.ylabel('Friction')
plt.xlabel('$z$ coordinate (nm)')
plt.show()
```

%% Cell type:markdown id: tags:

# Final questions and Conclusions

* Did your simulation converge?
* Where does your solute prefer to locate itself? Where are the free energy barriers?
* From the PMF, how much is the partitioning free energy from the aqueous phase to the bilayer core?
* Does this agree with the water–oil partitioning free energy from the alchemical simulations?

%% Cell type:markdown id: tags:

The MD simulation will be accelerated using multiple CPU cores. `orterun -n $SLURM_CPUS_PER_TASK` will be used to tell the job to use all cores that were requested when this notebook was started.

The run will take a few tens of seconds to a couple of minutes. Feel free to change `$SLURM_CPUS_PER_TASK` to a smaller number. How does it affect the simulation wall-time and performance (ns/day)? These are printed at the end of the output.

Running simulations in parallel using several CPU cores that are each responsible for a subset of the total computational task is the fundamental principle of high-performance computing!

%% Cell type:code id: tags:

``` python
# CHANGE MDRUN COMMANDS TO WORK ON PUHTI?
# !orterun -n $SLURM_CPUS_PER_TASK --oversubscribe gmx_mpi mdrun -deffnm md -v
```

%% Cell type:code id: tags:

``` python
# PREPARE THIS SCRIPT TO CLEAN THE FILES
# !bash clean.sh
```