switched to GROMACS hydrogen bonding analysis. NGL is fine. Added some figures... (702c6c60) · Commits · courses / Gromacs

Membrane_protein/protlig.ipynb

+53 −30

Original line number	Diff line number	Diff line
		%% Cell type:markdown id: tags:

		# Tutorial on protein–small molecule and lipid–protein interactions

		In this tutorial, we will set up a simulation system containing a membrane protein, the bound small molecule, the host lipid membrane, and the solvent environment. This will be performed using the [CHARMM-GUI](https://www.charmm-gui.org/) portal. The process of setting up simulations for lipid systems, soluble proteins, or nanomaterils is very similar. Then, we will skip the simulation stage (~2 weeks on Puhti per system), download pre-computed trajectories for this system, and analyze them. Towards the end of the tutorial, we will also analyze an additional systems, listed below.

		## The workflow in a nutshell

		* Generate a simulation setup with the help of CHARMM-GUI
		* Upload the protein, pre-process the protein PDB
		* Set up lipid and solvent environments
		* Set up the ligand parametrization
		* Download files, equilibrate locally
		* Download pre-computed trajectories from Zenodo
		* Analyze the simulations as far as you get (increasing difficulty?)
		* Protein behavior: RMSD, RMSF, RMSF mapped onto structure
		* Behavior of the ligand: RMSD, conformational ensemble
		* Protein--small molecule interactions: hydrogen bonding, contact map, distances
		* Analyze the simulations as far as you get
		* Protein behavior: RMSD, RMSF, distances
		* Protein--small molecule interactions: hydrogen bonding, distances
		* Effect of amino acid mutation on ligand binding
		* Effect of protein onto the host membrane (thickness, order, curvature) (FATSLiM(?))
		* Protein-free reference system for same properties

		## Learning outcomes

		* Set up simulations for (membrane) proteins and ligands in CHARMM-GUI
		* Perform basic analyses using the built-in tools of GROMACS
		* Perform more advanced analyses using Python libraries
		* Perform more advanced analyses using the MDAnalysis library

		## The science behind

		This tutorial is based on [the recent publication](https://doi.org/10.1101/2022.07.03.498529) describing the mechanism of inhibition of the Sec61 translocon by a cotransin derivative. During the biogenesis of membrane proteins and secreted proteins, the translating ribosome sits attached to the transmembrane Sec61 protein complex and pushes the nascent polypeptide chain (unfolded protein) through the Sec61 tunnel either into the ER lumen (secreted and thus soluble proteins exit Sec61 from the end of the tunnel) or to the ER membrane (membrane proteins escape via the lateral gate between the two halves of a clam-like Sec61 structure facing the membrane). As many proteins processed by Sec61 are of therapeutic interest (viral proteins, inflammatory cytokines, growth factor receptors related to cancer to name a few), the substrate-selective inhibition of Sec61 is desireable.

		Cotransin is a cyclic depsipeptide that has been demonstrated to inhibit the translocation of proteins from the ribosome into the ER during co-translational translocation. The cotransin derivative, KZR8445 used in this work, was found to be substrate-selective using biochemical assays. Moreover, it was efficacious towards rheumatoid arthritis in mice, and inhibited the production of COVID-19 capsid protein (CHECK!). Thus, to understand its mechanism of function could help further refine the selectivity of cotransins towards new and specific protein targets.

		To explain the substrate-selective inhibition, cryo-EM of KZR8445-bound Sec61 was performed, and additional density was found at the Sec61 lateral gate, i.e. at the site where the transmembrane proteins exit the channel. KZR8445 was modelled into that density. Here, we will use this structure of the Sec61 bound by KZR8445. We will study the effects of the inhibitor on the protein structure, notably its dynamics and the conformation of the lateral gate. Then, we will find the key hydrogen bonding partners for KZR8445, and confirm their importance using computational mutagenesis. We will also study the penetration of water into the Sec61 tunnel.

		<img src="cryoem.png" alt="fishy" class="bg-primary" width="500px">

		%% Cell type:code id: tags:

		``` python
		import numpy as np
		import MDAnalysis as mda
		from MDAnalysis.analysis import distances
		from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
		import matplotlib.pyplot as plt
		import nglview as nv
		from ipywidgets import widgets
		from IPython.display import display
		import warnings
		warnings.filterwarnings("ignore")
		```

		%% Cell type:markdown id: tags:

		Let's visualize the protein first!
		Let's visualize the protein & the inhibitor first!

		%% Cell type:code id: tags:

		``` python
		u = mda.Universe('Sec61_KZR8445.pdb')
		view = nv.show_mdanalysis(u)
		view.add_spacefill(lipid.value)
		view.add_unitcell()
		view
		```

		%% Cell type:markdown id: tags:

		## 1) Now let's learn how to set the system in CHARMM-GUI:

		* Go to [CHARMM-GUI](https://www.charmm-gui.org/) and log in
		* Note: along the process, you can always look at the generated structure in the browser!
		* Go to input generator -> Membrane builder -> Bilayer builder
		* Typically, we could use a PDB identifier here. We will use a structure coresponding to [PDB entry 7ZL3](https://www.rcsb.org/structure/7ZL3), but to have matching atom names for the ligand parametrization, we will upload the local PDB "Sec61_KZR8445.pdb" and continue
		* Select all chains (Protein has 3 subunits, the "Hetero" is the bound small molecule (inhibitor) and continue
		* Parametrize the ligand: upload the corresponding MOL2 file
		* Now all sorts of protein modifications are possible, but we'll omit them and continue
		* We use PPM2.0 to orient the protein to the membrane, and use the main subunit (Sec61α, "PROA") for this alignment and continue
		* Let's hydrate the system with 80 waters per lipid, and place it in a membrane containing a total of 400 POPC molecules. Try to balance the lipid numbers in the leaflets so that the leaflet areas agree and we don't have much tension. Here, we can create as complex of a membrane as we like, but we are not going to do a literature review on the lipidomics of the ER membrane now :) When you reach balanced leaflets, continue
		* Let's include 150 mM (0.15 M) of NaCl and neutralizing ions and continue
		* Confirm that there is no penetration (e.g. lipid tails through ring structures) and continue
		* Select a suitable for field combination (we will now use CHARMM36m) and simulation program (GROMACS) and continue
		* Download the .tgz file containing the simulation system and the files required to run an MD simulation for it.

		%% Cell type:markdown id: tags:

		Let's first decompress the tgz file and see whats inside.

		%% Cell type:code id: tags:

		``` python
		!mv ~/Downloads/charmm-gui.tgz .
		!tar -xvf charmm-gui.tgz
		!ls charmm*/gromacs/
		```

		%% Cell type:markdown id: tags:

		CHARMM-GUI outputs handy equilibration scripts that we can now run to relax the protein and lipid structures, and get the proper density of the water. This information is included in the README file. Let's take a look at its contents.

		%% Cell type:code id: tags:

		``` python
		!mv charmm/gromacs/ .
		!rm -rf charmm*
		!ls
		```

		%% Cell type:markdown id: tags:

		For a demonstration, we can run it for a while and then cancel it with CTRL+C

		%% Cell type:code id: tags:

		``` python
		!cat README
		!chmod u+x README
		!./README
		```

		%% Cell type:markdown id: tags:

		Now we could run the entire README script, and it would eventually generate 100 ns of trajectory. But this would take 8 nodes for a day, so we will instead download 1000 ns-long trajectories from a public repository. The pre-computed trajectories are in [Zenodo](https://doi.org/10.5281/zenodo.7303653). Read the description carefully so you know what is in the files. For the next steps, we'll need a run input file (.tpr), a trajectory (.xtc), and an energy file (.edr).

		%% Cell type:code id: tags:

		``` python
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_R1.xtc?download=1" --output inhibitor.xtc
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_R1.tpr?download=1" --output inhibitor.tpr
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_R1.edr?download=1" --output inhibitor.edr
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_R1.gro?download=1" --output inhibitor.gro
		```

		%% Cell type:markdown id: tags:

		We will also download the same file set for a system without the inhibitor present.

		%% Cell type:code id: tags:

		``` python
		!curl "https://zenodo.org/record/7303653/files/Sec61_noinhibitor_R1.xtc?download=1" --output noinhibitor.xtc
		!curl "https://zenodo.org/record/7303653/files/Sec61_noinhibitor_R1.tpr?download=1" --output noinhibitor.tpr
		!curl "https://zenodo.org/record/7303653/files/Sec61_noinhibitor_R1.edr?download=1" --output noinhibitor.edr
		!curl "https://zenodo.org/record/7303653/files/Sec61_noinhibitor_R1.gro?download=1" --output noinhibitor.gro
		```

		%% Cell type:markdown id: tags:

		We will now start with some basic calculations of protein structure. First, we calculate the root-mean-squared deviation (RMSD) of the protein backbone structure as a function of time with the reference taken from the TPR file (initial configuration). We will do it for the protein with the inhibitor bound, but the same calculation can be repeated for the system without the bound inhibitor.

		%% Cell type:code id: tags:

		``` python
		!echo "Backbone" "Backbone" \| gmx rms -f inhibitor.xtc -s inhibitor.tpr -dt 1000 -o rmsd_inhi
		!echo "Backbone" "Backbone" \| gmx rms -f noinhibitor.xtc -s noinhibitor.tpr -dt 1000 -o rmsd_noinhi
		```

		%% Cell type:markdown id: tags:

		Let's plot the RMSD curves for the two cases.

		%% Cell type:code id: tags:

		``` python
		rmsd_inhi = np.loadtxt('rmsd_inhi.xvg', comments=['#', '@'])
		rmsd_noinhi = np.loadtxt('rmsd_noinhi.xvg', comments=['#', '@'])
		plt.plot(rmsd_inhi[:, 0]/1e6, rmsd_inhi[:, 1],'red')
		plt.plot(rmsd_noinhi[:, 0]/1e6, rmsd_noinhi[:, 1],'blue')
		plt.legend(['With inhibitor', 'Without inhibitor'])
		plt.xlabel('Time (microseconds)')
		plt.ylabel('RMSD (nm)')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		Is this RMSD for a protein normal? Does the inhibitor play a significant role in stabilizing the protein?
		Next, we'll find the more and less flexible regions of the protein with root mean squared fluctuation analysis of the residues.

		%% Cell type:code id: tags:

		``` python
		!echo "Backbone" \| gmx rmsf -f inhibitor.xtc -s inhibitor.tpr -o rmsf_inhi -res
		!echo "Backbone" \| gmx rmsf -f noinhibitor.xtc -s noinhibitor.tpr -o rmsf_noinhi -res
		```

		%% Cell type:code id: tags:

		``` python
		rmsd_inhi = np.loadtxt('rmsf_inhi.xvg', comments=['#', '@'])
		rmsd_noinhi = np.loadtxt('rmsf_noinhi.xvg', comments=['#', '@'])
		plt.plot(rmsd_inhi[:, 0], rmsd_inhi[:, 1],'r*')
		plt.plot(rmsd_noinhi[:, 0], rmsd_noinhi[:, 1],'b*')
		plt.legend(['With inhibitor', 'Without inhibitor'])
		plt.xlabel('Residue no.')
		plt.ylabel('RMSD (nm)')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		It seems that the region around residue 200 is more mobile without the inhibitor. Next, we will check how much the presence of the inhibitor affects the protein conformation. It bounds at the Sec61 lateral gate between helices 2 and 7. We will define the openness of the bate as the distance between the centers of mass of helices 2 and 7. The final conformations of these helices are visualized below for the system with the inhibitor (green, more open gate) and for the system without the inhibitor (yellow, more closed gate).
		It seems that the region around residue 200 is more mobile without the inhibitor. You can go back to the visualization and find out which part of the protein that is.

		Next, we will check how much the presence of the inhibitor affects the protein conformation. The additional density found at the Sec61 lateral gate between helices 2 and 7 was used to fit the inhibitor. At this position, the inhibitor can affect the lateral gate conformation of Sec61. We will define the openness of the bate as the distance between the centers of mass of helices 2 and 7. The final conformations of these helices are visualized below for the system with the inhibitor (green, more open gate) and for the system without the inhibitor (yellow, more closed gate).

		![TM237.png](attachment:TM237.png)
		<div>
		<img src="attachment:TM237.png" width="300"/>
		</div>

		%% Cell type:code id: tags:

		``` python
		u_inhi = mda.Universe('inhibitor.gro', 'inhibitor.xtc')

		# Define the atom selections for TMs 2 and 7
		tm2 = u_inhi.select_atoms('resid 88:96')
		tm7 = u_inhi.select_atoms('resid 288:308')

		# Create an empty list to store the distances
		distances_inhi = []
		time = []

		# Loop over all frames in the trajectory
		for ts in u_inhi.trajectory:
		# Calculate the centers of mass for each selection
		com2 = tm2.center_of_mass()
		com7 = tm7.center_of_mass()

		# Calculate the distance between the centers of mass
		distance = mda.analysis.distances.distance_array(com2,com7)[0][0]

		# Append the distance to the list
		distances_inhi.append(distance)
		time.append(u_inhi.trajectory.time)

		# Print the average distance over the trajectory
		print("Average distance between TM helices 2 and 7 with the inhibitor: %.3f Å" % (sum(distances_inhi) / len(distances_inhi)))
		```

		%% Cell type:markdown id: tags:

		Repeat for the system without inhibitor

		%% Cell type:code id: tags:

		``` python
		u_noinhi = mda.Universe('noinhibitor.gro', 'noinhibitor.xtc')
		tm2 = u_noinhi.select_atoms('resid 88:96')
		tm7 = u_noinhi.select_atoms('resid 288:308')
		distances_noinhi = []

		# Loop over all frames in the trajectory
		for ts in u_noinhi.trajectory:
		# Calculate the centers of mass for each selection
		com2 = tm2.center_of_mass()
		com7 = tm7.center_of_mass()

		# Calculate the distance between the centers of mass
		distance = mda.analysis.distances.distance_array(com2,com7)[0][0]

		# Append the distance to the list
		distances_noinhi.append(distance)

		# Print the average distance over the trajectory
		print("Average distance between TM helices 2 and 7 without the inhibitor: %.3f Å" % (sum(distances_noinhi) / len(distances_noinhi)))
		```

		%% Cell type:markdown id: tags:

		Plot the distances in the same plot.

		%% Cell type:code id: tags:

		``` python
		plt.plot(time,distances_noinhi)
		plt.plot(time,distances_inhi)
		plt.legend(['Without inhibitor', 'With inhibitor'])
		plt.xlabel('Time (microseconds)')
		plt.ylabel('Distance (nm)')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		The inhibitor will clearly push the gate open and occypy that space.

		We will then move on to study hydrogen bonding. Experiments on various Sec61 inhibitors have identified N300 and Q127 as residues that maintain the lateral gate in a closed conformation. Since the inhibitor occupies this space, it could well interact with these two residues. We will first do a simple distance calculation between these two residues and the ligand.
		We will then move on to study the interactions of the inhibitor with the protein. Experiments on various Sec61 inhibitors have identified N300 and Q127 as residues that maintain the lateral gate in a closed conformation. Since the inhibitor occupies this space, it could well interact with these two residues. We will first do a simple distance calculation between these two residues and the ligand.

		%% Cell type:code id: tags:

		``` python
		# Define the atom selections for the ligand and residue 300 of the protein
		ligand = u_inhi.select_atoms('resname KZC')
		residue300 = u_inhi.select_atoms('resnum 300')
		residue127 = u_inhi.select_atoms('resnum 127')

		# Create an empty list to store the minimum distances
		min_distances_127 = []
		min_distances_300 = []
		time=[]

		# Loop over all frames in the trajectory
		for ts in u_inhi.trajectory:
		# Calculate the distance between residue 300 and the ligand using the MDAnalysis analysis.distances module
		distances_127 = mda.analysis.distances.distance_array(residue127.positions, ligand.positions)
		distances_300 = mda.analysis.distances.distance_array(residue300.positions, ligand.positions)
		# Find the minimum distance and append it to the list of minimum distances
		min_distance_127 = np.min(distances_127)
		min_distance_300 = np.min(distances_300)
		min_distances_127.append(min_distance_127)
		min_distances_300.append(min_distance_300)
		time.append(u_inhi.trajectory.time)

		# Convert the minimum distances list to a NumPy array
		min_distances_127 = np.array(min_distances_127)
		min_distances_300 = np.array(min_distances_300)
		time=np.array(time)

		plt.plot(time/1e6,min_distances_127/10,'r-')
		plt.plot(time/1e6,min_distances_300/10,'b-')
		plt.legend(['Q127', 'N300'])
		plt.xlabel('Time (microseconds)')
		plt.ylabel('Distance (nm)')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		N300 looks like a promising hydrogen bonding candidate, whereas Q127 seems to drift further as the lateral gate is pushed open. Let's do a hydrogen bonding analysis for N300.
		N300 looks like a promising hydrogen bonding candidate, whereas Q127 seems to drift further as the lateral gate is pushed open. Let's do a hydrogen bonding analysis for N300. We'll simply generate GROMACS index groups for residues 127 and 300 and calculate the time evolution of hydrogen bonds between these residues and the inhbitor.

		%% Cell type:code id: tags:

		``` python
		hbonds = HBA(universe=u_inhi, between=['resname KZC', 'protein'])
		protein_hydrogens_sel = "protein and name H*"
		protein_acceptors_sel = "protein and name O* N*"
		inhi_hydrogens_sel = "resname KZC and name H*"
		inhi_acceptors_sel = "resname KZC and name O* N*"
		!gmx select -f inhibitor.gro -on hbond.ndx -s inhibitor.tpr -select "KZC = resname KZC; Q127 = resnr 127; N300 = resnr 300; Q127; N300; KZC;"
		!echo "Q127" "KZC" \| gmx hbond -f inhibitor.xtc -s inhibitor.tpr -n hbond.ndx -num hbnum_q127.xvg
		!echo "N300" "KZC" \| gmx hbond -f inhibitor.xtc -s inhibitor.tpr -n hbond.ndx -num hbnum_n300.xvg
		```

		%% Cell type:markdown id: tags:

		Now, let's plot the time development of the hydrogen bond numbers between the inhibitor and these two key residues.

		#inhi_hydrogens_sel = "resname KZC and name H1 H2"
		#inhi_acceptors_sel = "resname KZC and name OH2"
		%% Cell type:code id: tags:

		hbonds.run()
		``` python
		hb_q127 = np.loadtxt('hbnum_q127.xvg', comments=['#', '@'])
		hb_n300 = np.loadtxt('hbnum_n300.xvg', comments=['#', '@'])
		plt.plot(hb_q127[:, 0]/1e6, hb_q127[:, 1],'r*')
		plt.plot(hb_n300[:, 0]/1e6, hb_n300[:, 1],'b*')
		plt.legend(['Q127', 'N300'])
		plt.xlabel('Time (microseconds)')
		plt.ylabel('Hydrogen bonds')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		Let's now take a look at a simulation where we mutated asparagine 300 involved in hydrogen bonding with the inhibitor into nonpolar alanine

		%% Cell type:code id: tags:

		``` python
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_N300A.xtc?download=1" --output mutant.xtc
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_N300A.tpr?download=1" --output mutant.tpr
		!curl "https://zenodo.org/record/7303653/files/Sec61_KZR8445_N300A.gro?download=1" --output mutant.gro
		```

		%% Cell type:code id: tags:

		``` python
		# Define the atom selections for the ligand and residue 300 of the protein
		u_mut = mda.Universe('mutant.gro', 'mutant.xtc')

		ligand = u_mut.select_atoms('resname KZC')
		residue300 = u_mut.select_atoms('resnum 300')

		# Create an empty list to store the minimum distances
		min_distances_300_mut = []

		# Loop over all frames in the trajectory
		for ts in u_mut.trajectory:
		# Calculate the distance between residue 300 and the ligand using the MDAnalysis analysis.distances module
		distances_300_mut = mda.analysis.distances.distance_array(residue300.positions, ligand.positions)
		# Find the minimum distance and append it to the list of minimum distances
		min_distance_300_mut = np.min(distances_300_mut)
		min_distances_300_mut.append(min_distance_300_mut)

		# Convert the minimum distances list to a NumPy array
		min_distances_300_mut = np.array(min_distances_300_mut)

		plt.plot(time/1e6,min_distances_300/10,'r-')
		plt.plot(time/1e6,min_distances_300_mut/10,'b-')
		plt.legend(['Wild type', 'N300A mutant'])
		plt.xlabel('Time (microseconds)')
		plt.ylabel('Distance (nm)')
		plt.show()
		```

		%% Cell type:markdown id: tags:

		The N300A mutation clearly leads to the detachment of the inhibitor. This is also evidenced by functional assays in which the inhibitory effect of KZR8445 is significantly reduced upon the same mutation.
		The N300A mutation clearly leads to the detachment of the inhibitor. This is also evidenced by functional assays in which the inhibitory effect of KZR8445 is significantly reduced upon the same mutation. Here, the inhibitor clearly inhibits protein translocation as evidenced by Gaussia luciferase reporter constructs. With the N300A mutation, the inhibition decreases, and based on the MD simulations, this is due to the inhibitor leaving its binding site.

		<img src="inhibition.png" alt="fishy" class="bg-primary" width="500px">

		%% Cell type:markdown id: tags:

		## POSSIBLE OTHER ANALYSES TO INCLUDE:
		* MEMBRANE SHAPE (FATSLiM)
		* ORDER AND THICKNESS MAPS (COMPARE TO PROTEIN-FREE MEMBRANE?)
		* LIPID-PROTEIN INTERACTIONS (PyLipInt)
		## POSSIBLE OTHER ANALYSES TO INCLUDE (HECTOR?):
		* DENSITY PROFILES
		* WATER BETWEEN THE DENSITY PROFILE PEAKS (INSIDE PROTEIN CHANNEL)
		* FLIP-FLOP (matti will check for any)
		* LATERAL DIFFUSION (plot coordinates)