Preparing molecular models for MD simulations

© 2010 Universitat Pompeu Fabra. Author: Toni Giorgino - Draft version (Jul/2010)

Introduction

Preparation of molecular models for MD simulations cannot be covered in a short tutorial. This guide is meant as a primer for non-experts. Almost all of the steps provided here can be questioned in some way. We shall discuss how to prepare structures to be built in the CHARMM or AMBER forcefield, depending on which of the two is more convenient; both rely on well-tested forcefields and can be run with the ACEMD MD code (as well as many others). We will not cover the specifics of VMD or of the molecular dynamics programs, nor the building of DNA/RNA fragments.

Requirements. In this tutorial we shall assume that you have proficiency with the VMD program; the ability to rename residues is especially important. The following (free) programs should be installed: VMD, AmberTools. Other packages will be useful as well: OpenBabel (format conversions and PDB cleanup), the pdb_ter VMD plugin (insertion of TER cards between molecules), USCF Chimera (easy manipulation, automated AMBER building). Note that AmberTools, even though it is a part of the amber package, it is distributed for free. The psfgen utilities are distributed with VMD.

The starting structure

We assume that you have a 3D model of the protein to be modelled. Such models are usually retrieved from the RCSDB database in PDB format. (In principle, homology models can also be used with special care.)

The PDB format provides the three-dimensional coordinates of (a fraction of) the atoms in the protein ("structure"). Our final objective will be the creation of

  • a pdb structure of the complete system to be modelled (including solvent and ions)

  • a file that provide a complete description of which atoms is connected to which ("topology"), and

  • one with the energy terms of all the interactions ("parameters"). For a description of these parameters, see any MD primer.

In the following, we shall decide whether to use the CHARMM or the AMBER topology formats.

CHARMM:

  • we shall use the "psfgen" software to build a molecular topology, in .psf format
  • parameters, in a .par file, are packaged with NAMD and other software

AMBER:

  • we shall use the "tleap" program to build a combined topology+parameters file, called ".prmtop".
  • parameters for standard residues are provided with the AmberTools distribution

  • parameters for non-standard residues are stored in these files
    • .prepi - coordinates, topology and partial charges
    • .frcmod - forcefield parameters
    • (or, alternatively, combined in a .lib or .off "proprietary" format for later reuse)

Fixing the structure

Proteins in the PDB are not usually ready to run. The objective of the preparation stage is to fix one or more of the following problems.

  • Missing residues. (Which may or may not be an indication of a disordered region). If the missing region is large, you probably have to omit it from the simulation altogether. In this case, you may decide to cap the truncated peptide bonds with uncharged acetylathed (ACE/NME) termini.
  • Non-standard residues. The library of parameters provided by topology-building programs covers only standard residues. Other parameters need to be constructed somehow. Parameters for many non-standard residues can be found on the internet, usually in CHARMM or AMBER formats. This should orient you towards one of the two formats. In any case, make sure that the residues are really non-standard, rather than simple renaming of existing ones (e.g. D-chirality forms, which can use the same parameters as the L-, as in 1JNO).
  • In case of a transmembrane protein, the bilayer will be missing. If you are attempting to model a transmembrane protein, you will first have to align its transmembrane helices with one of the major axes. Then, you have to embed the molecule in a bilayer. Finally, the bilayer parameters have to be obtained.
    • The CHARMM forcefield already includes terms for lipids, and is therefore recommended.
    • Recently, AmberTools include the GLYCAM parameter set; however, building lipids is not straightforward (see antechamber manual).

    • Alternatively, if using AMBER, lipids can be modelled with the GAFF forcefield as if they were a drug (see below). [fixme: papers for the three]
    • In all cases, an important task will be to rename ATOM names in the PDB file to match those expected by the parameter set. In case of GAFF, atom names should be unique.
    • Keep into account that equilibration of a membrane-embedded structure may take tens or hundreds of nanoseconds, and you may have to do multiple attempts.
  • Disulphide bonds. If present, they will be marked in the PDB header. One must make sure that they are included in the topology. In CHARMM, it is done with the "patch" function of psftools. In AMBER, see the tleap manual and the "bond" command.
  • Undefined protonation state on histidines. You will have to assign a proper protonation state for each histidine residue. It is recommended that you run the structure through one of the following services [to be fixed]. The HIS residue names should be changed accordingly.
  • Undefined rotameric states for some side-chains. Use the same sites as above.
  • Crystallographic water molecules. These should be retained. Make sure that they have a standard residue name such as WAT or HOH.
  • Small molecules used for crystallization. These can be removed.
  • Drugs and other ligands of interest in the model. A widely used option for modelling drugs in MD is the GAFF forcefield. GAFF provides bonded and non-bonded terms for most atom types in a way that is compatible with the AMBER forcefield for proteins. You will therefore be forced to use AmberTools and the AMBER format.

    • Before you can proceed, you should compute the correct protonation state (main microspecie) of the drug and, if appropriate, the main stereoisomer. Use the MarvinSketch or PROPKA programs.

    • If necessary, change PDB HETATM records into ATOM.
    • Make sure that the atom names are unique, i.e. the drug should have only one atom of each type. Usually antechamber does the renaming for you.
    • Then you will have to derive partial charges for the various atoms of the drug. There are two main roads. (a) Use the AM1-BCC semiempirical approach; or (b) an ab-initio quantum program (Gaussian) with RESP fitting. See the corresponding antechamber tutorial. The two should give comparable results (check), except in the presence of buried atoms.

    • GAFF may be missing some of the required parameters. For this, use the prmchk utility of AmberTools (explained in the same tutorial above). Save the output in a ".frcmod" file, to be used later.

    • Partial charges calculations are conformation and orientation-dependent. If going for (a), make sure the initial structure is approximately minimized (use OpenBabel). In principle, one should compute them in an ensemble of molecular conformations. This is taken care of in the R.E.D. suite (which however requires a special license).

Automated programs

In a few simple cases you may succeed to use automated-building programs. In VMD, try the "autopsf" tool; it should identify disulphide bonds and chain termini. It will not work in the presence of non-standard residues. Make sure that each disconnected chain is assigned a different "segment ID". For more extended instructions, see ....

For the AMBER topology, you can try the "build prmtop" function inside the Chimera modelling program. If non-standard residues are found, chimera will attempt to assign partial charges using AM1-BCC. Note that, in AMBER, molecules should be separated by "TER" cards. If missing, these can be added with the pdb_ter plugin of VMD.

Before continuing, be aware that there are various conventions for atom names. In particular, AMBER and psftools disagree on the names for hydrogen atoms (IUPAC versus old PDB). As a quick fix, you may have to remove hydrogens altogether, or use OpenBabel (command line) or YASARA view (gui) to convert.

License

Licenza Creative Commons
Questo/a opera รจ pubblicato sotto una Licenza Creative Commons.

Acknowledgments

Work partially supported by the VPH-NoE.

http://img191.imageshack.us/img191/496/vphlogo.png

Copyright 2008-2009. All rights reserved.