If you do not have the PDB documentation, you may find the following summary of the PDB file format useful. The Protein Data Bank is a computer-based archival database for macromolecular structures. The database was established in 1971 by Brookhaven National Laboratory, Upton, New York, as a public domain repository for resolved crystallographic structures. The Bank uses a uniform format to store atomic coordinates and partial bond connectivities as derived from crystallographic studies. In 1999 the Protein Data Bank moved to the Research Collaboratory for Structural Biology.
PDB file entries consist of records of 80 characters each. Using the punched card analogy, columns 1 to 6 contain a record-type identifier and columns 7 to 70 contain data. In older entries, columns 71 to 80 are normally blank but may contain sequence information added by library management programs; in entries conforming to the 1996 PDB format, these columns hold other information. The first four characters of the record identifier are sufficient to identify the type of record uniquely, and the syntax of each record is independent of the order of records within any entry for a particular macromolecule.
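The short Python sketch below illustrates this column layout. It splits one record into its identifier, its data columns and the trailing columns 71 to 80. The helpers `split_record` and `record_type` are hypothetical and are not part of RasMol. Because PDB columns are counted from 1, column c corresponds to Python index c-1.

```python
def split_record(line: str) -> tuple[str, str, str]:
    """Split one PDB record into (record identifier, data, columns 71-80).

    Columns 1-6 map to the Python slice [0:6], columns 7-70 to [6:70]
    and columns 71-80 to [70:80].
    """
    padded = line.rstrip("\r\n").ljust(80)  # pad short records to 80 characters
    return padded[0:6], padded[6:70], padded[70:80]


def record_type(line: str) -> str:
    """Return the first four characters, which identify the record type."""
    return line[:4]


atom = "ATOM      1  N   MET A   1      38.198  19.582  28.004  1.00 30.00"
ident, data, tail = split_record(atom)
print(repr(ident), len(data), tail.strip() == "")  # 'ATOM  ' 64 True
print(record_type("HETATM    2  O   HOH"))          # HETA
```

Note that "ATOM" and "HETA" are already distinct after four characters, which is why a four-character test is sufficient.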
The only record types of major interest to the RasMol program are the ATOM and HETATM records, which describe the position of each atom. ATOM/HETATM records contain standard atom names and residue abbreviations, along with sequence identifiers, coordinates in ångström units, occupancies and thermal motion factors. The exact details are given below as a FORTRAN format statement. The "fmt" column indicates whether a field is used in all PDB formats, only in the 1992 and earlier formats, or only in the 1996 and later formats.
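RasMol's own reader is written in C and is not reproduced here. The sketch below reads one ATOM/HETATM record using the column positions of the standard PDB atom record: serial 7-11, name 13-16, alternate location 17, residue name 18-20, chain 22, residue number 23-26, insertion code 27, coordinates 31-54, occupancy 55-60 and temperature factor 61-66. The `Atom` class, the `parse_atom` function and the choice of 0.0 for a blank occupancy or temperature factor are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    hetero: bool        # True for HETATM records
    serial: int         # columns 7-11
    name: str           # columns 13-16, padding kept (" CA " is not "CA  ")
    alt_loc: str        # column 17
    res_name: str       # columns 18-20
    chain_id: str       # column 22
    res_seq: int        # columns 23-26
    i_code: str         # column 27
    x: float            # columns 31-38
    y: float            # columns 39-46
    z: float            # columns 47-54
    occupancy: float    # columns 55-60
    temp_factor: float  # columns 61-66


def _real(field: str, default: float = 0.0) -> float:
    """Parse a numeric field; a blank field yields the (assumed) default."""
    return float(field) if field.strip() else default


def parse_atom(line: str) -> Atom:
    s = line.rstrip("\r\n").ljust(80)  # 1-based column c is Python index c-1
    return Atom(
        hetero=s.startswith("HETATM"),
        serial=int(s[6:11]),
        name=s[12:16],
        alt_loc=s[16],
        res_name=s[17:20].strip(),
        chain_id=s[21],
        res_seq=int(s[22:26]),
        i_code=s[26],
        x=float(s[30:38]),
        y=float(s[38:46]),
        z=float(s[46:54]),
        occupancy=_real(s[54:60]),
        temp_factor=_real(s[60:66]),
    )


atom = parse_atom("ATOM      1  N   MET A   1      38.198  19.582  28.004  1.00 30.00")
print(atom.res_name, atom.chain_id, atom.res_seq, atom.x)  # MET A 1 38.198
```

The atom name is deliberately left unstripped. In PDB files its alignment within columns 13 to 16 carries meaning.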
Residues occur in order starting from the N-terminal residue for proteins and 5'-terminus for nucleic acids. If the residue sequence is known, certain atom serial numbers may be omitted to allow for future insertion of any missing atoms. Within each residue, atoms are ordered in a standard manner, starting with the backbone (N-C-C-O for proteins) and proceeding in increasing remoteness from the alpha carbon, along the side chain.
HETATM records are used to define post-translational modifications and cofactors associated with the main molecule. TER records are interpreted as breaks in the main molecule's backbone.
If present, RasMol also inspects HEADER, COMPND, HELIX, SHEET, TURN, CONECT, CRYST1, SCALE, MODEL, ENDMDL, EXPDTA and END records. Information such as the name, database code, revision date and classification of the molecule is extracted from HEADER and COMPND records, initial secondary structure assignments are taken from HELIX, SHEET and TURN records, and the end of the file may be indicated by an END record.
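As an example of this kind of extraction, the sketch below reads the classification, date and database code from a HEADER record. It assumes the standard PDB HEADER layout: classification in columns 11-50, date in 51-59 and ID code in 63-66. The `parse_header` name and the dictionary keys are hypothetical, and RasMol's own code may handle older files differently.

```python
def parse_header(line: str) -> dict[str, str]:
    """Extract classification, date and ID code from a HEADER record."""
    s = line.rstrip("\r\n").ljust(80)
    return {
        "classification": s[10:50].strip(),  # columns 11-50
        "date": s[50:59].strip(),            # columns 51-59, e.g. 20-MAR-97
        "id_code": s[62:66].strip(),         # columns 63-66, the database code
    }


line = "HEADER    " + "OXYGEN TRANSPORT".ljust(40) + "20-MAR-97" + "   " + "1ABC"
print(parse_header(line)["id_code"])  # 1ABC
```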
Atoms located at 9999.000, 9999.000, 9999.000 are assumed to be Insight pseudo atoms and are ignored by RasMol. Atom names beginning ' Q' are also assumed to be pseudo atoms or position markers.
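The two pseudo-atom rules can be written as a single predicate. In this sketch, `name` is the raw four-character atom name field (columns 13-16) with its padding intact, so that the ' Q' prefix test behaves as described. The function name `is_pseudo_atom` is hypothetical.

```python
INSIGHT_PSEUDO = (9999.0, 9999.0, 9999.0)


def is_pseudo_atom(name: str, x: float, y: float, z: float) -> bool:
    """name is the raw atom name field (columns 13-16), padding included."""
    if (x, y, z) == INSIGHT_PSEUDO:   # Insight pseudo atom: ignored
        return True
    return name.startswith(" Q")      # pseudo atom or position marker


atoms = [(" CA ", 1.0, 2.0, 3.0), (" QA ", 4.0, 5.0, 6.0), (" C  ", 9999.0, 9999.0, 9999.0)]
kept = [a for a in atoms if not is_pseudo_atom(*a)]
print(kept)  # [(' CA ', 1.0, 2.0, 3.0)]
```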
When a data file contains an NMR structure, multiple conformations may be placed in a single PDB file delimited by pairs of MODEL and ENDMDL records. RasMol displays all the NMR models contained in the file.
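The sketch below groups atom records into models delimited by MODEL/ENDMDL pairs. It assumes that a file without any MODEL records is treated as a single model. `split_models` is a hypothetical helper for illustration and does not describe how RasMol stores models internally.

```python
def split_models(lines: list[str]) -> list[list[str]]:
    """Group ATOM/HETATM records into models delimited by MODEL/ENDMDL pairs."""
    models: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        if line.startswith("ENDMDL"):
            models.append(current)      # close the current model
            current = []
        elif line.startswith("MODEL"):
            current = []                # start a new model
        elif line.startswith(("ATOM", "HETATM")):
            current.append(line)
    if current:                         # atoms not enclosed in MODEL/ENDMDL
        models.append(current)
    return models
```

With this grouping, models[0] holds the atoms that 'restrict */1' would leave displayed.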
Residue names "CSH", "CYH" and "CSM" are considered pseudonyms for cysteine "CYS". Residue names "WAT", "H20", "SOL" and "TIP" are considered pseudonyms for water "HOH". The residue name "D20" is considered heavy water "DOD". The residue name "SUL" is considered a sulphate ion "SO4". The residue name "CPR" is considered to be cis-proline and is translated as "PRO". The residue name "TRY" is considered a pseudonym for tryptophan "TRP".
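These aliases amount to a lookup table. The sketch below encodes the list above as a Python dictionary. Names that are not in the table are returned unchanged. `RESIDUE_ALIASES` and `canonical_residue` are hypothetical names.

```python
RESIDUE_ALIASES = {
    "CSH": "CYS", "CYH": "CYS", "CSM": "CYS",                # cysteine
    "WAT": "HOH", "H20": "HOH", "SOL": "HOH", "TIP": "HOH",  # water ("H20" uses a zero)
    "D20": "DOD",                                            # heavy water
    "SUL": "SO4",                                            # sulphate ion
    "CPR": "PRO",                                            # cis-proline
    "TRY": "TRP",                                            # tryptophan
}


def canonical_residue(name: str) -> str:
    """Translate a residue name through the alias table."""
    name = name.strip()
    return RESIDUE_ALIASES.get(name, name)


print(canonical_residue("WAT"))  # HOH
```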
RasMol uses HETATM records to define the sets hetero, water, solvent and ligand. Any group with the name "HOH", "DOD", "SO4" or "PO4" (or aliased to one of these names by the preceding rules) is considered solvent and is treated as though it had been defined by a HETATM record.
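The sketch below expresses the solvent rule, together with one reading of how it feeds the hetero set. It assumes that residue names have already been translated by the alias rules, and that a solvent group stored in an ATOM record still counts as hetero. The definitions of the water and ligand sets are not given here, so they are left out. All names in the sketch are hypothetical.

```python
# Group names treated as solvent, after the alias rules have been applied
# (so "WAT" has already become "HOH", "SUL" has become "SO4", and so on).
SOLVENT_NAMES = frozenset({"HOH", "DOD", "SO4", "PO4"})


def is_solvent(res_name: str) -> bool:
    return res_name.strip() in SOLVENT_NAMES


def is_hetero(record_name: str, res_name: str) -> bool:
    # HETATM records are hetero; solvent groups are treated as if they had
    # come from HETATM records even when the file uses ATOM records.
    return record_name.startswith("HETATM") or is_solvent(res_name)
```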
RasMol only respects CONECT connectivity records in PDB files containing fewer than 256 atoms. This is explained in more detail in the section on determining molecule connectivity. CONECT records that define a bond more than once are interpreted as specifying the bond order of that bond, i.e. a bond specified twice is a double bond and a bond specified three (or more) times is a triple bond. This is not a standard PDB feature.
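The sketch below combines the bond-order convention with the 256-atom limit. It reads the four bonded-atom fields in columns 12-31. Standard PDB files often list each bond from both of its atoms. The sketch therefore counts how often the same partner is repeated for a given source atom, and keeps the larger count across the two directions. That counting rule is an assumption of this example, not a description of RasMol's C code. `conect_bond_orders` is a hypothetical name.

```python
from collections import Counter


def conect_bond_orders(lines: list[str], atom_count: int) -> dict[tuple[int, int], int]:
    """Map each bonded atom pair (lower serial first) to a bond order of 1, 2 or 3."""
    if atom_count >= 256:
        return {}  # CONECT records are only respected for fewer than 256 atoms
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        src = int(line[6:11])                    # columns 7-11: source atom serial
        for start in range(11, 31, 5):           # columns 12-16, 17-21, 22-26, 27-31
            field = line[start:start + 5].strip()
            if field:
                counts[(src, int(field))] += 1   # same partner repeated = higher order
    orders: dict[tuple[int, int], int] = {}
    for (a, b), n in counts.items():
        pair = (min(a, b), max(a, b))
        # 1 mention = single, 2 = double, 3 or more = triple; keep the larger
        # count when the bond is also listed from the other atom's record
        orders[pair] = max(orders.get(pair, 0), min(n, 3))
    return orders
```

For example, the records "CONECT    1    2    2" and "CONECT    2    1    1" describe a double bond between atoms 1 and 2.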
RasMol also accepts the supplementary COLO record type in PDB files. This record format was introduced by David Bacon's Raster3D program for specifying the colour scheme to be used when rendering the molecule. This extension is not currently supported by the PDB. The COLO record has the same basic layout as the ATOM and HETATM records described above.
Colours are assigned to atoms using a matching process. The Mask field is used in the matching process as follows. First RasMol reads in and remembers all the ATOM, HETATM and COLO records in input order. When the user-defined ('User') colour scheme is selected, RasMol goes through each remembered ATOM/HETATM record in turn, and searches for a COLO record that matches in all of columns 7 through 30. The first such COLO record to be found determines the colour and radius of the atom.
Note that the Red, Green and Blue components are in the same positions as the X, Y and Z components of an ATOM or HETATM record, and the van der Waals radius takes the place of the Occupancy. The Red, Green and Blue components must all be in the range 0 to 1.
To allow one COLO record to provide colour and radius specifications for more than one atom (e.g. based on residue, atom type, or any other criterion for which labels can be given somewhere in columns 7 through 30), the hash mark "#" (number or sharp sign) is used as a 'don't-care' character. When found in a COLO record, this character matches any character in the corresponding column of an ATOM/HETATM record. All other characters must match identically to count as a match. As an extension to the specification, any atom that fails to match a COLO record is displayed in white.
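The matching procedure can be sketched as follows. The COLO records are scanned in input order, and columns 7-30 are compared with "#" acting as a wildcard. The first match supplies the colour from the X, Y and Z columns (31-54) and the radius from the Occupancy columns (55-60). Unmatched atoms get white. The function names are hypothetical, and returning None as the radius of an unmatched atom is a placeholder, since the text does not say which radius such atoms receive.

```python
from typing import Optional

WHITE = (1.0, 1.0, 1.0)


def mask_matches(colo: str, atom: str) -> bool:
    """Compare columns 7-30; '#' in the COLO record matches any character."""
    mask = colo.ljust(80)[6:30]
    target = atom.ljust(80)[6:30]
    return all(m == "#" or m == a for m, a in zip(mask, target))


def colour_for(atom: str, colo_records: list[str]) -> tuple[tuple[float, float, float], Optional[float]]:
    """Return ((red, green, blue), radius) from the first matching COLO record."""
    for colo in colo_records:                # records kept in input order
        if mask_matches(colo, atom):
            c = colo.ljust(80)
            rgb = (float(c[30:38]), float(c[38:46]), float(c[46:54]))  # X, Y, Z columns
            if not all(0.0 <= v <= 1.0 for v in rgb):
                raise ValueError("COLO colour components must lie between 0 and 1")
            radius = float(c[54:60])         # Occupancy columns hold the radius
            return rgb, radius
    return WHITE, None                       # no match: white; radius left unspecified
```

For instance, a mask of "#" in every column except " CA " in columns 13-16 colours every alpha carbon, whatever its residue or chain.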
Once multiple NMR conformations have been loaded they may be manipulated with the atom expression extensions described in 'Primitive Expressions'. In particular, the command 'restrict */1' will restrict the display to the first model only.
CIF is the IUCr standard for presentation of small molecules and mmCIF is intended as the replacement for the fixed-field PDB format for presentation of macromolecular structures. RasMol can accept data sets in either format.
There are many useful sites on the World Wide Web where information, tools and software related to CIF, mmCIF and the PDB can be found. The following are good starting points for exploration:
The International Union of Crystallography (IUCr) provides access to software, dictionaries, policy statements and documentation relating to CIF and mmCIF at: IUCr, Chester, England (www.iucr.org/iucr-top/cif/) with many mirror sites.
The Nucleic Acid Database Project provides access to its entries, software and documentation, with an mmCIF page giving access to the dictionary and mmCIF software tools at Rutgers University, New Jersey, USA (http://ndbserver.rutgers.edu/NDB/mmcif) with many mirror sites.
This version of RasMol restricts CIF or mmCIF tag values to essentially the same conventions as are used for the fixed-field PDB format. Thus chain identifiers and alternate conformation identifiers are limited to a single character, atom names are limited to 4 characters, etc. RasMol interprets the following CIF and mmCIF tags:
A search is made through multiple data blocks for the desired tags, so a single dataset may be composed from multiple data blocks, but multiple data sets may not be stacked in the same file.
A script is a text file containing a set of commands that are executed sequentially by RasMol using the 'script' command. The RasMol command 'source' is synonymous with the 'script' command.
All commands are allowed in a script file, including the 'load' command, which loads a molecular coordinate file while the script is executing. In this case, be sure to precede 'load' with a 'zap' command to erase any molecule already loaded.
A RasMol script file may contain further 'script' commands up to a maximum nesting "depth" of 10, allowing complicated sequences of actions to be executed. This is particularly useful for teaching purposes, in combination with the 'pause' command, which temporarily interrupts script execution.
The 'echo' command writes text to the command line, allowing online annotation. This command is particularly useful just before a 'pause' command, for example.
RasMol ignores all characters after the first '#' character on each line allowing the script text files to be annotated.
The 'refresh off' command stops screen refreshing, so that subsequent script commands are executed "in the background". The 'refresh' command refreshes the current drawing. 'Refresh' commands must appear in pairs in a script ('refresh off' <script> 'refresh on'); otherwise serious display problems may result. Alternatively, the 'molecule hide' and 'molecule show' commands allow a molecule to be made invisible during transformations.
The 'exit' command executed within a script terminates execution of that script. It applies to the current 'script level' only and is useful for combining molecular coordinate files with script files (see inline scripts below).
Inline script files combine scripting information with molecular coordinates in a single file.
The format is typically:
#!rasmol -script (optional)
zap
load <fileformat> inline
...
exit
<molecular data>
Typically the load command takes the form 'load pdb inline'. The 'exit' command terminates execution of the current script and returns control to the command line (or the calling script). This means any lines following 'exit' are never interpreted by RasMol as commands. These lines may be used to store atomic coordinates in PDB, CIF or mmCIF format.
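The sketch below splits an inline file into its script part and its trailing data. It applies the rule that text after the first '#' on a line is a comment, and stops at the first line whose command is 'exit'. The helper `split_inline` is hypothetical, and the case-insensitive comparison with 'exit' is an assumption. The sketch does not model how RasMol's 'load ... inline' actually rereads the file.

```python
def split_inline(text: str) -> tuple[list[str], list[str]]:
    """Split an inline script file into (script lines, trailing data lines)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        command = line.split("#", 1)[0].strip()  # text after the first '#' is a comment
        if command.lower() == "exit":            # assumed case-insensitive
            return lines[:i], lines[i + 1:]
    return lines, []                             # no 'exit': everything is script


text = "#!rasmol -script\nzap\nload pdb inline\nexit\nATOM      1  N   ALA A   1\nEND\n"
script, data = split_inline(text)
print(data)  # ['ATOM      1  N   ALA A   1', 'END']
```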
Sometimes we want to download a molecule and analyze its structure without going through the cumbersome procedure of multiple selections and drawing commands needed to generate even a simple rendering. Some atom sets are also difficult to define, such as those involving side chains in proteins and some DNA subsets. The idea of templates is to have ready-to-use rendering scripts that do the job. The RasTop package contains one example in the data folder; it is not complete and does not give a very nice rendering, but it illustrates the principle. Here is an outline for organizing a template:
#!rasmol -script
# template name
# authorship(s)
select <atoms of interest>
<erase all drawings>
<make specific drawings>
select <successively each kind of atoms, ex: proteins, helix, RNA, DNA, water, etc.>
<erase all drawings>
<make specific drawing>
select all
Send your template suggestions for inclusion in the RasTop package.
RSM scripts are RasTop specific.
RSM scripts are an attempt to link the scripting information with the molecular coordinates in a single file, in a slightly different way from inline scripts. The <*.rsm> extension is just a convenient way for RasTop to handle these files and does not imply that the molecular coordinates they contain are written in a specific molecular format. Indeed, the original extension, for example <*.cif> or <*.pdb>, may be kept.
When a file is opened, RasTop looks for the tag '#!rasmol -rsm file'. When it is found, RasTop looks for the command 'load <format> inline' on the next line. If it finds one, RasTop asks RasMol to reopen the file and load the molecule in the specified format. RasTop then returns to the initial script line and executes the remainder of the script. For compatibility between molecular files and RSM files, it is advisable to terminate the molecular data before the start of the script. In PDB files, this is done with the keyword END at the beginning of a new line. The script can make use of the keyword 'exit', which stops the script and allows the user to put other kinds of information at the end of the file. In RasTop 1.3.1 the 'exit' keyword is automatically added at the end.
<molecular coordinate data in format>
#!rasmol -rsm file
load <format> inline
<rasmol script to describe the molecule>
exit (optional)
The tag '#!rasmol -rsm file' is a required field and marks the beginning of the script. The <format> field indicates the format of the molecular coordinates. The 'exit' command at the end of the script is optional.
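The sketch below shows one way to detect this layout and separate the coordinates from the script. The description above has RasTop expect the load command on the line right after the tag, but the version 2.0 main script described below puts 'zap' and 'set connect on' first. The sketch is therefore more permissive: it accepts the first 'load <format> inline' line anywhere after the tag, and it allows a trailing comment on the tag line. `parse_rsm` and the regular expression are hypothetical and are not RasTop's implementation.

```python
import re
from typing import Optional

RSM_TAG = "#!rasmol -rsm file"
LOAD_INLINE = re.compile(r"^\s*load\s+(\S+)\s+inline\b", re.IGNORECASE)


def parse_rsm(text: str) -> Optional[tuple[str, list[str], list[str]]]:
    """Return (format, coordinate lines, script lines), or None if not an RSM file."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(RSM_TAG):                 # tag may carry a trailing comment
            for later in lines[i + 1:]:
                match = LOAD_INLINE.match(later)
                if match:
                    # coordinates precede the tag; the script follows it
                    return match.group(1), lines[:i], lines[i + 1:]
            return None                              # tag without 'load <format> inline'
    return None


text = "ATOM      1  N   ALA A   1\nEND\n#!rasmol -rsm file\nload pdb inline\ncartoon\nexit\n"
print(parse_rsm(text)[0])  # pdb
```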
Starting with version 2.0, RasTop saves multiple molecules in a single RSM script. The first molecule is saved in the main script <filename.rsm>; molecules of higher index are saved under the same filename followed by the suffix 'wn', where n is the molecule index (filenamew1, filenamew2, etc.). The following is a typical main script, annotated for better understanding:
#!rasmol -rsm file #required
zap
set connect on
load <format> inline #required
set title <title>
# Colour details
<...>
# Transformation
reset
rotate molecule #molecules are loaded in a world reset to origin
set picking ident #some commands are picking mode dependent!
set world size <max value> #make translation values coherent
zoom <value> #define zoom before translation
<...>
<...>
# Molecules
add <title>w2 #add multiple molecules
add <title>w3
add <title>w4
<...>
# World Transformation
refresh off #update translation but don't show it on screen
rotate world #start world loading
centre origin #only way to make zooming and rotation coherent
<...>
refresh off #update translation before centering
molecule <n> #select centre molecule
centre <...> #centre
molecule 1 #select first molecule
refresh #you must refresh
exit
Note: Rotation and centering commands are updated immediately; translation and zooming commands need an update (refresh) to take effect.
Scripting a world is not an easy task. RasTop version 2.0 is an attempt at world scripting, and there is no guarantee that future versions of RasMol will use the same set of keywords (see RasTop World Introduction for more details).
Clipboard formats are RasTop specific.
RasTop accepts the commands 'clipboard image', 'clipboard selected' and 'clipboard position' on the command line to copy the current image, the current atom selection or the current molecule position to the clipboard, respectively. The latter two are copied in text format, starting with the following tags:
#!rasmol -selection <rasmol script to describe the current atom selection>
#!rasmol -position <rasmol script to describe the current molecule or world position>
When the world is active, the world position is copied to the clipboard. When the molecule is active, the molecule position is copied to the clipboard.
The command 'clipboard paste' pastes the clipboard content. The clipboard content may also be pasted into a different running instance of RasTop.
RasTop recognizes the following three tags:
#!rasmol -script
#!rasmol -position
#!rasmol -selection
Therefore, it is possible to copy a script from a text editor and paste it into RasTop, provided that it starts with the '#!rasmol -script' tag.
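The sketch below classifies pasted text by which of the three tags it starts with. It returns None for text without a recognized tag. The name `clipboard_kind` is hypothetical, and skipping leading whitespace before the tag is an assumption rather than documented RasTop behaviour.

```python
from typing import Optional

RECOGNIZED_TAGS = ("#!rasmol -script", "#!rasmol -position", "#!rasmol -selection")


def clipboard_kind(text: str) -> Optional[str]:
    """Return 'script', 'position' or 'selection', or None if no tag is present."""
    head = text.lstrip()                    # assumption: leading whitespace is tolerated
    for tag in RECOGNIZED_TAGS:
        if head.startswith(tag):
            return tag[len("#!rasmol -"):]  # strip the common prefix
    return None


print(clipboard_kind("#!rasmol -script\nzap\n"))  # script
```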