RasMol atom expressions uniquely identify an arbitrary group of atoms within a molecule. Atom expressions are composed of either primitive expressions, predefined sets, comparison operators, 'within' expressions, or logical (boolean) combinations of the above expression types.
The logical operators allow complex queries to be constructed out of simpler ones using the standard boolean connectives 'and', 'or' and 'not'. These may be abbreviated by the symbols "&", "|" and "!", respectively. Parentheses (brackets) may be used to alter the precedence of the operators. For convenience, a comma may also be used for boolean disjunction.
The atom expression is evaluated for each atom, hence 'protein and backbone' selects protein backbone atoms, not the protein and [nucleic] acid backbone atoms!
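The per-atom evaluation described above can be illustrated with a minimal Python sketch. The atom records and their field names (`protein`, `backbone`) are hypothetical stand-ins for illustration, not RasMol's internal data structures.

```python
# Hypothetical atom records; field names are illustrative only.
atoms = [
    {"name": "CA", "protein": True,  "backbone": True},
    {"name": "CB", "protein": True,  "backbone": False},
    {"name": "P",  "protein": False, "backbone": True},  # nucleic acid backbone
]

def select(atoms, predicate):
    """Return the atoms for which the per-atom predicate is true."""
    return [atom for atom in atoms if predicate(atom)]

# 'protein and backbone': both conditions must hold for the same atom.
both = select(atoms, lambda a: a["protein"] and a["backbone"])

# 'protein or backbone' (also written 'protein, backbone'): either may hold.
either = select(atoms, lambda a: a["protein"] or a["backbone"])

print([a["name"] for a in both])
print([a["name"] for a in either])
```

Because the test is applied to each atom in turn, 'and' narrows a selection and 'or' widens it.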
RasMol primitive expressions are the fundamental building blocks of atom expressions. There are two types of primitive expression. The first type is used to identify a given residue number or range of residue numbers. A single residue is identified by its number (position in the sequence), and a range is specified by lower and upper bounds separated by a hyphen character. For example, 'select 5,6,7,8' is equivalent to 'select 5-8'. Note that this selects the given residue numbers in all macromolecule chains.
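As a rough illustration of how a list of residue numbers and a range describe the same set, the following Python sketch expands such a specification. The function name `residue_numbers` is hypothetical, and the sketch assumes non-negative residue numbers only; it is not RasMol's own parser.

```python
def residue_numbers(spec):
    """Expand a comma-separated list of residue numbers and ranges into a set.

    Assumes non-negative numbers, so a hyphen always separates two bounds.
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, _, high = part.partition("-")
            # Ranges are inclusive of both bounds.
            selected.update(range(int(low), int(high) + 1))
        else:
            selected.add(int(part))
    return selected

print(residue_numbers("5,6,7,8") == residue_numbers("5-8"))
```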
The second type of primitive expression specifies a sequence of fields that must match for a given atom. The first part specifies a residue (or group of residues) and an optional second part specifies the atoms within those residues. The first part consists of a residue name, optionally followed by a residue number and/or chain identifier.
A residue name typically consists of up to three alphabetic characters, which are case insensitive. Hence the primitive expressions 'SER' and 'ser' are equivalent, identifying all serine residues. Residue names that contain non-alphabetic characters, such as sulphate groups, may be delimited using square brackets, i.e. '[SO4]'.
The residue number is intended to be the residue's position in the macromolecule sequence, but negative sequence numbers, gaps in numbering, or even reverse numbering are permitted in the PDB format. Care must be taken when specifying both residue name and number. If the group at the specified position isn't the specified residue then no atoms are selected.
The chain identifier is typically a single case-insensitive alphabetic or numeric character. Numeric chain identifiers must be distinguished or separated from residue numbers by a colon character. For example, "SER70:A" for the alphabetic chain identifier, "A", or "SER70:1" for the numeric chain identifier, "1".
A second colon is used to specify an alternate conformer or an NMR model. For example the expression "CYS32:A:25.SG" denotes the gamma sulfur of residue cysteine 32 in chain A of model 25.
The second part consists of a period character followed by an atom name. An atom name may be up to four alphabetic or numeric characters. An optional semicolon or a slash followed by a conformation identifier or a model number may also be appended to the atom name.
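The field layout described above can be sketched as a regular expression. This is a deliberately partial illustration in Python: it covers only the form residue name, optional residue number, optional colon plus chain identifier, and optional period plus atom name. It does not handle wildcards, conformer or model suffixes, or chain identifiers written without a colon. The names `PRIMITIVE` and `parse_primitive` are hypothetical.

```python
import re

# Simplified pattern: name, number, ':chain' and '.atom' are all optional.
PRIMITIVE = re.compile(
    r"(?P<residue>[A-Za-z]{1,3}|\[[^\]]+\])?"  # e.g. SER or [SO4]
    r"(?P<number>-?\d+)?"                      # residue number, may be negative
    r"(?::(?P<chain>[A-Za-z0-9]))?"            # colon, then one-character chain
    r"(?:\.(?P<atom>[A-Za-z0-9]{1,4}))?"       # period, then atom name
)

def parse_primitive(text):
    """Split a simple primitive expression into its fields."""
    match = PRIMITIVE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported primitive expression: {text!r}")
    fields = match.groupdict()
    if fields["number"] is not None:
        fields["number"] = int(fields["number"])
    return fields

print(parse_primitive("SER70:A"))
print(parse_primitive("cys.sg"))
```

Fields that are absent from the expression come back as `None`, which a matcher could treat as "any value".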
An asterisk may be used as a wild card for a whole field and a question mark as a single character wildcard. The wildcard "*" may be dropped in residue identifier, e.g. ":A" for chain A, ":A:4" for chain A of model 4, or "::4" for all atoms in all chains of NMR model 4.
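The two wildcards can be sketched for a single field as follows. This Python example assumes that "*" stands for the whole field and that "?" matches exactly one character, with case ignored; the helper name `field_matches` is hypothetical, and RasMol's handling of padded or shorter names may differ.

```python
def field_matches(pattern, value):
    """Match one field against a pattern using '*' and '?' wildcards."""
    if pattern == "*":
        # Whole-field wildcard: any value is accepted.
        return True
    if len(pattern) != len(value):
        return False
    # '?' accepts any single character; other characters compare
    # case-insensitively.
    return all(p == "?" or p.upper() == v.upper()
               for p, v in zip(pattern, value))

for residue in ("ASN", "ASP", "ALA"):
    print(residue, field_matches("as?", residue))
```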
Parts of a molecule may also be distinguished using equality, inequality and ordering operators on their properties. The format of such a comparison expression is a property name, followed by a comparison operator and then an integer value.
The atom properties that may be used in RasMol are 'atomno' for the atom serial number, 'elemno' for the atom's atomic number (element), 'resno' for the residue number, 'radius' for the spacefill radius in RasMol units (or zero if not represented as a sphere) and 'temperature' for the PDB isotropic temperature value.
The equality operator is denoted by either "=" or "==". The inequality operator is denoted by "<>", "!=" or "/=". The ordering operators are "<" for less than, "<=" for less than or equal to, ">" for greater than, and ">=" for greater than or equal to.
Examples:
    resno < 23
    temperature >= 900
    atomno == 487
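The operator spellings listed above map naturally onto a lookup table. The following Python sketch shows one way to evaluate a comparison expression against a single atom; the `COMPARATORS` table, the `compare` helper and the sample property values are all hypothetical, chosen only for illustration.

```python
import operator

# Every accepted spelling of each comparison operator.
COMPARATORS = {
    "=": operator.eq, "==": operator.eq,
    "<>": operator.ne, "!=": operator.ne, "/=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def compare(atom, prop, op, value):
    """Evaluate '<prop> <op> <value>' for a single atom record."""
    return COMPARATORS[op](atom[prop], value)

# Illustrative integer property values for one atom.
atom = {"atomno": 487, "elemno": 6, "resno": 22, "radius": 0, "temperature": 1250}

print(compare(atom, "resno", "<", 23))
print(compare(atom, "temperature", ">=", 900))
print(compare(atom, "atomno", "==", 487))
```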
A RasMol 'within' expression allows atoms to be selected on their proximity to another set of atoms. A 'within' expression takes two parameters separated by a comma and surrounded by parentheses. The first argument is the "cut-off" distance of the within expression and the second argument is any valid atom expression. The cut-off distance is given either as an integer, which is interpreted in RasMol units, or as a value containing a decimal point, which is interpreted in Ångstroms. An atom is selected if it is within the cut-off distance of any of the atoms defined by the second argument. Because the second argument is itself an atom expression, 'within' expressions may be nested to build complex expressions.
For example, the command 'select within(3.2,backbone)' selects any atom within a 3.2 Ångstrom radius of any atom in a protein or nucleic acid backbone. 'Within' expressions are particularly useful for selecting the atoms around an active site.
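The proximity test can be sketched in Python as below. Several details are assumptions: the conversion factor of 250 RasMol units per Ångstrom is the value the RasMol manual gives elsewhere, the boundary is treated as inclusive, and the atom records with an `xyz` field are hypothetical.

```python
import math

# Assumed conversion: 250 RasMol units per Angstrom.
RASMOL_UNITS_PER_ANGSTROM = 250

def cutoff_in_angstroms(text):
    """Interpret a cut-off: a decimal point means Angstroms, else RasMol units."""
    if "." in text:
        return float(text)
    return int(text) / RASMOL_UNITS_PER_ANGSTROM

def within(cutoff, targets, atoms):
    """Return atoms lying within `cutoff` Angstroms of any target atom.

    The boundary is treated as inclusive here, which is an assumption.
    """
    return [
        atom for atom in atoms
        if any(math.dist(atom["xyz"], t["xyz"]) <= cutoff for t in targets)
    ]

targets = [{"name": "CA", "xyz": (0.0, 0.0, 0.0)}]
atoms = [
    {"name": "near", "xyz": (3.0, 0.0, 0.0)},
    {"name": "far", "xyz": (4.0, 0.0, 0.0)},
]

selected = within(cutoff_in_angstroms("3.2"), targets, atoms)
print([a["name"] for a in selected])
```

A nested 'within' expression corresponds to passing the result of one `within` call as the `targets` of another.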
The following table gives some useful examples of RasMol atom expressions.
Expression    Interpretation
*             All atoms
cys           Atoms in cysteines
hoh           Atoms in heterogeneous water molecules
as?           Atoms in either asparagine or aspartic acid
*120          Atoms at residue 120 of all chains
*p            Atoms in chain P
*.n?          Nitrogen atoms
cys.sg        Sulphur atoms in cysteine residues
ser70.c?      Carbon atoms in serine-70
hem*p.fe      Iron atoms in the Heme groups of chain P
*.*;A         All atoms in alternate conformation A
*/4           All atoms in model 4
Examples using combinations of basic expressions:
    backbone and not helix
    within( 8.0, ser70 )
    not (hydrogen or hetero)
    not *.FE and hetero
    8, 12, 16, 20-28
    arg, his, lys