Forensics, DNA Fingerprinting, and CODIS
DNA is present in nearly every cell of our bodies, and we leave cells behind everywhere we go without even realizing it. Flakes of skin, drops of blood, hair, and saliva all contain DNA that can be used to identify us. In fact, the study of forensics, commonly used by police departments and prosecutors around the world, frequently relies upon these small bits of shed DNA to link criminals to the crimes they commit. This fascinating science is often portrayed on popular television shows as a simple, exact, and infallible method of finding a perpetrator and bringing him or her to justice. In truth, however, teasing out a DNA fingerprint and determining the likelihood of a match between a suspect and a crime scene is a complicated process that relies upon probability to a greater extent than most people realize. Government-administered DNA databases, such as the Combined DNA Index System (CODIS), do help speed the process, but they also bring to light complex ethical issues involving the rights of victims and suspects alike. Thus, understanding the ways in which DNA evidence is obtained and analyzed, what this evidence can tell investigators, and how this evidence is used within the legal system is critical to appreciating the true ethical and legal impact of forensic genetics.
How Does DNA Identification Work?
Although the overwhelming majority of the human genome is identical across all individuals, there are regions of variation. This variation can occur anywhere in the genome, including areas that are not known to code for proteins. Investigation into these noncoding regions reveals repeated units of DNA that vary in length among individuals. Scientists have found that one particular type of repeat, known as a short tandem repeat (STR), is relatively easily measured and compared between different individuals. In fact, the Federal Bureau of Investigation (FBI) has identified 13 core STR loci that are now routinely used in the identification of individuals in the United States, and Interpol has identified 10 standard loci for the United Kingdom and Europe. Nine STR loci have also been identified for Indian populations.
As its name implies, an STR contains repeating units of a short (typically three- to four-nucleotide) DNA sequence. The number of repeats within an STR is referred to as an allele. For instance, the STR known as D7S820, found on chromosome 7, contains between 5 and 16 repeats of GATA. Therefore, there are 12 different alleles possible for the D7S820 STR. An individual with D7S820 alleles 10 and 15, for example, would have inherited a copy of D7S820 with 10 GATA repeats from one parent, and a copy of D7S820 with 15 GATA repeats from his or her other parent. Because there are 12 different alleles for this STR, there are 78 different possible genotypes, or pairs of alleles. Specifically, there are 12 homozygotes, in which the same allele is received from each parent, as well as 66 heterozygotes, in which the two alleles are different.
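The genotype count follows from simple combinatorics: a locus with n alleles has n homozygotes and n(n - 1)/2 heterozygotes. The short Python sketch below reproduces the D7S820 figures; the function name is illustrative and does not come from any forensic software.

```python
from itertools import combinations_with_replacement

def count_genotypes(n_alleles):
    """Return (homozygotes, heterozygotes, total) for a locus with n_alleles alleles."""
    homozygotes = n_alleles                           # same allele from both parents
    heterozygotes = n_alleles * (n_alleles - 1) // 2  # unordered pairs of distinct alleles
    return homozygotes, heterozygotes, homozygotes + heterozygotes

# D7S820: alleles are the repeat counts 5 through 16 inclusive
d7s820_alleles = list(range(5, 17))
homo, hetero, total = count_genotypes(len(d7s820_alleles))
print(homo, hetero, total)  # 12 66 78

# Cross-check by enumerating every unordered allele pair directly
assert total == len(list(combinations_with_replacement(d7s820_alleles, 2)))
```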
The Statistical Strength of a 13-STR Profile
Within the U.S., the 13-STR profile is a widely used means of identification, and this technology is now routinely employed to identify human remains, to establish or exclude paternity, or to match a suspect to a crime scene sample.
In order to utilize STR information as a means of human identification, the FBI established the frequency with which each allele of each of the 13 core STRs naturally occurs in people of different ethnic backgrounds. To this end, the FBI analyzed DNA samples from hundreds of unrelated Caucasian, African American, Hispanic, and Asian individuals. Assuming that all 13 STRs follow the principle of independent assortment (and they should, as they are scattered widely across the genome) and that the population randomly mates, a statistical calculation based upon the FBI-determined STR allele frequencies reveals that the probability of two unrelated Caucasians having identical STR profiles, or so-called "DNA fingerprints," is approximately 1 in 575 trillion (Reilly, 2001).
This very small number needs to be put into perspective. Note that this figure refers to pairs of people, and there are many pairs of people in the world. Indeed, for the 100 million Caucasians in the world, there are 5,000 trillion pairs of people, so roughly eight or nine pairs would be expected to match at the 13 STR loci. This predicted matching does not specify which profile is shared by two people, and the chance that anyone matches the particular profile associated with a crime is still very small. The distinction between two people sharing a profile and one person having a particular profile is an example of the so-called "birthday problem." Here, the probability that a person has a particular birthday is 1 in 365, ignoring February 29, but there is roughly a 50% chance that at least two people in a random group of 23 people share the same unspecified birthday (Weir, 2007).
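Both pieces of arithmetic in this paragraph can be checked directly. The sketch below uses the population size and match probability quoted in the text; the function names are illustrative, and the birthday calculation assumes 365 equally likely birthdays.

```python
from math import comb

def expected_matching_pairs(population, match_probability):
    """Expected number of matching pairs: number of pairs times per-pair probability."""
    return comb(population, 2) * match_probability

def shared_birthday_probability(group_size, days=365):
    """Probability that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for k in range(group_size):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

population = 100_000_000          # figure quoted in the text
match_probability = 1 / 575e12    # 1 in 575 trillion (Reilly, 2001)

print(f"{comb(population, 2):.2e} pairs")   # about 5.00e+15, i.e. 5,000 trillion
print(round(expected_matching_pairs(population, match_probability), 1))  # 8.7
print(round(shared_birthday_probability(23), 3))  # 0.507
```

The expected count of about 8.7 is the "eight or nine pairs" in the text, and 23 is the smallest group size for which the shared-birthday probability exceeds one half.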
DNA Extraction and Analysis
To perform a forensic DNA analysis, DNA is first extracted from a sample. Just one nanogram of DNA is usually a sufficient quantity to provide good data. The region containing each STR is then PCR amplified and resolved according to size, giving an overall profile of STR sizes (alleles). The 13 core STRs vary in length from 100 to 300 bases, allowing even partially degraded DNA samples to be successfully analyzed. The costs of analysis, both in time and reagents, are significantly reduced by the amplification of all 13 STRs in just two multiplex PCR reactions.
Depending on the complexity of the repeat unit, the different alleles of an STR can vary by as little as a single nucleotide. For instance, the aforementioned D7S820 STR is relatively simple and contains between 5 and 16 repeats of GATA. Another STR, D21S11, has a more complex repeat pattern consisting of a mixture of tetra- and trinucleotides, and it therefore has alleles that vary in size by a single base pair. Because of the need to differentiate single-base differences, PCR products are typically resolved using automated DNA sequencing technologies with software that recognizes allele patterns by comparison to a known "ladder."
Making an STR Match
In order to match, for example, crime scene evidence to a suspect, a lab would determine the allele profile of the 13 core STRs for both the evidence sample and the suspect's sample. If the STR alleles do not match between the two samples, the individual would be excluded as the source of the crime scene evidence. However, if the two samples have matching alleles at all 13 STRs, a statistical calculation would be made to determine the frequency with which this genotype is observed in the population. Such a probability calculation takes into account the frequency with which each STR allele occurs in the individual's ethnic group. Given the population frequency of each STR allele, a simple Hardy-Weinberg calculation gives the frequency of the observed genotype for each STR. Multiplying together the frequencies of the individual STR genotypes then gives the overall profile frequency.
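The Hardy-Weinberg step can be sketched as follows: a homozygous genotype has frequency p², a heterozygous genotype has frequency 2pq, and the per-locus results are multiplied together. The allele frequencies below are hypothetical values chosen only for illustration (they are not FBI figures), and the function names are likewise assumptions. Real casework calculations may apply further corrections that are not described here.

```python
def genotype_frequency(allele_a, allele_b, allele_freqs):
    """Hardy-Weinberg genotype frequency at one locus: p^2 or 2pq."""
    p = allele_freqs[allele_a]
    if allele_a == allele_b:
        return p * p              # homozygote
    q = allele_freqs[allele_b]
    return 2 * p * q              # heterozygote

def profile_frequency(profile, freqs_by_locus):
    """Multiply per-locus genotype frequencies (assumes the loci are independent)."""
    total = 1.0
    for locus, (allele_a, allele_b) in profile.items():
        total *= genotype_frequency(allele_a, allele_b, freqs_by_locus[locus])
    return total

# Hypothetical allele frequencies, for illustration only
freqs_by_locus = {
    "D7S820": {10: 0.25, 11: 0.20},
    "D13S317": {12: 0.30},
}
profile = {"D7S820": (10, 11), "D13S317": (12, 12)}

for locus, (a, b) in profile.items():
    print(locus, genotype_frequency(a, b, freqs_by_locus[locus]))
print("profile frequency:", profile_frequency(profile, freqs_by_locus))
```

With these made-up inputs the heterozygous D7S820 genotype has frequency 2 × 0.25 × 0.20 = 0.10 and the homozygous D13S317 genotype has frequency 0.30² = 0.09, so the two-locus profile frequency is about 0.009.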
Table 1: Example DNA Profiles Showing the STR Alleles for Each Sample and the Genotype Frequency of Suspect B for Each STR Locus
STR Locus | Evidence Sample | Suspect A | Suspect B | Suspect B's Genotype Frequency for Each STR |
D3S1358 | 15, 17 | 17, 17 | 15, 17 | 0.13 |
vWA | 15, 16 | 18, 19 | 15, 16 | 0.22 |
FGA | 23, 27 | 21, 23 | 23, 27 | 0.31 |
D8S1179 | 12, 13 | 14, 15 | 12, 13 | 0.34 |
D21S11 | 28, 30 | 27, 30.2 | 28, 30 | 0.06 |
D18S51 | 12, 18 | 14, 18 | 12, 18 | 0.11 |
D5S818 | 13, 13 | 9, 12 | 13, 13 | 0.29 |
D13S317 | 12, 12 | 12, 12 | 12, 12 | 0.21 |
D7S820 | 10, 11 | 9, 10 | 10, 11 | 0.26 |
CSF1PO | 8, 11 | 11, 12 | 8, 11 | 0.18 |
TPOX | 7, 8 | 8, 8 | 7, 8 | 0.30 |
TH01 | 9.3, 9.3 | 6, 9.3 | 9.3, 9.3 | 0.38 |
D16S539 | 9, 13 | 11, 12 | 9, 13 | 0.10 |
In the fictional case shown in Table 1, Suspect A is excluded as the source of the crime scene sample. Suspect B, on the other hand, matches the crime scene sample at all 13 STRs. A calculation of the frequency of Suspect B's genotype, based upon the STR allele frequencies within Suspect B's ethnic group, reveals that the likelihood that a random member of this ethnic group has this profile is about 1 in 1.5 billion. It is important to understand that this number is the probability of seeing this DNA profile if the crime scene evidence did not come from the suspect but from some other person. To regard the number as the probability that the suspect is the source of the crime scene evidence is to commit the "prosecutor's fallacy"; this is the logical fallacy of equating "the probability that an animal has four legs if it is an elephant" (high) with "the probability that an animal is an elephant if it has four legs" (low). To go from one probability to another requires the use of Bayes' theorem and some prior (before the DNA profile) probabilities of the suspect being the source of the evidence. In addition, it is important to note that the probability of 1 in 1.5 billion is substantially increased if the actual source is a person related to the suspect, especially a sibling.
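The reasoning in this example can be traced in a few lines of Python. The sketch below transcribes Table 1, lists the loci at which each suspect differs from the evidence, multiplies Suspect B's genotype frequencies, and then applies Bayes' theorem. The prior of 1 in 1,000,000 is a hypothetical value chosen only to show the mechanics, and the Bayes step assumes that the true source always matches and that alternative sources are unrelated to the suspect.

```python
from math import prod

# (locus, evidence, suspect A, suspect B, suspect B's genotype frequency), from Table 1
TABLE_1 = [
    ("D3S1358", (15, 17), (17, 17), (15, 17), 0.13),
    ("vWA", (15, 16), (18, 19), (15, 16), 0.22),
    ("FGA", (23, 27), (21, 23), (23, 27), 0.31),
    ("D8S1179", (12, 13), (14, 15), (12, 13), 0.34),
    ("D21S11", (28, 30), (27, 30.2), (28, 30), 0.06),
    ("D18S51", (12, 18), (14, 18), (12, 18), 0.11),
    ("D5S818", (13, 13), (9, 12), (13, 13), 0.29),
    ("D13S317", (12, 12), (12, 12), (12, 12), 0.21),
    ("D7S820", (10, 11), (9, 10), (10, 11), 0.26),
    ("CSF1PO", (8, 11), (11, 12), (8, 11), 0.18),
    ("TPOX", (7, 8), (8, 8), (7, 8), 0.30),
    ("TH01", (9.3, 9.3), (6, 9.3), (9.3, 9.3), 0.38),
    ("D16S539", (9, 13), (11, 12), (9, 13), 0.10),
]
SUSPECT_A, SUSPECT_B = 2, 3  # column positions within each row

def mismatched_loci(column):
    """Loci where the suspect's genotype differs from the evidence sample."""
    return [row[0] for row in TABLE_1 if sorted(row[column]) != sorted(row[1])]

def posterior_source_probability(prior, random_match_probability):
    """Bayes' theorem: P(suspect is the source | profiles match)."""
    return prior / (prior + (1 - prior) * random_match_probability)

print("Suspect A differs at", len(mismatched_loci(SUSPECT_A)), "loci")  # 12: excluded
print("Suspect B differs at", len(mismatched_loci(SUSPECT_B)), "loci")  # 0: not excluded

suspect_b_frequency = prod(row[4] for row in TABLE_1)
print(f"about 1 in {1 / suspect_b_frequency:.2e}")  # roughly 1 in 1.5 billion

hypothetical_prior = 1e-6  # illustration only
print(posterior_source_probability(hypothetical_prior, suspect_b_frequency))
```

The posterior depends on the prior as well as on the profile frequency, which is why the 1 in 1.5 billion figure cannot by itself be read as the probability that the suspect is, or is not, the source.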
Confounding Circumstances of DNA Profiling
Databases of DNA Profiles
DNA evidence is used in court almost routinely to connect suspects to crime scenes, to exonerate people who were wrongly convicted, and to establish or exclude paternity. DNA data is considered to be more reliable than many other kinds of crime scene evidence. For this reason, tissue samples are frequently collected by law enforcement officials from those individuals who are implicated (even loosely) in a crime. The unique profile of each DNA sample is analyzed for comparison to crime scene evidence. The DNA profile can also be stored in a database to compare with crime scene evidence from past and future crimes.
But under what circumstances should an individual be compelled to provide a sample for a DNA database? Originally, statutes mandating collection of tissue for DNA typing applied only to those people convicted of sex crimes or murder. This was due to the fact that there was usually an abundance of DNA at the scene of a rape or murder to compare to a suspect's DNA. More recent DNA collection laws have applied to all convicted felons, reflecting advances in DNA technologies that allow sufficient DNA samples to be obtained from scenes of more common crimes, such as burglary (Figure 1).
Forced DNA Profiling
Those opposed to DNA banking for law enforcement purposes note that arrestees are often found innocent of crimes. Retention of an innocent person's DNA can be seen as an intrusion on personal privacy and a violation of civil liberties. It is interesting to note that in the United States, under any other circumstance, the provision of a DNA sample would require informed consent and other protections for the donor. In contrast, an arrestee's DNA profile, once entered into a database, can be accessed by police, forensic scientists, or researchers without the consent of the donor. Another problem with the DNA database system is an exacerbation of the ethnic bias already present in the criminal justice system. If people from one ethnic group are more often arrested, tried, and convicted of felonies, they will be overrepresented in the database, potentially leading to even more arrests within that ethnic group.
Proponents of DNA databanks argue that major crimes often involve people who have also committed other offenses. Having DNA banked could potentially make it easier to identify suspects. It could also significantly cut down the cost of an investigation if an automated computer search could eliminate suspects or link a suspect to a crime scene.
Does the DNA Databank System Help Solve Crimes?
The current DNA database maintained by the FBI, known as the Combined DNA Index System (CODIS), contains case samples (DNA samples from crime scenes or "rape kits") and individuals' samples (collected from convicted felons or arrestees) that are compared automatically by the system's software as new samples are entered. As of February 2007, CODIS had produced over 45,400 "hits," which assisted in more than 46,300 investigations (Federal Bureau of Investigation, n.d.). However, contrary to how DNA analysis is portrayed on popular television shows, DNA samples are not analyzed within the course of an hour. Rather, the U.S. currently has an enormous backlog of samples waiting to be typed and entered into the database. Some of these samples are from cases that have outlasted their statutes of limitation, so even if these samples could help solve a crime, the crime can no longer be tried.
This delay brings up the dilemma of the validity of statutes of limitation. These statutes were established at a time when large quantities of physical evidence were required to match a suspect to a sample and when extended time periods significantly decreased law enforcement's ability to find a match, as well as the likelihood of successful prosecution. With the advent of DNA databanks and the possibility of storing samples indefinitely, the very notion of a statute of limitation now seems extremely outdated.
Of course, there are many other debatable issues concerning DNA banking. For instance, should the original tissue sample be stored indefinitely after the DNA profile has been entered into the database? Detractors note threats to genetic privacy, but proponents argue that future DNA typing methods will undoubtedly be developed and that old samples might have to be reanalyzed using new techniques. Also at issue is the reopening of old cases on the basis of new (DNA-based) evidence. Which cases should be eligible for reanalysis in light of this new evidence? Can equitable rules be established to allow reexamination of cases that were analyzed with less powerful lab techniques? Further public awareness of the power of DNA forensic technology will help lawmakers decide these issues in a way that seeks to strike a balance between protecting individuals' genetic privacy and protecting innocent citizens from crime.
Conclusion
References and Recommended Reading
Federal Bureau of Investigation. "CODIS-NDIS Statistics." (accessed August 1, 2008)
Jobling, M., et al. Encoded evidence: DNA in forensic analysis. Nature Reviews Genetics 5, 739–751 (2004) doi:10.1038/nrg1455
Oak Ridge National Laboratory. "DNA Forensics." (accessed August 1, 2008)
National Conference of State Legislatures. "DNA Databanks." (accessed August 1, 2008)
Reilly, P. Legal and public policy issues in DNA forensics. Nature Reviews Genetics 2, 313–317 (2001) doi:10.1038/35066091
Weir, B. The rarity of DNA profiles. Annals of Applied Statistics 1, 358–370 (2007) doi:10.1214/07-AOAS128