Innate, page 26
Rare mutations are hard to find with traditional genetic methods, which have relied on either an obvious and characteristic syndromic presentation to recognize a specific condition, or on very large families with many affected individuals to map a specific gene. But new technologies, applied on a massive scale, are making the identification of rare, high-risk mutations much easier.
The first of these technologies is called “comparative genomic hybridization” and it relies on genomic “microarrays.” It is an extremely powerful and cost-effective method to detect small deletions or duplications of segments of chromosomes across large numbers of patients. In this technique, a patient’s DNA is collected and labeled with a fluorescent dye. A control person’s DNA is also collected and labeled with a differently colored dye. Then both sets of DNA are “hybridized” to an array of DNA from the human genome, which is dotted in an orderly way in small segments onto a glass slide. “Hybridization” means that the DNA from a patient’s chromosome 1 will stick to the DNA on the slide that corresponds to chromosome 1, and so on. By comparing the intensity of the two dyes, it is possible to identify small regions of the genome where the patient has either less DNA or more DNA than the control. Usually this means the patient has a deletion of a small segment, so that there is only one copy of that region, or a duplication, so that there are three copies.
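The dye comparison can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the control has the usual two copies of each region, and the probe names, intensity values, and cutoffs are hypothetical rather than taken from any real array platform. It converts the ratio of patient to control signal into a log2 value, so that one copy instead of two gives -1 and three copies instead of two gives about +0.58.

```python
import math

def expected_log2_ratio(patient_copies: int, control_copies: int = 2) -> float:
    """Idealized log2 intensity ratio for a region, given copy numbers."""
    return math.log2(patient_copies / control_copies)

def call_copy_state(log2_ratio: float,
                    loss_cutoff: float = -0.5,
                    gain_cutoff: float = 0.3) -> str:
    """Classify a region from its log2 ratio. The cutoffs are assumed values."""
    if log2_ratio <= loss_cutoff:
        return "deletion"
    if log2_ratio >= gain_cutoff:
        return "duplication"
    return "normal"

# Hypothetical (patient dye, control dye) intensities for three array spots.
probes = {
    "probe_A": (1020.0, 1000.0),  # roughly equal signal
    "probe_B": (510.0, 1000.0),   # about half the control signal
    "probe_C": (1490.0, 1000.0),  # about 1.5 times the control signal
}

calls = {name: call_copy_state(math.log2(patient / control))
         for name, (patient, control) in probes.items()}

for name, state in calls.items():
    print(name, state)
```

Real array data are noisy, so in practice calls are made over runs of neighboring spots rather than one spot at a time; the single-spot version here is only meant to show where the "less DNA or more DNA" signal comes from.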
Some such copy number variants (CNVs) arise at an appreciable frequency in the population, due to small repeated sequences of DNA, which confuse the machinery that recombines and separates chromosomes during DNA replication. This means it is possible to identify many people carrying the exact same deletion or duplication, so that its effects can be measured statistically, not just in single cases.
It turns out we all carry some background level of CNVs. Indeed, though we may only have a few of these, they actually make up a large part of the genetic differences between people because they encompass so many bases of DNA sequence. That said, most of them don’t seem to do anything, mainly because most of them arise in the 97% of the genome that does not encode proteins (i.e., they don’t affect genes). But when they do affect genes they can have very deleterious consequences.
When researchers compared autism patients with controls, they found that the patients had a significant excess of CNVs. By looking across many hundreds of patients they could identify particular CNVs that were at much higher frequency in autism patients than in controls, indicating that they dramatically increased risk of the condition. Interestingly, when the same kinds of studies were carried out with patients with schizophrenia or epilepsy or developmental delay or intellectual disability, many of the same CNVs were found. There is now a very long list of these pathogenic CNVs—they include several well-known examples, such as the deletion at 22q11.2 (referring to a specific genomic position on chromosome 22), which is now known to be the cause of what used to be called velo-cardio-facial syndrome, as well as deletions or duplications at 1q21.1, 3q29, 15q11.2 (associated with Angelman and Prader–Willi syndromes), 16p11.2, and many others (see figure 10.1).
Figure 10.1 Neurodevelopmental disorders. The idiopathic pools of autism (ASD), epilepsy (E), intellectual disability (ID), or schizophrenia (SZ) have been shrinking as more and more specific genetic causes are identified, using new technologies such as comparative genomic hybridization and exome or genome sequencing. (Modified from K. J. Mitchell, “The Genetic Architecture of Neurodevelopmental Disorders,” in The Genetics of Neurodevelopmental Disorders, ed. K. J. Mitchell [Hoboken, NJ: Wiley Blackwell, 2015].)
Each of these is very rare, usually carried by fewer than 1 in 1,000 people. Collectively, the known CNVs account for 1%–2% of cases of schizophrenia and perhaps 5% of cases of autism. Of course, there may be many other such mutations that we have not yet identified, especially if they are individually rarer or have smaller effects on risk.
The degree of risk associated with each CNV is strongly correlated with whether it tends to arise de novo or is inherited from a parent. This can be determined by also characterizing the DNA of the patient’s parents to see if either of them carries the same CNV. Mutations that confer very high risk and cause more severe illness (such as autism and intellectual disability) almost always arise de novo, because affected people tend not to have children, while less severe ones are more often inherited. Determining this fact can obviously have very important consequences for future reproductive decisions in a family.
POINT MUTATIONS
CNVs are just one type of mutation that can cause disease—we happen to know a lot about them because they are easy to detect, thanks to the development of genomic microarrays. More importantly, we can see that specific CNV mutations increase risk of neuropsychiatric disorders because they tend to recur over and over again at the same spots in the genome. That means researchers could look across many people with the exact same mutation and see that it was greatly enriched in people with disease.
That job gets a lot tougher when we are talking about point mutations, that is, changes to a single letter of the DNA sequence. These occur, at random, all over the genome, every time the DNA is replicated, including when sperm or egg cells are being made. Fortunately, most such copying errors are corrected by a dedicated set of proofreading and DNA repair enzymes, but some creep past that system and become new genetic variants in the population. Up until the past few years, we had no good way of detecting these, unless we had a reason to look in one particular part of the genome (say, from segregation patterns in a large pedigree with many affected individuals). But for most cases of neuropsychiatric disorders we have no such reason. That is, there is every reason to think there is a causal mutation somewhere in the three billion letters of the person’s genome, but we have no reason to look in one place rather than another.
That means we need to sequence the whole thing. We need to read the entire code of an individual’s genome and then compare that to some reference (or to a large number of other people’s genomes) to see where they may have a difference that could be causing disease. This is where the pace of technological change over just the past few years has been truly transformative.
The first human genome to be sequenced—“the” human genome of the Human Genome Project (really assembled from five different people)—took 10 years to complete, with a final draft published in 2003. It involved hundreds of researchers from all over the globe and cost several billion dollars. There were warehouses of sequencing machines working night and day and enormous banks of computers required to process all that data. As I am writing now, in 2017, it is possible to sequence a human genome in a day for under $1,000. Much of this can now be carried out on machines that fit in the palm of your hand and plug directly into your laptop.
This has completely changed genetic research and is poised to change medicine. By sequencing the genomes of thousands of people with intellectual disability, developmental delay, epilepsy, autism, schizophrenia, or related conditions, researchers have been able to detect multiple people with very rare mutations in the same genes. The problem with just looking at the genome sequence of any individual patient is that every person carries a couple of hundred serious mutations that disrupt a gene, altering the protein it encodes or blocking the expression of it altogether. Perhaps only one of these mutations is actually contributing to high risk of the disease, but recognizing which one among all the mutations we all carry is almost impossible, if you only have one patient’s sequence. But once you start sequencing hundreds of patients you begin to see repeat hits in the same genes, more than you would expect by chance.
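What "more than you would expect by chance" means can be sketched with a simple Poisson calculation. This is a rough illustration, not the method used in any particular study: the per-gene mutation rate, the cohort size, and the count of roughly 20,000 genes used for the multiple-testing correction are all assumed values chosen for the example.

```python
import math

def poisson_tail(k: int, lam: float, max_terms: int = 500) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total = 0.0
    i = k
    for _ in range(max_terms):
        total += term
        i += 1
        term *= lam / i  # next Poisson term from the previous one
    return min(total, 1.0)

# Assumed inputs (hypothetical, for illustration only).
rate_per_person = 2e-5   # chance of a damaging new mutation in this gene
n_patients = 2000        # size of the sequenced cohort
n_genes_tested = 20000   # used for a Bonferroni-style threshold

expected_hits = rate_per_person * n_patients
threshold = 0.05 / n_genes_tested

p_one_hit = poisson_tail(1, expected_hits)
p_four_hits = poisson_tail(4, expected_hits)

print(f"expected hits by chance: {expected_hits:.3f}")
print(f"P(>=1 hit)  = {p_one_hit:.3g}, significant: {p_one_hit < threshold}")
print(f"P(>=4 hits) = {p_four_hits:.3g}, significant: {p_four_hits < threshold}")
```

Under these assumptions a single hit in a gene is unremarkable, because some gene or other will be hit in a cohort this size, whereas four independent hits in the same gene would be very hard to explain by chance.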
Those efforts have only begun to be scaled up over the past couple of years but are already revealing hundreds of new genetic disorders. Most of these affect genes directly involved in or required for neural development. Each of these conditions is extremely rare, responsible for less than 1% of cases of the general clinical categories listed above. But collectively they are common—much more common than we ever realized. The ones that have been easiest to spot are, not surprisingly, the ones that cause the severest forms of disease. Many such cases are caused by de novo mutations, for the same reason we discussed in relation to CNVs—people with severe neurodevelopmental disorders tend not to have children, so mutations that cause high risk of such conditions will hardly ever be inherited.
De novo mutations are thus far more likely to cause disease than inherited mutations, which makes it easier to recognize them as the culprits. They can be detected by sequencing a person’s DNA along with that of his or her parents. On average, we each have about 70 new mutations that were not present in our parents’ genomes. Because these occur at random, and because only ~3% of our DNA actually comprises genes, most of them won’t have any effect. The number of de novo mutations that actually hit a gene in any individual is around 1 (it ranges from 0 to 2, compared with about 200 inherited, gene-disrupting mutations). But if you’re unlucky, that gene may be one of the several thousand absolutely required (in two working copies) for normal brain development or function.
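The logic of detecting de novo mutations from a parent and child trio amounts to a set difference: keep the variants seen in the child that appear in neither parent. The sketch below assumes each variant is represented as a (chromosome, position, reference base, alternate base) tuple; the coordinates are made up, and the function name is hypothetical.

```python
from typing import Set, Tuple

# A variant is (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]

def find_de_novo(child: Set[Variant],
                 mother: Set[Variant],
                 father: Set[Variant]) -> Set[Variant]:
    """Return variants present in the child but in neither parent."""
    return child - mother - father

# Hypothetical variant calls for one family.
mother = {("chr1", 1_000_100, "A", "G"), ("chr7", 5_500_200, "C", "T")}
father = {("chr1", 1_000_100, "A", "G"), ("chr12", 9_900_300, "G", "A")}
child = {
    ("chr1", 1_000_100, "A", "G"),    # inherited (carried by both parents)
    ("chr12", 9_900_300, "G", "A"),   # inherited from the father
    ("chr2", 42_000_400, "C", "T"),   # seen in neither parent
}

de_novo = find_de_novo(child, mother, father)
print(de_novo)
```

With real sequence data, a variant can also look new simply because it was missed in a parent or miscalled in the child, so candidate de novo calls generally need additional quality checks before they are trusted.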
One of the striking findings from these kinds of sequencing studies is that most de novo mutations happen in the paternal germline (about 75%). There’s a good reason for this: in a man’s testes, there are stem cells that continue to divide throughout his life—each time they do that there is a small chance of a new mutation happening. Over time, these mutations accumulate in the stem cells and show up in the sperm. By contrast, females are born with all the eggs they will ever produce, so new mutations of this type do not accumulate with age in females (though the chance of abnormalities involving the segregation of whole chromosomes does increase with age). The number of de novo mutations in individuals is therefore linearly related to their fathers’ age when they were conceived—offspring born to 40-year-old fathers have about twice as many new mutations as those born to 20-year-old fathers. Not surprisingly, paternal age is also strongly correlated with risk of genetic disease in the offspring. This has long been known for rare genetic conditions of all sorts, but has also recently been recognized for common conditions like autism and schizophrenia, where risk to offspring of fathers over 45 is about four times that of offspring of fathers under 25.
Recent data implicate one other important kind of mutation—ones that happen in the developing embryo itself, known as somatic mutations (because they happen in the body, or soma, rather than in the germline). Mutations that arise in a single cell of the early embryo may be inherited through cell divisions by a significant proportion of the cells of the body, including the brain. If these mutations disrupt development then they may result in a neurodevelopmental disease even if they are “mosaic,” or present in only some of a person’s cells. Presumed pathogenic mutations of this type have been found in a small percentage of autism patients.
A SPECTRUM OF GENETIC EFFECTS
De novo mutations are the easiest ones to recognize as pathogenic (contributing to disease) because we have fewer of them and they are likely to cause the most severe effects. They explain many of the sporadic cases of disease with no family history. On the other hand, many cases of neurodevelopmental disorders are caused by inherited mutations, which is why they also tend to run in families. These are harder to identify, because they tend to have less drastic and more variable effects, and are far less likely to be acting alone. One way to judge whether a mutation in a person’s genome is likely pathogenic is to see whether other people in the population also carry it. Sequencing of tens of thousands of healthy people has provided a map of genetic variation across the population. Like the dog that didn’t bark in the night, the real information in that map comes from the mutations we don’t see.
Many genes show a dramatic absence, or at least a shortage, of damaging mutations when we look across the healthy population. This is not because mutations don’t happen in these genes—they happen everywhere in the genome—it’s because when they do happen, people get ill or die. These genes are intolerant to genetic variation that knocks out their function. When we see a mutation in a gene like that in someone with disease, it is therefore much more likely that it is pathogenic. And the rarer a mutation is in the general population, the more severe its effects are likely to be.
There is thus a spectrum of genetic variation that can contribute to neurodevelopmental disorders—this ranges from de novo and ultrarare mutations that can have individually large effects, through inherited mutations that have moderate effects and that can persist in populations for some time, to much more common genetic variants that have been around in the population for a long time and that make only tiny individual contributions to risk.
The common ones can be detected using genome-wide association studies. As described in some of the previous chapters, these studies look at the frequency of a given version of a genetic variant in people with a disease, compared with the frequency in people without (controls). If a variant is more frequent in disease cases, it is said to be associated with the disease and is therefore, statistically, a risk factor. This is just like doing epidemiological studies for environmental risk factors. For example, smoking is much more common among people with lung cancer (around 95%) than among people without lung cancer (around 30%). The degree of difference lets you estimate how much of an effect on risk the factor is having. What we measure is how much more likely people are to have exposure to a certain factor, given they have the disease. But this can be flipped around to calculate what is called relative risk—how much more likely they are to have the disease if they are exposed to a specific factor (environmental or genetic), compared with the risk for people who are unexposed. For smoking, the effect size is around 100—people who smoke are around 100 times more likely to develop lung cancer than people who don’t.
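The "flipping around" step is an application of Bayes' rule, and it can be sketched in a few lines. The exposure percentages below are the rounded figures quoted above; the 1% prevalence is an assumption added here, since the calculation needs one. With these rounded inputs the ratio comes out in the tens, so the sketch is best read as showing the mechanics rather than reproducing any specific published estimate.

```python
def odds_ratio(p_exposed_cases: float, p_exposed_controls: float) -> float:
    """Odds of exposure among cases divided by odds among controls."""
    odds_cases = p_exposed_cases / (1 - p_exposed_cases)
    odds_controls = p_exposed_controls / (1 - p_exposed_controls)
    return odds_cases / odds_controls

def relative_risk(p_exposed_cases: float,
                  p_exposed_controls: float,
                  prevalence: float) -> float:
    """Risk of disease if exposed divided by risk if unexposed (Bayes' rule)."""
    # Joint probabilities of disease status and exposure status.
    case_exposed = p_exposed_cases * prevalence
    control_exposed = p_exposed_controls * (1 - prevalence)
    case_unexposed = (1 - p_exposed_cases) * prevalence
    control_unexposed = (1 - p_exposed_controls) * (1 - prevalence)

    risk_if_exposed = case_exposed / (case_exposed + control_exposed)
    risk_if_unexposed = case_unexposed / (case_unexposed + control_unexposed)
    return risk_if_exposed / risk_if_unexposed

# Exposure figures from the text; prevalence is an assumed value.
or_smoking = odds_ratio(0.95, 0.30)
rr_smoking = relative_risk(0.95, 0.30, prevalence=0.01)

print(f"odds ratio:    {or_smoking:.1f}")
print(f"relative risk: {rr_smoking:.1f}")
```

When a disease is rare, the odds ratio and the relative risk are close to one another, which is why case-control studies can report one as an approximation of the other.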
The challenge in identifying common variants that increase risk of disease is that their effect sizes are usually tiny, on the order of 1.1 or even less. That means people with the common “risk variant” are 1.1 times more likely to develop a disease than people without it. That is an almost negligible effect, but not entirely so, especially if many common variants combine, in which case they could theoretically have a much larger collective effect on risk. What it does mean, though, is that we need massive sample sizes to detect that kind of effect with any statistical confidence (i.e., to distinguish a tiny difference in variant frequency between cases and controls as “real,” as opposed to just being noise in the data).
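A back-of-envelope power calculation shows why the samples have to be so large. The sketch below uses a standard normal-approximation formula for comparing two proportions, here the frequency of a variant among case and control chromosomes. The 30% variant frequency and 80% power are assumed values, and the significance level of 5e-8 is the conventional genome-wide threshold; none of these numbers come from the text.

```python
import math
from statistics import NormalDist

def case_frequency(control_freq: float, odds_ratio: float) -> float:
    """Variant frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)

def people_needed_per_group(control_freq: float, odds_ratio: float,
                            alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate cases (and equally many controls) needed to detect the effect."""
    p0 = control_freq
    p1 = case_frequency(control_freq, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = -NormalDist().inv_cdf(alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each person contributes two copies of the genome.
    return math.ceil(alleles_per_group / 2)

n_small_effect = people_needed_per_group(control_freq=0.30, odds_ratio=1.1)
n_large_effect = people_needed_per_group(control_freq=0.30, odds_ratio=2.0)

print(f"odds ratio 1.1: about {n_small_effect:,} cases and as many controls")
print(f"odds ratio 2.0: about {n_large_effect:,} cases and as many controls")
```

Under these assumptions an effect of 1.1 calls for tens of thousands of cases, while an effect of 2 would need only a few hundred, and rarer variants or smaller effects push the requirement higher still.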
That is exactly what has now been achieved in GWAS of schizophrenia, which have recently been carried out on samples of tens of thousands of patients and over a hundred thousand controls. They have identified over 100 spots in the genome harboring genetic variants for which one version is more frequent in patients than in controls. As with the rare mutations, the implicated genes are highly enriched for genes involved in neural development. As expected, these common variants each have only a tiny effect on risk, with most increasing it by less than a factor of 1.1. Collectively, the ones currently identified explain less than 10% of the total variance in liability to schizophrenia, though the total contribution from additional common risk variants yet to be discovered could be much larger.
PUTTING IT ALL TOGETHER
So, what does this all mean? How can we think about all these different types of genetic effects? One way is to think of some proportion of cases being caused by specific rare mutations, while the remainder are caused by the combined effects of many common variants. The idea of the latter model is that we all carry some burden of common risk variants but only when some threshold is reached does this actually cause disease. This dichotomy would imply two very different kinds of condition: the former would be quite distinct genetic conditions, while the latter would represent the extreme end of a continuous distribution. But there’s no good reason to separate them out like that, and no reason to think there is any real distinction at all between the rare and supposedly common disorders. Quite the opposite, in fact: there’s every reason to think that multiple genetic variants are at play in each individual, even in cases that inherit a high-risk mutation (see figure 10.2).
Figure 10.2 The genetic architecture of neurodevelopmental disorders. A single severe mutation (usually de novo) or a number of less severe ones (de novo and/or inherited) can increase risk. Risk is modified by polygenic background and sex, which influence developmental robustness. Randomness in development determines how this risk plays out in any individual. It may result in developmental brain dysfunction, with diverse clinical presentations.
First, that would be a typical scenario for any Mendelian disease, that is, one caused by a single mutation. Even for conditions like cystic fibrosis or Huntington’s disease, which are always caused by mutations in one specific gene, other genetic variants exist that modify the severity and age of onset and nature of the clinical symptoms. By themselves, these variants don’t cause disease—they only have an effect when the person has a rare mutation that causes those conditions. The same is true for all sorts of conditions and we certainly should expect it for neurodevelopmental disorders.
Second, we have direct evidence that specific mutations can manifest in very different ways in different people. Some carriers of specific CNVs or specific rare mutations in single genes develop epilepsy, others develop autism, or schizophrenia, and others don’t develop any clinical symptoms at all. In some studies, the cases with the more severe presentation have been found to have a second, also rare, mutation somewhere else in their genome, suggesting a combined effect. There are also specific examples for some conditions—notably Hirschsprung’s disease, a disorder affecting innervation of the gut—where the effects of a rare mutation are exacerbated by the presence of a specific common variant affecting expression of the other copy of the same gene. The common variant by itself has no real effect—it only increases risk or severity in the presence of a rare mutation. Notably, however, it has a big effect in individuals with such a rare mutation, much bigger than the effect size that would be calculated in a GWAS, as that is averaged across a whole cohort.
