Environmental Factor, September 2011, National Institute of Environmental Health Sciences
Mardis discusses current genomic technologies and cancer models
By Archana Dhasarathy
Mardis plays a key role in The Cancer Genome Atlas consortium, which catalogues mutations in cancers obtained through genomic sequencing. (Photo courtesy of Steve McCaw)
NIEHS Deputy Director Rick Woychik, Ph.D., had several questions for Mardis following her talk. (Photo courtesy of Steve McCaw)
French, left, monitored the question and answer session. (Photo courtesy of Steve McCaw)
Elaine Mardis, Ph.D. (http://genome.wustl.edu/people/mardis_elaine), co-director of The Genome Institute at Washington University in St. Louis (http://genome.wustl.edu/), presented a seminar titled “Genomic studies of mouse models of human cancer” Aug. 4 at NIEHS. She was hosted by John E. French, Ph.D., staff scientist and acting chief of the NTP Host Susceptibility Group.
Mardis is a leader in the field of cancer genomics, and directs the Genome Institute's efforts in advancing next-generation sequencing technologies. By comparing the genomes of tumor and normal samples, she maintains, researchers will be better able to identify changes in the genome that lead to cancer, and thus provide more personalized diagnoses and treatments.
The promise of next generation sequencing technologies
Whole genome sequencing has come a long way since the human genome was first sequenced ten years ago. The sequencing technologies available today are powerful enough to allow single base-pair level analysis of changes that occur in the genome of people with cancer.
These analyses include variations in the number of copies of specific genomic regions, mutations in DNA sequence, and other structural variants in DNA. “I'm always amazed that from a 30-fold coverage of the genome you can detect so many different types of alterations,” said Mardis. Her lab further validates the information from all this sequencing, to ensure that variations and mutations they've identified are indeed different between normal and tumor samples.
Leukemia genomes: Using mice to understand humans
Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML), in which an abnormal fusion of parts of chromosomes results in the creation of a promyelocytic leukemia-retinoic acid receptor alpha (PML-RARA) fusion oncogene. While this mutated version of DNA can initiate APL in mice, there are other mutations that are also important for this disease.
To identify these other mutations, Mardis' group used a mouse model of cancer in which mice expressing the PML-RARA oncogene developed APL. They sequenced the tumor genome of one mouse that developed APL, and were able to identify three mutations important in causing the disease. Importantly, similar mutations were found both in additional mouse samples and in human AML samples, thus helping the group discover functionally important mutations in human cancers.
“In humans who have leukemia, you have the luxury of collecting samples from patients throughout the progression of the disease,” said Mardis. Samples can be collected at the time of diagnosis, at the start of chemotherapy, and at follow-up for several months after chemotherapy. Sequencing such samples indicates which mutations and clonal populations are lost as a result of chemotherapy, and also addresses the question of which mutations arise when tumors relapse.
Breast cancer models
Although scientists understand much about the biology of cancer metastasis, the genomic causes of metastasis are still largely unknown. Mardis and her colleagues applied their genomics expertise to analyze both normal and tumor samples from patients with breast cancer. Cells from the tumor biopsy were introduced into mice to produce a xenograft tumor, so the researchers could investigate how the tumor behaved and whether it metastasized. They then performed whole-genome sequencing of the trio - normal cells, primary tumor, and mouse xenograft tumor - to search for genomic changes.
They found that the primary tumors differed from the xenograft tumors, mainly in the prevalence of genomic mutations. They also determined that the same type of primary tumor - although from different patients - can develop either into a tumor similar to the primary tumor or into a new type of tumor entirely. For instance, a luminal type of breast cancer could develop into a luminal cancer in the mouse xenograft but, in some cases, manifested as a basal type of tumor instead.
“If you can detect mutations that are highly specific to a patient, this could lead to personalized medicine and individually tailored therapies,” Mardis explained. The results provide insight into how cancer genomes evolve as the disease progresses.
In collaboration with other groups, Mardis is also involved in sequencing other mouse models of breast and other cancers. Thus far, several types of human cancer have been sequenced, including leukemia, multiple myeloma, and breast, lung, brain, prostate, and ovarian cancers.
(Archana Dhasarathy, Ph.D., is a postdoctoral fellow in the Eukaryotic Transcriptional Regulation Group in the NIEHS Laboratory of Molecular Carcinogenesis.)
Next generation sequencing 101
Whole genome sequencing involves figuring out the precise arrangement of all three billion bases of an individual's DNA. To achieve this goal, sequencing machines first generate short stretches of DNA sequence, called reads, that contain random bits of the genome. Computer programs are then used to map these reads, usually by comparing them to a previously sequenced reference genome.
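The mapping step can be illustrated with a toy sketch. It assumes error-free reads and a tiny made-up reference, and it uses plain exact string matching, which is far simpler than the indexed, mismatch-tolerant aligners used on real genomes. The function name `map_read` and the sequences are hypothetical.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Keep searching one base further along so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Made-up reference and reads, for illustration only.
reference = "ACGTTGCAACGTAGGCTA"
reads = ["TTGCA", "AGGCT", "ACGT"]

for read in reads:
    print(read, map_read(reference, read))
```

In this toy case the first two reads each land in a single place, while the four-base read "ACGT" matches at two positions, so its origin in the genome cannot be decided from the sequence alone.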
Several parameters are important for speed and accuracy of sequencing. The first parameter is read length, which refers to the length of sequence generated in each read. Current machines can generate reads between 40 and 150 nucleotides. The longer the reads, the easier it is to align them to the reference genome. Just as in a jigsaw puzzle, the larger pieces are easier to fit into place. Reads that are too short might not map uniquely to the genome, and are, hence, problematic. On the other hand, longer reads may take several extra days to generate. Longer reads are also more expensive and generally less accurate than shorter reads.
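The jigsaw-puzzle point about read length can be sketched in a few lines. The example below assumes error-free reads and a deliberately repetitive, made-up reference. It counts what fraction of all possible reads of a given length occur exactly once, and so could be placed unambiguously. The function name `unique_fraction` is hypothetical.

```python
from collections import Counter


def unique_fraction(reference: str, read_length: int) -> float:
    """Fraction of all possible error-free reads of this length that occur exactly once."""
    if not 0 < read_length <= len(reference):
        raise ValueError("read_length must be between 1 and the reference length")
    # Every substring of the given length is a read the machine could produce.
    reads = [reference[i:i + read_length]
             for i in range(len(reference) - read_length + 1)]
    counts = Counter(reads)
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)


# Made-up reference containing a short repeat ("ACAC...").
toy_reference = "ACACACGT"
for length in (2, 4, 6):
    print(length, unique_fraction(toy_reference, length))
```

On this toy sequence the uniquely placeable fraction rises as the reads get longer, which is the effect described above. Real genomes contain repeats much longer than any read mentioned here, so the numbers are illustrative only.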
Besides read length, the number of sequencing reads that can be produced in a single instrument run for a given cost is another important factor. Current instruments routinely produce tens of millions of sequence reads, with numbers constantly improving as technology develops.
Read coverage refers to the average number of times each base in the genome is sequenced; 30-fold coverage means that, on average, every base is read 30 times. The greater the coverage, the better the accuracy and the more likely it is that every single base in the genome has been sequenced. However, greater coverage also makes sequencing more expensive. Several groups are in a race to develop cheaper, faster, and more accurate means of whole genome sequencing.
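A rough sense of scale comes from a common back-of-the-envelope estimate, which is not taken from the talk: average coverage is approximately the number of reads times the read length, divided by the genome size. The sketch below applies that estimate to the three-billion-base genome and the 30-fold coverage mentioned in this article. The 100-nucleotide read length is an assumed value within the range quoted above, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read, assuming reads are spread evenly."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000  # roughly three billion bases, as stated in the article
READ_LENGTH = 100            # assumed; the article gives a range of 40 to 150

print(reads_needed(30, READ_LENGTH, GENOME_SIZE))            # 900000000
print(mean_coverage(900_000_000, READ_LENGTH, GENOME_SIZE))  # 30.0
```

Under these assumptions, 30-fold coverage takes about 900 million reads. The estimate ignores reads that fail to map and uneven coverage across the genome.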