PacBio Blog

Thursday, March 26, 2015

In Chronic Myeloid Leukemia Study, SMRT Sequencing Detects Resistance Mutations Early, New Splice Isoforms and More

Scientists from Uppsala University report in a recent paper that using the Iso-Seq™ method with SMRT® Sequencing allowed them to detect and monitor mutations in the BCR-ABL1 fusion gene for patients with chronic myeloid leukemia (CML). Screening mutations in this region is important for determining the point at which these patients become resistant to tyrosine kinase inhibitor (TKI) therapies, and is currently performed in the clinic using Sanger sequencing, quantitative RT-PCR, and other assays.

The paper, “Clonal distribution of BCR-ABL1 mutations and splice isoforms by single-molecule long-read RNA sequencing,” was published last month in BMC Cancer from lead author Lucia Cavelier and collaborators. In it, the scientists describe sequencing samples from six patients who experienced poor response to cancer treatment; samples were collected at diagnosis and at subsequent follow-up periods and sequenced on the PacBio® system.

The team checked for mutations in the BCR-ABL1 fusion transcript, generating on average 10,000 full-length sequences of the gene from a single SMRT Cell. Short-read sequencers have been tried for this kind of work, the authors note, but their inability to span the entire transcript, as well as concerns about bias introduced by nested PCR, has limited their utility.

“Here we present for the first time an assay to directly investigate the entire 1,578 bp BCR-ABL1 major fusion transcript, amplified from a single PCR reaction and sequencing on the Pacific Biosciences (PacBio) RS II system,” Cavelier et al. write. “In addition to enabling a rapid workflow at a relatively low cost, the PacBio system produces reads sufficiently long to span across a full length BCR-ABL1 molecule.” They report that the process, which took two to three days to complete, had a 0% false positive rate, attributed to the random error mode of PacBio sequencing data, “which results in highly accurate base calls for molecules that are sequenced at high coverage.”
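The reasoning behind that last quote can be illustrated with a short calculation. This is a minimal sketch, not the authors' method: it assumes errors are independent between observations of a base and that the consensus is a simple majority vote, which is a simplification of real consensus callers. The function name `majority_error` and the error rate used below are hypothetical.

```python
from math import comb

def majority_error(p, n):
    """Probability that more than half of n independent observations
    of a base are wrong, given a per-observation error rate p.

    Assumes independent (random) errors and a simple majority vote.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Hypothetical per-observation error rate, chosen only for illustration.
p = 0.15
for n in (1, 5, 15):
    print(n, majority_error(p, n))
```

Under these assumptions the chance of a wrong majority falls quickly as coverage grows, which is why random errors, unlike systematic ones, tend to wash out at high coverage.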

For each of the six patients studied, the authors report, SMRT Sequencing confirmed the mutations that had already been found with Sanger sequencing. It also detected five low-frequency mutations that were missed by the Sanger pipeline. In one case, the scientists found that PacBio sequencing successfully detected a mutation four months earlier than it was found by Sanger sequencing, indicating that the technology may ultimately accelerate the identification of genetic markers that are important for diagnosis or drug response monitoring.

In addition, long reads from SMRT Sequencing allowed the team to distinguish multiple transcript isoforms for BCR-ABL1 from individual samples. “These results corroborate previous findings that propose alternative splicing as a common mechanism among CML patients undergoing TKI treatment,” the authors write.

Importantly, PacBio data also made it possible to differentiate compound mutations from independent mutations in other molecules, information that cannot be gleaned from Sanger sequencing. “This feature is of major clinical relevance as compound mutations show different resistance profiles compared to individual mutants,” Cavelier et al. report.
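To illustrate why full-length reads make this distinction possible, here is a minimal sketch. The mutation labels, the toy reads and the function `classify_reads` are hypothetical and are not taken from the paper; the sketch only assumes that each full-length read reports the mutations seen on one molecule.

```python
from collections import Counter

def classify_reads(reads, mut_a, mut_b):
    """Count molecules carrying both, one, or neither of two mutations.

    `reads` is an iterable of sets; each set holds the mutations
    observed on a single full-length read (one molecule).
    """
    counts = Counter()
    for muts in reads:
        has_a, has_b = mut_a in muts, mut_b in muts
        if has_a and has_b:
            counts["compound"] += 1   # both mutations on the same molecule
        elif has_a:
            counts["only_a"] += 1
        elif has_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts

# Hypothetical toy data: five molecules, two placeholder mutations.
reads = [{"mutA", "mutB"}, {"mutA"}, {"mutB"}, set(), {"mutA", "mutB"}]
summary = classify_reads(reads, "mutA", "mutB")
print(summary)
```

A method that reports only the pooled frequency of each mutation cannot separate the "compound" count from the "only" counts, because it never sees both positions on the same molecule.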

Tuesday, March 3, 2015

AGBT Highlights, Day Three: Genomic Medicine, Population Specific Genomes, Goats & Influenza

Day 3 of the AGBT conference was packed with interesting talks; we've covered a few highlights below. Admittedly, it took a little more caffeine than usual to power through the day.

In the clinical session, Euan Ashley from Stanford told attendees that genomic medicine is no longer something that we’re aiming for; it’s already here and being used routinely. He expressed concerns about accurate mapping of short-read sequence data for clinical utility, adding that the community needs to make progress in understanding complex genomic regions. Ashley noted that we still don’t have a gold-quality human genome with every single base known, and that achieving that remains an important goal for the field.

Jonathan Mudge from the Sanger Institute presented work by the GENCODE consortium to define human genes in the ENCODE project data, and said “the functional annotation of the transcriptome is in its infancy.” He described how the consortium is planning to embark on a large new project using long-read PacBio® data to help improve annotation and capture true end-points for novel gene transcripts.

Sarah Tishkoff, from the University of Pennsylvania, presented on “Integrative Genomic Studies of Adaptive Traits in Africa.” She described her work studying novel phenotypes in sub-populations within Africa, and the challenges of linking phenotypes to specific genotypes. One reason for these challenges, she said, is the lack of representation of African population-specific genome regions and structural variants in the current human genome reference. Work planned by the Genome Reference Consortium should help resolve this disparity, as additional population-specific, polymorphic alt loci sequences are added to the reference.

During the evening technology session, Tim Smith from USDA’s Agricultural Research Service presented a goat assembly produced with PacBio sequence data. A previous goat assembly generated from short-read data had a contig N50 of about 18 kb with hundreds of thousands of contigs, but the PacBio assembly had a contig N50 of about 2.6 Mb and just 5,902 contigs. To get the highest genome quality, he told attendees, it’s helpful to use long reads. The team is following up the goat effort with new projects to sequence pig, sheep, and cow using PacBio data.
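For readers less familiar with the contig N50 metric quoted above, here is a minimal sketch of how it is commonly computed. The function name `n50` and the toy contig lengths are illustrative assumptions, not data from the goat assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    The N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

# Hypothetical contig lengths in kb, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because the metric is weighted by length, an assembly made of a few long contigs scores far higher than one with the same total length split into many short pieces.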

Finally, Vince Magrini from Washington University in St. Louis spoke about using RNA-seq for viral monitoring. He showed data from PacBio sequencing, among other technologies, which was used to characterize clinical isolates of influenza. The long reads were important for filling gaps in a short-read assembly, he said.

With all the talk about precision medicine at the conference, we also really enjoyed this thoughtful blog post by Brian Kreuger (@h2so4hurts) from Columbia University Medical Center entitled 'When Whole Genome Sequencing Doesn’t Give Us the Whole Genome'.

Friday, February 27, 2015

AGBT 2015: PacBio Workshop Review & Recording

Our AGBT workshop attracted more than 500 attendees thanks to the high-profile speakers who shared their perspectives on human genomic research. Because of the exclusivity of AGBT, we decided to live-stream our workshop to reach the broader scientific community. Thanks to the hundreds of people who tuned in to our live webcast from afar! Here are some highlights from the presentations; the recording of the workshop is at the bottom of this post.

Our CEO, Mike Hunkapiller, started the session with a reflection on the 15-year anniversary of the announcements of the first human genomes, noting that these projects required considerable effort and produced draft assemblies with contig N50s in the 20-24 kb range. Many technologies and methods have been introduced since then, but assembly quality has not improved dramatically and scientists are still missing critical genomic information. He noted that structural variations, in particular, have been underestimated, limiting our understanding of human genomes. He then unveiled a PacBio® diploid assembly of Craig Venter’s genome, chosen because it has been so well characterized over the years. Compared to the original iteration of the Venter assembly, the PacBio diploid assembly contains 3,004 primary contigs, with a contig N50 of 10.4 Mb and a longest contig of 34.6 Mb. The 4,761 associated contigs, representing potential structural variants, total 189 Mb with a mean length of 39.8 kb.

In his first appearance at Marco Island, Venter (accompanied by his dog, Darwin) offered his vision for Human Longevity, Inc. (HLI), as well as for the J. Craig Venter Institute and Synthetic Genomics. At HLI, his team plans to sequence 1 million genomes in the coming years while also gathering extensive phenotypic information to make meaningful connections from the data. To support this effort, they will produce 30 reference genomes representing ethnogeographic diversity. Venter told attendees that the PacBio machine gives you a great reference genome.

Next up was Gene Myers from the Max Planck Institute, who addressed the concept of a near-perfect human assembly. He believes this level of quality is within reach, made possible by the long reads, random error, and random sampling of SMRT® Sequencing. Myers and his team have been working hard to build new analysis tools for processing this data, including a lightning-fast aligner and a scrubbing algorithm. His tools are available through his Dazzler website.

Deanna Church from Personalis spoke about the importance of a complete, truly representative human reference genome. Having this data is necessary for calling and interpreting variants, she said, noting that something as simple as a missing gene in the reference can confound other calls in a new reference-based assembly. She championed the new regions of alternative loci available with the GRCh38 human reference, saying this sequence is essential to ensure you’re not missing valuable information in genome interpretation. Church urged attendees to generate high-quality sequence assemblies and contribute them back to the databases to continue refinement of the reference genome.

Jeong-Sun Seo, CEO of Macrogen, Inc., spoke about his team’s efforts to sequence large numbers of Asian genomes and to generate a representative diploid human genome reference for the Asian population. His team used PacBio technology, PacBio’s latest diploid assembly methods, optical mapping from BioNano, and BAC sequencing to create the most comprehensive genomic reference possible for a Korean human genome sample. He showed examples of gaps that could be closed or extended within the GRCh38 reference thanks to SMRT Sequencing data, and highlighted work to identify structural variants, some of which are implicated in diseases that affect Asian populations more than other populations.

Finally, Dick McCombie from Cold Spring Harbor Laboratory presented work on a breast cancer cell line known to be riddled with rearrangements, amplifications, and other complex events. Working with collaborators at OICR, he is using long-read sequencing to generate a higher-resolution view of the structural variation occurring in this cell line. The project, which began last November and is still ongoing, has already led to promising results, such as detecting complex structural variants missed by short-read sequencers. A de novo assembly generated by DNAnexus in 22 hours produced an unprecedented contig N50 of 2.56 Mb. Download the raw data from the Schatz lab website.


AGBT Highlights, Day Two: Human Genomes, Variation, and the Rapidly Evolving Y Chromosome

The first full day of AGBT kicked off with a great talk from Evan Eichler from the University of Washington. Starting with the premise that characterizing genetic variation is key to understanding phenotypes, his presentation offered in-depth looks into human genome projects designed to fully represent data missed in existing assemblies and current whole genome sequencing studies. Eichler pointed out that short-read sequencing misses a lot of structural variation, particularly when it occurs near repeat-rich regions. He said that every genome sequenced with short-read technology is missing important variation, and that a big problem is our inability to quantify just how much is missing. Eichler told attendees that he uses SMRT® Sequencing because it allows direct observation of native DNA, offers long reads, and has very little GC bias.

He presented two sequencing projects focused on hydatidiform moles (CHM1 and CHM13), which have haploid human genomes. In one project, he reported detecting 26,015 structural variants, closing or shrinking 90 gaps in the human reference genome (many of which included GC-rich sequence), and adding a total of 1.1 Mb of novel sequence. He noted that one of the most important findings of the work was that 92 percent of insertions and 60 percent of deletions found in the genome were novel, including many in protein-coding regions, perhaps indicating how much has been missed in previous human genome population studies. An analysis of STRs found in the SMRT Sequencing-generated assembly showed that they were 3x more abundant and 2.8x longer than STRs in the existing human reference genome, which also suggests that current knowledge is incomplete. (Much of this work was included in this Nature paper from Chaisson et al.)

In a separate project, Eichler’s team compared the information gleaned from SMRT Sequencing of two haploid human genome samples to information obtained through the 1000 Genomes Project. In the two haploid genomes, he said, they found almost as much structural variation as was found across more than 2,500 diploid human genomes in the public dataset, which were sequenced using short-read methods. He added that once structural variation is fully catalogued, standard analysis methods can be used to go back and look for those elements in existing human genome data, and resolve about 50% of the SV genotypes. With a relatively small number of human genomes, he said, it may be possible to build a fairly comprehensive view of structural variation in the human genome.

We were also very interested in a presentation from NHGRI Director Eric Green about the recently announced Precision Medicine Initiative. The goals around using genomics to guide targeted treatments mesh nicely with NHGRI’s other efforts to generate a more comprehensive view of human genetic variation and to find the missing heritability in our genomes. Much of this was discussed at a planning session last year, which you can check out in this video. Notably, Evan Eichler also used the term Precision Medicine in his talk, noting that "if you believe in precision medicine, you should want to be comprehensive and precise."

Later in the day, we particularly enjoyed talks from Sarah Tishkoff at the University of Pennsylvania and David Page at the Whitehead Institute. Tishkoff presented great data from genomic studies of people from remote locations in Africa, such as studies of genetic links to short stature among pygmy people. She urged attendees to support sequencing in ethnically diverse populations to generate many reference genomes that can be used to better understand variation in populations not well represented by existing reference genomes. Page’s talk focused on sequencing the X and Y chromosomes, which he called “the genome’s most challenging substrates.” For instance, the human Y chromosome features eight palindromes, the largest of which is nearly 3 Mb; in mouse, the major challenge with this chromosome is 180 copies of a 500 kb repeat unit. Page used BAC clones and an iterative sequencing approach known as SHIMS to characterize these complex regions, which are largely absent from current reference assemblies.

At noon today we’ll be hosting our workshop, “Toward Comprehensive Genomics — Past, Present and Future.” Check back soon for the live-stream video!

Thursday, February 26, 2015

AGBT Highlights, Day One: Advancing Human Reference Assembly & Sequencing in the Clinic

It is great to be here in Marco Island for the AGBT meeting! The 16th annual meeting hit the ground running with a pre-meeting workshop hosted by the Genome Reference Consortium (GRC) followed by an opening session that was more clinically focused than many attendees are used to at this tech-heavy conference. From the dynamic Q&A sessions, it was clear that these were precisely the kind of talks that people have been looking for as this meeting evolves downstream along with genomic science.

The GRC workshop, entitled ‘Advancing the Human Reference Assembly,’ included four speakers: Valerie Schneider (NCBI), Tina Graves-Lindsay (TGI), Karyn Meltz Steinberg (TGI) and Deanna Church (Personalis, Inc.). They stated that the current human genome reference assembly represents a mixture of over 70 individuals’ genomes in a single linear sequence. Thanks to population sequencing efforts, like the 1000 Genomes Project, we now know that there are regions of the human genome that are highly polymorphic, with multiple haplotypes that are segregating in the global population. Many of these regions (like MHC and KIR) are heavily associated with disease and immune response.

Making accurate genotype calls in these regions, which include a large degree of structural variation, requires improved reference sequence representation of these population-specific haplotypes. Deanna Church commented on the significance of this by saying, “If you want to do genotype and phenotype association, you better get genotype correct.” To better represent these population-specific haplotype reference sequences, the GRC’s latest human genome build (GRCh38) includes alternative loci. With the inclusion of these alt loci, Valerie Schneider explained, we’re now entering the “Multiple Genome Era,” where the human reference genome will be expanded to include additional alt loci sequence to better represent the population diversity of these highly polymorphic loci.

To sequence and assemble these new reference sequences, Tina Graves-Lindsay and Karyn Meltz Steinberg then outlined a new strategy to build gold and platinum quality genome assemblies, using long-read data (PacBio® data) and other complementary technologies. The GRC presenters also described how the GRC will be adding more alt loci diversity to the Human Genome Reference by using PacBio sequencing to do whole-genome de novo sequencing and assembly of additional individuals from under-represented populations. Check out the slides from the workshop on SlideShare.

The formal session began with a talk from David Goldstein, who recently moved from Duke University to Columbia University, about precision medicine in neurological disease. He focused on several large-scale studies of patients with epilepsy, including a project that sequenced more than 350 affected children and their parents to find de novo mutations. While many of the mutations detected are rare, Goldstein noted that they often are found in common biological pathways, so it may be possible to stratify patients. In one example of such grouping, patients with a mutation in the KCNT1 gene were given quinidine, an FDA-approved drug that was never previously indicated for epilepsy. Of three patients treated, two saw significant improvement in the severity or frequency of their seizures. Goldstein pointed out that this targeted approach would never have been found without genomics.

As he presented similar examples from other studies, Goldstein noted that the stakes are considerably higher for getting the genetics right when the goal is diagnostic sequencing. He also said that patients’ genomes need to be comprehensively interrogated; in epilepsy, at least, new variants are being found so often that a gene panel approach wouldn’t keep up. His talk was hopeful for the future of clinical sequencing and improved bioinformatics to explain findings: results are already impressive, he said, but “this is only going to get better.”

Rick Lifton from Yale University also spoke in the kickoff session, focusing on the need to determine the function of more of the human genome than is currently understood. “There’s an awful lot of room for new discovery,” he said, pointing out that of the 21,000 known protein-coding genes in the human genome, only 3,000 have been clearly linked to disease. Lifton discussed studies of various conditions, such as hypertension, where typical approaches for understanding Mendelian disease proved useful for more common diseases. Based on success in finding de novo mutations in several studies, Lifton called for routine sequencing in the clinic. He added that truly understanding the human genome will require elucidating noncoding regions, determining the consequence of every mutation, and identifying biological targets for therapeutics.

The rest of the agenda looks just as interesting. We’ll keep reporting from the sessions, so check back for more soon!

Tuesday, February 24, 2015

AGBT 2015: Seeing the Genome in a New Light (Sunshine?)

Like many others, we’re looking forward to an exciting week of science and sun at the 16th annual Advances in Genome Biology and Technology (AGBT) conference! We’re hosting a lunch workshop on Friday, February 27, in the Palms Ballroom from 12:00 pm to 2:00 pm EST. We hope you can join us onsite (please reserve your seat) and even if you’re not at the conference, you can watch the live stream.

Here’s the agenda:

Towards Comprehensive Genomics – Past, Present and Future

The Human Genome: From One to One Million
J. Craig Venter, Human Longevity Inc.

Is Perfect Assembly Possible?
Gene Myers, Max Planck Institute

Finishing Genomes: Why Does It Matter?
Deanna Church, Personalis

De Novo Assembly of a Human Diploid Genome for the Asian Genome Project
Jeong-Sun Seo, Macrogen Inc. and Seoul National University College of Medicine

PacBio Long Read Sequencing and Structural Analysis of a Breast Cancer Cell Line
W. Richard McCombie, Cold Spring Harbor Laboratory

After reviewing the packed AGBT agenda, we’ve already spotted several can’t-miss presentations. These speakers and talks look especially promising and we’ll be covering several of them on the blog later this week:

Evan Eichler, University of Washington: “Resolving the Complexity of Human Genetic Variation by Single-Molecule Sequencing”

Matthew Blow, Joint Genome Institute: “Sequencing-Based Approaches for Genome-Scale Functional Annotation”

Tim Smith, U.S. Meat Animal Research Center: “A Genome Assembly of the Domestic Goat from 70x Coverage of Single Molecule Real Time Sequence”

Amy Ly, The Genome Institute at Washington University: “PacBio Application – Influenza Viral RNA-Seq”

Somasekar Seshagiri, Genentech: “Spectrum of Diverse Genomic Alterations Define Non-Clear Cell Renal Carcinoma Subtypes”

Gene Myers, Max Planck Institute: “Low Coverage, Correction-Free Assembly for Long Reads”

AGBT is also known for its excellent poster sessions, and we’ll be spending plenty of time in the poster hall this year. If you’re interested in learning more about SMRT® Sequencing results, be sure to stop by some of these posters.

And if you need a break from the marathon, feel free to put your feet up in our suite (Lanai #189) during our open hours:

Wednesday: 8:00 p.m. – 11:00 p.m.
Thursday: 3:00 p.m. – 6:00 p.m. and 8:00 p.m. – 11:00 p.m.
Friday: 3:00 p.m. – 6:00 p.m.

We look forward to seeing you in Marco Island! For those following along from home, check our blog for lots of updates and live streaming of the workshop.

Wednesday, February 4, 2015

High-Quality Genome Assembly and Transcriptome of Cotton Using SMRT Sequencing

A recent research partnership with KeyGene, a Dutch plant genomics and crop improvement company, has resulted in an integrated whole-genome assembly and transcriptome of Gossypium hirsutum, or tetraploid cotton. This is the first known complete assembly for a polyploid crop with a genome larger than 2 Gb.

KeyGene has a long-established reputation for generating high-quality data even for very complex genomes. For this project, the cotton genome was sequenced with 38x coverage using Single-Molecule, Real-Time (SMRT®) Sequencing. Assembly of PacBio® long reads reduced the number of contigs from more than 1 million in an existing short-read assembly to fewer than 22,000, representing a 47-fold increase in contiguity.

Thursday, January 29, 2015

Register Now: Isoform Sequencing Webinars Offer Tips on Method and Analysis

If full-length transcript information would be useful for your research, please join us for two upcoming webinars. Our scientists will offer tips for how to optimize the Iso-Seq™ method with the PacBio® System to meet your research goals.

Friday, January 23, 2015

Breaking New Frontiers in Grass Genomics to Understand Drought Tolerance with the 2014 SMRT Grant Program Winner: Oropetium thomaeum

Emerging from a myriad of interesting genome nominations, from the American cranberry to South American prawns and African guava, Oropetium thomaeum, submitted by Todd Mockler at the Donald Danforth Plant Science Center, was selected as the first winner of the “Most Interesting Genome in the World” SMRT® grant program in 2014. Also affectionately known as Oro, this grass species can be revived with water after a long drought exposure. At 250 Mb, the genome is also the smallest among grasses due to compaction of complex repeat and gene structures, including previously identified expansions in osmoprotectant biosynthesis pathways.

Kicking off the second annual launch of this program, NSF postdoctoral fellow Robert VanBuren in Mockler’s group presented initial results of the Oro genome assembly and analysis at the recent Plant and Animal Genome (PAG) XXIII international conference in San Diego. With 18 Gb of sequencing data at 65x genome coverage and a read length N50 of 16,485 bp, the team produced an HGAP genome assembly containing 625 contigs with a contig N50 of 2.39 Mb. The maximum contig length was 7.98 Mb, even though repeats made up roughly 50% of the genome, more than expected. The assembly, totaling 244.46 Mb, covered 98.3% of the expected genome size. This achievement is largely attributed to high-quality, high-molecular-weight genomic DNA, with reads longer than 20 kb providing 10x coverage of the genome.
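The relationship between the assembly size and the expected genome size can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted above; the variable names are illustrative, and the estimate of bases in long reads assumes that "10x coverage" is measured against the expected genome size.

```python
# Figures quoted in the post.
assembly_mb = 244.46        # total assembly length, in Mb
fraction_covered = 0.983    # share of the expected genome size covered
long_read_coverage = 10     # coverage provided by reads longer than 20 kb

# Expected genome size implied by the two assembly figures.
expected_genome_mb = assembly_mb / fraction_covered

# Approximate bases contained in reads longer than 20 kb (assumption:
# coverage is relative to the expected genome size).
long_read_bases_gb = long_read_coverage * expected_genome_mb / 1000

print(round(expected_genome_mb, 1), round(long_read_bases_gb, 2))
```

The implied genome size comes out just under 250 Mb, in line with the genome size cited for Oro earlier in this post.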

Tuesday, January 20, 2015

Looking Ahead: The 2015 PacBio Technology Roadmap

By Jonas Korlach, Chief Scientific Officer

All of us at Pacific Biosciences are very proud of the momentum SMRT® Sequencing achieved in 2014, especially due to the more than 500 customer publications now in the literature describing its many applications. We remain deeply thankful to all the scientists who have applied our technology to gain new insights into genomes, transcriptomes, and epigenomes. By applying SMRT Sequencing to a wide variety of applications, our customers are demonstrating that long, unbiased reads have brought about new quality standards for many fields of genomic research. This exciting level of scientific activity and collaboration also provides us with important feedback to further optimize and develop sequencing applications for the PacBio® RS II.

In 2015, we plan to continue our track record of delivering improvements in all aspects of SMRT Sequencing. Sample preparation developments include improved and streamlined protocols, barcoding solutions for multiplexing many samples in a SMRT Cell run, and protocols for improved yields of very long-insert libraries and full-length cDNA libraries.