PacBio Blog

Friday, February 27, 2015

AGBT 2015: The PacBio Workshop

The PacBio workshop has concluded. If you would like to receive a recording of the talk, click here.

AGBT Highlights, Day Two: Human Genomes, Variation, and the Rapidly Evolving Y Chromosome

The first full day of AGBT kicked off with a great talk by Evan Eichler of the University of Washington. Starting with the premise that characterizing genetic variation is key to understanding phenotypes, his presentation offered in-depth looks into human genome projects designed to fully represent data missed in existing assemblies and current whole-genome sequencing studies. Eichler pointed out that short-read sequencing misses a lot of structural variation, particularly when it occurs near repeat-rich regions. He said that every genome sequenced with short-read technology is missing important variation, and that a big problem is our inability to quantify just how much is missing. Eichler told attendees that he uses SMRT® Sequencing because it allows direct observation of native DNA, offers long reads, and has very little GC bias. He presented two sequencing projects focused on hydatidiform moles (CHM1 and CHM13), which have haploid human genomes. In one project, he reported detecting 26,015 structural variants, closing or shrinking 90 gaps in the human reference genome (many of which included GC-rich sequence), and adding a total of 1.1 Mb of novel sequence. He noted that one of the most important findings of the work was that 92 percent of insertions and 60 percent of deletions found in the genome were novel, including many in protein-coding regions, perhaps indicating how much has been missed in previous human genome population studies. An analysis of STRs found in the SMRT Sequencing-generated assembly showed that they were 3x more abundant and 2.8x longer than STRs in the existing human reference genome, which also suggests that current knowledge is incomplete. (Much of this work was included in this Nature paper from Chaisson et al.)

In a separate project, Eichler’s team compared the information gleaned from SMRT Sequencing of two haploid human genome samples to information obtained through the 1000 Genomes Project. In the two haploid genomes, he said, they found almost as much structural variation as was found across more than 2,500 diploid human genomes in the public dataset, which were sequenced using short-read methods. He added that once structural variation is fully catalogued, standard analysis methods can be used to go back and look for those elements in existing human genome data, and resolve about 50% of the SV genotypes. With a relatively small number of human genomes, he said, it may be possible to build a fairly comprehensive view of structural variation in the human genome.

We were also very interested in a presentation from NHGRI Director Eric Green about the recently announced Precision Medicine Initiative. The goals around using genomics to guide targeted treatments mesh nicely with NHGRI’s other efforts to generate a more comprehensive view of human genetic variation and to find the missing heritability in our genomes. Much of this was discussed at a planning session last year, which you can check out in this video. Notably, Evan Eichler also used the term Precision Medicine in his talk, noting that "if you believe in precision medicine, you should want to be comprehensive and precise."

Later in the day, we particularly enjoyed talks from Sarah Tishkoff at the University of Pennsylvania and David Page at the Whitehead Institute. Tishkoff presented great data from genomic studies of people from remote locations in Africa, such as studies of genetic links to short stature among pygmy people. She urged attendees to support sequencing in ethnically diverse populations to generate many reference genomes that can be used to better understand variation in populations not well represented by existing reference genomes. Page’s talk focused on sequencing the X and Y chromosomes, which he called “the genome’s most challenging substrates.” For instance, the human Y chromosome features eight palindromes, the largest of which is nearly 3 Mb; in mouse, the major challenge with this chromosome is 180 copies of a 500 kb repeat unit. Page used BAC clones and an iterative sequencing approach known as SHIMS to characterize these complex regions, which are largely absent from current reference assemblies.

At noon today we’ll be hosting our workshop, “Toward Comprehensive Genomics — Past, Present and Future.” Check back soon for the live-stream video!

Thursday, February 26, 2015

AGBT Highlights, Day One: Advancing Human Reference Assembly & Sequencing in the Clinic

It is great to be here in Marco Island for the AGBT meeting! The 16th annual meeting hit the ground running with a pre-meeting workshop hosted by the Genome Reference Consortium (GRC) followed by an opening session that was more clinically focused than many attendees are used to at this tech-heavy conference. From the dynamic Q&A sessions, it was clear that these were precisely the kind of talks that people have been looking for as this meeting evolves downstream along with genomic science.

The GRC workshop, entitled ‘Advancing the Human Reference Assembly’, included four speakers: Valerie Schneider (NCBI), Tina Graves-Lindsay (TGI), Karyn Meltz Steinberg (TGI) and Deanna Church (Personalis, Inc.). They stated that the current human genome reference assembly represents a mixture of over 70 individuals’ genomes in a single linear sequence. Thanks to population sequencing efforts, like the 1000 Genomes Project, we now know that there are regions of the human genome that are highly polymorphic, with multiple haplotypes that are segregating in the global population. Many of these regions (like MHC and KIR) are heavily associated with disease and immune response.

Making accurate genotype calls in these regions, which include a large degree of structural variation, requires improved reference sequence representation of these population-specific haplotypes. Deanna Church commented on the significance of this by saying, “If you want to do genotype and phenotype association, you better get genotype correct.” To better represent these population-specific haplotype reference sequences, the GRC’s latest human genome build (GRCh38) includes alternative loci. With the inclusion of these alt loci, Valerie Schneider described how we’re now entering the “Multiple Genome Era”, where the human reference genome will be expanded to include additional alt loci sequence to better represent the population diversity of these highly polymorphic loci.

To sequence and assemble these new reference sequences, Tina Graves-Lindsay and Karyn Meltz Steinberg then outlined a new strategy to build gold and platinum quality genome assemblies, using long-read data (PacBio® data) and other complementary technologies. The GRC presenters also described how the GRC will be adding more alt loci diversity to the Human Genome Reference by using PacBio sequencing to do whole-genome de novo sequencing and assembly of additional individuals from under-represented populations. Check out the slides from the workshop on SlideShare.

The formal session began with a talk from David Goldstein, who recently moved from Duke University to Columbia University, about precision medicine in neurological disease. He focused on several large-scale studies of patients with epilepsy, including a project that sequenced more than 350 affected children and their parents to find de novo mutations. While many of the mutations detected are rare, Goldstein noted that they often are found in common biological pathways, so it may be possible to stratify patients. In one example of such grouping, patients with a mutation in the KCNT1 gene were given quinidine, an FDA-approved drug that was never previously indicated for epilepsy. In two of three cases, patients saw significant improvement in the severity or frequency of their seizures. Goldstein pointed out that this targeted approach would never have been found without genomics.

As he presented similar examples from other studies, Goldstein noted that the stakes are considerably higher for getting the genetics right when the goal is diagnostic sequencing. He also said that patients’ genomes need to be comprehensively interrogated; in epilepsy, at least, new variants are being found so often that a gene panel approach wouldn’t keep up. His talk was hopeful for the future of clinical sequencing and improved bioinformatics to explain findings: results are already impressive, he said, but “this is only going to get better.”

Rick Lifton from Yale University also spoke in the kickoff session, focusing on the need to determine the function of more of the human genome than is currently understood. “There’s an awful lot of room for new discovery,” he said, pointing out that of the 21,000 known protein-coding genes in the human genome, only 3,000 have been clearly linked to disease. Lifton discussed studies of various conditions, such as hypertension, where typical approaches for understanding Mendelian disease proved useful for more common diseases. Based on success in finding de novo mutations in several studies, Lifton called for routine sequencing in the clinic. He added that truly understanding the human genome will require elucidating noncoding regions, determining the consequence of every mutation, and identifying biological targets for therapeutics.

The rest of the agenda looks just as interesting. We’ll keep reporting from the sessions, so check back for more soon!

Tuesday, February 24, 2015

AGBT 2015: Seeing the Genome in a New Light (Sunshine?)

Like many others, we’re looking forward to an exciting week of science and sun at the 16th annual Advances in Genome Biology and Technology (AGBT) conference! We’re hosting a lunch workshop on Friday, February 27, in the Palms Ballroom from 12:00 pm to 2:00 pm EST. We hope you can join us onsite (please reserve your seat) and even if you’re not at the conference, you can watch the live stream.

Here’s the agenda:

Towards Comprehensive Genomics – Past, Present and Future

The Human Genome: From One to One Million
J. Craig Venter, Human Longevity Inc.

Is Perfect Assembly Possible?
Gene Myers, Max Planck Institute

Finishing Genomes: Why Does It Matter?
Deanna Church, Personalis

De Novo Assembly of a Human Diploid Genome for the Asian Genome Project
Jeong-Sun Seo, Macrogen Inc. and Seoul National University College of Medicine

PacBio Long Read Sequencing and Structural Analysis of a Breast Cancer Cell Line
W. Richard McCombie, Cold Spring Harbor Laboratory

After reviewing the packed AGBT agenda, we’ve already spotted several can’t-miss presentations. These speakers and talks look especially promising and we’ll be covering several of them on the blog later this week:

Evan Eichler, University of Washington: “Resolving the Complexity of Human Genetic Variation by Single-Molecule Sequencing”

Matthew Blow, Joint Genome Institute: “Sequencing-Based Approaches for Genome-Scale Functional Annotation”

Tim Smith, U.S. Meat Animal Research Center: “A Genome Assembly of the Domestic Goat from 70x Coverage of Single Molecule Real Time Sequence”

Amy Ly, The Genome Institute at Washington University: “PacBio Application – Influenza Viral RNA-Seq”

Somasekar Seshagiri, Genentech: “Spectrum of Diverse Genomic Alterations Define Non-Clear Cell Renal Carcinoma Subtypes”

Gene Myers, Max Planck Institute: “Low Coverage, Correction-Free Assembly for Long Reads”

AGBT is also known for its excellent poster sessions, and we’ll be spending plenty of time in the poster hall this year. If you’re interested in learning more about SMRT® Sequencing results, be sure to stop by some of these posters.

And if you need a break from the marathon, feel free to put your feet up in our suite (Lanai #189) during our open hours:

Wednesday: 8:00 p.m. – 11:00 p.m.
Thursday: 3:00 p.m. – 6:00 p.m. and 8:00 p.m. – 11:00 p.m.
Friday: 3:00 p.m. – 6:00 p.m.

We look forward to seeing you in Marco Island! For those following along from home, check our blog for lots of updates and live streaming of the workshop.

Wednesday, February 4, 2015

High-Quality Genome Assembly and Transcriptome of Cotton Using SMRT Sequencing

A recent research partnership with KeyGene, a Dutch plant genomics and crop improvement company, has resulted in an integrated whole-genome assembly and transcriptome of Gossypium hirsutum, or tetraploid cotton. This is the first known complete assembly for a polyploid crop with a genome larger than 2 Gb.

KeyGene has a long-established reputation for generating high-quality data even for very complex genomes. For this project, the cotton genome was sequenced with 38x coverage using Single-Molecule, Real-Time (SMRT®) Sequencing. Assembly of PacBio® long reads reduced the number of contigs from more than 1 million in an existing short-read assembly to fewer than 22,000, representing a 47-fold increase in contiguity.

Thursday, January 29, 2015

Register Now: Isoform Sequencing Webinars Offer Tips on Method and Analysis

If full-length transcript information would be useful for your research, please join us for two upcoming webinars. Our scientists will offer tips for how to optimize the Iso-Seq™ method with the PacBio® System to meet your research goals.

Friday, January 23, 2015

Breaking New Frontiers in Grass Genomics to Understand Drought Tolerance with the 2014 SMRT Grant Program Winner: Oropetium thomaeum

Emerging from a myriad of interesting genome nominations, from the American cranberry to South American prawns and African guava, Oropetium thomaeum, submitted by Todd Mockler at the Donald Danforth Plant Science Center, was selected as the first winner of the “Most Interesting Genome in the World” SMRT® grant program in 2014. Also affectionately known as Oro, this grass species can be revived with water after a long drought exposure. At 250 Mb, the genome is also the smallest among grasses due to compaction of complex repeat and gene structures, including previously identified expansions in osmoprotectant biosynthesis pathways.

Kicking off the second annual launch of this program, NSF postdoctoral fellow Robert VanBuren in Mockler’s group presented initial results of the Oro genome assembly and analysis at the recent Plant and Animal Genome (PAG) XXIII international conference in San Diego. Sequencing produced 18 Gb of data at 65x genome coverage with a read length N50 of 16,485 bp, which yielded an HGAP genome assembly containing 625 contigs with a contig N50 of 2.39 Mb. The maximum contig length was 7.98 Mb despite a repeat content of roughly 50% of the genome, which was higher than expected. The impressive assembly, summing to a total of 244.46 Mb, covered 98.3% of the expected genome size. This achievement is largely attributed to high-quality, high-molecular-weight genomic DNA, with reads longer than 20 kb providing 10x coverage of the genome.
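For readers less familiar with the N50 and coverage figures quoted above, here is a minimal Python sketch of how such statistics are commonly computed. It is an illustration only, not the pipeline used for the Oro assembly: the function names `n50` and `fold_coverage` and the toy lengths are hypothetical, and the sketch uses the common definition of N50 as the length at which sequences of that size or longer contain at least half of all bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(total_bases, genome_size):
    """Return average fold coverage: sequenced bases per genome base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Hypothetical toy contig lengths, not data from the blog post.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_contigs))
print("toy coverage:", fold_coverage(total_bases=3000, genome_size=300))
```

The same `n50` function applies to read lengths (read length N50) and to contig lengths (contig N50); only the input list changes.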

Tuesday, January 20, 2015

Looking Ahead: The 2015 PacBio Technology Roadmap

By Jonas Korlach, Chief Scientific Officer

All of us at Pacific Biosciences are very proud of the momentum SMRT® Sequencing achieved in 2014, especially due to the more than 500 customer publications now in the literature describing its many applications. We remain deeply thankful to all the scientists who have applied our technology to gain new insights into genomes, transcriptomes, and epigenomes. By applying SMRT Sequencing to a wide variety of applications, our customers are demonstrating that long, unbiased reads have brought about new quality standards for many fields of genomic research. This exciting level of scientific activity and collaboration also provides us with important feedback to further optimize and develop sequencing applications for the PacBio® RS II.

In 2015, we plan to continue our track record of delivering improvements in all aspects of SMRT Sequencing. Sample preparation developments include improved and streamlined protocols, barcoding solutions for multiplexing many samples in a SMRT Cell run, and protocols for improved yields of very long-insert libraries and full-length cDNA libraries.

Tuesday, January 6, 2015

PAG 2015: SMRT Sequencing and the “Most Interesting Genome” Grant Program

The 23rd annual International Plant and Animal Genome meeting is right around the corner – it’s taking place January 10-14 in sunny San Diego. The meeting has become an important venue for customers showcasing their Single Molecule, Real-Time (SMRT®) Sequencing data on complex plant and animal genome projects.

This year, more than 50 researchers from around the world, many representing large consortium efforts, will be presenting work that uses SMRT Sequencing to assemble de novo references and/or to analyze complex genomes of a variety of plants and animals. This includes data generated with our Iso-Seq™ application for full-length transcript sequencing. Highlights from the PAG program include researchers presenting genomic efforts with ice plant, cattle, cuttlefish, legumes, sheep, sugar pine, and more.

Tuesday, December 16, 2014

At ASHI 2014, SMRT Sequencing Meets HLA Typing with Great Results

Earlier this fall, we headed to Denver for ASHI, the annual meeting of the American Society for Histocompatibility and Immunogenetics. Though we’d attended this conference in the past, this was our first year having an exhibit hall booth and workshop, both of which were enthusiastically received by the conference attendees. Even though it’s a fairly recent development for scientists to apply the PacBio® sequencing platform to analyze the HLA genes, which are often used in histocompatibility research studies, users had already generated many great examples and exciting data on the PacBio platform.

Our luncheon workshop on fully phased HLA and KIR typing was packed, and that was no doubt due to our top-tier speakers: Prof Steve Marsh, from the Anthony Nolan Research Institute and University College London; Nezih Cereb, CEO & Co-Founder of Histogenetics; and Martin Maiers, Director of Bioinformatics Research at the National Bone Marrow Donor Program. We were able to record video of their presentations, which you can check out below. You can also peruse some posters showcasing Single Molecule, Real-Time (SMRT®) Sequencing for HLA and KIR analysis, as well as a webinar recorded at the meeting.