PacBio Blog

Friday, January 23, 2015

Breaking New Frontiers in Grass Genomics to Understand Drought Tolerance with the 2014 SMRT Grant Program Winner: Oropetium thomaeum

Emerging from a wide range of interesting genome nominations, from the American cranberry to South American prawns and the African guava, Oropetium thomaeum, submitted by Todd Mockler at the Donald Danforth Plant Science Center, was selected as the first winner of the “Most Interesting Genome in the World” SMRT® grant program in 2014. Affectionately known as Oro, this grass species can be revived with water after prolonged drought. At 250 Mb, its genome is also the smallest known among grasses, owing to compaction of complex repeat and gene structures, and it carries previously identified expansions in osmoprotectant biosynthesis pathways.

To kick off the second year of this program, NSF postdoctoral fellow Robert VanBuren of Mockler’s group presented initial results of the Oro genome assembly and analysis at the recent Plant and Animal Genome (PAG) XXIII international conference in San Diego. From 18 Gb of sequencing data (65x genome coverage) with a read length N50 of 16,485 bp, the team produced an HGAP genome assembly of 625 contigs with a contig N50 of 2.39 Mb. The longest contig reached 7.98 Mb, despite a repeat content of roughly 50% of the genome, which was higher than expected. The assembly totals 244.46 Mb, covering 98.3% of the expected genome size. This achievement is largely attributed to high-quality, high-molecular-weight genomic DNA: reads longer than 20 kb alone provided 10x coverage of the genome.
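Both the read length N50 and the contig N50 quoted above follow the standard N50 definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of all bases. Here is a minimal Python sketch of that definition. It is illustrative only (not HGAP's own code), and the contig lengths are made-up toy values rather than Oro data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed the halfway mark
            return length


# Hypothetical toy contig lengths in bp (not Oro data).
contigs = [7_000, 5_000, 3_000, 2_000, 1_000, 1_000, 500]
print(n50(contigs))
```

For these toy lengths, the 7 kb and 5 kb contigs together hold 12,000 of 19,500 bp, past the halfway mark, so the N50 is 5,000 bp. The same function applies to read lengths as to contig lengths.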

Both Mockler and VanBuren were blown away by this record-breaking plant genome assembly. “PacBio is a game changer for plant genomics,” says VanBuren, noting that the team was also able to identify all of the telomeres in the genome from their tandem repeat signatures. The compact genome serves as an excellent resource for comparative work among grass genomes, helping researchers understand large-scale structural variation, genome structure reorganization, metabolic networks, stress pathways, and other secondary analyses. View the recording of the preliminary Oro genome analysis presented at the PAG XXIII conference.
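The post does not describe how the telomeres were found. One simple way to do this, sketched below under stated assumptions, is to look for long tandem runs of the telomeric repeat at each contig end. The sketch assumes the canonical plant telomeric motif TTTAGGG, a 1 kb search window, and at least 20 perfect consecutive copies; the function name and thresholds are hypothetical, and a real scan would tolerate sequencing errors and motif variants.

```python
import re

# Assumption: canonical plant telomeric repeat and its reverse complement.
MOTIF = "TTTAGGG"
REVCOMP = "CCCTAAA"


def telomere_ends(seq, window=1000, min_copies=20):
    """Hypothetical helper: report (has_left, has_right) telomere signals.

    A hit requires at least `min_copies` perfect, consecutive motif copies
    within `window` bp of that contig end.
    """
    seq = seq.upper()
    left, right = seq[:window], seq[-window:]
    # The G-rich strand runs toward the chromosome end, so the right end of a
    # contig reads TTTAGGG repeats and the left end reads CCCTAAA repeats.
    left_run = re.compile(f"(?:{REVCOMP}){{{min_copies},}}")
    right_run = re.compile(f"(?:{MOTIF}){{{min_copies},}}")
    return bool(left_run.search(left)), bool(right_run.search(right))


# Toy contig with telomeric runs at both ends (synthetic, not Oro data).
toy = "CCCTAAA" * 30 + "ACGT" * 500 + "TTTAGGG" * 30
print(telomere_ends(toy))
```

Because reverse-complementing a contig turns a right-end TTTAGGG run into a left-end CCCTAAA run, this orientation check gives the same answer whichever strand the assembler reported.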

Following this initial success, the 2015 SMRT Grant program is co-sponsored by Sage Science, Computomics, and the Arizona Genomics Institute. The winning proposal will be sequenced with the latest P6-C4 chemistry. This release has been shown to deliver average read lengths of 10-15 kb or more, with the longest reads in the distribution exceeding 60 kb, on the PacBio® RS II system for complex genome projects. Average throughput per SMRT Cell ranges from 500 Mb to 1 Gb, depending on the application. These gains also benefit PacBio’s Iso-Seq™ application, which sequences full-length cDNA transcripts across the whole transcriptome to distinguish isoforms for genome annotation and gene discovery.

Other submissions received in 2014 included the critically endangered Hawaiian crow (Corvus hawaiiensis), a famine-causing ascomycete fungal pathogen (Cercospora zeina), and a hermaphroditic fish (Kryptolebias marmoratus). We look forward to reading (and learning!) about the exciting work that drives scientists’ passion in the proposals submitted for 2015!

Details for the 2015 “Most Interesting Genome in the World” SMRT Grant program can be found at

Tuesday, January 20, 2015

Looking Ahead: The 2015 PacBio Technology Roadmap

By Jonas Korlach, Chief Scientific Officer

All of us at Pacific Biosciences are very proud of the momentum SMRT® Sequencing achieved in 2014, especially as reflected in the more than 500 customer publications now in the literature describing its many applications. We remain deeply thankful to all the scientists who have applied our technology to gain new insights into genomes, transcriptomes, and epigenomes. By applying SMRT Sequencing to a wide variety of applications, our customers are demonstrating that long, unbiased reads have brought about new quality standards for many fields of genomic research. This exciting level of scientific activity and collaboration also provides us with important feedback to further optimize and develop sequencing applications for the PacBio® RS II.

In 2015, we plan to continue our track record of delivering improvements in all aspects of SMRT Sequencing. Sample preparation developments include improved and streamlined sample preparation protocols, barcoding solutions for multiplexing many samples in a SMRT Cell run, and protocols for improved yields of very long-insert libraries and full-length cDNA libraries.

With regard to sequencing runs, as in each of the previous three years, we expect to deliver another ~4-fold increase in throughput, reaching >4 Gb of data per SMRT Cell run, with average read lengths increasing to 15-20 kb. We plan to accomplish this through a combination of improvements in sequencing chemistry, protocol workflows, and software. One example is active loading, which aims to place exactly one polymerase in each ZMW (zero-mode waveguide) more often than the Poisson limit allows. In data analysis, we will continue to work with the bioinformatics community to create faster algorithms for de novo genome assembly, further develop solutions like our FALCON assembler for resolving diploid and polyploid genomes, and streamline analysis workflows for other applications such as Iso-Seq™ and full-length HLA sequencing.
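The “Poisson limit” comes from random loading: if polymerases settle into ZMWs independently, the number per ZMW follows a Poisson distribution with mean λ, so the fraction holding exactly one polymerase is λe^(-λ), which peaks at about 37% when λ = 1. The short sketch below works through that arithmetic. It models only the idealized random-loading case and says nothing about how active loading itself works; the function name is a hypothetical illustration.

```python
import math


def occupancy_breakdown(mean_load):
    """Fractions of ZMWs that are empty, singly loaded, or multiply loaded,
    assuming polymerases land independently (Poisson with mean `mean_load`)."""
    empty = math.exp(-mean_load)               # P(k = 0)
    single = mean_load * math.exp(-mean_load)  # P(k = 1)
    multiple = 1.0 - empty - single            # P(k >= 2)
    return empty, single, multiple


# Scan mean loadings from 0.01 to 3.00 for the best single-occupancy fraction.
best_single, best_lambda = max(
    (occupancy_breakdown(step / 100)[1], step / 100) for step in range(1, 301)
)
print(f"best single occupancy {best_single:.3f} at lambda = {best_lambda}")
print(occupancy_breakdown(1.0))
```

Even at the optimum, roughly 37% of ZMWs stay empty and about 26% receive two or more polymerases, which is why loading beyond the Poisson limit is a lever for throughput.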

It is exciting to think about the new frontiers in genomics research that continued innovation and performance gains in SMRT Sequencing will open up, for example:
High-quality population and disease-specific human reference genomes
Comprehensive views of tissue and disease-specific transcriptome architectures
High-quality plant and animal reference genomes and transcriptomes
Comprehensive characterization of structural variation in genomes
Large-scale microbial genome and epigenome studies

Examples of these successes have already been featured at last week’s Plant & Animal Genome meeting, with over 50 researchers presenting their work on the use of SMRT Sequencing in the plant and animal research space. Of course, we are also looking forward to next month’s AGBT conference, where advances in the human genomics research space will be highlighted as part of the conference program and during our workshop on February 27.

We are excited to interact with many of you at these and other forums as we support the efforts voiced by many in the community to “bring the ‘W’ back into whole-genome sequencing,” e.g. at the NHGRI event held last year on “Future Opportunities for Genome Sequencing and Beyond: A Planning Workshop for the National Human Genome Research Institute.” We wish you continued success in your research, and thank you again for your support!

Tuesday, January 6, 2015

PAG 2015: SMRT Sequencing and the “Most Interesting Genome” Grant Program

The 23rd annual International Plant and Animal Genome meeting is right around the corner – it’s taking place January 10-14 in sunny San Diego. The meeting has become an important venue for customers showcasing their Single Molecule, Real-Time (SMRT®) Sequencing data on complex plant and animal genome projects.

This year, more than 50 researchers from around the world, many representing large consortium efforts, will present work using SMRT Sequencing to assemble de novo references and/or analyze the complex genomes of a variety of plants and animals, including data generated with our Iso-Seq™ application for full-length transcript sequencing. Highlights from the PAG program include genome projects on ice plant, cattle, cuttlefish, legumes, sheep, sugar pine, and more.

Tuesday, December 16, 2014

At ASHI 2014, SMRT Sequencing Meets HLA Typing with Great Results

Earlier this fall, we headed to Denver for ASHI, the annual meeting of the American Society for Histocompatibility and Immunogenetics. Though we’d attended this conference in the past, this was our first year with an exhibit hall booth and a workshop, both of which were enthusiastically received by attendees. Although scientists have only recently begun applying the PacBio® sequencing platform to the HLA genes, which are often studied in histocompatibility research, users had already generated many great examples and exciting data on the platform.

Our luncheon workshop on fully phased HLA and KIR typing was packed, no doubt thanks to our top-tier speakers: Prof. Steve Marsh of the Anthony Nolan Research Institute and University College London; Nezih Cereb, CEO & Co-Founder of Histogenetics; and Martin Maiers, Director of Bioinformatics Research at the National Marrow Donor Program. We recorded video of their presentations. You can also peruse some posters showcasing Single Molecule, Real-Time (SMRT®) Sequencing for HLA and KIR analysis, as well as a webinar recorded at the meeting.

Thursday, December 11, 2014

Review Article: Long-Read Sequencing Offers Better Understanding of Pluripotency

A new review article offers a nice overview of attempts to characterize the transcriptome of human stem cells using RNA-seq, the Iso-Seq™ method, and more. Kin Fai Au and Vittorio Sebastiano, scientists at the University of Iowa and Stanford University, respectively, contributed the review to Current Opinion in Genetics & Development.

“The introduction of the RNA-Seq technology based on [second-generation sequencing technology] has provided a remarkable step forward providing a fast and inexpensive way to determine the transcriptome of a given cell type and several remarkable works have been done using this type of approach,” Au and Sebastiano write. “Nonetheless tasks like de novo discovery of genes, gene isoforms assembly or transcript and isoform abundance determination are still challenging and far from being achieved.”

Thursday, December 4, 2014

A New Reference Genome for Shigella: SMRT Sequencing of a Historic Sample

In a special issue of The Lancet dedicated to World War I, scientists from the Wellcome Trust Sanger Institute used Single Molecule, Real-Time (SMRT®) Sequencing to decode the genome of the first Shigella flexneri isolate ever collected.

The bacterium, a descendant of E. coli and first identified as a separate strain in 1902, was responsible for severe dysentery among World War I troops due to poor hygienic conditions in the trenches. Today, S. flexneri is one of the leading causes of diarrheal death among children in developing countries and other areas of poor sanitation.

Wednesday, November 12, 2014

New Transcript Study Offers Clues to Pathogenesis of Repeat Disorders Linked to FMR1

It’s been nearly two years since a team of scientists from the University of California, Davis, School of Medicine published the first-ever complete sequence of FMR1, the gene associated with a repeat expansion that causes Fragile X syndrome. That team is once again breaking new ground, this time characterizing alternative splicing and full-length transcripts of FMR1. For both studies, the scientists relied on Single Molecule, Real-Time (SMRT®) Sequencing because its uniquely long reads allowed them to span the gene and generate sequence and isoform data that would not have been possible any other way.

The new paper, “Differential increases of specific FMR1 mRNA isoforms in premutation carriers,” was published in the Journal of Medical Genetics and comes from lead author Dalyir Pretto and senior author Flora Tassone, along with collaborators. They aimed to elucidate the transcript levels of FMR1 in people with what’s known as a premutation allele — people who have more repeats within the FMR1 gene than normal, but fewer repeats than full-mutation Fragile X patients have. This group is at risk for fragile X-associated tremor/ataxia syndrome as well as fragile X-associated primary ovarian insufficiency.

Monday, November 10, 2014

Nature Paper Offers Novel Sequence, Structural Variant Data for a More Complete Human Genome

A new paper out in Nature extends our view into the human genome and challenges current ideas about genetic variation. “Resolving the complexity of the human genome using single-molecule sequencing” comes from first author Mark Chaisson, senior author Evan Eichler, and their collaborators at the University of Washington, University of Bari Aldo Moro, and University of Pittsburgh. In the paper, the scientists describe an important effort to fill gaps and better characterize structural variation in the human genome by using Single Molecule, Real-Time (SMRT®) Sequencing data.

The team sequenced a haploid human genome, using a hydatidiform mole cell line (CHM1), to about 40x coverage. Eichler’s group was able to close or shrink 55 percent of the 160 euchromatic gaps existing in the reference genome, the vast majority of them marked by GC-rich regions with several kilobases of short tandem repeats. The approach used repeated rounds of mapping and assembling data, and added more than 1 Mb of novel sequence — including novel exons and putative regulatory sequences — to the genome.

Wednesday, October 29, 2014

‘Revolutionizing HLA Typing’: Uppsala’s Ulf Gyllensten on How Long Reads Give Access to New Areas of the Human Genome

In a recent interview with Theral Timpson — part of Mendelspod’s series on long-read sequencing — Ulf Gyllensten, a scientist at Uppsala University, spoke about using PacBio® technology for HLA typing, human genome studies, transcriptomics, and more.

Based in the medical genetics and genomics department, Gyllensten focuses on two areas: using systems biology to study biological variation in human physiology, and studying the epidemiology of human papillomavirus and its genetic link to cervical cancer. He also works with the National Genomics Infrastructure, a national core facility in Sweden for genotyping and DNA sequencing, where he has access to all commercially available sequencing platforms.

In the podcast, Gyllensten spoke about advances in screening for HPV, his predictions for the widespread use of genome sequencing in the clinic, and applications using Single Molecule, Real-Time (SMRT®) Sequencing for human genome studies.

Tuesday, October 21, 2014

Data Release: Whole Human Transcriptome from Brain, Heart, and Liver

In higher eukaryotic organisms, like humans, RNA transcripts from the vast majority of genes are alternatively spliced. Alternative splicing dramatically increases the protein-coding potential of eukaryotic genomes and its regulation is often specific to a given tissue or developmental stage.

Using our updated Iso-Seq™ sample preparation protocol, we have generated a dataset containing the full-length whole transcriptome from three diverse human tissues (brain, heart, and liver). The updated version of the Iso-Seq method incorporates the use of a new PCR polymerase that improves the representation of larger transcripts, enabling sequencing of cDNAs of nearly 10 kb in length. The inclusion of multiple sample types makes this dataset ideal for exploring differential alternative splicing events. Download the polished, full-length transcript sequences and the raw data files.