PacBio Blog

Tuesday, July 22, 2014

At ISMB, Gene Myers’ Keynote Offers History, Future of Genome Assembly

At ISMB 2014 in Boston earlier this month, Gene Myers of the Max Planck Institute of Molecular Cell Biology and Genetics presented a keynote address entitled “DNA Assembly: Past, Present, and Future.” Myers received the prestigious Senior Scientist Accomplishment Award from the International Society for Computational Biology (ISCB) at the event.

The ISCB Senior Scientist Accomplishment Award honors respected leaders in computational biology and bioinformatics for their significant contributions to these fields through research, education, and service. Myers is being honored as the 2014 winner for his outstanding contributions to the bioinformatics community, particularly his work on sequence comparison algorithms and whole-genome shotgun sequencing methods, and his recent endeavors in developing software and microscopic devices for bioimage informatics.

His talk chronicled the history of sequence assembly methods, highlighting the different technologies from Sanger sequencing to today and the various algorithmic approaches to the problem, and weaving in the ideas of string graphs and de Bruijn graphs throughout.

Myers believes the demand for lower-cost sequencing “after the genome” has hampered progress on the production of high-quality de novo genome reconstructions, resulting instead in “Swiss cheese genomes.” He said that generating genomes consisting of lots of small contigs was never his vision for assembly.

He spent nearly a decade out of the “DNA sequencing scene” (see his blog post “On Perfect Assembly”) because the cost-over-quality movement caused him to lose interest as a mathematician. The advent of long-read sequencers renewed his engagement in assembly methods. He writes: “What I perceived early in 2013 was that the relatively new PacBio ‘long read’ DNA sequencer was reaching sufficient maturity that it could produce data sets that might make this possible, or at least get us much, much closer to the goal of near perfect, reference quality reconstructions of novel genomes.” Myers noted that some in the industry had misunderstood the accuracy profile of the system, but he recognized the power of the Poisson sampling and random distribution of errors and decided last year to purchase a PacBio® RS II and “get back into the genome assembly game.”

Myers now has two PacBio RS II sequencers and, as he has discussed in his blog and presentations this year at AGBT and ISMB, he is not concerned with error rates associated with PacBio sequencing because the error is truly random (“unlike any previous technology”), and therefore “the ideal of near perfect de novo assembly is again possible.” 
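
To illustrate why randomly distributed errors matter, here is a minimal sketch. It is a simplified illustration only, not Myers' algorithm or code from any assembler. It assumes that every read covering a position is wrong independently with the same probability, and that the consensus is wrong only when more than half of the covering reads are wrong. The function name `majority_error` and the 15% per-read error rate are hypothetical choices made for this example.

```python
from math import comb


def majority_error(coverage: int, per_read_error: float) -> float:
    """Probability that strictly more than half of `coverage` independent
    reads are wrong at a single position (simple binomial model)."""
    if coverage < 1:
        raise ValueError("coverage must be at least 1")
    if not 0.0 <= per_read_error <= 1.0:
        raise ValueError("per_read_error must be between 0 and 1")
    threshold = coverage // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1.0 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-read error rate, chosen only for illustration.
    for depth in (1, 5, 15, 31):
        print(depth, majority_error(depth, 0.15))
```

Under these assumptions the chance of a wrong consensus call shrinks rapidly as coverage grows, which is the intuition behind the claim that random errors can be averaged away. Errors that recur at the same positions would not cancel in this manner.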

He described his most recent algorithmic work on an assembler called the Dazzler (the Dresden AZZembLER) that can assemble 1-10 Gb genomes directly from a shotgun, long-read data set produced by PacBio RS II sequencers. Using Dazzler, he reported generating a de novo assembly of a human genome with an N50 of 5.5 Mb, an improvement of more than 1 Mb over our HGAP assembly in February, achieved with far less computation and time. More information is available on his blog. In conclusion, he noted that long-read sequencers will enable de novo, reference-quality reconstructions, enhance comparative genomics and diversity studies, and give us an accurate picture of large-scale structural variation.

We are glad to see Myers back in the DNA sequencing scene, and very excited about the possibilities SMRT® Sequencing holds for genome assemblies!

Friday, July 11, 2014

ISMB 2014: The World Cup of Bioinformatics

We’re eager for the #ISMB conference — it’s the 22nd annual Intelligent Systems for Molecular Biology event — kicking off this weekend in Boston. As we continue to push our technology to deliver longer read lengths, we have been honored to work with many leading bioinformaticians to optimize the processing and analysis of our data.

Several of those experts will be speaking at ISMB this year. On Sunday, attendees will hear from Adam Phillippy of the National Biodefense Analysis and Countermeasures Center. He’ll be presenting at noon on producing complete genome assemblies using Single Molecule, Real-Time (SMRT®) Sequencing data. Adam’s team recently developed a new assembler called MHAP that dramatically reduces the CPU time needed for building assemblies, so we are eager to hear more.

Later that day, Gene Myers from the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany, will give the 2014 ISCB Accomplishment by a Senior Scientist Award keynote presentation entitled “DNA Assembly: Past, Present, and Future,” in which he'll reflect on genome assembly challenges throughout his career. According to his abstract, Myers’ talk will also cover “the surprising transition from skepticism of whole-genome shotgun sequencing to an irrational acceptance of NGS whole-genome shotgun over short reads.” He’ll speak about Dazzler, a new tool he developed to assemble genomes as large as 10 Gb directly from long PacBio® reads.

There are several other terrific keynotes scheduled for the meeting. On Monday, Harvard’s Zak Kohane will give a talk outlining the opportunities he sees for biomedical quantitative analysis experts to participate in the healthcare revolution happening today. On Tuesday, Russ Altman will offer a presentation on using informatics to better understand drug response from the molecular to the population level.

With a history of two decades of high-profile talks, ISMB is arguably the World Cup of the bioinformatics world. We hope to see you there!

Wednesday, July 9, 2014

Optimizing Eukaryotic De Novo Genome Assembly: Webinar Recording Available

Our webinar on eukaryotic genome assembly attracted a great crowd, and now we’re making the full recording available to the community. The session offered hands-on information and best practices for working with Single Molecule, Real-Time (SMRT®) Sequencing data. “Optimizing Eukaryotic Genome Assembly with Long-Read Sequencing” featured three excellent speakers: Michael Schatz and James Gurtowski from Cold Spring Harbor Laboratory and Sergey Koren from the National Biodefense Analysis and Countermeasures Center. It was hosted by our own CSO, Jonas Korlach.

Schatz kicked off the session with an overview of assemblers for PacBio® data (as well as recommendations for when to use each one) and a look at the challenges of short-read assemblies. He also set expectations around long-read data, noting that for genomes smaller than 100 Mb, users should expect a nearly perfect assembly from the automated workflow. Genomes up to 1 Gb should be represented in a high-quality assembly with a contig N50 of at least 1 Mb. Genomes larger than that will have shorter contig N50 values and will require more computational power, he added.
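
For readers less familiar with the contig N50 statistic mentioned above, here is a minimal sketch of how it is commonly computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly size. The function name `n50` and the sample lengths are our own illustrative choices, not taken from the webinar.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 is weighted by length, a handful of long contigs can raise it even when many short contigs remain, so it is best read alongside the total assembly size and contig count.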

Tuesday, July 1, 2014

Scientists Generate the First Personal Transcriptome Using SMRT Sequencing

A new paper from scientists at Stanford University and Yale University describes the use of Single Molecule, Real-Time (SMRT®) Sequencing to generate transcriptomes for three individuals. The work is believed to be the first personal transcriptome analysis using long-read sequencing.

The paper, entitled “Defining a personal, allele-specific, and single-molecule long-read transcriptome,” was published in PNAS by Hagen Tilgner, Fabian Grubert, Donald Sharon, and Michael Snyder. Last year, the same authors published a study using SMRT Sequencing to analyze transcriptomes across tissue samples from human organs. In the PNAS publication, they compare metrics from the new data set to those from the previous study.

Friday, June 27, 2014

At SFAF 2014, Great Science and High-Quality Genomes

It’s been a busy start to the summer, but we’re still basking in the top-notch presentations and posters from the Sequencing, Finishing, and Analysis in the Future meeting last month. Hosted by Los Alamos National Laboratory in Santa Fe, this has become a premier event for scientists working on sequencing protocols, analysis, and assembly methods.

Many speakers presented data including reads from Single Molecule, Real-Time (SMRT®) Sequencing. Jeff Rogers from Baylor College of Medicine used long PacBio® reads with the PBJelly algorithm to fill gaps in many mammalian genomes, including sheep, rat, baboon, sooty mangabey, and mouse lemur. Tina Graves-Lindsay from Washington University reported work on improving the reference human genome through BAC sequencing and the use of a haploid human data set, which included PacBio’s CHM1TERT data release. James Gurtowski from Cold Spring Harbor Laboratory detailed improvements to genome assemblies of yeast, Arabidopsis, and various rice strains using his new toolset, ECTools.

Monday, June 23, 2014

Unprecedented Read Length at the Icahn Institute: Precise Sizing + SMRT Sequencing

At the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai in New York City, technology development expert Robert Sebra, Ph.D., sees tremendous need for long-read, high-accuracy sequencing for use in microbial surveillance, detection of repeat expansions, and other research applications. To meet that demand, he relies on Single Molecule, Real-Time (SMRT®) Sequencing from Pacific Biosciences with BluePippin™ automated DNA size selection from Sage Science. Together, these tools offer a powerful solution and industry-leading read lengths that allow Sebra and other researchers to resolve repeat elements and structural variants, rapidly close microbial genomes, and measure epigenetic marks.

Sebra, an assistant professor of genetic and genomic sciences, is no stranger to the SMRT Sequencing platform: he spent five years working at PacBio helping to develop that technology. Ultimately, his belief in the system led him to join the Icahn Institute, where he would get to use the PacBio® sequencer as a customer. Sebra, who came to Mount Sinai in 2012, says, “I had experienced firsthand the value of long-read sequencing and wanted to apply it to human and infectious disease research.”

Monday, June 2, 2014

Intro to the Iso-Seq Method: Full-length transcript sequencing

With the recent launch of SMRT Analysis v2.2, we’re excited to introduce analysis software support for the new Iso-Seq™ method for sequencing full-length transcripts and gene isoforms, with no assembly required! Today we’ll take a deeper look at the Iso-Seq method to explain its unique scientific value and review publications from those already applying Single Molecule, Real-Time (SMRT®) Sequencing to this exciting area of research.

In plants, animals, and other higher eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms. In humans, for example, there is evidence for alternative splicing of more than 95% of genes [1], with an average of more than five isoforms per gene. Gene regulation through alternative splicing can dramatically increase the protein-coding potential of a genome that contains a limited number of protein-coding genes. Somewhat surprisingly, alternatively spliced isoforms from a single gene can also have very different, even antagonistic, functions [2]. Therefore, understanding the functional biology of a genome requires knowing the full complement of isoforms. Microarrays and high-throughput cDNA sequencing have become incredibly useful tools for studying transcriptomes, yet these technologies provide only small snippets of transcripts, and building complete transcripts to study gene isoforms has been challenging.

Thursday, May 29, 2014

Research Studies Use Sequencing to Track Path of Infection Outbreaks

A talk at last week’s ASM conference continued the recent trend of scientists using Single Molecule, Real-Time (SMRT®) Sequencing in research projects designed to better understand the transmission path of hospital-acquired infections.

The presentation, entitled “Tracking Hospital Patients and Environment with Complete Genome Sequencing of Carbapenem-Resistant Klebsiella pneumoniae and other Enterobacteriaceae,” came from Julie Segre, a chief investigator at the National Human Genome Research Institute.

Wednesday, May 28, 2014

The Sequence Analysis Meeting: SFAF 2014

The Sequencing, Finishing, and Analysis in the Future (SFAF) meeting kicks off today in Santa Fe, New Mexico. The conference is hosted by Los Alamos National Laboratory and focuses on the analytical details that are so important as the community assesses how to get the most out of all this sequence data.

This year, we will have two PacBio speakers, and there will be a number of other talks from users of our long-read sequence data. Steve Turner, our CTO, will speak on Wednesday morning about the use of Single Molecule, Real-Time (SMRT®) Sequencing for generating highly contiguous genome assemblies as well as for transcriptome analysis that can resolve complete isoforms. Steve will look at how chemistry and other improvements can push PacBio’s average consensus accuracy to ~Q50 and N50 data to greater than 10,000 base pairs.
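
As a quick reference for the “~Q50” figure, quality values of this kind are conventionally read on the Phred scale, where Q = -10 · log10(p) and p is the probability that a base is wrong. The sketch below converts in both directions. The function names are our own, and the assumption that Q50 here refers to the standard Phred scale is ours.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled quality value to an error probability."""
    return 10.0 ** (-q / 10.0)


def error_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40, 50):
        p = phred_to_error(q)
        print(f"Q{q}: error probability {p:.0e}, accuracy {100 * (1 - p):.4f}%")
```

On this scale, Q50 corresponds to roughly one expected error per 100,000 bases, or 99.999% accuracy.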

Thursday, May 22, 2014

At ASM, Pioneering Scientists Presented Bacterial Methylome Highlights

This week’s annual meeting of the American Society for Microbiology was every bit as interesting, data-rich, and jam-packed as promised. We’re grateful to everyone who stopped by our booth and got to know more about Single Molecule, Real-Time (SMRT®) Sequencing.

Our favorite session, “Bacterial Methylomes,” took place on the last day of the conference and was organized by Rich Roberts, Nobel laureate and Chief Scientific Officer at New England Biolabs. The session highlighted several projects analyzing genome-wide methylation states of bacteria, a task that had been all but impossible because such base modifications could not be detected. As Roberts kicked off the event, he noted that most scientists avoided this area of study until the PacBio® technology came along.