PacBio Blog

Monday, June 29, 2015

Nature Methods Paper Uses Long-Read Data for Highly Contiguous Diploid Human Genome

A new publication in Nature Methods describes a single-molecule assembly approach that produced “the most contiguous clone-free human genome assembly to date,” according to lead authors Matthew Pendleton, Robert Sebra, Andy Pang, and Ajay Ummat.

The paper, “Assembly and Diploid Architecture of an Individual Human Genome via Single Molecule Technologies,” comes from a large team of collaborators at the Icahn School of Medicine at Mount Sinai, Cornell, Cold Spring Harbor Laboratory, and other institutions.

Their approach draws on the strengths of each single-molecule data type, combining long-read sequencing for de novo assembly with single-molecule genome maps for scaffolding. The resulting hybrid assembly integrates SMRT® Sequencing data with single-molecule genome maps from BioNano Genomics’ NanoChannel Arrays.

The paper describes sequencing the well-studied NA12878 genome using SMRT Sequencing and generating single-molecule genome maps with nicking enzymes. “Individually, the assemblies and genome maps markedly improve contiguity and completeness compared with de novo assemblies from clone-free, short-read shotgun sequencing data,” the authors write. “Moreover, by combining the two platforms, we achieve scaffold N50 values greater than 28 Mb, improving the contiguity of the initial sequence assembly nearly 30-fold and of the initial genome map nearly 8-fold.”
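For readers unfamiliar with the metric, scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases; a larger N50 means a more contiguous assembly. Below is a minimal Python sketch of the calculation. The `n50` function and the scaffold lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases
    # until at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in megabases (not data from the paper).
scaffolds = [40, 30, 20, 5, 3, 2]
print(n50(scaffolds))  # prints 30
```

Here the two longest scaffolds (40 + 30 = 70 of 100 Mb) already cover half the assembly, so the N50 is 30. The fold improvements quoted in the paper appear to be ratios of N50 values computed this way before and after scaffolding.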

The scientists then compared their assembly to the human reference genome to identify a comprehensive set of genetic variants, including many larger structural variants that short-read sequencing-by-synthesis (SBS) approaches often overlook. As the authors note, short-read technologies are frequently used to survey genomes for single nucleotide variants, but they cannot resolve most large-scale genetic variation, including many structural variants and the repetitive regions that confound short-read assemblies.

“Though the cost of sequencing has markedly decreased, de novo human genome analysis has, to some extent, regressed,” the authors report. “Although HuRef and the original Celera whole-genome shotgun assembly have scaffold N50 values … of 19.5 Mb and 29 Mb respectively, the best next-generation sequencing (NGS) assemblies have scaffold N50 values of 11.5 Mb, even with the use of high-coverage fosmid jumping libraries.” The biggest challenges in these short-read assemblies, they add, are repetitive structures, transposable elements, segmental duplications, and heterochromatin.

The extraordinary contiguity of the single-molecule assembly, to which short-read NGS data were later added, made it possible to detect large structural variants and to phase both single nucleotide and structural variants. By comparing the assembly to reference genomes, the team resolved and phased structural variants such as tandem repeats across the genome. They also separated maternal and paternal alleles, revealing complex events that previous assemblies had missed.

On structural variation, the authors report that “a major benefit of continuous long reads is the ability to directly observe structural variants,” an approach they say is more effective than relying on breakpoint analysis or local realignment.

The combination of SMRT Sequencing data, genome maps, and NGS data “allowed us to resolve long-standing assembly discrepancies,” the scientists write.

Thursday, June 25, 2015

SMRT Data Delivers for Next-Generation HLA Typing at Anthony Nolan Research Institute

A new publication in PLoS One from authors at Anthony Nolan’s Research Institute describes a feasibility study for HLA typing using SMRT® Sequencing. The research institute, where the world’s first bone marrow registry started in 1973, is part of the UK-based charity dedicated to improving the outcomes of bone marrow transplantation. Scientists at Anthony Nolan are leaders in HLA typing, which is an important step in matching a bone marrow or stem cell donor to a patient in need.

The Anthony Nolan team adopted the PacBio® system last year, and this publication reflects its efforts to test the system and establish new standards for HLA typing. In "HLA Typing for the Next Generation," lead author Dr Neema Mayor, senior author Professor Steven Marsh, and colleagues describe sequencing and analyzing the various types of representative samples typically seen in their pipeline.

Monday, June 15, 2015

Scientists Publish New Methylation Analysis Protocols Using SMRT Sequencing

Scientists from the Icahn School of Medicine at Mount Sinai and the University of Saskatchewan teamed up to develop an innovative approach to methylation analysis using Single Molecule, Real-Time (SMRT®) Sequencing. The resulting method was just published in BMC Genomics.

Lead author Yao Yang and colleagues note in the paper [“Quantitative and multiplexed DNA methylation analysis using long-read single-molecule real-time bisulfite sequencing (SMRT-BS)”] that existing methods for methylation analysis are limited by cost and throughput in the case of Sanger sequencing, or by short read lengths in the case of NGS technologies. Their goal was to develop a method that combines long reads, high accuracy, and high throughput.

Thursday, June 11, 2015

Updated! Data Release: Human MCF-7 Transcriptome


Our R&D team has added new data to the MCF-7 human breast cancer transcriptome dataset, originally released in 2013. The new results were produced using 28 SMRT® Cells with 4-hour movies and P5-C3 chemistry. Size selection was performed on the SageELF™ platform (fractions collected: 1-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb). Sequencing the larger fractions with our newer, longer-read chemistry added transcripts up to 10 kb to the MCF-7 dataset, which previously contained transcripts only up to 4 kb.

New FASTA and GFF files representing the combined dataset are available, along with raw data from both the 2013 and 2015 sequencing runs.

Tuesday, June 9, 2015

Attend Our Worldwide User Meetings & SMRT Informatics Developers Conference

If you’d like to hear about the latest applications of SMRT® Sequencing from users, we have several events coming up. Our worldwide user group meetings and workshops feature PacBio users sharing their latest research, tips, and protocols, as well as our staff providing training and updates on products and methods to optimize your research. We’re always humbled by the quality and variety of science presented at these meetings. And for the bioinformatics crowd, we have a new event in August focused on developing new analytical tools for PacBio® data.

Tuesday, June 2, 2015

In Assembler Evaluation, Scientists Recommend Non-hybrid Approach to Bacterial Genomes

A new publication in Scientific Reports, from the Nature Publishing Group, recommends using the PacBio® system alone to sequence bacterial genomes for the best chance of generating an accurate, finished assembly.

The paper, “Completing bacterial genome assemblies: strategy and performance comparisons,” reviews several different long-read assembly methods for bacterial genomes. Authors Yu-Chieh Liao, Shu-Hung Lin, and Hsin-Hung Lin from the Institute of Population Health Sciences in Taiwan note that while several methods exist, efforts to evaluate and compare them have been insufficient. They set out to thoroughly assess these methods, which include hybrid assembly protocols as well as long-read-only protocols.

Thursday, May 28, 2015

Microbial Madness: Talks, Posters, and SMRTest Microbe Grant Program at ASM 2015

For some people, Mardi Gras and beignets are the big attractions in New Orleans, but for us it’s all about the annual meeting of the American Society for Microbiology. ASM 2015 will take place at the Ernest N. Morial Convention Center from May 30 to June 2, bringing together more than 8,000 attendees and some of the leading scientists in the field.

At 2:30 p.m. on Sunday, May 31, in La Nouvelle Orleans Ballroom B, Jing Li from Tsinghua University will discuss the detection and importance of three DNA methylation motifs in the genome of Streptococcus pneumoniae. Shortly after, Julie Segre from NIH/NHGRI will discuss the use of SMRT® Sequencing in an effort to track the spread of carbapenem-resistant Enterobacteriaceae at the NIH Clinical Center. Long reads from the PacBio® system were instrumental in determining the sequence of events in a recent outbreak. Her presentation will take place in Room 339 at 3:00 p.m.

Tuesday, May 26, 2015

New MHAP Algorithm Delivers Fast, High-Quality Genome Assemblies

A new publication in Nature Biotechnology reports a lightning-fast genome assembly pipeline optimized for long reads. Scientists from the University of Maryland and the National Biodefense Analysis and Countermeasures Center created the MinHash Alignment Process (MHAP) to dramatically reduce assembly time and improve assembly quality. Their results are worth celebrating: assembly was 600-fold faster than with existing methods. “Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes,” the authors write. In the best cases, entire chromosome arms assembled into single pieces from telomere to centromere!

MHAP takes a probabilistic approach to overlap-based assembly of long reads. MinHash represents a long string, such as a sequencing read, as a compact set of fingerprints, so overlaps between reads can be detected using smaller data that is less computationally intensive to compare. The authors’ MHAP overlapping method has been integrated into Celera Assembler for the assembly of gigabase-sized genomes and is described in their new paper, “Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing.”
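To illustrate the fingerprint idea, here is a sketch of plain MinHash in Python. Each sequence is broken into k-mers, each k-mer is hashed under many seeded hash functions, and only the minimum value per function is kept. The fraction of positions where two sketches agree estimates the Jaccard similarity (shared k-mers over total distinct k-mers) of the two reads. The k-mer size, sketch size, and seeded BLAKE2 hashing are arbitrary choices for this example, not MHAP's actual parameters or hash scheme. MHAP also uses matching k-mers to estimate overlap positions, which this sketch omits.

```python
import hashlib
import random


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer, seed):
    """Map a k-mer to a 64-bit integer; each seed acts as a different hash function."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=12, num_hashes=128):
    """Fingerprint seq by keeping the minimum k-mer hash under each hash function."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(hash_kmer(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of agreeing minima; estimates intersection size over union size."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=12):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


# Hypothetical example: two simulated, error-free reads overlapping by 500 bases,
# plus an unrelated random sequence.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(1500))
read_a = genome[:1000]
read_b = genome[500:]
unrelated = "".join(rng.choice("ACGT") for _ in range(1000))

sa, sb, su = (minhash_sketch(s) for s in (read_a, read_b, unrelated))
print("overlapping: estimate %.2f, exact %.2f"
      % (estimate_jaccard(sa, sb), exact_jaccard(read_a, read_b)))
print("unrelated:   estimate %.2f" % estimate_jaccard(sa, su))
```

With error-free simulated reads, the estimate should land close to the exact Jaccard value for the overlapping pair and near zero for the unrelated sequence. Real long reads contain sequencing errors that break shared k-mers, which this toy version does not account for.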

Wednesday, May 6, 2015

Tutorial on the Iso-Seq Method: Applications, Protocol, and Experimental Design

If you missed our recent webinar on isoform sequencing with the PacBio® platform, we’ve made the full recording available for on-demand access. “Iso-Seq™ Method: Sample Prep and Experimental Design for Full-Length cDNA Sequencing” offers an overview of the application, along with specific sample prep tips, factors to consider when designing an experiment, and suggestions about what kinds of projects can take advantage of this method.

Hosted by our own Tyson Clark, the webinar begins with a look at why it’s important to capture full-length transcripts. There are known human genes that have very different functions depending on which splice variant is expressed. With alternative splicing so critical to genome function — Clark noted one Drosophila gene that can make more than 38,000 isoforms — scientists who miss the full transcript aren’t seeing the full picture of gene activity. Single Molecule, Real-Time (SMRT®) Sequencing is the only technology that enables complete views of these isoforms, from poly-A tails to 5’ ends, without assembly.

Monday, May 4, 2015

PAG Grant Winner: Rainforest Tree Homalanthus nutans to get the SMRT Treatment

We’re pleased to announce the winner of our recent “Most Interesting Genome in the World” grant competition. Congratulations to Jay Keasling and Jeff Wong at the University of California, Berkeley! The grant program, which was supported by co-sponsors Sage Science, Computomics, and the Arizona Genomics Institute, was very competitive with more than 250 submitted proposals.

Keasling and Wong will be awarded SMRT® Sequencing, using up to 40 SMRT Cells with BluePippin™ DNA size selection, for Homalanthus nutans, a small rainforest tree that grows in Samoa. The plant is of particular interest as the source of prostratin, a natural product under development as an anti-HIV therapy. The genome size is estimated at 400 Mb.