PacBio Blog

Tuesday, October 21, 2014

Data Release: Whole Human Transcriptome from Brain, Heart, and Liver

In higher eukaryotic organisms, like humans, RNA transcripts from the vast majority of genes are alternatively spliced. Alternative splicing dramatically increases the protein-coding potential of eukaryotic genomes and its regulation is often specific to a given tissue or developmental stage.

Using our updated Iso-Seq™ sample preparation protocol, we have generated a dataset containing the full-length whole transcriptome from three diverse human tissues (brain, heart, and liver). The updated version of the Iso-Seq method incorporates the use of a new PCR polymerase that improves the representation of larger transcripts, enabling sequencing of cDNAs of nearly 10 kb in length. The inclusion of multiple sample types makes this dataset ideal for exploring differential alternative splicing events. Download the polished, full-length transcript sequences and the raw data files.

We have also uploaded the polished, full-length, non-redundant transcript set onto the UCSC Genome Browser to enable browsing of the data. An example of the data is shown below in Figure 2. Access the genome browser tracks.


Materials and Methods

Human mRNA samples were purchased from Biochain. Amplified double-stranded cDNA was generated as outlined in the updated version of the Iso-Seq target preparation protocol. The distribution of cDNA sizes was determined by running the amplified cDNA on a Bioanalyzer (Figure 1). The amplified cDNA was size fractionated on a BluePippin™ system (Sage Science). Size fractions of 1-2 kb, 2-3 kb, 3-6 kb, and 5-10 kb were collected, purified, and subjected to an additional round of PCR amplification. The 5-10 kb fraction was not collected for the liver sample because there were insufficient transcripts of ≥ 5 kb (Figure 1).

SMRTbell™ libraries were generated from each size fraction and sequenced independently. SMRTbell templates from the 3-6 kb and 5-10 kb fractions were cleaned up using the BluePippin system. The libraries were sequenced with a combination of P4-C2 and P5-C3 sequencing chemistries on a PacBio RS II with 2- or 3-hour movies. After sequencing, the data was processed using the "RS_IsoSeq.1" analysis pipeline with SMRT® Portal version 2.2. A summary of the sequencing results is shown in Table 1.

Table 1:  Summary of Sequencing Results
General sequencing statistics for each sample/size bin combination are shown along with statistics for each tissue.



Figure 1:  Size Distribution of Amplified cDNA
Approximately 50 ng of amplified cDNA from each tissue was run on an Agilent Bioanalyzer using the DNA 7500 kit. The plot shows the electropherogram with each tissue color coded.



Figure 2:  UCSC Genome Browser Screen Shot
Example data for the DCTN1 gene as shown in the UCSC Genome Browser. Polished, non-redundant, full-length transcript sequences are shown for each tissue.




Monday, October 20, 2014

SMRT Sequencing for the HLA Complex: PacBio Goes to ASHI

This week marks the 40th annual meeting of the American Society for Histocompatibility and Immunogenetics, better known in the community as ASHI. The PacBio® team is looking forward to attending; after all, several organizations are now using Single Molecule, Real-Time (SMRT®) Sequencing specifically for resolving the incredibly complex genetic regions related to histocompatibility.

Earlier this year, we announced that two leading HLA typing institutions had adopted SMRT Sequencing to untangle this highly polymorphic set of genes: Anthony Nolan, a UK-based blood cancer charity that started the world’s first bone marrow registry, and HistoGenetics, a pioneer that has used sequence-based typing to characterize HLA regions in more than 14 million samples. We’re pleased to report that scientists from both organizations will be giving presentations at our ASHI workshop, Advances in Fully Phased HLA & KIR Typing Using SMRT® Sequencing.

The workshop will be held Wednesday, October 22, from noon to 2:00 p.m. Click here to register for the event or to request a recording if you won’t be at the Denver conference. Here are the speakers and topics:

The Challenge of HLA Diversity in 2014
Prof. Steve Marsh, Anthony Nolan Research Institute & University College London

Clinical HLA Typing on PacBio Platform
Nezih Cereb, M.D., President & CEO, HistoGenetics

KIR Haplotypes: The Long and Short of It
Martin Maiers, Ph.D., Director, Bioinformatics Research, National Bone Marrow Donor Program

There are also a number of presentations and posters during the ASHI conference that will highlight the utility of SMRT Sequencing for characterizing the HLA complex, and we encourage attendees to stop by booth #307 to learn more.

Podium presentations:

Tuesday, October 21
Session: New & Improved NGS
2:00 p.m. - 3:30 p.m.
OR01: Automated Assembly of Complex Immunogenetic Haplotypes Using Long-Read, Single Molecule, Real-Time Sequencing of Fosmids
Richard J. Hall, Ph.D., Pacific Biosciences, Menlo Park, CA

OR05: Complete Resequencing of Extended Genomic Regions Using Fosmid Target Capture and Single Molecule Real-Time (SMRT®) Long Read Sequencing Technology
Daniel Geraghty, Ph.D., Fred Hutchinson Cancer Research Center, Seattle, WA

Thursday, October 23
Session: Scholar Awards
2:00 p.m. - 3:30 p.m.
OR59: Generation of 252 HLA Class I Genomic Sequences in a Single Sequencing Reaction Using DNA Barcodes and Single Molecule Real-Time (SMRT) DNA Sequencing Technology
Dr. Neema P. Mayor, Anthony Nolan, London, United Kingdom; UCL Cancer Institute, London, United Kingdom

Posters:

LBP04: Application of Single Molecule Real-Time (SMRT) Sequencing Technology for the Field 4 Level Genotyping of Classical HLA Loci

LBP07: Evaluation of Multiplexing Strategies for HLA Genotyping Using PacBio Sequencing Technologies

Please note: the PacBio RS II system is intended for Research Use Only and not for use in diagnostic procedures.

Wednesday, October 15, 2014

New Chemistry Boosts Average Read Length to 10 kb – 15 kb for PacBio® RS II

We are pleased to announce the launch of our new reagent kit, P6-C4, which represents the next generation of our polymerase as well as our chemistry. This kit replaces the P5-C3 chemistry and is recommended for all SMRT® Sequencing applications, including de novo assembly, targeted sequencing, isoform sequencing, minor variant detection, scaffolding, long-repeat spanning, SNP phasing, and structural variant analysis.

P6-C4 continues the steady read length improvement our users have seen since the instrument first launched. With this new chemistry, average read lengths increase to 10 kb - 15 kb, with half of all data in reads 14 kb or longer. The throughput is expected to be between 500 million and 1 billion bases per SMRT Cell, depending on the sample being sequenced. By providing more throughput per instrument run, the chemistry enables users to sequence larger genomes and observe previously undetected structural variants, highly repetitive regions, and distant genetic elements.
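The phrase "half of all data in reads 14 kb or longer" describes what is commonly called the read length N50, which is a different statistic from the average read length. Below is a minimal Python sketch of how the two can be computed from a list of read lengths. The function name `read_length_n50` and the toy lengths are assumptions made up for illustration; they are not PacBio data or part of any PacBio software.

```python
def read_length_n50(lengths):
    """Return the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    total_bases = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases
    # until half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length


# Hypothetical toy read lengths in bases (invented, not real data).
toy_lengths = [3000, 5000, 9000, 14000, 14000, 18000, 21000]

mean_length = sum(toy_lengths) / len(toy_lengths)
n50 = read_length_n50(toy_lengths)

print(f"mean read length: {mean_length:.0f} bases")
print(f"read length N50:  {n50} bases")
```

For this toy list the mean is 12 kb while the N50 is 14 kb: a few long reads carry a large share of the bases, so the N50 sits above the mean.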


Friday, October 10, 2014

ASHG 2014: A New Look at the Human Genome with Long-Read Sequencing

Scientists around the world are getting ready for the annual meeting of the American Society of Human Genetics taking place October 18-22 at the San Diego Convention Center. We’re looking forward to a number of excellent presentations and posters, and are delighted to see that many of them will focus on applying Single Molecule, Real-Time (SMRT®) Sequencing to human studies.

If you’ll be among those attending ASHG, be sure to attend our workshop, A New Look at the Human Genome – Novel Insights with Long-Read PacBio Sequencing, taking place 12:30 – 2:00 p.m. on Tuesday, October 21. Register in advance to reserve your seat or to receive the recording following the event. Our CSO, Jonas Korlach, will host the workshop, which includes:

* Increased Complexity of the Human Genome Revealed by Single-Molecule Sequencing
Evan Eichler, University of Washington 

* Defining a Personal, Allele-Specific, and Single-Molecule Long-Read Transcriptome
Hagen Tilgner, Stanford University

* Long-Read Multiplexed Amplicon Sequencing: Applications for Epigenetics and Pharmacogenetics
Stuart Scott, Icahn School of Medicine at Mount Sinai


Thursday, October 9, 2014

New Brain Study Reveals Higher Molecular Diversity from Alternative Splicing

A new paper from scientists in Switzerland and the US adds to recent findings about diversity of neuronal transcripts in the mammalian brain. The authors report that this study was only possible using long reads from Single Molecule, Real-Time (SMRT®) Sequencing.

“Targeted Combinatorial Alternative Splicing Generates Brain Region-Specific Repertoires of Neurexins,” from lead author Dietmar Schreiner, senior author Peter Scheiffele, and collaborators, was published this month in the journal Neuron. The researchers are from the University of Basel, ETH Zurich, and North Carolina State University. This is the second study on neurexin mRNA diversity using PacBio® sequencing.


Monday, October 6, 2014

'The Quality of PacBio Data Is Beyond Compare': Eric Schadt on Applications of SMRT Sequencing to Human Genetics

As part of its continuing series on long-read sequencing, last week Mendelspod aired an engaging interview with Eric Schadt, Professor & Chair of Genetics and Genomic Sciences, and Director of the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai.

Having now spent three years in his role at the groundbreaking institute, he reports that they are making great progress in the quest to build better data-driven health profiles around individuals that may better guide healthcare choices.

On short-read versus long-read sequencing
Short-read sequencing technologies still maintain the advantage in terms of throughput, says Schadt, but there are a variety of important genomic features that cannot be characterized without long-read sequencing, such as long tandem repeats, bigger structural variations, and focal variants important in cancer.


Thursday, October 2, 2014

‘We’re Going to Find the Keys’: Dan Geraghty Discusses an Approach to Understanding Causal Genetic Variation

Dan Geraghty, a researcher at Fred Hutchinson Cancer Research Center and CEO of Scisco Genetics, has spent much of his career focused on the genetics of immune response. Recently he talked to Mendelspod host Theral Timpson as part of a continuing series of podcasts on the rise of long-read sequencing.

Geraghty explained that while there have been decades’ worth of studies associating the genetics of the major histocompatibility complex (MHC) and the highly polymorphic HLA class 1 and 2 genes with disease, we still haven’t found the key mutations for a variety of autoimmune diseases such as type 1 diabetes, rheumatoid arthritis, and multiple sclerosis.

Enormous amounts of linkage disequilibrium in these regions are one obstacle, as is the difficulty of getting information in phase, so larger stretches of sequence are needed. Recently Geraghty has begun using Single Molecule, Real-Time (SMRT®) Technology in hopes of drilling down to the causal genetics.


Tuesday, September 30, 2014

New Papers Detail Complexity of Methylome-Related Virulence in Human Pathogens

In two new publications, one published today, scientists from Australia, Italy, the UK, and the US report critical and surprising new findings about DNA methylation-related complexity of bacteria. Adding to the list of advances from genome-wide epigenetic analysis, these projects enhance our understanding of how methylation systems work in human pathogens — and offer important clues for future investigations into how to treat them.

Today’s paper, “A random six-phase switch regulates pneumococcal virulence via global epigenetic changes,” was published in Nature Communications by scientists at the University of Leicester, University of Siena, University of Adelaide, and Griffith University. Senior authors Marco Oggioni and Michael Jennings and their collaborators studied Streptococcus pneumoniae, a bacterium responsible for serious infectious diseases including pneumonia, to figure out how the organism shifts between relatively benign and highly pathogenic phases.


Tuesday, September 23, 2014

Science Perspective: “Tracking Antibiotic Resistance”

The current issue of Science includes an interesting Perspective by Scott Beatson and Mark Walker of the University of Queensland. It discusses research published this week in Science Translational Medicine by Conlan et al., who used SMRT® Sequencing to track plasmid diversity of hospital-associated infectious bacteria at the NIH Clinical Center.

The article provides a nice overview of the paper, including an explanation of the important role that plasmids play in spreading antibiotic resistance. The authors illustrate why short-read DNA sequencing technologies are insufficient for resolving plasmids and why long reads are necessary for this work.

“Plasmids may be viewed as the ‘dark matter’ of short-read bacterial genome assemblies, with many large-scale genomic studies conspicuously avoiding the complexities of plasmid structure. Genomic comparisons such as that described by Conlan et al. reveal how the dynamism in the structure and arrangement of resistance elements can only be realized by ‘closing’ plasmid genomes with long-read sequencing,” they write.


Monday, September 22, 2014

Maryland Scientists Produce High-Quality, Cost-Effective Genome Assembly of Loa loa Roundworm Using SMRT Sequencing

A paper just released in BMC Genomics details what the authors call “the most complete filarial nematode assembly published thus far at a fraction of the cost of previous efforts.” The project was performed using the PacBio® RS II DNA Sequencing System by scientists at the University of Maryland School of Medicine’s Institute for Genome Sciences and the Laboratory of Parasitic Diseases at the National Institute of Allergy and Infectious Diseases.

In this genome sequencing effort, scientists generated a de novo assembly of Loa loa, a roundworm that infects humans. L. loa, transmitted to humans by deer flies, causes loiasis. The parasite lives under the skin and can grow to several centimeters without being detected.