PacBio Blog

Tuesday, July 29, 2014

Novel Study of Genome-wide PT Modifications in Bacteria Performed with SMRT Sequencing

A recent paper from scientists in China and the United States offers a new look at phosphorothioate (PT) DNA modifications in two bacterial genomes. Scientists from Shanghai Jiao Tong University, the Massachusetts Institute of Technology, Wuhan University, and Pacific Biosciences teamed up to use Single Molecule, Real-Time (SMRT®) Sequencing to generate the first genome-wide view of PT modifications and to better understand their function. “Genomic mapping of phosphorothioates reveals partial modification of short consensus sequences” by Cao et al. was published in Nature Communications.

The authors note that PT modifications, in which a non-bridging oxygen of the DNA backbone phosphate is replaced with sulphur, were only recently discovered to occur naturally in bacteria. (Scientists use PT modifications to stabilize synthetic DNA molecules against nuclease degradation.) These modifications have now been found in more than 200 bacteria and archaea, but their genome-wide distribution and biological functions have remained unclear.

To look at these events across whole genomes, the scientists used SMRT Sequencing, which can detect PT modifications through changes in polymerase kinetics as the DNA is sequenced. They studied Escherichia coli B7A, which carries the DndF-H proteins known to be associated with PT modifications, and Vibrio cyclitrophicus FF75, which lacks those proteins. The PacBio® RS II was used to sequence each genome completely and to assess PT modifications genome-wide.
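
In general, SMRT Sequencing detects base modifications kinetically: a modified base tends to change the interpulse duration (IPD), the time between successive incorporation pulses, compared with unmodified DNA. As a minimal sketch of that idea, assuming per-position IPD measurements from a native sample and from an unmodified control, the code below computes an IPD ratio at each position and flags positions at or above a cutoff. The function names, the toy IPD values, and the fixed 2.0 cutoff are hypothetical and for illustration only; this is not the paper's analysis pipeline, and real analyses typically apply statistical tests across many reads rather than a single fixed threshold.

```python
from statistics import mean


def ipd_ratios(native: dict[int, list[float]],
               control: dict[int, list[float]]) -> dict[int, float]:
    """Mean native IPD divided by mean control IPD at each shared position."""
    ratios = {}
    for pos, native_ipds in native.items():
        control_ipds = control.get(pos)
        # Skip positions that lack IPD data in either sample.
        if native_ipds and control_ipds:
            ratios[pos] = mean(native_ipds) / mean(control_ipds)
    return ratios


def flag_modified(ratios: dict[int, float], cutoff: float = 2.0) -> list[int]:
    """Sorted positions whose IPD ratio meets a (hypothetical) fixed cutoff."""
    return sorted(pos for pos, r in ratios.items() if r >= cutoff)


# Toy IPDs in seconds (invented values): the polymerase slows at position 10.
native = {10: [0.9, 1.1], 11: [0.3, 0.3], 12: [0.4, 0.2]}
control = {10: [0.3, 0.3], 11: [0.3, 0.3], 12: [0.3, 0.3]}
print(flag_modified(ipd_ratios(native, control)))
```

Comparing against a control sample, rather than using raw IPDs, keeps sequence-context effects on polymerase speed from being mistaken for modifications.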

The scientists found that in E. coli, PT modifications occur on both strands of a particular motif, but only 12 percent of possible motif sites were modified. In V. cyclitrophicus, the modifications occur on only one DNA strand, at CpsCA sequence contexts, and again at just 14 percent of possible sites. The authors also described an iodine-cleavage method, combined with Illumina® sequencing, that was used to cross-validate the findings; because that method requires both DNA strands to be modified, it was applied only to the E. coli genome. “The results raise questions about how Dnd modification proteins (DndA-E) select their DNA targets,” the authors write. “Emerging evidence suggests that DndD is a DNA nicking enzyme and that DndE binds selectively to nicked DNA, with both activities critical to incorporation of PT into the DNA backbone.”
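
The 12 and 14 percent figures are fractions: the number of motif sites found to carry a PT modification divided by the number of candidate sites for that motif. A minimal sketch of that bookkeeping follows, assuming modification calls are already available as (strand, position) pairs. The helper names, the toy sequence, and the modification call are hypothetical; only the CCA context (from the CpsCA sites in V. cyclitrophicus) comes from the text. In this sketch, candidate sites are counted on both strands, with a minus-strand site appearing in forward coordinates as the motif's reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_sites(genome: str, motif: str) -> list[tuple[str, int]]:
    """All (strand, start) sites of motif on both strands, in forward coordinates."""
    genome, motif = genome.upper(), motif.upper()
    k = len(motif)
    rc = revcomp(motif)
    sites = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            sites.append(("+", i))
        if window == rc:  # a palindromic motif yields one site per strand
            sites.append(("-", i))
    return sites


def fraction_modified(genome: str, motif: str,
                      modified_calls: set[tuple[str, int]]) -> float:
    """Fraction of candidate motif sites that appear in the modification calls."""
    sites = motif_sites(genome, motif)
    if not sites:
        return 0.0
    hits = sum(1 for site in sites if site in modified_calls)
    return hits / len(sites)


genome = "TTCCAGGCCATGGTAG"        # toy sequence (invented)
calls = {("+", 7)}                 # hypothetical modification call
print(motif_sites(genome, "CCA"))  # [('+', 2), ('+', 7), ('-', 10)]
print(fraction_modified(genome, "CCA", calls))
```

Here one of three candidate CCA sites is called as modified, so the modified fraction is one third.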

The partial modification seen in both bacteria suggests that overexpression of DndA-E proteins could increase the levels of PT modifications, according to the paper. “These results point to a novel [restriction-modification] system involving site-specific PT modifications without a predictable consensus beyond four nucleotides and with partial modification of sites in the presence of a restriction activity,” the scientists report.

“Such consistency for two bacteria in which PT has very different functions points to a conserved mechanism of DNA target selection by the DNA-modifying DndA-E proteins, a mechanism that we have shown likely involves direct interaction of the modifying proteins with the consensus sequence,” the authors conclude.

Tuesday, July 22, 2014

At ISMB, Gene Myers’ Keynote Offers History, Future of Genome Assembly

At ISMB 2014 in Boston earlier this month, Gene Myers of the Max Planck Institute of Molecular Cell Biology and Genetics presented a keynote address entitled “DNA Assembly: Past, Present, and Future.” Myers received the prestigious Senior Scientist Accomplishment Award from the International Society for Computational Biology (ISCB) at the event.

The ISCB Senior Scientist Accomplishment Award honors respected leaders in computational biology and bioinformatics for significant contributions to these fields through research, education, and service. Myers is the 2014 honoree for his outstanding contributions to the bioinformatics community, particularly his work on sequence comparison algorithms and whole-genome shotgun sequencing methods, and his more recent efforts to develop software and microscopy devices for bioimage informatics.

Friday, July 11, 2014

ISMB 2014: The World Cup of Bioinformatics

We’re eager for the #ISMB conference — it’s the 22nd annual Intelligent Systems for Molecular Biology event — kicking off this weekend in Boston. As we continue to push our technology to deliver longer read lengths, we have been honored to work with many leading bioinformaticians to optimize the processing and analysis of our data.

Several of those experts will be speaking at ISMB this year. On Sunday, attendees will hear from Adam Phillippy of the National Biodefense Analysis and Countermeasures Center. He’ll present at noon on producing complete genome assemblies from Single Molecule, Real-Time (SMRT®) Sequencing data. Adam’s team recently developed MHAP (MinHash Alignment Process), a new tool that dramatically reduces the CPU time needed to build assemblies, so we are eager to hear more.

Wednesday, July 9, 2014

Optimizing Eukaryotic De Novo Genome Assembly: Webinar Recording Available

Our webinar on eukaryotic genome assembly attracted a great crowd, and now we’re making the full recording available to the community. The session offered practical, hands-on information and best practices for working with Single Molecule, Real-Time (SMRT®) Sequencing data. “Optimizing Eukaryotic Genome Assembly with Long-Read Sequencing” featured three excellent speakers (Michael Schatz and James Gurtowski from Cold Spring Harbor Laboratory and Sergey Koren from the National Biodefense Analysis and Countermeasures Center) and was hosted by our own CSO, Jonas Korlach.

Schatz opened the session with an overview of assemblers for PacBio® data (including recommendations for when to use each one) and a look at the challenges of short-read assembly. He also set expectations for long-read data: for genomes smaller than 100 Mb, users should expect a nearly perfect assembly from the automated workflow, and genomes up to 1 Gb should yield a high-quality assembly with a contig N50 of at least 1 Mb. Larger genomes will have shorter contig N50s and will require more computing power, he added.
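
Contig N50, the statistic behind these expectations, is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the standard calculation follows; the function name and contig lengths are invented for illustration.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues when the total is odd.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-negative")


contigs = [100, 80, 60, 40, 20]  # toy contig lengths
print(n50(contigs))  # 80
```

By this definition, a contig N50 of at least 1 Mb means that at least half of the assembly sits in contigs of 1 Mb or longer.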

Tuesday, July 1, 2014

Scientists Generate the First Personal Transcriptome Using SMRT Sequencing

A new paper from scientists at Stanford University and Yale University describes the use of Single Molecule, Real-Time (SMRT®) Sequencing to generate transcriptomes for three individuals. The work is believed to be the first personal transcriptome analysis using long-read sequencing.

The paper, entitled “Defining a personal, allele-specific, and single-molecule long-read transcriptome,” was published in PNAS by Hagen Tilgner, Fabian Grubert, Donald Sharon, and Michael Snyder. Last year, the same authors published a study using SMRT Sequencing to analyze transcriptomes across tissue samples from human organs. In the PNAS publication, they compare metrics from the new data set to those from the previous study.

Friday, June 27, 2014

At SFAF 2014, Great Science and High-Quality Genomes

It’s been a busy start to the summer, but we’re still basking in the top-notch presentations and posters from last month’s Sequencing, Finishing, and Analysis in the Future meeting. Hosted by Los Alamos National Laboratory in Santa Fe, the meeting has become a premier event for scientists working on sequencing protocols, analysis, and assembly methods.

Many speakers presented data that included reads from Single Molecule, Real-Time (SMRT®) Sequencing. Jeff Rogers from Baylor College of Medicine used long PacBio® reads with the PBJelly algorithm to fill gaps in many mammalian genomes, including sheep, rat, baboon, sooty mangabey, and mouse lemur. Tina Graves-Lindsay from Washington University reported work on improving the reference human genome through BAC sequencing and the use of a haploid human data set, which included PacBio’s CHM1TERT data release. James Gurtowski from Cold Spring Harbor Laboratory detailed improvements to genome assemblies of yeast, Arabidopsis, and various rice strains using his new algorithms, ECTools.

Monday, June 23, 2014

Unprecedented Read Length at the Icahn Institute: Precise Sizing + SMRT Sequencing

At the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai in New York City, technology development expert Robert Sebra, Ph.D., sees tremendous need for long-read, high-accuracy sequencing for use in microbial surveillance, detection of repeat expansions, and other research applications. To meet that demand, he relies on Single Molecule, Real-Time (SMRT®) Sequencing from Pacific Biosciences with BluePippin™ automated DNA size selection from Sage Science. Together, these tools offer a powerful solution and industry-leading read lengths that allow Sebra and other researchers to resolve repeat elements and structural variants, rapidly close microbial genomes, and measure epigenetic marks.

Sebra, an assistant professor of genetic and genomic sciences, is no stranger to the SMRT Sequencing platform: he spent five years at PacBio helping to develop the technology. Ultimately, his belief in the system led him to the Icahn Institute, where he could use the PacBio® sequencer as a customer. Sebra, who came to Mount Sinai in 2012, says, “I had experienced firsthand the value of long-read sequencing and wanted to apply it to human and infectious disease research.”

Monday, June 2, 2014

Intro to the Iso-Seq Method: Full-length transcript sequencing

With the recent launch of SMRT Analysis v2.2, we’re excited to introduce analysis software support for the new Iso-Seq™ method for sequencing full-length transcripts and gene isoforms, with no assembly required! Today we’ll take a deeper look at the Iso-Seq method to explain its unique scientific value and review publications from those already applying Single Molecule, Real-Time (SMRT®) Sequencing to this exciting area of research.

In plants, animals, and other higher eukaryotes, most genes are alternatively spliced to produce multiple transcript isoforms. In humans, for example, there is evidence for alternative splicing of more than 95% of genes [1], with an average of more than five isoforms per gene. Alternative splicing can dramatically expand the protein-coding potential of a genome with a limited number of protein-coding genes. Somewhat surprisingly, alternatively spliced isoforms from a single gene can also have very different, even antagonistic, functions [2]. Understanding the functional biology of a genome therefore requires knowing its full complement of isoforms. Microarrays and high-throughput cDNA sequencing have become extremely useful tools for studying transcriptomes, but they capture only short fragments of transcripts, which makes it challenging to reconstruct complete transcripts and study gene isoforms.

Thursday, May 29, 2014

Research Studies Use Sequencing to Track Path of Infection Outbreaks

A talk at last week’s ASM conference continued the recent trend of scientists using Single Molecule, Real-Time (SMRT®) Sequencing in research projects designed to better understand the transmission path of hospital-acquired infections.

The presentation, entitled “Tracking Hospital Patients and Environment with Complete Genome Sequencing of Carbapenem-Resistant Klebsiella pneumoniae and other Enterobacteriaceae,” came from Julie Segre, a chief investigator at the National Human Genome Research Institute.

Wednesday, May 28, 2014

The Sequence Analysis Meeting: SFAF 2014

The Sequencing, Finishing, and Analysis in the Future (SFAF) meeting kicks off today in Santa Fe, New Mexico. The conference is hosted by Los Alamos National Laboratory and focuses on the analytical details that are so important as the community assesses how to get the most out of all this sequence data.

This year, we will have two PacBio speakers, and there will be a number of other talks from users of our long-read sequence data. Steve Turner, our CTO, will speak on Wednesday morning about the use of Single Molecule, Real-Time (SMRT®) Sequencing for generating highly contiguous genome assemblies as well as for transcriptome analysis that can resolve complete isoforms. Steve will discuss how chemistry and other improvements can push PacBio’s average consensus accuracy to ~Q50 and read length N50 to more than 10,000 base pairs.
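
Consensus accuracy of ~Q50 is quoted on the Phred scale, where Q = -10 * log10(p) and p is the per-base error probability. Q50 therefore corresponds to p = 10^-5: roughly one error per 100,000 bases (99.999% accuracy), or about 10 errors per megabase of consensus. The short Python sketch below makes the conversion explicit; the function names are illustrative only.

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality score Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (30, 40, 50):
    p = phred_to_error(q)
    # Expected errors in one million bases of consensus at this quality.
    print(f"Q{q}: error rate {p:.0e}, accuracy {100 * (1 - p):.3f}%, "
          f"~{p * 1_000_000:.0f} errors per Mb")
```

Each additional 10 points on the Phred scale cuts the expected error rate tenfold.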