Author: Todd Lowe
tRNA detection in large-scale genome sequence.
tRNAscan-SE detects ~99% of eukaryotic nuclear or prokaryotic tRNA genes, with a false positive rate of less than one per 15 gigabases and a search speed of about 30 kb/second. It was implemented for large-scale human genome sequence analysis, but is applicable to other DNAs as well. It applies our COVE software (see below) with a carefully built tRNA covariance model, and gets around COVE's speed limitations by using two tRNA-finding programs from other research groups as fast first-pass scanners (Fichant and Burks' program, and an implementation of an algorithm from A. Pavesi's group). It runs on any UNIX system with Perl and a C compiler installed.
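The general idea of putting cheap scanners in front of an expensive model can be sketched as follows. This is a toy illustration only, not tRNAscan-SE's code: the function names, the choice to take the union of candidates, and the stand-in scanners and scorer are all assumptions made for the example.

```python
def two_pass_search(sequence, fast_scanners, slow_score, threshold):
    """Run cheap scanners first, then score only their candidates with a slow model.

    fast_scanners: callables returning (start, end) intervals (hypothetical).
    slow_score:    callable scoring a subsequence (stand-in for an expensive model).
    """
    # First pass: pool candidate intervals from every cheap scanner.
    candidates = set()
    for scan in fast_scanners:
        candidates.update(scan(sequence))

    # Second pass: the expensive scorer sees only the candidate intervals.
    hits = []
    for start, end in sorted(candidates):
        score = slow_score(sequence[start:end])
        if score >= threshold:
            hits.append((start, end, score))
    return hits


# Toy stand-ins (assumptions, unrelated to real tRNA detection):
def scan_a(seq):
    return [(i, i + 4) for i in range(len(seq) - 3) if seq[i] == "G"]

def scan_b(seq):
    return [(i, i + 4) for i in range(len(seq) - 3) if seq[i + 3] == "C"]

def gc_count(sub):
    return sum(ch in "GC" for ch in sub)

hits = two_pass_search("ATGCGCAT", [scan_a, scan_b], gc_count, threshold=3)
```

The slow scorer runs on three short windows here rather than on every position, which is the whole point of the arrangement.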
Author: Elena Rivas
RNA Structural Covariation Above Phylogenetic Expectation: Analysis of covariation support for RNA conserved secondary structure in a multiple sequence alignment.
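As a rough illustration of what "covariation" between two alignment columns means, the sketch below computes plain mutual information for a pair of columns. It is not the statistic or the phylogenetic null model used by this software; a raw score like this would still have to be compared against what phylogeny alone is expected to produce. The function name and the toy alignment are made up for the example.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned strings; '-' marks a gap.
    Rows with a gap in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    left = Counter(a for a, _ in pairs)    # residue counts in column i
    right = Counter(b for _, b in pairs)   # residue counts in column j
    joint = Counter(pairs)                 # counts of residue pairs
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

# Columns 0 and 4 change together (G-C, C-G, A-U, U-A); columns 1 and 3 never change.
toy_alignment = ["GACUC", "CAAUG", "AACUU", "UAGUA"]
covarying = mutual_information(toy_alignment, 0, 4)
invariant = mutual_information(toy_alignment, 1, 3)
```

Perfectly conserved columns carry no covariation signal at all, which is why conserved base pairs contribute nothing to a score of this kind.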
Author: Zhirong Bao
Automated identification of repeat sequence families in genome sequences.
Author: Elena Rivas
A software tool for prototyping single-sequence RNA secondary structure prediction models. Tornado implements a "super-grammar" that includes the standard thermodynamic model as a special case. It can be used to build simpler or more complex models with fewer or more parameters, and it can be used to compare thermodynamic, probabilistic, and discriminative parameterization approaches. This is the maintained (up-to-date) version of the software that accompanied Elena's paper "A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more."
Author: Robin Dowell
Software for the exploration of lightweight stochastic context-free grammars.
This is the code accompanying Robin Dowell's paper "Evaluation of Several Lightweight Stochastic Context-Free Grammars for RNA Secondary Structure Prediction", BMC Bioinformatics 5:71, 2004. It implements several small SCFGs for single-sequence RNA secondary structure prediction.
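To give a feel for single-sequence structure prediction by dynamic programming, here is a minimal base-pair maximization sketch (Nussinov-style). It is not one of the grammars from the paper and uses pair counts instead of probabilities, but the recursion has the same shape as CYK parsing with a small grammar of the form S → aS | Sa | aSa' | SS. The function names and the `min_loop` parameter are assumptions for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing subsequence span; shorter spans cannot hold a pair.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):                    # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

pairs_in_hairpin = max_base_pairs("GGGAAACCC")
```

The cubic-time inner loop over `k` is what makes even "lightweight" models of this family expensive on long sequences.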
Author: Elena Rivas
A prototype noncoding RNA genefinder, based on comparative genome sequence analysis.
This is the code from Elena Rivas that accompanies the paper "Noncoding RNA gene detection using comparative sequence analysis." QRNA uses comparative genome sequence analysis to detect conserved RNA secondary structures, including both ncRNA genes and cis-regulatory RNA structures.
Author: Sean Eddy
Fast pattern searching for RNA secondary structures.
RNABOB is an implementation of D. Gautheret's RNAMOT, but with a different underlying algorithm using a nondeterministic finite state machine with node rewriting rules. (Computer scientists would probably cringe in horror. It works, and it's fast, but is it street legal in a computer science department? Who knows.) An RNABOB motif is a consensus pattern a la PROSITE patterns, but with base-pairing.
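The notion of a consensus pattern with base-pairing can be illustrated with a brute-force hairpin search: a fixed-length stem whose two strands must pair, enclosing a loop that must match a sequence pattern. This does not use RNABOB's descriptor syntax or its state-machine algorithm; the function name, arguments, and test sequence are hypothetical.

```python
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_hairpins(seq, stem_len, loop_pattern, loop_min, loop_max):
    """Return (start, end) for every stem-loop-stem match in seq.

    The 5' and 3' stem strands (stem_len bases each) must be able to pair,
    and the loop between them must fully match the regex loop_pattern.
    """
    loop_re = re.compile(loop_pattern)
    hits = []
    for start in range(len(seq)):
        loop_start = start + stem_len
        for loop_len in range(loop_min, loop_max + 1):
            three_start = loop_start + loop_len
            end = three_start + stem_len
            if end > len(seq):
                break
            # Sequence constraint on the loop, like a PROSITE-style consensus.
            if not loop_re.fullmatch(seq, loop_start, three_start):
                continue
            # Base-pairing constraint: 5' strand against the reversed 3' strand.
            five = seq[start:loop_start]
            three = seq[three_start:end]
            if all((a, b) in PAIRS for a, b in zip(five, reversed(three))):
                hits.append((start, end))
    return hits

# A 3-bp stem closing a GNRA-like tetraloop.
hits = find_hairpins("CAGGCGAAAGCCAC", 3, r"G[ACGU][AG]A", 4, 4)
```

The sequence-only part of the motif is an ordinary regular expression; the pairing constraint is the part that regular expressions alone cannot express.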
Author: Elena Rivas
Experimental code demonstrating a dynamic programming algorithm for RNA pseudoknot prediction.
This is experimental code from Elena Rivas, demonstrating a dynamic programming algorithm for globally optimal RNA pseudoknot prediction. The algorithm is discussed in the paper "A dynamic programming algorithm for RNA structure prediction using pseudoknots."
Author: Elena Rivas
Maximum likelihood phylogenetic inference, including insertions/deletions.
erate is an extension of Joe Felsenstein's DNAML program which treats insertions and deletions as evolutionary events, rather than ignoring them as missing data (which is what the most widely used phylogenetic inference programs all do). This is the software that accompanied Elena's paper "Probabilistic Phylogenetic Inference with Insertions and Deletions."
Author: Christian Zmasek
A visualization tool for large phylogenetic trees.
Author: Robbie Klein
Sequence database searching with RNA structure queries.
RSEARCH aligns an RNA query to target sequences, using SCFG algorithms to score both secondary structure and primary sequence alignment simultaneously. It's slow, but somewhat more capable of finding significant remote RNA structure homologies than sequence alignment methods like BLAST. (By slow, we mean that you really need a substantial computing cluster to do any serious work with it; a typical single search of a metazoan genome may take a few thousand CPU hours.)
Author: Robin Dowell
Pairwise structural RNA alignment.
This is the code accompanying Robin Dowell's paper "Efficient Pairwise RNA Structure Prediction and Alignment Using Sequence Alignment Constraints", BMC Bioinformatics 7:400, 2006. It implements a pinned Sankoff algorithm for simultaneous pairwise RNA alignment and consensus structure prediction.
Author: Christian Zmasek
Inference of orthology and paralogy relationships in gene trees.
Author: Sean Eddy
C function library for sequence analysis, bundled with much of the above software.
SQUID is my own personal library of C functions and utility programs for sequence analysis. I don't really suggest that you use it in your programs, as I change it at will. However, it does contain some small utility programs that some people have found useful in scripts that drive large HMMER tasks.
Author: Elena Rivas
Experimental code for a structural RNA genefinder: it doesn't actually work well, because it turns out that structural RNAs don't have much more secondary structure content than random sequence.
This is the code from Elena Rivas that goes with the paper "Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs" by Elena Rivas and Sean Eddy. As the title indicates, the genefinder doesn't work (though we still think the algorithm is cool), because real RNAs don't generally have any more secondary structure content than random sequence, contrary to what we expected. The code will only be of interest to people trying to reproduce our negative results, or trying to understand the genome-scanning SCFG alignment algorithm that it implements.
Author: Robin Dowell
Robin Dowell's prototype of a Perl/Tk application for viewing profile HMMs created by HMMER, including the Pfam database.
Author: Todd Lowe
Identifies 2'-O-methylation guide snoRNAs in yeast (and possibly other) genome sequences, using a combination of snoRNA sequence/structure consensus and guide complementarity to a putative target rRNA site. See Lowe & Eddy, "A Computational Screen for Methylation Guide snoRNAs In Yeast", Science 283: 1168-1171, 1999.
Author: Sean Eddy
Covariance models of RNA secondary structure (old version).
COVE is an implementation of stochastic context free grammar methods for RNA sequence/structure analysis. COVE is still experimental and not as well supported as I would like. It is an extremely sensitive tool for database searching for homologous RNAs, if you have an alignment of an RNA family. It requires hefty CPU resources to run properly.
Author: Michael Farrar
Striped SIMD vectorized Smith/Waterman.
This is Michael Farrar's 2006 source code accompanying his paper "Striped Smith-Waterman speeds database searches six times over other SIMD implementations" (Bioinformatics, 2007). Michael died in 2010 while working as a senior software engineer in my laboratory. His 2006 code, developed on his personal time before he joined my group, had only been made available under a non-open-source license. With permission of his wife, who inherited his copyrighted work, I have relicensed his code and released this copy as open source under a BSD license.
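For readers unfamiliar with the underlying recurrence, here is a plain scalar Smith-Waterman local alignment score. It is not Farrar's striped SIMD implementation and makes no attempt at speed; the linear gap penalty and the scoring values are assumptions chosen to keep the example short.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous DP row
    best = 0
    for ca in a:
        cur = [0]               # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap          # gap in b
            left = cur[j - 1] + gap     # gap in a
            cur.append(max(0, diag, up, left))  # 0 restarts the alignment
            best = max(best, cur[j])
        prev = cur
    return best

score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

Each cell depends on its left, upper, and diagonal neighbors; that dependency pattern is what any vectorized implementation has to work around.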