"Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints" BMC Bioinformatics 7:41 2006. (Notes summary on how CONUS was utilized for the paper) ---------------------------------------------------------------- Assumptions: CONSAN code has been compiled and is in a directory called "consan/". The datasets are in a directory called "data/" with the names: *) mix80.stk LSU/SSU dataset for training *) Rfam.v7.pub80.stk Rfam dataset for training comparison *) R100.stk Random 100 tRNA and 100 5S RNA sequences *) R100.pairs.stk Above set with an "informant" sequence (ie. pairs) 3) percid.stk 5S + tRNA dataset, percent identity balanced 4) stemloc.stk Dataset for stemloc comparison (Holmes ref) Brief descriptions and basic usage: * strain_ml (training; generates a model parameter file e.g. model.mod) > strain_ml -s model.mod trainingset.stk * sfold (structural alignment of two sequences) > sfold -m model.mod seq1.fa seq2.fa * scompare (structural alignment of two sequences with comparison to a given pairwise alignment; essentially a wrapper around sfold) > scompare -m model.mod alignment.stk Because the paper presents almost entirely comparative results, the majority of this work is presented using scompare. However, if you just have two sequences, you will instead prefer to use sfold. For all three programs, running with the -h option will briefly describe available options. =========================================================================== **** WARNING: **** =========================================================================== The Consan algorithm is inherently resource intensive. Runtime scales as N^6 in the length of the sequences being compared and memory scales as N^4 in the length of the sequences being compared. Currently, Consan assumes it will be used responsibly and therefore does only minimal checking before proceeding with memory allocations. It can and will attempt to utilize more resources than your machine has available. This is a very bad idea and is strongly advised against. =========================================================================== Training: --------- consan/mltrain -s mixed80.mod data/mixed80.stk This produces a binary output file called mixed80.mod which defines the parameters as trained on mixed80.stk. Training Set Comparison: ------------------------ First train the grammar based on the Rfam published dataset: consan/mltrain -s rfam7.mod data/Rfam.v7.pub80.stk Then generate parameter comparison table as follows: consan/src/utilities/cModel mix80.mod rfam7.mod Bootstrapping (in general): --------------------------- The scompare (Consan) and conus_compare (Conus) routines produce estimates of the basepair sensitivity, basepair PPV, and alignment accuracy (for scompare only). These numbers should be considered as ESTIMATES only. For complete comparison statistics (as presented in the paper): *** For single sequence: Reformat output to "wuss" structure annotation (necessary for compstruct): consan/src/squid/sreformat --wuss stockholm reference.stk > reference.wuss.stk consan/src/squid/sreformat --wuss stockholm prediction.stk > prediction.wuss.stk Compare reference structures to prediction: consan/src/squid/compstruct reference.wuss.stk prediction.wuss.stk > prediction.compstruct Reformat to bootstrap input: consan/src/boot/bsinput.pl prediction.compstruct > prediction.bsinput *** Pairwise sequence: [Reformatting to wuss is not necessary.] 
*** For pairwise sequences:

[Reformatting to wuss is not necessary.]

Compare reference alignments to prediction:

  consan/src/utilities/comppair reference.stk prediction.stk > prediction.comppair

Reformat to bootstrap input:

  consan/src/boot/bsinputTOT.pl prediction.comppair > prediction.bsinput

*** Bootstrap (both single-sequence and pairwise comparisons):

  consan/src/boot/bstats prediction.bsinput

The default bootstrap setting (used in the paper) is 10000 samples.
Alternative sample counts can be set using the -i option; for more
information use the -h option.

[***Note that the alignment numbers will be random and irrelevant when
bootstrapping single-sequence folding.***]

Comparison to Single Sequence Folding (Table 1):
------------------------------------------------

Train the single sequence grammar Gs:

  consan/src/conus-1.1/conus_train -g RY2 -s Gs.mod data/mixed80.stk

Run the R100 dataset using Gs:

  consan/src/conus-1.1/conus_compare -m Gs.mod -N data/R100.stk

[Note that the above uses a slightly modified version of Conus. In Conus's
grammar encoding, the Gs grammar is known as RY2. The -N option is required
to turn off the stacking parameterization (making it comparable to the
pairSCFG). For more on Conus, see http://selab.wustl.edu/people/robin/conus/]

Run the pairSCFG on the R100.pairs dataset and on R100 (self vs. self):

  consan/src/scompare -f -m mixed80.mod data/R100.pairs.stk

The -f option runs the full (unconstrained) pairSCFG. This takes a *long*
time and at least 2 Gb of memory for all 5S pairs. The output includes
summary information, which can be suppressed with the -S option.

[The self vs. self comparison uses the R100 dataset but introduces an
informant sequence identical to the individual sequence. The informant's
name must differ from the original sequence's name for the file to be a
proper STOCKHOLM file (e.g. "self"). Once generated, the dataset is run
identically to R100.pairs.]

EXTERNAL PROGRAM: Running mfold on the R100 dataset:

[mfold is an external program -- it is not provided as part of the Consan
package. Its usage is described here to reproduce the performance of
Table 1 and Figure 4.]

  > mfold MAX=750 P=20 W=0 SEQ=seqname.fa

[mfold requires FASTA input with 80 or fewer characters per line.]

Constrained algorithm versus full Sankoff (Figure 4):
-----------------------------------------------------

Run the full pairSCFG on the percid dataset:

  consan/src/scompare -f -m mixed80.mod data/percid.stk

Run the constrained pairSCFG on the percid dataset:

  consan/src/scompare -m mixed80.mod data/percid.stk

The default constraints are posteriors > 0.95 and a protection window of 20
nucleotides. These can be changed using -Z (posterior cutoff) and -W
(window size).

[See "Comparison to Single Sequence Folding" above for information on
running Gs and mfold.]

EXTERNAL PROGRAM: Running Clustalw on the percid dataset:

Posterior Evaluation (Figures 5 & 6):
-------------------------------------

Calculate the accuracy of the posterior probabilities:

  consan/src/utilities/pincheck -z 4 data/percid.stk

Calculate the accuracy of pins selected for a given posterior cutoff (-Z)
and protection window (-W):

  consan/src/utilities/pincheck -z 6 -Z 0.95 -W 20 data/percid.stk

[The program pincheck can compute a number of different statistics about
pins. For a brief description of the evaluation metrics, use the -h option.]
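[To trace posterior accuracy over a range of cutoffs (as in Figures 5 and
6), the second pincheck call can simply be repeated at different -Z values.
A sketch; the cutoff values below are illustrative, not necessarily those
used for the figures.]

  #!/bin/sh
  # Sweep pincheck over several posterior cutoffs at a fixed protection
  # window of 20 nt. The list of cutoffs is illustrative only.
  for Z in 0.50 0.60 0.70 0.80 0.90 0.95 0.99
  do
      echo "== posterior cutoff $Z =="
      consan/src/utilities/pincheck -z 6 -Z $Z -W 20 data/percid.stk
  done > pincheck.sweep.out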
EXTERNAL Sankoff Implementations (Figure 7):
---------------------------------------------

Dynalign:

  dynalign seq1.seq seq2.seq seq1.ct seq2.ct pair.aln 15 0.4 1 0 0 0 0

seq1.seq and seq2.seq are required to be in Dynalign's SEQ format.
seq1.ct, seq2.ct, and pair.aln are filenames for output.

Stemloc:

  stemloc -g -na 100 -nf 1000

PMcomp:

  RNAfold -p -noLP < seqname1.fa
  RNAfold -p -noLP < seqname2.fa
  pmcomp.pl sname1_dp.ps sname2_dp.ps

Note that RNAfold writes a file called sname_dp.ps for an input FASTA file
with ">sname" as the descriptor line.

FOLDALIGN:

  foldalign -global -max_diff 25 -score_matrix global.fmat seqpair.fa

The FASTA file (seqpair.fa) is expected to contain the two sequences to be
aligned.
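[The PMcomp steps above can be wrapped for a single pair of FASTA files.
The sketch below assumes one sequence per file with descriptor lines
">seq1" and ">seq2", so that RNAfold names its dot-plot output seq1_dp.ps
and seq2_dp.ps; writing the pmcomp.pl output to a file is optional.]

  #!/bin/sh
  # Run the PMcomp pipeline on one pair of single-sequence FASTA files.
  # Assumes descriptor lines ">seq1" and ">seq2" (see the note on _dp.ps
  # naming above).
  RNAfold -p -noLP < seq1.fa
  RNAfold -p -noLP < seq2.fa
  pmcomp.pl seq1_dp.ps seq2_dp.ps > pair.pmcomp.out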