abSENSE:
a method to interpret undetected homologs

What is abSENSE?

abSENSE is a method that calculates the probability that a homolog of a given gene would fail to be detected by a homology search (using BLAST or a similar method) in a given species, even if the homolog were present and evolving normally.

The result of this calculation informs how to interpret a homology search that fails to find homologs of a gene in some species. One possible explanation is that the gene is actually absent from the genome of that species: a biological, and potentially interesting, result (e.g. if due to a gene loss or the birth of a new gene).

A second explanation, often ignored, is that the homolog is present in the genome of that species, but that the homology search merely lacks the statistical power to detect it. Here, the apparent absence of the homolog is a technical/statistical limitation, and does not reflect underlying biology.

By calculating the probability that your homology search would fail to detect a homolog even if one were present and even if it were evolving normally (e.g. no rate accelerations on a specific branch, potentially suggestive of biologically interesting changes), abSENSE informs the interpretation of a negative homology search result. If abSENSE finds that there is a high probability of a homolog being undetected even if present, you may not be as inclined to invoke a biological explanation for the result: the null model of a failure of the homology search is sufficient to explain what you observe.
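To make the idea of a "probability of being undetected" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the bitscore a homolog would receive in the target species is normally distributed around a predicted value, and that a homolog counts as detected only if its bitscore reaches a threshold. The function name, its parameters, and the normal assumption are hypothetical choices made for this example; the model abSENSE actually uses is described in the paper.

```python
import math


def prob_undetected(predicted_bitscore, bitscore_sd, detection_threshold):
    """Probability that a normally distributed bitscore falls below a threshold.

    All three arguments are illustrative assumptions:
    predicted_bitscore  -- expected bitscore of the homolog in the target species
    bitscore_sd         -- standard deviation of that bitscore (must be > 0)
    detection_threshold -- bitscore below which the search reports no hit
    """
    if bitscore_sd <= 0:
        raise ValueError("bitscore_sd must be positive")
    # Standardize the threshold, then evaluate the normal CDF via erf.
    z = (detection_threshold - predicted_bitscore) / bitscore_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A predicted bitscore close to the threshold gives a substantial chance
# that a present, normally evolving homolog would still go undetected.
p = prob_undetected(predicted_bitscore=45.0, bitscore_sd=10.0,
                    detection_threshold=40.0)
print(f"P(undetected) = {p:.2f}")
```

Under these assumptions, a high value means the search failure alone is enough to explain a missing homolog, while a value near zero means the homolog should have been found if it were present and evolving normally.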

The method is explained in further detail in the paper (citation). There, it is applied to the specific case of lineage-specific genes, for which homologs appear absent in all species outside of a narrow lineage. The method itself is applicable to any case in which a homolog appears absent (e.g. a single species missing a homolog that one might interpret as a gene loss), and likewise, this code is applicable to all such cases.

When should I use this site vs. the downloadable code?

Small numbers of analyses: This website analyzes one gene at a time, prints numerical results to the screen (not to a file), and produces a visualization of the analysis. If you want to analyze one or a few genes and have the time and inclination to look at their visualizations, this website will probably serve you well. By contrast, if you want to analyze hundreds or thousands of genes, you should use the downloadable command-line code, which is faster, can be run on an arbitrary number of genes at once, and writes results to a tab-delimited file.

No need for advanced options: The command-line code offers advanced options that are not needed for the standard use case of abSENSE, so they are not included on the website. These include adjusting the E-value threshold and database sizes for the pre-computed fungal and insect genes, and using bitscores from only a subset of species in the fitting procedure and subsequent analysis. You can see a complete list of these options in the README on the GitHub page.
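To see why the E-value threshold and database size matter, recall the standard BLAST relation between an E-value and a bitscore: E = m * n * 2^(-S), where S is the bitscore, m is the query length, and n is the database size. Solving for S gives the minimum bitscore a hit needs in order to pass a given E-value cutoff. The sketch below applies that textbook relation; the function and parameter names are hypothetical, and this is not a description of how the abSENSE code itself handles these options (see the README for that).

```python
import math


def bitscore_threshold(evalue_cutoff, query_length, database_size):
    """Minimum bitscore needed to pass an E-value cutoff.

    Uses the standard relation E = m * n * 2**(-S), rearranged to
    S = log2(m * n / E). Argument names are illustrative only:
    evalue_cutoff -- E-value significance threshold (> 0)
    query_length  -- query length in residues, m (> 0)
    database_size -- searched database size in residues, n (> 0)
    """
    if evalue_cutoff <= 0 or query_length <= 0 or database_size <= 0:
        raise ValueError("all arguments must be positive")
    return math.log2(query_length * database_size / evalue_cutoff)


# The same query needs a higher bitscore to count as detected
# when it is searched against a larger database.
small_db = bitscore_threshold(0.001, query_length=400, database_size=4_000_000)
large_db = bitscore_threshold(0.001, query_length=400, database_size=400_000_000)
print(f"small database: {small_db:.1f} bits, large database: {large_db:.1f} bits")
```

In this relation, a stricter E-value cutoff or a larger database raises the bitscore a homolog must reach, which in turn raises the chance that a real homolog goes undetected.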

FAQ

Why do some species appear in gray text in the abSENSE output? As indicated by the label 'Orthology ambiguous,' these are species in which a gene homologous to the query gene was detected at the chosen significance threshold, but which is at risk of not being a strict ortholog (e.g. it could be a paralog) because it failed the Reciprocal Best Hit criterion. These species aren't missing homologs, so we don't treat them as such in our analysis. However, since their homologs may not be orthologs, including them in the prediction procedure for other orthologs may throw off the results (paralogs can have different evolutionary rates/patterns), so we don't use them to predict bitscores.

Who should I contact if I have questions? You should email the author, Cara Weisman, at weisman@g.harvard.edu.