Tuesday, June 26, 2012

Get more aggressive control over recursive deletion with Emacs dired

Using the Emacs directory editor, dired, as much as I do, I've wanted a quick way to force recursive deletion of a selection of (sub(sub)) directories without having to confirm each deletion. Here's a snippet for you init.el that builds on Emacs' existing variable `dired-recursive-deletes`.

Tuesday, October 04, 2011

HOWTO: turn off orgmode's evaluation of source blocks during export

I find that I generally do NOT want org mode to execute source code
blocks as a side effect of exporting, which is the default behavior.
Rather, I prefer to explicitly execute the code either interactively
block by block (such as when I'm developing an analysis), or using
org-babel-execute-(buffer|subtree) commands. So, I typically set
`org-export-babel-evaluate` to nil on a buffer basis with the
following as 1st line in buffer:

# -*- org-export-babel-evaluate: nil; -*-

Thursday, June 09, 2011

Tabulating gaps in BAM files

If you're analyzing RNA-SEQ data and care about quantifying splicing, read on...
Here is an approach to tabulating the counts of gaps present in a bam file.
It is implemented in R using bioconductor packages
It takes about a minute to tabulate 33 million reads on my hardware.
It produces a data.frame for further downstream analysis of your choosing, and, optionally, a corresponding "junctions.bed" file (i.e. as is produced by Tophat and can be visualized in IGV).
Here's an example of using it and inspecting its output:
> btg <- bamTabulateGaps('t/t1.sam.sorted.bam')
> names(btg)
[1] "bedPath" "tabulatedGaps"
> btg$bedPath
[1] "t/t1.sam.sorted.junctions.bed"
> system(paste('head ', btg$bedPath))
track name=t1 graphType=junctions
2L 11343 11410 . 25
2L 11517 11779 . 51
2L 12220 12286 . 121
2L 12927 13520 . 109
2L 13624 13683 . 109
2L 14873 14933 . 114
2L 15710 17053 . 38
2L 17211 18026 . 4
2L 17211 18261 . 5
> head(btg$tabulatedGaps)
space start end score
1 2L 11344 11410 25
2 2L 11518 11779 51
3 2L 12221 12286 121
4 2L 12928 13520 109
5 2L 13625 13683 109
6 2L 14874 14933 114

Enjoy the code:

Wednesday, April 07, 2010

Control IGV from Excel or other Macintosh/Windows Application using Applescript/VBA

IGV is great for browsing genomes. Here is some quick Applescript|VBA that you can install into an Excel workbook (or other Mac|Windows app) to allow navigating IGV to the locus in the current cell. With this approach, I can deliver Excel workbooks containing RNASeq expression values, and the recipient can sort/filter or otherwise slice and dice them, and easily navigate to the genes that survive the dicing....

I'm likely to make some improvements on this and build an installer for it, but, I put it out here/now in case someone
* has already done this better
* has suggestions for improvement to take into consideration
* want's to use it as is
-- NAME: IGVGoTo.scpt
-- PURPOSE: Applescript to cause IGV (http://www.broad.mit.edu/igv/) to 'goto' the locus appearing in the Excel's active cell.
-- AUTHOR: malcolm_cook@stowers.org
-- * installing XNet as obtained from http://lestang.org/spip.php?article20
-- * the XOSL.Framework gets installed into Users//Library/Frameworks
-- * configuring IGV > View > Preferences > advanced > Enable Port
-- it must be checked ON and set to 60151 (the default)
-- * Save this script as Users//Documents/Microsoft User Data/Excel Script Menu Items/IGVGoTo.scpt
-- * OPTIONAL: assign a keyboard shortcut:
-- * Open the System Preferences, click on "Keyboard & Mouse" and select the "Keyboard Shortcuts" tab.
-- * Click on the plus sign beneath the list view. In the dialog sheet that pops up, select Microsoft Excel from the Application menu (install it if needed - I had to)
-- * type "IGVGoTo" into the "Menu Title" field, tab to the Keyboard Shortcut field, and type in the shortcut you would like to use (I prefer Command-Option-I) and click the Add button
-- * IGVGoTo will appear in Microsoft Excels's script menu (rightmost menu)
-- * navigate to a cell whose value is locus for the currently displayed IGV genome and run the script.
-- The active cells value will be used as the `locus` target of a `goto` command
-- * c.f. http://www.broadinstitute.org/igv/PortCommands for valid targets
-- * bring IGV to the front (optionally?)
-- * dispatch on the frontmost appliction - get the locus appropriate to the application - implement for
-- MSWord and FileMakerPro
-- * trap errors and provide diagnostics (i.e. XNet not installed, port not openable, IGV not running, bad response from IGV, etc)

tell application "Microsoft Excel"
my IGVGoTo("localhost", 60151, value of active cell as string) end tell

on IGVGoTo(theHost, thePort, theLocus)
tell application "XNet"
set s to make new socket with properties {host:theHost, port:thePort}
socket open s
socket write s data "goto " & theLocus
delete s
end tell
end IGVGoTo

Here's the VBA for the Window's folk:

Public Sub IGV_Goto()
'PURPOSE: Excel Macro to cause IGV (http://www.broad.mit.edu/igv/) to 'goto' the locus appearing in the Excel's active cell.
' AUTHOR: malcolm_cook@stowers.org
' * configuring IGV > View > Preferences > advanced > Enable Port
' it must be checked ON and set to 60151 (the default)
' * installing a free winsock implementation from "http://www.ostrosoft.com/oswinsck/oswinsck_vba.asp"
' * create a new VBA Macro out of this script in the desired Excel workbook - like this:
' alt-F11 to open VBA
' click on 'This workbook'
' paste in this script
' create a reference to the winsock library, by checking "OstroSoft Winsock Component" under "Tools > references".
' quit VBA with alt-q
' Alt-F8 (back in Excel) will list your macros - click option button to assign to a key (I use Ctrl-Shift-I)

On Error GoTo Err
Dim cmd As String
Dim locus As String
Dim r As Excel.Range
Set r = Excel.ActiveCell.EntireRow '.CurrentArray.RowDifferences..Rows
locus = Excel.ActiveCell.Value
cmd = "goto " & locus
Dim wso As New OSWINSCK.TCP ' free winsock implementation from "http://www.ostrosoft.com/oswinsck/oswinsck_vba.asp"
wso.Connect "localhost", 60151
wso.SendData cmd & vbCr
Exit Sub
MsgBox Err.Description
End Sub

Tuesday, October 28, 2008

eliminating redundant indexes from your mysql database

To find redundant (unneccesary) indexes in your mysql database, try the information schema views in
Roland Bouman's Redundant Index Finder. To delete them, try the procedure I_S_REDUNDANT_INDEXES_DROP which I just posted to the Community Feedback section.

Tuesday, September 11, 2007

HOWTO: search unix's $LD_LIBRARY_PATH

In figuring out how to search $LD_LIBRARY_PATH for (hopefully not multiply) installed versions of libraries, I found that $LD_LIBRARY_PATH, being colon-delimited, was not suitable argument for the unix `find` command.

Thus, an 'on-the-fly' edit to replace the colons with spaces is needed, consing up a suitable arg to find.

I had never used the string substitution modifer to bash's parameter expansion. Now I do.

For example:

>find ${LD_LIBRARY_PATH//:/ } -maxdepth 1 -name libreadline.* -print

Which prompts me to write the following bash function:

function llpfind {
# EXAMPLE: llpfind -name libreadline.*
find ${LD_LIBRARY_PATH//:/ } -maxdepth 1 $@ -print

Wednesday, April 11, 2007

HOWTO: inferring SO compliant features for splice_donor_site and splice_acceptor_site given a gene model

Correctly inferring SO compliant features for splice_donor_site and splice_acceptor_site given a gene model can be tricky.

I hope the following simplified example is useful to understanding the issue.


Given this simplified gene model containing two exon each being 3bp


and given these SO definitions:

splice_donor_site: The junction between the 3 prime end of an exon and the following intron.
splice_acceptor_site: The junction between the 3 prime end of an intron and the following exon.

...we should encode the gene as:

HOWEVER, if the gene codes the other way, viz.


...we should encode it as:

Note that the coordinates of the exon and intron are the same in both encodings, only the strand is different; AND, the coordinates of the
splice sites are also the same between encodings, due to understanding GFF3: "For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark"

"to the right of the indicated base in the direction of the landmark." as "1 plus the indicated base, in interbase coordinates"

It is this understanding that I hope to have clarified by this example, demonstrating in particular that the splice sites should NOT be encoded in the second model as: