Friday, August 30, 2013

Subsetting data with string matching in R

Getting a subset from a larger pool of data is a major strength in R. This subsetting can be based on numbers or strings; the latter being more challenging. String matching (using grep, grepl, sub) and subsetting (using, well, subset in R) are two separate feature and one can combine them to great effect. Details of what grep, grepl etc can do are at
http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html . This site
http://atomicules.co.uk/2010/10/09/invert-a-subset-selection-when-using-grepl-in-r.html
was of great help when I wanted (had) to use grepl in subset but I also wanted to use the invert feature available only in grep and not in grepl. Turns out grepl returns a vectors of T and F which can be easily inverted with a simple !grepl(...)

Thursday, August 15, 2013

Area under a curve

R has a neat way to calculate area under a curve (whether a function/equation or a trend line). The example below describes the two ways in which this can be achieved. rollmean requires library(zoo).


Thursday, August 8, 2013

Making and filling grids with R

I had a table with 64 rows and 130 columns and I needed to map a third dimension as a simple X on the 2D plot based on some criteria. I got this done in R using a code similar to the one below.


Tuesday, August 6, 2013

Contact Maps using VMD

Contact maps are a quick way to identify residues of a protein (or monomer) that interact with residue from another protein, ligand or monomer. The script below helps obtain a distance list:

set seg1 [atomselect top "segname SG03 and name CA"]
set seg2 [atomselect top "segname SG04 and name CA"]

set file [open "Contact_map.dat" w]

set list1 [$seg1 get index]
set list2 [$seg2 get index]

foreach atom1 $list1 {
        foreach atom2 $list2 {
                set index1 [atomselect top "index $atom1"]
                set index2 [atomselect top "index $atom2"]
                set resid1 [[atomselect top "index $atom1"] get resid]
                set resid2 [[atomselect top "index $atom2"] get resid]
                set resnm1 [[atomselect top "index $atom1"] get resname]
                set resnm2 [[atomselect top "index $atom2"] get resname]
     puts $file "$resnm1 $resid1 $resnm2 $resid2 [veclength [vecsub [measure center $index1] [measure center $index2]]]"
                $index1 delete
                $index2 delete
        }
puts $file " "   # I include this line to make it easier to make contour plots using gnuplot.
}

close $file

Aligning hexameric proteins along an axis

This tcl script aligns a hexamer such that its central axis is aligned along the y-axis.


set all [atomselect top protein]
$all moveby [vecscale [measure center $all] -1]

set res10 [atomselect top "resid 10"]
set res40 [atomselect top "resid 40"]

set cenres10 [measure center $res10]
set cenres40 [measure center $res40]

set vector [vecsub $cenres10 $cenres40]

$all move [transvecinv $vector]
$all move [transaxis z 90]

foreach i {A B C D E F} {
        set chain [atomselect top "chain $i and protein and resid 4 to 105"]
        $chain writepdb chain$i.pdb
}


This script is also applicable to trimers, tetramers and other aggregates that have a well defined central axis.