Musings on Software: February 2012

Wednesday, February 29, 2012

PLINK

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.

Basic usage for SNP lookup function
plink --lookup rs1475515
which outputs to the LOG file the following information

Wednesday, February 22, 2012

Bookmarks

My favorite online bookmarking tool is delicious.com. It is a pretty neat, no-nonsense website currently run by Chad Hurley and Steve Chen. I typically assign a tag to a saved link and looking at what other people have saved under that tag can be quite educational. Here's how to do it:
delicious.com/tag/name-of-tag

Monday, February 20, 2012

Solvating molecules with VMD

To perform MD simulations with NAMD with Charmm ff, I typically use Charmm to build the initial psf/pdb of the protein and then use VMD to solvate it. Now VMD does not like any psf file generated by Charmm. It likes to its own version of psf file. For this one has to first read in a pdb file and read out a psf file. For this the following script is helpful:

package require psfgen
topology ./top_all27_prot_na.inp
segment PROT { pdb xp5helixonXaxis.out.pdb }
coordpdb xp5helixonXaxis.out.pdb PROT
guesscoord
writepsf vmd.psf

package require solvate
solvate vmd.psf xp5helixonXaxis.out.pdb -o solvate -s WT -t 15

Useful resource:
http://www.ks.uiuc.edu/~villa/dna/

Thursday, February 16, 2012

Fitting Protein Structures in VMD

# This script can be used to first load a reference structure and
# fit the second structure/trajectory to the reference.

mol new reference.pdb

mol new enzfibsol.psf
mol addfile md2.dcd type dcd first 0 last -1 step 1 waitfor all
mol addfile md3.dcd type dcd first 0 last -1 step 1 waitfor all
mol addfile md4.dcd type dcd first 0 last -1 step 1 waitfor all
mol addfile md5.dcd type dcd first 0 last -1 step 1 waitfor all

set backbone [atomselect 0 backbone]

for {set i 0} {$i < [molinfo 1 get numframes]} {incr i 1} {

set bk [atomselect 1 backbone frame $i]
set all [atomselect 1 all frame $i]
$all move [measure fit $bk $backbone]

}

Wednesday, February 15, 2012

Image manipulation in Matlab

a = double(imread('figures_from_microscope/500nm_particles_1200.tif'));
colormap('gray'), imagesc(a);

b = bpass(a,0,5,20);
# bpass(img,lnoise, lobject, threshold)
colormap('gray'), image(b);

pk = pkfnd(b,25,3);
# pkfnd(im, th, sz)

fid = fopen('exp.txt', 'w');
fprintf(fid, '%6.2f %12.8f\n', pk);
fclose(fid);

pk= pkfnd(b(200:400,200:400), 30,10);

# Below is a way to display multiple figures
figure(1); colormap('gray'), imagesc(a);
figure(2); colormap('gray'), image(b);
figure(3); colormap('gray'), image(b(200:400,200,400));
figure(4); colormap('gray'), image(a(200:400,200:400));

# This will put dots of 'r'ed on the particles of interest.
plot(pk(:,1),pk(:,2),'r.')
Looking at a small section of an image and checking for accuracy of the image manipulation is necessary. This is important since we make many assumptions on what particles to look at and what to ignore [this is what values in bpass and pkfnd do].

If var is a variable you are using in Matlab, it can be saved into a file using:
save('filename.txt', 'var', '-ascii')

These are the relevant links:
http://physics.georgetown.edu/matlab/tutorial.html
http://www.physics.emory.edu/~weeks/idl/tracking.html
http://www.physics.emory.edu/~weeks/idl/index.html

Friday, February 10, 2012

Getting header names in R and playing with csv files

names(myTable)

a<-read.csv("file.csv", header=T, sep=",")
names(a)
motility<-subset(a, Putative.orf.function=="Motility & Attachment")
write.csv(motolity, file="motility.csv")

Edit Feb 15th 2012

grep("Motility", a$Putative.orf.function)

gives the list of rows whose Putative.orf.function column contains the string "Motility".

Using this, one can get the entire row information.

df<-read.csv("file.csv", header=T, sep=",")

names(df)

 > names(df)
 [1] "Strain.Name"           "ID"                    "Location"             
 [4] "PA.ORF"                "Gene"                  "Gene.Abbrev."         
 [7] "Putative.orf.function" "Position.in.ORF"       "Frame"                
[10] "Genome.Position"       "Transposon.Direction"  "Transposon"           
[13] "F.Primer.Name"         "F.Primer.Seq"          "F.Primer.Position"    
[16] "F.Primer.Tm"           "Insert...F"            "R.Primer.Name"        
[19] "R.Primer.Seq"          "R.Primer.Position"     "R.Primer.Tm"          
[22] "R...Insert"            "WT.PCR.Len"            "Confirmed."           
[25] "How.confirmed."        "Notes"                
> df$Strain.Name[grep("Attachment",df$Putative.orf.function)]
  [1] PW10299 PW10300 PW1728  PW1729  PW1730  PW1731  PW1749  PW1750  PW1751 
 [10] PW1752  PW1753  PW1754  PW1755  PW1756  PW1757  PW2791  PW2792  PW2794 
 [19] PW2795  PW2941  PW2942  PW2943  PW2944  PW2945  PW2946  PW2947  PW2948 
...

[118] PW8681  PW8682  PW9465  PW9466  PW9467  PW9468  PW9469  PW9470  PW9471 
[127] PW9472  PW9473  PW9474