Thursday, August 23, 2012

Memory conservation in R

Depending on how large a data set you are handling, R can take up large swaths of your RAM and slow down other processes. It is therefore a good idea to remove unused variables.
ls()  # Gives the list of variables/arrays currently in use.
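rm(list=ls()) # Removes all the variables from the workspace.
gc() # Runs garbage collection, which may prompt R to return the freed memory to the operating system.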
Useful links:
 http://www.matthewckeller.com/html/memory.html

Wednesday, August 22, 2012

Benchmarking Simulations

It is always a good idea to benchmark your simulations and select the optimal number of cores before starting production runs. This exercise greatly reduces wasted resources and time. The script below helps with the setup. It needs the files conf.in and template.pbs in the current directory.

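for i in 24 48 96 192 240 576 960 1200; do
        mkdir "scaling$i"
        cp conf.in "scaling$i"
        # Replace the xxx placeholder in the template with the core count.
        sed "s/xxx/$i/g" template.pbs > "scaling$i/sub.pbs"
        # Print the commands for this run (they go to stdout, not into sub.pbs).
        # \$namd is escaped so it stays literal; it is only defined in the job script.
        echo "cd scaling$i";
        echo "aprun -n $i \$namd conf.in 2>&1 1> conf.out &";
        echo "cd \$PBS_O_WORKDIR";
done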

exit;



----- template.pbs ------
#PBS -q debug
#PBS -l mppwidth=xxx
#PBS -l walltime=00:30:00
#PBS -N scaling
#PBS -S /bin/bash
#PBS -j oe

namd=/usr/common/usg/namd/2.8/bin/namd2
cd $PBS_O_WORKDIR
aprun -n xxx $namd conf.in 2>&1 1> conf.out
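
Once the runs finish, the timings still have to be compared. Here is a minimal sketch of that step, assuming the wall-clock times have been collected by hand into a two-column text file (cores, then seconds, smallest run first). The function name scaling_table and the file layout are made up for illustration and are not part of the setup above. It reports speedup and parallel efficiency relative to the first line.

```shell
#!/bin/bash
# scaling_table FILE   (hypothetical helper, not part of the scripts above)
# FILE holds one run per line: <cores> <wall-clock seconds>, smallest run first.
# Prints: cores, seconds, speedup, parallel efficiency (both relative to line 1).
scaling_table() {
    awk 'NF >= 2 {
        if (!seen) { c0 = $1; t0 = $2; seen = 1 }   # first row is the baseline
        speedup = t0 / $2                           # how much faster than baseline
        efficiency = speedup / ($1 / c0)            # speedup per unit of extra cores
        printf "%d %.1f %.2f %.2f\n", $1, $2, speedup, efficiency
    }' "$1"
}
```

Call it as scaling_table timings.txt, where timings.txt is whatever file you wrote the numbers into. An efficiency that drops well below 1 suggests the extra cores are no longer paying for themselves.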

Tuesday, August 21, 2012

Making 2D heat-maps of protein secondary structure

I needed to make 2D heat-maps of protein secondary structure with time on the x-axis and protein residues on the y-axis, which show the time evolution of the structure quite nicely. This is easily done by loading the trajectory in VMD, sourcing the file ss_traj.tcl, and going through the entire trajectory once. This creates a buffer with all the structure information, which ss_traj.tcl writes to a file named ss_traj.dat. That file is then read by ss_plot.c (after compilation, of course) to generate a .ps file with the desired output. All this is explained quite well here. I made a personal copy of the files here:
ss_traj.tcl
ss_plot.c
Sample ss_traj.dat file.

Once you have the output in .ps format, you can edit the file directly, since PostScript is plain text. The 7th line holds the scaling information. The width and height of the output were not to my satisfaction, so I changed the scaling factors to adjust the display. The executable may also crash at run time because of memory requirements; if so, change the dimensions of the array in ss_plot.c to suit your data and recompile.
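
A quick way to look at that line before editing it, as a sketch: show_ps_line is a hypothetical helper, not something that ships with ss_plot.c, and the file you pass it is whatever .ps file your run produced.

```shell
#!/bin/bash
# show_ps_line FILE [N]   (hypothetical helper)
# Prints line N of FILE; N defaults to 7, the line with the scaling information.
show_ps_line() {
    local file=$1 n=${2:-7}
    sed -n "${n}p" "$file"
}
```

Run show_ps_line followed by the name of your .ps file to see the current scaling line, then change it in a text editor and reopen the file in your viewer.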

Thursday, August 2, 2012

Reading/processing multiple files in R

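nums <- seq(300, 390, 15)  # numeric part of each file name
tab <- matrix(NA, nrow = length(nums), ncol = 3)  # one row per file
j <- 1
for (i in nums) {
  # Files are named "sasa300.txt", for example.
  df <- read.table(paste("sasa", i, ".txt", sep = ""))
  # Store the file number, then the mean and standard deviation of column 2.
  tab[j, ] <- c(i, mean(df[, 2]), sd(df[, 2]))
  print(tab[j, ])
  j <- j + 1
}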
> tab
     [,1]     [,2]      [,3]
[1,]  300 3408.514  96.48097
[2,]  315 3422.388 119.14861
[3,]  330 3391.590  81.57252
[4,]  345 3424.176  84.84211
[5,]  360 3394.205 100.37193
[6,]  375 3462.469 122.60486
[7,]  390 3438.328  92.70519