Friday, August 30, 2013

Subsetting data with string matching in R

Getting a subset from a larger pool of data is a major strength of R. Subsetting can be based on numbers or strings, the latter being more challenging. String matching (using grep, grepl, sub) and subsetting (using, well, subset in R) are two separate features, and one can combine them to great effect. Details of what grep, grepl etc. can do are at
http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html . This site
http://atomicules.co.uk/2010/10/09/invert-a-subset-selection-when-using-grepl-in-r.html
was of great help when I wanted (had) to use grepl in subset but also wanted the invert feature, which is available only in grep and not in grepl. It turns out grepl returns a logical vector of TRUE and FALSE values, which can be easily inverted with a simple !grepl(...).
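
Here is a minimal sketch of the idea. The data frame, its column names and the "^ctrl" pattern are all made up for illustration; only grep, grepl and subset come from base R.

```r
# Hypothetical example data; names and values are invented for illustration.
samples <- data.frame(
  id    = c("ctrl_01", "ctrl_02", "treat_01", "treat_02"),
  value = c(1.2, 0.9, 3.4, 2.8),
  stringsAsFactors = FALSE
)

# grep() returns the indices of matching elements;
# invert = TRUE returns the indices of the non-matching ones instead.
ctrl_idx     <- grep("^ctrl", samples$id)
non_ctrl_idx <- grep("^ctrl", samples$id, invert = TRUE)

# grepl() returns a logical vector, one element per input string.
is_ctrl <- grepl("^ctrl", samples$id)

# Keep the rows whose id starts with "ctrl".
ctrl_rows <- subset(samples, grepl("^ctrl", id))

# Inverted selection: negate the logical vector with "!".
non_ctrl_rows <- subset(samples, !grepl("^ctrl", id))
```

Since subset expects a logical condition, grepl fits directly, and the leading ! does the job that invert = TRUE does for grep.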
