Data is never perfect, especially when transferring between systems. I was working with some Jira data, and the literal marker \N appeared in most of the columns after extraction. So I needed something that would loop through all of the columns and set those values to blank (setting them to NA would be just as easy). Escape characters can be a nightmare.
jira.data <- as.data.frame(lapply(jira.data, function(x)
  if (is.character(x) | is.factor(x))
    gsub("\\\\N", "", x)  # "\\\\N" is the regex \\N, which matches a literal \N
  else x))
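As noted above, setting those values to NA instead of blank is just as easy. A minimal sketch, using a hypothetical two-column data frame standing in for the Jira extract:

```r
# Hypothetical sample standing in for the extracted Jira data
jira.data <- data.frame(summary  = c("\\N", "Fix login bug"),
                        assignee = c("amy", "\\N"),
                        stringsAsFactors = FALSE)

# Replace the literal \N marker with NA instead of an empty string
jira.data <- as.data.frame(lapply(jira.data, function(x) {
  if (is.character(x) | is.factor(x)) {
    x[x == "\\N"] <- NA
  }
  x
}), stringsAsFactors = FALSE)

jira.data$summary
## [1] NA              "Fix login bug"
```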
Next, take the underscores out of the column names.
names(jira.data) <- gsub("_", "", names(jira.data))
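For example, on a few hypothetical column names as they might arrive from the extract:

```r
# Hypothetical column names from the Jira extract
jira.names <- c("issue_key", "created_date", "story_points")

# Same substitution as above, shown on its own
gsub("_", "", jira.names)
## [1] "issuekey"    "createddate" "storypoints"
```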
Sometimes you just have to bulldog the data with your code. It's easier to make a list of junk patterns and add to it when necessary. I had a bunch of cases that I knew represented useless data; this code removes them by subsetting the data frame.
txt <- c('domo', 'modocorp', 'apptest', 'qa2stag', 'qastag', 'support-prod',
'support', '220221', '^ec-7', '^ec-8', '^ec-9', 'appdev', 'prod5-', 'dev-', 'publisher',
'qa-', 'ptk-', 'ckdemo', '^training', 'standard2-test1', 'demo', 'brian-prod', 'erictest',
'@', 'bohme', 'test', 'custom', 'dev-', 'dev.', 'http', 'sandbox', 'freemium', 'verizondemo')
# !grepl() is safer than -grep() here: if none of the patterns matched,
# grep() would return integer(0), and df[-integer(0), ] drops every row
data.frame <- data.frame[!grepl(paste(txt, collapse = '|'), data.frame$var_name, ignore.case = TRUE), ]
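A quick sketch of the subsetting on a hypothetical data frame, with a trimmed-down pattern list for illustration. Note that filtering with !grepl() keeps the non-matching rows and behaves safely when nothing matches, whereas df[-grep(...), ] would drop every row if grep() returned integer(0):

```r
# Hypothetical instance names, a few of which match the junk patterns
df <- data.frame(var_name = c("acme-prod", "qa-stage", "demo", "bigco"),
                 stringsAsFactors = FALSE)
txt <- c("qa-", "demo")  # trimmed-down pattern list for illustration

# TRUE for rows whose var_name matches none of the patterns
keep <- !grepl(paste(txt, collapse = "|"), df$var_name, ignore.case = TRUE)
df <- df[keep, , drop = FALSE]
df$var_name
## [1] "acme-prod" "bigco"
```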
I was cleaning up a data set imported from SQL. One of the columns contained the text of emails, a few of which had Japanese characters. Those cases were few and far between and not needed for my analysis. Here is an easy way to wipe them out.
text <- c("はしとみま", "とまきはと", "とまは", "しつくきは", "そみ", "hammer", "toe-jam")
text <- iconv(text, "latin1", "ASCII", sub="")
text
## [1] "" "" "" "" "" "hammer" "toe-jam"