Subsetting with square brackets on multiple columns in R
I'm flummoxed. I'm trying to isolate rows of a df according to the values in 2 columns. I tried it on practice data first, and the code works fine:
data1 <- df2[df2$fruit == "kiwi" | df2$fruit == "orange" | df2$fruit == "apple" & (df2$dates >= "2010-04-01" & df2$dates < "2010-10-01"), ]

When I try the same code on my real data, it doesn't work. It collects the "fruits" I need, but ignores the date range request.
data1 <- lti_first[lti_first$hai_atc == "c10aa01" | lti_first$hai_atc == "c10aa03" | lti_first$hai_atc == "c10aa04" | lti_first$hai_atc == "c10aa05" | lti_first$hai_atc == "c10aa07" | lti_first$hai_atc == "c10ab02" | lti_first$hai_atc == "c10aa04" | lti_first$hai_atc == "c10ab08" | lti_first$hai_atc == "c10ax09" & (lti_first$date_of_claim >= "2010-04-01" & lti_first$date_of_claim < "2010-10-01"), ]

The structure of the variables in the practice data and the real data is exactly the same. fruit/hai_atc are factors in both dfs, and the dates are as.Date in both dfs.
In an effort to get around this, I've tried using subset() on the data instead, but that won't work for me either (although it does work on the practice data):
x <- subset(lti_first, hai_atc == "v07ay03" | hai_atc == "a11jc94" & (date_of_claim >= "2010-04-01" & date_of_claim < "2010-10-01"))

What am I doing wrong? To me, the code looks identical!
Sample df:
names <- c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit <- as.character(c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple"))
age <- as.numeric(c(60, 55, 60, 57, 55, 60, 57, 55, 57, 55, 60, 55, 57, 57))
sex <- as.character(c("m", "f", "m", "m", "f", "m", "m", "f", "m", "f", "m", "f", "m", "m"))
df2 <- data.frame(names, dates, age, sex, fruit)
df2

dput(df2)
structure(list(names = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 2L, 1L, 1L), .Label = c("john", "mary", "tom"), class = "factor"),
    dates = structure(c(14641, 14730, 14669, 14791, 14791, 14761, 14853, 14791, 14914, 14853, 14822, 14914, 14944, 14975), class = "Date"),
    age = c(60, 55, 60, 57, 55, 60, 57, 55, 57, 55, 60, 55, 57, 57),
    sex = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("f", "m"), class = "factor"),
    fruit = structure(c(1L, 4L, 2L, 3L, 1L, 1L, 1L, 4L, 2L, 1L, 3L, 1L, 4L, 1L), .Label = c("apple", "banana", "kiwi", "orange"), class = "factor")),
    .Names = c("names", "dates", "age", "sex", "fruit"), row.names = c(NA, -14L), class = "data.frame")

The real data is too big to put in dput, so here's str instead:
str(sample_lti_first)
'data.frame': 20 obs. of 5 variables:
 $ hai_dispense_number: Factor w/ 53485 levels "patient hai0000017",..: 22260 22260 2527 24311 24311 24311 24311 13674 13674 13674 ...
 $ sex                : Factor w/ 4 levels "f","m","u","x": 2 2 2 1 1 1 1 1 1 1 ...
 $ hai_age            : int 18 18 27 40 40 40 40 28 28 28 ...
 $ date_of_claim      : Date, format: "2009-10-09" "2009-10-09" "2009-10-18" ...
 $ hai_atc            : Factor w/ 1038 levels "","a01aa01","a01ab03",..: 144 76 859 80 1009 1009 859 81 1008 859 ...
Does this work?
data1 <- subset(lti_first,
                (hai_atc %in% c("c10aa01", "c10aa03", "c10aa04", "c10aa05", "c10aa07", "c10ab02", "c10aa04", "c10ab08", "c10ax09")) &
                (date_of_claim >= as.Date("2010-04-01") & date_of_claim < as.Date("2010-10-01")))

Note the use of %in% and as.Date.
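A likely reason the original attempts ignore the date range is operator precedence: in R, & binds more tightly than |, so a condition written as a | b | c & d is read as a | b | (c & d), and the date test only restricts the last code in the chain. Collapsing the chain into a single %in% test and then combining it with the date test avoids that. Here is a minimal sketch on the sample data from the question (only the dates and fruit columns are rebuilt; the names in_range, loose and strict are made up for illustration):

```r
# Sample data from the question, reduced to the two columns that matter here.
df2 <- data.frame(
  dates = as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01",
                    "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01",
                    "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01",
                    "2010-12-01", "2011-01-01")),
  fruit = c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple",
            "orange", "banana", "apple", "kiwi", "apple", "orange", "apple")
)

# Date window, compared against real Date objects rather than strings.
in_range <- df2$dates >= as.Date("2010-04-01") &
            df2$dates <  as.Date("2010-10-01")

# Form used in the question: `&` is evaluated before `|`, so the date
# window only restricts the last fruit ("apple").
loose <- df2[df2$fruit == "kiwi" | df2$fruit == "orange" |
             df2$fruit == "apple" & in_range, ]

# Grouped form: the date window applies to all three fruits.
strict <- df2[df2$fruit %in% c("kiwi", "orange", "apple") & in_range, ]

nrow(loose)   # 9
nrow(strict)  # 8

# The one row that slips through the ungrouped form (an out-of-range orange).
loose[!(rownames(loose) %in% rownames(strict)), ]
```

On this small sample the two forms differ by a single row (the orange dated 2010-12-01), which may be why the practice run looked correct while the much larger real data set made the problem obvious.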