Subsetting with square brackets on multiple columns in R

January 15, 2011

I'm flummoxed. I'm trying to isolate rows of a df according to the values in 2 columns. I tried it on practice data first, and the code works fine.

data1<-df2[df2$fruit=="kiwi" |  df2$fruit=="orange" | df2$fruit=="apple"  & (df2$dates>= "2010-04-01" & df2$dates<  "2010-10-01"), ]

When I try the same code on my real data, it doesn't work. It collects the "fruits" I need but ignores the date range request.

 data1<-lti_first[lti_first$hai_atc=="c10aa01" | lti_first$hai_atc=="c10aa03" | lti_first$hai_atc=="c10aa04" | lti_first$hai_atc=="c10aa05" | lti_first$hai_atc=="c10aa07" | lti_first$hai_atc=="c10ab02" |lti_first$hai_atc=="c10aa04" |lti_first$hai_atc=="c10ab08" | lti_first$hai_atc=="c10ax09" & (lti_first$date_of_claim >= "2010-04-01" & lti_first$date_of_claim<"2010-10-01"), ]

The structure of the variables in the practice data and the real data is exactly the same: fruits/hai_atc are factors in both dfs, and the dates are as.Date in both dfs.

In an effort to get around this I've tried using subset on the data instead, but that won't work for me either (though it does work on the practice data).

x<-subset(lti_first, hai_atc=="v07ay03" | hai_atc=="a11jc94" & (date_of_claim>="2010-04-01" & date_of_claim<"2010-10-01"))

What am I doing wrong? To me, the code looks identical!

sample df

names <- c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit <- as.character(c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple"))
age <- as.numeric(c(60, 55, 60, 57, 55, 60, 57, 55, 57, 55, 60, 55, 57, 57))
sex <- as.character(c("m", "f", "m", "m", "f", "m", "m", "f", "m", "f", "m", "f", "m", "m"))
df2 <- data.frame(names, dates, age, sex, fruit)
df2

dput(df2)
structure(list(names = structure(c(3L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 2L, 1L, 1L), .Label = c("john", "mary", "tom"
), class = "factor"), dates = structure(c(14641, 14730, 14669,
14791, 14791, 14761, 14853, 14791, 14914, 14853, 14822, 14914,
14944, 14975), class = "Date"), age = c(60, 55, 60, 57, 55, 60,
57, 55, 57, 55, 60, 55, 57, 57), sex = structure(c(2L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L), .Label = c("f",
"m"), class = "factor"), fruit = structure(c(1L, 4L, 2L, 3L,
1L, 1L, 1L, 4L, 2L, 1L, 3L, 1L, 4L, 1L), .Label = c("apple",
"banana", "kiwi", "orange"), class = "factor")), .Names = c("names",
"dates", "age", "sex", "fruit"), row.names = c(NA, -14L), class = "data.frame")

The real data is too big to put in dput, so here's str instead:

str(sample_lti_first)
'data.frame':   20 obs. of  5 variables:
 $ hai_dispense_number: Factor w/ 53485 levels "patient hai0000017",..: 22260 22260 2527 24311 24311 24311 24311 13674 13674 13674 ...
 $ sex                : Factor w/ 4 levels "f","m","u","x": 2 2 2 1 1 1 1 1 1 1 ...
 $ hai_age            : int  18 18 27 40 40 40 40 28 28 28 ...
 $ date_of_claim      : Date, format: "2009-10-09" "2009-10-09" "2009-10-18" ...
 $ hai_atc            : Factor w/ 1038 levels "","a01aa01","a01ab03",..: 144 76 859 80 1009 1009 859 81 1008 859 ...

Does this work?

data1 <- subset(lti_first,
  (hai_atc %in% c("c10aa01", "c10aa03", "c10aa04", "c10aa05", "c10aa07",
                  "c10ab02", "c10aa04", "c10ab08", "c10ax09")) &
  (date_of_claim >= as.Date("2010-04-01") & date_of_claim < as.Date("2010-10-01")))

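Note the use of %in% and as.Date. In R, & binds more tightly than |, so in the original code the date condition was attached only to the last hai_atc comparison and every other code was kept regardless of date. Grouping all the code tests into one parenthesised %in% test makes the date range apply to every code, and as.Date makes the date comparison explicit.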
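To see the precedence effect on the practice data from the question, here is a minimal sketch. The helper names in_range, loose and strict are made up for illustration, and the age and sex columns are left out for brevity.

```r
# Practice data from the question (only the columns needed here)
df2 <- data.frame(
  names = c("tom", "mary", "tom", "john", "mary", "tom", "john",
            "mary", "john", "mary", "tom", "mary", "john", "john"),
  dates = as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01",
                    "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01",
                    "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01",
                    "2010-12-01", "2011-01-01")),
  fruit = c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple",
            "orange", "banana", "apple", "kiwi", "apple", "orange", "apple")
)

# Date window used in the question
in_range <- df2$dates >= as.Date("2010-04-01") &
            df2$dates <  as.Date("2010-10-01")

# Original form: R parses this as kiwi | orange | (apple & in_range)
loose <- df2[df2$fruit == "kiwi" | df2$fruit == "orange" |
             df2$fruit == "apple" & in_range, ]

# Grouped form: (kiwi or orange or apple) & in_range
strict <- df2[df2$fruit %in% c("kiwi", "orange", "apple") & in_range, ]

nrow(loose)   # 9
nrow(strict)  # 8

# Row kept by the original form even though it is outside the date window
loose[setdiff(rownames(loose), rownames(strict)), ]
```

On this small data set the two forms differ by a single row (an orange from 2010-12-01), which may be why the original code appeared to work on the practice data while visibly failing on the real data.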
