R - Subset based on granularity and average values
I have a large data frame that consists of 2 columns. I want to calculate the average of the second column's values for each subset of the first column, where the subsets of the first column are based on a specified granularity. For example, for the following data frame, df, I want to calculate the average of the df$b values for each subset of df$a, with an increment (granularity) of 1 for each subset. The results should be in 2 new columns.
          a  b
    0.22096  1
    0.33489  1
    0.33655  1
    0.43953  1
    0.64933  2
    0.86668  1
    0.96932  1
    1.09342  2
    1.58314  2
    1.88481  2
    2.07654  4
    2.34652  3
    2.79777  5

Expected results:

    newa  newb
    0     1.142857
    1     2
    2     4

This is a simple example, and I'm not sure how to loop over the whole data frame and perform the calculation, i.e. the average of df$b.
I tried the code below to subset, but couldn't figure out how to append the results and create the final result:
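    increment <- 1
    mx <- max(df$a)
    i <- 0
    newdf <- data.frame()
    while (i < mx) {
      # rows of df whose a value lies strictly between i and i + increment
      tmp <- subset(df, (a > i & a < (i + increment)))
      # tmp is never stored anywhere, so newdf stays empty
      i <- i + increment
    }

I'm not sure about the logic. I'm sure there is a shorter way to do the required calculation. Any thoughts?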
I would use findInterval for the subset selection (in your example a simple ceiling of each a value should be sufficient, too; if the increment is different from 1 you need findInterval) and tapply to calculate the mean:
    df <- read.table(textConnection("
    a b
    0.22096 1
    0.33489 1
    0.33655 1
    0.43953 1
    0.64933 2
    0.86668 1
    0.96932 1
    1.09342 2
    1.58314 2
    1.88481 2
    2.07654 4
    2.34652 3
    2.79777 5"), header=TRUE)

    ## sort the data.frame by column a
    ## (optional: findInterval only needs the breakpoints to be sorted)
    df <- df[order(df$a), ]

    ## define the granularity
    subsets <- seq(1, max(ceiling(df$a)), by=1) # change the "by" argument for different increments

    ## index of the interval each a value falls into (0 = below the first breakpoint)
    df$subset <- findInterval(df$a, subsets)

    tapply(df$b, df$subset, mean)
    #        0        1        2
    # 1.142857 2.000000 4.000000
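The question asks for the result as two new columns, and tapply returns a named vector instead. A minimal sketch of one way to get there, using the same findInterval and tapply idea, is below. The helper name bin_means and its increment argument are made up for illustration and are not part of the original answer; each bin is labelled by its lower bound, and the breakpoints are assumed to start at the largest multiple of increment that does not exceed min(a).

```r
# Sample data from the question
df <- data.frame(
  a = c(0.22096, 0.33489, 0.33655, 0.43953, 0.64933, 0.86668, 0.96932,
        1.09342, 1.58314, 1.88481, 2.07654, 2.34652, 2.79777),
  b = c(1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 4, 3, 5)
)

# Hypothetical helper (not from the original answer): average b within
# bins of width `increment`, labelling each bin by its lower bound.
bin_means <- function(df, increment = 1) {
  # Assumption: bins are aligned to multiples of `increment`
  lower <- floor(min(df$a) / increment) * increment
  # Breakpoints from the first bin edge to just past the largest a value
  breaks <- seq(lower, max(df$a) + increment, by = increment)
  # 1-based index of the bin each a value falls into
  idx <- findInterval(df$a, breaks)
  # Mean of b per occupied bin (empty bins are simply absent)
  means <- tapply(df$b, idx, mean)
  data.frame(
    newa = breaks[as.integer(names(means))],
    newb = as.vector(means)
  )
}

result <- bin_means(df, increment = 1)
print(result)
```

With the sample data and an increment of 1, this gives newa values of 0, 1 and 2 with newb values of about 1.142857, 2 and 4, which matches the expected results in the question. Calling bin_means(df, increment = 0.5) uses half-unit bins instead.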