R - Subset based on granularity and average values


I have a large data frame that consists of 2 columns. I want to calculate the average of the second column's values for each subset of the first column, where the subsets of the first column are defined by a specified granularity. For example, for the following data frame, df, I want to calculate the average of the df$b values for each subset of df$a, with an increment (granularity) of 1 for each subset. The results should be in 2 new columns.

    a        b          expected results:   newa   newb
    0.22096  1                              0      1.142857
    0.33489  1                              1      2
    0.33655  1                              2      4
    0.43953  1
    0.64933  2
    0.86668  1
    0.96932  1
    1.09342  2
    1.58314  2
    1.88481  2
    2.07654  4
    2.34652  3
    2.79777  5

This is a simple example, but I'm not sure how to loop over the whole data frame and perform the calculation, i.e. the average of df$b.

I tried the loop below to build each subset, but I couldn't figure out how to append the results and create the final output:

    increment <- 1
    mx <- max(df$a)
    i <- 0

    newdf <- data.frame()
    while (i < mx) {
        # rows of df whose a value falls inside the current interval
        tmp <- subset(df, (a > i & a < (i + increment)))
        i <- i + increment
    }

I'm not sure about the logic. I'm sure there is a shorter way to do the required calculation. Any thoughts?

I would use findInterval for the subset selection (in your example a simple ceiling of each a value would be sufficient, too; if the increment differs from 1 you need findInterval) and tapply to calculate the mean:

    df <- read.table(textConnection("a       b
    0.22096 1
    0.33489 1
    0.33655 1
    0.43953 1
    0.64933 2
    0.86668 1
    0.96932 1
    1.09342 2
    1.58314 2
    1.88481 2
    2.07654 4
    2.34652 3
    2.79777 5"), header=TRUE)

    ## sort the data.frame by column a
    ## (optional: findInterval only requires the break vector to be sorted)
    df <- df[order(df$a), ]

    ## define the granularity
    subsets <- seq(1, max(ceiling(df$a)), by=1) # change the "by" argument for different increments
    df$subset <- findInterval(df$a, subsets)

    tapply(df$b, df$subset, mean)
    #        0        1        2
    # 1.142857 2.000000 4.000000
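The question asked for the result as two new columns and for an adjustable increment, so here is a minimal sketch of how the same findInterval/tapply idea could be wrapped up for that. The helper name bin_means is hypothetical (it is not part of the answer above), and it assumes that all a values are non-negative and that each bin is the half-open interval from k * increment up to, but not including, (k + 1) * increment.

```r
# Example data from the question
df <- data.frame(
  a = c(0.22096, 0.33489, 0.33655, 0.43953, 0.64933, 0.86668, 0.96932,
        1.09342, 1.58314, 1.88481, 2.07654, 2.34652, 2.79777),
  b = c(1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 4, 3, 5)
)

# bin_means is a hypothetical helper name, not from the original answer.
# Assumption: all a values are >= 0; bins start at 0 and are `increment` wide.
bin_means <- function(df, increment = 1) {
  # Break points starting at 0 and extending past max(a), so every value has a bin
  breaks <- seq(0, max(df$a) + increment, by = increment)
  bin    <- findInterval(df$a, breaks)   # 1-based index of the bin for each row
  means  <- tapply(df$b, bin, mean)      # mean of b for each non-empty bin
  data.frame(
    newa = breaks[as.integer(names(means))],  # lower bound of each bin
    newb = as.vector(means)
  )
}

res <- bin_means(df, increment = 1)
print(res)
```

For the example data with increment = 1 this gives newa values 0, 1, 2 with means 1.142857, 2 and 4, matching the expected results in the question. Empty bins are simply left out of the result, and with non-integer increments values that sit exactly on a break can be affected by floating-point rounding in seq.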
