Summation conditional on another variable
Problem:
We have a large weather dataset covering more than 50 years. We want to know the distribution and average of the cumulative rainfall in August and September across those years. To get there, we have to calculate the total August and September rainfall for each year and store the totals in a new variable for further analysis. The problem is how to compute these yearly totals and put them in a separate variable.
Solution:
Suppose the weather data are stored in the file demo.txt. In R, once the data file has been read into memory, the subset function extracts the August and September records. The resulting dataset contains all the information we need. The key step is computing the sum for each year. A user-defined function, sumUp, does the job: it takes the dataset, the names of the key variables, and the names of the variables to sum, and returns a data frame holding the keys and the sums. Summary statistics and plots of the new dataset can then be produced with standard R functions.
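The grouping step at the core of this approach can be tried in isolation. The following is a minimal sketch on a made-up data frame (the name toy and its values are invented for illustration and do not come from demo.txt), using base R's tapply to sum one column within groups defined by another:

```r
# Toy data, invented purely for illustration (not from demo.txt)
toy <- data.frame(
  Year = c(2001, 2001, 2001, 2002, 2002),
  RAIN = c(1.5, 0.0, 2.5, 4.0, 1.0)
)

# tapply splits RAIN by the values of Year and applies sum to each group
totals <- tapply(toy$RAIN, toy$Year, sum)
print(totals)
```

The result has one element per year, named by the year. The sumUp function defined next extends this idea to several key and summation columns and returns a data frame instead.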
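sumUp <- function(dat, key_list, sum_list) {
  # Build one grouping key per row by pasting the key columns together
  key <- with(dat, do.call("paste", dat[, key_list, drop = FALSE]))
  # Sum each column in sum_list within each key group
  totals <- as.matrix(sapply(dat[, sum_list, drop = FALSE], tapply, key, sum))
  dimnames(totals)[[2]] <- paste("total", sum_list, sep = "_")
  # Find the first row of each group to recover its key values
  m <- match(dimnames(totals)[[1]], key)
  cbind(dat[m, key_list, drop = FALSE], totals)
}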
lsmet <- read.table("c:/demo.txt", header=TRUE)
# Days 213 to 273 run from 1 August to 30 September in a non-leap year
lsmet1 <- subset(lsmet, Day >= 213 & Day <= 273)
lsmetT <- sumUp(lsmet1, "Year", "RAIN")
s <- summary(lsmetT)
plot(lsmetT$Year,lsmetT$total_RAIN)
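# names.arg labels each bar with its year (the second positional argument of barplot is the bar width)
barplot(lsmetT$total_RAIN, names.arg = lsmetT$Year, xlab = "Year", ylab = "Rainfall (mm)")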
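As a cross-check, the same yearly totals can also be obtained with base R's aggregate function, and the average and distribution mentioned in the problem statement follow directly from the result. This is only a sketch under stated assumptions: the data frame toy stands in for lsmet1, and its values are invented for illustration.

```r
# Toy stand-in for the August-September subset; values are invented
toy <- data.frame(
  Year = c(1990, 1990, 1991, 1991, 1992),
  Day  = c(213, 250, 220, 273, 240),
  RAIN = c(10, 5, 0, 8, 12)
)

# Sum RAIN within each Year; the result is a data frame with one row per year
yearly <- aggregate(RAIN ~ Year, data = toy, FUN = sum)
names(yearly)[2] <- "total_RAIN"

# Average of the yearly totals
avg_total <- mean(yearly$total_RAIN)

# Histogram of the yearly totals to inspect their distribution
hist(yearly$total_RAIN, xlab = "Rainfall (mm)", main = "Aug-Sep rainfall totals")
```

With real data, replacing toy with lsmet1 should give totals that match the total_RAIN column returned by sumUp.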
