Posts Tagged formula

Construct R formula with variable names in a vector automatically

Suppose you have a list of variable names in a vector and you want to construct some standard linear model from the vector, how can we do it in R. Actually it is pretty easy in R. By using two R functions, that is,  formula and paste, linear model can be constructed automatically

1.additive model

PredictorVariables<- c("x1","x2")

Apply approach: We can then construct a formula as follows:
PredictorVariables <- paste(“x”, 1:100, sep=””)
Apply approach: We can then construct a formula as follows:

Formula <- formula(paste("y ~ ", 
     paste(PredictorVariables, collapse=" + ")))
lm(Formula, Data)

2.multiplicative model
Apply approach: We can then construct a formula as follows:
PredictorVariables <- paste(“x”, 1:100, sep=””)
Apply approach: We can then construct a formula as follows:

Formula <- formula(paste("y ~ ", 
     paste(PredictorVariables, collapse=" * ")))
lm(Formula, Data)

If you have more than two variable names in the vector, the mode can be quite complex, including higher order of interactions.

By using the similar idea, you can also include random terms in the constructed model.

Appendix: Get all column headers that fit to certain patterns.
Suppose I have data frame called my.df and there are some column headers include string “trt_”, we can use the following methods to get all columns headers with “trt_” and store them into a vector.


my.colnames <- colnames(my.df)
my.headers <- my.colnames[grepl("trt_", my.colnames)]
Share

Tags: , ,

Dynamic graphing in Excel

Problem:

Data are in two columns: A and B.  I am plotting a graph which takes data from columns  A and B as its source data. I want to generate an Excel template that can use many times. That is, I can copy data to the data sheet and view graph directly. The number of rows in both the columns changes with time. For example, one data set only has 50 rows and another one has 150 rows. If I create a graph by directly use the data in columns A and B, I have to change the number of rows to exclude blank cells while plotting a graph.

Solution:

The key of the solution depends on Excel worksheet function – OFFSET and COUNTA. OFFSET function takes a cell as reference  and number of columns and rows away from the reference cell and the final number of rows and columns of returned range. The result of the OFFSET function is a reference to a range that is specified number of rows and columns from a cell or range of cells. COUNTA counts the number of cells that are not empty and the values within the list of arguments. By combining these two worksheet functions, we can defined a named variable in Excel, which is dynamic in size. Here is a step by step instruction.
a) Create a chart as you would like to see it based on the values in column A and B. Let’s say it has 100 rows of data, with headers in A1 and B1.
b) Click Insert -> Name -> Define…
c) Type a new name, such as CatVar and for the formula, use =OFFSET(Sheet1!$A$2,0,0,COUNTA(Sheet1!$A:$A)-1,1)
d) Click Insert -> Name -> Define… again
e) Type a new name, such as ValueVar and for the formula, use =OFFSET(Sheet1!$B$2,0,0,COUNTA(Sheet1!$B:$B)-1,1)
f) On your chart, click on one of the data points in curve or bar. The formula for the series should show up in the Formula bar, similar to: =SERIES(Sheet1!$B$1,Sheet1!$A$2:$A$100,Sheet1!$B$2:$B$100,1)
g) Change that formula in the Formula bar to: =SERIES(Sheet1!$B$1,Book1!CatVar,Book1!ValueVar,1)

Note: If your defined names are at the sheet level, not the workbook level, change ‘Book1’ to ‘Sheet1’ or your sheet name. Once you press Enter you should be all set. Remove data rows, add them.. your chart should update automatically and dynamically. Enjoying the dynamic graphing in Excel.

Download a demo spreadsheet – dynamic graph.

Share

Tags: , , ,