Archive for March, 2018

Slicing a data frame in R

Slice data rows in a data frame

Normally, we can easily slice a data frame based on given values in a column. Since data frame can be treated as a list, we can use expression “tbData[[colName]]” to represent a column in a data frame. That makes the program more generic. colName includes the column header information.


tbData[tbData[[colName]]==colValue, ]

The %in% operator in R is very useful. It can be used to test if there are common elements in two vectors, especially useful in character testing. It can be used to:
1) Test if shorter vectors are in longer vectors (ex, 6:10 %in% 1:36);
2) Test which elements of long vectors are in short vector (ex, 1:36 %in% 6:10)
3) Used in character vectors or factors (ex, c(“d”, “e”) %in% c(“a”, “b”, “c”, “d”))
If you want to know the indexes of the specific elements inside a larger vector, use function “which” to achieve this goal. Find the indexes of which elements of (1:36 %in% 1:6) are: which(1:36 %in% 6:10).
By using this function, we can slice a data frame based on a given string or string vector.


myDf[myDf$contrast %in% compstr, ]

Slice data columns in a data frame

In R slicing data columns in a data frame is pretty simple. Either using the index information of columns or directly using column names.
Using index or position of columns to slice the data frame is more generic.


df[c(1, 4:ncol(df))]

Using columns to slice the dat frame is useful when you know the column headers before you slice it.


df[c('c1', 'c5', 'c10')]
Share

Tags: , , , ,

Transpose rows to columns in MySQL

In MySQL, transposing data in rows to columns is not a trivial job, especially when the number of distinct values in a column is undetermined and the values are unknown. However, there is a nice solution for this job. To achieve this goal, the magic is to use group_concat function in MySQL. When the number of distinct values and the values are known before you write the SQL statement, it can be done with the group_concat function. However, to get a nice solution for the undetermined number of unknown values, there are couple of steps: 1) create a sql_segment with group_concat function; 2) create a complete SQL statement on fly; 3) execute the constructed SQL statement.

1.Create a sql_segment with group_concat

Suppose we have a table called table1, it has a number columns, like f1, t1, and v1. We want to transpose the data based on values in f1 and create column headers based on f1 and t1. The following SQL statement can create a piece of SQL segment for using in the next step.

declare _SQL text(4096);
SET SESSION group_concat_max_len = 100000;

SET @_SQL = null;

SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'group_concat(if(a.f1 =  ',
      f1,
      ', if(v1<0, null, round(v1,1)), null)) AS `',
      concat(t1,'_',f1), '`'
    )
) INTO @_SQL
FROM
(
    select distinct f1, t1
    from table1
	group by f1, t1
	order by f1
) d;

2.Create a complete SQL statement

Create the complete SQL statement. All data in table “table1” will be grouped by f2,f3. The simple statement will be like the following.

SET @_SQL 
  = CONCAT('f2,f3,', @_SQL, ' 
	from table1 
	group by f2, f3'
	); 

3.Execute the constructed SQL statement

Use the following statements to execute the SQL statement stored in @_SQL.

prepare s1 from @_SQL;
execute s1;
deallocate prepare s1;    
Share

Tags: , ,

Construct R formula with variable names in a vector automatically

Suppose you have a list of variable names in a vector and you want to construct some standard linear model from the vector, how can we do it in R. Actually it is pretty easy in R. By using two R functions, that is,  formula and paste, linear model can be constructed automatically

1.additive model

PredictorVariables<- c("x1","x2")

Apply approach: We can then construct a formula as follows:
PredictorVariables <- paste(“x”, 1:100, sep=””)
Apply approach: We can then construct a formula as follows:

Formula <- formula(paste("y ~ ", 
     paste(PredictorVariables, collapse=" + ")))
lm(Formula, Data)

2.multiplicative model
Apply approach: We can then construct a formula as follows:
PredictorVariables <- paste(“x”, 1:100, sep=””)
Apply approach: We can then construct a formula as follows:

Formula <- formula(paste("y ~ ", 
     paste(PredictorVariables, collapse=" * ")))
lm(Formula, Data)

If you have more than two variable names in the vector, the mode can be quite complex, including higher order of interactions.

By using the similar idea, you can also include random terms in the constructed model.

Appendix: Get all column headers that fit to certain patterns.
Suppose I have data frame called my.df and there are some column headers include string “trt_”, we can use the following methods to get all columns headers with “trt_” and store them into a vector.


my.colnames <- colnames(my.df)
my.headers <- my.colnames[grepl("trt_", my.colnames)]
Share

Tags: , ,

Several important and useful iPhone/iOS settings

1. Open iPhone with TouchID directly without having to press the Home button again.

[Turn ON] setting-general-accessibility-home button -resetfinger to open

2. Speed up iPhone by reducing motion of the user interface, including the parallox effect of icons

[Turn ON] setting-general-accessibility-reduce motion

3. Clear the list of apps that stay in background and refresh frequently without your notice. The intention of automatic background app refresh is to refresh their content when on WLAN or cellular in the background and accelerate serving the content to you. However, the function consumes cellular data and and battery power. To reduce cellular data usage and preserve battery life, turn off all apps that do not need frequently refresh their contents at the background.

[Turn OFF]setting-general-background app refresh

4.Location services

Change app do not need location services to NEVER.

setting-privacy-location services

Turn off some system services

setting-privacy-location services

[turn OFF] location-based apple ads

[turn OFF] location-based suggestions

[turn OFF] significant locations

[turn OFF] iPhone analytics

[turn OFF] popular near me

[turn OFF] routing and traffic

5. Control Center

Do not allow to access Control Center at locked condition.

[turn OFF] setting-TouchID$Passcode-Control Center

[turn ON] wallet

[turn ON] erase data – erase all data on this iPhone after 10 failed passcode attempts to protect your data on the phone.

Share

Tags: , , ,