Posts Tagged transpose

Transpose rows to columns in MySQL

In MySQL, transposing data in rows to columns is not a trivial job, especially when the number of distinct values in a column is undetermined and the values are unknown. However, there is a nice solution for this job. To achieve this goal, the magic is to use group_concat function in MySQL. When the number of distinct values and the values are known before you write the SQL statement, it can be done with the group_concat function. However, to get a nice solution for the undetermined number of unknown values, there are couple of steps: 1) create a sql_segment with group_concat function; 2) create a complete SQL statement on fly; 3) execute the constructed SQL statement.

1.Create a sql_segment with group_concat

Suppose we have a table called table1, it has a number columns, like f1, t1, and v1. We want to transpose the data based on values in f1 and create column headers based on f1 and t1. The following SQL statement can create a piece of SQL segment for using in the next step.

declare _SQL text(4096);
SET SESSION group_concat_max_len = 100000;

SET @_SQL = null;

SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'group_concat(if(a.f1 =  ',
      f1,
      ', if(v1<0, null, round(v1,1)), null)) AS `',
      concat(t1,'_',f1), '`'
    )
) INTO @_SQL
FROM
(
    select distinct f1, t1
    from table1
	group by f1, t1
	order by f1
) d;

2.Create a complete SQL statement

Create the complete SQL statement. All data in table “table1” will be grouped by f2,f3. The simple statement will be like the following.

SET @_SQL 
  = CONCAT('f2,f3,', @_SQL, ' 
	from table1 
	group by f2, f3'
	); 

3.Execute the constructed SQL statement

Use the following statements to execute the SQL statement stored in @_SQL.

prepare s1 from @_SQL;
execute s1;
deallocate prepare s1;    
Share

Tags: , ,

Clustering in R

There is a nice post at https://stackoverflow.com/questions/15376075/cluster-analysis-in-r-determine-the-optimal-number-of-clusters that provide a number of examples to illustrate how to determine number of clusters.

Below are two examples:

# prepare data from a dataframe
# var1 includes one group
# var2 includes second group
# var3 includes all the values used in clustering
# transpose the dataframe data to matrix
kmdata <- acast(my.data, var1 ~ var2, value.var='var3')
# kmeans
my.cluster <- cascadeKM(kmdata, inf.gr = 1, sup.gr = nrow(kmdata)-1)
plot(my.cluster, sortg = TRUE, grpmts.plot = TRUE)
calinski.best <- as.numeric(which.max(my.cluster$results[2,]))
cat("Calinski criterion optimal number of clusters:", calinski.best, "\n")
# sum of square error
wss <- (nrow(kmdata)-1)*sum(apply(kmdata,2,var))
for (i in 2:(nrow(kmdata)-1)) wss[i] <- sum(kmeans(kmdata,
centers=i)$withinss)
plot(1:(nrow(kmdata)-1), wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")
my.cluster2 <- kmeans(kmdata, 4)

Here is another nice article talking about clustering (http://www.sthda.com/english/wiki/cluster-analysis-in-r-unsupervised-machine-learning)

Share

Tags: , , , ,

MySQL transpose function of aggregation

An article published at the following providing a lot of examples.

http://www.artfulsoftware.com/infotree/queries.php

At the stackoverflow website there is an example as below. It helps me a lot.

The following text from the site.

Question:

Here is what my current mysql table looks like:

PunchID EmpID PunchEvent PunchDateTime
1 0456 clockin 5/14/2013 8:36:26 AM
48 0456 breakout 5/14/2013 12:01:29 PM
53 0456 breakin 5/14/2013 12:28:31 PM
54 0456 clockout 5/14/2013 2:28:33 PM
57 0456 clockin 5/15/2013 7:38:34 AM
58 0456 breakout 5/15/2013 7:38:39 AM
59 0456 breakin 5/15/2013 7:38:41 AM
60 0456 clockout 5/15/2013 7:38:42 AM
Now I want to return a result set that is grouped based on the day of the week like so:

Day ClockIn BreakOut BreakIn ClockOut
Tuesday 8:36:26 AM 12:01:29 PM 12:28:31 PM 2:28:33 PM
Wednesday 7:38:34 AM etc, etc…
This is my current query. But it only returns the first punch for each day.

Answer:

SELECT DATE_FORMAT(PunchDateTime, ‘%W’) DAY,
MAX(CASE WHEN PunchEvent = ‘ClockIn’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) ClockIn,
MAX(CASE WHEN PunchEvent = ‘BreakOut’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) BreakOut,
MAX(CASE WHEN PunchEvent = ‘BreakIn’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) BreakIn,
MAX(CASE WHEN PunchEvent = ‘ClockOut’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) ClockOut
FROM tableName
WHERE EmpID = 456
GROUP BY DATE_FORMAT(PunchDateTime, ‘%W’)
ORDER BY PunchDateTime
SQLFiddle Demo
OUTPUT

╔═══════════╦═════════════╦═════════════╦═════════════╦═════════════╗
║ DAY ║ CLOCKIN ║ BREAKOUT ║ BREAKIN ║ CLOCKOUT ║
╠═══════════╬═════════════╬═════════════╬═════════════╬═════════════╣
║ Tuesday ║ 08:36:26 AM ║ 12:01:29 PM ║ 12:28:31 PM ║ 02:28:33 PM ║
║ Wednesday ║ 07:38:34 AM ║ 07:38:39 AM ║ 07:38:41 AM ║ 07:38:42 AM ║
╚═══════════╩═════════════╩═════════════╩═════════════╩═════════════╝

Share

Tags: , , , ,