Posts Tagged column

Slicing a data frame in R

Slice data rows in a data frame

Normally, we can easily slice a data frame based on given values in a column. Since data frame can be treated as a list, we can use expression “tbData[[colName]]” to represent a column in a data frame. That makes the program more generic. colName includes the column header information.


tbData[tbData[[colName]]==colValue, ]

The %in% operator in R is very useful. It can be used to test if there are common elements in two vectors, especially useful in character testing. It can be used to:
1) Test if shorter vectors are in longer vectors (ex, 6:10 %in% 1:36);
2) Test which elements of long vectors are in short vector (ex, 1:36 %in% 6:10)
3) Used in character vectors or factors (ex, c(“d”, “e”) %in% c(“a”, “b”, “c”, “d”))
If you want to know the indexes of the specific elements inside a larger vector, use function “which” to achieve this goal. Find the indexes of which elements of (1:36 %in% 1:6) are: which(1:36 %in% 6:10).
By using this function, we can slice a data frame based on a given string or string vector.


myDf[myDf$contrast %in% compstr, ]

Slice data columns in a data frame

In R slicing data columns in a data frame is pretty simple. Either using the index information of columns or directly using column names.
Using index or position of columns to slice the data frame is more generic.


df[c(1, 4:ncol(df))]

Using columns to slice the dat frame is useful when you know the column headers before you slice it.


df[c('c1', 'c5', 'c10')]
Share

Tags: , , , ,

MySQL transpose function of aggregation

An article published at the following providing a lot of examples.

http://www.artfulsoftware.com/infotree/queries.php

At the stackoverflow website there is an example as below. It helps me a lot.

The following text from the site.

Question:

Here is what my current mysql table looks like:

PunchID EmpID PunchEvent PunchDateTime
1 0456 clockin 5/14/2013 8:36:26 AM
48 0456 breakout 5/14/2013 12:01:29 PM
53 0456 breakin 5/14/2013 12:28:31 PM
54 0456 clockout 5/14/2013 2:28:33 PM
57 0456 clockin 5/15/2013 7:38:34 AM
58 0456 breakout 5/15/2013 7:38:39 AM
59 0456 breakin 5/15/2013 7:38:41 AM
60 0456 clockout 5/15/2013 7:38:42 AM
Now I want to return a result set that is grouped based on the day of the week like so:

Day ClockIn BreakOut BreakIn ClockOut
Tuesday 8:36:26 AM 12:01:29 PM 12:28:31 PM 2:28:33 PM
Wednesday 7:38:34 AM etc, etc…
This is my current query. But it only returns the first punch for each day.

Answer:

SELECT DATE_FORMAT(PunchDateTime, ‘%W’) DAY,
MAX(CASE WHEN PunchEvent = ‘ClockIn’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) ClockIn,
MAX(CASE WHEN PunchEvent = ‘BreakOut’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) BreakOut,
MAX(CASE WHEN PunchEvent = ‘BreakIn’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) BreakIn,
MAX(CASE WHEN PunchEvent = ‘ClockOut’ THEN DATE_FORMAT(PunchDateTime, ‘%r’) END) ClockOut
FROM tableName
WHERE EmpID = 456
GROUP BY DATE_FORMAT(PunchDateTime, ‘%W’)
ORDER BY PunchDateTime
SQLFiddle Demo
OUTPUT

╔═══════════╦═════════════╦═════════════╦═════════════╦═════════════╗
║ DAY ║ CLOCKIN ║ BREAKOUT ║ BREAKIN ║ CLOCKOUT ║
╠═══════════╬═════════════╬═════════════╬═════════════╬═════════════╣
║ Tuesday ║ 08:36:26 AM ║ 12:01:29 PM ║ 12:28:31 PM ║ 02:28:33 PM ║
║ Wednesday ║ 07:38:34 AM ║ 07:38:39 AM ║ 07:38:41 AM ║ 07:38:42 AM ║
╚═══════════╩═════════════╩═════════════╩═════════════╩═════════════╝

Share

Tags: , , , ,

MySQL transpose data from column to row and vice versa

Update on August 6, 2013

Dynamic pivot tables (transform rows to columns) provides a nice way to generate dynamic pivot table. I use this to write a Stored Procedure, which works like a charm.

=============================

I was looking for how to transpose MySQL table from column to row or from row to column and came across the following two posts online. They taught me how to achieve the job nicely.

1. Pivot table basics: rows to columns (row to column). The following was copied and pasted here.

From table tbl( class, member ), you want to cross-tabulate all classes with their members. In SQL terms, youaggregate members over classes. In MySQL:

SELECT class,GROUP_CONCAT(member)
FROM tbl
GROUP BY class;

With that simple query, you’re halfway toward cross-tabulation, halfway to implementing a simple CUBE, and halfway to basic entity-attribute-value (EAV) logic. This is easier to see if we have two columns, rather than just one, to tabulate against the grouping column:

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl( id INT, colID INT, value CHAR(20) );
INSERT INTO tbl VALUES
(1,1,'Sampo'),(1,2,'Kallinen'),(1,3,'Office Manager'),
(2,1,'Jakko'),(2,2,'Salovaara'),(2,3,'Vice President');

To tabulate all colID and value values against all id values—that is, to write a reporting CUBE for the table—write aGROUP_CONCAT() instruction for each colID found in the table, then GROUP BY id:

SELECT
id,
GROUP_CONCAT(if(colID = 1, value, NULL)) AS 'First Name',
GROUP_CONCAT(if(colID = 2, value, NULL)) AS 'Last Name',
GROUP_CONCAT(if(colID = 3, value, NULL)) AS 'Job Title'
FROM tbl
GROUP BY id;
+------+------------+-----------+----------------+
| id   | First Name | Last Name | Title          |
+------+------------+-----------+----------------+
|    1 | Sampo      | Kallinen  | Office Manager |
|    2 | Jakko      | Salovaara | Vice President |
+------+------------+-----------+----------------+

Since Paul Spinks first taught a spreadsheet how to do that cross-tabulation trick in 1979, this has also been known as a pivot table: we pivot colID and value against ID.

(Of course for a proper EAV representation, we’d add an attributes table:

DROP TABLE IF EXISTS attrs;
CREATE TABLE attrs(colID INT,attr CHAR(12));
INSERT INTO attrs VALUES (1,'First Name'),(2,'Last Name'),(3,'Title');

and write a stored procedure to PREPARE the above query from table-based attribute names.)

More often, crosstab queries calculate. Here is a simple sales table:

DROP TABLE IF EXISTS sales;
CREATE TABLE Sales (empID INT, yr SMALLINT, sales DECIMAL(10,2));
INSERT sales VALUES
(1, 2005, 12000),(1, 2006, 18000),(1, 2007, 25000),
(2, 2005, 15000),(2, 2006, 6000),(3, 2006, 20000),(3, 2007, 24000);

In Microsoft SQL Server, CUBE/PIVOT syntax for horizontal and vertical totals is:

SELECT EmpId, [2005], [2006], [2007], [ALL]
FROM (
SELECT
CASE WHEN GROUPING(EmpId)=0 THEN CAST(EmpId AS CHAR(7)) ELSE 'ALL' END AS EmpId,
CASE WHEN GROUPING(Yr)=0 THEN CAST(Yr AS CHAR(7)) ELSE 'ALL' END AS Yr,
SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH CUBE
) AS s
PIVOT( SUM(Sales) FOR Yr IN ([2005], [2006], [2007], [ALL]) ) AS p

In MySQL, it’s a two-step:

1. In an inner query, write one aggregating expression per reporting column,
2. In an outer query, build the horizontal sums:

SELECT
IFNULL(empId,'Totals') AS EmpId,       -- outer query labels rollup row
sums.2005, sums.2006, sums.2007,       -- and calculates horizontal sums
sums.2005 + sums.2006 + sums.2007 AS Sums
FROM (                                   -- inner query groups by employee
SELECT                                 -- with an expression for each column
EmpID,
SUM(IF(Yr=2005,sales,0)) As '2005',
SUM(IF(Yr=2006,sales,0)) As '2006',
SUM(IF(Yr=2007,sales,0)) As '2007'
FROM Sales
GROUP BY EmpID WITH ROLLUP
) AS sums;
+--------+----------+----------+----------+-----------+
| EmpId  | 2005     | 2006     | 2007     | Sums      |
+--------+----------+----------+----------+-----------+
| 1      | 12000.00 | 18000.00 | 25000.00 |  55000.00 |
| 2      | 15000.00 |  6000.00 |     0.00 |  21000.00 |
| 3      |     0.00 | 20000.00 | 24000.00 |  44000.00 |
| Totals | 27000.00 | 44000.00 | 49000.00 | 120000.00 |
+--------+----------+----------+----------+-----------+

That approach works beautifully for the common need to tabulate monthly amounts by year, say from an order history table orderhist(orderdate date,amount decimal(10,2)):

SELECT
IfNull(Year,'Totals') Year,
Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, `Dec`,
Qty AS Count,
Yrly as 'Yrly Total'
FROM (
SELECT
year(orderdate) AS 'Year',
Round(Sum(CASE WHEN Month(orderdate)= 1 THEN amount ELSE 0 END),2) AS Jan,
Round(Sum(CASE WHEN Month(orderdate)= 2 THEN amount ELSE 0 END),2) AS Feb,
Round(Sum(CASE WHEN Month(orderdate)= 3 THEN amount ELSE 0 END),2) AS Mar,
Round(Sum(CASE WHEN Month(orderdate)= 4 THEN amount ELSE 0 END),2) AS Apr,
Round(Sum(CASE WHEN Month(orderdate)= 5 THEN amount ELSE 0 END),2) AS May,
Round(Sum(CASE WHEN Month(orderdate)= 6 THEN amount ELSE 0 END),2) AS Jun,
Round(Sum(CASE WHEN Month(orderdate)= 7 THEN amount ELSE 0 END),2) AS Jul,
Round(Sum(CASE WHEN Month(orderdate)= 8 THEN amount ELSE 0 END),2) AS Aug,
Round(Sum(CASE WHEN Month(orderdate)= 9 THEN amount ELSE 0 END),2) AS Sep,
Round(Sum(CASE WHEN Month(orderdate)=10 THEN amount ELSE 0 END),2) AS Oct,
Round(Sum(CASE WHEN Month(orderdate)=11 THEN amount ELSE 0 END),2) AS Nov,
Round(Sum(CASE WHEN Month(orderdate)=12 THEN amount ELSE 0 END),2) AS `Dec`,
Count(*) AS Qty,
Round(Sum(amount),2) AS Yrly
FROM orderhist
GROUP BY year
WITH ROLLUP
) AS sums ;

More likely the required sums need to be calculated across joins. Build them in using a good text editor. Here is the above query for the orders and orderdetails table in the Northwind database:

SELECT
IfNull(Year,'Totals') Year,
Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, `Dec`,
quantity AS Count,
Yrly as 'Yrly Total'
FROM (
SELECT
year(orderdate) AS 'Year',
Round(Sum(If( Month(o.orderdate)= 1, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Jan,
Round(Sum(If( Month(o.orderdate)= 2, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Feb,
Round(Sum(If( Month(o.orderdate)= 3, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Mar,
Round(Sum(If( Month(o.orderdate)= 4, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Apr,
Round(Sum(If( Month(o.orderdate)= 5, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS May,
Round(Sum(If( Month(o.orderdate)= 6, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Jun,
Round(Sum(If( Month(o.orderdate)= 7, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Jul,
Round(Sum(If( Month(o.orderdate)= 8, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Aug,
Round(Sum(If( Month(o.orderdate)= 9, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Sep,
Round(Sum(If( Month(o.orderdate)=10, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Oct,
Round(Sum(If( Month(o.orderdate)=11, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS Nov,
Round(Sum(If( Month(o.orderdate)=12, (d.unitprice*d.quantity)-d.discount, 0 )), 2 ) AS `Dec`,
Count(*) AS quantity,
Round(Sum((unitprice*quantity)-discount),2) AS Yrly
FROM orders o
JOIN orderdetails d USING(orderID)
GROUP BY year
WITH ROLLUP
) AS sums ;

With many columns or subtotals, pivot tables get complicated, time-consuming and error-prone. Automation is needed. Oracle and Microsoft SQL Server have CUBE syntax to simplify the job, especially for big pivot tables. MySQL doesn’t. See “Automate pivot table queries” for how to roll your own cube with MySQL.

2. Transpose a row into columns with MySQL without using UNIONS? (column to row)

Question:

I have a table that is similar to the following below:

       id |        cat |         one_above |        top_level | 
        0    'printers'          'hardware'        'computers'

I want to be able to write a query, without using unions, that will return me a result set that transposes this table’s columns into rows. What this means, is that I want the result to be:

       id |          cat |
        0      'printers'
        0      'hardware'
        0     'computers'

Is this possible in MySQL? I can not drop down to the application layer and perform this because I’m feeding these into a search engine that will index based on the id. Various other DBMS have something like PIVOT and UNPIVOT. I would appreciate any insight to something that I’m missing.

Answer:

I got this out of the book The Art of SQL, pages 284-286:

Let’s say your table name is foo.

First, create a table called pivot:

CREATE Table pivot (
  count int
);

Insert into that tables as many rows as there are columns that you want to pivot in foo. Since you have three columns in foo that you want to pivot, create three rows in the pivot table:

insert into pivot values (1);
insert into pivot values (2);
insert into pivot values (3);

Now do a Cartesian join between foo and pivot, using a CASE to select the correct column based on the count:

SELECT foo.id, Case pivot.count
  When 1 Then cat
  When 2 Then one_above
  When 3 Then top_level
End Case
FROM foo JOIN pivot;

This should give you what you want.

 

Share

Tags: , , , ,