Posts Tagged lattice

Use sparse data set to create contour map

Problem: We have a sparse array that includes some real data, and we want to create a contour map from it. The data set includes x, y, and the measured values.
Solution: I found the following article, which provides a complete solution to this problem. Follow the link below to read it.

Software for Exploratory Data Analysis and Statistical Modelling

Then I wrote an R script to create a contour map of my measured data.

library(lattice)  # provides levelplot()

temp.df = data.frame(y = mydata$row,
                     x = mydata$col,
                     z = mydata$temp)

# fit a local quadratic surface to the sparse measurements
temp.loess = loess(z ~ x*y, data = temp.df, degree = 2, span = 1)

# regular grid to predict on; the variable name was lost from the
# original listing, so temp.fit is a stand-in
temp.fit = expand.grid(list(x = seq(1, 16, 0.1), y = seq(1, 4, 0.1)))
z = predict(temp.loess, newdata = temp.fit)
temp.fit$Height = as.numeric(z)

# basic image
image(seq(1, 16, 0.1), seq(1, 4, 0.1), z,
   xlab = "X Coordinate", ylab = "Y Coordinate",
   main = "Surface temp data")

# lattice plot
levelplot(Height ~ x*y, data = temp.fit,
   xlab = "col", ylab = "row",
   main = "Surface map",
   col.regions = terrain.colors(100))
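
To try the approach without the original measurements, a stand-in for mydata can be built first. The sketch below is only an illustration: the 4 x 16 layout, the 30 sampled cells, the temperature formula, and the names cells, fit, grid, and surface are all assumptions, not part of my data.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical 4 x 16 layout with measurements at only 30 of the 64 cells
cells <- expand.grid(col = 1:16, row = 1:4)
mydata <- cells[sample(nrow(cells), 30), ]
mydata$temp <- 20 + 0.3 * mydata$col + mydata$row + rnorm(30, sd = 0.5)

# Same kind of fit and prediction grid as in the script above
fit <- loess(temp ~ col * row, data = mydata, degree = 2, span = 1)
grid <- expand.grid(col = seq(1, 16, 0.1), row = seq(1, 4, 0.1))
surface <- predict(fit, newdata = grid)

# Because grid comes from expand.grid(), the predictions are returned
# as a matrix: one row per col value, one column per row value
dim(surface)
```

That matrix shape is what lets image() take z directly, while levelplot() needs the flattened as.numeric(z) column.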


Data visualization and outlier detection

Any dataset can contain outliers. To get good results from statistical analysis, outliers should be identified and excluded. There are several ways to do that. One is to draw a box-and-whisker plot and pick the outliers out by eye.

Basic concept
What do the box and whiskers represent in a box-and-whisker plot? The box spans the interquartile range, from the 1st quartile (Q1) to the 3rd quartile (Q3). Each whisker extends to the most extreme data point that lies within 1.5 times the box length (Q3 - Q1) of the box edge. Outliers are the points beyond the whiskers: those below Q1 - 1.5 × (Q3 - Q1) or above Q3 + 1.5 × (Q3 - Q1).
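
The rule can be checked by hand. Here is a minimal sketch in base R using made-up weights (the numbers and variable names are illustrative only, not taken from the seed weight data). Note that quantile() and the hinges used by box plot routines can differ slightly on small samples.

```r
# Hypothetical seed weights (made-up numbers for illustration only)
weights <- c(10, 11, 12, 12, 13, 13, 14, 15, 30)

q1 <- quantile(weights, 0.25, names = FALSE)  # 1st quartile
q3 <- quantile(weights, 0.75, names = FALSE)  # 3rd quartile
box <- q3 - q1                                # box length (Q3 - Q1)

# Points outside these fences are treated as outliers
lower_fence <- q1 - 1.5 * box
upper_fence <- q3 + 1.5 * box

outliers <- weights[weights < lower_fence | weights > upper_fence]
print(outliers)
```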

One data set includes several samples of seed weight for each genotype in an experiment with a large number of varieties. For a given genotype, a box-and-whisker plot can be drawn and the outliers found visually. Here we show how to do this in R, using the lattice library to generate a separate box-and-whisker plot per genotype. Suppose the dataset is stored in the file "c:/seedweight.csv", with seed weight in a column called VALUE and genotype in a column called GENOTYPE. Here are the R commands to generate the bwplots:

library(lattice)
seedweight <- read.csv("C:/seedweight.csv")
bwplot(~ VALUE | factor(GENOTYPE), data = seedweight)

Click to download the data file - seedweight
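
With many genotypes, reading every panel by eye gets tedious. As a possible complement to the plots (not part of the original workflow), the points beyond the whiskers can be listed per genotype with base R's boxplot.stats(). The data frame below is made up and only reuses the column names VALUE and GENOTYPE; flagged is a hypothetical name.

```r
# Made-up data with the same column names as the seed weight file
seedweight <- data.frame(
  GENOTYPE = rep(c("A", "B"), each = 6),
  VALUE = c(20, 21, 22, 23, 24, 60,
            30, 31, 32, 33, 34, 35)
)

# boxplot.stats()$out holds the points beyond the whiskers
# (whisker length is 1.5 times the box by default)
flagged <- tapply(seedweight$VALUE, seedweight$GENOTYPE,
                  function(v) boxplot.stats(v)$out)
print(flagged)
```

Each element of flagged is the vector of suspect values for one genotype. An empty vector means nothing fell outside the whiskers.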

Second method
There is an individual leaf area dataset from corn plants. The relationship between leaf number and leaf area for an individual plant is curvilinear, with a well-defined bell shape. Since the relationship is known, it can be used to detect data errors and outliers. Suppose the dataset is stored in the file "c:/indla.csv", which has three columns: id, ln, and la. With the lattice library loaded, the R commands below read the file into a data frame (here called indla) and generate one scatter plot per plant.

indla <- read.csv("c:/indla.csv")
xyplot(la ~ ln | factor(id), data = indla, type = "b")

Click to download the data file - indla
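
To see what a data error looks like in these panels, here is a small sketch with made-up numbers. The bell-shaped profile, the two plant ids, and the planted error at leaf 5 are all assumptions for illustration, and only the column names id, ln, and la come from the file described above.

```r
library(lattice)

# Made-up leaf areas for two plants following a bell-shaped profile
ln <- 1:10
bell <- 600 * exp(-((ln - 6)^2) / 8)  # hypothetical leaf area by leaf number
indla <- data.frame(id = rep(1:2, each = 10),
                    ln = rep(ln, times = 2),
                    la = c(bell, bell))

# Plant an obvious error: leaf 5 of plant 2 recorded far too small
indla$la[indla$id == 2 & indla$ln == 5] <- 50

# One panel per plant; the bad point breaks the smooth bell shape
p <- xyplot(la ~ ln | factor(id), data = indla, type = "b")
print(p)
```

In the panel for plant 2, the curve dips sharply at leaf 5, which is the kind of break from the expected shape that flags a suspect record.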

