Graph Digitization by MATLAB
Zhanshan Dong
Sometimes we need to digitize graphs from scientific papers because we cannot get the original data. The digitized data can be used to fit models or to redraw the graphs. How can we do this? At first I thought we could read the image data, classify the pixels with a given threshold, and then put the data into a spreadsheet (Excel) for further processing. When I tried it, however, I found this was difficult to do in Excel.
So I turned to MATLAB, and found that a few general-purpose MATLAB functions handle this problem easily. The imread function reads the image data into memory, the image function displays the image, and ginput collects the coordinates of the points you click on the graph. After all data points are digitized, linear regression converts the clicked x and y coordinates (in pixels) into actual values in the units used in the graph. Finally, the data are plotted again and saved to text files.
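The regression step is the key idea: for each axis, the clicked tick-mark positions (pixels) and the values printed at those ticks give pairs that are fitted to value = c0 + c1 * pixel, which is what MATLAB's backslash solves for [ones(n,1) p]\v. A minimal sketch of the same least-squares fit in plain Python, assuming hypothetical click positions; the function names fit_axis and to_data_units are illustrative, not part of the author's program.

```python
def fit_axis(pixels, values):
    """Least-squares fit of value = c0 + c1 * pixel for one axis.

    pixels: coordinates of the clicked tick marks
    values: the axis values printed at those tick marks (same order)
    Returns (c0, c1).
    """
    n = len(pixels)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (pixel, value) pairs of equal length")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    sxx = sum((p - mean_p) ** 2 for p in pixels)
    sxy = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    c1 = sxy / sxx
    c0 = mean_v - c1 * mean_p
    return c0, c1


def to_data_units(pixels, coeffs):
    """Map pixel coordinates to data units with the fitted coefficients."""
    c0, c1 = coeffs
    return [c0 + c1 * p for p in pixels]


# Hypothetical clicks: X ticks 0, 12, 24 (h) at columns 100, 200, 300;
# Y ticks 0 and 1 at rows 400 and 100 (image rows grow downward).
cx = fit_axis([100, 200, 300], [0, 12, 24])
cy = fit_axis([400, 100], [0.0, 1.0])
xs = to_data_units([150, 250], cx)  # two clicked points on a data line
ys = to_data_units([250, 175], cy)
```

Because image rows increase downward, the fitted Y slope c1 comes out negative; the linear map handles that automatically, so no special flipping is needed.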
The following code is a complete M-file function, which you can modify to fit your problem. Its arguments are:
imgfilename : the name of the image file without the '.bmp' extension (the function appends it); the file must be in the same directory as the M-file;
xp : the actual values of the X-axis tick marks you will click, a row vector;
yp : the actual values of the Y-axis tick marks you will click, a row vector;
line1 : the name of the first line, a string;
line2 : the name of the second line, a string.
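The results are saved to two separate text files, one file per line, each named by appending the line name and '.txt' to imgfilename. The first column holds the X values and the second column the Y values. In my case, X is time and Y is the gene expression level. When the function runs, click the Y-axis tick marks first (in the order of yp), then the X-axis tick marks (in the order of xp), then the points of the first line and of the second line, pressing Enter after each line.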
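function digitize(imgfilename,xp,yp,line1,line2)
% Read the bitmap (the '.bmp' extension is appended here) and display it
a = imread([imgfilename '.bmp']);
image(a);
xl=length(xp);   % number of tick marks to click on the X axis
yl=length(yp);   % number of tick marks to click on the Y axis
% 1. Click the Y-axis tick marks, in the same order as yp
[X,TOC1]=ginput(yl);
% 2. Click the X-axis tick marks, in the same order as xp
[TIME,Y]=ginput(xl);
% 3. Click along the first line; press Enter when done
[XLer,YLer]=ginput;
% 4. Click along the second line; press Enter when done
[Xlhy,Ylhy]=ginput;
% Fit value = c(1) + c(2)*pixel for each axis by least squares
A1=[ones(size(TIME)) TIME];
C1=A1\xp';
A2=[ones(size(TOC1)) TOC1];
C2=A2\yp';
% Convert the pixel coordinates of both lines to data units
XLerTime=[ones(size(XLer)) XLer]*C1;
YLerToc1=[ones(size(YLer)) YLer]*C2;
XlhyTime=[ones(size(Xlhy)) Xlhy]*C1;
YlhyToc1=[ones(size(Ylhy)) Ylhy]*C2;
% Redraw the digitized data in a new figure so it does not overlay the image
figure;
plot(XLerTime,YLerToc1);
hold on;
plot(XlhyTime,YlhyToc1);
% Save each line as two ASCII columns: X and Y
DATA1=[XLerTime YLerToc1];
DATA2=[XlhyTime YlhyToc1];
save([imgfilename line1 '.txt'],'DATA1','-ascii');
save([imgfilename line2 '.txt'],'DATA2','-ascii');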
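To use the digitized data outside MATLAB, for example to fit a model, the text files can be read back with any language. A minimal Python sketch, assuming each row holds an X value and a Y value separated by whitespace, as save -ascii writes for a two-column matrix; the function name read_line_file and the sample text are hypothetical.

```python
def read_line_file(text):
    """Parse two-column ASCII text (X then Y per row) into two lists."""
    xs, ys = [], []
    for row in text.splitlines():
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected two columns, got %r" % row)
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


# Hypothetical file contents in exponent notation
sample = "   0.0000000e+00   1.2500000e-01\n   4.0000000e+00   3.7500000e-01\n"
times, levels = read_line_file(sample)
```

For a real file, pass its contents, e.g. read_line_file(open(path).read()).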
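After finishing the simple version, I showed it to my major professor, who suggested that I add some statistical tests of the digitizing process, so I rewrote the program. The source code is listed below. In this version, the separate line-name arguments are replaced by a single cell array, lines, which can hold as many line names as your graph needs. Each axis is digitized three times (naxes = 3); all repetitions are pooled into one regression, and the scatter of the repeated clicks around their mean position gives an estimate of the digitizing error, which is converted from pixels to data units with the regression slope.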
function digitize(imgfilename,xp,yp,lines)
%
% This function can digitize as many lines as you indicate in 'lines'
% imgfilename - the name of the image file without the '.bmp' extension, a string
% xp - the X-axis values of the tick marks to be clicked, a row vector
% yp - the Y-axis values of the tick marks to be clicked, a row vector
% lines - the names of the lines, a cell array of strings
%
a = imread([imgfilename '.bmp']);
image(a);
% number of data points in X axis
xl=length(xp);
% number of data points in Y axis
yl=length(yp);
% number of lines
nline = length(lines);
% number of times to digitize each axis
naxes = 3;
Y = []; % all Ys put into one vector
YY = []; % all Ys put into a matrix, one column per repetition
% click the tick marks on the Y axis naxes times
for i=1:naxes
[X,y]=ginput(yl);
Y = [Y;y];
YY = [YY y];
end
% pool the data points from the three repetitions together
A=[ones(size(Y)) Y];
% linear regression model for Y axis
yps = repmat(yp,1,naxes);
CY=A\yps';
% estimate the digitizing error for the Y axis
ym = mean(YY,2); % mean click position of each tick mark
for i=1:size(YY,2), YY(:,i)=YY(:,i)-ym; end
yy = reshape(YY,1,size(YY,1)*size(YY,2));
yse = std(yy) % standard deviation of the clicks, in pixels
yse = abs(CY(2)) * yse % in Y data units; abs() because image rows grow downward
X = []; % all Xs put into one vector
XX = []; % all Xs put into a matrix, one column per repetition
% click the tick marks on the X axis naxes times
for i=1:naxes
[x,ydummy]=ginput(xl);
X = [X;x];
XX = [XX x];
end
% pool the data points from the three repetitions together
A=[ones(size(X)) X];
% linear regression model for X axis
xps = repmat(xp,1,naxes);
CX=A\xps';
% estimate the digitizing error for the X axis
xm = mean(XX,2); % mean click position of each tick mark
for i=1:size(XX,2), XX(:,i)=XX(:,i)-xm; end
xx = reshape(XX,1,size(XX,1)*size(XX,2));
xse = std(xx) % standard deviation of the clicks, in pixels
xse = abs(CX(2)) * xse % in X data units
% get the data points for all lines
% after digitizing one line, press the Return key to advance to the next line
XLine = cell(nline,1);
YLine = cell(nline,1);
for i = 1:nline
[xLine,yLine]=ginput;
XLine{i}=[ones(size(xLine)) xLine]*CX;
YLine{i}=[ones(size(yLine)) yLine]*CY;
end
figure(2);
for i = 1:nline
plot(XLine{i},YLine{i});
hold on;
DATA=[XLine{i} YLine{i}];
save([imgfilename lines{i} '.txt'],'DATA','-ascii');
end
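The error estimate above can be hard to follow in matrix form. A minimal Python sketch of the same computation for one axis, assuming the clicks are given as a list of repetitions (each repetition clicks every tick mark once, in the order of the tick values); the name digitizing_error is hypothetical. It pools all repetitions into one fit, takes each click's deviation from the mean click of its tick mark, computes their standard deviation (n - 1 denominator, like MATLAB's std), and scales it by the absolute slope.

```python
import statistics


def digitizing_error(repeats, values):
    """Estimate the click error for one axis from repeated digitizing.

    repeats: list of repetitions; each is a list of pixel coordinates
             clicked for the tick marks, in the same order as values
    values:  axis values of the tick marks
    Returns (c0, c1, se_pixels, se_data).
    """
    k = len(values)
    if len(repeats) < 2 or any(len(r) != k for r in repeats):
        raise ValueError("need >= 2 repetitions, each clicking every tick once")
    # Pool all repetitions into one least-squares fit value = c0 + c1*pixel
    pixels = [p for r in repeats for p in r]
    targets = list(values) * len(repeats)
    mp = sum(pixels) / len(pixels)
    mv = sum(targets) / len(targets)
    sxy = sum((p - mp) * (v - mv) for p, v in zip(pixels, targets))
    sxx = sum((p - mp) ** 2 for p in pixels)
    c1 = sxy / sxx
    c0 = mv - c1 * mp
    # Deviation of each click from the mean click position of its tick mark
    deviations = []
    for j in range(k):
        clicks = [r[j] for r in repeats]
        m = sum(clicks) / len(clicks)
        deviations.extend(c - m for c in clicks)
    se_pixels = statistics.stdev(deviations)
    se_data = abs(c1) * se_pixels  # abs(): Y slopes are negative in image rows
    return c0, c1, se_pixels, se_data


# Hypothetical Y-axis clicks: ticks 0 and 1, digitized three times
c0, c1, se_px, se_val = digitizing_error([[400, 100], [402, 101], [398, 99]], [0, 1])
```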
This version really inspired me, but it was not easy to use: if you click something wrong, you have to stop the program and run it again until you get everything right. MATLAB provides a tool, GUIDE, for building graphical user interfaces. Experimenting with it, I wrote a GUI program to digitize graphs, and it worked. See the following screenshot.
In the GUI version, users digitize each part in a separate step, and the order is not very important. For example, you can digitize the Y axis, then the X axis, and then test whether the two axes are perpendicular. After that you digitize the lines one at a time, giving each line a different name, and the program saves each line to its own file. You do not need to do everything in one pass: you can work piece by piece as long as the window stays open. This gives users the maximum flexibility to finish the work.
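The GUI source is not listed here, so how it tests perpendicularity is not shown. One plausible approach, given as a hypothetical sketch: fit a line through the tick-mark points clicked on each axis (using the principal direction of the point scatter, which also works for a vertical axis), then report how far the angle between the two fitted lines is from 90 degrees. The names axis_angle and perpendicularity_error are illustrative.

```python
import math


def axis_angle(points):
    """Orientation (radians) of the best-fit line through clicked (x, y)
    points, from the principal axis of their scatter; in (-pi/2, pi/2]."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def perpendicularity_error(x_axis_points, y_axis_points):
    """Deviation, in degrees, of the angle between the two axes from 90."""
    diff = math.degrees(axis_angle(x_axis_points) - axis_angle(y_axis_points))
    return abs(diff % 180.0 - 90.0)


# Hypothetical clicks in pixel coordinates (column, row)
x_ticks = [(100, 400), (200, 400), (300, 400)]
square = perpendicularity_error(x_ticks, [(100, 400), (100, 250), (100, 100)])
tilted = perpendicularity_error(x_ticks, [(100, 400), (101, 300), (102, 200)])
```

Flipping the row direction (rows growing downward) mirrors both axes equally, so it does not change the measured deviation.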

Because the source code of the graphical user interface is too long to list here, I put it into a zip file. Please download the program and run it under MATLAB 6.5.