Archive for category MATLAB

Matlab Toolbox: Fuzzy Clustering and Data Analysis Toolbox

I googled “fuzzy clustering and matlab” and wanted to find some tools for data clustering today. I came across a very good MATLAB toolbox at http://www.fmt.vein.hu/softcomp/fclusttoolbox/. In the web page, you can find source code and a detailed documentation that describes several popular clustering methods and validation methods, too. The Toolbox is a collection of MATLAB functions. The toolbox provides four categories of functions: 1) Clustering algorithms, 2) Evaluation with cluster prototypes, 3) Validation and 4) Visualization (See the following figure). In addition, it also provides some real examples that you can use them as start point for your project. To learn more you can visit author’s website or download source code and documentation here.

fuzzysam

MATLAB Source code – ClusteringToolbox.zip
Toolbox documentation – FuzzyClusteringToolbox

Share

Tags: , , ,

Graph Digitization in MATLAB

Sometimes we need digitize graphs from scientific papers because we can not get the original data. The digitized data can be used to fit model or redraw graphs. How can we achieve the goal? At the beginning I think we can do that by reading the image data and classifying the data by a given threshhold. Then put the data to spreadsheet (EXCEL) and do further precessing. After I tried, I found it was difficult to do it in EXCEL.

I turned to MATLAB. I found there are several general functions in MATLAB that can handle this problem very easily. By using imread function to read the image data to memory, and image function to show the image, then ginput to get data points in the graphs. After all data point is digitized, use linear regression method to transfer the x and y coordiate values to actual values with the units used in the graph. Then show the data again and save them to text files.

The following code is a complete m function. You can modify it to fit your problem. The arguments of the function are:

  1. imgfilename : the filename of image file, must in the same directory as the m file is;
  2. xp : the actual values of X axis, a row vector;
  3. yp : the actual values of Y axis, a row vector;
  4. line1 : the name of the first line, a string;
  5. line2 : the name of the second line, a string;

The results will be saved to two seperate text files, one file per line. The first column is X value, the second column is Y values. In my case, X is time and Y is gene expression level.

function digitize(imgfilename,xp,yp,line1,line2)
a = imread([imgfilename '.bmp']);
image(a);
xl=length(xp);
yl=length(yp);
[X,TOC1]=ginput(yl);
[TIME,Y]=ginput(xl);
[XLer,YLer]=ginput;
[Xlhy,Ylhy]=ginput;
A1=[ones(size(TIME)) TIME];
C1=A1\xp';
A2=[ones(size(TOC1)) TOC1];
C2=A2\yp';
XLerTime=[ones(size(XLer)) XLer]*C1;
YLerToc1=[ones(size(YLer)) YLer]*C2;
XlhyTime=[ones(size(Xlhy)) Xlhy]*C1;
YlhyToc1=[ones(size(Ylhy)) Ylhy]*C2;
plot(XLerTime,YLerToc1);
hold on;
plot(XlhyTime,YlhyToc1);
DATA1=[XLerTime YLerToc1];
DATA2=[XlhyTime YlhyToc1];
save([imgfilename line1 '.txt'],'DATA1','-ascii');
save([imgfilename line2 '.txt'],'DATA2','-ascii');

After the simple version was finished. I told my major professor and he suggest me to do some statistical tests for the digitizing processing. I rewrote the program. The source code is listed below. In this version, argument ‘lines’ has changed to a cell array, it can accommodate as many lines as possible according to your specific graphs.

function digitize(imgfilename,xp,yp,lines)
% 
% This function can digitize as many lines as you indicate in 'lines'
% imgfilename - the filename of image file, a string
% xp - the actual data value of data points in X axis, a row vector
% yp - the actual data value of data points in Y axis, a row vector
% lines - the string name of lines, a cell array
%
a = imread([imgfilename '.bmp']);
image(a);
% number of data points in X axis
xl=length(xp);
% number of data points in Y axis
yl=length(yp);
% number of lines
nline = length(lines);
% number of times to digitize axes
naxes = 3;

Y = []; % all Ys put into one vector
YY = []; % all Ys put into a matrix
% get the data points in Y axis for several times
for i=1:naxes
    [X,y]=ginput(yl);
    Y = [Y;y];
    YY = [YY y];
end
% pool the data points from three repetition together
A=[ones(size(Y)) Y];
% linear regression model for Y axis
yps = repmat(yp,1,naxes);
CY=A\yps';
% calculate the earror value
ym = mean(YY,2); % mean for each data point
for i=1:size(YY,2), YY(:,i)=YY(:,i)-ym;, end
yy = reshape(YY,1,size(YY,1)*size(YY,2));
yse = std(yy)
yse = CY(2) * yse


X = []; % all Ys put into one vector
XX = []; % all Ys put into a matrix
% get the data points in X axis for several times
for i=1:naxes
    [x,Y]=ginput(xl);
    X = [X;x];
    XX = [XX x];
end
% pool the data points from three repetition together
A=[ones(size(X)) X];
% linear regression model for Y axis
xps = repmat(xp,1,naxes);
CX=A\xps';
% calculate the earror value
xm = mean(XX,2); % mean for each data point
for i=1:size(XX,2), XX(:,i)=XX(:,i)-xm;, end
xx = reshape(XX,1,size(XX,1)*size(XX,2));
xse = std(xx)
xse = CX(2) * xse

% get the data points for all lines
% after digitization of one line, press return key to advance to next line
XLine = cells(nline,1);
YLine = cells(nline,1);
for i = 1:nline
    [xLine,yLine]=ginput;
    XLine{i}=[ones(size(xLine)) xLine]*CX;
    YLine{i}=[ones(size(yLine)) YLine]*CY;
end
figure(2);
for i = 1:nline
    plot(XLine{i},YLine{i});
    hold on;
    DATA=[XLine{i} YLine{i}];
    save([imgfilename lines{i} '.txt'],'DATA','-ascii');
end

This version really aspired me. It is not easy to use it because if you input some thing wrong. You need stop the program and run it again until you did everything right. MATLAB provide a tool, GUIDE, to make graphic user interface. Just by trying, I started to make a UI program to digitize the graph. And I succeeded. See the following screenshot.

In the GUI version, users can do the digitization one by one. The order is not very important. You can digitize Y axis, ten X axis. After that test the perpendicular property of two axes. Then digitize lines one by one as long as you give a different name for a line. The program can save the digitized data to different files. You do not need to do it no break. You can do it pierce by pierce as long as the window is not closed. This version give user the maximum flexibility to finish the work.
digitizeit
Because the source code of graphic user interface is too long. I just put it into the zip file. Please download the program and run it under MATLAB 6.5. Download all source code here – digitizeit

Reference

Share

Tags: , , , , , , ,

Build a boolean network in MATLAB

To build a boolean network is pretty simple in MATLAB. It does not need ODE function to solve the problem. Just put the rate of change logic equations in a ordinary equation, to calculate several steps (step size always 1). You can get the answer. The sample code is provided here.

function test;
a = 1;
b = 1;
c = 1;
for i = 1:10
    [a, b, c] = calabc(a, b, c);
    [a b c]
end

function [a,b,c] = calabc(a0, b0, c0)
a = b0;
b = a0 | c0;
c = (a0 & b0) | (b0 & c0) | (a0 & c0);
Share

Tags: ,

Concept of using multiple PCs within LAN for parallel computing and its implementation in MATLAB

我在使用MATLAB来实现一个动态模拟模型时遇到了一个令我十分头疼的事,那就是模型的运行时间过长,如果使用优化软件包搜寻最优参数,只用单机需要很长时间才能完成。开始的时候我的模拟运行一遍需要近3个小时,如果搜寻优化参数,这估计需要几个月甚至几年的时间。这种情况是不能接受的,我只有另外寻找其他途径。

优化模型
使用MATLAB提供的Profiler工具,我找到了模型中最消耗时间的部分,对其进行优化,得到了十分满意的效果。主要对如下方面进行了优化:

  1. 减少函数的调用,不仅仅指自己写的函数,对内建函数的调用也要尽可能地减少;
  2. 不用或少用全局变量以及结构变量;
  3. 预先申请数据空间,减少动态改变数组大小的语句;
  4. 将所有调用函数集中到一个函数中,使用mcc命令将其编译成mex格式的可执行文件;
  5. 尽可能将循环化为向量,然后进行矩阵运算;

并行计算
MATLAB并没有提供并行计算的功能,但是我们可以想出办法来实现并行计算。在我办公室的局域网中有一个11台计算机的集合体,只要有一种办法让每个计算机干一小部分任务,从而可以实现并行计算。我的实现办法十分简单,在主机(master)上就是将任务分割成几个等分,分别将他们写入到文件中,在奴役(slave)机上读取这些文件,并进行计算。奴役机一直在监视主机分配的任务,直到主机发出任务完成的命令。

数据分发
在分发数据或读取结果时可能会出现某种冲突,比如这里主程序刚建立文件,奴役程序就开始读取数据,结果是可想而知的,程序会中断运行。为了解决这个问题,可以采用如下的策略,先在盘上写一个小文件,然后再写数据文件,等数据文件写完后,再删除这个小文件。在另一个程序中,先判断数据文件是否就绪,如果就绪,再判断那个作为指标的小文件是否存在,如果不存在,说明程序可以安全度取数据了。使用这种策略,将不会发生任何冲突。下面是两个示例程序,一个是主程序,一个奴役程序。

MASTER.M

function [dap] = master(x,xdata,nummachines)
if nargin < 3
    nummachines = 6;
end

for i=2:nummachines
    save(['slave' num2str(i) 'ok.txt'],'i','-ascii');
end
switch nummachines
    case 1
        slave1data = xdata;
    case 4
        slave1data = xdata(:,1:28);
        slave2data = xdata(:,29:48);
        slave3data = xdata(:,49:60);
        slave4data = xdata(:,61:end);
    case 6
        slave1data = xdata(:,1:20);
        slave2data = xdata(:,21:36);
        slave3data = xdata(:,37:47);
        slave4data = xdata(:,48:55);
        slave5data = xdata(:,56:66);
        slave6data = xdata(:,67:end);
end

for i=2:nummachines
    save(['slave' num2str(i) 'x.txt'],'x','-ascii');
    save(['slave' num2str(i) 'data.txt'],['slave' num2str(i) 'data'],'-ascii');
end

for i=2:nummachines
    delete(['slave' num2str(i) 'ok.txt']);
end

dap = atclocktrainfun(x,slave1data);

for i=2:nummachines
    while ~exist(['slave' num2str(i) 'result.txt'])
    end
end

for i=2:nummachines
    while exist(['slave' num2str(i) 'resultok.txt'])
    end
end

for i=2:nummachines
    dap = [dap load(['slave' num2str(i) 'result.txt'])];
    delete(['slave' num2str(i) 'result.txt']);
end

SLAVE.M

function slave(i)
i
STOP = 0;
save(['slave' num2str(i) 'resultok.txt'],'i','-ascii');
while ~STOP
    STOP = exist('stop.txt');
    if STOP, break; end

    while ~exist(['slave' num2str(i) 'x.txt'])
        STOP = exist('stop.txt');
        if STOP, break;end
    end
    
    while ~exist(['slave' num2str(i) 'data.txt']);
        STOP = exist('stop.txt');
        if STOP, break;end
    end

    while exist(['slave' num2str(i) 'ok.txt']);
        STOP = exist('stop.txt');
        if STOP, break;end
    end
    
    x = load(['slave' num2str(i) 'x.txt']);
    xdata = load(['slave' num2str(i) 'data.txt']);
    delete(['slave' num2str(i) 'x.txt']);
    delete(['slave' num2str(i) 'data.txt']);

    target = atclocktrainfun(x,xdata);
    
    save(['slave' num2str(i) 'result.txt'],'target','-ascii');
    delete(['slave' num2str(i) 'resultok.txt']);
end
Share

Tags: , , , , , , ,

Downhill Simplex Method in Multidimensions

In this article, C program of the downhill simplex method in Numeric Recipe in C – the Art if Scientific computing (Second Edition, Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992, Cambridge University Press, New York) was translated into MATLAB code. The following text comes from the book. (p.408-410)

The downhill simplex mehtod is due to Nelder and Mead. The mothod requires only function evaluations, not derivatives. It is not very efficient in terms of the number of function evaluations that it requires.
A simplex if the geometrical figure consisting, in N dimensions, of N+1 points (or vertices) and all their interconnecting line segments, polygonal faces, etc. In two dimensions, a simplex is a triangle. In three dimensions it is a tetrahedron, not necessarily the regular tetrahedron. In general we are only interested in simplexes that are nondegenerate, i.e., that enclose a finite inner N-dimensional volume. If any point of a nondegenerate simplex is taken as the origin, then the N other points define vector directions that span the N-dimensional vector space.

In one-dimensional minimizations, it was possible to bracket a minimum, so that the success of a subsequent isolation was guaranteed. There is no analogous procedure in multidimensional space. For multidimensional minimization, the best we can do is give our algorithm a starting guess, that is, an N-vector of independent variables as the first point to try. The algorithm is then supposed to make iss own waydownhill through the unimaginable complexity of an N-dimensional topography until it encounters a (local, at least) minimum.
The downhill simplex method must be started not just with a single point, but with N+1 points, defining an initial simplex. If you think of one of these points (it matters not which) as being your initial starting point P0, then you can take the other N points to be
Pi = P0 + Lei
where the ei’s are N unit vectors, and where L is a constant which is your guess of the problem’s characteric length scale. You could have different Ls for each vector direction.

The downhill simplex method now takes a series of steps, most steps just moving the point of the simplex where the function if largest (“highest point”) through the opposite face of the simplex to a lower point. These steps are called reflections, and they are constructed to conserve the volume of the simplex (hence maintain its nondegeneracy). When it can do so, the method expands the simplex in one or another direction to take larger steps. When it reaches a valley floor, the method constracts itself in the transerve direction and tries to ooze down the valley. If there is a situation where the simplex itself in all directions, pulling itself in around its lowest (best) point. The routine name amoeba is intended to be descriptive of this kind of behavior.
Termination criteria can be delicate in any multidimensional minimization routine. Without bracketing, and with more than one independent variable, we no longer have the option of requiring a certain tolerance for a single independent variable. We typically can identify one cycle or step of our multidimensional algorithms. It is then possible to terminate when the vector distance moved in that step is fractionally small in magnitude than some tolerance tol. Alternatively, we could require that the descrease in the function value in the terminating step be fractionally smaller than some tolerance ftol. Note that while tol should not usually be smaller than the square root of the machine precision, it is perfectly approciate to let ftol be of order the machine precision (or perhaps slightly larger so as not to be diddled br roundoff).

Note well that either of hte above criteria might be fooled by a single anomalous step that, for one reason of another, failed to get anywhere. Therefore, it is frequently a good idea to restart a multidimensional minimization routine at a point where it claims to have found a minimum. For this restart, you should reinitialize any ancillary input quantities. In the downhill simplex method, for example, you should reinitialize N of N + 1 vertices of the simplex again by the above equation, with P0 being one of the vetices of the claimed minimum.

AMOEBA.M

function [p,y,nfunk] = amoeba(funk, p, y, ndim, ftol, NMAX)
%function [p,y,nfunk] = amoeba(funk, p, y, ndim, ftol, NMAX)
%
% funk - the name of the function that you want to minimize
% p - the N+1-by-N matrix, row stands for the vertices, column stands for
% the variables
% y - the function values of N+1 vertices
% ndim - the number of variables
% ftol - functional convergence tolerance
% NMAX - the maximum number of function evaluations
% 
% OUTPUT
%   p,y has the same meaning
%   nfunk - the number of function evaluations
%
nfunk = 0;
TINY = 1e-20;
mpts = ndim + 1;
psum = sum(p,1);
while (1==1)
    ilo = 1;
    if y(1)>y(2)
        inhi = 2;
        ihi  = 1;
    else
        inhi = 1;
        ihi  = 2;
    end
    for i=1:mpts
        if (y(i) <= y(ilo)) 
            ilo = i;
        end
        if (y(i) > y(ihi)) 
            inhi = ihi;
            ihi  = i;
        elseif (y(i) > y(inhi)) && (i ~= ihi)
            inhi = i;
        end
    end
    
    rtol = 2.0 * abs(y(ihi)-y(ilo))/(abs(y(ihi)) + abs(y(ilo))+TINY);
    if (rtol < eps)
        [y(1),y(ilo)] = swap(y(1),y(ilo));
        [p(1,:),p(ilo,:)] = swap(p(1,:),p(ilo,:));
        break;
    end
    if (nfunk >= NMAX) 
        break;
    end
    nfunk = nfunk + 2;
    [ytry, p, y, psum] = amotry(funk,p,y,psum,ndim,ihi,-1.0);
    if (ytry <= y(ilo))
        [ytry, p, y, psum] = amotry(funk,p,y,psum,ndim,ihi,2.0);
    elseif (ytry >= y(inhi))
        ysave=y(ihi);
        [ytry, p, y, psum] = amotry(funk,p,y,psum,ndim,ihi,0.5);
        if (ytry >= ysave)
            for i=1:mpts
                if (i ~= ilo)
                    psum=0.5*(p(i,:) + p(ilo,:));
                    p(i,:)=psum;
                    y(i)=feval(@funk,psum);
                end
            end
            nfunk = nfunk + ndim;
            psum = sum(p,1);
        end
    else
        nfunk = nfunk - 2;
    end
end

function [ytry, p, y, psum] = amotry(funk, p, y, psum, ndim,ihi, fac)
fac1 = (1.0-fac) / ndim;
fac2 = fac1 - fac;
ptry = psum .* fac1 - p(ihi,:) .* fac2;
ytry = feval(funk,ptry);
if (ytry < y(ihi))
    y(ihi) = ytry;
    psum = psum + ptry - p(ihi,:);
    p(ihi,1:ndim)= ptry;
end

function [a,b]=swap(a,b)
swapv=a;
a=b;
b=swapv;

Example.M

for i=1:MP
    for j=1:NP
        if i == j+1
            p(i,j) = 1.0;
        else
            p(i,j) = 0.0;
        end
        x(j)=p(i,j);
    end
    y(i)=func(x);
end
disp(sprintf('%3s %10s %12s %12s %14s\n\n','i','x[i]','y[i]','z[i]','function'));
for i=1:MP
    ss1 = [sprintf('%3d ',i)];
    for j=1:NP
        ss1 = [ss1 sprintf('%12.6f ',p(i,j))];
    end
    ss1 = [ss1 sprintf('%12.6f\n',y(i))];
    disp(ss1);
end
[p,y,nfunc] = amoeba(@func,p,y,ndim,eps,1000);
disp(sprintf('\nNumber of function evaluations: %3d\n',nfunc));
disp(sprintf('Vertices of final 3-d simplex and\n'));
disp(sprintf('function values at the vertices:\n\n'));
disp(sprintf('%3s %10s %12s %12s %14s\n\n','i','x[i]','y[i]','z[i]','function'));
for i=1:MP
    ss1 = [sprintf('%3d ',i)];
    for j=1:NP
        ss1 = [ss1 sprintf('%12.6f ',p(i,j))];
    end
    ss1 = [ss1 sprintf('%12.6f\n',y(i))];
    disp(ss1);
end
disp(sprintf('\nTrue minimum is at (0.5,0.6,0.7)\n'));  

FUNC.M

function [res] = func(x)
res = 0.6 - bessj0((x(1)-0.5)^2 + (x(2)-0.6)^2 + (x(3)-0.7)^2);

function [ans] = bessj0(x)
ax=abs(x);
if (ax < 8.0) 
    y=x*x;
    ans1=57568490574.0+y*(-13362590354.0+y*(651619640.7 ...
        +y*(-11214424.18+y*(77392.33017+y*(-184.9052456)))));
    ans2=57568490411.0+y*(1029532985.0+y*(9494680.718 ...
        +y*(59272.64853+y*(267.8532712+y*1.0))));
    ans=ans1/ans2;
else 
    z=8.0/ax;
    y=z*z;
    xx=ax-0.785398164;
    ans1=1.0+y*(-0.1098628627e-2+y*(0.2734510407e-4 ...
        +y*(-0.2073370639e-5+y*0.2093887211e-6)));
    ans2 = -0.1562499995e-1+y*(0.1430488765e-3 ...
        +y*(-0.6911147651e-5+y*(0.7621095161e-6 ...
        -y*0.934935152e-7)));
    ans=sqrt(0.636619772/ax)*(cos(xx)*ans1-z*sin(xx)*ans2);
end
Share

Tags: , , ,