Using Perl to process a CSV file as a plain text file

There are several modules for processing CSV files in Perl. Please refer to this tutorial to learn more about them. It describes several ways to parse a CSV file, such as using the Text::CSV, Tie::CSV_File, Tie::Handle::CSV, and DBD::CSV modules. Here I describe a very simple way to process CSV without using any modules.

Since a CSV file is a plain text file whose fields are separated by commas, processing it should be simple. We can read the file line by line and split each line into fields with no difficulty. The procedure I present here is a Perl script to handle a CSV file. The user provides the input filename and the output filename on the command line. An additional parameter ($layer) is used to pick the right column; it is a zero-based column index.
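
As a minimal sketch of the line-by-line idea, the snippet below splits one comma-separated line into fields with Perl's built-in split. The helper name split_fields and the sample line are made up for illustration. It assumes that no field contains a quoted, embedded comma; for data like that, one of the modules mentioned above is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into a list of fields.
# Assumes no field contains an embedded comma or quote.
sub split_fields {
    my ($line) = @_;
    chomp($line);                    # drop the trailing newline, if any
    return split(/,/, $line, -1);    # -1 keeps trailing empty fields
}

my @fields = split_fields("1,0.08,0.072,0.08,11,0,4.91477,13\n");
print scalar(@fields), " fields; second field is $fields[1]\n";
# prints: 8 fields; second field is 0.08
```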

Here is a sample input file:

CL	a	b	c	S	Weight	T1	Count
1	0.08	0.072	0.08	11	0	4.91477	13
3	0.08	0.072	0.08	11	0	5.51728	25
2	0.08	0.072	0.08	11	0	5.566	14

What we want to do is rearrange the data fields in the file and skip some of them. A sample of the output looks like the following.

a	CL	Weight	T1
0.08	1	0	4.91477
0.08	3	0	5.51728
0.08	2	0	5.566
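
One way to express this rearrangement is an array slice. The sketch below shows only the reordering step: the helper name pick_columns is hypothetical, and the column indexes 1, 0, 5, 6 (a, CL, Weight, T1 in the sample header) are an assumption read off the sample above, with the rows taken to be comma-separated in the actual file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the chosen columns of one line,
# in the order given by @cols (zero-based indexes).
sub pick_columns {
    my ($line, @cols) = @_;
    my @elem = split(/,/, $line);
    return join(",", @elem[@cols]);   # array slice reorders and drops fields
}

print pick_columns("CL,a,b,c,S,Weight,T1,Count", 1, 0, 5, 6), "\n";
print pick_columns("1,0.08,0.072,0.08,11,0,4.91477,13", 1, 0, 5, 6), "\n";
# prints:
# a,CL,Weight,T1
# 0.08,1,0,4.91477
```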

The above data is just for demonstration; it has no meaning in this context. As a sample, the data file is very small. You might ask why not use Excel to process it. In reality, we might have a huge file way beyond Excel's capacity to handle. We might also have hundreds of small CSV files with the same format. Processing them with Excel one by one would be too tedious.

The following is the Perl script to process this type of CSV data file.

#!\perl\bin\perl
use strict;

# Expect three arguments: input file, output file, column index.
my $numArgs = $#ARGV + 1;
if ($numArgs != 3) { die "usage: $0 infile outfile layer\n"; }
my $infile = $ARGV[0];
my $outfile = $ARGV[1];
my $layer = $ARGV[2];    # zero-based index of the column to pick

# open output file
open(MYOUT, '>', $outfile) or die "cannot open $outfile: $!\n";
# read input file
open(MYIN, '<', $infile) or die "cannot open $infile: $!\n";
my $linecount = 0;

while (<MYIN>) {
    # Good practice to store $_ value because
    # subsequent operations may change it.
    my ($line) = $_;

    # Good practice to always strip the trailing
    # newline from the line.
    chomp($line);

    $linecount++;

    print MYOUT processLine($line);
}

close(MYIN);
close(MYOUT);

# Build one output line: the selected column first, then column 0,
# then columns 5 up to (but not including) the last one.
sub processLine {
    my ($line) = @_;
    my @elem = split(/,/, $line);
    my $size = @elem;
    my $mytraitvalue;
    $mytraitvalue = $elem[$layer];
    # Scale positive values by 100. Text that does not start with a
    # number, such as a header name, counts as 0 and is left unchanged.
    if ($mytraitvalue + 1 > 1) { $mytraitvalue = $elem[$layer]*100; }
    my $returnStr = "$mytraitvalue,$elem[0]";
    for (my $i=5; $i<$size-1; $i++) {
        $returnStr .= ",$elem[$i]";
    }
    $returnStr .= "\n";
    return $returnStr;
}
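
For the case of many small CSV files with the same format, the same idea can be wrapped in a loop over file names. The following is only a sketch under stated assumptions: the helper name convert_file, the .out.csv naming scheme, and the column indexes 1, 0, 5, 6 are all illustrative and not part of the script above. File names are taken from the command line, so it relies on the shell (or the caller) to supply the list.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $infile to $outfile, passing every line
# (without its newline) through the code reference $convert.
sub convert_file {
    my ($infile, $outfile, $convert) = @_;
    open(my $in,  '<', $infile)  or die "cannot open $infile: $!\n";
    open(my $out, '>', $outfile) or die "cannot open $outfile: $!\n";
    while (my $line = <$in>) {
        chomp($line);
        print {$out} $convert->($line), "\n";
    }
    close($in);
    close($out) or die "cannot close $outfile: $!\n";
    return;
}

# Assumed naming scheme: data.csv is written to data.out.csv.
for my $infile (@ARGV) {
    (my $outfile = $infile) =~ s/\.csv$/.out.csv/i
        or next;    # skip names that do not end in .csv
    # Keep columns 1, 0, 5, 6 of each line, in that order.
    convert_file($infile, $outfile,
        sub { join(",", (split(/,/, $_[0]))[1, 0, 5, 6]) });
}
```

Passing the per-line conversion as a code reference keeps the file handling separate from the column logic, so a different rearrangement only needs a different anonymous sub.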