Archive for category PERL

Processing files with PERL and related functions

Command line parameters

If you want to know how many command line arguments the program gets, use the predefined array @ARGV. $#ARGV is the index of its last element, so it equals the number of arguments minus one: with two arguments, $#ARGV is 1. Here is a code snippet that uses this array.


if ($#ARGV != 1)
{
print "Usage: you have to provide two inputs\n";
exit;
}
my $mypath = $ARGV[0];
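Because $#ARGV is the index of the last element rather than the number of elements, requiring two inputs means comparing it with 1, as the snippet above does. Evaluating @ARGV in numeric context gives the count itself, which some find easier to read. The minimal sketch below fakes a two-argument command line (the argument values are made up) to show both values side by side:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulated command line with two made-up arguments.
@ARGV = ('.\\temp\\', 'report.txt');

my $last_index = $#ARGV;         # index of the last element
my $count      = scalar(@ARGV);  # number of elements

# Two equivalent ways to insist on exactly two inputs.
die "Usage: $0 path file\n" if $#ARGV != 1;
die "Usage: $0 path file\n" if @ARGV != 2;

print "last index = $last_index, count = $count\n";
```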

Change folder

It is as simple as using the cd command in the operating system shell.

chdir($mypath) or die "Cannot change to $mypath: $!\n";

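When you put a folder path in a PERL variable on DOS or Windows, remember to write each backslash as \\ inside double-quoted strings, because a single backslash starts an escape sequence (\t, for example, becomes a tab). For example, “.\\temp\\” represents the subfolder “temp” in the current folder. Otherwise you will get very strange errors when you run the PERL script. Forward slashes, as in “./temp/”, also work with Perl on Windows.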
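To see why the doubled backslash matters, compare the two strings in the sketch below: inside double quotes, \t is a tab character, so ".\temp" silently becomes a dot, a tab and "emp". The sketch then builds a path with the core File::Spec module, which joins pieces with the separator of the current system, and changes into a throwaway directory created with File::Temp. The directory layout is made up for the demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Cwd qw(getcwd abs_path);

# Inside double quotes a single backslash starts an escape sequence.
my $wrong = ".\temp";    # "\t" is a TAB: the result is ".", TAB, "emp"
my $right = ".\\temp";   # doubled backslash: the result is .\temp

printf "wrong has %d characters, right has %d\n", length($wrong), length($right);

# Portable alternative: File::Spec uses \ on Windows and / on Unix.
my $start = getcwd();
my $base  = tempdir(CLEANUP => 1);    # hypothetical work area, removed at exit
my $sub   = File::Spec->catdir($base, 'temp');
mkdir($sub) or die "Cannot create $sub: $!\n";

chdir($sub) or die "Cannot change to $sub: $!\n";
my $now = getcwd();
print "now in $now\n";

# Go back so the temporary directory can be cleaned up.
chdir($start) or die "Cannot change back to $start: $!\n";
```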

Read the whole content of a file

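The key statement is “local $/ = undef;”. $/ is the input record separator; once it is undefined, the next read returns the entire file instead of one line. See the following code snippet for details.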


open(DATA, "< $filename")
|| die "Failed to open the file $filename\n$!\n";

local $/ = undef;
my $content = <DATA>;

close(DATA);
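Two details of the snippet above are worth knowing. DATA is also the handle Perl opens on a script's __DATA__ section, so reusing that name can clash with such a section; a lexical handle (open my $fh ...) avoids that. And a local at file level keeps $/ undefined until the end of the enclosing scope, which is the rest of the script, so later line-by-line reads would also return everything at once. The minimal sketch below confines the change to a do block; the subroutine name slurp_file and the temporary demo file are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read an entire file into one scalar ("slurp").
sub slurp_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Failed to open the file $filename\n$!\n";
    # local limits the change of $/ to this do block; it is
    # restored as soon as the block ends.
    my $content = do { local $/; <$fh> };
    close($fh);
    return $content;
}

# Demonstration with a throwaway file (made-up content).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\nsecond line\n";
close($tmp_fh);

my $content = slurp_file($tmp_name);
print length($content), " characters read\n";
```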

Process text content

After you read the whole content of a file into a variable, you have to process it. To process it line by line, use the split function to break it into an array of lines, like the following.


my @lines = split(/\n/,$content);

To get rid of the first line (for example, a header line), use the following command.

shift(@lines);

To break one line, held in $temp, into fields separated by spaces, use the following command.

my @temps = split(/\s+/,$temp);

The key is the regular expression \s+, which matches one or more whitespace characters (spaces, tabs, and so on).
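Putting the three steps together: the sketch below splits some made-up text into lines, drops the header line with shift, and splits each remaining line on whitespace. It also shows one subtlety of split(/\s+/, ...): leading whitespace produces an empty first field, whereas the special pattern ' ' skips it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up file content: a header line followed by two data lines.
my $content = "name size\n  alpha 10\nbeta   20\n";

my @lines = split(/\n/, $content);   # one element per line
shift(@lines);                        # remove the header line

my @rows;
foreach my $temp (@lines) {
    # split ' ' works like /\s+/ but ignores leading whitespace
    my @temps = split(' ', $temp);
    push(@rows, [@temps]);
}

# With /\s+/ the leading spaces yield an empty first field.
my @leading = split(/\s+/, "  alpha 10");

print scalar(@rows), " rows, first field of first row: $rows[0][0]\n";
```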

Remove BOM from UTF-8 files

Problem

When I was developing a PHP web application with multiple-language support, PHP reported that headers were already sent. The cause was the Unicode Byte-Order Mark (BOM) at the beginning of files saved in UTF-8 encoding: PHP sends those bytes to the browser as output before any header() call runs.

Solution 1

My article Solution for Cannot modify header information in PHP application provides one solution, and it works well. However, when I used http://validator.w3.org to validate my site, it reported the following warning.

Byte-Order Mark found in UTF-8 File.

The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

So, how can I get rid of BOM in UTF-8 files?

Solution 2

If we remove the BOM from the UTF-8 files, we do not need to modify the PHP configuration file. How can we get rid of the BOM in each Unicode file? One solution is to write a PERL script that deletes it. The following is a simple PERL script I found on the Internet.

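#!/usr/bin/perl
use strict;
use warnings;
binmode(STDIN);   # raw bytes, so line endings pass through unchanged
binmode(STDOUT);
my @file = <STDIN>;
$file[0] =~ s/^\xEF\xBB\xBF// if @file;   # strip the UTF-8 BOM (EF BB BF)
print(@file);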

The script reads a file from stdin, removes the BOM (the three bytes EF BB BF) from the beginning of the first line if it is present, and prints the result to stdout.

Using the script is very simple. Save it as bomkill.pl, make it executable (chmod +x bomkill.pl), and use the following command to convert a Unicode file.

./bomkill.pl < [unicode filename] > [new filename]
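The script above writes its result to stdout. If you prefer to fix files in place, a small variation reads each file in raw (binary) mode, strips the three BOM bytes EF BB BF only when they are at the very start, and rewrites the file only when something changed. This is a sketch, not the script from the Internet; the subroutine name strip_bom and the demonstration file are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Remove a UTF-8 byte-order mark (EF BB BF) from the start of a file, in place.
# Returns 1 if a BOM was removed and 0 if the file had none.
sub strip_bom {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!\n";
    my $bytes = do { local $/; <$in> };
    close($in);
    $bytes = '' unless defined $bytes;

    return 0 unless $bytes =~ s/^\xEF\xBB\xBF//;

    open(my $out, '>:raw', $path) or die "Cannot write $path: $!\n";
    print {$out} $bytes;
    close($out) or die "Cannot close $path: $!\n";
    return 1;
}

# Demonstration with a temporary file that starts with a BOM.
my ($fh, $demo) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} "\xEF\xBB\xBF<?php echo 'hi'; ?>\n";
close($fh);

my $first  = strip_bom($demo);   # BOM present: removed
my $second = strip_bom($demo);   # already clean: file left untouched
print "first call: $first, second call: $second\n";
```

In a real script you would call it for each file name given on the command line, for example strip_bom($_) foreach @ARGV;.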

Solution 3

Later on I found that Notepad++ can save UTF-8 files without a BOM. So you do not need to play with a PERL script anymore: just download a copy of Notepad++ and use it to deal with the BOM problem.

Reference

  1. Solution for Cannot modify header information in PHP application

U’s Bargain Network, an online deal publishing system

U’s Bargain Network is a website that publishes online deals, bargains, discounts, and coupons related to computers, software, gaming, electronics, video, and more. It was intensively developed over the last two years. It is powered by an online deal publishing system that consists of PHP applications, PERL applications, a MySQL database system, and other auxiliary applications. The system can automatically download and parse web pages, RSS feeds, CSV files, and other text files and load them into the corresponding data tables. Deal pages can be generated and stored on the web server at a selected time interval. The system also tracks product prices and performance, and visitors to the website can write reviews for all products. The following briefly introduces the system based on the live website, U’s Bargain Network. To support our development, simply buy products through this live website. We will keep making it better.

User interface

Figure 1 shows the home page of U’s Bargain Network. All other pages look similar. At the top of the page is the U’s Bargain Network logo; clicking it leads to the home page. In the middle of the top area is a banner that usually broadcasts recent coupons or promotions. On the right side of the page is the search box, which provides powerful full-text search. We describe it in detail in the next section.


Figure 1

Below the header is the menu bar, which offers several ways to view the deals: Best Sellers, All Time Low, Rebates, Promotions, All, and Blog. Clicking a menu item loads the corresponding page.

The main content area has two columns. On the left is the deal list column. If you scroll down to the bottom of the page, you will see a navigation bar at the bottom of the deal list column; use it to page through the other deals. On the home page, the best featured deals are always shown at the top, followed by the newly added deals.

In the right column, the box at the top shows deals from Amazon. The Google Ads box is just below the Amazon deal box. Below the Google Ads are the keyword tag box, the brand tag box, and the merchant tag box. These tags are meant to make searching for deals easier; see the details in the next section.

The footer bar includes copyright information and links to the contact us, privacy, and terms pages of the website.

Find deals you want

U’s Bargain Network processes hundreds of online deals. Without an efficient search engine, scanning through all of them would be tedious. Thanks to a powerful full-text search engine, you can simply type keywords into the search box at the top right corner of any page to find what you are looking for. It is very fast and usually returns results in seconds unless the server is very busy. Figure 2 shows one example: searching the website for iPad.


Figure 2

In addition to the search engine, every U’s Bargain Network page provides three tag boxes: the keyword tag box, the brand tag box, and the merchant tag box (see Figure 3). Hovering over a tag shows how many deals are related to it. Clicking a tag leads to a presorted deal page.


Figure 3

Write customer reviews

In the deal list column in Figure 1, every deal has a “[Details/Comments]” link. Clicking it opens a deal detail page (see Figure 4). This page has several useful components: the deal description, the deal price history, the comments and the comment form, and the coupon list. Using the comment form, you can write a comment about the product to help others make a decision.


Figure 4

Contact the core team

If you have questions or inquiries, please use the contact us form (see Figure 5) to send a message to the core team. We review questions and inquiries frequently and may reply, depending on the nature of the message.


Figure 5

Blog pages

A blog site was set up specifically for U’s Bargain Network (see Figure 6). You can register as a contributor and write blog posts for U’s Bargain Network. We welcome your input in the form of comments and blog articles.


Figure 6

A PERL script to get Twitter user information

If you are a Twitter user and do business on Twitter, you have probably wondered how to target a group of Twitter users, such as those living in a certain region or sharing similar interests. Can we screen Twitter users and target a given population of them? The answer is yes. One simple solution is to write a PERL script that first downloads Twitter user information and stores it in a database; then we can run different sorts of queries to achieve our goal. The following PERL script downloads Twitter user information, extracts the fields of interest, and stores them in a MySQL database. It walks through numeric user IDs starting at 101 and skips users who have neither a location nor a time zone. Here is the source code.

#!/usr/bin/perl
require 5.6.0;
use strict;
use warnings;
use DBI;
use Data::Dumper;
use XML::Simple;

my $dbh;
my $sql;
my $sth;
my $dbname = "twitter_world";
my $tablename = "users";
my $xmlname = "tid.xml";

getConnected();

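my $i = 100;
# Walk through user IDs one by one. The loop never ends on its own, so stop
# the script with Ctrl+C; disConnected() is reached only if you add an exit condition.
while (1==1)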
{
	$i++;
	process($i);
}

disConnected();

# END OF MAIN PROGRAM

sub process {

	my $tid = $_[0];
	# quote the URL so the shell does not treat "?" as a wildcard
	system("curl \"http://twitter.com/users/show.xml?user_id=$tid\" -o $xmlname");

	# create an XML object
	my $xml = XML::Simple->new;

	# read the downloaded XML file
	my $data = $xml->XMLin($xmlname);

	#print Dumper(\$data); 
	
	if ($data->{error}) { return; }
	
	my $location        = checkVar($data->{location});
	my $time_zone       = checkVar($data->{time_zone});
	if (($location eq "") && ($time_zone eq "")) { return; }
	
	my $id              = checkVar($data->{id});
	my $name            = checkVar($data->{name});
	my $screen_name     = checkVar($data->{screen_name});
	my $description     = checkVar($data->{description});
	my $url             = checkVar($data->{url});
	my $followers_count = checkVar($data->{followers_count});
	#my $friends_count   = checkVar($data->{friends_count});
	#my $created_at      = checkVar($data->{created_at});
	#my $utc_offset      = checkVar($data->{utc_offset});
	
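	# Placeholders (?) let DBI quote every value safely. A method call such as
	# $dbh->quote($name) is NOT executed inside a double-quoted string.
	my $sql = "insert into $tablename (id, name, screen_name, location, description, url, time_zone) ".
	          "values (?, ?, ?, ?, ?, ?, ?)";
	#print "$sql\n";
	executesql($sql, $id, $name, $screen_name, $location, $description, $url, $time_zone);
}


sub checkVar
{
	# XML::Simple returns an empty element as an empty hash reference and a
	# missing element as undef; turn both into an empty string.
	my $var = $_[0];
	if (!defined($var) || ref($var) eq 'HASH')
	{
		$var = "";
	}
	return $var;
}

#############################################
#
# execute a SQL statement, passing any extra arguments as bind values
#
sub executesql {
	my ($statement, @bind) = @_;
	my $sth = $dbh->prepare($statement);
	my $nrec = $sth->execute(@bind);
	$sth->finish();
	return $nrec;
}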

#############################################
#
# connect to database and generate a handle
#
sub getConnected {
# return the database handle object to the caller

	# Set the parameter values for the connection
	#-------------------------------
	my $host="host address";
	my $connectionInfo = "DBI:mysql:database=$dbname;host=$host";
	my $databaseUser = "mysql database username";
	my $databasePw = "mysql database password";

	# Connect to the database
	# Note this connection can be used to 
	# execute more than one statement
	# on any number of tables in the database
	#-------------------------------
	$dbh = DBI->connect($connectionInfo, $databaseUser, 
	    $databasePw) || die "Connect failed: $DBI::errstr\n";	    
}

######################################
#
# disconnect the database handle
#
sub disConnected {
	$dbh->disconnect();
}
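A note on building SQL strings in Perl: a method call inside a double-quoted string, such as "values ($dbh->quote($name))", is not executed. Perl interpolates only the variable $dbh, so the statement ends up containing text like DBI::db=HASH(0x...)->quote(...). DBI placeholders (?) or plain concatenation avoid the problem. The sketch below demonstrates the interpolation rule with a tiny stand-in class instead of a real database handle; the Quoter class and its quote method are hypothetical and only imitate how a SQL quoting function doubles single quotes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle: quote() wraps a value in
# single quotes and doubles any embedded single quote.
package Quoter;
sub new   { my ($class) = @_; return bless {}, $class; }
sub quote { my ($self, $s) = @_; $s =~ s/'/''/g; return "'$s'"; }

package main;

my $dbh  = Quoter->new;
my $name = "O'Brien";

# Method call inside double quotes: only $dbh is interpolated, quote() never runs.
my $broken = "values ($dbh->quote($name))";

# Concatenation calls the method as intended.
my $fixed = "values (" . $dbh->quote($name) . ")";

print "$broken\n$fixed\n";
```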

Automatically send updates to Twitter

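Here is a PERL script I developed to automate Twitter updates. It assumes two things: you have a database in which all updates are stored, and you are tired of sending Twitter updates manually. If both apply to you, keep reading; otherwise, stop here. The PERL source code is shared below, and you can modify and adapt it for your purpose. The script takes four arguments: your Twitter username and password, a newdeal flag (1 sends the 18 newest deals, any other value sends the featured deals), and a direct flag (1 links to the usb_ item pages instead of the deal pages).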

#!/usr/bin/perl
# include libraries
#-------------------------------
require 5.6.0;
use warnings;
use strict;
use DBI;
use Time::Local;
use Data::Dumper;

# arguments: Twitter username, password, newdeal flag, direct flag
if ($#ARGV != 3) {
    print "usage: $0 username password newdeal direct\n";
    exit;
}

my $username = $ARGV[0];
my $password = $ARGV[1];
my $newdeal  = $ARGV[2];
my $direct   = $ARGV[3];
my $dbname   = "your mysql database name";
my $dbh;
my $sql;
my $sth;
my $todayStr;

getTodayStr();

getConnected();

if ($newdeal != 1) 
{ 
  # featured deals that have not expired and have an image
  $sql = "SELECT price, name, id_items FROM deals ".
         "WHERE (expiration_date >= '$todayStr') AND (imagefile <> '') AND featured = 1;";
}
else
{
  # the 18 most recently added deals that have not expired and have an image
  $sql = "SELECT price, name, id_items FROM deals ".
         "WHERE (expiration_date >= '$todayStr') AND (imagefile <> '') ".
         "ORDER BY id_items DESC ".
         "LIMIT 0,18;";
}
updateStatus();

disConnected();

#############################################
#
# send status update to twitter 
#
sub updateStatus {
$sth = $dbh->prepare($sql);
$sth->execute();
my ($price, $name_items, $id_items);
$sth->bind_columns(\$price, \$name_items, \$id_items);
while ($sth->fetch()) 
{
  my $price_str = ""; 
  if ($price > 0.0)
  {
    $price_str = "[\$".$price."]";
  }
  my $short_name = substr($name_items, 0, 70); 
  my $url = "http://usbargains.net/deals/$id_items.html";
  if ($direct == 1)
  {
    $url = "http://usbargains.net/items/usb_$id_items.html";
  }
  my $status = "Deal-".$price_str." ".$short_name." ".$url;
  #print $status."\n";
  # percent-encode every character that is not a letter or digit
  $status =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
  my $cmd = "curl -X POST -u $username:$password \"http://twitter.com/statuses/update.xml?status=".$status."\" >/dev/null";
  system($cmd);
}
$sth->finish();
}

#############################################
#
# connect to database and generate a handle
#
sub getConnected {
# return the database handle object to the caller
# Set the parameter values for the connection
#-------------------------------
my $host = "localhost";
my $connectionInfo = "DBI:mysql:database=$dbname;host=$host";
my $databaseUser = "your database username";
my $databasePw = "your database password";

# Connect to the database
# Note this connection can be used to 
# execute more than one statement
# on any number of tables in the database
#-------------------------------
$dbh = DBI->connect($connectionInfo, $databaseUser, 
   $databasePw) || die "Connect failed: $DBI::errstr\n";
}

######################################
#
# disconnect the database handle
#
sub disConnected {
$dbh->disconnect();
}

######################################
#
# generate today’s date string
#
sub getTodayStr {
# get local time
my ($sec,$min,$hour,$iDay,$iMonth,$iYear,$wday,$yday,$isdst) = localtime time;
$iYear = 1900 + $iYear;
$iMonth++;
# zero-pad month and day so the string has the form YYYY-MM-DD
$todayStr = sprintf("%04d-%02d-%02d", $iYear, $iMonth, $iDay);
}

==============
To use this PERL script, you have to install cURL on your system. We also assume that you have access to a MySQL database server somewhere. Post your comments to discuss this script. Enjoy!
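The heart of updateStatus() is building a short status line and percent-encoding it for the URL. The sketch below pulls those two steps into separate subroutines so they can be tried without a database or a network connection; the subroutine names url_encode and build_status and the sample deal are made up. The encoding works character by character, so it assumes the deal names are byte strings rather than decoded Unicode text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode every character that is not an ASCII letter or digit,
# the same substitution the update script applies to $status.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
    return $text;
}

# Build a status line the way updateStatus() does (hypothetical helper).
sub build_status {
    my ($price, $name, $url) = @_;
    my $price_str = '';
    if ($price > 0.0) {
        $price_str = '[$' . $price . ']';
    }
    my $short_name = substr($name, 0, 70);   # keep the status short
    return 'Deal-' . $price_str . ' ' . $short_name . ' ' . $url;
}

my $status  = build_status(19.99, 'USB flash drive', 'http://usbargains.net/deals/1.html');
my $encoded = url_encode($status);
print "$status\n$encoded\n";
```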
