Generating an Entire Site with Perl

A while ago I wrote a Windows application that generates an entire site, and it works fine on Windows. Now that I work on Linux more and more, I have come to love its cron jobs, and that made me think I could let the computer regenerate the entire site automatically at a given time without any human intervention. At first I thought this would be complicated and hard to achieve, but as my skills and experience in Perl grew, I realized it is not hard at all. I wrote a simple version several days ago. It works well even though it still has some bugs, and I have gradually been making the program better and better.

I created two files: a template and a page database. The template file holds the layout of the generated pages, and the page database holds the essential information about each page. Here is an example of a page database:

! SAS articles
! under folder "sas"

#sas/index.html
   pagetitle|string|A Collection of SAS Program
   rootpath|string|../
   maincontent|file|sas/index.content

#sas/crgen.html
   pagetitle|string|Generating the treatment layout of a completely randomized (CR) design
   rootpath|string|../
   maincontent|file|sas/crgen.content

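A comment line starts with ‘!’. A new page starts with ‘#’ followed by the page name, which is also the path of the output file. The lines below it define the page's variables, such as pagetitle, rootpath and maincontent. Each variable line has three fields separated by ‘|’: the variable name, its type (string for a literal value, or file for the name of a file whose contents should be inserted), and the value. You can define these variables to suit your template and Perl program; the sample program below expects exactly these three, in this order. The Perl program reads the values from the page database and substitutes them for the matching markers in the template. In the template, I mark each variable by surrounding its name in capitals with $$, for example $$PAGETITLE$$.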
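The sample program below hard-codes one substitution per variable. As a minimal sketch of a more general approach, assuming each template marker is the upper-case form of a variable name from the page database, a single regular expression can fill every $$NAME$$ marker from a hash. The helper name fill_template, the inline template and its style.css link are illustrative assumptions, not the author's articles.tem.

```perl
use strict;
use warnings;

# Replace every $$NAME$$ marker in $template with the value stored under
# the matching lower-case key in %$vars (e.g. $$PAGETITLE$$ -> pagetitle).
# Markers without a matching key are left untouched so that missing
# variables are easy to spot in the generated page.
sub fill_template {
    my ($template, $vars) = @_;
    $template =~ s{\$\$([A-Z_]+)\$\$}
                  { exists $vars->{lc $1} ? $vars->{lc $1} : "\$\$$1\$\$" }ge;
    return $template;
}

# Illustrative template; single-quoted heredoc so $$ is not interpolated.
my $template = <<'END';
<html><head><title>$$PAGETITLE$$</title>
<link rel="stylesheet" href="$$ROOTPATH$$style.css"></head>
<body>$$MAINCONTENT$$</body></html>
END

my %vars = (
    pagetitle   => 'A Collection of SAS Program',
    rootpath    => '../',
    maincontent => '<p>Hello</p>',
);

print fill_template($template, \%vars);
```

This is a single pass, so markers that appear inside an inserted value are left alone. The program below replaces $$MAINCONTENT$$ before $$ROOTPATH$$, which means $$ROOTPATH$$ markers inside a content file also get filled in; with a helper like this one you would call it a second time on the result to get the same effect.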

In addition, the Perl program creates a sitemap automatically (it is written to articlessitemap.xml in the program below). You can submit it to Google and other search engines so that other people can find your pages.
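Each generated page becomes one url element in the sitemap. The sitemap protocol requires loc and allows optional lastmod (a W3C datetime), changefreq and priority (0.0 to 1.0, default 0.5). A minimal sketch of building one entry follows, assuming the articles.sunfinedata.com base URL used in the program; the helper names sitemap_entry and xml_escape are hypothetical. Taking the time from gmtime makes the +00:00 offset accurate, and escaping the URL keeps a stray & from breaking the XML.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Escape the characters that must not appear raw in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first, before other entities are added
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build one <url> element, children in the order loc, lastmod,
# changefreq, priority.
sub sitemap_entry {
    my ($path, $lastmod, $changefreq, $priority) = @_;
    my $loc = xml_escape("http://articles.sunfinedata.com/$path");
    return "<url>"
         . "<loc>$loc</loc>"
         . "<lastmod>$lastmod</lastmod>"
         . "<changefreq>$changefreq</changefreq>"
         . "<priority>$priority</priority>"
         . "</url>\n";
}

# Current time in UTC, so the +00:00 offset is correct.
my $lastmod = strftime("%Y-%m-%dT%H:%M:%S+00:00", gmtime);

print sitemap_entry("sas/index.html", $lastmod, "weekly", "0.5");
```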

Here is a sample of the Perl program. You are free to adopt and modify it for your own purposes.

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# time stamp for <lastmod>; taken in UTC so that the +00:00 offset is correct
my $lastmod  = strftime("%Y-%m-%dT%H:%M:%S+00:00", gmtime);
my $priority = "0.5";   # 0.5 is the sitemap protocol's default priority
my $urls     = "";

my $tempfile  = "articles.tem";
my $pagesfile = "articles.page";

# read a whole file into a string
sub slurp {
	my ($file) = @_;
	open(my $fh, '<', $file) or die "cannot open $file: $!\n";
	local $/;   # slurp mode, only inside this sub
	my $text = <$fh>;
	close($fh);
	return $text;
}

# read template
my $temp_content = slurp($tempfile);

# read pages
my $pages_content = slurp($pagesfile);

# drop comment lines (starting with '!') before splitting into pages
$pages_content =~ s/^!.*\n?//mg;

# a new page starts with '#' at the beginning of a line
my @pages = split(/^#/m, $pages_content);

foreach (@pages) {
	my @lines = split(/\r?\n/, $_);
	my $size = @lines;
	# need the page name plus pagetitle, rootpath and maincontent, in that order
	if ($size > 3) {
		my $htmlfile    = $lines[0];
		my $pagetitle   = (split(/\|/, $lines[1], 3))[2];
		my $rootpath    = (split(/\|/, $lines[2], 3))[2];
		my $maincontent = (split(/\|/, $lines[3], 3))[2];

		# trim stray leading/trailing whitespace from every field
		s/^\s+|\s+$//g for $htmlfile, $pagetitle, $rootpath, $maincontent;

		print $htmlfile, "\n";
		print $pagetitle, "\n";
		print $rootpath, "\n";
		print $maincontent, "\n\n";

		# read page content
		my $infile = "./" . $maincontent;
		print $infile, "\n";
		my $page_content = slurp($infile);

		my $content = $temp_content;

		# MAINCONTENT is replaced before ROOTPATH, so $$ROOTPATH$$
		# markers inside the page content are filled in as well
		$content =~ s/\$\$PAGETITLE\$\$/$pagetitle/g;       # page title
		$content =~ s/\$\$MAINCONTENT\$\$/$page_content/g;  # main content
		$content =~ s/\$\$ROOTPATH\$\$/$rootpath/g;         # root path

		# write content to output file
		my $outfile = "./" . $htmlfile;
		print $outfile, "\n";
		open(my $html, '>', $outfile) or die "cannot open $outfile: $!\n";
		print $html $content;
		close($html);

		addUrl($htmlfile);
	}
}
SaveSitemap();

#########################################
#
# update the sitemap every time
# keep it up to date for Google robots
#
sub SaveSitemap {
	# read template; it contains the placeholder $url
	my $content = slurp("./sitemap.temp.xml");

	# process text
	$content =~ s/\$url/$urls/g; # urls

	# write content to output file
	open(my $out, '>', "./articlessitemap.xml") or die "cannot open ./articlessitemap.xml: $!\n";
	print $out $content;
	close($out);
}

##########################################
#
# addUrl: append one <url> entry for the sitemap
#
sub addUrl {
	$urls .= "<url>"
		. "<loc>http://articles.sunfinedata.com/$_[0]</loc>"
		. "<lastmod>$lastmod</lastmod>"
		. "<changefreq>weekly</changefreq>"
		. "<priority>$priority</priority>"
		. "</url>\n";
}
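The program reads the three variables by position, so a missing or reordered variable line shifts the fields, and the string/file type column is not used. A more defensive reader is sketched below, assuming the file format is exactly as described above (‘!’ comment lines, ‘#’ at the start of a line begins a page, name|type|value variable lines). It parses line by line into a hash per page and loads the value of any file-type variable. The name read_pages and the returned structure are hypothetical.

```perl
use strict;
use warnings;

# Parse a page database into a list of { file => ..., vars => { name => value } }.
# Values of type "file" are replaced with the contents of the named file.
sub read_pages {
    my ($dbfile) = @_;
    open(my $fh, '<', $dbfile) or die "cannot open $dbfile: $!\n";
    my (@pages, $current);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                 # strip newline, \r and trailing spaces
        next if $line =~ /^\s*(!|$)/;       # skip comments and blank lines
        if ($line =~ /^#\s*(\S+)/) {        # '#' in column 1 starts a new page
            $current = { file => $1, vars => {} };
            push @pages, $current;
            next;
        }
        die "variable outside a page at $dbfile line $.\n" unless $current;
        my ($name, $type, $value) = split /\|/, $line, 3;   # value may contain '|'
        die "bad line at $dbfile line $.: $line\n" unless defined $value;
        $name =~ s/^\s+//;
        if ($type eq 'file') {
            open(my $in, '<', $value) or die "cannot open $value: $!\n";
            local $/;                       # slurp the whole content file
            $value = <$in>;
            close($in);
        }
        $current->{vars}{$name} = $value;
    }
    close($fh);
    return @pages;
}
```

For example, looping with for my $page (read_pages("articles.page")) gives $page->{file} (such as sas/index.html) and $page->{vars}{pagetitle}, independent of the order in which the variable lines appear.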