Posts Tagged utf-8

Save and retrieve utf-8 characters to MySQL database directly

As the new version of MySQL Workbench (V5.2.31) rolled out, it solved the utf-8 character problem. I can directly insert, query, and edit utf-8 characters. This is first time I use the interface to directly handle Chinese characters in utf-8. In the past I do not know what encoding was used when I loaded utf-8 characters into the MySQL database even though I can use PHP get the characters back and display with no problem. I am still confused about that. But now in the MySQL Workbench, I am able to directly handle utf-8 characters.
However, I got a string of question marks, ie, “??????”, back from the data retrieved from the database by using my old PHP code. I googled and find a nice solution and then incorporate the solution into a nice, useful PHP class. It can handle utf-8 easily.
The idea is the following. For all PHP page, we should have the following meta tag:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

This tells the Web browser that display page content in UTF-8 encoding and take form input text in UTF-8 encoding.
When we inserted data from forms into MySQL data tables, we should set two session control variables:

character_set_client=utf8
character_set_connection=utf8

This tells MySQL server that my SQL statement is encoded as UTF-8 and keep it as UTF-8 when executing the statement.
On the other hand, when retrieving text data from MySQL, I need to set one session control variable:

character_set_results=utf8.

This tells MySQL server that result set must be sent back in UTF-8 encoding.

Here is my PHP class to handle insert and retrieve data from MySQL database. It is generic class. You can modify it and use it in your PHP application.
<?php

/*
Author: Zhanshan Dong at Sunfinedata.com
@version $Id: sql.php 2010-05-18 $
*/

class SQL
{
private $_config;
private $_mysqli;

function __construct($config)
{
$this->_config = $config;
$this->_mysqli = @new mysqli($this->_config['DB_HOST'],
$this->_config['DB_USER'],
$this->_config['DB_PASSWORD'],
$this->_config['DB_NAME']);

// For handling any characters in utf-8 encoding correctly

// When insert utf-8 encoding input data to MySQL database, need to set two session
// control variables: character_set_client=utf8 and character_set_connection=utf8
// when saving input text to the database table. This is to tell MySQL server that
// my SQL statement is encoded as UTF-8 and keep it as UTF-8 when executing the statement
$mysqli->query("SET character_set_client=utf8");
$mysqli->query("SET character_set_connection=utf8");

// When retrieving text data from MySQL, need to set one session control variable:
// character_set_results=utf8. This is to tell MySQL server that result set must
// be sent back in UTF-8 encoding.
$mysqli->query("SET character_set_results=utf8");
}

function __destruct()
{
$this->_mysqli->close();
}

function execute($sql)
{
$res = "";
$res = $this->_mysqli->query($sql);
return $res;
}

function affected_rows()
{
return $this->_mysqli->affected_rows;
}

function escape_str($str)
{
return $this->_mysqli->real_escape_string($str);
}

function getInsertID()
{
return mysqli_insert_id($this->_mysqli);
}

}

?>

Share

Tags: , , , ,

Remove BOM from utf-8 files

Problem

When I developed a PHP web application with multiple language support, I encountered a situation that PHP report “header is already sent”. That was caused by the Unicode Byte Of Mark (BOM) at the beginning of files in utf-8 encoding.

Solution 1

In my article Solution for Cannot modify header information in PHP application, one solution was provided. That works well. But when I use http://validator.w3.org to validate my site, a warning was reported.

Byte-Order Mark found in UTF-8 File.

The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.

So, how can I get rid of BOM in UTF-8 files?

Solution 2

If we can remove BOM from UTF-8 files, we do not modify the PHP configuration file. How can we get rid of the BOM header in each unicode files. One solution is write a PERL script to delete the BOM. The following is a simple PERL script get from the internet.

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

The script reads a file from stdin, remove the BOM from the beginning of the first line if it’s present, and print it to stdout.

To use the script is very simple. Save is as bomkill.pl. Use the following command to convert a unicode file.

./bomkill.pl < [unicode filename] > [new filename]

Solution 3

Late on I found that Notepad++ provides a useful function that can save UTF-8 files without BOM header. So, you do not need to play with PERL script anymore, just go and download it a copy of Notepad++ and use it to deal with the BOM header problem.

Reference

  1. Solution for Cannot modify header information in PHP application
Share

Tags: , , , , , , , ,

JavaScript handling UTF8 characters

When I developed a PHP web application with AJAX. I encountered a problem that JavaScript cannot process utf-8 character correctly. Whenever user enter a UTF-8 string,
it was converted to ASCII in JavaScript and delivered to PHP script. The site is using UTF-8 characters and will be sure there are a lot of request using utf-8 characters. I googled “Javascript and utf-8” and find an article at http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html. It provides two functions for the purpose. One function I will use is the decode_utf8. I simply copied these two functions and the second one did not work. I change the inner function from escape to unescape. After the change, I call the function decode_utf8(s).

Here are the two functions:
function encode_utf8( s )
{
return unescape( encodeURIComponent( s ) );
}

function decode_utf8( s )
{
return decodeURIComponent( unescape( s ) );
}

A JavaScript code snippet to call the function:

function search(php)
{
var keyword = window.document.forms["searchform"].elements["keyword"].value;
var url=php+"?cmd=search" +
"&keyword="+decode_utf8(keyword);
makeGETrequest2(url, link_callback);
}

Share

Tags: , , , , ,

Solution for Cannot modify header information in PHP application

I encountered a strange problem when I developed a PHP website application. When I use english language file, saved in ASCII format. The web application runs smoothly with no problem. As soon as I change the language file to an Asian language, saved in UTF-8 format, a problem occurs. See the following for the detail information:

warning: Cannot modify header information – headers already sent by (output started at xxxxx labels.inc:1) in xxxxx.php on line nnn.

I switched the language files back and forth. The problem persisted. So googled “php utf8 Cannot modify header information – headers already sent by” and found this is a problem that other people also encountered. I studied a couple of posts before I got a easy solution.

The warning error message is related to output_buffering on/off in your php.ini of the web server. If you have output_buffering set to some cache, the server will allow to send headers with delay (or to modify them shortly after they are sent), and this error will not happen. But if you set output_buffering to 0 or off at all, then headers will be sent immediately, it will trip this error message.

For my situation, the problem is caused by UTF-8 file format for the language files. I modified the php.ini file by changing OUTPUT_BUFFER=OFF to OUTPUT_BUFFER=ON. The problem was solved after I restarted Apache2 server.

Reference

  1. Remove BOM from utf-8 files
Share

Tags: , , , ,

Database migration: MS Access to MySQL server

The Problem
Recently I solved a database migration problem: migrate MS Access database encoded in GB2312 to MySQL server encoded in UTF-8. I’d like to share what I did and hope it is useful to help you solve your problem.

I had a lot of useful data in an MS Access database. They were collected through ASP Gallery 1.0, which was run on Windows IIS web server. I did not have an IIS server to run ASP Gallery for a long time and do not have a plan to launch an IIS server in the coming future.

I wrote simple PERL CGI’s and used it from 2007 to 2008. Now I have a powerful Apache2 server and I developed an powerful PHP web application. In order to use the useful information in the MS Access database, I have to migrate all data to MySQL server.

The Solution
There is a big problem in front of me. All data in the MS Access database was encoded in GB2312. My MySQL server and everything else related to the new web server use UTF-8 encoding. I have to convert the GB2312 encoding to UTF-8 encoding. That was solved easily. See the corresponding function in migrate.php.

First I used the MySQL migrate tool (you can download it from mysql.com website) to transfer data (migrate_class) to MySQL server. Then I wrote a nice PHP application called migrate.php, that help me to merge the data from the table to MySQL server (category) table in donghome database.

The PHP code

<?php
// database-specific information
DEFINE(DB_USER,"your username");
DEFINE(DB_PASSWORD,"your password");
DEFINE(DB_HOST,"your mysql server host");
DEFINE(DB_NAME,"gmain");

// connect to mysql
$db_connection = mysql_connect(DB_HOST,DB_USER,DB_PASSWORD);

// select the database
mysql_select_db(DB_NAME);

// process gmain database that includes old gallery data tables

//update_class_table();

//migrate_class();


function migrate_class()
{
	$sql = "SELECT classid, pclassid, title FROM class";
	$query_result = mysql_query($sql);
	$counter = 0;
	while ($row = mysql_fetch_array($query_result, MYSQL_NUM))
	{
		$classid=$row[0];
		$pClassID = $row[1];
		$name = $row[2];
		$counter ++;
		echo $counter. " : ";
		update_category($classid, $pClassID, $name);
	}

}
function update_category($classid, $pClassID, $name)
{
	$sql1 = "INSERT INTO donghome.category (id_category,parent_id,name) ".
			"VALUES ($classid,$pClassID,'".mysql_escape_string($name)."');";
	echo $sql1."<br>";
	mysql_query($sql1);
}

function update_class_table()
{
	$sql = "SELECT * FROM migrate_class";
	$query_result = mysql_query($sql);
	$str = "all classes in the table <br>";
	$str .= "ClassID,directory,pClassID,title,hyperlink,hasdir,user,path<br>";
	while ($row = mysql_fetch_array($query_result, MYSQL_NUM))
	{
		$path = "";
		$classid=$row[0];
		$directory = $row[1];
		$pClassID = $row[2];
		$title = $row[3];
		$hyperlink = $row[4];
		$hasdir = $row[5];
		if ($row[5])
		{
			$path = buildpath($directory,$pClassID);
		}
		$str .= "*** $row[0],$row[1],$row[2],".convertGB_UTF8($title).",$row[4],$row[5],$row[6],$path<br>";
		updateClass($classid,$directory,$pClassID,$title, $hyperlink, $hasdir, $path);
	}
	echo $str;
}

function updateClass($classid,$directory,$pClassID,$title, $hyperlink, $hasdir, $path)
{
	$sql1 = "INSERT INTO class (classid,directory,pclassid,title,hyperlink,hasdir,path) ".
			"VALUES ($classid,'$directory',$pClassID,'".mysql_escape_string(convertGB_UTF8($title))."', '$hyperlink', $hasdir, '$path');";
	echo $sql1."<br>";
	mysql_query($sql1);
}

function buildpath($path,$parentdir)
{
	$apath = $path;
	$sql = "SELECT * FROM migrate_class WHERE ClassId = $parentdir;";
	$query_result = mysql_query($sql);
	$row = mysql_fetch_array($query_result, MYSQL_NUM);
    if ($row[5])
	{
		$apath = $row[1]."/".$apath;
	}
    $aparentdir = $row[2];
	if ($aparentdir != 0)
	{
		$apath = buildpath($apath,$aparentdir);
	}
	return $apath;
}


function convertGB_UTF8($str)
{
	$str = mb_convert_encoding($str, "UTF-8", "GB2312");
	return $str;
}

?>
Share

Tags: , , , , , ,