WordPress 2.2 and Charset

A while ago someone pointed out an anomaly in WordPress: the web pages were served in the UTF-8 character set, but the database was stored in a Latin charset, and that was causing a few problems. They worked out in detail how this should be corrected.

Unfortunately, it seems that the authors took on board that it needed to be changed but ignored the method. The consequence is that people upgrading to version 2.2 with the default config file are in a bit of a mess if their text contains non-US-ASCII characters, which especially affects languages other than English.

I noticed first because my British blog (this one) frequently uses the pound sterling character, £. Having corrected all of those, I have since noticed a few others: for example, ô became Ã´ and — became –.
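A minimal Python sketch of how this kind of damage can arise, assuming the stored bytes were UTF-8 and were later reinterpreted as a single-byte Latin charset. The `cp1252` codec stands in for MySQL's `latin1` here, and the function name `garble` is purely illustrative; none of this is WordPress code.

```python
def garble(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as cp1252.

    Assumption: this mimics a Latin-charset table holding UTF-8 bytes
    that are later treated as genuine Latin characters.
    """
    return text.encode("utf-8").decode("cp1252")


for original in ("£", "ô"):
    print(original, "->", garble(original))
# £ -> Â£
# ô -> Ã´
```

Each non-ASCII character occupies two or more bytes in UTF-8, so it comes back as two or more Latin characters, while plain ASCII text passes through unchanged.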

Note that this does not affect new blogs at all.

For blogs upgrading from an earlier version to 2.2, the lines to watch in wp-config.php are define('DB_CHARSET', 'utf8'); and define('DB_COLLATE', '');. Neither used to be there. I think my mistake was taking any notice of the sample file. Silly me, I thought it was necessary to keep all files up to date.

At the very least there should be some warning about it, as it is a natural mistake. I only found the Trac entry after the event, and the announcement didn’t mention it. There is some documentation about it, but that is not something you would naturally look for. Now I have the problem that I have fixed some posts by hand and written others with the new system, so how do I fix it: change the rest by hand, or revert and change back the ones I have already done?
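One possible way out of the mixed state, sketched in Python rather than as anything WordPress provides: try to reverse the misreading on each piece of text and keep the original whenever the reversal fails, so posts already fixed by hand or written under the new setting are left alone. The name `repair`, the sample `posts` list and the choice of `cp1252` are assumptions for illustration, and any real attempt should start from a database backup.

```python
def repair(text: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mix-up, or return text unchanged.

    Correctly stored text such as "£5" fails the round trip (its cp1252
    bytes are not valid UTF-8), so it is returned as it was.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical sample: one damaged post, one already correct, one ASCII.
posts = ["Costs Â£5", "Costs £5", "plain ASCII"]
for post in posts:
    print(repr(post), "->", repr(repair(post)))
```

The check is a heuristic: a post that deliberately contains a sequence like Â£ would be altered too, so the results are worth reviewing before writing anything back.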
