Unicode in WordPress

As I mentioned in the previous post, there can be problems inserting foreign text into WordPress. I have done it in the past with simple accents for French and German without trouble, and for some special characters I used the &#….; codes, but when it came to pasting in a chunk of Arabic it didn’t work at all and just displayed a row of question marks. I suspect there would be a similar problem with Hebrew, Chinese, Japanese, Cyrillic and any other non-Western script.

I had a search around, and the first suggestion I came across was from Obsessed with the Press, which suggested commenting out the lines for DB_CHARSET and DB_COLLATE in wp-config.php. This appeared to work (on a test site), but looking at older posts I could see that some characters were now corrupt, displaying as a white question mark in a black diamond. In the comments on the same page there was a suggestion not to do that but just to change DB_COLLATE to the value ‘utf8_general_ci’. On its own, this didn’t really work either. Other pages suggested setting it to ‘utf8_unicode_ci’ and various other things, so it was time for some more serious investigation.

It looks like the problem is not really the fault of WordPress at all but of the MySQL installed on some sites (including mine). Deep in the MySQL configuration is a parameter for the default collation, and it is often left at the supplied value of ‘latin1_swedish_ci’. Why? Because MySQL was originally Swedish! If MySQL was just taken out of the box and installed, that is the default you get for most of your tables, because DB_COLLATE in WordPress is set to an empty value and so takes the database default. In practice you will find that some tables are different, perhaps because someone discovered it was important.
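As a rough illustration of why a latin1 default produces these symptoms, here is a small Python sketch. It only uses Python's own codecs as a stand-in for what a latin1 column can hold; it does not talk to MySQL, and the function names are made up for this example. The second function shows one plausible way the black-diamond characters can arise (latin1 bytes being read back as UTF-8), which is an assumption about the cause rather than something confirmed on the site in question.

```python
# Illustration only: Python codecs stand in for what a latin1 column can
# and cannot represent. The function names are hypothetical.

def store_as_latin1(text: str) -> bytes:
    """Encode text as latin1, substituting '?' for anything it cannot hold."""
    return text.encode("latin-1", errors="replace")


def read_latin1_bytes_as_utf8(text: str) -> str:
    """Encode text as latin1, then misread those bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8", errors="replace")


french = "café"
arabic = "مرحبا"

# Accented Western letters fit in latin1, so they survive.
print(store_as_latin1(french))

# Arabic letters have no latin1 equivalent, so each one becomes '?'.
print(store_as_latin1(arabic))

# A latin1 accent is not valid UTF-8, so it decodes to U+FFFD, the
# replacement character browsers draw as a question mark in a diamond.
print(read_latin1_bytes_as_utf8(french))
```

This would explain why French and German accents worked while Arabic did not, and why older accented posts can break when the character set assumptions are changed underneath them.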

So, what does that mean for fixing the problem? DO THIS AT YOUR OWN RISK—I AM NOT AN EXPERT.

First, the second suggestion above was on the right track: change the DB_COLLATE line in wp-config.php to read

define('DB_COLLATE', 'utf8_general_ci');

If you are setting up WordPress for the first time, this may be sufficient because the new tables will be created with this value, but if you are hacking an old installation you will need to correct the existing tables as well. You need to go into phpMyAdmin and change the collation on some of the columns. The important one, which fixed my problems, is the post_content field in the wp_posts table, but if you are planning to use Unicode in post titles, comments and other places then you may need to change more of them. I am planning to be cautious about changing too many in case it breaks something else.
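The same change can also be made from the SQL tab in phpMyAdmin instead of clicking through each field. The Python sketch below just prints candidate ALTER TABLE statements and does not connect to any database. The helper name, the column list, the column types and the NOT NULL clause are all assumptions based on a stock WordPress schema with the default wp_ prefix, so check each one against what phpMyAdmin shows for your own tables, and try it on a test copy first.

```python
# Hypothetical helper: builds ALTER TABLE statements to paste into the
# phpMyAdmin SQL tab. It does not connect to or modify any database.

# (table, column, column type): ASSUMED from a stock WordPress schema.
# Confirm each type in phpMyAdmin before running the generated SQL.
COLUMNS = [
    ("wp_posts", "post_content", "LONGTEXT"),
    ("wp_posts", "post_title", "TEXT"),
    ("wp_comments", "comment_content", "TEXT"),
]


def alter_statement(table, column, col_type,
                    charset="utf8", collation="utf8_general_ci"):
    """Return one statement that redefines a column with a new collation.

    NOT NULL is an assumption about the existing column definition.
    """
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` {col_type} "
        f"CHARACTER SET {charset} COLLATE {collation} NOT NULL;"
    )


for table, column, col_type in COLUMNS:
    print(alter_statement(table, column, col_type))
```

To stay cautious, the list can be cut down to just the wp_posts / post_content entry, which is the one change described above as having fixed the problem.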
