Unicode in WordPress

8 May 2010 19:01 by Rick

As I mentioned in the previous post, there can be problems inserting foreign text into WordPress. I have done it in the past with simple accents for French and German with no problem, and for some special characters I used the &#….; codes, but when it came to pasting in a chunk of Arabic it didn’t work at all, just displaying a bunch of question marks. I suspect that there would be a similar problem with Hebrew, Chinese, Japanese, Cyrillic and any other non-Western scripts.

I had a search around and the first suggestion I came across was on Obsessed with the Press, which suggested commenting out the lines for DB_CHARSET and DB_COLLATE in wp-config.php. This appeared to work (on a test site) but looking at older posts I could see that some characters in there were now corrupt, displaying a white question mark in a black diamond. In the comments on the same page there was a suggestion not to do that but just to change DB_COLLATE to the value ‘utf8_general_ci’. This didn’t really work either. There were suggestions on other pages to set it to ‘utf8_unicode_ci’ and various other things, so it was time to do some more serious investigation.

It looks like the problem is not really the fault of WordPress at all but of the MySQL installed on some sites (including mine). Deep in MySQL is a configuration parameter for the default character collation, and it is often left as supplied at ‘latin1_swedish_ci’. Why? Because MySQL was originally Swedish! If it was just taken out of the box and installed then that will be the default you get for most of your tables, because DB_COLLATE in WordPress is set to null and so takes the default. In practice you will find some tables are different, perhaps because they discovered it was important.
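To see why a Latin-1 default copes with French and German but not Arabic, here is a small sketch in Python. It is only an analogy: it shows how Python's own codecs behave, and I am assuming that the conversion inside MySQL goes wrong in a similar way. The sample words are mine.

```python
# Sketch: what a Latin-1-only store does to different kinds of text.
arabic = "مرحبا"   # five Arabic letters
french = "café"

# Latin-1 has a slot for é, so accented Western text survives a round trip.
assert french.encode("latin-1").decode("latin-1") == french

# Arabic letters have no Latin-1 equivalent; a lossy conversion
# substitutes a plain question mark for each one.
stored = arabic.encode("latin-1", errors="replace")
print(stored)

# The opposite mistake: Latin-1 bytes read back as if they were UTF-8.
# The invalid byte becomes U+FFFD, the replacement character that
# browsers draw as a white question mark in a black diamond.
misread = french.encode("latin-1").decode("utf-8", errors="replace")
print(misread)
```

The first failure matches the row of question marks I got with Arabic, and the second looks like the corruption that appeared in my older posts once the character set lines were commented out.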

So, what does that mean for fixing the problem? DO THIS AT YOUR OWN RISK—I AM NOT AN EXPERT.

First—the second suggestion above was correct—change the DB_COLLATE line to read

define('DB_COLLATE', 'utf8_general_ci');

If you are setting up WordPress for the first time, this may be sufficient because it will use this value, but if you are hacking an old installation then you will need to correct it a bit. You need to go into phpMyAdmin and change some of the collations on your tables. The important one, which fixed my problems, is table wp_posts, field name post_content, but if you are planning to use Unicode in post titles, comments and other places then you may need to do more of them. I am planning to be a bit cautious about changing too many in case it breaks something else.
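For anyone who prefers typing SQL to clicking through phpMyAdmin, the change can be sketched roughly as below. This is only an illustration in Python that builds the statement as text and does not touch any database. The helper name collation_sql is made up, and the column type LONGTEXT with NOT NULL is my assumption about how post_content is defined, so check your own table first and take a backup before running anything.

```python
def collation_sql(table, column, data_type, not_null=True,
                  charset="utf8", collation="utf8_general_ci"):
    """Build (not run) a MySQL statement that changes one column's collation.

    MODIFY replaces the whole column definition, so the existing data
    type and nullability have to be repeated alongside the new collation.
    """
    nullability = " NOT NULL" if not_null else ""
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` {data_type} "
        f"CHARACTER SET {charset} COLLATE {collation}{nullability};"
    )

# Assumed definition of the column; verify it against your own database.
print(collation_sql("wp_posts", "post_content", "LONGTEXT"))
```

The same helper could be called for post titles or comment text, but each column's real definition would need looking up first.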

International TLD

10:56 by Rick

A few days ago (5 May) saw a great leap forward in the development of the internet. For the first time, top level domain names (TLD) are permitted using non-Latin scripts. In particular, three country codes have been assigned by ICANN. These are for مصر (Egypt), السعودية (Saudi Arabia) and امارات (United Arab Emirates). They are the first country codes which are not two characters (except the “cat” = Catalan anomaly), possibly because they thought there was no need to maintain the restriction if they were branching out into other scripts.

These first ones are all in Arabic, which is a right-to-left language. That means that when you see one in the address bar it will appear the other way round from usual: http:// is followed by the TLD, then a dot, then the lower-level parts of the domain name in reverse order, but still followed by the / and the directory path as usual, even if in Arabic. Actually this is more logical all round and is how all URLs should have been, but it is too late for that now.

[I would like to have shown you examples directly here but my editor and WordPress don’t work well with these scripts—I will need to work on that.] A good place to look is the Wikipedia page towards the bottom.

The implementation in browsers seems to vary and may also depend on what the server does. The ICANN Arabic test page http://مثال.إختبار/ works well in Firefox (Mac and Win) and Safari (Mac & iPhone): the whole of the URL in the address bar after the http:// is in Arabic. In IE7 & 8 (Win) the address you see in the top bar is what looks like random Latin characters. For the tests I have done, Safari always gets it right, Firefox sometimes and IE never; I would be interested to hear of other results. An example of one that doesn’t work well in Firefox is the Egyptian Ministry of Communications and Information Technology http://وزارة-الأتصالات.مصر/ .

The code conversion is called Punycode and uses a rather strange algorithm to convert any Unicode text into ASCII. It is pretty unreadable but has to exist because the DNS only allows ASCII, so Punycode allows domain names in any character set (and any mix) to be uniquely resolved. I don’t know if this is always the case but the ones I have seen all start “xn--”. I imagine that, in time, when implementations are sorted out, this will become transparent to the user.
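The conversion can be tried out with a short Python sketch. It relies on the "idna" codec that ships with Python, which as far as I know follows the older IDNA 2003 rules, so a browser may treat some names differently. The helper names to_ascii and to_unicode are my own.

```python
def to_ascii(hostname):
    """Convert a Unicode hostname to the ASCII form that the DNS sees."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(ascii_hostname):
    """Convert the ASCII form back to the readable Unicode hostname."""
    return ascii_hostname.encode("ascii").decode("idna")


# The Egyptian TLD and the ICANN Arabic test domain mentioned above.
for name in ["مصر", "مثال.إختبار"]:
    ascii_form = to_ascii(name)
    print(ascii_form, "->", to_unicode(ascii_form))
```

Each dot-separated label is converted on its own, and every label containing non-ASCII characters comes out with the “xn--” prefix, while plain ASCII names such as example.com pass through unchanged.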

One worrying security implication of these “foreign” character codes in URLs is that some letters look very similar to Western Latin ones. So if you see a familiar link, to your bank say, it may not be quite what it seems. For example if the “ο” in “www.llοydstsb.com” is actually a Greek Omicron (which it is on this page) the fake address could direct you to a phishing site. It is possible that the behaviour of IE is deliberate to avoid this problem but I somewhat doubt it.
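A crude check for this kind of lookalike can be sketched in Python. It takes the first word of each letter's Unicode name (LATIN, GREEK, ARABIC and so on) as a rough stand-in for its script, which is an approximation of mine and not how any browser actually decides. The function names are made up.

```python
import unicodedata


def scripts_in(hostname):
    """Return the rough set of scripts used by the letters in a hostname."""
    scripts = set()
    for ch in hostname:
        if ch.isalpha():
            # For example "GREEK SMALL LETTER OMICRON" gives "GREEK".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(hostname):
    """Flag hostnames whose letters come from more than one script."""
    return len(scripts_in(hostname)) > 1


genuine = "www.lloydstsb.com"
fake = "www.ll\u03bfydstsb.com"  # \u03bf is the Greek small omicron

print(looks_mixed(genuine), scripts_in(genuine))
print(looks_mixed(fake), scripts_in(fake))
```

A check like this would still miss a name written entirely in one lookalike script, so it is a warning sign at best.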

[This post has been revised since I discovered how to insert the Arabic characters. I will write up how it is done later.] [Updated to include IE7 & iPhone]

The People have Spoken

7 May 2010 07:58 by Rick

Did anyone hear what they said?

It is clear to me what they said and it is exactly the same as they have said since I was first able to vote back in the 70s—the electoral system in this country is broken. Just look at the figures. At the time of writing (08:45) the results are CON 36%, LAB 29%, LIB 23%, other 12% and the prediction is not a lot different. That says we don’t want a block-busting majority. We are fortunate that for the first time for ages, the seat count says something similar, though in very different proportions. So let’s listen to the people and do something about it.

Blind copy

6 May 2010 12:11 by Rick

I hope that everyone reading this knows when and how to use blind copy (Bcc:) for emails. If not then you can review it here.

A feature I miss from modern email programs is the facility to create a blind distribution list which saves having to remember and also has a title. A mainframe mail system we used in the eighties had this feature. That way you just sent the email to the list and the program used the title in the “To:” field and sent to everyone using Bcc. A useful extension of this feature would be to allow you to specify in the distribution list who should get it in clear and who should get it blind. It would be very useful for circulating minutes etc.

RPM

28 Apr 2010 12:30 by Rick

When searching for information on the web one thing you need to ensure is the reliability of the information, but once you have dealt with that you also need to look at its relevance. I came across this problem today when looking at a US report about Apple retailing in Japan. Although the article was factual, many of the comments were ignoring the context and trying to apply the situation to their own experience. When searching the web many of the articles you will find (in English) will have American authors, so you need to make sure that what they are saying is applicable here in the UK. I have already addressed the issue of copyright law in some detail, but here is another one.

Resale (or Retail) Price Maintenance (RPM)

Resale price maintenance is the practice where a manufacturer (or wholesaler) and retailers agree that the latter will sell at certain prices or set a minimum or maximum price. If a reseller refuses to maintain prices, the supplier will stop doing business with it. A minimum price is the usual case.

It is illegal in the UK (Resale Prices Act 1964) except where proven to be in the public interest e.g. for books (Net Book Agreement). Suppliers may propose a Recommended Retail Price but cannot legally enforce it. Although the benefit to customers has been seen in much lower prices due to competition, the unexpected effect has been the emergence of large market-powerful retailers such as supermarkets at the expense of smaller local shops and grass-roots suppliers such as farmers.

It is generally permitted in the US (2007 Supreme Court Judgement) and most other places in the world but it is constantly under debate.

A grey market is the trade of a commodity through channels which, while legal, are unofficial, unauthorized, or unintended by the original manufacturer. The main type of grey market is imported manufactured goods that would normally be unavailable or more expensive in a certain country. The measures that manufacturers can use to try to limit grey market supplies are restricted within EU borders by competition law but are mostly legal across other borders. Techniques tried include stopping supply to the source, withdrawing the warranty from the end user, labelling regulations and trademark enforcement. DVD region codes are a technological way used to (try to) restrict grey imports. eBay and other online markets have a reputation for removing listings at the request of manufacturers or agents without much investigation, sometimes assuming that they are counterfeit goods.

Disaster upon Disaster

22 Apr 2010 10:26 by Rick

What happens if you get a real disaster during a disaster training exercise? Iowa State found out yesterday. The exercise was a simulated bio-hazard at a major sports event. The real disaster was a simultaneous failure of all the 911 computer systems across the state. It also affected city and council management, fire and police services. Read the full story for the details. The culprit: a McAfee Anti Virus false alarm. They will take a while to recover from the negative PR from that one.

Nostalgia

16 Apr 2010 11:39 by Rick

These are all machines I have worked on in the past—hover for more details.

Thanks to Mark Richards for the photos.

Time to think of political things

14 Apr 2010 19:02 by Rick

of cabbages and kings.

I have always felt that I align mostly with the Left. That seems the appropriate place to be as a Christian, just like the Methodist founders of the socialist movement. Just that, somehow, the Left doesn’t seem to align with me. Giles Fraser summed it up very well in his Church Times column last week.

When I decided that I had to be at least a little politically active, I just couldn’t bring myself to join the Labour Party. In practice there is no party whose policies anyone can fully believe in; that would be an interesting challenge question to the candidates. But we are stuck with a party system and perhaps, if we ever get some sort of decent proportional voting system, we will be able to say that “I like 50% of the policies of A, 25% of B and 25% of C” or similar. There seems to be no other way. So slightly active I am, but I won’t canvass because, if challenged on the doorstep over a particular policy, I can’t be sure that I would agree with what the manifesto says.

Passion Sunday

29 Mar 2010 12:08 by Rick

I must admit to being completely confused by the (Anglican) church calendar. Passion Sunday used to be the fifth Sunday in Lent; Palm Sunday was the sixth Sunday i.e. yesterday. At our own church, I am not aware that we have ever really celebrated Passion Sunday according to the lectionary because it just seemed out of place. It is the story of the arrest, trial and execution of Christ i.e. the period immediately leading up to and including Good Friday. We have always left that to Holy Week. It was also before Palm Sunday, the entry into Jerusalem, which was chronologically inconsistent.

As it happens, last weekend we were away from home so I went to the local church where we were staying. This was a little more traditional than I am used to and they clearly follow the lectionary. I see now that the Passion story has been tacked onto the end of the Palm Sunday service. There is what is called the “Liturgy of the Palms”, the bit we are used to, and we sang “Ride on, ride on in majesty” and a couple more of those good old Palm Sunday hymns—then we did a phase shift and leapt into the “Liturgy of the Word” which was the Passion story for the sermon and communion.

I agree that it is now chronologically correct, but are they expecting no one to attend on Good Friday, let alone on the other days of Holy Week, so that they have to make sure they get the whole story in before Easter Sunday?

Gentelmen

4 Mar 2010 22:09 by Rick

Seen in St Helier, Jersey
