July 2010 Archives
In Unicode and the Shavian Alphabet, I wrote about the incompatibility between two online translators: the one at shavian.org, which transliterates English into the Shaw alphabet, and the one at Pīnyīn.info, which converts characters into their Unicode numbers. To summarise: the Shaw alphabet, also known as the Shavian alphabet, was invented in a competition to design an alphabet in which English is spelled as it sounds. I used it as an alien programming language in a cartoon, generating my text with shavian.org's transliterator. I then tried to convert the transliteration into Unicode numbers by pasting it into Pīnyīn.info's translator. But the result had the wrong codes, and twice too many of them. Thomas Thurman, author of shavian.org's transliteration script, mailed me to explain why:
With reference to your column at http://www.drdobbs.com/blog/archives/2010/06/unicode_and_the.html : the reason the translator at http://www.pinyin.info/tools/converter/chars2uninumbers.html choked on the Shavian characters you gave it is because all Shavian characters have codepoints above 0xFFFF, and therefore (if you're using UTF-16, which the pinyin.info translator appears to be) they won't fit in a single word and will have to be represented using surrogate pairs. Wikipedia has a reasonable coverage of surrogate pairs: http://en.wikipedia.org/wiki/Surrogate_pair , but briefly, it's a way to represent a Unicode character whose codepoint is too high by using a pair of otherwise illegal characters, both of whose codepoints are low enough. Hence the effect you noted of having "the wrong codes, and twice too many of them".
The fault is presumably with the pinyin.info translator, which shouldn't give out surrogate pairs unless explicitly asked, but it does go to show that, as Wikipedia puts it, "code is often not tested thoroughly with surrogate pairs. This leads to persistent bugs, and potential security holes, even in popular and well-reviewed application software", or as you put it, "computing still is not mature".
Thomas (author of the transliterator script on shavian.org).
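To make Thomas's explanation concrete, here is a minimal Python sketch of how one Shavian character turns into two UTF-16 code units. It assumes, as Thomas suspects but the source does not confirm, that the converter was reporting UTF-16 code units rather than code points. The function names are my own, chosen for illustration, and the sample character is U+10450, SHAVIAN LETTER PEEP, the first letter of the Shavian block.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF need surrogates")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def code_points(text: str) -> list[int]:
    """What a converter should report: one number per character."""
    return [ord(ch) for ch in text]


def utf16_units(text: str) -> list[int]:
    """What a UTF-16-based converter sees: one number per 16-bit unit."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


peep = "\U00010450"  # SHAVIAN LETTER PEEP

print([f"U+{cp:04X}" for cp in code_points(peep)])   # ['U+10450']
print([f"{unit:04X}" for unit in utf16_units(peep)])  # ['D801', 'DC50']
print(utf16_surrogates(0x10450) == tuple(utf16_units(peep)))  # True
```

One character goes in and two numbers come out, neither of which is the character's real code point: the wrong codes, and twice too many of them.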
How does Google decide what's news? Google "Accounting Error" under "Web", and the spreadsheet joke that I blogged this morning appears as the fourth and fifth search results, under the heading News for "Accounting Error". Google the same words under "News", and it becomes the first entry. I suppose it's because I used the phrase "diplomatic visit" plus some reasonably recent famous names: Mother Teresa, Pol Pot, Milošević; and perhaps even Turing, and Morecambe and Wise (I could, after all, have been writing about intellectual-property rights in the media, or some such). Anyway, Google, I claim my free prize for subverting your news-recognition algorithm.
I also wonder whether the piece you are now reading will itself become news. Could Google–news-spoofing become the new Googlewhack?
Satan is paying a diplomatic visit to St. Peter. They sip tea together in St. Peter's palatial office, and Satan gazes out of the wall-sized window at the cloudscape beyond with its bewinged, behaloed and beharped inhabitants. He spots Joan of Arc, Mother Teresa, Morecambe and Wise, Alan Turing,
— "You've got Tamburlaine? He's meant to be one of ours."
— "Head Office computerised our accounting functions. They sacked the Recording Angel and replaced him by a cumulated–sin-score spreadsheet. But the Excel developer missed some cells out of the SUM range. It's fixed now, but no-one noticed the bug for years —"
He gestures, and Satan sees Pol Pot, Genghis Khan, Vlad the Impaler, Milošević, ...
I want to move. But I can't decide where to, for there are problems in my environment, and they seem universal.
Posted by Bill Benson to the Microsoft Excel Developers' List EXCEL-L.
After having dug to a depth of 10 feet last year, New York (USA) scientists found traces of copper wire dating back 100 years and theorised the existence of a telephone network 100 years ago.
Not to be outdone by America, a retired doctor living in London dug to a depth of 20 feet in his garden, and amazingly found traces of copper also. His exploits were published in the Marylebone, Paddington and Pimlico Mercury and the Fulham and Hammersmith Chronicle, both of which proclaimed: "Retired physician finds incontrovertible evidence that British telecommunications predated the War of 1812."
Not to be outdone by a Londoner, Yorkshire made its own contribution to the advancement of archaeological discovery. The county commissioned Charles Sickels, an unemployed excavator, to dig to a depth of 30 feet. He uncovered, in a word, nothing. This prompted the Yorkshire Post and the Huddersfield Daily Examiner both to run separate pieces trumpeting that 300 years ago, Yorkshire had already gone completely wireless.