July 2010 Archives
In Unicode and the Shavian Alphabet, I wrote about the incompatibility between two online translators: shavian.org's one that translates English into the Shaw alphabet, and Pīnyīn.info's one that translates characters into their Unicode numbers. To summarise: the Shaw alphabet, also known as the Shavian alphabet, was invented in a competition to design an alphabet in which English is spelled as it sounds. I used it as an alien programming language in a cartoon, generating my text with shavian.org's transliterator. I then tried to convert the transliteration into Unicode numbers by pasting into Pīnyīn.info's translator. But the result had the wrong codes, and twice too many of them. Thomas Thurman, author of shavian.org's transliteration script, mailed me to explain why:
With reference to your column at http://www.drdobbs.com/blog/archives/2010/06/unicode_and_the.html : the reason the translator at http://www.pinyin.info/tools/converter/chars2uninumbers.html choked on the Shavian characters you gave it is because all Shavian characters have codepoints above 0xFFFF, and therefore (if you're using UTF-16, which the pinyin.info translator appears to be) they won't fit in a single word and will have to be represented using surrogate pairs. Wikipedia has a reasonable coverage of surrogate pairs: http://en.wikipedia.org/wiki/Surrogate_pair , but briefly, it's a way to represent a Unicode character whose codepoint is too high by using a pair of otherwise illegal characters, both of whose codepoints are low enough. Hence the effect you noted of having "the wrong codes, and twice too many of them".
The fault is presumably with the pinyin.info translator, which shouldn't give out surrogate pairs unless explicitly asked, but it does go to show that, as Wikipedia puts it, "code is often not tested thoroughly with surrogate pairs. This leads to persistent bugs, and potential security holes, even in popular and well-reviewed application software", or as you put it, "computing still is not mature".
Thomas (author of the transliterator script on shavian.org).
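To make the arithmetic behind Thomas's explanation concrete, here is a minimal Python sketch of how a code point above 0xFFFF is split into a UTF-16 surrogate pair and put back together. The function names are my own, chosen for illustration, and this is not the code that either shavian.org or pinyin.info runs. The sample character is U+10450, the first code point in the Shavian block.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    """Return the 16-bit code units Python produces when encoding text as UTF-16."""
    data = text.encode("utf-16-be")        # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# One Shavian letter becomes two 16-bit units: "twice too many" codes.
print([hex(u) for u in utf16_units("\U00010450")])   # ['0xd801', '0xdc50']
print(hex(from_surrogate_pair(0xD801, 0xDC50)))      # 0x10450
```

A converter that reports each 16-bit unit separately would show D801 and DC50 for this one letter, which matches the symptom described above: wrong codes, and two of them per character.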
How does Google decide what's news? Google "Accounting Error" under "Web", and the spreadsheet joke that I blogged this morning appears as the fourth and fifth search results, under the heading News for "Accounting Error". Google the same words under "News", and it becomes the first entry. I suppose it's because I used the phrase "diplomatic visit" plus some reasonably recent famous names: Mother Teresa, Pol Pot, Milošević; and perhaps even Turing, and Morecambe and Wise (I could, after all, have been writing about intellectual-property rights in the media, or some such). Anyway, Google, I claim my free prize for subverting your news-recognition algorithm.
I also wonder whether the piece you are now reading will itself become news. Could Google–news-spoofing become the new Googlewhack?
Satan is paying a
diplomatic visit to St. Peter. They sip tea
together in St. Peter's palatial office, and
Satan gazes out of the wall-sized window at the
cloudscape beyond with its bewinged,
behaloed and beharped inhabitants. He
spots Joan of Arc, Mother Teresa,
Morecambe and Wise, Alan Turing,
Tamburlaine —
— "You've got Tamburlaine? He's meant to be
one of ours."
— "Head Office computerised our
accounting functions. They sacked the
Recording Angel and replaced him by a
cumulated–sin-score spreadsheet. But
the Excel developer missed
some cells out of the SUM range. It's fixed now,
but no-one noticed the bug for years —"
He gestures, and Satan sees Pol Pot,
Genghis Khan, Vlad the Impaler, Milošević, ...
I want to move. But I can't decide where to, for there are problems in my environment, and they seem universal.
Back when Dobbs ran an Artificial Intelligence Newsletter, I wrote an article about how to program Sony's robot dog, the Aibo. Sony were kind enough to send me a CD containing two images of Aibo. I came across this yesterday, and thought you might like to see them. Here is Aibo on his own:
Posted by Bill Benson to the Microsoft Excel Developers' List EXCEL-L.
After having dug to a depth of 10 feet last year, New York (USA) scientists found traces of copper wire dating back 100 years and theorised the existence of a telephone network 100 years ago.
Not to be outdone by America, a retired doctor living in London dug to a depth of 20 feet in his garden, and amazingly found traces of copper also. His exploits were published in the Marylebone, Paddington and Pimlico Mercury and the Fulham and Hammersmith Chronicle, both of which proclaimed: "Retired physician finds incontrovertible evidence that British telecommunications predated the War of 1812."
Not to be outdone by a Londoner, Yorkshire made its own contribution to the advancement of archaeological discovery. The county commissioned Charles Sickels, an unemployed excavator, to dig to a depth of 30 feet. He uncovered, in a word, nothing. This prompted the Yorkshire Post and the Huddersfield Daily Examiner both to run separate pieces trumpeting that 300 years ago, Yorkshire had already gone completely wireless.
