
Unicode and the Shavian Alphabet

With ten tentacle tips effortlessly keeping track of the goto's and their destinations in his spaghetti Hello World program, Fnork cannot understand why mere Earthlings have so much trouble with labels. For last week's cartoon Labels, I wanted a convincingly alien writing system for Fnork's programming language. But I didn't want to use Armenian or Georgian or Telugu, beautiful though they are. Or Tamil, which I've always thought of as visual jazz, all neon lights and Rhapsody in Blue made graphic. Some reader would be bound to recognise them as real scripts. So I used the Shaw alphabet.

The Shaw alphabet is also called the Shavian alphabet, and was invented by Ronald Kingsley Read, winner of a worldwide spelling-reform competition provided for in the will of George Bernard Shaw, who thought English spelling a waste of time and space. This is evident from Sound-writing 1892-1972: George Bernard Shaw and a modern alphabet, in which Read recalls:

The Will was wilfully made in language more Shavian than legal in so far as its Clauses 35-38 dealt with the alphabet. Beginning with Sub-section 35(1), it calls in effect for some estimate of the world's man-hours wasted in writing and printing English with an alphabet of 26 instead of 40 or more letters; and a valuation in money of those wasted hours. This impossible task was entrusted to Mr P A D MacCarthy who, having investigated, could only report that no reliable data exists for any meaningful estimate.

Of course, Shavian never superseded Roman. But it's graceful — more so than Klingon, which I rejected on the grounds that the people my cartoons depict are all peace-loving, not wont to incinerate the next-door planet just because somebody trod on their tentacles — and I was pleased to find that it is in Unicode. Klingon, thankfully, is not.

Indeed, here is a PDF of the Shavian Unicode chart, linked from the Unicode Consortium's Unicode 5.2 Character Code Charts page. The Consortium must like Shavian, because they have translated their What is Unicode? into it. But you may not be able to read the translation unless you install a Shavian font. I used the freeware Shaw Sans No. 2, a TrueType file linked from Fonts for the Shavian alphabet. Shaw Roman No. 1 is also free, but I feel the serifs make it cumbersome. To install the fonts under Windows XP, I followed the instructions at About.com's How To Install TrueType or OpenType Fonts in Windows.

A Google image search for "Shavian alphabet" will find several keys to the letters and their sounds. Moreover, shavian.org have an online translator. Again, this probably won't give anything useful if you haven't a Shavian font. I've just tried it on the word goto, and realised that the manual translation in my cartoon has a bug. The two o's have different sounds, so they should be represented by different letters.

As well as Shavian, I used some Unicode geometric shapes, Japanese double quotes and comma, the Star of David, right floor, and left ceiling. The result, generated from the numeric Unicode here, makes an interesting browser test:
𐑤𐑤 𐑦𐑓 ⌈ 
𐑯.𐑯𐑰.⬠ ⌋ 
𐑜𐑴𐑑𐑴 𐑤𐑱
   𐑜𐑴𐑑𐑴 𐑤𐑯 
𐑤𐑱 𐑐𐑮𐑦𐑯𐑑 
✡、 𐑕◣𐑯
   𐑜𐑴𐑑𐑴 𐑤𐑤
𐑤𐑯 𐑕◄『 』
   𐑐𐑮𐑦𐑯𐑑 ✡、 
𐑤𐑴 𐑦𐑓 ⌈ 
𐑯.𐑯𐑰.□ ⌋ 
𐑜𐑴𐑑𐑴 𐑤𐑐
   𐑜𐑴𐑑𐑴 𐑤𐑰 
𐑤𐑐 𐑐𐑮𐑦𐑯𐑑 
✡、 𐑕◣𐑯
   𐑜𐑴𐑑𐑴 𐑤𐑴
𐑤𐑰 𐑧𐑯𐑛
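
For anyone whose browser fails the test, it can help to see which code points the characters above actually are. The following is a minimal Python sketch, not the method used to build the cartoon: the function name `describe` is made up for illustration, and it relies only on the standard `unicodedata` module to look up each character's code point and official name.

```python
import unicodedata


def describe(text):
    """Return ('U+XXXX', decimal code point, Unicode name) for each non-space character."""
    rows = []
    for ch in text:
        if ch.isspace():
            continue
        # Fall back to a placeholder if this Python's Unicode database has no name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", ord(ch), name))
    return rows


if __name__ == "__main__":
    # A few characters taken from the program above.
    for code, decimal, name in describe("𐑜𐑴𐑑𐑴 ✡、⌈⌋"):
        print(code, decimal, name)
```

The Shavian letters all come out above U+FFFF, in the range U+10450 to U+1047F, whereas the geometric shapes and Japanese punctuation sit below it. That difference matters for tools that assume every character fits in 16 bits.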

There is a different online translator at Pīnyīn.info. It's adapted from a script by Steve Minutillo, and converts characters to decimal Unicode, that is, to the decimal numeric character references used in HTML. For example, in my essay Dress Code, about image over comfort and why people have to wear suits and ties, I mentioned the Arabic word سِرْوَال. This denotes a baggy trouser, worn in North Africa and other parts of the "East", which is very comfortable in the heat but probably about as welcome at the average business-computing do as a "Google Wins" T-shirt at a Microsoft product launch. The Pīnyīn.info translator tells me that in Unicode, this word becomes &#1587;&#1616;&#1585;&#1618;&#1608;&#1614;&#1575;&#1604;, which is easier to insert into an HTML file than the raw Arabic. I also used the translator just now, so that I could write the i-macron in Pīnyīn.info's name. As decimal Unicode, this is &#299;.
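
The conversion to decimal Unicode described above is easy to sketch in Python. This is not the Pīnyīn.info script, only an illustration of the same idea. The function name `to_decimal_refs` is my own, and leaving plain ASCII untouched is an assumption about what is convenient rather than a description of what that tool does.

```python
import html


def to_decimal_refs(text):
    """Replace each non-ASCII character with an HTML decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    encoded = to_decimal_refs("P\u012Bny\u012Bn.info")  # \u012B is i-macron
    print(encoded)                 # ASCII-only text, safe to paste into HTML
    print(html.unescape(encoded))  # the standard library turns it back into characters
```

Because `ord` works on whole code points, this gives one reference per character even for letters outside the 16-bit range, such as Shavian. Note that a literal ampersand in the input is passed through unchanged, so text that already contains references would need extra escaping.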

Unfortunately, there is an incompatibility between the Shavian translator and the Unicode translator. If I type goto into the Shavian translator, I get a string that is the correct Shavian. But if I paste that into the Unicode translator, it gives me a string of codes that mostly display only as unprintable replacement characters, ending in &#5643. This has the wrong codes, and twice too many of them. What a great user experience! And I'm not criticising the authors of these translators when I shout that and then stomp off to the pub to drown my frustration. I'm merely pointing out that computing is still not mature. But one day, we'll be free from such incompatibilities. It will be the same day that nobody has to wear a tie, Microsoft lies down with Google, and the English reform their spelling.
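
Getting twice too many codes is consistent with one plausible explanation, which is a guess on my part and not something I have verified against either translator: the script may be reporting UTF-16 code units rather than code points. JavaScript's `charCodeAt`, for instance, returns UTF-16 code units, and every Shavian letter lies above U+FFFF, so each one is stored as a surrogate pair. The Python sketch below, with helper names of my own invention, shows the two codes that one Shavian letter turns into and how to recombine them.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def combine_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into the single code point it represents."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    gag = "\U0001045C"  # the Shavian letter used for the g in goto
    high, low = utf16_units(gag)
    print(high, low)                      # two code units for one letter
    print(combine_surrogates(high, low))  # the single code point, same as ord(gag)
```

If that guess is right, a four-letter Shavian word would come out as eight codes, none of them in the Shavian range, and a browser would refuse to display them because surrogate values are not valid as numeric character references on their own.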