Douglas Crockford

Turkish

Kemal Atatürk was the founder of the modern Republic of Turkey. He instituted a large set of political and social reforms almost a hundred years ago, including the adoption of a new alphabet. The previous alphabet was based on the Arabic script. The new reform alphabet was based on Latin characters, adapted to the specific requirements of the Turkish language.

Aa Bb Cc Çç Dd Ee Ff Gg Ğğ Hh Iı İi Jj Kk Ll Mm Nn Oo Öö Pp Rr Ss Şş Tt Uu Üü Vv Yy Zz

Some of the letters are decorated with ˘breve, ¸cedilla, and ¨umlaut. The letter I is split into two distinct letters: the İ with a dot and the I without a dot. Notice the case pairings: Iı İi. The letters Qq Ww Xx are not used.

The new alphabet was intended to increase literacy in the country. It supported the arrow of secularism. And it acted as a ratchet, making it much harder to reverse Atatürk's reforms. The new alphabet was put in place with remarkable speed. It fulfilled all of its goals.

But then something unexpected happened in 1991.

The Problem

Unicode has become the world's character set, making it possible for every program to be useful in processing data expressed in all of the languages. It also makes it possible to have expressions in many different languages in the same document or database.

Before Unicode, the Turkish writing system, like all other national writing systems, was manual, then mechanical, then computerized. Early Turkish computers worked exclusively with the Turkish alphabet. Atatürk's alphabet was working brilliantly. Unfortunately, Atatürk could not have anticipated Unicode.

Unicode makes some assumptions about the representation of upper and lower cases, chiefly that there is a common mapping from one case to the other. Every language that cares about the letter A can agree that upper(a) = A and lower(A) = a.

The letter I is different. Turkish has two, the letter I and the letter İ, and the case rules are incompatible with the rest of the world. Software that works well in the world can fail in Turkey because of that incompatibility. When changing the case of a text for comparison, normalization, or display, if the world's case mappings are used on Turkish text, then the mapping will be incorrect, which can cause misspellings and software failures.
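The incompatibility is easy to see in any language whose built-in case functions use the default Unicode mappings. What follows is a minimal sketch in Python, chosen only because its str.upper and str.lower ignore the locale; the helper name round_trip and the two constants are illustrative and not part of any standard API.

```python
# Python's str.upper() and str.lower() use the default Unicode case
# mappings, with no Turkish tailoring, so they stand in here for
# "the world's case mappings".

DOTLESS_SMALL = "\u0131"   # ı
DOTTED_CAPITAL = "\u0130"  # İ


def round_trip(text: str) -> str:
    """Uppercase and then lowercase a string with the default mappings."""
    return text.upper().lower()


# Most of the alphabet survives the round trip.
print(round_trip("a") == "a")  # True

# Dotless ı uppercases to I, which then lowercases to dotted i.
print(round_trip(DOTLESS_SMALL) == DOTLESS_SMALL)  # False

# Dotted i uppercases to I, not to the Turkish capital İ.
print("i".upper() == DOTTED_CAPITAL)  # False

# A Turkish place name comes back misspelled.
print(round_trip("diyarbakır"))  # diyarbakir
```

The last line is the kind of silent misspelling that turns into a failed comparison or lookup later on.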

A popular mitigation is to provide additional mapping functions: locale_upper and locale_lower. These functions work differently in some places than in others, which is extremely alarming if you are concerned with testability. The locale functions work the same as the original functions in most places, but in Turkey they correctly implement the Turkish case rules. So in theory, it is possible to write software that works correctly everywhere, even in Turkey.
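As a rough illustration of what such functions do, here is a sketch in Python. The names locale_upper and locale_lower come from the text above, but the explicit locale parameter is an assumption made for the example: real implementations usually read the locale from the environment, which is exactly what makes them hard to test. The sketch handles only precomposed letters and only the "tr" locale.

```python
def locale_upper(text: str, locale: str = "en") -> str:
    """Uppercase text, applying the Turkish i/ı rules when locale is "tr"."""
    if locale == "tr":
        # Map the two letters whose pairing differs, then let the
        # default mapping handle everything else.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


def locale_lower(text: str, locale: str = "en") -> str:
    """Lowercase text, applying the Turkish İ/I rules when locale is "tr"."""
    if locale == "tr":
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


# The same input gives different output depending on the locale.
print(locale_upper("diyarbakır", "tr"))  # DİYARBAKIR
print(locale_upper("diyarbakır", "en"))  # DIYARBAKIR
print(locale_lower("DİYARBAKIR", "tr"))  # diyarbakır
```

A test suite that passes on a build machine configured for one locale says nothing about the behavior under the other.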

In practice, it does not work. It is likely that an average coder does not know or care about the locale functions. It is even likelier that the coder's manager is saying "don't bother about Turkey, focus on meeting deadlines instead."

Also, the locale functions do not really work. What matters is not where you are or what language system you have opted in to. What matters is the language being acted upon. It is possible to use English in Turkey, ve Amerika'da Türkçe kullanmak mümkündür (and it is possible to use Turkish in America). It is possible to use multiple languages in the same document and it is possible to have words from multiple languages in the same sentence. The locale functions cannot solve that problem without putting an undue burden on the coders, and most coders are not incented to take on that burden.
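The mixed-language case can be sketched as follows. The function lower_for and the idea of tagging each run of text with its language are assumptions made for the example, not an existing API; the point is only that no single locale setting lowercases the whole sentence correctly.

```python
def lower_for(text: str, language: str) -> str:
    """Lowercase under Turkish rules if language is "tr", else default rules."""
    if language == "tr":
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


# One English word and one Turkish word (ışık, "light") in the same string.
sentence = "TITLE: IŞIK"

print(lower_for(sentence, "en"))  # title: işik  (Turkish word misspelled)
print(lower_for(sentence, "tr"))  # tıtle: ışık  (English word misspelled)

# Getting both words right requires knowing the language of each run.
tagged = [("TITLE: ", "en"), ("IŞIK", "tr")]
print("".join(lower_for(run, lang) for run, lang in tagged))  # title: ışık
```

Producing and maintaining those per-run tags is the undue burden: the information is rarely present in the data, and the coder has to supply it.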

The Solution

I believe that if Atatürk were alive today, he would recognize that there is a bug in the Turkish alphabet, and he would move swiftly to repair it, improving the value of the Internet to the Turkish people.

The solution is to replace the dotless Iı with Ww and make the dotted letter Ii conform to international convention. With this change, the Turkish case mappings become compatible with all of the other Latin-based languages.

Modern Turkish     The Solution       IPA    Phonetic Description
lower   upper      lower   upper
i       İ          i       I          /i/    Close front unrounded vowel
ı       I          w       W          /ɯ/    Close back unrounded vowel
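To make the proposal concrete, the following Python sketch converts text from the current spelling to the proposed one and then checks that the default case mappings round trip. The names TO_REFORMED and reform are hypothetical, and the sketch assumes the input is purely Turkish text in precomposed form, so any I it meets is the capital of dotless ı.

```python
# Hypothetical one-time conversion from the current spelling to the
# proposed one, following the table: ı -> w, I -> W, İ -> I, i unchanged.
# str.translate maps each character once, so I -> W and İ -> I do not
# interfere with each other.
TO_REFORMED = str.maketrans({"\u0131": "w", "I": "W", "\u0130": "I"})


def reform(text: str) -> str:
    """Rewrite Turkish text in the proposed reformed spelling."""
    return text.translate(TO_REFORMED)


lower_word = reform("diyarbakır")  # diyarbakwr
upper_word = reform("DİYARBAKIR")  # DIYARBAKWR

# After the conversion, the ordinary mappings are enough.
print(lower_word.upper() == upper_word)  # True
print(upper_word.lower() == lower_word)  # True
```

No locale parameter appears anywhere after the conversion, which is the property the proposal is after.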