This entry is part 1 in the series On Alien Languages

I’ve tried every way I can think of to set this topic aside until after I had finished the series on names, but it just doesn’t work.

So I guess it’s time to take a side-trip into the wonderful world of creating and simulating Alien Languages in RPGs…

I once, back in the early 1990s, wrote a piece of software for my Commodore-128 that created alien languages on demand. It took about three 16-hour days.

The catalyst was a library book on codes which contained a number of tables, including:

  • first syllables of a word by frequency of occurrence in English;
  • last syllables of a word by frequency of occurrence in English;
  • frequency of occurrence of individual letters of the alphabet in English;
  • frequency of occurrence of individual letters of the alphabet as the first character of a word, in English;
  • intervening syllables of words by frequency of occurrence in English and total word syllable count;
  • likelihoods of a specific letter of the alphabet being followed by another specific letter of the alphabet in English;
  • probability of a word containing a specific number of syllables by specificity and technicality of subject.

Adding to that was some information from other sources (I no longer recall what they were):

  • derivation trees showing inheritance of related words;
  • word relationship chart showing the degree of change in language over time with a concordance of key events stimulating that change.

The premise behind the software’s function was simple: a text editor that mapped a new language onto an English “translation” that had been entered.

Step One: Create the Language morphology

The first step was creating the language morphology, the shape of the language. The book on codes had talked about the creation of random words in English using these probability charts as a means of generating a code that could not be broken using frequency tables because the frequencies of occurrence would be the same as natural English. I thought that was a brilliant idea if reversed – so that an alien language (bereft of all word meanings and relationships) could be created simply by randomizing the numeric values in the different tables mentioned in the first list and using the result to generate a list of words.

The Morphology Algorithm

This starts with a table of every possible syllable, defined as a vowel or vowel set, plus a consonant or consonant set. This was generated using a simple nested set of loops and stored in a language file on the PC.

Another set of loops then added each possible consonant to the start of each of these to create a larger list of additional possible first-syllables for words. These were stored in a separate language file on the PC.
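For anyone who wants to experiment, the two syllable tables can be sketched in a few lines of modern Python. This is not the original Commodore code: the vowel and consonant inventories below are deliberately tiny placeholders, and restricting the leading consonants to single letters is my own assumption.

```python
from itertools import product

# Assumed, deliberately small inventories; a real run would use fuller lists.
VOWEL_SETS = ["a", "e", "i", "o", "u", "ai", "ou"]
CONSONANT_SETS = ["k", "n", "r", "s", "t", "th", "zh"]
LEADING_CONSONANTS = [c for c in CONSONANT_SETS if len(c) == 1]

def build_syllables(vowels, consonants):
    """Table one: every vowel (or vowel set) followed by every consonant (or set)."""
    return [v + c for v, c in product(vowels, consonants)]

def build_first_syllables(syllables, leading):
    """Table two: each base syllable with each consonant added to its start."""
    return [c + s for c, s in product(leading, syllables)]

syllables = build_syllables(VOWEL_SETS, CONSONANT_SETS)
first_syllables = build_first_syllables(syllables, LEADING_CONSONANTS)
print(len(syllables), len(first_syllables))  # prints: 49 245
```

Even with these small inventories the tables grow quickly, which is why the results went to disk rather than staying in memory.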

A new pair of loops then assigned random ratings out of 1000 to preset table structures to create the equivalents of the tables derived from the code book for the new language. A high value was “common”, a low value was “rare”. In the more sophisticated later versions of the program, these tables were presented onscreen for manual editing. Once complete, these were stored in additional language files on the PC.
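The random-ratings step might look something like the sketch below. The table names, the 1 to 1000 range, and the fixed seed are assumptions for illustration; the manual edit at the end stands in for the onscreen editing of the later versions.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def random_ratings(items, rng, top=1000):
    """Give each item a random rating from 1 to top: high is common, low is rare."""
    return {item: rng.randint(1, top) for item in items}

rng = random.Random(42)  # a fixed seed lets the same language be regenerated

# Hypothetical equivalents of three of the code-book tables.
letter_freq = random_ratings(LETTERS, rng)
first_letter_freq = random_ratings(LETTERS, rng)
# One row per preceding letter: how likely each letter is to follow it.
follows = {a: random_ratings(LETTERS, rng) for a in LETTERS}

# Manual override, e.g. making "q" very rare in this language.
letter_freq["q"] = 5
```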

Each syllable in the first table was then analyzed for frequency of occurrence in different places within a word: first syllable, first intervening syllable, second and subsequent intervening syllables, and last syllable, all according to the rules defined in the previous step. For convenience, two separate tables were output – one was the combination of all syllables from both tables one and two, and only applied to the first syllables of words, while the second dealt with syllables in the rest of the word. The updated language morphology files were then written back to disk.

This approach was used because it permitted backtracking part or all of the way if the results proved undesirable – I could generate a list of the 50 syllables most likely to start a word and display them, then go back and tweak the ‘rules’ accordingly. I could extract a list of the syllables which had the highest start-of-word rating multiplied by the end-of-word rating to derive the most common monosyllable words of the language, then tweak the language morphology if I didn’t like what I saw, and then do the same for the most common two-syllable words (combining the most common start-syllables with the most common end-syllables).
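The inspection lists are easy to reproduce. In this sketch the ratings are invented sample values, and the function names are mine; a syllable's monosyllable score is taken to be its start rating multiplied by its end rating.

```python
import heapq
from itertools import product

def top_n(ratings, n):
    """The n highest-rated keys, most common first."""
    return heapq.nlargest(n, ratings, key=ratings.get)

def common_monosyllables(start, end, n):
    """Rank syllables by start rating times end rating."""
    score = {s: start[s] * end[s] for s in start if s in end}
    return top_n(score, n)

def common_two_syllable_words(start, end, n):
    """Combine the n most common start-syllables with the n most common end-syllables."""
    return [a + b for a, b in product(top_n(start, n), top_n(end, n))]

# Invented sample ratings out of 1000.
start = {"zhu": 900, "est": 400, "con": 650, "zah": 120}
end = {"zhu": 100, "est": 800, "con": 300, "zah": 950}

print(top_n(start, 2))                          # ['zhu', 'con']
print(common_monosyllables(start, end, 2))      # ['est', 'con']
print(common_two_syllable_words(start, end, 2))
```

If any of those lists looks wrong for the race in question, the ratings get adjusted and the lists are regenerated.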

I could also manually override individual results, something I did to ensure that the most common words were not recognizable English and were readily pronounceable.

Step Two: Generating The Language

The initial version of the language generator did nothing more than take a piece of English text, break it into individual words, then generate a new non-human word using the tables of probability of occurrence of individual syllables.
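A rough modern equivalent of that first version is sketched below. The table layout, the sample ratings, and the use of a separate syllable-count table are all assumptions; the only point is the weighted pick, where a higher rating makes a syllable more likely.

```python
import random
import re

def pick(ratings, rng):
    """Weighted random choice: a higher rating is more likely."""
    items = list(ratings)
    weights = [ratings[item] for item in items]
    return rng.choices(items, weights=weights, k=1)[0]

def make_word(tables, rng):
    """Build one word from the first/middle/last syllable tables."""
    count = pick(tables["syllable_count"], rng)
    if count == 1:
        return pick(tables["first"], rng)
    middle = "".join(pick(tables["middle"], rng) for _ in range(count - 2))
    return pick(tables["first"], rng) + middle + pick(tables["last"], rng)

def translate(text, tables, rng):
    """Replace every English word with a freshly generated word."""
    english_words = re.findall(r"[A-Za-z']+", text)
    return " ".join(make_word(tables, rng) for _ in english_words)

# Invented sample tables, ratings out of 1000.
tables = {
    "first": {"zhu": 900, "kor": 400, "tha": 650},
    "middle": {"es": 500, "on": 300},
    "last": {"zah": 950, "est": 800, "uk": 200},
    "syllable_count": {1: 300, 2: 500, 3: 200},
}
rng = random.Random(7)
alien = translate("The sword is sharp", tables, rng)
print(alien)
```

Note that nothing here remembers earlier choices, so a repeated English word gets a different alien word each time it appears.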

I quickly started adding refinements to this basic model.

Remembering a word allocation

The first refinement added seems fairly obvious – keeping a running dictionary of English-to-alien words so that the same English word didn’t end up with three different alien words. Each time an English word was offered for translation, this dictionary was consulted for a prior translation, and if one was found, it was used. Each time a word was generated and matched to an English equivalent, it was compared to the list of alien words already constructed and if it matched an already-allocated word, it was discarded and a new translation-word generated.
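In outline, the running dictionary needs two lookups: English-to-alien for reuse, and a set of alien words already taken. The class and parameter names below are hypothetical, and the retry limit is my own addition so that a too-small syllable table fails loudly instead of looping forever.

```python
import random

class Lexicon:
    """Running English-to-alien dictionary with collision rejection."""

    def __init__(self, make_word):
        self.make_word = make_word  # zero-argument callable returning a candidate word
        self.to_alien = {}          # English word -> alien word
        self.used = set()           # alien words already allocated

    def translate(self, english, max_tries=1000):
        key = english.lower()
        if key in self.to_alien:            # prior translation found: reuse it
            return self.to_alien[key]
        for _ in range(max_tries):
            candidate = self.make_word()
            if candidate not in self.used:  # discard words that are already taken
                self.to_alien[key] = candidate
                self.used.add(candidate)
                return candidate
        raise RuntimeError("no unused word found; the syllable tables may be too small")

rng = random.Random(3)
SYLLABLES = ["zhu", "est", "con", "zah"]  # stand-in for the real generator
lexicon = Lexicon(lambda: rng.choice(SYLLABLES) + rng.choice(SYLLABLES))

sentence = "the tree and the leaf"
print(" ".join(lexicon.translate(word) for word in sentence.split()))
```

Both occurrences of "the" come out as the same alien word, and no two English words share a translation.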

Preloading the translations: a working vocabulary

The second refinement was to preload the translations with a working vocabulary. I started with the 100 most common English words – things like “An”, “And”, “But”, and so on. I then added a “specialist interest” – defining one or two subjects which were fundamental to the race to whom the language belonged. For the elves, it was plants and plant parts and words associated with the plant side of biology such as “grow”, “bud”, “shoot”, “leaf”, and so on. For Dwarves it was minerals and mining and tools relating to that activity. For Orcs it was tactics, and war, and hunting, and so on. These were padded out using a thesaurus – the same one to which I refer to this day – until I reached somewhere between 500 and a thousand English words, which I defined as the language’s working vocabulary. Translating those – with a weighted algorithm to select shorter words – gave me a massive head-start in constructing alien languages. Oh, and I also included words relating to any special abilities the race might have, and any values the race lauded or looked down upon.

Elementary Grammar

After that, I worked out a way to define a basic grammar, which classified each English word as one of four things: nouns, verbs, words relating to a racial specialty, and all other words (which were left in lowercase). This was achieved simply by listing the English text with a number in front of each English word (starting the count at zero and increasing it by one) and then telling the computer which numbered words were nouns, then which ones were verbs, and so on. A further refinement still later dealt with grammatical relationships, connecting verbs with the subject of the action (what it was being done to), the tool or operator (what it was being done with) and so on.
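The numbering-and-tagging step is simple enough to sketch. The category names and the choice to let the racial specialty take priority over noun or verb are assumptions on my part.

```python
def number_words(text):
    """Pair each word with its position, counting from zero."""
    return list(enumerate(text.split()))

def classify(numbered, nouns=(), verbs=(), specialty=()):
    """Return (index, word, category) triples; untagged words are 'other'."""
    result = []
    for index, word in numbered:
        if index in specialty:
            category = "specialty"
        elif index in nouns:
            category = "noun"
        elif index in verbs:
            category = "verb"
        else:
            category = "other"
        result.append((index, word, category))
    return result

numbered = number_words("the elf cut the leaf")
for index, word in numbered:
    print(index, word)  # the listing shown to the user

# The user then reports which numbers are nouns, verbs, and specialty words.
tagged = classify(numbered, nouns={1}, verbs={2}, specialty={4})
print(tagged)
```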

All this permitted me to apply simple non-English grammatical rules to both the order of the words and the spellings – for example, I could set a rule that the subject of a verb always has the first letter of the verb inserted at its start. These rules were necessarily hard-coded into each language – I always intended to work out a soft-coding solution but never got around to it.

A key principle was always to effect these grammatical changes to the English to be translated BEFORE any translation took place so that I could see what was happening, and make sure the ‘rules’ were working properly.
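As an illustration of applying a rule to the English first, here is the "first letter of the verb" rule as a small function. The mapping from verb position to subject position is a hypothetical stand-in for the grammatical-relationship data, with "subject" used as defined above (what the action is being done to).

```python
def mark_subjects(words, verb_subject):
    """Insert the verb's first letter at the start of that verb's subject.

    verb_subject maps a verb's word number to its subject's word number.
    """
    marked = list(words)
    for verb_index, subject_index in verb_subject.items():
        marked[subject_index] = words[verb_index][0] + marked[subject_index]
    return marked

english = ["the", "elf", "cut", "the", "leaf"]
# Word 2 ("cut") acts on word 4 ("leaf").
print(" ".join(mark_subjects(english, {2: 4})))  # prints: the elf cut the cleaf
```

Because the output is still readable English, a rule that misfires is obvious at a glance, before any alien words are involved.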


As part of the elementary grammar project, I inserted rules into the translation algorithm to permit standardized changes to indicate tenses, and grammatical rules to translate the English text into the current tense plus (for future) or minus (for past), and so on. These also grew in sophistication over time, permitting me in the case of a long-lived race to impart nuances such as “the past within my lifetime” and “the past in the time of my parents or ancestors” or “before I am no more” or “after I am gone”.

The general principle

The general principle was to use a body of English text as a test-translation. Each one would introduce some new concept to be taken into consideration, whether that was the relationship between verbs and their associated nouns, or tenses, or whatever; once built into the rules of the translator, I could move on to the next attribute of ordinary English.

One of the big improvements was to subdivide that list of words from key subjects from the initial elementary grammar and use them as metaphors for root words describing more complex, advanced, or subtle terms, in exactly the same way that real languages develop. This was achieved by numbering the English root words from the key subjects and appending the appropriate number to a related English word as a step in a translation, building the language up, one word at a time.

This produced some interesting and insightful language elements along the way – the Elvish words for battle, war, and violence in general became derived from their words for “Spring” (the season) and the competition between plants for sunlight. This provided a key insight into their perception of the world – what a human might see as a peaceful glade became a battleground of unceasing violence between plants, simply because the Elvish perception of time was different, a longer view if you will. This was a random choice on the part of the language-generation software which I could have overridden if it seemed inappropriate – if I had, it would have produced a different but still sensible alternative, resulting in a distinctly different conception of elves.

Similarly, the root word for weapons, battle, and so on, was randomly chosen to be “stick”, with a hyphenated preliminary syllable describing the construction material and a hyphenated subsequent syllable describing the shape of the “stick”, followed by another describing the type of movement or action required to use it. A sword might thus be literally described as a “metal-stick-sharpedged-slash” – which, in Orcish, might be “Zhu-est-con-zah”.
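The compound pattern (material, root, shape, action) can be expressed as a small function. The syllable-to-meaning pairings below are inferred from the order of the parts in the example and should be treated as an assumption, as should the function name.

```python
# Assumed Orcish vocabulary, inferred from the order of parts in the sword example.
ORCISH = {"metal": "zhu", "stick": "est", "sharpedged": "con", "slash": "zah"}

def compound(material, root, shape, action, vocabulary):
    """Build a hyphenated compound: material-root-shape-action."""
    parts = [material, root, shape, action]
    english = "-".join(parts)
    alien = "-".join(vocabulary[part] for part in parts)
    return english, alien.capitalize()

print(compound("metal", "stick", "sharpedged", "slash", ORCISH))
# prints: ('metal-stick-sharpedged-slash', 'Zhu-est-con-zah')
```

Swapping any one part (a wooden stick, a thrusting motion) yields a related word that visibly belongs to the same family.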

The Aging of a language

As time passes, languages become more streamlined, some words pass out of favor and others are introduced to describe new relationships, perceptions, or phenomena. While fully simulating this process was way beyond the program that I wrote, some rudimentary consideration was given to how these phenomena would manifest on words and phrases that were already old.

Initially, I did this manually, simply by saying translated words and phrases quickly (aloud) and seeing how things ran together. If there was a natural divide, where the tongue stumbled, either the language would change to become more sophisticated or the word would change to become more easily pronounced. For example, take that Orcish word for sword – “Zhu-est-con-zah” – either the word for sword becomes “Conzar”, one of a class of objects which can be described as “Zhooest”, or the word itself runs together – “Zoostonga” – and then possibly just “Zoostong”.

But then I found a way to simulate this using a random action within the language generation system itself. The notion is that certain consonant pairs would be deprecated in favor of a simpler combination or just one of the pair. Where these consonant pairs couple two syllables of a word together, the resulting word becomes more streamlined. A random determination – which could be weighted or have its own logical rules applied – would decide whether the language or the word would evolve.
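A minimal sketch of the word-streamlining half of that idea follows. The two rules are invented for the example, and the single `chance` parameter is a simplification of the random determination; the other outcome, where the language changes instead of the word, is not modelled here.

```python
import random

# Invented rules: a consonant pair at a syllable boundary and its simpler replacement.
RULES = {"tc": "t", "nz": "z"}

def streamline(syllables, rules, rng, chance=1.0):
    """Join syllables, simplifying deprecated consonant pairs where they meet."""
    word = syllables[0]
    for syllable in syllables[1:]:
        pair = word[-1] + syllable[0]
        if pair in rules and rng.random() < chance:
            word = word[:-1] + rules[pair] + syllable[1:]
        else:
            word += syllable
    return word

rng = random.Random(1)
sword = ["zhu", "est", "con", "zah"]
print(streamline(sword, RULES, rng, chance=1.0))  # prints: zhuestozah
print(streamline(sword, RULES, rng, chance=0.0))  # prints: zhuestconzah
```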

This language aging was further reinforced by having the rules of the language evolve over time, updating the core tables that are used to generate words. I determined, based on the information in those other sources, that a language would evolve between 0.1% and 0.5% each year. Certainly, that seemed about right for English, where text from a century or two earlier (H. G. Wells or Mary Shelley’s Frankenstein, for example) had a slightly different flavor but was still mostly understandable without difficulty, while the writing of Shakespeare and Chaucer is much less so – and if one goes back a little further still, into early medieval times, it is almost unrecognizable.

Under the principles of word breakdown (language evolution) vs word streamlining, the degree to which a word was subject to streamlining – its age – could be randomly assigned, as could a timeline of the degree of language drift each year. Accumulating drift as a word grew older then gave a percentage chance of that word being streamlined, and the deprecation rules would then indicate how that streamlining would occur within that particular word.
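The drift arithmetic can be sketched as follows. Treating accumulated drift as a plain sum of the yearly figures (rather than compounding them) is an assumption, as are the function names; the 0.1% to 0.5% range is the one quoted above.

```python
import random

def drift_timeline(years, rng, low=0.001, high=0.005):
    """Random yearly drift, between 0.1% and 0.5% for each year."""
    return [rng.uniform(low, high) for _ in range(years)]

def streamline_chance(word_age, timeline):
    """Chance that a word of the given age has been streamlined.

    Sums the drift over the most recent word_age years, capped at 100%.
    """
    recent = timeline[-word_age:] if word_age > 0 else []
    return min(1.0, sum(recent))

rng = random.Random(11)
timeline = drift_timeline(500, rng)
for age in (10, 100, 500):
    print(age, round(streamline_chance(age, timeline), 3))
```

On these numbers a century-old word has somewhere between a 10% and 50% chance of having been streamlined, and anything more than a thousand years old is certain to have changed.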

The Integrated Evolution of History

With this as a basis, it was even possible to indicate “key years for key subjects” – years in which great progress was made in one subject or another, and which therefore had an unusually high chance of language evolution. Words describing agriculture would necessarily become more diverse and specialized with the invention of crop rotation, or irrigation, or any of a number of other developments. Metalworking would similarly have its watershed years.

At first, it seemed like there would be altogether too many such to be useful, but it soon dawned on me that I only ever needed to deal with a small subset of the possible watershed years and subjects. The key point that I had initially overlooked was that I wasn’t constructing a whole language – I was constructing a mechanism that developed a basic core of a language that was extended as necessary to translate key elements of dialogue. I would never need to write a book in Elvish or Dwarvish or whatever – just some key phrases and perhaps a page or two of old text.

This realization made it possible to simply keep track of when certain words evolved, establishing a timeline of changes within the translation for just those subjects touched on in whatever I happened to be translating at the time.
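Keeping that timeline only needs a per-word list of dated forms. In the sketch below the class name and the years are hypothetical, chosen purely to show the lookup; the word forms are the Orcish sword examples from earlier.

```python
from dataclasses import dataclass, field

@dataclass
class WordHistory:
    """Dated forms of one translated word, oldest first."""
    english: str
    forms: list = field(default_factory=list)  # (year, alien form) pairs

    def record(self, year, form):
        self.forms.append((year, form))
        self.forms.sort()

    def form_in(self, year):
        """The form in use in the given year, or None if the word did not exist yet."""
        current = None
        for changed, form in self.forms:
            if changed <= year:
                current = form
        return current

# Hypothetical years of change.
sword = WordHistory("sword")
sword.record(0, "Zhu-est-con-zah")
sword.record(700, "Zoostong")
sword.record(400, "Zoostonga")

print(sword.form_in(500))  # prints: Zoostonga
```

An "old text" can then be translated by asking each word for its form in the year the text was supposedly written.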

In modern times

All this was done long before the internet really reached the masses. I wouldn’t, and don’t, do things this way any more. So how do I create non-human languages in the modern era?

The seeds of the technique have evolved out of the work that I did back in the 90s. But, rather than explain it now – because I’m out of time – I’m going to demonstrate it with material from my Shards Of Divinity campaign.

So here’s what’s going to happen from here: starting in a week or two, and continuing every 2nd week or so until they are finished, I’m going to be presenting one of the national states from that campaign. I’ll be supplying exactly what I gave the players, but where the information was presented to them by subject – everything on politics for all of the Kingdoms at the same time, for example – here, everything will be organized into a kingdom dossier. That includes notes on the naming of characters and instructions on translating the language, which is the whole point of the series – everything else is there to provide context. I’ll round out each one with some discussion on the principles used, and some of the background of the different ideas and why I chose those particular nations – and I might even slip in some additional notes and hints that the players haven’t received yet.

Oh, and I’ll precede the whole thing with a quick introduction to the overall political concept, which is so deep that the players haven’t fully grasped it yet!

So buckle up – this discussion is about to take a left-hand turn at high speed…
