

So You Want To / Create a Conlang


So you've decided to create a conlang, whether to add some depth to your own setting, because you were paid to do so for someone else (lucky!), or maybe just for funsies. Maybe you love reading The Lord of the Rings and watching stuff like Star Trek, Defiance, Game of Thrones, and Thor: The Dark World, and now you want to craft a language of your own.

First, be sure to check out Write a Story for basic advice that holds across all genres. Then, look over a rundown of the genre-specific tropes that will help you, hurt you, and guide you on your way.

The focus of this article is on naturalistic conlangs, that is, languages that aim to look realistic, especially when compared to other languages of the world. Some famous conlangs, such as Esperanto and Lojban, were instead designed to be as regular or logical as possible rather than to feel like natural languages.

A few notes before we begin:

  • The best way to prepare is to learn another language. It will help you get to grips with some of the concepts described here and move beyond the confines of English. As the Language Construction Kit notes, conlangers who only know English are "pretty much doomed to produce ciphers of English."
  • This article uses the International Phonetic Alphabet, or IPA, to represent sounds.
  • The user who started this article, Iura Civium, will probably make reference to some of his languages over the course of the article. If you wish to expand this article, feel free to use examples from your own languages.

Necessary Tropes

Choices, Choices, Choices

There are four stages to creating a conlang:

  1. First, you should start with the phonology, deciding which sounds are used in the language. You can have ideas for other aspects of the language before you start, but you will need the sounds before you can actually build any words.

  2. Once that is done, you can start on the grammar (or syntax). To a first approximation, this is how the language works internally: how you put the pieces together to encode relations between the words. This topic covers things like word order, syntactic alignment, typology, and grammatical number.

  3. Next (or concurrently), you can work on morphology. This is the shape of the functional bits of the words in your language — prefixes, suffixes, infixes, function words, consonant mutation, things of that nature.

  4. Finally, you have the task of creating a lexicon — the actual words.

If you want to, you can also devise a writing system for your language.

Pitfalls

  • Don't try to do or have everything. There are a lot of different sounds and concepts in linguistics, and you shouldn't try to shove all of them into one language. This is not to say you can't have languages that involve a lot of sounds or concepts; just don't bite off more than you can chew. In conlanging circles, this pitfall is known as a "kitchen sink conlang".
  • Think about the pieces you have and how they fit together. You may find that, instead of adding a new piece to the language, you can express an idea using pieces you already have. If sounds change along with the grammar, think about what logical patterns they might fit.
  • It is a good idea to avoid Translation: "Yes". This should be self-explanatory. (Of course, if you're making, say, a joke language, feel free to ignore this.)
  • Don't be afraid to make mistakes or change your language. It is good to have ideas as to where you're going with a conlang, but don't let them crush your creativity. If something happens to be unwieldy, be open to changing it. If something happens to present an interesting option for your language, consider seeing where it goes.

Departments

Phonology

This is the sound set of your language and how the sounds work together. There are two major types of sounds, consonants and vowels, as well as features like stress, tone, and phonation. Beyond individual sounds, there are also syllables, prosody, and sandhi.

For the most part, the focus here is on human languages. If you are designing a language for aliens whose mouths differ from human ones, they may lack consonants that are common in human languages while having consonants that are either rare or completely new. For example, an alien species with no tongue would only be able to make consonants that involve the lips and throat, and would also be limited in which vowel sounds it could make. You may need a more detailed knowledge of how consonants are produced to get a realistic idea of what the differences might be.

If you are designing an alternative writing system for your language, you do not need to create it at this stage (more advice on this later). However, you will need a way to represent the language in the Latin alphabet (romanization). Don't add new letters to the Latin alphabet, although you can use diacritics (markings on the letters) and combinations of letters. The letters used for vowels should be based on the IPA rather than on English spelling.

International Phonetic Alphabet

The IPA is a valuable tool for showing how words in a conlang are pronounced. It works by giving each speech sound a single symbol. For instance, /g/ always represents a hard g sound and never a soft g (or English j) sound, which is represented by /dʒ/. Some sounds (like the previously mentioned /dʒ/) are instead written with two symbols because they are made up of more than one 'pure' sound. For instance, the 'ch' sound is represented as /tʃ/ because it's made up of a 't' and an 'sh' sound. The 'ow' sound (as in now) is represented as /aʊ/ (often simplified to /aw/) because it's a combination of an 'ah' sound gliding into a 'w'-like sound. A compound vowel sound like this is called a diphthong.

IPA is fairly intuitive for consonants. The main oddity for English speakers is /j/, which represents the 'y' in yes. A few other symbols are brought in, such as /ʃ/ for the 'sh' sound. The vowels are trickier. English uses 5 vowel letters (and sometimes 'y' as well) to represent four times as many vowel sounds, about 12 of which aren't diphthongs. On top of that, the IPA vowel letters are based on how the vowels were pronounced in Latin, and English has drifted significantly from this, thanks to the Great Vowel Shift. For instance, the vowel sound in see is represented by /i/, because that's how "i" was pronounced in Latin.

Consonants, vowels, and the sonority hierarchy

There are many, many possible consonants, and slightly fewer vowels. A realistic conlang is unlikely to use exactly the same set of sounds as English. A conlang that is designed to sound exotic or alien, like Klingon, may include some unusual consonants while omitting common sounds like /k/.

One thing you're going to want to do to keep your phonology naturalistic is not to just dump a pile of random consonants and vowels into your language. There tends to be some form of symmetry in consonant inventories, though you don't have to have complete symmetry; gaps in consonant systems are perfectly naturalistic as long as you don't go overboard.
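One rough way to check that kind of symmetry is to lay the inventory out on a place-by-manner grid and list the holes. The Python sketch below is a hypothetical helper (`CHART` and `find_gaps` are made-up names, not a standard tool) with a deliberately tiny chart of three places and two manners; plain "g" stands in for IPA /ɡ/.

```python
# Hypothetical mini chart: (place, manner, voiced) -> IPA symbol.
# Extend it with your own places and manners; plain "g" stands in for /ɡ/.
CHART = {
    ("labial", "stop", False): "p",        ("labial", "stop", True): "b",
    ("alveolar", "stop", False): "t",      ("alveolar", "stop", True): "d",
    ("velar", "stop", False): "k",         ("velar", "stop", True): "g",
    ("labial", "fricative", False): "f",   ("labial", "fricative", True): "v",
    ("alveolar", "fricative", False): "s", ("alveolar", "fricative", True): "z",
    ("velar", "fricative", False): "x",    ("velar", "fricative", True): "ɣ",
}


def find_gaps(inventory):
    """List chart cells left empty in rows and columns the inventory already uses.

    A "row" is a manner+voicing series (e.g. voiced stops) and a "column" is a
    place of articulation. A gap is a missing sound whose row and column are
    both otherwise in use.
    """
    used = {key for key, sym in CHART.items() if sym in inventory}
    rows = {(manner, voiced) for _, manner, voiced in used}
    cols = {place for place, _, _ in used}
    return [
        sym
        for (place, manner, voiced), sym in CHART.items()
        if (manner, voiced) in rows and place in cols and sym not in inventory
    ]


print(find_gaps({"p", "t", "k", "b", "d", "s"}))  # ['g', 'f', 'x']
```

A gap or two is perfectly naturalistic. A long list of holes suggests the inventory was assembled at random.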

Some things to consider about consonants:

  • Voiced or voiceless... On a consonant chart, many consonants come in pairs. For instance, /s/ is the voiceless counterpart of the voiced consonant /z/. Paired sounds like these can sometimes replace each other. Notice how the "s" in "cabs" is pronounced as a "z" because it comes after the voiced "b" consonant, and compare that to how it sounds in "maps". Most languages don't bother with this contrast for nasal consonants, but Icelandic and Welsh, among others, do contrast voiced and voiceless nasals. A voicing contrast for laterals (L-like sounds) and rhotics (R-like sounds) is also rare, but does occur in some languages.
  • ...or aspirated? There is a third option for stop consonants: aspiration. English has both the aspirated /pʰ/ and the unaspirated voiceless /p/. This is the difference between the 'p' sounds in pie and spy. You don't notice, because English doesn't treat the two sounds as contrasting, but some languages do. Mandarin Chinese, in particular, contrasts aspirated and unaspirated stops rather than voiced and voiceless ones. This is why Beijing is written with a b, even though English speakers might think the sound is more like a p; the older spelling Peking reflects this. South Asian languages often have "breathy" voiced aspirated sounds. The bh in Hindustani Bharat, for instance, is pronounced as /b̤/.
  • The "r" sound: The approximant 'r' found in most varieties of English, written /ɹ/, is actually quite rare among the world's languages. Languages more often have a tapped r /ɾ/, employ Trrrilling Rrrs with /r/, or use a guttural /ʁ/. Spanish has two r sounds: the tap in pero ("but") and the trill in perro ("dog").

Vowels are often divided into front vowels ("ee", "eh"), back vowels ("oo", "aw"), and central vowels (which tend to sound duller). "Ah" sounds can count as any of these, though most often as back vowels. Central vowels are rarer, and languages usually have an equal or greater number of front vowels than back vowels.

  • How many vowel sounds a language has can vary considerably. As mentioned already, English has quite a lot of monophthongs (vowel sounds that aren't diphthongs). Most other Germanic languages also have fairly big inventories, with Dutch having 19 monophthongs and 4 diphthongs. Other languages have fewer. Spanish really only has 5, while the late Ubykh got by with 2.
  • Stress-timed languages tend to have more vowels. In a stress-timed language, syllables vary in length depending on stress, and unstressed syllables often use duller vowel sounds than stressed ones. In a syllable-timed language, every syllable takes roughly the same amount of time, which may be part of why Spanish only needs 5 vowel sounds. Compare the English pronunciation of "fajita" (fuh-HEE-tuh) with the Spanish pronunciation (fah-hee-tah).
  • Despite having so many vowel sounds, English lacks a few. These include the "ö" and "ü" sounds of German (actually two pairs of long and short sounds).
  • Some languages, such as Mandarin Chinese, use tone (the pitch a syllable is said with) to distinguish words. Thus "ma" can mean "mother", "hemp", "horse", or "scold", depending on its tone. The closest English comes is intonation, as in the difference between "What." and "What (the hell)?!" In English, though, the pitch changes the speaker's attitude, not the word itself.

If you still can't decide on an inventory, it might help to know that the most common consonants are /p, t, k, s, m, n, l/ and the most common vowels are /a, i, u/. All languages have most of these sounds, and most languages have all of them.

Syllables

A syllable can be divided into two parts: the onset, or everything before the vowel (or syllabic consonant, if you want to get fancy), and the rhyme (or rime), which is everything from the vowel to the end. The rhyme can be further divided into two parts: the nucleus, which is typically a vowel, and the coda, which can be either one or more consonants or nothing. Languages tend to allow larger onsets than codas because, generally speaking, it's easier to hear a consonant when it comes before a vowel than after it. However, there are exceptions. Turkish, Persian, and Arabic generally allow only one consonant at the beginning of a syllable while allowing multiple consonants at the end, and under at least one analysis, Arrernte syllables have no onsets at all, only codas.
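The onset/nucleus/coda split is also the usual starting point for a random word generator. The sketch below assumes a made-up inventory (`ONSETS`, `NUCLEI`, and `CODAS` are hypothetical) and a simple (C)V(C) template. It uses more onsets than codas to match the typical pattern described above. It does not check anything across syllable boundaries, so it can produce vowel sequences like "aa" that a real phonology might forbid.

```python
import random

# Hypothetical inventory; swap in your own sounds.
ONSETS = ["p", "t", "k", "m", "n", "s", "l", "pl", "tr", "kr"]  # clusters allowed
NUCLEI = ["a", "e", "i", "o", "u"]
CODAS = ["n", "s"]  # fewer codas than onsets, as is typical


def make_syllable(rng, onset_prob=0.8, coda_prob=0.3):
    """Return one syllable as (onset, nucleus, coda); "" means the slot is empty."""
    onset = rng.choice(ONSETS) if rng.random() < onset_prob else ""
    nucleus = rng.choice(NUCLEI)
    coda = rng.choice(CODAS) if rng.random() < coda_prob else ""
    return onset, nucleus, coda


def make_word(rng, min_syllables=1, max_syllables=3):
    """Return a word as a list of (onset, nucleus, coda) syllables."""
    count = rng.randint(min_syllables, max_syllables)
    return [make_syllable(rng) for _ in range(count)]


def spell(word, sep=""):
    """Join the syllables, optionally with a separator to show syllable breaks."""
    return sep.join(onset + nucleus + coda for onset, nucleus, coda in word)


rng = random.Random(42)  # fixed seed so the list is reproducible
for _ in range(5):
    word = make_word(rng)
    print(spell(word), spell(word, sep="-"))
```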

Prosody

Prosody is the rhythm and intonation of speech. American English, for example, tends to raise its pitch at the end of yes/no questions, and stress can distinguish words (think próceeds, the noun, versus procéeds, the verb).

Sandhi

Sandhi covers the various sound changes that occur at morpheme and word boundaries. Examples include the different ways the -s plural can be pronounced in English depending on the sound before it, and the way English uses 'a' or 'an' depending on the word that follows. Others are the way French drops most word-final consonants except before a following vowel (liaison), the way Japanese will sometimes substitute 'g' for 'k' in compound words like hiragana (hira + kana), and so on.
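The English plural is a handy worked example of sandhi, because the choice of ending is fully predictable from the last sound of the noun. This is a minimal Python sketch that assumes nouns are already given as lists of IPA phonemes (a simplified transcription). It covers regular nouns only; irregulars like children and sheep would need a lookup table. Some transcriptions write the third ending as /əz/ rather than /ɪz/.

```python
# Sibilants (in IPA), after which the plural is /ɪz/.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
# Voiceless non-sibilant consonants, after which the plural is /s/.
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_suffix(final):
    """Choose the regular English plural ending from the noun's last sound."""
    if final in SIBILANTS:
        return ["ɪ", "z"]   # bus -> buses, church -> churches
    if final in VOICELESS:
        return ["s"]        # map -> maps, cat -> cats
    return ["z"]            # voiced sounds, vowels included: cab -> cabs, bee -> bees


def pluralize(phonemes):
    """Pluralize a regular noun given as a list of IPA phonemes."""
    return phonemes + plural_suffix(phonemes[-1])


print(pluralize(["m", "æ", "p"]))  # ['m', 'æ', 'p', 's']
print(pluralize(["k", "æ", "b"]))  # ['k', 'æ', 'b', 'z']
print(pluralize(["b", "ʌ", "s"]))  # ['b', 'ʌ', 's', 'ɪ', 'z']
```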

Morphology

Morphology is the study of how words are formed: how they are inflected, and how they are derived from one another.

Inflection and Derivation

Inflectional morphology changes the grammatical function or role of a word. For example, in English one can change the word "walk" to the past tense by adding the suffix "-ed" ('walked') or to the progressive aspect with "-ing" ('walking'). Inflection may also be irregular, as when "run" becomes "ran" in the past tense.

Derivational morphology changes the meaning or word class of a word. To describe someone who writes habitually, you can add "-er" to the verb "write" to make the noun "writer," and to describe giving glory, you can add "-ify" to "glory" to make the verb "glorify." Some derivations stay within the same category, such as making the adjective "reddish" from the adjective "red," or the verb "encircle" from the verb "circle." Multiple affixes can go onto a word, such as turning "conceive" (think of) into "inconceivable" (cannot be thought of).

Types of Morphology

There are four main types of morphological typology: isolating, fusional, agglutinative, and polysynthetic. These are not strict categories, but rather a spectrum from strictly analytic to extremely synthetic.

Isolating

An isolating language usually expresses morphemes as separate words in their own right rather than as affixes on other words. Many East and Southeast Asian languages, such as the Chinese languages and Vietnamese, are largely isolating, with nearly every morpheme standing as its own word. Rather than using an affix, Mandarin uses particles like le to mark grammatical information such as a completed action.

Agglutinative

Agglutinative languages inflect and derive words with affixes that each carry their own independent meaning. Turkish, Korean, and Quechua, for instance, add information to their words by adding affixes for number, case, and more. For example, Turkish "evlerime" means "to my houses", with the plural suffix -ler, the possessive suffix -im, and the dative suffix -e. Agglutinative languages are generally regular with few exceptions.
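Because each Turkish suffix carries one meaning, building "evlerime" is just a matter of stacking suffixes in a fixed order. The sketch below is a simplification that assumes standard Turkish spelling. It applies vowel harmony (two-way for the plural and dative, four-way for the possessive) but ignores consonant alternations (such as kitap becoming kitabı) and loanword exceptions. The function names are illustrative.

```python
FRONT = set("eiöü")
BACK = set("aıou")
ROUNDED = set("öüou")
VOWELS = FRONT | BACK


def last_vowel(word):
    """Find the last vowel, which decides the harmony of the next suffix."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def plural(word):
    # Two-way harmony: -ler after front vowels, -lar after back vowels.
    return word + ("ler" if last_vowel(word) in FRONT else "lar")


def poss_1sg(word):
    # "my": just -m after a vowel, otherwise -im/-ım/-üm/-um (four-way harmony).
    if word[-1] in VOWELS:
        return word + "m"
    v = last_vowel(word)
    if v in FRONT:
        high = "ü" if v in ROUNDED else "i"
    else:
        high = "u" if v in ROUNDED else "ı"
    return word + high + "m"


def dative(word):
    # "to": -e/-a (two-way harmony), with a buffer -y- after a vowel.
    suffix = "e" if last_vowel(word) in FRONT else "a"
    return word + ("y" + suffix if word[-1] in VOWELS else suffix)


def inflect(stem, *suffixes):
    """Apply suffix functions in order, one meaning per step."""
    for add_suffix in suffixes:
        stem = add_suffix(stem)
    return stem


print(inflect("ev", plural, poss_1sg, dative))    # evlerime   "to my houses"
print(inflect("okul", plural, poss_1sg, dative))  # okullarıma "to my schools"
```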

Fusional

Fusional languages often carry multiple meanings within a single morpheme. This is found in many Indo-European languages like German and Latin, and Semitic languages like Arabic and Hebrew. In Spanish, the word hablo means "I speak," with the suffix -o meaning first person, singular number, and present tense.
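A fusional ending, by contrast, cannot be taken apart: the -o of hablo is not a "first person" piece plus a "singular" piece plus a "present" piece. A lookup table keyed on several features at once captures this. The sketch covers only regular -ar verbs in the present indicative, including Spain's vosotros form (habláis). The function name is hypothetical.

```python
# Present-indicative endings for regular Spanish -ar verbs. Each ending
# encodes person, number, tense, and mood at once; none can be split further.
AR_PRESENT = {
    (1, "sg"): "o",   (1, "pl"): "amos",
    (2, "sg"): "as",  (2, "pl"): "áis",
    (3, "sg"): "a",   (3, "pl"): "an",
}


def conjugate_ar(infinitive, person, number):
    """Conjugate a regular -ar verb in the present indicative."""
    if not infinitive.endswith("ar"):
        raise ValueError("only regular -ar verbs are covered")
    stem = infinitive[:-2]
    return stem + AR_PRESENT[(person, number)]


print(conjugate_ar("hablar", 1, "sg"))  # hablo
print(conjugate_ar("hablar", 3, "pl"))  # hablan
```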

Fusional morphology is compact, since a single ending can convey several pieces of information at once, but the endings usually have to be learned by rote. Nouns and verbs are often split into several declensions or conjugations, each with its own set of endings. French verbs can have up to about 40 different forms, and Latin verbs can have over 100. In Spanish and Latin, subject pronouns are usually dropped because the verb ending makes it clear what the subject is.

Fusional languages are also more likely to have grammatical irregularity, and the most frequently used verbs are often the irregular ones. Some irregular verbs do follow a pattern, but it may be a rare one, or hard to spot unless you already know it. Occasionally a verb is irregular because forms of originally different verbs merged into one paradigm (like to be in English, or go/went); such verbs are called suppletive.

Polysynthetic

Polysynthetic languages are found mainly in North America; examples include Nahuatl, Navajo, and Greenlandic. A single word can carry many affixes at once, and often expresses what English would need a whole sentence to say.

While these are useful categories, they are not sharply delineated. Despite belonging to the largely fusional Indo-European family, English leans towards being analytic. Most verbs have only four forms (walk, walks, walking, walked). Some have five (do, does, did, done, doing), and others have three (put, puts, putting). The odd one out is to be, which has eight. A lot of grammatical information is conveyed by auxiliary (helping) verbs, either with the main verb inflected (I was walking) or not (I will walk).

In analytic languages, by contrast, verbs may not change at all. Chinese uses separate words to mark whether an action is in the past or present, but also often relies only on the context. In English, the present tense is sometimes used to talk about future events ("We are catching the train at 10 o'clock tomorrow."). Chinese goes further by not actually having a future tense.

Grammar

If words are the building blocks of language, then grammar is the mortar that holds them together. It tells the speaker how to combine words in ways that make sense, using specific categories. These can be roughly divided into action words (verbs) and substantive words (nouns, pronouns, modifiers, etc.), and different languages carve up these categories in ways a native English speaker may not expect.

Nouns

A noun names a person, place, thing, or idea. Nouns can be the subject of the sentence (the one doing the action) or the object (the one on the receiving end of the action).

Number

In English, as in many languages, a noun changes form when it is plural. A few nouns have an irregular plural (children) or no distinct plural at all (sheep), but most follow a regular set of rules. Most European languages have singular and plural nouns, but some, like Welsh, also use collective-singulative marking. In this system the plural (or collective) is unmarked and a single item is marked instead, as in coed "trees" vs. coeden "tree".

Other languages have additional numbers, like Modern Standard Arabic's dual for two of something or Kurmanji's paucal for a few of something; Na'vi has a mandatory trial number for exactly three of something. The dual is by far the most common of these extra numbers, since many things, such as parts of the body, come in natural pairs. Even languages without a grammatical dual often make the distinction in their vocabulary; think both vs. all, between vs. among, either vs. any.
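Choosing a grammatical number can be modeled as picking a category from a count. In this sketch, the system labels ("sg-pl" and so on) are made up for illustration. Zero is treated as plural, as in English, which is an assumption other languages need not share. Real languages add complications, such as special noun forms after particular numerals.

```python
# Hypothetical number systems, listed from the smallest exact number upward;
# the last category is always the open-ended plural.
SYSTEMS = {
    "sg-pl": ["singular", "plural"],
    "sg-du-pl": ["singular", "dual", "plural"],
    "sg-du-tr-pl": ["singular", "dual", "trial", "plural"],
}


def number_category(count, system):
    """Pick the grammatical number for `count` things under `system`."""
    categories = SYSTEMS[system]
    exact = categories[:-1]          # singular, dual, trial: exactly 1, 2, 3
    if 1 <= count <= len(exact):
        return exact[count - 1]
    return categories[-1]            # everything else, including zero (an assumption)


for n in range(5):
    print(n, number_category(n, "sg-du-tr-pl"))
```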

Gender

Most people who have learned a foreign language will already have met gender marking. Many languages, like French and Spanish, divide nouns into 'masculine' and 'feminine' categories, even when they are inanimate objects. Others, like German, Russian, and Latin, add 'neuter' as a third gender. Still others, like Swedish and Dutch, have genders that are not based on masculine and feminine. Some have 'animate' and 'inanimate' genders, and Bantu languages like Swahili have up to 18 genders (usually called noun classes).

Although nouns that clearly refer to one sex tend to take the corresponding grammatical gender, gender often has more to do with the noun's ending. The German word for "girl" (Mädchen) is neuter because all German words that end with the diminutive suffix -chen are neuter. Where a language has gender, it tends to affect the articles a noun takes, the forms of its adjectives, and the pronouns that replace it. Occasionally it can affect verbs as well.

Case

Some languages inflect nouns according to their role in a sentence; these inflections are called cases. A simple example is Esperanto, where the suffix -n is added to a noun that is the direct object of a sentence. This is called the accusative case, and the bare, uninflected form is called the nominative. Another example is how English adds -'s to a noun to show possession. This possessive form is formally called the genitive case.

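Other languages organize these roles differently. In Basque, for instance, the subject of an intransitive verb and the direct object of a transitive verb are both marked with the absolutive case. The agent of a transitive verb is marked separately, with the ergative. And where the genitive marks the possessor, Persian marks the possessed noun instead. That noun takes the suffix -(y)e, called the ezafe, before the possessor's noun, as in ketāb-e man "my book", much like the construct state of Semitic languages.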

Other languages have more complex case systems. For example, German distinguishes between the accusative for direct objects and the dative for indirect objects. Latin also distinguishes the ablative to show source or means, a vestigial locative to show location, and a vocative for direct address, across five separate declensions. Russian uses the prepositional case for the objects of certain prepositions and the instrumental for means and for the agent of a passive verb. Dravidian languages like Malayalam and Tamil add the comitative to show accompaniment or association.
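Esperanto's endings are regular enough to express as a tiny function, and they also show why case marking frees up word order. This is a minimal sketch covering nouns only; adjectives end in -a and agree with -j and -n in the same way, which is left out here. The function name is hypothetical.

```python
def esperanto_noun(root, plural=False, accusative=False):
    """Build an Esperanto noun: root + -o (noun) + -j (plural) + -n (accusative)."""
    word = root + "o"
    if plural:
        word += "j"
    if accusative:
        word += "n"
    return word


# "The dog sees the cat" in two orders: the -n on "katon" marks the object
# either way, so both sentences mean the same thing.
dog = esperanto_noun("hund")
cat = esperanto_noun("kat", accusative=True)
print(f"La {dog} vidas la {cat}.")  # La hundo vidas la katon.
print(f"La {cat} vidas la {dog}.")  # La katon vidas la hundo.
```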

Pronouns

Pronouns are words that replace or clarify the noun.

Personal Pronouns

Most languages have separate singular and plural pronouns, and also separate ones for each grammatical 'person':

  • 1st person: I or we
  • 2nd person: (Thou or) You
  • 3rd person: He, she, it, or They

Pronouns often keep more inflection than the nouns they replace. Sometimes pronouns change for gender even where nouns are not marked for it. As in English, a language that lacks grammatical gender, this is often limited to the third person; Arabic, a language with grammatical gender, also distinguishes it in the second. Pronouns usually change for case (I for a subject, me for an object), even in languages that do not have noun cases. For example, Spanish, which lacks noun cases, distinguishes up to six forms of each pronoun: the nominative (e.g. yo, él), prepositional (mí, él/sí), accusative (me, lo), dative (me, le), genitive (mío/a, suyo/a), and comitative (conmigo, consigo).

Some languages, such as the Algonquian languages, distinguish between a more central third person (proximate) and a less central one (obviative); this is called obviation. One of the more common distinctions is whether 'we' includes the addressee or not, like Malay's kami (we, excluding you) vs. kita (we, including you). Formality is also a common part of such systems. Think of Spanish tú vs. usted, where the former is used for familiarity or with subordinates, and the latter for politeness or with superiors. Some dialects also use vos, which is even more informal than tú.

Demonstrative, Relative, Interrogative Pronouns

Demonstratives are words that point out where something is. In English, "this" and "these" refer to objects near the speaker (proximal), whereas "that" and "those" refer to objects farther away or less relevant (distal). Spanish has a three-way distinction between proximal "este/a," medial "ese/a," and distal "aquel/la." Demonstratives can also be used as determiners: while "this" can stand on its own, it can also modify a noun, as in "this dog" or "this article." Hindustani and Turkish use demonstrative pronouns as third-person pronouns.

In English, relative and interrogative words generally share the same forms: "when" is interrogative in "When did this happen?" but relative in "I was there when it happened." Hindustani, however, keeps them separate. Interrogative kahaan means "where?", while relative jahaan means "where" in the sense of "the place where", and this k-/j- pattern runs through most such words.

Articles

Articles are words that mark a noun as definite or indefinite. Words meaning "the" are definite articles, and words for "a"/"an" are indefinite articles. Some languages, like French, also have partitive articles for a part of something, and German has a negative article (kein) for none of something. Some languages, but not English, change the article depending on whether the noun is singular or plural, its gender, and (more rarely) its case; think of how Spanish has "el, la, los, las," which all mean "the." Most often, articles are derived from demonstratives or numbers: English "the" is related to "that," and German "ein" (a/an) is related to "eins" (one).

Although articles are very common words in the languages that have them, many other languages don't use them; Russian, Chinese, and Japanese are a few examples. When speakers of languages without articles learn ones with them, the distinction may seem arbitrary: what difference really is there between "the thing" and "this/that thing"? Conversely, French requires an article with nearly every noun that isn't a name. It also has another set of articles for uncountable nouns (e.g. you can't just say "add sugar" in French; you have to say "add some sugar"). The Scandinavian languages mark definiteness mainly with suffixes on the noun, as in Swedish hunden "the dog", rather than with a separate article.

Verbs

Verbs are the soul of grammar. Not only are they the action words of a sentence, but they also tend to show when an action happens (tense), how it unfolds over time (aspect), the speaker's attitude toward it (mood), and the source of the speaker's information about it (evidentiality). Depending on the language, a verb may be inflected to agree with the subject, with both the subject and the direct object, or with neither.

Tense

Tense places an action in time. English speakers are generally taught that the language has three tenses: past ("ate"), present ("eat"), and future ("will eat"). Technically, though, it only has two inflected tenses, past and non-past. Some languages distinguish between the recent and remote past. In Italian, for example, there is the passato prossimo, found in hai amato (you loved recently), and the passato remoto, found in amasti (you loved).

Relative tenses locate an action relative to another point in time rather than to the moment of speaking. The past perfect, or pluperfect, recounts a past event from the point of view of another past event (think English "I ate" vs. "I had eaten"): past-in-the-past. The future-in-the-past is the opposite: a future event seen from the point of view of a past event ("I knew I would eat"). The future perfect ("I will have eaten") is past-in-the-future, and future-in-the-future follows the same logic.
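The relative tenses can be pinned down by placing three points on a timeline: when the event happens, the reference point it is viewed from, and the moment of speaking. This loosely follows Hans Reichenbach's analysis of tense. The function below is a hypothetical illustration that just compares numbers, not a model of any particular language.

```python
def _direction(time, anchor):
    """Say whether `time` lies before ("past"), after ("future"), or at `anchor` (None)."""
    if time < anchor:
        return "past"
    if time > anchor:
        return "future"
    return None


def relative_tense(event, reference, speech):
    """Name the tense for an event, viewed from a reference point, relative to speech.

    All three arguments are positions on a timeline (smaller = earlier).
    """
    frame = _direction(reference, speech) or "present"
    relation = _direction(event, reference)
    if relation is None:
        return frame                      # simple past, present, or future
    return f"{relation}-in-the-{frame}"


print(relative_tense(1, 2, 3))  # past-in-the-past      ("I had eaten")
print(relative_tense(2, 1, 3))  # future-in-the-past    ("I knew I would eat")
print(relative_tense(2, 3, 1))  # past-in-the-future    ("I will have eaten")
print(relative_tense(3, 2, 1))  # future-in-the-future
```

The same comparison also yields "past-in-the-present" when the reference point is the moment of speaking, which is roughly the English present perfect ("I have eaten").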

Not every language has tense. In analytic languages like Mandarin, the verb carries no tense marking at all. The same applies to Malay, which, despite having various derivational affixes, marks the time of an action only through context or a temporal adverb like masih ("still") or sudah ("already").

Aspect

A realistic conlang is likely to handle tense and aspect differently from English. English has two ways of expressing the present, distinguished by aspect: where "I stop" signifies a complete action, "I am stopping" signifies one still in progress. The former is perfective and the latter is imperfective. English subdivides the imperfective further in the past, where you can say "I was stopping" (progressive or continuous) or "I used to stop" (habitual). Few languages use the progressive in the present as often as English does, and many have none at all. German, for instance, has no grammatical progressive, while Hindustani distinguishes perfective, continuous, and habitual in both the present and the past. Some Mayan languages even rely on nothing but imperfective and perfective aspect for telling stories.

Another aspect, the perfect, marks an action's continuing relevance to a later situation. It has also been called the retrospective to contrast it with the perfective, since the two are often completely different things. In Germanic and Romance languages it is formed with "have" (or the equivalent word) plus a past participle; in French and Italian it has essentially become the main way to talk about the past. The perfect can also be inflected for past and future tenses, becoming the past perfect (or pluperfect) and the future perfect.

Generally, the further in the past a tense is, the more aspects it distinguishes, at least grammatically. Standard American and British English, for example, have a dedicated habitual form only in the past: you can say "I used to run" but not "I use to run" or "I will use to run". English does, however, distinguish simple and progressive forms in the future ("I will run" vs. "I will be running").

Mood

However, not everything a verb describes falls neatly into a present-past-future paradigm. What about unreal events? This is where grammatical mood comes in. Most of what has been mentioned so far is indicative or declarative, describing real events that happen at some time. Irrealis moods, which describe events that are not (or not yet) real, can be divided into three rough categories.

Epistemic moods revolve around what could or might be: hypothetical or uncertain actions. Moods like this include the subjunctive, which is widely used in Indo-European languages, and the potential, found in Uralic languages and Japanese. In English the subjunctive survives mostly in "that" clauses (e.g. "I insist that he stop") and set phrases (e.g. "If I were you"), but it sees more use in Romance, Germanic, and Iranian languages, among others. It is often used instead of an infinitive when the embedded clause has a different subject from the main clause. In languages such as Arabic and Modern Greek, it replaces the infinitive altogether.

Deontic moods revolve around what should be, and include imperatives, jussives, and optatives. Imperatives in English use the bare form of the verb: "Stop!" Jussives essentially function as first- and third-person imperatives, in the sense of "he should stop" or "I should stop," though languages like Arabic use them more broadly, for example after lam to negate the past. Optatives express wishes or hopes (e.g. "May you stop"), and can overlap somewhat with subjunctives.

The conditional revolves around what would be if a certain condition were met. In Romance and Germanic languages, this is often expressed using future verbs inflected for past tense; "would" is actually the past tense of "will" in English. However, languages like Hindustani prefer to express it as the past tense of a subjunctive verb; in that sense, it functions more as a counterfactual mood.

In many languages, only one or two irrealis moods get their own conjugations. Most often these are the imperative and, if there is a dedicated irrealis form, the subjunctive or conditional. Where the jussive and optative don't exist as separate forms, they are often expressed through uses of the subjunctive and/or imperative. In the Spanish "Viva la revolución," the subjunctive form of "vivir" is used in an optative sense: it literally means "May the revolution live," or more idiomatically "Long live the revolution." In addition, irrealis moods often distinguish fewer tenses than the realis moods. Portuguese, for instance, distinguishes five tenses in the indicative but only three in the subjunctive.

Evidentiality

Although it is rarer than tense, aspect, and mood, some languages mark evidentiality: the evidence the speaker has for something that happened, happens, or will happen. It can be seen in the contrast between the direct and inferential past in Turkish, and in a three-way distinction in Quechua between direct evidence, inference, and mere hearsay. Romance languages also have this in a sense, using the conditional mood instead of the present indicative to mark inference, while Hindi has a distinct presumptive mood to say something along the lines of "this might be."

Non-finite Forms

Verbs can often become nouns by using infinitives, gerunds, and supines ("to know me is to love me," the changing of the guard) and adjectives using participles (a stopped clock, a speeding driver).

Adjectives, Adverbs, and Adpositions

Adjectives describe a noun. In English they are simple, since they never change form to match their noun. Many languages, however, have adjectives that inflect to agree with the noun. In most European languages, if the noun is plural, the adjective also has to take a plural form. In languages that have gender marking, adjectives usually inflect for the noun's gender too, and some even inflect for the noun's case. In some languages, like Persian, adjectives attach to the noun much as a genitive does: "red apple" in Persian is built like "apple of red." In other languages, the role of adjectives is filled by verbs. This is the case in many Native American languages, such as Lakota, which does not have an adjective for "blue"; instead, it has a stative verb meaning "to be blue." Sometimes a language may have both; think of the participles mentioned earlier.

Adverbs modify almost anything other than a noun: verbs, adjectives, other adverbs, or whole clauses. The rules for them are usually quite simple. The most important thing to note is that most languages have a way to convert adjectives into adverbs. English adds -ly, while French adds -ment, though both languages also have adverbs that are not formed this way. German, on the other hand, often allows adjectives to be used as adverbs without any change.

Adpositions are words that describe relations between a noun and the rest of the sentence. They tend to be used more often in languages that do not have extensive case systems. Even so, while Latin had six cases for each noun, prepositions such as "cum," "ad," "ab," and "in" filled semantic roles that the case system could not. In fact, some prepositions mean different things with different cases: while "in astra" (accusative) means "into the stars," "in astrīs" (ablative) means "in the stars." While Hindustani has only two true cases, nominative and oblique, its postpositions function essentially as case markers. "Ne" marks the ergative case, "ko" marks the accusative and dative, "kaa" marks the genitive, "mẽ" and "par" mark two different locatives ("in" and "on"), and so on.

It is worth noting that while English adjectives and adpositions come before the noun (prepositive), many languages prefer the opposite order. Adjectives in the Romance languages usually follow the noun, although adverbs modifying them still come first (Spanish una casa muy grande, 'a very big house'). Turkish uses postpositions rather than prepositions.

Word Order

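Given three grammatical categories (Subject, Verb, and Object), there are six main word orders:
  • SOV. The most common order worldwide. Think Japanese, Turkish, or Hindi.
  • SVO. Nearly as common. Think English, Mandarin, or Swahili.
  • VSO. Think Irish or Welsh.
  • VOS. Think Malagasy.
  • OVS. Very rare. Think Hixkaryana.
  • OSV. The rarest, and rarely if ever the basic order of a natural language.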

A language can also have no dominant word order. Latin, especially Latin poetry, allows this to happen; the noun's case endings make it clear which is which.
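Because the six orders are just every possible arrangement of three elements, a short Python sketch can generate them all. The glossed sentence ("dog bites man") and the function names are made up for illustration.

```python
from itertools import permutations

# Hypothetical glossed sentence: the roles stay fixed, only their order changes.
ROLES = {"S": "dog", "V": "bites", "O": "man"}

def all_word_orders() -> list:
    """Return the six basic word orders as labels such as 'SOV'."""
    return ["".join(p) for p in permutations("SVO")]

def arrange(order: str) -> str:
    """Lay out the glossed sentence according to a word-order label."""
    return " ".join(ROLES[slot] for slot in order)

for order in all_word_orders():
    print(f"{order}: {arrange(order)}")
```

Without case marking, "dog man bites" and "man dog bites" are ambiguous, so such a language needs a fixed order. With Latin-style case endings, every one of the six lines could be grammatical.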

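Some languages are instead topic-prominent (also called topic-comment). In these, sentences are organized around a topic (what the sentence is about), which need not be the same thing as the grammatical subject. Japanese and Mandarin are commonly cited examples. Wikipedia has examples of possible word orders in such languages.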

Alignment

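It is useful to define several relevant terms for this section:
  • Agent (A): the doer with a transitive verb, such as the dog in "the dog bit the man."
  • Experiencer (E): used in this article for the sole argument of an intransitive verb, such as the dog in "the dog slept" (many linguists label this role S).
  • Object: in English, the noun a transitive verb acts on; it corresponds to P.
  • Patient (P): the one undergoing the action of a transitive verb, such as the man in "the dog bit the man."
  • Subject: in English, the noun that agrees with the verb and normally comes before it; it covers both A and E.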

There are several possible ways to align your syntax:

  • Nominative-accusative. A and E are marked the same (nominative), P is marked differently (accusative). Think English.
  • Ergative-absolutive. P and E are marked the same (absolutive), A is marked differently (ergative). Think Basque.
  • Transitive-intransitive. A and P are marked the same (transitive), E is marked differently (intransitive). Very rare—think Rushani.
  • Tripartite. A (agent), E (experiencer), and P (patient) are all marked differently. Think Na'vi.
  • Split-ergative. Appears as ergative-absolutive in some contexts and nominative-accusative in others, typically depending on the tense or aspect of the verb. Think Hindi, where ergative marking appears with transitive verbs in the perfective.
  • Austronesian (a.k.a. Philippine-type). One noun in the clause takes a "direct" marking as the pivot (loosely, the subject), the others take an "indirect" marking, and the verb tells you what role the pivot plays. Think Tagalog.
  • Active-stative. The sole argument of an intransitive verb (E) is marked either like an A or like a P, depending on certain conditions. There are two subtypes:
    • Split-S, where the marking is a quirk of the particular verb in question. Think Guaraní.
    • Fluid-S, where either marking is possible, with a difference in connotation depending on whether the noun is marked like an A or like a P. Think Crow.
    • In these languages, the patient-like marking is most often unmarked, whereas the agent-like marking is overt, much as in ergative alignment.
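One way to see the differences between these alignments is as a lookup table from role to case. The sketch below is a simplification: the case labels (NOM, ERG, TR, and so on) are illustrative glosses rather than forms from any real language, and `split_ergative` only models a Hindi-style aspect split in the crudest way.

```python
# Core roles: A = agent of a transitive verb, E = sole argument of an
# intransitive verb, P = patient of a transitive verb.
# The case labels are illustrative glosses, not taken from any one language.
ALIGNMENTS = {
    "nominative-accusative": {"A": "NOM", "E": "NOM", "P": "ACC"},
    "ergative-absolutive": {"A": "ERG", "E": "ABS", "P": "ABS"},
    "transitive-intransitive": {"A": "TR", "E": "INTR", "P": "TR"},
    "tripartite": {"A": "ERG", "E": "INTR", "P": "ACC"},
}

def mark(noun: str, role: str, alignment: str) -> str:
    """Attach the case label the given alignment assigns to this role."""
    return f"{noun}-{ALIGNMENTS[alignment][role]}"

def shares_marking_with_e(alignment: str) -> set:
    """Which transitive roles (A, P) are marked the same way as E."""
    cases = ALIGNMENTS[alignment]
    return {role for role in ("A", "P") if cases[role] == cases["E"]}

def split_ergative(noun: str, role: str, perfective: bool) -> str:
    """Crude split: ergative pattern in the perfective, accusative otherwise."""
    alignment = "ergative-absolutive" if perfective else "nominative-accusative"
    return mark(noun, role, alignment)

# "The dog slept" vs. "the dog bit the man" under ergative alignment:
print(mark("dog", "E", "ergative-absolutive"))  # dog-ABS
print(mark("dog", "A", "ergative-absolutive"))  # dog-ERG
print(mark("man", "P", "ergative-absolutive"))  # man-ABS
```

`shares_marking_with_e` returns `{"A"}` for nominative-accusative and `{"P"}` for ergative-absolutive. For transitive-intransitive and tripartite it returns an empty set, because E stands alone in both.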

Animacy Hierarchy

Some languages, such as English, don't really have these, but there are those that do, such as Navajo. An animacy hierarchy is a set of rules governing what subjects can act on what objects or what actors can have certain roles in a sentence. Wikipedia gives the typical hierarchy as:

  1 > 2 > 3 > names > people > nonhuman animates > inanimates

where "1", "2", and "3" stand for the first, second, and third grammatical persons. There are exceptions, however. Indigenous American languages, like the aforementioned Navajo, often place second-person pronouns higher in animacy than first-person pronouns. Split-ergative languages may also use the animacy hierarchy to decide where the split between ergative and accusative marking falls.

In essence, a noun can appear in the agent role only if the patient, if there is one, is in the same tier of the hierarchy or lower. For example, a general word for a person can appear as the agent if an inanimate object is the patient, but not the other way around. There are several ways to get around this. You can have suppletive words that appear higher on the hierarchy than their referents "should." You can invent morphological processes that keep the roles or order of the words the same but change the meaning. Or you can forbid such sentences entirely and have these concepts handled with semantics or circumlocutions.
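The rule above boils down to a rank comparison. Here is a minimal Python sketch that also includes one hypothetical repair strategy: when the agent ranks too low, the patient is fronted and the verb takes a made-up inverse suffix "-INV". The tier labels and the suffix are this example's own inventions. To model a language that ranks the second person above the first, reorder `HIERARCHY`.

```python
from typing import Optional

# The hierarchy from above, highest first. Tier names are this sketch's own labels.
HIERARCHY = ["1", "2", "3", "name", "person", "nonhuman animate", "inanimate"]
RANK = {tier: position for position, tier in enumerate(HIERARCHY)}  # 0 = most animate

def can_be_agent(agent_tier: str, patient_tier: Optional[str] = None) -> bool:
    """An agent may act on a patient in the same tier or a lower one."""
    if patient_tier is None:  # intransitive: nothing to outrank
        return True
    return RANK[agent_tier] <= RANK[patient_tier]

def build_clause(agent: str, agent_tier: str, patient: str, patient_tier: str, verb: str) -> str:
    """Hypothetical repair: if the agent ranks too low, put the patient first
    and mark the verb with an invented inverse suffix so the meaning is kept."""
    if can_be_agent(agent_tier, patient_tier):
        return f"{agent} {verb} {patient}"
    return f"{patient} {verb}-INV {agent}"

print(build_clause("woman", "person", "rock", "inanimate", "lift"))  # woman lift rock
print(build_clause("rock", "inanimate", "woman", "person", "hit"))   # woman hit-INV rock
```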

Typology

The morphological typology of a language describes how its words are built: how many morphemes a word tends to contain, and how those morphemes map onto meanings. It's best to imagine it as a triangle. In one corner, there are isolating languages (Chinese dialects, Hawaiian, most Southeast Asian languages, etc.), which have very few inflections and rely instead on word order and separate function words.

In another corner, there are agglutinating languages (Japanese, Nahuatl, Turkish, etc.), in which each morpheme tends to carry a single meaning (e.g., one affix for case and another for number).

And in the third corner, there are fusional languages (Romance languages, Germanic languages, Semitic languages, etc.), in which a single morpheme often carries several meanings at once (e.g., one affix for both case and number).

No natural language is 100% isolating, agglutinating, or fusional. Largely isolating Mandarin Chinese has an agglutinating plural pronoun (wǒ "I" but wǒmen "we"). Meanwhile, the pronouns of agglutinative Turkish show some fusion: ben "I" and biz "we" are unrelated forms, unlike o "he/she/it" and onlar "they," which adds the ordinary plural suffix -lar. English used to be a fusional language but has largely become analytic, aside from a few inflections and cases.
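The difference between the agglutinating and fusional corners can be sketched in a few lines of Python. Turkish ev "house" stacks one suffix per meaning, while Latin rosa "rose" uses one ending for case and number together. This is a simplification under stated assumptions: Turkish vowel harmony is ignored, so the suffixes are only right for front-vowel roots like ev, and only four Latin first-declension endings are listed.

```python
# Agglutinative (Turkish-style): one suffix per meaning, stacked in order.
# Vowel harmony is ignored, so these suffixes only suit front-vowel roots like "ev".
TURKISH_NUMBER = {"sg": "", "pl": "ler"}
TURKISH_CASE = {"nom": "", "loc": "de"}

def turkish_noun(root: str, number: str, case: str) -> str:
    return root + TURKISH_NUMBER[number] + TURKISH_CASE[case]

# Fusional (Latin-style): a single ending encodes case and number together.
LATIN_FIRST_DECLENSION = {
    ("nom", "sg"): "a", ("nom", "pl"): "ae",
    ("abl", "sg"): "ā", ("abl", "pl"): "īs",
}

def latin_noun(stem: str, number: str, case: str) -> str:
    return stem + LATIN_FIRST_DECLENSION[(case, number)]

print(turkish_noun("ev", "pl", "loc"))  # evlerde: ev-ler-de, house-PL-LOC, "in the houses"
print(latin_noun("ros", "pl", "abl"))   # rosīs: -īs is plural and ablative at once
```

In the agglutinating design, each new category is just another table. In the fusional design, every new case or number multiplies the size of a single table, which is part of why fusional paradigms are memorized rather than assembled.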

Head-directionality

This is a fancy way of saying "does the main part of the concept come at the beginning or the end of the phrase?". In head-initial languages, the important word tends to come at the beginning of the phrase (for instance, nouns precede adjectives, verbs precede adverbs). In head-final languages, the opposite is true (nouns tend to follow adjectives, verbs tend to follow adverbs). According to Wikipedia, no language is strictly head-initial or head-final in every category—it is a pattern, not an exceptionless rule.
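Head-directionality can be pictured as a single switch applied while flattening a phrase tree. The `Phrase` class and `linearize` function below are hypothetical names for a toy model. Real languages flip the switch differently for different phrase types, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Phrase:
    head: str
    # Dependents are plain words or nested phrases.
    dependents: List[Union[str, "Phrase"]] = field(default_factory=list)

def linearize(phrase: Phrase, head_initial: bool) -> List[str]:
    """Flatten a phrase, placing each head before or after its dependents."""
    words: List[str] = []
    for dep in phrase.dependents:
        words.extend(linearize(dep, head_initial) if isinstance(dep, Phrase) else [dep])
    return [phrase.head] + words if head_initial else words + [phrase.head]

# "eat red apple": the verb heads the whole phrase, the noun heads its own phrase.
vp = Phrase("eat", [Phrase("apple", ["red"])])
print(" ".join(linearize(vp, head_initial=True)))   # eat apple red
print(" ".join(linearize(vp, head_initial=False)))  # red apple eat
```

The same switch turns a preposition into a postposition: `Phrase("in", [Phrase("house")])` comes out as "in house" head-initially and "house in" head-finally.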

Language change (or, putting languages through a wood chipper for fun and profit)

As most people are aware, the Romance languages (the big five being French, Spanish, Italian, Portuguese, and Romanian) are those languages that descended from Latin. They are distinct from Latin. They are also distinct from each other. How did they get like that?

If you want to develop your conlang into an entire language family, see So You Want To / Create a Language Family.

  • Sound change: As Latin developed into Romance, the way people pronounced words began to change in (usually) systematic ways. For example, Latin had long (doubled) /l/ and /n/ sounds. On the way to Spanish, these sounds palatalized and ended up as /ʎ/ and /ɲ/, respectively; compare Latin annum 'year' > Spanish año with Latin anus 'ring' > Spanish ano 'anus'. (Don't look at me like that. It was the first example that came to mind and it illustrates the principle nicely.) Latin annum also yielded French an, which didn't palatalize but instead turned into a nasalized vowel through a different process.
  • Grammatical change: The grammar changed too, as features were developed and lost. Latin had a distinction between vel 'and/or' and aut 'either/or' which didn't survive Vulgar Latin (e.g., French ou 'or' < Latin aut). Romance languages also mostly did away with the infamous case system of Latin (though it lingers on in a reduced form in Romanian). In terms of gains, some Romance languages developed new verbal inflections: the French future tense developed out of a construction of the form [infinitive] + habere, as can be seen in, for example, mangerai '(I) will eat'. Spanish usted is a shortening of vuestra merced 'your grace', a respectful form of address that didn't exist in Latin. Romance languages also gained stricter word order. In Latin, the case endings carried the information about a word's role in the phrase, so you could shuffle words around a lot, as readers who have the misfortune of being familiar with Cicero know all too well. As sound change destroyed the case system, speakers of Romance needed some other way of signaling how the words fit together in the sentence, and they did this by making word order more important. (Something similar happened on the way from Old English to Modern English, because Middle English said "Word-final case markings? lolno".)
  • Lexical change: Words changed meaning, fell out of use entirely, or were borrowed. One prominent example is the word for "horse," which was originally equus in Latin. This fell out of use in Vulgar Latin, from which Romance developed, and was replaced by caballus, originally a word for a workhorse or nag that may have been borrowed from Celtic (whence, say, French cheval). Famously, Portuguese saudade (look up the definition, it's nuanced) came from Latin solitatem, which just meant 'solitude'. The word "admiral" doesn't come from Latin, although it looks like it does; it's from Arabic amiir al- 'emir of the'.
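Sound changes like the one above are often modeled as ordered rewrite rules, which is also how many conlangers "age" a proto-language. Here is a minimal Python sketch using three hypothetical rules written in Spanish spelling rather than IPA. It is a drastic simplification of the real history from Latin to Spanish, not a reconstruction of it.

```python
import re

# Toy sound changes written in Spanish spelling rather than IPA.
# Each rule is (pattern, replacement); rules apply in order, and each one
# sees the output of the rules before it.
LATIN_TO_SPANISH = [
    (r"um$", "o"),  # word-final -um > -o
    (r"us$", "o"),  # word-final -us > -o
    (r"nn", "ñ"),   # long /nː/ > /ɲ/, spelled ñ
]

def apply_sound_changes(word: str, rules=LATIN_TO_SPANISH) -> str:
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

for latin in ["annum", "anus", "caballum"]:
    print(latin, ">", apply_sound_changes(latin))
# annum > año, anus > ano, caballum > caballo
```

Order matters. A hypothetical rule simplifying nn to n, applied before the ñ rule, would turn annum into ano as well, merging 'year' and 'ring', so that Spanish año would never have appeared.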

Orthography (Writing)

If you do decide to invent a separate writing system for your conlang, it may be worth leaving it until last, or at least not letting it dominate your conlang. For most of human history, language was almost always something that people spoke, with only a small elite being able to read and write. In fact, in many societies, writing was done in a separate and older language, such as Latin.

In the age of mass literacy, spelling can sometimes influence pronunciation (for example, the English word "hotel" used to have a silent "h"), but it is far more common for the written language to remain stuck in the past as the spoken language changes. A rule of thumb is that the written language will usually be more resistant to change than the spoken language. English is a spectacular example of this.

These are the types of writing system:

  • Logography: Each word has its own symbol. Examples of this include Chinese characters and Egyptian hieroglyphs. Learning to read and write Chinese means learning 2,000 to 3,000 characters, and educated speakers often know 8,000. The earliest writing systems were logographies. Writing was invented independently in at least three human civilizations, and in all three cases it started out as picture symbols. Over time, many symbols (and their meanings) changed until it was no longer obvious what many of them represented. In addition, new symbols were created by combining existing ones, sometimes based more on their sound than on their meaning.
  • Syllabary: Each possible syllable has its own symbol. The Linear B system from Ancient Crete is a pure example. Japanese is a hybrid system, combining a logography (kanji) with a syllabary (kana). Japanese is suited for a syllabary because it is mainly limited to CV syllables (one consonant, one vowel). For languages which often have more complicated syllables like CVC and CCVC, a syllabary quickly becomes impractical. A syllabary is likely to originate from a logography. Other forms of writing system are likely to come from a syllabary.
  • Abjad: A pure abjad is an alphabet that only has consonants. The vowels are not shown; they may have to be guessed based on the context. A more common form is an impure abjad, in which vowels may sometimes but not always be displayed as either markings on the consonants, changes to the consonants or separate letters. The Arabic and Hebrew alphabets are impure abjads. An abjad works best when the language does not have many vowels.
  • Abugida: An abugida is an alphabet in which the letters are consonants, and the vowels are shown as markings on the consonants or changes to them. The reverse, with base vowel signs modified by consonant diacritics, only appears in the Japanese version of braille. Devanagari and most other writing systems that originated in India are abugidas, as is the Ge'ez (Ethiopian) alphabet.
  • True alphabet: Some linguists limit the word 'alphabet' to mean not just any writing system, but one which has consonants and vowels as letters. Almost all of these alphabets used in the world today are descendants of the Greek alphabet.
Hangul (Korean) is an interesting case: each consonant and vowel has its own symbol, like in a true alphabet, and those symbols are arranged into a symbol for the syllable, making it a hybrid between an alphabet and a syllabary.
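The claim that a syllabary "quickly becomes impractical" for complex syllables is simple arithmetic: each slot in a syllable shape multiplies the count. This Python sketch assumes a hypothetical inventory of 15 consonants and 5 vowels, with every consonant allowed in every slot (real phonotactics rule out many combinations). For comparison, modern Japanese kana has 46 basic signs.

```python
def syllable_count(consonants: int, vowels: int, shapes: list) -> int:
    """Number of distinct syllables for shapes like 'V', 'CV', 'CVC',
    assuming every consonant and vowel can fill every slot."""
    total = 0
    for shape in shapes:
        combos = 1
        for slot in shape:
            combos *= consonants if slot == "C" else vowels
        total += combos
    return total

# Hypothetical inventory: 15 consonants, 5 vowels.
print(syllable_count(15, 5, ["V", "CV"]))         # 80 symbols
print(syllable_count(15, 5, ["V", "CV", "CVC"]))  # 1205 symbols
```

An alphabet for the same inventory needs only 20 letters whatever syllable shapes are allowed, which is the trade-off the list above describes.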

Other things to think about:

  • Your writing system should be practical. A realistic writing system has to stay legible no matter how many different people write it down, each in their own style, although symbols that represent more than one sound will tend to be more complicated. Similarly, the writing system has to be written in strokes, so it is not practical to have parts that must be colored or shaded in.
  • What type of surface was the writing written on? If the writing was mainly carved into wood or stone, it is likely to be angular. On the other hand, if it was designed to be written on soft material like papyrus or parchment, it is likely to be more curly, so that scribes can write faster and avoid tearing the material.
    • This is largely why the Latin and Greek alphabets split into upper- and lower-case in the Middle Ages. The original alphabets were upper-case only, but faster, rounder scripts developed for writing on soft material, and these eventually became the lower-case forms.
  • Are writing materials expensive, or were they historically expensive? The cost of writing materials in medieval Europe led to extensive use of abbreviations, including replacing certain letters in certain positions with diacritical marks applied to the preceding letters. The introduction of inexpensive writing materials has not completely undone such changes.
  • A realistic writing system will have imperfections, just like the other parts of the language. Sometimes, a letter will represent more than one sound and sometimes more than one letter can be used to represent the same sound.

Poetry

A conlang has to be very well developed before you can have a realistic idea of how poetry (including song lyrics) would work in the language. You need to have an idea of how its prosody works. It may also help to have some idea of the fictional history of the language. However, an exception can be made if you are not creating much apart from the poem and are not planning to develop your conlang in detail.

Poetry is designed to have a rhythm, with lines having matching lengths or at least following some sort of pattern. Exactly what makes a workable rhythm varies by language. In French, every syllable has equal length and stress, and so it is important in French poetry to have matching numbers of syllables. This is similar in Latin poetry, except that some syllables are counted as being longer than others. In English, syllables get stretched and squashed because it's stress rather than the precise number of syllables that determines the rhythm of the line.

Most poetry does not rhyme. In fact, most spoken English poetry does not rhyme either, but people will be more familiar with the forms that do. Latin poetry does not usually rhyme, because so many words have the same endings and therefore a rhyme is not impressive. In English poetry, it is harder to find rhymes, which makes them sound more impressive. And that highlights an important part of how poetry works: it works if the poet can impress the listener (or reader) by coming up with lines within the constraints of its pattern. The pattern may be a popular poetic pattern, such as sonnets and limericks, or a poem may create its own patterns and constraints.

There are also other ways languages use the sounds of words to form poetry. Old English poetry was fond of alliteration. In Hebrew poetry, there is the concept of parallelism, which is a sort of "rhyming by concepts". There are also more subtle differences. In English poetry, rhyming two homophones (different words with the same sound) is considered a weak rhyme, but in French poetry it is considered a strong rhyme.

Another common theme of poetry is that it tends to use old-fashioned features of the language. In English, this includes words like "ere", "afar" and "whence". This is partly to give the poet more flexibility, but also to sound literary. Many poetic traditions are also influenced by whatever language is considered classical or prestigious at the time. In the Middle Ages, many European poets looked to Latin; the Romans themselves looked to Greek.

Extra Credit

The Greats

  • J.R.R. Tolkien. Why does Middle-earth feel so alive? In large part because of its languages. Tolkien's so-called "secret vice" led to The Lord of the Rings with its many languages; he even stated that, in one sense, the stories were made to provide a world for his languages rather than the reverse.
  • David J. Peterson, full stop. The former president of the Language Creation Society, he created Dothraki for Game of Thrones as well as the languages used in Thor: The Dark World, Defiance, Star-Crossed, and The 100.
  • Paul Frommer's Na'vi language, created for Avatar.
  • Bilbaridion's conlangs. Although not much information is available beyond what is shown in his showcase videos on YouTube, even those alone show how much effort and how many ideas he has put into them.
  • Marc Okrand's Klingon, created for the Star Trek cinematic universe. While not really a naturalistic language, it does succeed in sounding alien and its cultural influence is considerable. Looking at how utterly different it is can be instructive and inspiring.

The Epic Fails

Resources/Further Reading

  • David J. Peterson's The Art of Language Invention deals with conlanging and features many relevant examples from both natural and constructed languages.
  • The Concepticon is a good way to build a starting vocabulary for a conlang.
  • The New Conlang Bulletin Board, a community for language creators.
  • For those who don't want to put in the work of actually creating all the new words, Vulgar is a highly customizable and powerful generator that will create a conlang when given a sound inventory. The free version is enough to get a sense of how it works.
  • Mark Rosenfelder's A Conlanger's Lexipedia provides ideas on coining words with etymologies attested in natural languages.
  • Etymonline, a database about the origins of words in English.
  • "Ergativity", by David J. Peterson. Notes on how to derive (split-)ergative languages from languages that were originally nominative-accusative.
  • FrathWiki, a conlanging wiki.
  • The aforementioned Index Diachronica is a resource for those interested in naturalistic sound changes.
  • Mark Rosenfelder's Language Construction Kit is a good starting place for those who want to build their own languages. Rosenfelder literally wrote the book on conlanging: the Kit is available in two volumes.
  • Omniglot, a source for looking at the various writing systems of the world (and invented ones).
  • Redditor /u/yaesen's "On Generating Ideograms" can help with creating ideograms.
  • Subreddit r/Conlangs and its corresponding Discord server provide support and advice to aspiring conlangers.
  • A Survey of some Vowel Systems is a good resource for making your vowels naturalistic.
  • Wikipedia has a large amount of information on linguistics topics.
  • The World Atlas of Language Structures (WALS) is a database of the features of natural languages. One can find data sets and examples there and can even compare multiple features across languages, which can be useful in getting ideas on how to proceed if a language already has certain features.
  • Mark Rosenfelder's page on yingzi is a good place to start for those of you who want to create logograms.
  • The Zompist Bulletin Board, run by Mark Rosenfelder, is a community of conlangers (as well as a forum for his own projects, such as the world of Almea).
