This project is a procedural generator for fictional human languages.
It is a personal project developed as I discover interesting things about linguistics.
A Gobbledygook language is represented by a JSON object containing the rules, words, and particularities of that language.
A system of rules and forms describes the way a word changes based on its context.
A Gobbledygook sentence is represented by a JSON object that contains a semantic definition of that sentence as well as all relevant context information.
Gobbledygook contains two main tools:
- `generate-language/`: creates a fictional language with its rules and particularities, trying to make it sound interesting and organic
- `use-language/`: takes a language and a sentence definition as input and outputs the sentence in that language

A language generated this way can be saved as JSON and reused.
Gobbledygook assumes a very simple conception of language. Its phonology is described by:
- phoneme inventory, with frequencies
- IPA representation and the language's transliteration code into the Latin alphabet (currently based on my personal preference rather than a recognized standard)
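The transliteration step can be pictured as a lookup from IPA symbols to Latin spellings. The sketch below is a hypothetical illustration: the table entries, the `Transliteration` type and the `transliterate` function are assumptions, not the project's actual data. It uses greedy longest-match so that a two-character IPA symbol such as `tʃ` is transliterated as a unit instead of as `t` followed by `ʃ`.

```typescript
// Hypothetical transliteration table: IPA symbol -> Latin spelling.
// These entries are illustrative, not Gobbledygook's real table.
type Transliteration = Record<string, string>;

const exampleTable: Transliteration = {
  "tʃ": "ch",
  "ʃ": "sh",
  "ŋ": "ng",
  "ə": "e",
  "a": "a",
  "i": "i",
  "k": "k",
  "t": "t",
};

// Greedy longest-match: at each position, try the longest IPA symbols first
// so that "tʃ" wins over "t" followed by "ʃ".
function transliterate(ipa: string, table: Transliteration): string {
  const symbols = Object.keys(table).sort((a, b) => b.length - a.length);
  let out = "";
  let i = 0;
  while (i < ipa.length) {
    const match = symbols.find((s) => ipa.startsWith(s, i));
    if (match === undefined) {
      // Unknown symbol: keep it unchanged rather than silently dropping it.
      out += ipa[i];
      i += 1;
    } else {
      out += table[match];
      i += match.length;
    }
  }
  return out;
}

console.log(transliterate("tʃaŋkiʃə", exampleTable)); // changkishe
```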
Languages differ in how much words change based on context. Gobbledygook uses 3 simplified morphology types:
| Type | Description | Example language | How "I love" changes |
|---|---|---|---|
| Flectional | Words change a lot, one suffix = multiple meanings | Latin | amo, amas, amat, amamus... |
| Semi-flectional | Some changes + some helper words | French | j'aime, tu aimes, il aime... |
| Analytical | Words barely change, meaning from word order + helpers | Mandarin | 我爱 (wǒ ài) - no verb change |
Morphemes are generated based on the above phoneme inventory by alternating vowels and consonants.
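A minimal sketch of that generation step, assuming each phoneme carries a relative frequency weight. The `Phoneme` and `PhonemeInventory` shapes and the `generateMorpheme` name are hypothetical (the project's real types live in `interfaces.ts` and may differ). The random source is injectable so the output can be reproduced in tests.

```typescript
// Hypothetical shapes; the real inventory type may differ.
interface Phoneme {
  ipa: string;
  frequency: number; // relative weight, not a probability
}

interface PhonemeInventory {
  vowels: Phoneme[];
  consonants: Phoneme[];
}

// Pick one phoneme with probability proportional to its frequency.
// `random` must return a number in [0, 1).
function pickWeighted(phonemes: Phoneme[], random: () => number): Phoneme {
  if (phonemes.length === 0) throw new Error("Empty phoneme pool");
  const total = phonemes.reduce((sum, p) => sum + p.frequency, 0);
  let threshold = random() * total;
  for (const p of phonemes) {
    threshold -= p.frequency;
    if (threshold < 0) return p;
  }
  // Guard against floating-point rounding leaving a tiny remainder.
  return phonemes[phonemes.length - 1];
}

// Build a morpheme of `length` phonemes, alternating vowels and consonants.
function generateMorpheme(
  inventory: PhonemeInventory,
  length: number,
  startWithVowel: boolean,
  random: () => number = Math.random,
): string {
  let morpheme = "";
  for (let i = 0; i < length; i++) {
    // Even positions follow `startWithVowel`, odd positions take the other class.
    const isVowel = (i % 2 === 0) === startWithVowel;
    const pool = isVowel ? inventory.vowels : inventory.consonants;
    morpheme += pickWeighted(pool, random).ipa;
  }
  return morpheme;
}

const exampleInventory: PhonemeInventory = {
  vowels: [{ ipa: "a", frequency: 5 }, { ipa: "i", frequency: 1 }],
  consonants: [{ ipa: "k", frequency: 3 }, { ipa: "t", frequency: 1 }],
};

console.log(generateMorpheme(exampleInventory, 4, false)); // e.g. "kaki"
```

Weighting by frequency lets common phonemes dominate without ever excluding rare ones.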
A declension system describes how words change (or not) based on their context. The relevant parameters that inflect a word are listed as rules, and the word forms are described in a nested forms object.
The rules and forms system:
- Rules list which parameters affect the word form, in order of nesting
- Forms is a nested object following that order, ending with a template string
Example for English noun declension:
```ts
{
  rules: ["declensionType", "grammaticalCase", "number"],
  forms: {
    noun: {
      nominative: { singular: "{morpheme}", plural: "{morpheme}s" },
      genitive: { singular: "{morpheme}'s", plural: "{morpheme}s'" }
    },
    adjective: {
      nominative: { singular: "{morpheme}", plural: "{morpheme}" }
    }
  }
}
```

For "dog" (morpheme: "dog") in plural nominative:
→ navigate forms.noun.nominative.plural → "{morpheme}s" → "dogs"
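The navigation above can be sketched as a short resolver. This is an illustration rather than the code in `use-language/`: the `inflect` name and the flat `context` record are assumptions, and the table reuses the English example.

```typescript
// A FormTable as described above: `rules` gives the nesting order,
// `forms` is a nested object whose leaves are template strings.
type Forms = string | { [value: string]: Forms };

interface FormTable {
  rules: string[];
  forms: Forms;
}

// Walk `forms` following `rules`, reading each parameter's value from `context`,
// then substitute the morpheme into the template found at the leaf.
function inflect(
  table: FormTable,
  context: Record<string, string | undefined>,
  morpheme: string,
): string {
  let node: Forms = table.forms;
  for (const rule of table.rules) {
    const value = context[rule];
    if (
      typeof node === "string" ||
      value === undefined ||
      !Object.prototype.hasOwnProperty.call(node, value)
    ) {
      throw new Error(`No form for ${rule} = ${String(value)}`);
    }
    node = node[value];
  }
  if (typeof node !== "string") {
    throw new Error("Rules ended before reaching a template string");
  }
  // split/join replaces every "{morpheme}" and avoids replace()'s "$" patterns.
  return node.split("{morpheme}").join(morpheme);
}

const englishDeclension: FormTable = {
  rules: ["declensionType", "grammaticalCase", "number"],
  forms: {
    noun: {
      nominative: { singular: "{morpheme}", plural: "{morpheme}s" },
      genitive: { singular: "{morpheme}'s", plural: "{morpheme}s'" },
    },
    adjective: {
      nominative: { singular: "{morpheme}", plural: "{morpheme}" },
    },
  },
};

console.log(
  inflect(englishDeclension, { declensionType: "noun", grammaticalCase: "nominative", number: "plural" }, "dog"),
); // dogs
```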
More examples:
English inflects its nouns and adjectives by rules: `["declensionType", "grammaticalCase", "number"]`
For *yellow* in the sentence *the yellow taxis drive over the bridge*, *yellow* follows the rules `declensionType = adjective`, `grammaticalCase = nominative` and `number = plural`, giving the form *yellow*.
French inflects its nouns and adjectives by rules: `["declensionType", "declensionGroup", "grammaticalCase", "gender", "number"]`
For *cheval* in the sentence *les enfants adorent les petits chevaux* ("the children love the little horses"), *cheval* follows the rules `declensionType = noun`, `declensionGroup = alEnding`, `grammaticalCase = accusative`, `gender = masculine` and `number = plural`, giving the form *chevaux*.
Cases indicate the role a noun plays in a sentence. English mostly uses word order, but other languages change the word itself.
| Case | Role | English example | Latin example |
|---|---|---|---|
| Nominative | Subject (who does the action) | "**The dog** bites the man" | "**Canis** mordet virum" |
| Accusative | Direct object (receives the action) | "The dog bites **the man**" | "Canis mordet **virum**" |
| Dative | Indirect object (recipient) | "I give a bone to the dog" | "Do os cani" |
| Genitive | Possession | "The dog's bone" / "bone of the dog" | "os canis" |
In Latin, the word "canis" (dog) changes form: canis → canem → cani → canis depending on its role.
Beyond grammatical roles, cases can express spatial and logical relationships. These are common in Finnish, Hungarian, and many other languages.
| Case | Meaning | English equivalent | Example usage |
|---|---|---|---|
| Locative | Location (static) | "at", "in" | "I am at the house" |
| Lative | Movement towards | "to", "towards" | "I go to the house" |
| Inessive | Inside something | "inside", "within" | "The cat is in the box" |
| Ablative | Movement away from | "from", "out of" | "I come from the house" |
| Instrumental | Tool or means | "with", "using" | "I write with a pen" |
| Benefactive | Benefit recipient | "for" | "I made this for you" |
In Finnish, "talo" (house) becomes: talossa (in the house), taloon (into the house), talosta (from the house).
Gobbledygook treats prepositions and case suffixes as equivalent mechanisms: both express the same semantic relationships.
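To illustrate that equivalence, the sketch below gives Finnish (suffixes) and English (prepositions) the same set of case keys and resolves both with one function. The case labels follow the simplified table above; Finnish grammars call *taloon* and *talosta* the illative and elative. The `{morpheme}on` template is an assumption that only reproduces *taloon* for stems ending in -o, and vowel harmony (-ssa/-ssä) is ignored.

```typescript
// One semantic case inventory shared by both languages.
type LocalCase = "inessive" | "lative" | "ablative";

// Each language maps every case to a template string.
type CaseTemplates = Record<LocalCase, string>;

// Simplified: no vowel harmony; "{morpheme}on" only works for stems ending in -o.
const finnish: CaseTemplates = {
  inessive: "{morpheme}ssa",
  lative: "{morpheme}on",
  ablative: "{morpheme}sta",
};

// Prepositions written directly into the template.
const english: CaseTemplates = {
  inessive: "in {morpheme}",
  lative: "to {morpheme}",
  ablative: "from {morpheme}",
};

// The same resolver handles suffixes and prepositions identically.
function decline(templates: CaseTemplates, grammaticalCase: LocalCase, morpheme: string): string {
  return templates[grammaticalCase].split("{morpheme}").join(morpheme);
}

console.log(decline(finnish, "inessive", "talo")); // talossa
console.log(decline(english, "inessive", "London")); // in London
```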
The conjugation system follows the same rules and forms principles as declension.
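As an illustration, the Latin present-tense forms of *amare* (amo, amas, amat...) fit the same structure with morpheme `am`. The rule names `person` and `number` are hypothetical, only the present active indicative is modelled, and the `paradigm` helper, which lists every leaf of a table, is an assumption rather than part of the project.

```typescript
type Forms = string | { [value: string]: Forms };

interface FormTable {
  rules: string[];
  forms: Forms;
}

// Hypothetical rule names; present active indicative of the first conjugation only.
const latinFirstConjugationPresent: FormTable = {
  rules: ["person", "number"],
  forms: {
    first: { singular: "{morpheme}o", plural: "{morpheme}amus" },
    second: { singular: "{morpheme}as", plural: "{morpheme}atis" },
    third: { singular: "{morpheme}at", plural: "{morpheme}ant" },
  },
};

// Depth-first walk over every leaf, returning "path: form" lines
// in the insertion order of the forms object.
function paradigm(table: FormTable, morpheme: string): string[] {
  const lines: string[] = [];
  const walk = (node: Forms, path: string[]): void => {
    if (typeof node === "string") {
      lines.push(`${path.join(" ")}: ${node.split("{morpheme}").join(morpheme)}`);
      return;
    }
    for (const [value, child] of Object.entries(node)) {
      walk(child, [...path, value]);
    }
  };
  walk(table.forms, []);
  return lines;
}

console.log(paradigm(latinFirstConjugationPresent, "am").join("\n"));
```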
Determination specifies which specific thing we're talking about:
| Type | Meaning | English example | French example |
|---|---|---|---|
| Definite | A specific known thing | "the dog" | "le chien" |
| Indefinite | Any instance of a thing | "a dog" | "un chien" |
| Demonstrative | Pointing to something | "this/that dog" | "ce chien" |
| Possessive | Belonging to someone | "my/your/his dog" | "mon/ton/son chien" |
| Proper noun | A name (no determiner) | "Fido barks" | "Médor aboie" |
The determiner and pronoun systems also follow the same rules and forms principles.
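A sketch of French determiners as a FormTable, assuming hypothetical rule names (`determination`, `gender`, `number`). Determiners have no `{morpheme}` slot: the leaf is the word itself, and `""` stands for "no determiner", as for proper nouns. Possessives are left out because they also depend on the possessor, and elision (*cet*, *l'*) is ignored.

```typescript
type Forms = string | { [value: string]: Forms };

interface FormTable {
  rules: string[];
  forms: Forms;
}

// Illustrative data, not the project's actual French definition.
const frenchDeterminers: FormTable = {
  rules: ["determination", "gender", "number"],
  forms: {
    definite: {
      masculine: { singular: "le", plural: "les" },
      feminine: { singular: "la", plural: "les" },
    },
    indefinite: {
      masculine: { singular: "un", plural: "des" },
      feminine: { singular: "une", plural: "des" },
    },
    demonstrative: {
      masculine: { singular: "ce", plural: "ces" },
      feminine: { singular: "cette", plural: "ces" },
    },
    properNoun: {
      masculine: { singular: "", plural: "" },
      feminine: { singular: "", plural: "" },
    },
  },
};

// Follow the rules through the nested forms and return the leaf string.
function lookup(table: FormTable, context: Record<string, string | undefined>): string {
  let node: Forms = table.forms;
  for (const rule of table.rules) {
    const value = context[rule];
    if (typeof node === "string" || value === undefined || !Object.prototype.hasOwnProperty.call(node, value)) {
      throw new Error(`No form for ${rule} = ${String(value)}`);
    }
    node = node[value];
  }
  if (typeof node !== "string") throw new Error("Incomplete context");
  return node;
}

// Join determiner and noun, skipping an empty determiner.
function nounPhrase(determiner: string, noun: string): string {
  return [determiner, noun].filter((word) => word !== "").join(" ");
}

const det = lookup(frenchDeterminers, { determination: "definite", gender: "masculine", number: "singular" });
console.log(nounPhrase(det, "chien")); // le chien
```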
Each language also defines:
- a word order (SOV, SVO, VSO or VOS)
- a phrase structure for nouns, verbs and adjectives
- a sentence structure for each sentence type: declarative, interrogative (polar, open) and negative
- a dictionary of root morphemes with their grammatical properties (gender, declension or conjugation group, object category such as inanimate, animal, etc.)
- abstract words for conjunctions, adverbs and other parts of speech (to be improved)
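The word order parameter can be read as a recipe for arranging clause constituents. This sketch covers only the clause level (phrase structure and sentence type are ignored), and the `Clause` shape and `orderClause` name are assumptions for illustration.

```typescript
// The four word orders listed above.
type WordOrder = "SOV" | "SVO" | "VSO" | "VOS";

// Hypothetical clause shape: already-inflected constituents.
interface Clause {
  subject: string;
  verb: string;
  object: string;
}

// Read the order string letter by letter and emit the matching constituent.
function orderClause(clause: Clause, order: WordOrder): string {
  const slots: Record<string, string> = { S: clause.subject, V: clause.verb, O: clause.object };
  return order.split("").map((letter) => slots[letter]).join(" ");
}

const clause: Clause = { subject: "the dog", verb: "bites", object: "the man" };
console.log(orderClause(clause, "SOV")); // the dog the man bites
console.log(orderClause(clause, "VSO")); // bites the dog the man
```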
Gobbledygook cannot be exhaustive and represent every possible language. Some important simplifications and shortcuts have been taken to build these tools.
They are represented as a single vowel or consonant, and are created as such during language generation.
I simplified the analytical-to-flectional continuum into 3 distinct categories to simplify the generation process.
Grammatical and semantic cases are represented by a single system that folds syntactic roles (nominative, accusative) together with semantic relationships (inessive, benefactive).
Prepositions with semantic meaning are also meant to be folded into this system as prefixes, stored in a dedicated object.
The system treats prepositions and suffix declensions as functionally the same, except that a preposition is separated from the word by a space.
Examples:
- classic declension
  - Latin nominative: `{morpheme}us`
  - English genitive: `{morpheme}'s`
- preposition form as a declension form
  - English allative: `{preposition} {morpheme}` (/tu london/)
  - French instrumental: `{preposition} {morpheme}` (/avek panaʃ/)
This simplifies the representation and is sufficient to generate fictional languages.
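A sketch of how a `{preposition} {morpheme}` template could be filled from a separate preposition object, next to a plain suffix template. The `LanguageCases` shape and `declineCase` function are hypothetical; they only mirror the idea of storing prepositions in a dedicated object.

```typescript
// Hypothetical per-language case data.
interface LanguageCases {
  templates: Record<string, string>; // case name -> template string
  prepositions: Record<string, string>; // case name -> preposition (the dedicated object)
}

// Fill both placeholders; suffix-only templates simply have no {preposition} slot.
function declineCase(lang: LanguageCases, caseName: string, morpheme: string): string {
  const template = lang.templates[caseName];
  if (template === undefined) throw new Error(`No template for case ${caseName}`);
  const preposition = lang.prepositions[caseName] ?? "";
  return template.split("{preposition}").join(preposition).split("{morpheme}").join(morpheme);
}

const englishCases: LanguageCases = {
  templates: { genitive: "{morpheme}'s", allative: "{preposition} {morpheme}" },
  prepositions: { allative: "to" },
};

const frenchCases: LanguageCases = {
  templates: { instrumental: "{preposition} {morpheme}" },
  prepositions: { instrumental: "avec" },
};

console.log(declineCase(englishCases, "allative", "London")); // to London
console.log(declineCase(frenchCases, "instrumental", "panache")); // avec panache
```

Keeping the preposition out of the template lets one template shape serve every prepositional case of a language.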
A sentence definition produces exactly one output per language.
Gobbledygook does not support style, synonymy or subtext.
Empty forms are written explicitly. For example, the absence of a determiner in a language is represented with `""`.
A declension without a suffix is represented simply as `"{morpheme}"`.
I have used French, English and Danish as sample languages because I am familiar with them.
I also recognize that all three are closely related, with close common ancestors and intense mutual influence, and are not very representative of the breadth of real-world languages.
They are not fully implemented, and many of their particularities are not represented.
These languages are meant to be used as test cases for the useLanguage tool, not as targets for the generateLanguage tool.
All these limitations might be addressed in future versions.
Idiomatic expressions, and ways of expressing an idea that vary between languages, can't be represented with Gobbledygook as it currently exists.
If one language says "The sky rains water drops" (a transitive formulation) to mean that it rains, and another says "it rains" (intransitive), Gobbledygook cannot capture that difference.
If a language uses an image, as in the French "Il pleut comme vache qui pisse" (literally "it's raining like a pissing cow"), Gobbledygook can't represent this either.
Other features Gobbledygook does not currently handle:
- tone systems
- agglutinative and polysynthetic languages
- ergative-absolutive aligned languages
- tense aspects
- negation
- infixes, ablaut, reduplication
- intonation and stress
Gobbledygook uses Amazon Polly to voice its languages. Polly uses preset voices based on existing languages and accents, and these voices cannot produce all the sounds that Gobbledygook generates.
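One way to make Polly read a generated word from its IPA rather than guessing from the transliterated spelling is the SSML `<phoneme alphabet="ipa" ph="...">` tag, which Polly supports. The sketch below only builds the SSML string (it would be sent with `TextType` set to `ssml`); whether Gobbledygook actually voices words this way is an assumption, and the `toSsml` helper is hypothetical. Sounds outside the chosen voice's inventory are still subject to the limitation above.

```typescript
// Escape the five XML special characters so text and IPA are safe inside SSML.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap each word in a <phoneme> tag carrying its IPA pronunciation.
function toSsml(words: { text: string; ipa: string }[]): string {
  const body = words
    .map((w) => `<phoneme alphabet="ipa" ph="${escapeXml(w.ipa)}">${escapeXml(w.text)}</phoneme>`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

console.log(toSsml([{ text: "changkishe", ipa: "tʃaŋkiʃə" }]));
// <speak><phoneme alphabet="ipa" ph="tʃaŋkiʃə">changkishe</phoneme></speak>
```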
Two new tools could be added to make this project extremely cool:
A tool that takes a language and a number of generations as input, and outputs a new language with realistic mutations.
A tool that takes in a human sentence (probably English) and encodes it into a Gobbledygook sentence definition.
LLMs might be a good fit for this, though there is probably a good traditional approach as well.
- `interfaces.ts`: types for all manipulated concepts
- `generate-language/index.ts`: generation tool entry point
- `use-language/make-sentence.ts`: sentence production tool entry point
- `languages/french/index.ts`: example of a real-life language represented as a Gobbledygook language
- `sample-sentences/sample-sentences.ts`: examples of sentence definitions in the Gobbledygook format
- Morpheme: word root, base lexical unit
- FormTable: structure `{ rules, forms }` describing the inflection of a word
  - rules: ordered list of the inflection parameters (e.g. `["gender", "number"]`)
  - forms: nested object following the rule order, ending with leaf nodes that hold a template string for the inflected word
- SentenceDefinition: semantic representation of a sentence to transcribe
- Context: parameter passed to the sentence builder, containing the language and the consolidated sentence element and utterance context (speaker, addressee role, etc.)
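The FormTable definition implies an invariant: every leaf sits exactly one level per rule deep. A small check like the one below can catch malformed tables when writing a language by hand. The `findMisplacedLeaves` helper is a hypothetical sketch, not part of the project, and the types are simplified stand-ins for those in `interfaces.ts`.

```typescript
type Forms = string | { [value: string]: Forms };

interface FormTable {
  rules: string[];
  forms: Forms;
}

// Returns the dotted paths of nodes at the wrong depth; an empty array means
// every branch is nested exactly once per rule and ends in a template string.
function findMisplacedLeaves(table: FormTable): string[] {
  const problems: string[] = [];
  const walk = (node: Forms, path: string[]): void => {
    if (typeof node === "string") {
      // Template reached too early (or too late).
      if (path.length !== table.rules.length) problems.push(path.join("."));
      return;
    }
    if (path.length >= table.rules.length) {
      // An object where a template string was expected.
      problems.push(path.join("."));
      return;
    }
    for (const [value, child] of Object.entries(node)) walk(child, [...path, value]);
  };
  walk(table.forms, []);
  return problems;
}

const broken: FormTable = {
  rules: ["gender", "number"],
  forms: {
    masculine: { singular: "{morpheme}", plural: "{morpheme}s" },
    feminine: "{morpheme}e", // missing the "number" level
  },
};

console.log(findMisplacedLeaves(broken)); // [ 'feminine' ]
```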
The Language Construction Kit
http://www.zompist.com/kit.html
IPA charts with sounds
http://www.internationalphoneticalphabet.org/ipa-sounds/ipa-chart-with-sounds/
Wikipedia article for vowel
https://en.wikipedia.org/wiki/Vowel
Wikipedia article for consonant
https://en.wikipedia.org/wiki/Consonant
StackExchange question on languages with few phonemes
http://linguistics.stackexchange.com/questions/4561/languages-with-the-fewest-phonemes
Quora question on languages with most phonemes
https://www.quora.com/Phonetics-Which-language-has-the-most-amount-of-sounds
IPA Reader
http://ipa-reader.xyz/
Cuttlesoft blog post on pronouncing things with Amazon Polly
https://cuttlesoft.com/blog/pronouncing-things-with-amazons-polly/
Wikipedia article for morpheme
https://en.wikipedia.org/wiki/Morpheme
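Wikipedia article for morphology (linguistics)
https://en.wikipedia.org/wiki/Morphology_(linguistics)
Wikipedia article for subject-verb-object word order
https://en.wikipedia.org/wiki/Subject%E2%80%93verb%E2%80%93object
Wikipedia article for agglutinative language
https://en.wikipedia.org/wiki/Agglutinative_language
Wikipedia article for fusional language
https://en.wikipedia.org/wiki/Fusional_language
Wikipedia article for isolating language
https://en.wikipedia.org/wiki/Isolating_language
Wikipedia article for grammatical category
https://en.wikipedia.org/wiki/Grammatical_category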
Wikipedia article for consonant mutation
https://en.wikipedia.org/wiki/Consonant_mutation
Wikipedia article for sound change
https://en.wikipedia.org/wiki/Sound_change