WordSynth
NAME
wordsynth.pl — Randomly generate words for constructed languages
SYNOPSIS
wordsynth.pl [-N number] file
DESCRIPTION
WordSynth is a Perl script that randomly
generates words (actually syllable clusters) based on a language rules
file. It was inspired by GDW's Traveller language generation system,
but includes many enhancements. While it still takes some work to
create a good language rules file, WordSynth makes it easy to tweak
and test the file until it reliably produces usable results. The
complete wordsynth distribution package is available for download; it is
about 5k, tarred and gzipped. The main enhancements are:
- The user can specify the language tables more easily, in a simpler,
cleaner, and clearer format. Translation from Traveller-style language
tables is straightforward and simple.
- Flexibility and power are provided by allowing the user to specify
additional forms of syllable structure and placement. Instead of having
a basic and alternate table, the user may provide structure tables for
syllables occurring at the beginning, middle, or end of the word.
- Consonants are classified by their position within a syllable and by
the position of the syllable in the word, resulting in four possible
categories. Vowels are also classified in four categories, by position or
stress, though the user may use them for other distinctions. This gives
the user some ability to specify a language while lowering the probability
of illegal combinations.
- User-specified filters, using Perl regular expressions, allow further
checking of illegal letter combinations.
Rather than force the user into a table structure designed for
rolling d6s by hand, WordSynth assigns a weight to each
possible result. The user merely specifies the letter and its weight.
If translating from an original Traveller table, one may simply count the
number of times an element occurs and use that as its weight. WordSynth
calculates each element's probability on the fly as its weight divided by
the total weight of all elements in the table.
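The weight-to-probability calculation can be sketched as follows. This is a minimal illustration in Python, not the script's actual Perl code; the function names probabilities and pick are hypothetical, and the weights are borrowed from the wordInitialConsonants example later in this document.

```python
import random

def probabilities(table):
    """Map each element to its weight divided by the table's total weight."""
    total = sum(weight for _, weight in table)
    return {element: weight / total for element, weight in table}

def pick(table, rng=random):
    """Choose one element with probability proportional to its weight."""
    total = sum(weight for _, weight in table)
    if total <= 0:
        raise ValueError("table is empty or has no positive weight")
    roll = rng.randrange(total)  # integer in [0, total)
    for element, weight in table:
        if roll < weight:
            return element
        roll -= weight  # move on to the next element's slice

# (element, weight) pairs, as in a language file table.
consonants = [("b", 2), ("br", 2), ("k", 1), ("kl", 1),
              ("m", 1), ("z", 4), ("zh", 2)]

# The total weight is 13, so "z" is chosen with probability 4/13.
print(probabilities(consonants)["z"])
print(pick(consonants))
```

When translating a Traveller table, the weight of an element would simply be the number of cells it occupies in the original table.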
For a short and informative introduction to the structure of
syllables, see the Wikipedia Syllable article.
Typical Usage
wordsynth.pl -N 200 mylanguage.txt | sort | uniq > mywords.txt
Options and Command-Line Arguments
-N number
Generate number words. Default is 1 word.
file
A language file must be specified. WordSynth will read this file from the
current directory. See below for the format of the language file.
STRUCTURE OF THE LANGUAGE FILE
WordSynth reads the language file, processing each TYPE..END element
in turn. Then it runs through the process of generating each
word.
Tables follow the format:
TYPE <name of table>
<list of elements>
END
<name of table> is one of the following names:
- syllableStressPatterns
- initialSyllables
- internalSyllables
- terminalSyllables
- assimilationFilters
- wordInitialConsonants
- internalInitialConsonants
- wordInitialVowels
- internalStressedVowels
- internalUnstressedVowels
- wordFinalVowels
- internalFinalConsonants
- wordFinalConsonants
The list of elements follows the format:
<element1>,<weight1>
<element2>,<weight2>
...
<elementN>,<weightN>
<element> and <weight> are separated only by a comma
(,) character with no whitespace. The <element> is a string of
letters, interpreted according to the type of table. The <weight>
is always an integer.
The assimilationFilters table differs from the others in that it
is a list of pairs, where the first element is replaced by the
second.
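One way to picture the replacement step is the sketch below. It is written in Python for illustration and is not the script's own code. It assumes that each first element is treated as a regular expression, that the pairs are applied once each in table order, and that Python's re syntax is close enough to Perl's for simple patterns; the function name assimilate is hypothetical.

```python
import re

def assimilate(word, filters):
    """Apply each (pattern, replacement) pair to the word, in table order."""
    for pattern, replacement in filters:
        word = re.sub(pattern, replacement, word)
    return word

# Pairs written as they would appear in an assimilationFilters table.
filters = [("aa", "a"), ("ii", "i"), ("uu", "u")]

print(assimilate("baakuul", filters))  # prints "bakul"
```

A single pass does not collapse longer runs completely (for example, "aaa" becomes "aa"); whether WordSynth repeats the pass is not stated here.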
Example Table Specifications:
TYPE initialSyllables
A,1
Cv,4
Cvk,2
END
TYPE assimilationFilters
aa,a
ii,i
uu,u
END
TYPE wordInitialConsonants
b,2
br,2
k,1
kl,1
m,1
z,4
zh,2
END
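To make the file format concrete, here is a minimal parsing sketch in Python. It is an illustration under stated assumptions, not the script's actual Perl parser: the function name parse_language is hypothetical, blank lines are skipped, and no error handling is attempted for malformed lines.

```python
def parse_language(text):
    """Parse TYPE..END blocks into {table name: [(element, value), ...]}."""
    tables = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TYPE "):
            current = line[len("TYPE "):].strip()
            tables[current] = []
        elif line == "END":
            current = None
        elif current is not None:
            # Split on the last comma: element on the left, value on the right.
            element, _, value = line.rpartition(",")
            if current != "assimilationFilters":
                value = int(value)  # weights are always integers
            tables[current].append((element, value))
    return tables

sample = """\
TYPE initialSyllables
A,1
Cv,4
Cvk,2
END
TYPE assimilationFilters
aa,a
ii,i
uu,u
END
"""

tables = parse_language(sample)
print(tables["initialSyllables"])  # [('A', 1), ('Cv', 4), ('Cvk', 2)]
```

The assimilationFilters table is kept as pairs of strings, since its second field is a replacement rather than a weight.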
Explanation of Each Table:
- syllableStressPatterns:
- This table gives weights for the patterns of high (H) or low (L)
stress across the syllables of a word. Because each pattern has one
letter per syllable, it also determines how many syllables a word may
have. Each stress pattern is simply a string of H's and L's, one for
each syllable, followed by a comma, followed by the weight.
- initialSyllables:
- This table describes the letter structure of syllables that begin
a word. The letter structure will typically be any of A, Ak, Cv, CV,
CVk, or Cvk.
- internalSyllables:
- This table describes the letter structure of syllables that neither
begin nor end a word. The letter structure will typically be any of V,
v, Vk, vk, cV, cv, cVk, or cvk.
- terminalSyllables:
- This table describes the letter structure of syllables that terminate
a word. The letter structure will typically be any of a, VK, vK, ca,
cVK, or cvK.
- assimilationFilters:
- The assimilation table is intended to be a list of consonant and/or
vowel combinations that each word is compared against. The software will
replace one combination with another. Thus it is not a standard table,
but rather a list of swapping pairs.
- wordInitialConsonants:
- This is the table
consulted for letter type 'C'. Use this table to list the consonants or
consonant clusters that are allowed to appear at the beginning of a
word.
- internalInitialConsonants:
- This is the
table consulted for letter type 'c'. Use this table to list the
consonants or consonant clusters that are allowed to appear at the
beginning of a syllable that does not begin the word.
- wordInitialVowels:
- This table is consulted for letter type 'A'. Use this table to list
the vowels that may begin a word.
- internalStressedVowels:
- This is the table consulted for letter type 'V'. Use this table to
list the vowels that may occur in a stressed syllable but do not begin
or end the word.
- internalUnstressedVowels:
- This is the table consulted for letter type 'v'. Use this table
to list the vowels that may occur in an unstressed syllable but do not
begin or end the word.
- wordFinalVowels:
- This is the table consulted for letter type 'a'. Use this table to
list the vowels that may end a word.
- internalFinalConsonants:
- This is the table
consulted for letter type 'k'. Use this table to list the consonants or
consonant clusters that are allowed to appear at the end of a syllable
that does not terminate a word.
- wordFinalConsonants:
- This is the table
consulted for letter type 'K'. Use this table to list the consonants or
consonant clusters that are allowed to appear at the end of a word.
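The mapping from letter types to tables described in the list above can be sketched as a lookup followed by a weighted pick for each letter of a syllable structure. The Python below is an illustration only, not the script's implementation; the function name expand and the miniature tables are hypothetical.

```python
import random

# Letter type -> table consulted, as given in the table explanations.
LETTER_TABLES = {
    "C": "wordInitialConsonants",
    "c": "internalInitialConsonants",
    "A": "wordInitialVowels",
    "V": "internalStressedVowels",
    "v": "internalUnstressedVowels",
    "a": "wordFinalVowels",
    "k": "internalFinalConsonants",
    "K": "wordFinalConsonants",
}

def expand(template, tables, rng=random):
    """Replace each letter type in a syllable structure with a weighted pick."""
    parts = []
    for letter in template:
        entries = tables[LETTER_TABLES[letter]]
        elements = [element for element, _ in entries]
        weights = [weight for _, weight in entries]
        parts.append(rng.choices(elements, weights=weights, k=1)[0])
    return "".join(parts)

# Hypothetical miniature tables, for illustration only.
tables = {
    "wordInitialConsonants": [("b", 2), ("z", 4)],
    "internalUnstressedVowels": [("a", 3), ("i", 1)],
    "internalFinalConsonants": [("n", 1)],
}

# "Cvk" is one of the typical initial syllable structures.
syllable = expand("Cvk", tables, random.Random(1))
print(syllable)
```

With these tables, "Cvk" can only expand to "ban", "bin", "zan", or "zin".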
Notes on Terminology:
A syllable may consist of an onset (C or c), a nucleus (V,
v, A, or a), and a coda (K or k). Languages can be differentiated not
only by the list of possible consonant and vowel combinations, but also
by whether the onset or coda is optional or mandatory. The nucleus is
mandatory. Wordsynth also allows the language designer to decide which
syllable constructs may appear at the beginning, middle, or end of a
word.
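As an illustration of these terms, the following Python sketch splits a syllable structure such as Cvk into onset, nucleus, and coda, and rejects structures that lack a nucleus. It assumes at most one letter type per part, as in the typical structures listed earlier. The function name split_syllable is hypothetical, and nothing here implies that WordSynth itself performs this check.

```python
ONSET = set("Cc")
NUCLEUS = set("VvAa")
CODA = set("Kk")

def split_syllable(template):
    """Split a syllable structure into (onset, nucleus, coda) letter types."""
    onset = nucleus = coda = ""
    rest = template
    if rest and rest[0] in ONSET:      # optional onset
        onset, rest = rest[0], rest[1:]
    if rest and rest[0] in NUCLEUS:    # mandatory nucleus
        nucleus, rest = rest[0], rest[1:]
    if rest and rest[0] in CODA:       # optional coda
        coda, rest = rest[0], rest[1:]
    if not nucleus or rest:
        raise ValueError(f"not a valid syllable structure: {template!r}")
    return onset, nucleus, coda

print(split_syllable("Cvk"))  # ('C', 'v', 'k')
print(split_syllable("A"))    # ('', 'A', '')
```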
The term vowel as used in this document should perhaps be
sonorant, and may include monophthongs, diphthongs, or
triphthongs, as well as approximants or nasal consonants.
The term consonant may include a single consonant or
digraph, or a consonant cluster.
Syllable stress patterns in wordsynth are sonorant-based. Future
development may take into account moraic theory by adding additional
tables of consonants.
View a complete sample language file (and try
to guess the source of the words it attempts to imitate). This is called
ExampleX in the Wordsynth Online script, if you want to see it in
action. Additional examples are in the distribution file.
TRY IT ONLINE! (Updated Jul 26, 2003)
You can check out some sample output from pregenerated language files
by using the wordsynth online cgi script.