Algilez
1 Vocabulary - where does it come from?
The Algilez vocabulary is compiled from a number of different sources. These include:
Roget's Thesaurus, headwords and keywords | These cover all word meanings in the English language (although there may be some sub-classifications which are not covered)
Voice of America Wordlist | Wordlist used in Voice of America radio broadcasts. This list contains a useful number of words relating to politics and present day life.
Longman Defining Vocabulary | The wordlist used by the Longman English Dictionary to define its meanings. This list should therefore include most of the commonly used and understood English words.
Assessment and Qualifications Alliance (AQA) French and German GCSE wordlists in the UK | The wordlist used by the AQA board for the French and German GCSE exams. This gives the French & German words or phrases that a student would be expected to know in order to take the AQA French or German GCSE exams. The exam is normally taken by British school children aged 16 after about 5 years of learning. A pass at grade A, B or C is equivalent to Common European Framework (CEF) level B1. CEF levels run from A1 (beginner) through A2, B1, B2 (A Level) and C1 (University level) to C2 (professional level).
Additional Sources | These include common household items, animals (domestic, farm and zoo), children's games and toys, colours and shapes etc.
2 Statistics
The Algilez vocabulary continues to be developed.
In July 2006 there were 4,662 Roget entries; by January
2012 this had increased to 5,851.
This was the result of developing the Algilez
documentation, which includes the Grammar, the Phrase Book, the
English-Algilez Translation Guide,
additional words from the AQA Examination Board and the draft
GCSE Lesson Book.
There are estimated to be about
a million words in the English language, so we still have some
way to go yet. Please
remember that the vocabulary still really only contains general
‘day to day’ words but is being added to almost every day.
Note also that the word list will not contain all the
grammatical combinations of tenses or adjectival affixes that
are possible in Algilez; just a few examples (usually the
verb infinitive and adjective) are provided.
There were about 1725 root words used
as of January 2012, excluding plants, animals and proper names.
Algilez does not yet
contain a full set of individual words for plants, animals and
technical items that would be encountered in a natural language.
However, the core Algilez
vocabulary appears to be fairly robust for day to day use –
anything you can say in English you can say in Algilez!
Note that 'compound words' as defined
below are new words with a new semantic meaning.
Grammatical variations derived from any existing root or compound word
(e.g. conjugated tenses, adjectives & adverbs etc)
are sometimes included, in order to make finding the words
easier. However since there are no irregular words in
Algilez, all grammatical variations follow set patterns (e.g.
for tenses, verbs, professions, adverbs and adjectives based on
the root words) and are very easy to apply.
Statistics as of 27 January 2012 (note that these are changing regularly)
Total vocabulary entries, i.e. Algilez words | 5,851
Total individual root words | 1,913
Main root words 'r' (i.e. excluding plants, animals and proper names) | 1,725
Repeated root words, r2 & r3 (note: not duplicates, just repeated for ease of reference) | 29
Animals | 88
Plants | 45
Proper names (countries etc) | 55
Compound words (derived from the main root words) | 3,902
Dictionary words (root words plus various affixes), the total number of words used so far | 14,570
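As a quick consistency check on the statistics above, the root-word categories can be added up. The sketch below assumes (the page does not state it outright) that the total of individual root words is the sum of the main roots, animals, plants and proper names; the variable names are illustrative only.

```python
# Figures copied from the statistics for 27 January 2012.
main_roots = 1725      # 'r' roots, excluding plants, animals and proper names
animals = 88
plants = 45
proper_names = 55
total_roots = 1913     # "Total individual root words" as quoted

# Assumption: the total is simply the sum of the four categories.
computed_total = main_roots + animals + plants + proper_names

print(computed_total)                 # 1913
print(computed_total == total_roots)  # True
```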
3 Comparison with English
Algilez Phrase Book
The Algilez Phrase Book (as of 23 June 2011) contains the following numbers of words and characters:
English words: 7,434 (equivalent to 9 or 10 sides of typed A4 in normal text)
Algilez words: 6,419 (86%), i.e. a direct translation from English to Algilez requires only 86% of the words.
Even if the words 'the' (280) and 'a' (191) are excluded from the English text (since they are not included in Algilez), this removes only 471 words, leaving 6,963 English words. Algilez still has only 92% of the words, showing that the grammar and format of Algilez are more compact than English.
English characters: 31,244
Algilez characters: 22,298 (71% compared with English)
Eliminating the characters of the words 'the' (840) and 'a' (191) removes 1,031 characters, leaving 30,213 characters in the English text. Algilez has 22,298 characters, which is 74% of the English.
Algilez GCSE Lesson Book
Translations of English and Algilez sentences in the Algilez GCSE Lesson Book (Jan 2012) give similar results:
English words 13,730, English characters 56,805. Without 'the' & 'a': 12,768 words & 54,629 characters.
Algilez words 11,052, Algilez characters 40,803.
Even with 'the' & 'a' removed from the English figures, Algilez has only 87% of the words and 75% of the characters compared with English.
Total figures for both documents give 89% of the words and 74% of the characters.
This demonstrates that over a fairly large amount of standard text, Algilez contains over 10% fewer words and 25% fewer characters. On average, Algilez words are shorter than English words (3.6 characters compared with 4.2 characters). All of this helps to reduce the burden of learning.
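The combined percentages can be reproduced from the figures quoted for the two documents. This is only a sketch of the arithmetic: the `percent` helper and the variable names are illustrative, and the English totals are taken with 'the' and 'a' removed, as in the text.

```python
def percent(part, whole):
    """Return part as a whole-number percentage of whole."""
    return round(100 * part / whole)

# Phrase Book (23 June 2011); English figures exclude 'the' and 'a'.
pb_en_words, pb_en_chars = 7434 - 471, 31244 - 1031
pb_al_words, pb_al_chars = 6419, 22298

# GCSE Lesson Book (Jan 2012); English figures exclude 'the' and 'a'.
lb_en_words, lb_en_chars = 12768, 54629
lb_al_words, lb_al_chars = 11052, 40803

# Combined Algilez totals as a share of the combined English totals.
words_pct = percent(pb_al_words + lb_al_words, pb_en_words + lb_en_words)
chars_pct = percent(pb_al_chars + lb_al_chars, pb_en_chars + lb_en_chars)

# Average Algilez word length over both documents.
avg_algilez = (pb_al_chars + lb_al_chars) / (pb_al_words + lb_al_words)

print(words_pct, chars_pct, round(avg_algilez, 1))  # 89 74 3.6
```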
3 Vocabulary required for standard exams
What do you need to know?
The AQA exam board word list provides a
useful benchmark for estimating the words that need to be
learned for Common European Framework (CEF) level B1. I have combined the recommended
wordlists for the French and German exams (marked with 'F' in
the second column of the Algilez word list) which total 2843
words. Most words are identical for both exams but a
few hundred are specific to either French or German. Also,
many words are grammatical variations of the same root word.
The list contains many words appropriate to normal day to day
life for European teenagers and is therefore a reasonable list
for an intermediate standard of conversation and reading.
There are 1276 Algilez root words
(including plants and food) included in that list. The remaining 1567
words (2843-1276) are compound words or grammatical variations
derived from the root words. Hence, as well as the grammar
rules, approximately 68% of the 1693 Algilez root
words (and the compound words derived from them) would
need to be learnt to achieve CEF level B1 (UK GCSE level).
4 How to use the vocabulary
The Algilez vocabulary format is
based on the Roget Thesaurus numbering system (see
Roget's Thesaurus below).
Roget | English | Algilez | Root | Additional Algilez | English
Roget number plus additional sub-classification letters | Roget headword or keyword (this gives the basic meaning of the word). Plus additional common words which express the same meaning. | Algilez word | Root Word (r), animal (a), plant (p) or Proper Name (n) | Additional grammatical versions of the Algilez word. Normally the adjective and verb | English grammatical versions of the Algilez words
054b | fullness, plenitude | fu | r | fua, fuiz | full, to fill
136bb | defer, postpone | delgã | | delgãiz | to defer
209e | up, rise | up | r | suupiz, upiz, upa | to rise, to raise, upper
5 Roget's Thesaurus
3.1 Ogden's Basic English
My initial starting point for a
simple wordlist was Ogden's 'Basic English'. This
appeared to be a well thought out and compact list of
850 common English words. The theory is that you can say
anything that you need to say just using those words.
Unfortunately
it soon became clear that 'Basic English'
was fatally flawed due to allowing multiple meanings
of words, since this is the only way the list can be kept down to 850.
Multiple meanings are confusing in any language and English suffers
particularly badly from this problem. Any artificial language must
be designed to avoid this.
3.2 Roget's 1000 categories of meaning
The reference book which I used to
establish the various meanings of the words used in Basic English was
Roget's Thesaurus. In using the book in a detailed and methodical
way (rather than just looking for synonyms for essay writing or
crosswords), I began to realise the excellent work which lay behind it.
In compiling his Thesaurus, Peter Roget had first categorised the whole of
the English Language. This was a lifetime's work and the final
product is not just a detailed list of English synonyms, but, most
importantly, a comprehensive analysis of the English language into a
logical list of just under 1000 categories of meaning.
3.3 Other languages
It is a great compliment to
Roget's intellectual ability and language skill that his categories are,
with only very minor exceptions, still valid today after 150 years.
Every single word in the English language can be placed in one of the
categories. Since the categories are based on meaning, then any
other language can be similarly categorised using the same 1000
headings. Roget himself was certainly familiar with, and possibly even
fluent in, French, German and Latin, and did hope that a common world
language might benefit from his work. In fact, shortly after
Roget's Thesaurus was first published, versions also appeared in French
and German. It would be interesting to know if they have been
published in any other languages. Since crossword puzzles cannot
be a purely English language hobby, then I'm sure there must be more
versions around somewhere!
3.4 Different types of Thesaurus
Modern versions of Roget’s Thesaurus
sometimes use a different numbering system (e.g. starting with
'001: Birth' instead of '001: Existence'). Other versions
may not use numbers at all but simply be a list of synonyms in
alphabetical order. It is the classification system of
Roget’s original Thesaurus, as much as the grouping of synonyms
that makes Roget’s work so useful in language analysis (although
a few of his categories might seem a little questionable today).
The construction of a new logical vocabulary would be impossible
without first deciding what meanings it was necessary to express
and what associated words stem from those meanings. The
classification process was a lifetime’s work in itself.
Fortunately we are able to use the excellent work of Roget to
continue with the construction of a new language, today.
The Historical Thesaurus of English, recently published by
Glasgow University (after 45 years of work!) also uses a
different numbering system to Roget. However it is still
possible to compare similar meanings from one book to another.
In view of this, I have decided to retain the Roget
classification system for the time being, since it is better
suited to language development work. Given the enormous
workload in producing a new thesaurus, it appears unlikely that
a better version of the Roget system will appear, but we shall
see.
3.5 Language by numbers - a new way of looking at language
A very interesting implication for language
development has become apparent from working with the Roget
Classification system. Since every word in Algilez has (or
will eventually have) a unique classification number, then it
would be possible to write sentences in Algilez using just the
classification numbers alone. This is made much easier by
the regular syntax of Algilez.
This has a number of implications.
Firstly, the actual words used could be changed very easily.
This has already proved very useful in this development stage of
the language when
words have sometimes needed to be changed to something more
suitable. For translation purposes, the substitution of
foreign language words might also be possible (although syntax
differences would still require further work to make a good
translation).
The second implication is that machine
reading of Algilez should be considerably easier, since any
sentence could theoretically be reduced to the numerical
components from the classified vocabulary list e.g.
English | He went with his son to the park
Algilez | il goz vek cuil ila u pãk
Algilez Roget numbers | 371cf 265c/125a 089a 011dea 371cf/564b 289b 837fa
Whether this will eventually prove to be
worthwhile remains to be seen, but I suspect there is
considerable potential for further development.
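A minimal sketch of the idea in Python, using only the seven words of the example sentence. It assumes that the number groups pair with the Algilez words in order (first word to first group, and so on), which the example suggests but does not state; `encode` and `decode` are illustrative names, not part of any Algilez tool.

```python
# Word-to-number pairs taken from the example sentence.
# Assumption: the n-th number group belongs to the n-th Algilez word.
CODES = {
    "il": "371cf",
    "goz": "265c/125a",
    "vek": "089a",
    "cuil": "011dea",
    "ila": "371cf/564b",
    "u": "289b",
    "pãk": "837fa",
}
# Reverse lookup, from number group back to word.
WORDS = {code: word for word, code in CODES.items()}

def encode(sentence):
    """Replace each Algilez word with its classification numbers."""
    return " ".join(CODES[word] for word in sentence.split())

def decode(numbers):
    """Replace each group of classification numbers with its word."""
    return " ".join(WORDS[code] for code in numbers.split())

sentence = "il goz vek cuil ila u pãk"
encoded = encode(sentence)
print(encoded)
print(decode(encoded) == sentence)
```

Because the words live only in the lookup table, changing a word (or substituting another language's word for the same number) would leave every encoded sentence untouched.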
3.6 Issues with Roget's Thesaurus
Occasionally I have come across classification
examples that I find questionable. Generally I have
accepted Roget's expertise in the matter and gone along with his
classification. However there are a few cases where I
cannot agree and have put words into a different classification
group. Whether the original classifications were down to
Roget or to later editors I cannot say. Similarly, I
wonder if these arguable classifications had been noticed and
debated previously? I list below the two examples found so
far. In any case it certainly does not detract from the
magnificent achievement of the creation of the thesaurus in the
first place.
English | Algilez | Original Number | Revised Number | Reason for change
Route, road etc | rut etc | 624 | 305 | Roget appears to have mixed the meanings of 'way', which in 624 is used to mean 'method/how', and that of 305, where 'way' is used as a passage/physical route. I have therefore moved all words relating to routes, roads, paths etc to 305.
harm | bocid etc | 645 | 655 | Roget uses 645 'Badness'. I think that 655 'Deterioration' fits better for harm.
6 Algilez classification example
4.1 Lateness
The word lists used to build the
Algilez vocabulary contained three similar words: delay, defer
and postpone. All three words come under the same category
of Roget 136, for which the Headword is 'Lateness'.
Lateness is a general term and is sub-divided into 'Lateness' and
'Delay', which I have given the numbers Roget 136a (Lateness) and
Roget 136b (delay). 'Delay', 'defer' and 'postpone' all come
under sub-section 136b (delay).
4.2 Delay, defer and postpone
In other words 'delay, defer
and postpone' are all considered by Roget to have a similar
enough meaning to be grouped under the same category and to be a form of
'delay'. As an initial assumption, we could say that all three
words have the same meaning.
Note that this does not
imply that they would always be interchangeable whenever they
were used in English. It may be that under different
circumstances either 'delay' or 'defer' or 'postpone' might be
used, due to 'custom and practice' of normal English usage.
However, the point is, initially we are starting by saying that
the semantic meaning of the three words is identical and for
that reason any one of them could be used and the Algilez
word that represents that meaning (del)
would apply to any one of the three.
(Note that when we compare
'a delay' we are comparing it with 'a deferral' or 'a
postponement'; similarly we are comparing 'to delay' with
'to defer' or 'to postpone'. It is the
semantic meaning that we are looking at here, not the
grammatical usage.)
4.3 Differences between Defer/Postpone and Delay
However in looking at the words
more closely, we may consider for example, that although 'defer'
and 'postpone' have identical semantic meaning, that meaning is
slightly different to that of 'delay'. In such a case we
need to modify the Roget numbering slightly e.g.
Roget Number | English | Algilez
136ba | delay | del
136bb | defer, postpone | delgã
In this case we have chosen to
define 'defer' & 'postpone' as a delay
to starting something, and have therefore formed the compound word
'delgã' from the roots 'del' (delay) and 'gã'
(begin). An alternative would have been to define 'defer' &
'postpone' by 'del' (delay)
and 'hãp' (happening/event), making 'delhãp'.
Recording each meaning against a Roget number in this way enables others
to quickly compare the semantic meaning of any word with both English and
other languages, both natural and artificial.
7 Vocabulary
5.1 Classification
Algilez is classified
into approximately 1000 main classes of meaning.
The classification is based on those used by Peter Roget in his Thesaurus.
Each main heading of meaning is numbered.
Sub-headings and individual words are shown by additional letters, to a
maximum of 3 digits and 3 letters, e.g.:
011c | family | fam | r
011ca | mother | pãrel |
011caa | mum (mother) | mã | r
011cab | grandmother | pãrpãrel |
011cb | father | pãril |
011cba | dad | pã | r
011cbb | grandfather | pãrpãril |
011cc | child | cu | r2
011cca | son | cuil |
011ccb | daughter | cuel |
011ce | sibling | sib | r
011cea | brother | sibil |
011ceb | sister | sibel |
011d | race, people | peg | r
011da | tribe | fampeg |
5.2 Root Words
A Root Word is an Algilez
word in its most basic, simple form. It is generally (but not
always) a noun and can have tense and verbal affixes etc added. An
example of a root noun is 'bel', meaning
beauty (an abstract noun). To
this root we add a verbal suffix to create the verb 'beliz',
meaning 'to beautify'. We can
also add an adjective suffix 'a' to make 'bela',
e.g. 'peel bela' (a beautiful woman). The same affix 'a'
can also make an adverb, e.g. 'pintoz bela'
(beautifully painted). Note
that a 'qualifying' word following a noun will always be an adjective and
one following a verb will always be an adverb.
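The regularity of these affixes can be sketched in a few lines of Python. This is an illustration only: it assumes the suffix is attached to the root with no change to either part, which holds for the examples 'bel' and 'fu' on this page, and the function names are invented for the example.

```python
def verb(root):
    """Attach the verbal suffix 'iz' to a root."""
    return root + "iz"

def qualifier(root):
    """Attach 'a': an adjective after a noun, an adverb after a verb."""
    return root + "a"

print(verb("bel"), qualifier("bel"))  # beliz bela
print(verb("fu"), qualifier("fu"))    # fuiz fua
```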
A number of frequently
used words consist of single letter roots e.g.:
journey, travel, move place | g
hear | h
listen | l
see | s
However, the above roots are never used alone; they will always have an
additional letter or letters to make them into a noun, adjective, verb or
adverb e.g.:
to travel | giz
I went yesterday | me goz ozde
a journey | go
Come here! | gez he
5.3 Algilez Vocabulary
The Algilez Vocabulary
has a separate web page and is based on MS
Excel. It consists of an Algilez word list categorised by
Roget reference number. The word list can be copied from the web
page onto any spreadsheet and then re-sorted into alphabetical order of
English or Algilez words as required.
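Outside a spreadsheet, the same re-sorting can be sketched in a few lines of Python. The rows are a handful of entries copied from tables on this page; the three-column tuple layout is an assumption made for illustration and is not the format of the actual Excel file.

```python
# (Roget number, English, Algilez) rows copied from examples on this page.
rows = [
    ("054b", "fullness", "fu"),
    ("136ba", "delay", "del"),
    ("011c", "family", "fam"),
    ("209e", "up", "up"),
]

# Sort the same rows three ways by choosing a different column as the key.
by_roget = sorted(rows, key=lambda row: row[0])
by_english = sorted(rows, key=lambda row: row[1])
by_algilez = sorted(rows, key=lambda row: row[2])

print([row[1] for row in by_english])  # ['delay', 'family', 'fullness', 'up']
```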
8 Why use a classification system?
6.1 Starting with a word list
The starting point for a new
vocabulary is going to be one's own language. In my case, English.
The initial need is for a basic wordlist/vocabulary of the more commonly
used words, which can then be expanded to include the remainder of the
language. Given the tens of thousands of regularly used words
(including the variations of tense etc) and the hundreds of thousands of
lesser used words (including specific animal, plant and technical
terms), the difficulty is in knowing even where to begin.
6.2 The need for classification
A second difficulty is that basic
word lists, no matter how common the words, do nothing to help with the
classification of the vocabulary which is essential for a new language.
Without classification and the sensible ordering of words of similar
meanings or those derived from the same roots, then, no matter how much
better the grammar is, the new language itself is going to be little
better than any natural language with all of its inconsistencies and
difficulties for the learner.
6.3 Previous approaches to choosing vocabularies
Previous artificial languages have
generally succeeded in providing a simplified grammar but have
still tried to use lengthy and sometimes illogical European language
words as the basis for their vocabulary. They often use just an
alphabetical word list with little or no attempt at a classification of
word meanings. This may have eased word recognition by European
language speakers but would be meaningless to native speakers of
Chinese, Hindi, Arabic etc.
6.4 The Algilez classification method
Perhaps the simplest way to
demonstrate the advantages of a classified list is to look again at the
words relating to family:
011a | consanguinity, kinship | ken
011b | kinsman | kenpe
011ba | uncle/aunt | onk
011bb | uncle | onkil
011bc | aunt | onkel
011bd | cousin | kos
011c | family | fam
011ca | mother | pãrel
011caa | mum (mother) | mã
011cab | grandmother | pãrpãrel
011cb | father | pãril
011cba | dad | pã
011cbb | grandfather | pãrpãril
011cc | child | cu
011cca | son | cuil
011ccb | daughter | cuel
011ce | sibling | sib
011cea | brother | sibil
011ceb | sister | sibel
011d | race, people | peg
011da | tribe | fampeg
English words such as son,
daughter, brother & sister have no common roots to denote male or female
or to denote child. Seeing the words together, in the same Algilez
classification group above, makes it much easier to see which words
ought to use a common root. The Algilez words follow a logical
pattern, making understanding and learning much easier and quicker.
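The pattern visible in the family words can be written down directly. The sketch below assumes, from the examples only, that 'il' marks the male form and 'el' the female form of a root; the helper names are illustrative.

```python
def male(root):
    """Male form of a family root, e.g. 'cu' (child) -> 'cuil' (son)."""
    return root + "il"

def female(root):
    """Female form of a family root, e.g. 'cu' (child) -> 'cuel' (daughter)."""
    return root + "el"

# Roots taken from the family list: child, sibling, uncle/aunt.
for root in ("cu", "sib", "onk"):
    print(root, male(root), female(root))
```

Each root yields the pair listed in the classification: cuil/cuel, sibil/sibel and onkil/onkel.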
Algilez uses many words of English
origin, and I have chosen to create a new root word where a word 1)
is frequently used and 2) would otherwise require a long compound word of
three or more root words.
9 Choice of root words
Root words are generally based
upon the abstract noun. In some cases there are a large number of
choices, any of which would work and none of them obviously right or
wrong. In these circumstances the tangible noun is often the one
chosen due to being the more common word. An example is friend 'fren'.
Grammatical use | English | Algilez
Root Noun | friend | fren
ex - quality | friendliness | frenex
øk - result/outcome | friendship | frenøk
iz - verb | to befriend | freniz
a - adjective | friendly | frena
a - adverb | friendlily | frena
However, we could have used
'friendliness' as the main root word and modified the other meanings
accordingly e.g.
Root Noun | friendliness | fren
tangible noun | friend | frenpe
øk - result/outcome | friendship | frenøk
Alternatively we could have taken
'friendship' as the main root word :
Root Noun | friendship | fren
tangible noun | friend | frenpe
ex - quality | friendliness | frenex
In most cases the root chosen has been the word that
is most commonly used, so that the most frequent word keeps the shorter
form (without affixes). In this case I have judged that 'friend' is
likely to be a more commonly used word than 'friendship' or 'friendliness'
and therefore defined friend as 'fren'
instead of 'frenpe'. (In fact
fren and
frenpe have slightly different meanings anyway, but it illustrates
the point.) See below for information about compound words.
10 Creating a compound word
Compound words are composed of two or more root
words. The section above gives examples based on the root word 'fren'
(friend). In these examples, we have used 'ex'
and 'øk', which are two commonly used
modifiers.
Grammatical use | English | Algilez
Root Noun | friend | fren
ex - quality | friendliness | frenex
øk - result/outcome | friendship | frenøk
However, not all word
creation is quite so obvious. Let us take the example of the word
'Passport'. This is a two-part English word, in common use and well
understood. However the word itself was probably created several
hundred years ago and would have been used to describe a letter of
permission allowing an English traveller to cross by sea into France.
Nowadays the two parts of the word do not accurately describe the function
of a passport and it would be confusing to just apply a literal
translation from English to Algilez i.e. pass-port =
pãs-goas.
We really need to think about what exactly
the function of the document is and then find the best words to
describe it. A dictionary definition
gives 'passport: official document for use by a person
travelling abroad.' For example,
a passport is a travel document, a means of identification, a
permit to enter countries etc. However, we do not want to
produce an unnecessarily complicated, multi-syllable word.
Some of the choices available are words such as:
059b | foreign country | bosnax
265d | journey | go
494b | authenticity, genuineness | truøk
547b | identification, naming, point out | den
548a | document, record, documentation | rek
733a | authority | fur
756a | let, permission, allowing, allow, may | le
756c | permit, licence | lepap
Some of the above words are already two-part compound
words. In the end, the choice was made to use 'goden',
which combines the meanings of 'journey' and 'identity' and seemed most
appropriate to the present use of the word 'passport'.
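The joining step itself is mechanical once the roots have been chosen. The sketch below assumes that a compound is formed by writing the roots one after another with no linking letters, as in 'go' + 'den' and 'del' + 'gã' on this page; the small `ROOTS` table and the `compound` function are illustrative, and the real work of choosing which meanings to combine remains a human judgement.

```python
# A few roots and glosses taken from this page.
ROOTS = {
    "go": "journey",
    "den": "identification",
    "del": "delay",
    "gã": "begin",
}

def compound(*roots):
    """Join known roots in order; return the word and its literal gloss."""
    for root in roots:
        if root not in ROOTS:
            raise KeyError(f"unknown root: {root}")
    word = "".join(roots)
    gloss = " + ".join(ROOTS[root] for root in roots)
    return word, gloss

print(compound("go", "den"))  # ('goden', 'journey + identification')
print(compound("del", "gã"))
```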
11 Creating new words
This is on a new page
Last revised: 27 January 2012