The Dynamic Lexicon

Bilingual lexicons are important resources for machine translation and automatic alignment systems. This paper presents a simple and effective model for extracting bilingual lexicons automatically from pre-aligned parallel texts by using information retrieval techniques. The model is based on the assumption that two words/phrases are likely to be translations if they are aligned to the same word/phrase in a third language.
As a use case, we used the Greek-English, Latin-English, and Persian-English aligned parallel texts available in the Perseus Digital Library to produce Greek-Latin, Greek-Persian, and Latin-Persian dynamic lexicons.
Our datasets comprise 104 Ancient Greek/English works, 59 Latin/English works, and 494 Persian/English poems. The Ancient Greek/English dataset consists of approximately 210k sentence pairs with 4,320k Ancient Greek words, and the Latin/English dataset consists of approximately 123k sentence pairs with 2,330k Latin words, whereas the Persian/English dataset consists of 64k translation pairs (23k of them unique).
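
The pivot assumption stated above (two words are likely translations if they are aligned to the same word in a third language) can be illustrated with a minimal sketch. This is not the model's actual implementation: the function name `pivot_lexicon`, the input format (lists of word/pivot alignment pairs), the toy alignments, and the raw co-occurrence count used as a score are all assumptions made for illustration. The retrieval-based ranking used in the paper may differ.

```python
from collections import defaultdict


def pivot_lexicon(src_to_pivot, tgt_to_pivot):
    """Pair source and target words that are aligned to the same pivot word.

    Both arguments are lists of (word, pivot_word) alignment pairs
    (a hypothetical input format). Returns a dict mapping each source
    word to a list of (target_word, score) tuples, best score first.
    """
    # Index target words by the pivot word they are aligned to.
    pivot_index = defaultdict(lambda: defaultdict(int))
    for tgt, pivot in tgt_to_pivot:
        pivot_index[pivot][tgt] += 1

    # Accumulate counts for every source/target pair sharing a pivot.
    # Assumption: a raw count stands in for the real ranking function.
    scores = defaultdict(lambda: defaultdict(int))
    for src, pivot in src_to_pivot:
        for tgt, count in pivot_index.get(pivot, {}).items():
            scores[src][tgt] += count

    # Sort candidates by descending score, then alphabetically for stability.
    return {
        src: sorted(cands.items(), key=lambda item: (-item[1], item[0]))
        for src, cands in scores.items()
    }


# Toy alignments with English as the pivot language (invented examples).
greek_english = [("λόγος", "word"), ("λόγος", "speech"), ("θεός", "god")]
latin_english = [("verbum", "word"), ("verbum", "word"),
                 ("oratio", "speech"), ("deus", "god")]

lexicon = pivot_lexicon(greek_english, latin_english)
for greek_word, candidates in lexicon.items():
    print(greek_word, candidates)
```

In this sketch a Greek word and a Latin word become translation candidates only when both are aligned to the same English word, and candidates backed by more shared alignments rank higher.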

