A team of researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has created an AI algorithm capable of automatically deciphering long-lost languages, even without other advanced knowledge of the relationship to other languages.
Lost languages are more than just academic curiosity. Without them, we lack a whole body of knowledge about the people who spoke to them. Unfortunately, most have such minimal records that scientists cannot decipher them using machine translation algorithms such as Google Translate. Some do not have a “relatively” well-researched language to compare with and often do not have traditional separators, such as white space and punctuation.
The aim is to discover the relationships between “lost” languages, for which historians have found written records, but which no one has spoken for a long time.
researchers they learned a “decryption algorithm” of different linguistic constraints that arise as languages evolve in predictable ways. Then, the AI algorithm discovered language models using these constraints.
“For example, we can identify all references to people or locations in the document, which can be further researched in the light of known historical evidence,” says researcher Barzilay. “These methods of recognizing entities are commonly used in various word processing applications today and are extremely accurate, but the key research question is whether the task is feasible without training data in the ancient language.”
Therefore, the algorithm can classify words in an old language and link them to equivalents in other related languages. In other words, although it is not able to act as a Google Translate to decipher ancient English texts, it can identify the roots of ancient languages.
For example, the algorithm managed to identify precisely the language family of Iberian, an ancient language spoken by the natives of Western Europe from the seventh century to the first century BC. For example, according to AI, Iberian was not actually related to Basque, as recent research confirms.
The team hopes to eventually use their AI to decipher long-lost languages using just a few thousand words.