Using Artificial Intelligence to Progress and Evolve Global Languages

Building the Louvre of languages – A physical, digital, historical and future-oriented catalogue of communication tools

Thomas Frey //March 22, 2018//

In 2012, I proposed the creation of a Global Language Archive, at roughly the same time the Endangered Languages Project was kicked off. But I always viewed a Global Language Archive as being more than an online effort.

The world is losing languages at a rapid clip.

More than 500 languages have fewer than 10 people still speaking them, and many native speakers are losing the will to keep them viable.

Logically, the world would be a simpler place if we had fewer languages, yet losing a dialect means having a weakened record of the communities that helped build the world around us.

I was puzzled.

Is saving languages really necessary? Much like animals going extinct, isn’t it just nature’s way? How will the world be a better place in 100 years if most of our 7,000 languages survive? And what exactly does archiving a language mean?


Language has been a central ingredient in forming heritage, modern culture and even our way of thinking.

We are the prime beneficiaries of our ancestors’ struggles, so in many ways we owe it to them to preserve their legacy.

While we seldom consider it, most of history’s greatest stories have never been recorded, occurring among people who left no recorded version of them.

Though language is merely a vehicle, it’s also an obstacle, at times impairing our understanding of what truly happened. Without communication of our past, we struggle to understand who we are today.

How can humanity possibly know where it’s going if we don’t know where we’ve come from?

The purpose of the Global Language Archive is to preserve the legacy of those who have come before us, through the lens of the languages they used to communicate. But it needs to be more than a dusty museum filled with past recordings of native speakers. It needs to be a “living museum.”

We should think of this platform as a never-ending work site for future discoveries.

What does it mean to archive a language?

Language is more than the verbal sounds that emerge from our mouths. It’s a combination of facial expressions, intonations, gestures, symbols, postures and body language used to convey intellectual concepts, verbal syntax and emotional value involved in basic human-to-human communications.

The minimum requirement for archiving a language is enough evidence of past forms of communication for an AI (artificial intelligence) Language Recreation Engine to reassemble a functional language that can be taught.

Inputs will involve the collection of sufficient video, audio and written documentation for an AI Language Recreation Engine to generate a functional three-dimensional avatar capable of teaching the language to someone wanting to learn it.

While there is currently no such form of AI in existence, there is growing evidence that a language recreation engine is not only possible, but also likely to be developed soon.

Taking it a couple steps further, not only will this give us the ability to recreate language, but it will likely enable us to “fill in the gaps” and find missing words, create a written language if none exists, and do seamless translation from one language to the next.

For this reason, the process of archiving a language will involve the accumulation of sufficient remnants of a failing language so the AI Engine can take over. Each language collection will include sufficient fragments of written and spoken words, definitions, common phrases, expressions, explanations and value systems to begin the process.

Since most people can gain a functional level of language proficiency with roughly 2,500 of the most common words, I’m estimating that will be the approximate number of words needed to begin the process.

If possible, the archive for each language will involve more comprehensive collections that attempt to capture lifestyles, cultures and routines involved in normal day-to-day living and communication.

Collections will include whatever is available including artwork, books, music, clothing, photographs, weapons, cookware, maps, videos and more. These will of course vary from one language to the next.

The loneliest books in the world are those written in languages that no longer exist. Yet these books hold clues to an unknown history filled with unknown value and importance that cannot yet be expressed.

For all we know, the greatest moments in human history were never recorded in any traditional fashion, and are currently inaccessible to modern people.

The Endangered Languages Project

When I first started talking about a Global Language Archive, another effort was taking shape.

The Endangered Languages Project has so far collected information on 3,410 languages. Its purpose is to be a worldwide collaborative between indigenous language organizations, linguists, institutions of higher education and key industry partners to strengthen endangered languages.

At the heart of their project is a website launched in June 2012 with funding from Google. While Google oversaw the development and launch of the site, the long-term goal was for it to be led by experts in the field of language preservation.

For this reason, the project is now managed by First Peoples' Cultural Council and the Endangered Languages Catalogue/Endangered Languages Project (ELCat/ELP) team at University of Hawaiʻi at Mānoa in coordination with the Governance Council.

In the words of the Endangered Languages Project:

“Humanity today is facing a massive extinction: languages are disappearing at an unprecedented pace. And when that happens, a unique vision of the world is lost. With every language that dies we lose an enormous cultural heritage; the understanding of how humans relate to the world around us; scientific, medical and botanical knowledge; and most importantly, we lose the expression of communities’ humor, love and life. In short, we lose the testimony of centuries of life.

Languages are entities that are alive and in constant flux, and their extinction is not new; however, the pace at which languages are disappearing today has no precedent and is alarming. Over 40 percent of the world’s approximate 7,000 languages are at risk of disappearing. But today we have tools and technology at our fingertips that could become a game changer.”

Users of the Endangered Languages Project website play an active role in putting their languages online by submitting information or samples in the form of text, audio, links or video files. Once uploaded to the website, users can tag their submissions by resource category to ensure they are easily searchable.

The Endangered Languages Project serves as a great first step, setting the stage for greater opportunities ahead. Several resources like Wikipedia, National Geographic, Global Oneness Project, UNESCO, and many more are attempting to draw attention to this problem in their own way.


The Endangered Languages Project puts technology in the hands of organizations and individuals working to revive struggling languages and save them from extinction.

Some languages developed hundreds of words for beads, fish, leathers and snow because those had become focal points of daily living.

Here are a few examples:

  • Voro has fewer than 50,000 native speakers and is spoken in the southeastern corner of Estonia and the Pskov Province in Russia.
  • Bisu has roughly 2,740 native speakers. In China, Bisu is spoken in one village of 240 people. In Burma, it’s spoken by 2,000 in two or three villages. In Thailand, Bisu is spoken by some members in two villages with a population of 500.
  • Bakairi is spoken by approximately 900 people in Brazil. This language has two rather divergent dialects: Eastern Bakairi, spoken by 700 people in seven villages, and Western Bakairi, spoken by 200 people in two villages.
  • Cimbrian is spoken by fewer than 2,000 people in Italy, in the towns of Giazza, Roana, Mezzaselva, Rotzo, and Luserna. People who speak Cimbrian also speak Italian, German, and Venetan.
  • Tjupany is an Australian language with only 10 native speakers remaining in the world.
  • Karelian is a language closely related to the Finnish language with 63,000 native speakers in Russia and Finland.
  • El Molo is spoken by roughly 700 in a small community of fishermen living in two settlements along the eastern shore of Lake Turkana, in northern Kenya.
  • Tuscarora is a dying language spoken in Ontario, Canada. Only two or three speakers of Tuscarora remain, all over the age of 80.

The Goal of the Global Language Archive

Creating a physical space that represents a focal point for language preservation brings with it tremendous opportunity. Unlike today’s cultural museums that capture physical fragments of history, the Global Language Archive will have a mission to preserve the communications, stories, and dreams of our ancestors.

Online efforts only go so far. By adding physical dimensions, human contact, audio stories and peripheral experiences, we breathe life into these otherwise single-dimensional languages.

As “last speakers” begin to dwindle, the final-person-responsibility brings with it tremendous stress and anxiety. The loss of a language means the loss of birthright, heritage and customs. It somehow breaks the connection with their ancestors and invalidates all of the accomplishments of the past, dishonoring the culture of their families.

Much of this stress can be defused by taking these speakers through a formal preservation process that transforms them from a “crazy person clinging to the past” into a cultural expert with a deep understanding of their ancestors.

Curators of languages are different than curators of artifacts. Languages are constantly morphing tools of expression with deep emotional ties. Done correctly, the Global Language Archive will attract massive crowds from around the world and draw attention to this critically important problem. It will be a one-of-a-kind facility serving as a magnet for linguistic scientists and cultural researchers around the globe.

Once an AI Language Recreation Engine is developed, it opens the doors for entirely new kinds of research.

In this context, language itself becomes a cultural taxonomy, and with more than 7,000 languages left to preserve, it has the potential for becoming the largest museum in the world with associated universities, hotels, culture-inspired retail centers and more.

At the same time, many questions still need to be answered:

  • Will we need to develop a triage system for saving dying languages?
  • If you decided to learn one of the endangered languages, how would you make that decision?
  • If it becomes easy to learn a new language, how many will you want your children to know?
  • What are the revenue streams needed to sustain a Global Language Archive?
  • What’s the ideal location for this type of facility?
  • How can the entire world be recruited to support this venture?

What’s the best way to experience a language?

Yes, it is possible to experience pieces of these native tongues through a website, but having access to local experts, cultural guides and linguistic coaches takes it to a whole new level.

In our ever-expanding virtual world, it’s easy to start thinking that proximity isn’t important, but it is. Being surrounded by like-minded people at the Global Language Archive who share a common interest is very important.

Much like the difference between seeing an online copy of the Mona Lisa and traveling to the Louvre in Paris to experience it firsthand, it becomes an entirely different level of engagement.

The Global Language Archive is envisioned to become the “Louvre of Languages.”