Imagine you are in a meeting, chatting with people from different countries or who speak languages other than your own. How would you communicate with them? Maybe you would use an auto-translate feature on your phone, or copy and paste what someone has typed into an online translation engine such as Google Translate to get an idea of what is being said. Having such features at our fingertips has allowed us to travel internationally with ease, share research across countries, and form relationships with people around the globe. Automated translation has been developed and refined over many years, and it has allowed us to communicate and connect with many communities. It has led data scientists like myself to ask: how did it start, and where is it going?
Globalization and the Internet have changed how we communicate with each other and have virtually shortened the distance between us, so we can reach a much wider range of people. If you want to meet someone from a different place or culture, it is very easy to do so on social media. However, a language barrier remains, and it is slowly coming down as computer translation improves. Computer translation relies on both code and reliable, previously recorded human translations. To see how we arrived at asking machines to complete our translations, it helps to first understand how translation started and how it has evolved.
History of Language Translation
The first recorded translation occurred in the 3rd century BC, when the Hebrew Bible was translated into Greek so that Jewish people who were not familiar with Hebrew could still read it and practice Judaism (United Translations, 2018). The translation was given the name “Septuagint” after the approximately 70 translators who worked on the project. The Bible was later translated into Coptic, Latin, Georgian, and Armenian, among other languages. In fact, many of the more notable translations in history were driven by religious needs, such as the translation of the Diamond Sutra by Kumārajīva in the 4th century (Racoma, 2018). The Diamond Sutra is one of the most important texts in Buddhism, and a copy of it is the oldest dated printed book still in existence (Stanford, 2009). Manual translation continued for approximately 1,700 more years, until technological advancements in the 20th century allowed for the development of theory related to translation by machines.
In 1949, Warren Weaver presented proposals for machine translation based on information theory and World War II code breaking (Mandal, 2019). In the 1950s, the volume of translation work grew rapidly, making translation an increasingly labor-intensive task. In 1954, the Georgetown-IBM experiment used the IBM 701 computer to translate about 60 Russian sentences into English, printing 2.5 lines of translated text every second. This experiment jumpstarted the era of “machine translation”. In 1966, the Automatic Language Processing Advisory Committee (ALPAC) report concluded that quality translations would still come from humans, and it suggested that funding for machine translation be stopped (Mandal, 2019). This did not stop researchers from looking for ways to improve machine translation, which led to early technologies like SYSTRAN, a rule-based machine translation (RBMT) tool introduced in 1968. In the 1980s, the focus shifted from rule-based translation to example-based machine translation (EBMT). Example-based translation reuses existing translations of similar sentences. Consider “I left some food” and “I left some water”: the two sentences differ by only one word, so if a translation of the first is already stored, EBMT only needs to translate the word that changed rather than redo the whole sentence.
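To make the EBMT idea concrete, here is a minimal sketch in Python. It is only a toy under stated assumptions: the stored example pair, the small English-to-Spanish lexicon, and the function name `translate_by_example` are all invented for illustration and do not come from any real EBMT system.

```python
# Toy example-based translation: reuse a stored sentence pair and swap one word.
# The example pair and the lexicon are invented for illustration only.
EXAMPLES = {
    "I left some food": "Dejé algo de comida",
}
LEXICON = {"food": "comida", "water": "agua"}


def translate_by_example(sentence):
    """Build a translation from a stored example that differs by at most one word."""
    words = sentence.split()
    for source, target in EXAMPLES.items():
        source_words = source.split()
        if len(source_words) != len(words):
            continue
        # Positions where the new sentence differs from the stored example.
        diffs = [i for i, (a, b) in enumerate(zip(source_words, words)) if a != b]
        if not diffs:
            return target  # exact match, reuse the stored translation as-is
        if len(diffs) == 1:
            old_word, new_word = source_words[diffs[0]], words[diffs[0]]
            if old_word in LEXICON and new_word in LEXICON:
                # Swap only the translated form of the word that changed.
                return target.replace(LEXICON[old_word], LEXICON[new_word])
    return None  # no stored example is close enough


print(translate_by_example("I left some water"))
```

Only the changed word is looked up; the rest of the sentence comes straight from the stored example, which is the saving the paragraph above describes.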
From then through the mid-2000s, there were smaller improvements from new techniques, such as the introduction of statistical models, but the next major development came in the form of Google Translate in 2006. While accurate translations still required human translators, the service boasted the ability to translate up to 1 million books per day, something that had never been seen before.
Case Study: Google Translate
Let’s take Google Translate as an example of how computers have affected translation. Google’s current translation model is BERT, the Bidirectional Encoder Representations from Transformers model. BERT reads all of the words in a sequence at once, which is what makes it “bidirectional” (Horev, 2018). This allows it to learn the context of a single word from all of the words surrounding it (Horev, 2018). For example, when translating from Spanish to French, each word is processed in relation to the other words in the sentence (Kaput, 2021), and the text goes directly from Spanish to French rather than from Spanish to English and then from English to French, as it would have in earlier iterations of Google Translate.
The previous iteration of Google Translate used what is known as a Long Short-Term Memory recurrent neural network (LSTM-RNN). RNNs came into use in the mid-2010s as a step up from statistical models. LSTM-RNNs process input in a way loosely similar to how our brains function: unlike prior techniques, they can use information from earlier cycles to “make decisions” during later cycles (Olah, 2015). LSTMs are a type of RNN designed to let a model “remember” its previous cycles, because typical RNNs suffer from short-term memory. This memory matters for translation quality. If a model cannot remember the previous two or three words in the sentence it is translating, the completed translation may not make any sense. While an LSTM-RNN model can handle long sequences, it still has trouble retaining all the information in the original sentence (Hever, 2020).
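The “remembering” described above can be sketched with a single LSTM cell. The code below uses the standard LSTM gate equations on scalar values. The weights are hand-picked assumptions chosen to make the effect visible, where a real model would learn them from data, and the names `lstm_step`, `run`, `REMEMBER`, and `FORGET` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """Run one step of a scalar LSTM cell and return the new (h, c)."""

    def gate(name, squash):
        w_x, w_h, bias = weights[name]
        return squash(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the new information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # updated long-term memory
    h = o * math.tanh(c)              # output for this step
    return h, c


def run(sequence, weights):
    """Feed a sequence through the cell, starting from an empty memory."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
    return h, c


# Hypothetical hand-picked weights as (w_x, w_h, bias); a real model learns these.
REMEMBER = {
    "forget": (0.0, 0.0, 6.0),     # forget gate stays near 1: keep the memory
    "input": (10.0, 0.0, -5.0),    # input gate opens only when x is large
    "candidate": (2.0, 0.0, 0.0),
    "output": (0.0, 0.0, 6.0),
}
FORGET = dict(REMEMBER, forget=(0.0, 0.0, -6.0))  # forget gate stays near 0

sequence = [1.0, 0.0, 0.0, 0.0, 0.0]  # one signal, then four empty steps
print("cell state with memory kept:   ", run(sequence, REMEMBER)[1])
print("cell state with memory dropped:", run(sequence, FORGET)[1])
```

With the forget gate held open, the signal from the first step is still present in the cell state four steps later; with it held shut, the signal is gone almost immediately.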
One case where an LSTM-RNN can struggle is a sentence that compares two things. Such comparisons typically use “like” or “similar to” to signal the comparison. When the model reaches the first object, it may recognize that the object is being compared with something else, yet still decide that it does not need to remember it. Another shortcoming is that RNNs do not process a whole sentence in one read; they step through it word by word. Transformer models were then developed for language translation to address these deficiencies.
Google Translate was built to break language barriers and make the world more accessible (Turovsky, 2016). It came a step closer to that goal than any previous machine translation product. Statistical methods had been prevalent in machine translation products since the 1990s, and they powered Google Translate at its initial launch. When translating between two languages that were not English, English served as the intermediary language. This left plenty of room for information to be lost during translation. Google has since transitioned to neural networks for translation, an improvement over the statistical models it previously used.
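A small sketch shows how routing through an intermediary language can lose information. The three dictionaries below are hand-made assumptions for illustration, not data from Google Translate: Spanish distinguishes the informal “tú” from the formal “usted”, and French distinguishes “tu” from “vous”, but English has only “you”.

```python
# Toy word-level dictionaries, invented for illustration only.
ES_TO_EN = {"tú": "you", "usted": "you"}   # both Spanish forms collapse to "you"
EN_TO_FR = {"you": "tu"}                    # English gives no hint about formality
ES_TO_FR = {"tú": "tu", "usted": "vous"}   # a direct mapping keeps the distinction


def translate_via_english(word):
    """Spanish -> English -> French, using English as the intermediary."""
    return EN_TO_FR[ES_TO_EN[word]]


def translate_direct(word):
    """Spanish -> French with no intermediary language."""
    return ES_TO_FR[word]


for word in ("tú", "usted"):
    print(word, "| via English:", translate_via_english(word),
          "| direct:", translate_direct(word))
```

Once “usted” has become “you”, nothing in the English text tells the second step that the formal “vous” was intended, so the pivoted result comes out as “tu”.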
Technology such as BERT has been made available outside of Google Translate, which could lead to further developments in language translation and in how computers interpret language. Many companies have modified BERT for their own use cases. Facebook, for example, built RoBERTa, another natural language processing model, using BERT as its base. Broadly speaking, Facebook is using RoBERTa to show the potential that self-supervised techniques have over traditional machine learning techniques in terms of performance (Facebook, 2019). Facebook has also built TaBERT, a model for understanding queries, which it hopes to apply to fact checking and verification applications at some point (2020). In another case, a model called Med-BERT was created to use electronic health records to predict diseases and accelerate AI-aided healthcare (Rasmy et al., 2021). Improvements in techniques over time have given us better translation output, as we have seen when comparing EBMT with the later Google Translate. Techniques developed for language translation can even be applied in other domains, as seen with Facebook’s RoBERTa and TaBERT and with Rasmy’s studies on Med-BERT. The next time you are looking at a translation on your phone or in an online meeting, just remember that there’s quite a bit of history behind it.