Large Language Models and Machine Translation: The era of hyper-personalization

Roeland Hofkens, Chief Product & Technology Officer, LanguageWire

Large Language Models (LLMs) have taken the world by storm.

Some of these models, like OpenAI’s GPT-4 and Google’s PaLM2, have been trained on a multilingual data set and should – at least in theory – also be very capable at machine translation tasks.

But is this really the case? How do we unlock the full potential of Large Language Models for machine translation? In this technical deep dive, we will look at how LLMs work in the context of machine translation and how they could be integrated into a Translation Management System (TMS).

Current machine translation models

Most of the current commercial machine translation tools, such as Google Translate, are based on neural models with a transformer architecture. These models are purpose-built for one task: machine translation. Out of the box, they already perform very well at translating generic content. However, in specialised business contexts, they might miss the right vocabulary or use a suboptimal style.

Therefore, it is useful to customise these models with additional business data by training them to recognise your personalised terms and phrases. Using various customisation techniques, the model ‘learns’ to use your business’s tone of voice and terminology, which then produces better machine translation results.

Large Language Models (LLMs)

Large Language Models are usually also based on transformer architectures. However, compared to the Neural Machine Translation (NMT) models discussed in the previous section, they are trained on much larger bodies of text and contain far more model parameters: billions, versus a few hundred million in single-task, bilingual NMT models. This makes LLMs more flexible and ‘smarter’ when it comes to interpreting user instructions or ‘prompts’. This new technology opens up a lot of new possibilities for model customisation with business data. Because this approach is so powerful, I prefer to speak of “personalisation” rather than “customisation”. Let’s explore how this personalisation works.

A double approach to personalisation

When using LLMs, there are basically two approaches to fine-tuning the model so that it produces better quality at inference time, the moment when it generates its response.

  • Tune the parameters (aka “weights”) of the model before use so it learns to adapt to your needs. This is a resource-intensive operation that requires AI engineers to prepare a customised version of the model.
  • Use in-context learning. This is a simpler technique that informs the model about your data and preferences as it generates its responses, via a specially engineered prompt.

Let’s first investigate parameter tuning.

Updating the parameters of an LLM can be a daunting task. Remember that even small LLMs have billions of parameters, so updating them is computationally very expensive and usually beyond the reach of a typical consumer; the cost and complexity are simply too high.

For machine translation purposes, we will normally start with an instruction-tuned LLM. This is a model that has been fine-tuned to be more helpful and to follow instructions, rather than to simply predict the next words. After tuning, the model will perform better on a variety of tasks like summarisation, classification, and machine translation. We will be providing more information about which model to pick in future blog posts in this series.

The instruction-tuned LLMs are a good starting point for further, customer-specific optimisations. Using an approach called Parameter-Efficient Fine-Tuning (PEFT), we can fine-tune an instruction-tuned model with customer data in a shorter, more cost-effective way.

At LanguageWire, our preferred PEFT method is LoRA (Low-Rank Adaptation), which usually involves updating approximately 1.4–2.0% of the model weights. This means the customisation effort is reasonable, yet surprisingly effective. As you can see in the table below, the authors of the LoRA paper conclude that LoRA can prove even more effective than full tuning of all the model parameters!

TABLE: Results from the LoRA paper comparing LoRA-based fine-tuning with full tuning of all model parameters.

For the best results from this method, we need access to a large amount of high-quality training data with matching source and target texts. If you have already built up a sizeable translation memory, it can likely be used for this purpose. The LanguageWire AI team is constantly working to identify the ideal size of the translation memory for LoRA tuning.
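
To give a feel for what PEFT with LoRA involves in practice, here is a minimal sketch using the Hugging Face transformers, peft, and datasets libraries. The model name, hyperparameters, and translation memory file format are assumptions for illustration only, not LanguageWire’s actual configuration.

```python
# Minimal LoRA (PEFT) fine-tuning sketch on bilingual translation memory data.
# Assumptions: an instruction-tuned seq2seq checkpoint and a JSONL export of
# the translation memory with "source"/"target" fields. Illustrative only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base_model = "google/flan-t5-base"   # illustrative instruction-tuned model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

# LoRA adapters on the attention projections: only a small percentage of the
# weights become trainable, the rest of the model stays frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q", "v"],
                         lora_dropout=0.05, task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # prints the share of trainable weights

# Translation memory exported as JSONL: {"source": "...", "target": "..."}
tm = load_dataset("json", data_files="translation_memory.jsonl", split="train")

def preprocess(example):
    prompt = f"Translate from English to German: {example['source']}"
    tokens = tokenizer(prompt, truncation=True, max_length=256)
    tokens["labels"] = tokenizer(text_target=example["target"],
                                 truncation=True, max_length=256)["input_ids"]
    return tokens

train_data = tm.map(preprocess, remove_columns=tm.column_names)
# From here, a standard transformers Seq2SeqTrainer run fine-tunes only the
# LoRA adapter weights, which can later be loaded on top of the base model.
```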

Now let’s move on to the second approach, in-context or few-shot learning.

In-context learning is a method where the model learns on the fly from a small number of examples introduced by a specially crafted prompt. It is also known as few-shot learning.

In the context of Machine Translation, few-shot learning works as follows:

  1. The system analyses the incoming source content. Usually this will consist of one or more sentences or segments.
  2. The system tries to find examples of similar source content fragments and their respective translations.
  3. The system creates a prompt that includes the source content to be translated and the examples of the previous translations.
  4. The LLM learns on the fly from the examples and creates a high-quality translation of the source.

Few-shot learning for MT has a positive impact on fluency, tone of voice, and terminology compliance. It needs only a handful of examples to work with, typically three to five. In fact, quality does not improve with larger sample sizes, so there is no benefit in including your entire translation memory in a single prompt. Experiments have shown that LLMs do not handle large prompt contexts very well and that the quality of the results might even deteriorate!
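
To make these steps concrete, below is a minimal sketch of how such a few-shot prompt could be assembled from translation memory matches. The prompt wording, the language pair, and the example sentences are purely illustrative, not the prompts LanguageWire uses in production.

```python
# Illustrative few-shot MT prompt built from translation memory matches.
def build_few_shot_prompt(source_segment, tm_matches,
                          src_lang="English", tgt_lang="German"):
    """tm_matches: list of (source, target) pairs similar to source_segment."""
    lines = [f"Translate the following text from {src_lang} to {tgt_lang}.",
             "Follow the terminology and tone of voice used in the examples.", ""]
    # Three to five examples are usually enough; more rarely helps.
    for src, tgt in tm_matches[:5]:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
        lines.append("")
    lines.append(f"{src_lang}: {source_segment}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Release the brake lever before starting the engine.",
    [("Check the brake lever for wear.",
      "Prüfen Sie den Bremshebel auf Verschleiß."),
     ("Start the engine only in a ventilated area.",
      "Starten Sie den Motor nur in einem belüfteten Bereich.")])
print(prompt)  # this prompt is then sent to the LLM for completion
```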

By combining the benefits of LoRA and few-shot learning, we can implement powerful optimisations in the Large Language Model that ultimately lead to hyper-personalised, top-quality machine translation.

Your linguistic data is the key!

None of these techniques would work without a large set of high-quality, up-to-date bilingual text corpora in various language pairs. Your Translation Memories are an ideal source for this data set.
However, before it can be used, you must consider several important aspects:

  • Quality. All the data should be top quality, preferably translated by qualified human translators and verified in a four-eyes workflow, i.e., approved by two people.
  • Noise. Not all the data in your translation memory may be relevant. Some of it could be outdated, off-topic, or refer to discontinued products. Regular clean-ups of your translation memory to remove irrelevant material are important (a simple filtering sketch follows this list).
  • Size. You will need a certain data volume to make sure that fine-tuning works well. The larger the better if the quality has been maintained.
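
As a rough illustration of the kind of filtering involved, the sketch below drops empty, duplicate, and stale translation memory entries. The field names and the cut-off date are assumptions for this example, not the rules applied by the LanguageWire platform.

```python
# Illustrative translation memory clean-up: remove empty, duplicate and
# outdated segment pairs. Field names and cut-off date are assumptions.
from datetime import date

def clean_translation_memory(entries, cutoff=date(2020, 1, 1)):
    """entries: list of dicts with 'source', 'target' and 'last_modified' (date)."""
    seen = set()
    cleaned = []
    for entry in entries:
        src, tgt = entry["source"].strip(), entry["target"].strip()
        if not src or not tgt:                  # drop empty segments
            continue
        if entry["last_modified"] < cutoff:     # drop stale material
            continue
        key = (src.lower(), tgt.lower())
        if key in seen:                         # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append(entry)
    return cleaned
```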

If you use the LanguageWire platform, the automated Translation Memory Management module takes care of these aspects for you and no manual action is needed.

If you have an existing external translation memory that you would like to use with our Platform and machine translation services, our engineers can make this possible. LanguageWire engineers have created import APIs, clean-up scripts, and Language Quality Assessment tools to help you make the most of your most valuable linguistic asset.

The LanguageWire solution

So how do we bring all this together for a typical translation project? Let’s look at an example.

LanguageWire offers a solution that is fully integrated in our technology ecosystem. This is demonstrated in high-level steps in figure 1 below.

In this example, we have taken a simple workflow where a customer wants to translate PDF or office files. The user simply uploads the content files using the LanguageWire project portal. From there on, everything is orchestrated automatically:

  • The incoming data are analysed and transformed into an XLIFF file.
  • The system creates a pre-translation based on translation memory matching and machine translation.
  • Our community of human experts provide post-editing and proofreading.
  • In the next step, the translated XLIFF is reassembled into the output files, preserving the layout.
  • Finally, the customer can download the translated files from the portal.

FIGURE 1: A simple translation project in the existing LanguageWire Platform

In the second example, we focus on the pre-translation step using machine translation based on LLM technology. As illustrated in figure 2 below, the customer’s linguistic data plays a central role.

  • For each piece of text, the LanguageWire system finds the “K-nearest neighbours” in the translation memory. These bilingual results are used as the basis of a special few-shot learning prompt that is passed to the machine translation API of the LLM.
  • On the model layer, we have loaded a LoRA module that customises the LLM to the customer’s tone of voice and vocabulary. Again, this is based on a data set compiled from the translation memory. We apply PEFT tuning with LoRA to that data set to create new model weights that are loaded into the model context. This tuning can be done regularly, e.g. every two weeks, to reflect new updates and content in the TM.

FIGURE 2: A translation example using a Large Language Model with a mixture of LoRA customisations and optimised in-context learning prompts.

When our specially crafted prompt is handled by the LLM, the custom weights in the LoRA module will contribute to a top-quality machine translation output. Once completed, this output then travels automatically to the next step in the process. Typically, this would be a post-editing task with a human expert in the loop for maximum final quality.
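
To illustrate how these pieces could fit together in code, the sketch below retrieves the nearest translation memory matches with sentence embeddings, builds a small few-shot prompt, and generates with a LoRA-adapted model. The libraries (sentence-transformers, peft, transformers), the model names, and the adapter path are assumptions for illustration, not a description of the production LanguageWire stack.

```python
# Illustrative pre-translation step: k-nearest-neighbour TM retrieval,
# few-shot prompt construction, and inference with a LoRA-adapted model.
from peft import PeftModel
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def nearest_tm_matches(source_segment, tm_entries, k=4):
    """Return the k TM pairs whose source text is most similar to the segment."""
    corpus = [e["source"] for e in tm_entries]
    hits = util.semantic_search(
        embedder.encode(source_segment, convert_to_tensor=True),
        embedder.encode(corpus, convert_to_tensor=True),
        top_k=k)[0]
    return [(tm_entries[h["corpus_id"]]["source"],
             tm_entries[h["corpus_id"]]["target"]) for h in hits]

# Load the base model and attach the customer-specific LoRA adapter weights.
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model = PeftModel.from_pretrained(base, "adapters/customer-x-lora")  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

def pretranslate(source_segment, tm_entries):
    examples = nearest_tm_matches(source_segment, tm_entries)
    prompt = "Translate from English to German, following the examples.\n\n"
    prompt += "".join(f"English: {s}\nGerman: {t}\n\n" for s, t in examples)
    prompt += f"English: {source_segment}\nGerman:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```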

What does this mean for our customers?

In a nutshell: our customers can expect even better machine translation. The MT can adapt automatically to varying contexts, for example different business verticals, and align with the expected tone of voice and choice of words of that vertical.

Not only will this reduce the cost of post-editing, but it will also increase the delivery speed of translations. It also opens up a broader scope for using the MT output directly, without human experts in the loop.

What else is LanguageWire doing with LLMs?

As we mentioned before, Large Language Models are very flexible. The LanguageWire AI team is investigating lots of other areas that could benefit from LLM technology.

We are currently researching:

Automated Language Quality Assessment. The LLM could check the translation of a human expert or the machine translation output of another model and give a quality score. This could reduce the cost of proofreading substantially. The underlying Machine Translation Quality Estimation (MTQE) technology can also be applied to other use cases.
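
As a simple illustration of the idea, a quality-estimation check could start with nothing more than a scoring prompt sent to an LLM. The prompt wording and the scale below are assumptions for this sketch, not LanguageWire’s MTQE implementation.

```python
# Illustrative quality-estimation prompt; wording and scale are assumptions.
def build_qe_prompt(source, translation, src_lang="English", tgt_lang="German"):
    return (f"You are a translation quality reviewer.\n"
            f"Rate the {tgt_lang} translation of the {src_lang} source on a scale "
            f"from 0 (unusable) to 100 (publication-ready). Reply with the number only.\n\n"
            f"{src_lang}: {source}\n{tgt_lang}: {translation}\nScore:")
```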

Content authoring assistants. Using a combination of PEFT with LoRA and few-shot learning, we can personalise the LLM to focus on content creation tasks. A customer could provide keywords and metadata, allowing the model to generate text that uses a business-customised tone of voice and word choices.

Further customisation of the LLM with data from term bases.

And there is lots more to come. Stay tuned for more blog posts in the coming weeks about AI and the future of LLMs at LanguageWire.

How can we help you?

Your journey to a powerful, seamless language management experience starts here! Tell us about your needs and we will tailor the perfect solution to your enterprise.