3 Tips to Find Foreign Languages in Your eDiscovery Data

A New Morningside Translations Article Featured on the Relativity Blog.

Relativity is the most popular eDiscovery platform for lawyers and legal professionals. We recently published a blog post on Relativity’s website highlighting three important tricks for quickly identifying critical foreign language information. Check it out below.

You are an eDiscovery professional at a big law firm. You sit at your desk, log into Relativity, and sort through documents. You’re cruising right along, finding exactly what you need and even ahead of your deadline. You start to think about lunch. Should I get a $15 salad from the place across the street? Should I get pizza? Get pizza. You deserve it. Get a whole pie. You’re doing great.

Then you come across a huge cache of foreign-looking documents. You think they’re in Romanian, but you don’t actually know Romanian. You also don’t know Latvian, Lithuanian, or Polish. Could it be one of those? A bead of sweat forms on your brow, and you start to panic, running through your options in a mental catalog:

To ask one of the partners what to do, turn to page 17
To go down a 13-hour internet rabbit hole, turn to page 25
To run to the parking lot to cry in your car, turn to page 33

Don’t love those options? Luckily, a few helpful hacks can save you a ton of time and frustration after you stumble upon foreign language documents halfway through your review—or, even better, prevent this heart-sinking moment and find them at the outset.

Identifying foreign languages as early as possible in your review process is critical to achieving clear, predictable costs, preventing unnecessary delays, and constructing a sensible workflow. Here are three easy tricks to help you navigate foreign eDiscovery waters.

1. Use foreign language stop words

If you don’t have Relativity Analytics or are looking for a quick and easy way to scan your data set for a certain foreign language, then a creative use of stop words in a dtSearch might help.

Stop words, also called noise words, are the most frequently used words in a given language (for example, in English: and, the, my, all, for). They are typically filtered out of a dtSearch or keyword search because they are so common that they don’t return valuable search results. However, their frequency also makes them great for finding foreign language documents.

Because it’s safe to assume that stop words can be found in just about any text, a dtSearch for a list of stop words will likely return any documents in the foreign language. If, for example, you believe your data set may contain German, then searching for German stop words will hopefully return any documents with German text.

Note that each language has its unique set of stop words, so rather than translate a list of English words, it’s best to obtain a list of them in the desired foreign language from a legal language services expert.
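To make the idea concrete, here is a minimal Python sketch of the same trick outside of Relativity. It is not dtSearch syntax, and the tiny stop-word lists, the `stop_word_share` and `flag_documents` helpers, the sample documents, and the 0.2 threshold are all assumptions chosen for illustration. A real review would use a vetted list for each language.

```python
import re

# Tiny illustrative stop-word lists (assumptions, not vetted lists).
STOP_WORDS = {
    "german": {"der", "die", "das", "und", "ist", "nicht", "ich", "ein", "zu", "mit"},
    "english": {"the", "and", "is", "not", "of", "to", "a", "in", "for", "my"},
}

def stop_word_share(text, stop_words):
    """Return the fraction of word tokens in `text` that appear in `stop_words`."""
    # [^\W\d_]+ matches runs of letters, including accented ones such as "ü".
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stop_words)
    return hits / len(tokens)

def flag_documents(docs, language, threshold=0.2):
    """Return the ids of documents whose stop-word share meets the threshold."""
    words = STOP_WORDS[language]
    return [
        doc_id
        for doc_id, text in docs.items()
        if stop_word_share(text, words) >= threshold
    ]

# Hypothetical documents keyed by a made-up control number.
docs = {
    "DOC-001": "Der Vertrag ist nicht gültig und die Zahlung ist zu spät.",
    "DOC-002": "The contract is not valid and the payment is late.",
}

print(flag_documents(docs, "german"))
```

Because stop words make up a large share of ordinary text, even a short list tends to separate documents in the target language from the rest. The threshold is a judgment call and would need tuning against real data.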

2. Run language identification

While the stop words hack gets the job done, it requires that you have an idea of which languages are in your data set and proves tedious if you want to search for more than one language. Full language identification analysis is preferable for data sets containing multiple languages or if you simply want to double-check for any foreign languages before proceeding with your review.

Language identification uses machine learning to detect the languages in a piece of text automatically. As a feature of Relativity Analytics, it returns the primary language and up to two secondary languages in a document, along with the percentage breakdown of each language.
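The shape of that result can be pictured with a short Python sketch. It does not reproduce how Relativity Analytics detects languages. It only assumes that some detector has already produced a percentage per language, and the `summarize_languages` helper and the sample numbers are hypothetical.

```python
def summarize_languages(percentages, max_secondary=2):
    """Split a {language: percent} mapping into a primary language and up to
    `max_secondary` secondary languages, highest percentage first."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    # Ignore languages that were not actually detected.
    ranked = [(language, pct) for language, pct in ranked if pct > 0]
    if not ranked:
        return None, []
    primary, *rest = ranked
    return primary, rest[:max_secondary]

# Made-up percentages for a single document.
primary, secondary = summarize_languages(
    {"Romanian": 70, "English": 20, "Polish": 7, "Latvian": 3}
)
print(primary)
print(secondary)
```

Here the fourth language is dropped because only two secondary languages are kept, which mirrors the "primary plus up to two secondary" format described above.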

From here, you can leverage the language identification output to guide your next steps: build dashboards to get a bird’s-eye view of the number of documents, custodians, and control numbers by language; batch documents by language so they can be sent to foreign language reviewers efficiently; and send any documents with foreign language text for machine translation so you can review the gist in English. Regardless of your approach, language identification results lay the groundwork for the rest of your review workflow.
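As a rough illustration of the counting and batching steps, the Python sketch below works on a handful of made-up rows. The row layout (control number, custodian, primary language), the helper names, and the batch size are assumptions for the example, not an actual Relativity export format or API.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (control number, custodian, primary language).
rows = [
    ("CTRL-0001", "Alice", "English"),
    ("CTRL-0002", "Alice", "Romanian"),
    ("CTRL-0003", "Bob", "Romanian"),
    ("CTRL-0004", "Bob", "Polish"),
    ("CTRL-0005", "Bob", "Romanian"),
]

def count_by_language(rows):
    """Count documents per primary language, as a dashboard tile might."""
    return Counter(language for _, _, language in rows)

def batch_by_language(rows, batch_size):
    """Group control numbers by language, then split each group into batches."""
    by_language = defaultdict(list)
    for control_number, _, language in rows:
        by_language[language].append(control_number)
    return {
        language: [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
        for language, ids in by_language.items()
    }

print(count_by_language(rows))
print(batch_by_language(rows, batch_size=2))
```

Each batch contains a single language, so it can be handed to a reviewer who reads that language or sent for machine translation as a unit.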

3. Recognize that the internet is your friend—except for when it’s not

The beauty of the internet is that you can find almost anything you’re looking for with the click of a button. A simple Google search for stop words in your suspected language will net you some quick and reliable returns. For example, searching “Spanish stop words” points you to a comprehensive list of stop words in over 40 languages. The internet is pretty great, am I right? But don’t let it give you a false sense of security.

We’re all aware of the free translation tools out there. You might think that simply copying and pasting your documents into one of these free engines might be your ticket out of this language identification mess, but before you go down that road, there are a few important issues to consider:

Copying and pasting is extremely tedious, considering the volume of documents you’re likely dealing with. “Ctrl+C, Ctrl+V” isn’t really a feasible option when confronted with hundreds or thousands of documents.
Free online translation tools are not secure. Once you input text into one of these tools, it also becomes their property. In most cases, you are dealing with sensitive documents that shouldn’t be exposed to third parties. But, of course, you already know that.

Choosing one of the hacks above rather than a free online translation tool is a surefire way to keep your data secure and allow your team the time they need to focus on building a killer case.

So you found foreign language documents. Now what?

Now it’s time to determine whether those foreign language documents are relevant, privileged, or something else—in other words, to figure out what they say. You’ll likely want to partner with a trusted language service provider to do that. Choosing a reliable provider is a topic for another day, but here are a few quick tips to get started:

Ensure they have ISO-certified quality. Bad translations can cause confusion and cost you time and money. Defend yourself against them. Choosing a provider certified by the International Organization for Standardization is a good start.
Make sure they have extensive experience in eDiscovery. Most often, a combination of tools, such as machine translation, foreign language review, and keyword search term translation, will optimize your time and costs, so make sure your provider is familiar with all of them and how they apply to these types of projects.
Make sure they are familiar with your chosen technology. Selecting a partner who is already comfortable with your eDiscovery software can save time, boost security, and prevent headaches. Some may even have a dedicated application for your platform, like Morningside’s Relativity plugin, providing dedicated support inside the tool you already know.

With these simple workflow hacks, you have some better options to choose your own eDiscovery translation adventure. Have you used any of them before? Let us know in the comments.

Dylan Blaney

Dylan Blaney, Vice President of Business Development, Morningside, a Questel Company, is responsible for the company's global legal translation strategy and for maintaining relationships with the world's largest corporate legal departments and law firms. Dylan advises law firms and corporations on best legal translation practices, specifically implementing workflows and technology that reduce cost, improve quality, and accelerate deadlines. He focuses on international litigation and arbitration, FCPA, patent litigation, M&A, and internal corporate communication and documentation to ensure adequate global communication in the legal marketplace.
