The Past, Present, and Future of MediaCAT: AI-powered media localization tool

Ted Bae
Frontend Team Lead
April 20, 2023

After seven months of maturation, MediaCAT, which we launched in September 2022, was recently reborn as version 2.0 with "Projects". Seven months is a long time for an early-stage startup like XL8. In this post, we'd like to share why we created MediaCAT, what we had in mind when we updated it to 2.0, and how we plan to take it forward.

The Past of MediaCAT

Skroll: A testbed demonstrating XL8's outstanding translation API results

If you've been reading our blog, you know that XL8's strongest competitive advantage is its unrivaled performance in multilingual translation of media content. We knew that if we could provide an API with great translation performance to LSPs (Localization Service Providers), we could significantly reduce the time it takes for them to do initial translations.

In Q2 and Q3 2022, a committee of localization partners blind-reviewed 2,400 translated sentences across 6 genres for our new context-aware models. The figure above shows the percentage of sentences that were translated to a deliverable level without any post-editing.

We weren't too concerned with the UI in the early days because we weren't targeting the general public and we only focused on providing APIs. We assumed that LSP customers would only use our website to test the translation performance before integrating a proper API, so it would be enough to show just a list of the results of calls to the API. That was Skroll, the predecessor of MediaCAT.

Skroll was used until the release of MediaCAT.

Initially, our assumptions were somewhat correct. But things started to change as we expanded our product motto from "Help translate media content into multiple languages" to "Help create localized media content". Skroll's services expanded both forward (Sync: extracting original subtitles from media) and backward (Dub: creating multilingual subtitled videos), and we gained a number of non-LSP customers (e.g., content providers, and content creators such as YouTubers who don't have the ability to review translations themselves). Naturally, the requirements changed a lot as well. Skroll was being used much more than before, but we couldn't easily improve the UX because we hadn't built it with those use cases in mind.

The fragmentation of media content creation tools became an opportunity

Here's roughly what you need to do to create a properly localized video:

  1. Create a video
  2. Edit the video
  3. Create template subtitles
  4. Translate the template subtitles into multiple languages
  5. Proofread the translation for each language
  6. Burn-in, dub, more effects, etc.
  7. Upload it to a video delivery platform

If there had been a market-dominant tool that covered most of these steps, we could have just plugged our API into it, but that was not the case. We found that many of our customers combined tools specialized for each step and created their own workflows. In particular, translation and proofreading are often outsourced to freelance translators or translation agencies, so customers had to add project management tools to their workflows. Each tool had its own tradeoffs: some could edit the video and create the original subtitles but couldn't translate them. Some could translate, but not in the subtitle format. Some could manage translation projects but couldn't cover the rest of the steps. Something was not quite right.

This would have been a big problem if we had insisted on only API integration. But when we changed our perspective, we saw it as an opportunity. We thought, "Why don't we build a tool that covers the A to Z of localized media content creation?" We would be able to cover only a few steps at first, but we were confident that we could gradually expand the scope. That's how we started to build MediaCAT.

MediaCAT 1.0: Sync, Translate, Dub.

A promotion site for MediaCAT before its release.

Design Goals of MediaCAT

The value proposition of MediaCAT is "to reduce the time and cost of creating localized media content". The question was, "Whose time and money?" We had a lot of discussions early in the product design and development process about who our primary target audience would be. It was important because different customers spend their time and money in different places.

After much deliberation, we decided that our first target should be small to medium-sized LSPs that would have difficulty integrating directly with XL8's API. As a new entrant to the media utility tool market, we felt that these were the customers who would get the most value from XL8's strength in first-pass translation quality. In fact, they were the ones who used Skroll most often (and the ones who were most frustrated). We also felt that it would yield a higher ROI to focus on the small number of B2B customers who were already using our product (Skroll), rather than investing our resources in acquiring and onboarding new B2C customers like YouTubers.

We knew that if we wanted LSPs to reduce the time and cost of creating localized media content, we should provide more than just good transcription and translation quality. No matter how good the quality is, it's still not easy to get subtitles to a point where they can be delivered without human review. So we needed to understand the LSP’s workflow for reviewing, editing, and delivering subtitles, and we needed to embed those items into the UX of MediaCAT. At the same time, we needed to design the structure to be scalable so that we could cover a broader range of tasks in the future. Taking all of this into account, we had three main design goals for MediaCAT 1.0.

  1. Improve the experience for existing Skroll customers: Once you upload a media file or subtitle file, the rest of the localized content creation process should be seamless, with no need to re-upload.
  2. Quickly review and edit result subtitles: The initial transcription and translation by AI will inevitably contain inaccuracies. Make them easier to fix by showing a preview of the results and including a built-in subtitle editor.
  3. Create a foundation to cover the entire process of "creating localized media content": Even though we're only covering a small part of the process now, the product should lay the groundwork for adding elements that help LSPs save time on transcription and translation, both deeply (more support for transcription and translation itself) and broadly (more support before and after transcription and translation).

Also, after the launch, the MediaCAT development team declared that "we will release at least two user-friendly feature updates every two weeks" to provide a continuously improving product to customers. Fortunately, our commitment has held up well so far. And since February 2023, we've also been notifying users about these updates through What's New in MediaCAT.

We strive to provide at least two meaningful updates every two weeks.

Main Features of MediaCAT 1.0

As a result of these efforts, we were able to deliver the following features in MediaCAT 1.0.

Sync

  • It transcribes a user's media (video or audio) file to generate template subtitles. Media files can be uploaded via YouTube, Google Drive, Dropbox, or other links.
  • Users can attach a transcript and add glossaries to improve the performance of time-coding and speech recognition.
  • We've recently added a variety of subtitling support features, such as speaker diarization, speaker gender & age detection, and SDH (Subtitles for the Deaf or Hard-of-Hearing) support.
  • As of April 2023, Sync supports a total of 37 languages.

Translate

  • It translates a user’s subtitle file into multiple languages. Instead of uploading, users can use subtitles generated by Sync via the Proceed feature.
  • Some language pairs offer special options. For example, English-to-French translation offers Formality and Genre options. Users can choose Formality to specify the tone of voice (e.g., Informal vs. Formal), or choose Genre to specify the mood (e.g., Comedy, Horror, Military). Note that all translation models released after October 2022 support the Genre option.
  • Similar to Sync, users can attach glossaries to improve translation performance.
  • As of April 2023, Translate supports a total of 42 languages.

Dub (Beta)

  • It dubs the language of the subtitle into the voice of your choice. You can also use the subtitles generated by Translate with the Proceed feature, instead of having to upload files yourself.
  • If you attach a video, we'll dub over it, and if you don't attach a video, we'll just generate an audio file.
  • As of April 2023, Dub supports a total of 24 languages.

Edit

  • Users can review the subtitles generated by Sync and Translate, and modify the timecode and content as they want.
  • As mentioned above, the AI transcription and translation results are only a rough draft. In Sync, you may have a video that is difficult even for a human to transcribe due to a number of factors: background music or noise, inaccurate pronunciation, overlapping voices, etc. Transcription is the easier case because there is a "right answer", but for translation there is no such right answer. Every translator has different preferences, and in the case of a series, the context of previous translations makes things harder. Therefore, even if an LSP entrusts an AI with the initial translation, the final product must be reviewed and edited by a human. This is why MediaCAT provides its own subtitle editor, so that users don't have to leave the tool to edit.
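
To make the relationship between these Tasks concrete, here is a minimal sketch of three independent steps chained by a Proceed-style hand-off. Every name in it (`Cue`, `Subtitle`, `sync`, `translate`, `dub`) is hypothetical and chosen for illustration; it is not MediaCAT's implementation or XL8's API, and the AI steps are replaced by stubs. It only shows the shape of the flow: each step's output is reused as the next step's input, with no re-upload, and Dub returns a dubbed video only when a video is attached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    """One subtitle line with its timecode (hypothetical structure)."""
    start_ms: int
    end_ms: int
    text: str


@dataclass(frozen=True)
class Subtitle:
    language: str
    cues: List[Cue]


def sync(media_name: str, language: str) -> Subtitle:
    """Stub for Sync: a real system would transcribe the media file."""
    return Subtitle(language, [Cue(0, 1500, f"[transcript of {media_name}]")])


def translate(source: Subtitle, target_language: str) -> Subtitle:
    """Stub for Translate: keeps the timecodes and replaces the text."""
    cues = [
        Cue(c.start_ms, c.end_ms, f"[{target_language}] {c.text}")
        for c in source.cues
    ]
    return Subtitle(target_language, cues)


def dub(subtitle: Subtitle, video_name: Optional[str] = None) -> str:
    """Stub for Dub: a dubbed video if a video is attached, else audio only."""
    kind = "video" if video_name else "audio"
    return f"{kind}:{subtitle.language}"


# Proceed-style chaining: each output feeds the next step directly.
template = sync("episode01.mp4", "en")
french = translate(template, "fr")
print(dub(french))                   # audio only, no video attached
print(dub(french, "episode01.mp4"))  # dubbed over the attached video
```

Because each function takes and returns plain values, any single step can still be used on its own, which mirrors how a user can run just one Task.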

With the work we'd done over those few months, we felt we had more or less accomplished our three design goals. And with the foundation in place, it was time to scale to the next level. The next direction was "Projects".

MediaCAT 2.0: Projects.

It's no coincidence that the three main features of MediaCAT - Sync, Translate, and Dub - are referred to by the relatively lighthearted name of "Tasks". While each of them represents an important step in the process of creating localized media content, each Task functions independently within MediaCAT, except for the existence of the Proceed feature.

While this has been valuable enough for customers who only want to use one type of Task, it leaves a lot to be desired for those who want to use more than one Task in sequence. We had been fully aware of this problem through multiple user interviews, so we thought it was time to add a new package that ties Tasks together and enhances collaboration capabilities to help create production-ready multilingual content.

An early concept storyboard for Projects. It depicts the communication between the Content Provider (CP) and the LSP.

Since we designed each Task to be assembled in a modular fashion, we didn't expect the development of "Projects", which ties them together, to take long, but we were wrong. We had a lot of new features to implement, we wanted to make the UI and UX better, and all the while, we had to keep our commitment to releasing at least two user-facing feature updates every two weeks for 1.0. It wasn't an easy journey, but after all the twists and turns, we're happy to announce the mid-April release of "Projects".

Difference between Tasks and Projects

First, let's do a quick overview of how Tasks and Projects differ.

  • Tasks focus on speed. They're great for customers who want to pick and choose whether they want to Sync, Translate, or Dub, run it, and see the results quickly with very little editing and/or review.
  • Projects focus on quality. They're great for customers who need to deliver subtitles to vendors, or who want to collaborate with multiple linguists to create high-quality, production-ready subtitles for the channels they need to broadcast on.

Right now it may feel like Projects is just a bundle of Tasks. To provide specialized value in both directions, we'll continue to add speed-focused conveniences like bulk subtitle creation and UI simplification in Tasks, while continuing to build out features around quality control and collaboration in Projects. In particular, we're gearing up to make Projects even more useful for PMs at LSPs.

Main Features of MediaCAT 2.0

Since this is a blog, not a manual, we won't go into a lot of detail and will just highlight the main features. Note that "Team Workspace" has been around since MediaCAT 1.0, but Tasks didn't have any special collaboration features. Starting with Projects, we provide more collaboration support for team users.

Features for individual users (Personal Workspace)

  • Create a project by uploading a subtitle you want to translate, or by uploading a media file if you don't have the subtitle.
  • If you created a project from a media file, you can run Sync to generate the template subtitle. The basic usage is the same as the Sync Task, including Transcript and Glossary, but in Projects you can play around with the options in real time before you start editing.
  • Once the adjustments are complete, you can edit, translate, and compare across multiple languages. Editing and translating are essentially the same as they were in 1.0.

Collaboration features for team users (Team Workspace)

  • If you want to collaborate with coworkers or freelance translators, you can ask us to convert your workspace to a team workspace and then invite them to join your team.
  • If you are an owner or admin of a team, you can assign team members to edit and/or review subtitles in specific languages for your project.
  • Team members who are assigned a language can see the project in Assigned Projects, and perform the assigned job.
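
The assignment rules in the list above can be sketched as a small permission check. This is an illustrative model only: the names (`Team`, `Project`, `assign`, `assigned_projects`), the "member" role label, and the "edit"/"review" job labels are assumptions made for this example, not MediaCAT's actual data model. It shows that only an owner or admin can assign a team member to a language, and that a member's Assigned Projects view is simply the set of projects where they hold at least one assignment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

ASSIGNER_ROLES = {"owner", "admin"}  # roles allowed to assign work
JOBS = {"edit", "review"}            # hypothetical job labels


@dataclass
class Team:
    roles: Dict[str, str]  # member name -> role


@dataclass
class Project:
    name: str
    languages: Set[str]
    # (language, job) -> names of assigned members
    assignments: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)


def assign(team: Team, project: Project, assigner: str,
           assignee: str, language: str, job: str) -> None:
    """Assign a team member to edit or review one language of a project."""
    if team.roles.get(assigner) not in ASSIGNER_ROLES:
        raise PermissionError(f"{assigner} cannot assign team members")
    if assignee not in team.roles:
        raise ValueError(f"{assignee} is not on the team")
    if language not in project.languages:
        raise ValueError(f"{language} is not part of {project.name}")
    if job not in JOBS:
        raise ValueError(f"unknown job: {job}")
    project.assignments.setdefault((language, job), set()).add(assignee)


def assigned_projects(projects: List[Project], member: str) -> List[str]:
    """Names of projects where the member has at least one assignment."""
    return [
        p.name for p in projects
        if any(member in members for members in p.assignments.values())
    ]


team = Team({"dana": "owner", "lee": "member"})
project = Project("Episode 1", {"fr", "ja"})
assign(team, project, "dana", "lee", "fr", "edit")
print(assigned_projects([project], "lee"))  # ['Episode 1']
```

Keying assignments by (language, job) keeps editing and reviewing separate, so the same language can have one person editing and another reviewing.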

MediaCAT 3.0: To be continued…

Although we released 2.0, we still have a long way to go to reach our original ambition of covering the A to Z of creating localized media content.

  • It's obvious that we should always keep improving Sync, Translate, and Dub performance.
  • There's a lot of room for improvement, especially in Dub, which is still in beta after its initial release. And we need to get it into Projects as well.
  • As I mentioned in "Difference between Tasks and Projects", Tasks should help you see results faster, and Projects should help you edit and review with much less time and cost. Projects also needs to be much more collaborative.
  • We also think it's important to have a dashboard that shows the progress and results of all this work, and how much time and money customers have saved with MediaCAT.

We don't know when we’ll be ready for 3.0, and we also don’t know how close that 3.0 will be to our ultimate goal. But we do know that if we keep adding features every couple of weeks that our customers will recognize, we'll be a few steps closer.

One of them is Revise with AI, which is coming very soon. This is an experimental feature in which the AI inside our subtitle editor suggests the next revisions based on what translators have revised so far. We expect this to dramatically reduce editing and review time by making it easier for translators to customize their subtitles. All major updates will be rolled out via What's New, so be sure to click it when you see the red fire dancing in the top right corner.

Written by Hwidong Bae, Frontend Team Lead at XL8
