Introducing the First Self-Supervised Algorithm for Speech, Vision and Text


We’re introducing data2vec, the first high-performance self-supervised algorithm that learns in the same way for speech, vision and text.
With data2vec, we’re closer to building machines that learn about different aspects of the world around them without having to rely on labeled data.

Today, we’re announcing data2vec, the first high-performance self-supervised algorithm that learns the same way in multiple modalities, including speech, vision and text. Most machines learn exclusively from labeled data. However, through self-supervised learning, machines are able to learn about the world just by observing it and then figuring out the structure of images, speech or text. This is a more scalable and efficient approach for machines to tackle new complex tasks, such as understanding text for more spoken languages.

Self-supervised learning algorithms for images, speech, text or other modalities work in very different ways, which has made it hard for researchers to apply them more broadly. Because an algorithm designed for understanding images can't be directly applied to reading text, it's difficult to push several modalities ahead at the same rate. With data2vec, we've developed a unified way for models to predict their own representations of the input data, regardless of whether it's speech, text or images. By predicting these representations instead of modality-specific targets such as words, visual tokens or units of speech, a single algorithm can work with completely different types of input.
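To make "predicting its own representations" more concrete, here is a minimal plain-Python sketch of a data2vec-style objective. It assumes the setup described in the data2vec paper: a teacher network whose weights are an exponential moving average (EMA) of the student's encodes the full input, the training targets are the average of the teacher's top-k layer outputs, and the student must regress those targets from a masked view of the same input. Everything else here is a hypothetical simplification for illustration: the toy scalar-weight encoder, the zero-vector mask, the function names (`encode`, `teacher_targets`, `data2vec_loss`, `ema_update`), the squared-error loss and the parameter values. The released models use Transformers, modality-specific input encoders, normalization of the teacher's layer outputs and gradient-based training, none of which is shown.

```python
import math
import random

# Hypothetical toy encoder: each "layer" maps h -> tanh(w*h + b) elementwise,
# with one scalar weight and bias per layer. data2vec itself uses a Transformer.
def encode(params, seq):
    """Return the output of every layer for a sequence of feature vectors."""
    outputs = []
    h = seq
    for w, b in params:
        h = [[math.tanh(w * x + b) for x in vec] for vec in h]
        outputs.append(h)
    return outputs


def mask_sequence(seq, masked):
    """Replace masked time steps with a zero vector (stand-in for a learned mask embedding)."""
    return [[0.0] * len(vec) if t in masked else vec for t, vec in enumerate(seq)]


def teacher_targets(teacher_params, seq, top_k):
    """Average the outputs of the top-k teacher layers on the *unmasked* input."""
    layers = encode(teacher_params, seq)[-top_k:]
    return [
        [sum(layer[t][d] for layer in layers) / len(layers) for d in range(len(seq[t]))]
        for t in range(len(seq))
    ]


def data2vec_loss(student_params, teacher_params, seq, masked, top_k=2):
    """Squared error between student predictions and teacher targets at masked steps."""
    targets = teacher_targets(teacher_params, seq, top_k)
    preds = encode(student_params, mask_sequence(seq, masked))[-1]
    errs = [(preds[t][d] - targets[t][d]) ** 2 for t in masked for d in range(len(seq[t]))]
    return sum(errs) / len(errs)


def ema_update(teacher_params, student_params, decay=0.999):
    """Teacher weights track an exponential moving average of the student weights."""
    return [
        (decay * tw + (1 - decay) * sw, decay * tb + (1 - decay) * sb)
        for (tw, tb), (sw, sb) in zip(teacher_params, student_params)
    ]


random.seed(0)
student = [(0.5, 0.0), (0.8, 0.1), (1.2, -0.1)]
teacher = list(student)  # the teacher starts as a copy of the student

# Random stand-ins for embedded inputs: once any modality is a sequence of
# vectors, it goes through exactly the same code path.
inputs = {
    "speech": [[random.gauss(0, 1) for _ in range(4)] for _ in range(10)],
    "vision": [[random.random() for _ in range(4)] for _ in range(9)],
    "text": [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)],
}
for name, seq in inputs.items():
    masked = set(random.sample(range(len(seq)), k=len(seq) // 2))
    print(name, round(data2vec_loss(student, teacher, seq, masked), 4))

# In real training the student would first take a gradient step on the loss;
# here the EMA call only shows where the teacher update happens.
teacher = ema_update(teacher, student)
```

Nothing in the loop is specific to speech, images or text: after each input is turned into a sequence of vectors, the objective is the same. In an actual training run, the student would be updated by gradient descent on this loss and the teacher would be refreshed with the EMA update after every step.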

With data2vec, we’re closer to building machines that learn about different aspects of the world around them without having to rely on labeled data. This paves the way for more general self-supervised learning and brings us closer to a world where AI might use videos, articles, and audio recordings to learn about complicated subjects, such as the game of soccer or different ways to bake bread. Data2vec will also enable us to develop more adaptable AI, which we believe will be able to perform tasks beyond what’s possible today.

If you're a researcher interested in building upon our work, you can access the open source code and pretrained models on GitHub.


The post Introducing the First Self-Supervised Algorithm for Speech, Vision and Text appeared first on Meta.
