- Amazon’s ambitious multilingual NLU dataset MASSIVE
- Using translators through Mechanical Turk to label utterances
- A single model for understanding commands in any language
- Open-source release and a 77-year competition to achieve the goal
- The competitive moat generated by access to data
#amazon #nlu #nlp #amazonalexa #mechanicalturk #mturk #alexa #opensource #github #speechrecognition #utterances
Resources
https://eval.ai/challenge/1697/overview
https://slator.com/amazon-unveils-long-term-goal-in-natural-language-processing/
https://analyticsindiamag.com/amazon-makes-massive-announcements-around-a-51-language-dataset/
https://arxiv.org/abs/2204.08582
https://github.com/alexa/massive
Automated Transcription
Hello, hello. Tyler Bryden here. I wanted to talk today about an ambitious project from Amazon, released April 20th. It flew under my radar a bit, but it's very fascinating. The name is MASSIVE: a massive dataset, massively multilingual NLU, lots of names and abbreviations here. The vision is this: imagine that all people around the world could use voice AI systems such as Alexa in their native tongues. We've got a 51-language dataset here, but apparently there are around 7,000 or more languages on Earth, so the goal of this project is to use one single model that generalizes across all of these languages and really enables interaction through voice with every person in the world. Of course, this is Amazon; they've got these ambitious missions.
One of the things that's really fascinating about this is that they're actually launching a competition with it, and the competition's deadline is May 31st, 2099. So we've got, what, 77 years that this competition is going to last, so they know this is an ambitious journey they're on. The other fascinating part is that they have released this in an open-source manner so that other people can access it and improve upon it. I think this is a pattern we keep seeing with these big companies: although they have so much power and capability, they need contributions from other people to accomplish these goals. So they've got the dataset repo, ways to process the data, and some of the output in JSON here, and it's really fascinating how they've laid this out in such a structured manner.
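As a quick illustration, here is a minimal sketch of what peeking at that structured output could look like. It assumes the data has been pulled from the GitHub repo, that each locale is a JSON-lines file, and that records use field names like locale, intent, utt, and annot_utt; the file path and field names are my assumptions from the paper and repo description, not verified against the release.

```python
# Minimal sketch: inspect a few records from one locale of the MASSIVE dataset.
# Assumes data downloaded from https://github.com/alexa/massive as JSON-lines;
# the path and field names below are assumptions about the release.
import json
from pathlib import Path

data_file = Path("data/fr-FR.jsonl")  # hypothetical path to one locale's file

with data_file.open(encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(record.get("locale"), "|", record.get("intent"), "|", record.get("utt"))
        print("  slot-annotated:", record.get("annot_utt"))
        if i >= 4:  # just the first five records
            break
```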
Then what I found really interesting was this idea of translation versus localization. A translation is the same expression in the target language, whereas a localization is not the same expression but one that is more suitable to that specific locale; it needed to be modified to make sense in that language and location. And then there's an unchanged version, where no modification was needed and the original utterance worked as is.
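To make those three categories concrete, here is a rough sketch of tallying how one locale's data was adapted. It assumes each record carries a per-slot field describing the method, with values like translation, localization, and unchanged; the field name "slot_method" is my guess at the schema, so adjust it to whatever the actual release uses.

```python
# Rough sketch: count how slot values were adapted in one locale's file.
# The "slot_method" field name and its structure are assumptions about the schema.
import json
from collections import Counter
from pathlib import Path

counts = Counter()
with Path("data/fr-FR.jsonl").open(encoding="utf-8") as f:  # hypothetical path
    for line in f:
        record = json.loads(line)
        for slot in record.get("slot_method", []):
            counts[slot.get("method", "unknown")] += 1  # translation / localization / unchanged

print(counts)
```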
Right now, many of us are used to interacting with voice assistants: Google, Alexa, Siri. Alexa has definitely been a big driver of this adoption. There have been challenges, I believe: there was a lot of adoption, and then I believe a bit of a slowdown, although Amazon has pushed back on that with some numbers, and I think there is a lot of work left to get to the goal they want, where people are interacting with this and creating value. There are some more interesting things I've seen recently, too. I stumbled across Amazon Alexa knowledge, which is like uploading data tables and databases and then allowing people to query them through natural language to get the answers they're looking for. That is something we've really thought about at Speak Ai: the ability to do that in any language and convert it into an utterance that produces an accurate response. Obviously that is very fascinating and unlocks a lot of potential for human-computer interaction and the future we might be in.
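Just to sketch the idea (this is my own toy illustration, not Amazon's or Speak Ai's implementation), here is what mapping a single natural-language question onto a parameterized query over an uploaded table could look like:

```python
# Toy illustration of "query a table through natural language": one hard-coded
# intent pattern is mapped onto a parameterized SQL query over an in-memory table.
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Europe", 120.0), ("North America", 340.5)])

def answer(utterance: str) -> str:
    # Recognize one intent: "what is the revenue in <region>"
    match = re.search(r"revenue (?:in|for) (.+?)\??$", utterance, re.IGNORECASE)
    if not match:
        return "Sorry, I don't understand that yet."
    region = match.group(1).strip()
    row = conn.execute(
        "SELECT revenue FROM sales WHERE region = ? COLLATE NOCASE", (region,)
    ).fetchone()
    return f"Revenue in {region}: {row[0]}" if row else f"No data for {region}."

print(answer("What is the revenue in Europe?"))  # -> Revenue in Europe: 120.0
```

A real system would replace the regex with an NLU model for intent classification and slot filling, which is exactly the kind of task the MASSIVE dataset is built around.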
One of the things I also found very interesting, which you can see through the GitHub repository, is that they used what they say are professional translators to label the utterances, and they used translators from different localities to label them the right way. The system they used to do this is actually an Amazon system itself, Mechanical Turk, which is really fascinating and which I've actually interacted with before in early versions of Speak Ai. Basically, there are real people around the world signed up for this, and tasks get assigned to them. Because these people are organized in a specific manner, Amazon can say, for example: can someone in, you know, the east side of France (I'm making this up) label this utterance?
That's how they've actually gone through this process to label them. So there are Amazon Mechanical Turk workers all throughout the world who have organized themselves, and Amazon is using this workforce, augmented by the delivery system they've built, which is basically a little media player where workers can play the audio, label what was said, and have that fed back into the dataset Amazon has released. It's a very fascinating thing: although we look at the magic of AI and machine learning, so much of this comes down to quality labeling of datasets in order to make strides forward in the more machine-powered application of these technologies.
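For anyone curious what posting that kind of locale-restricted labeling task looks like from the outside, here is a rough sketch using the public Mechanical Turk API via boto3. This is not Amazon's internal pipeline; the reward, title, HTML form, and sandbox endpoint are placeholders, and the qualification ID used is MTurk's built-in worker-locale qualification.

```python
# Rough sketch (not Amazon's internal pipeline): post a locale-restricted
# labeling task to Mechanical Turk with boto3. Targets the requester sandbox;
# reward, title, and the HTML form are placeholders.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><head>
      <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
    </head><body>
      <crowd-form>
        <p>Rewrite this utterance so it sounds natural in your locale:</p>
        <p><b>wake me up at nine am on friday</b></p>
        <crowd-input name="labeled_utterance" required></crowd-input>
      </crowd-form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

response = mturk.create_hit(
    Title="Localize a voice-assistant utterance (French speakers)",
    Description="Rewrite the utterance so it sounds natural in your region.",
    Reward="0.10",
    MaxAssignments=3,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=600,
    Question=question_xml,
    QualificationRequirements=[{
        "QualificationTypeId": "00000000000000000071",  # built-in worker Locale qualification
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "FR"}],
    }],
)
print("Created HIT:", response["HIT"]["HITId"])
```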
We had worked with Amazon Mechanical Turk in the past, and then the leader, I would say, that had also started with Mechanical Turk but found it wasn't sufficient to accomplish all of its goals became Scale, which has an incredible story of just massive growth. You can really see this entire data-centric machine learning life cycle, and a lot of it is that exact same thing: label the data so we know that Hilton is an organization, Paris is a location, and Paris Hilton is a person. I recently shared an example of named entity recognition involving Robinhood, where I was actually talking about the company, but the system did not understand that context and labeled it as a person. The challenge when you run purely automatic analysis is that that context and nuance can be lost, so you need some human oversight, especially in the early stages of these applications, to get the results you're looking for.
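To show the kind of context problem described here, a minimal sketch with spaCy (the tool I actually used isn't named here, so this is only an illustration; it assumes the small English model is installed):

```python
# Minimal sketch of the lost-context problem with off-the-shelf NER, using spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

examples = [
    "Hilton is opening a new property in Paris.",
    "Paris Hilton arrived at the event.",
    "I moved my portfolio over to Robin Hood last week.",  # meant as the company
]

for text in examples:
    doc = nlp(text)
    print(text)
    for ent in doc.ents:
        # A generic model may tag "Robin Hood" as PERSON even in a financial
        # context, which is exactly the nuance described as being lost above.
        print(f"  {ent.text:15} -> {ent.label_}")
```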
So I think this is a continuing journey that is underway. There are some really interesting things out there to me, like Facebook's release of wav2vec, where they talk about not needing the same levels of labeled datasets and training. I'm still actually searching for answers around that; I've been asking people, and no one has really been able to give me a response that I feel is sufficient and satisfying enough to share here. But again, overall, this challenge that Amazon has put out, this open-source dataset, is massive and has a big, ambitious goal:
to work with any language on Earth through a single model, understand the intent, and then make that interaction happen through voice. So it's definitely very fascinating, and I think it also speaks to the competitive moat that is built when these companies are not only able to create these original datasets, but also have the community, the audience, and the credibility to make people want to contribute to those datasets. As they move through these competitions, there are people presenting papers, and there will be so much innovation wrapped around this; the dataset will continue to improve, and they'll get closer and closer to this goal.
With that amount of data streaming through and that amount of contributions, they build a competitive technical moat that makes it very hard for another company or organization to compete against, and I think we're going to continue to see that. We're seeing it in other companies who have built out developer communities that contribute back. I think of Hugging Face, which recently raised $100 million (congrats to them), but the combination of the servers, storage, processing power, the technical team, and then the audiences and communities contributing to all of that really compounds into a strong moat. That was demonstrated, I think, by Hugging Face's valuation at $2 billion against the idea that they may only be around $10 million in revenue; I can't verify that number, but they said something really valuable, which is that usage is a precursor to revenue, and I believe in that. And obviously there are these open-source
pieces that Amazon is releasing for this dataset, but all of this will translate into Amazon continuing to be a powerhouse of a company, a business, an efficient army that just chugs along, and I'm very interested to see where this goes. So again, this was a little bit of a look at Amazon's MASSIVE NLU dataset. It's a bit more of a niche topic, but there are definitely some fascinating things coming out of it, and I would love to chat about this with anyone who's interested: the competition and workshops that are coming along with it, how they used Mechanical Turk and translators to label these utterances, the idea of localizing versus translating versus leaving an utterance unchanged, and then the open-source release and competition and how that continues to grow this dataset, with the vision of understanding every language on Earth through one model. Very fascinating stuff. I hope this was interesting to you, and I appreciate, as always, that you tuned in. Thank you so much, and I hope you have a great day.