This is part of my live-learning series! I will be updating this post as I continue through my journey. I apologize for any grammatical errors or incoherent thoughts. This is a practice to help me share things that are valuable without falling apart from the pressure of perfection.
Deepgram helps enterprises unlock the potential of their audio data with custom-trained speech recognition for accuracy and scale. Deepgram’s deep learning speech recognition system searches for keywords by sound and text within audio and video files. It is used for transcribing calls, meetings, and voicemails into searchable notes, providing businesses with fast, high-accuracy speech-to-text through an easy-to-use API.
The company was founded in 2015 and is headquartered in San Francisco, California.
– The largest Series B round raised by a speech AI company
– Alkeon Capital Management provided a significant portion of the capital for this round
– Returning investors include BlackRock, Tiger Global, Wing VC, Citi Ventures, SAP.iO, In-Q-Tel, NVIDIA, and Y Combinator
– What is next for Deepgram in terms of development?
– Competing against big incumbents including Microsoft, Google, and Amazon
– Deepgram has transcribed over a trillion words
Deepgram Completes $72M Series B Round to Define the Future of Speech Understanding – Deepgram Blog ⚡️
We Raised $25 Million from Tiger Global and Others to Unlock the Power of Voice Data and Fuel the World’s Big Ideas – Deepgram Blog ⚡️
Alkeon Capital Management
Deepgram – Automated Speech Recognition (ASR)
Pricing & Plans – Deepgram
Compare Google STT Alternatives – Speech to Text API Alternative | Deepgram
Deepgram (@DeepgramAI) / Twitter
Deepgram on Twitter: “We completed our $72M Series B with $47M in new funding to define the future of AI Speech Understanding 💥 Details here: https://t.co/WdJaqvsBnB Or keep reading 🧵” / Twitter
Deepgram Raises $47M to Define the Future of AI Speech Understanding
Speech Recognition AI Startup Deepgram Closes $72M Funding Round – Voicebot.ai
Deepgram Raises $47M to Define the Future of AI Speech Understanding – Benzinga
Deepgram extends Series B with $47M | PitchBook
Speech recognition startup pulls in a respectable series B round of funding | Biometric Update
Automated Speech Recognition Firm Deepgram Bags $72 Million
Deepgram Raises $47M to Define the Future of AI Speech Understanding
Transcription-as-a-service startup Deepgram lands $47M in funding – SiliconANGLE
Scott Stephenson | LinkedIn
Scott Stephenson – Co-Founder & CEO @ Deepgram – Crunchbase Person Profile
Deepgram – Crunchbase Company Profile & Funding
deepgram,series b,scott stephenson,speech recognition,speech to text,speech apis,speech recognition apis,symbl ai,assembly ai,google transcribe,microsoft azure transcription,amazon transcribe,software,startups,artificial intelligence,automatic speech recognition,machine learning
OK, hello! Tyler Bryden here. I hope everything’s going well today. I want to take a look at a significant raise in the speech recognition AI space, and if you know anything about me, you know I love this space. The company, Deepgram, has completed a $72 million Series B. If I read their messaging on Crunchbase (which maybe lags behind a little): Deepgram helps enterprises unlock the potential of their audio data with custom-trained speech recognition for accuracy and scale. Deepgram’s deep learning speech recognition system searches for keywords in transcripts by sound and text within audio and video files, used for transcribing calls, meetings, and voicemails into searchable notes, providing businesses with fast, high-accuracy speech-to-text on an easy-to-use API. The company was founded in 2015 and is headquartered in San Francisco. I’ve been following this company for a long time. I have used their API and played around with their speech recognition system.
It’s not something I use on an ongoing basis, partly because of a couple of problems with the integration layer specifically, but overall the technology they’ve pulled off is an incredible feat, and it’s in a market with a lot of incumbents. When people think of speech recognition, and if I go to the bottom of this screen I can look at the comparisons, we’ve got Google, Amazon, Microsoft, and Nuance, plus legacy companies like Speechmatics and well-funded startups like AssemblyAI and Symbl.ai going down this path. People are certainly very interested in the space. We’ve talked about this idea before, and it’s something we work on at Speak all the time: dark data. So much of it is essential for business insights, but companies are unable to mine it and get those insights, and they’re really struggling with that.
From that perspective, what they’ve really focused on at a core level (and I think they’re going to go beyond it) is accuracy and cost, and on both we have seen those promises delivered. It is definitely inexpensive. That said, if you’re not a developer and you don’t want to interact with an API, they’re still competing, and I think other AI companies might be outcompeting them on that front.
On cost and accuracy, we’ve done tests: for example, running Amazon Transcribe versus Azure versus Google Speech-to-Text versus Deepgram and comparing the output on spelling, grammar, punctuation, and formatting, and I’ve really seen Deepgram be successful there. What they’ve also done, and this is what they talk about in the article, is start to add a lot more on top. This is something we’ve also looked at at Speak: transcription, speech-to-text, speech recognition helps you solve one problem, but then it opens up a Pandora’s box of more problems, because it produces way too much data to handle. With all this new information you need other ways to process it, so the extraction layer they have built out and are releasing is definitely a valuable one. Now, a couple of things.
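The accuracy comparisons described above are usually quantified as word error rate (WER). Here is a minimal, self-contained sketch of that metric, not tied to any vendor's tooling: word-level Levenshtein distance divided by reference length.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a reference word
                          d[i][j - 1] + 1,        # insert a hypothesis word
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Score each engine's hypothesis against the same human reference transcript.
reference = "we completed our seventy two million series b round"
print(wer(reference, "we completed our seventy two million series b round"))  # 0.0
print(wer(reference, "we completed are seventy million series bee round"))
```

Running the same reference against each engine's output gives a single comparable number per engine, which is how the "Deepgram versus Amazon Transcribe versus Azure" style of test is typically scored.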
On this actual raise: part of it, $25 million, was announced in 2021, which I do remember, and I’ll pull that up. It’s obviously a very distressed time for companies raising money right now, so I’m interested to know: was this a case of the cream rising to the top? It made sense, their growth is insane (they’ve processed over a trillion words), and investors with all this dry powder had nowhere else to put their money, so it flowed to Deepgram. Congrats on being the cream of the crop in that regard. Alkeon Capital Management led the round, and while they have a pretty simple website, they obviously have a ton of capital. Then there are some pretty big returning investors: BlackRock, Tiger Global, Citi Ventures, NVIDIA, and Y Combinator. That’s a huge repertoire of great investors as part of this.
Now, one of the things that was super interesting here is how they talk about phase changes, the way they’re looking at their progress. In phase zero, they invented a way to structure and train an end-to-end deep learning model to process speech. Phase one improved the system to produce transcripts with near-human accuracy (they note in brackets that there are a lot of challenges with this; I have never seen a speech recognition engine be flawless, but it’s pretty incredible where we’re at). In phase two, they made the AI-generated transcripts more legible: they started formatting, so numbers, punctuation, and speaker separation, and that’s where I’ve seen a lot of success in the system they’ve produced.
Now, phase three is super interesting. At Speak we work with, I would say, the more non-technical users but within the same space, and the direction they’re moving in phase three is definitely aligned with the problems we’re seeing in the space and the requests we’re getting. First, language detection and translation. For example, people switching between languages mid-conversation is a big problem within speech-to-text, and actually taking one language and translating it into another is a big problem too.
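As a purely illustrative sketch of per-segment language identification (this is not Deepgram's approach; real systems use character n-gram statistics or neural classifiers), even a toy stop-word-overlap detector shows why code-switched audio has to be classified segment by segment rather than once per file:

```python
# Toy language identification by stop-word overlap (illustrative only).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "it", "you"},
    "fr": {"le", "la", "et", "est", "de", "un", "une", "vous"},
    "es": {"el", "la", "y", "es", "de", "un", "una", "que"},
}

def detect_language(text: str) -> str:
    """Guess the language of one segment by counting stop-word hits."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)

# Each utterance in a code-switched conversation is scored independently.
print(detect_language("the meeting is in the morning"))  # en
print(detect_language("la réunion est dans le matin"))   # fr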
Next, sentiment analysis. We have sentiment analysis within Speak, and it’s fantastic, but a lot of sentiment analysis weighs what’s being said and not how it’s being said. Deepgram specifically identifies tonal signals that reveal sentiment, and if they can achieve that accurately, that is a huge asset.
Then automatic summarization, which we’ve seen over and over again; it’s a very common thing in the space. A couple of companies have worked a lot on this: AssemblyAI from a developer perspective, and others from a more consumer angle, making a summary of your meeting. Overall, it’s great: you have a conversation, or many conversations, and whatever those assets are, some sort of summarization is valuable to streamline the process of review or making decisions. The next one, another piece of alignment we’ve seen, is basically speaker identification. This is actually one of the biggest problems we’re seeing, and it just makes so much sense to solve it now.
The idea is this (and Otter is doing it on a consumer level very well): a lot of speech recognition systems are disengaged from this process, so if you record a conversation and transcribe it, then record and transcribe another, there is no connection between you as a speaker across those two uploads. What Deepgram is now doing is registering a speaker profile, an identity for a unique speaker across many files. So if you have 100 conversations and you label yourself once, it would have that signature, a fingerprint of who you are, and would allow you to automatically update the speaker across all those files. From my research perspective, think of a moderator versus the participants in focus groups, or internal meetings where you want to see who is speaking. A lot of applications open up from this: really powerful analysis, filtering, and summarization, solving what is, to me, one of the bigger problems in this space. So I really do like where they’re moving with this phase three.
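The speaker registration and fingerprinting idea described above is typically built on speaker embeddings: a model maps each voice sample to a fixed-length vector, and new recordings are matched to enrolled profiles by vector similarity. A minimal sketch of just the matching step (the vectors and the 0.8 threshold here are made up for illustration; a real system would get embeddings from a speaker-verification model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify(embedding, enrolled, threshold=0.8):
    """Match a new voice embedding against enrolled speaker profiles."""
    best_name, best_score = None, threshold
    for name, profile in enrolled.items():
        score = cosine_similarity(embedding, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name  # None means "unknown speaker"

# Toy enrolled profiles (real embeddings have hundreds of dimensions).
enrolled = {"moderator": [0.9, 0.1, 0.2], "participant_1": [0.1, 0.9, 0.3]}
print(identify([0.85, 0.15, 0.25], enrolled))  # moderator
print(identify([0.0, 0.1, -0.9], enrolled))    # None
```

This is why labeling yourself once can propagate across 100 files: every segment's embedding is compared against the enrolled fingerprints, and anything below the threshold stays unlabeled.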
The last one is topic detection, which I think AssemblyAI and Symbl.ai are also working on. Something we’ve seen a lot is that topic detection on a general-purpose engine tends to be relatively generalized: not that specific, and in that case not that valuable. There are systems that let you do it from a more personalized, trained-model perspective, or the company may sit within a specific industry. I think of a great company called Chatter Research, which does topic identification specifically for retail; in that case it’s so tailored it’s super valuable. If Deepgram can bring topic detection from a more generalized layer to being specific across industries, that’s super valuable too. A lot of companies say they’re doing topic detection, but really they’re just doing keyword and phrase extraction. Some of these features are currently available in beta.
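To make that distinction concrete: the keyword-and-phrase extraction many vendors ship can be as simple as counting frequent content words, which is very different from classifying a passage into an industry-specific topic taxonomy. A toy version of the former (the stop-word list and sample text are invented for illustration):

```python
from collections import Counter

STOPWORDS = {"the", "a", "and", "to", "of", "in", "was", "it", "is", "for"}

def extract_keywords(text: str, top_n: int = 3):
    """Naive keyword extraction: most frequent non-stopword tokens."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

text = ("The customer asked about pricing. Pricing for the retail plan "
        "came up twice, and the customer mentioned shipping delays.")
print(extract_keywords(text))  # ['customer', 'pricing', 'asked']
```

Note that this surfaces frequent words but has no notion of a topic like "billing dispute" or "store experience"; bridging that gap is what makes industry-tailored topic detection the harder, more valuable problem.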
Other features are coming soon, and they include a nice little summary. Their CEO apparently has a physics background in dark matter particle detection, which is super interesting, as is knowing that they’ve been working on this since 2015 and seeing the level of scale they’re at now. I’ve got some links above if you’re interested in following them or want to learn more; always some good resources and things to consider. With this raise they’re definitely well positioned. I’ve also seen their sales team at work; they’re formidable, continuously working, and passionate about the problems they’re solving. All of this makes sense of why Deepgram was able to complete the $72 million Series B round. There are some great nuggets in here about competing against big companies, getting investors on board and getting them to repeat (especially in a market like this), and about the future of speech recognition and what comes after it with natural language processing. If you look at these pieces, a lot of them start to become opportunities you can execute on. I hope you got some insight out of this. Congrats, Deepgram; I’m excited to continue following your progress, hopefully getting to use you in some more ways at some point, and I appreciate your focus and passion around a space that I am obviously very passionate about myself. Thank you.
Thanks so much for checking this out. I’ll be back to improve this content for you.