Can Speech Recognition Achieve Human Parity?
According to some, not a chance. For quite a few reasons, in fact.
First, speech recognition systems are nowhere near human parity at this time. Remember, most are still in development, tested mainly in the laboratory, and still completely inadequate at dealing with REAL recordings: background noise, recording interference, multiple speakers, pronunciation issues, stuttering, mixed broad accents, and even jargon and slang that change daily. Speech recognition and speech-to-text businesses are realizing that it is still too early for their models to deliver on their service promises and match the accuracy currently provided by human transcribers. In fact, if you look closely at the services out there, most acknowledge that they provide general transcripts of around 70% accuracy at best, rather than accurate transcripts that capture 90% or more of the spoken word.
At this time, the main reason speech recognition systems remain inaccurate is their continued dependence on clear audio, on mostly single-speaker environments or channels, and on a data library for the dialect and expressions used. So unless you have a clear-as-a-bell recording of one speaker with a crystal clear voice, enunciating perfectly (preferably a Midwestern USA speaker), talking alone and very s-l-o-w-l-y, the transcript will fall short, and speech recognition systems still need massive improvements.
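To make figures like 70% or 90% accuracy more concrete, here is a minimal sketch of one common way to score a transcript: count the word-level edits (substitutions, deletions, insertions) needed to turn the machine transcript into the human reference, then divide by the length of the reference. This post does not say how its accuracy figures were measured, so treating accuracy as one minus the word error rate is an assumption, and the function names and sample sentences below are made up for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the hypothesis into the reference (word-level edit distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - (word errors / reference length).
    Can drop below zero if the hypothesis contains many extra words."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


# Hypothetical example: a human reference and a machine transcript.
reference = "please send the signed contract to the client by friday"
hypothesis = "please send the sign contract to client by friday"
print(f"{word_accuracy(reference, hypothesis):.0%}")  # prints 80%
```

In this made-up example, two small slips in a ten-word sentence already pull the score down to 80%, which gives a feel for how rough a 70% transcript reads.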
The Step Between – SRT and Human Transcription?
Perhaps they will get there one day with pure or AI-driven, self-learning automated speech-to-text solutions. But in the meantime, a logical stepping stone is a solution that improves SR systems while also easing businesses into the idea of an eventual shift from human to software: an adaptive, if not responsive, hybrid speech recognition solution.
We suggest pairing humans with developing speech recognition systems, as such a pairing offers the benefits of both: the lightning speed and massive volumes that an SR system can transcribe, with the brains and language expertise of humans. In such a hybrid solution, the humans are ‘quick checkers’ rather than transcribers, correcting the SR system's output, so the combined output is bigger and faster than humans transcribing alone.