Speech Recognition 2021 – Getting Real Accuracy?
Has Machine Speech Recognition Achieved Human Parity?
According to some, not a chance.
First, speech recognition (SR) systems are still nowhere near human parity. Most are still in development, tested mainly in the laboratory, and inadequate at dealing with REAL recordings: background noise, recording interference, multiple speakers, pronunciation issues, stuttering, mixed and broad accents, and jargon and slang that change daily. Speech Recognition and Speech To Text (STT) providers are realizing that their models are not yet mature enough to deliver transcriptions at the level of accuracy that human transcribers broadly provide.
Accuracy and Trust Are Key Challenges
Most automatic transcription services acknowledge that their accuracy varies widely, from 70% to 90% at best. That is at least 9 percentage points below the standard expected of a professional human transcription service, such as Waywithwords.net, where accuracy of 99%+ is expected. Even if machine transcription achieves 95% accuracy, it still falls short of the point of “trust” for clients who require transcription for the record. The common approach among machine transcription vendors now is to offer a “polished” transcript (essentially still using humans).
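The figures above do not say how accuracy is measured. A common convention, assumed here, is word accuracy: 1 minus the word error rate (WER), where WER is the number of word substitutions, deletions, and insertions needed to turn the machine transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that assumed metric only; the function names and sample sentences are made up for illustration and do not come from any vendor.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed metric)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def errors_per_thousand_words(accuracy: float) -> int:
    """Approximate number of wrong words in a 1,000-word transcript."""
    return round((1.0 - accuracy) * 1000)


# Hypothetical example: a human reference and a machine transcript with one wrong word.
reference = "the quick brown fox jumps over the lazy dog today"
machine = "the quick brown fox jumps over a lazy dog today"

wer = word_error_rate(reference, machine)
print(f"word accuracy: {1 - wer:.0%}")                              # 90%
print("errors per 1,000 words at 90%:", errors_per_thousand_words(0.90))  # 100
print("errors per 1,000 words at 99%:", errors_per_thousand_words(0.99))  # 10
```

Read this way, the gap between 90% and 99% is roughly 100 wrong words per 1,000 versus roughly 10, which is one way to see why a transcript "for the record" is held to the higher standard.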
Perhaps Automatic Speech Recognition will achieve human parity?
We can never say no. With the exponential growth of artificial intelligence solutions, the time will come when speech recognition achieves parity with humans. For now, the main challenge for automatic SR technologies is their dependency on file quality and speaker environment: without a fairly clear recording in which speakers enunciate their words, there is still a way to go. Speed of service is the other key development for STT. Automatic transcription providers claim almost real-time transcription, while human and hybrid transcription providers still offer turnaround options ranging from express to overnight.
The Step Between – STT and Human Transcription?
It seems that one day there will be pure, AI-driven, self-learning automated speech to text solutions. For now, though, some Speech To Text services choose to combine machine transcription/speech recognition with humans to offer “polished” transcripts. This hybrid speech recognition service allows a gradual, adaptive shift from human to software. The challenge today therefore seems to be more about finding a balance between both processes. Added to this is the pressure on technology companies to manage the “people” required to sustain what are essentially automated solutions.
At nibity.com, we support pairing humans with an appropriate speech recognition system. The automated side allows the service to process high volumes of audio or video at lightning speed. The human side ensures that clients continue to value and trust the product of the service: the transcript.