
A.I.Deology 2: Speech Recognition on the Raspberry Pi

In our last post, we gave our Pis a customizable voice using Espeak.

Now we continue our path to a Raspberry Pi voice assistant by giving our Pi the ability to listen to voice commands.

Gear

So what we’ll need now is an “ear” for our Pi so it can listen to us: more specifically, a USB microphone. There are many mics out there, but whichever you decide on, be sure to note its sample rate and chunk size, as we’ll need them for our code later on.

My mic of choice is the Kinobo USB 2.0 mini mic (sample rate: 44100 Hz, chunk size: 512).

About the size of a coin, with an excellent price point and pretty solid quality for its size, it’s a nice out-of-sight, out-of-mind listening device.

I also use the Logitech HD Pro Webcam C920 (sample rate: 16000 Hz, chunk size: 128), especially for voice-commanded projects involving computer vision.

It’s more expensive and much bigger than the mini mic for sure, but with two high-quality mics built in and angled to cover a wider area, it certainly performs at a higher quality and range. Not to mention its nice camera.

Raspberry Pi Speech-to-Text Programming

As for a Python library for speech-to-text (listening), I tried Jasper and a few others, but either I couldn’t get them to work or they were way too complicated for what I needed. So I went with SpeechRecognition. It’s very straightforward to use, and you don’t have to do anything crazy just to install it. It is pain-free and efficient.

First, be sure your Pi is up to date:

$ sudo apt-get update
$ sudo apt-get -y upgrade
$ sudo apt-get -y dist-upgrade

Now for the Python Speech Recognition library:

$ pip install SpeechRecognition
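
Once that's installed, it helps to confirm the library can actually see your mic. Below is a minimal sketch, not a definitive setup check. It assumes PyAudio is also installed, since SpeechRecognition relies on it for microphone access. The helper names `list_mics` and `find_device_index` and the "usb" keyword are my own placeholders for illustration.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Some audio devices report no name at all, so skip those.
        if name and keyword in name.lower():
            return index
    return None


def list_mics():
    """Print every audio device the library can see and return the names."""
    # Imported here so the helper above works even without the library installed.
    import speech_recognition as sr  # microphone access needs PyAudio

    names = sr.Microphone.list_microphone_names()
    for index, name in enumerate(names):
        print(index, name)
    return names
```

On the Pi, `find_device_index(list_mics(), "usb")` should give you a number that can be passed as `device_index` when creating an `sr.Microphone`, assuming your mic's name contains "usb".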

Voice Log Example

To get a barebones start, this Voice Logger is a nice little example to help you understand SpeechRecognition’s uses and custom options.

With it, you can adjust how long it listens and when it stops, and you can adjust its ambient noise setting, which helps your program account for whether you are in an indoor or outdoor setting. It’s a good way to get a feel for the library (more on custom settings). The program then saves your raw audio in a WAV file to be played back later.
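
To make those settings concrete, here is a rough sketch of what such a voice logger could look like. It is an illustration under assumptions rather than the exact script: the function names, the file naming scheme, and the timeout values are all placeholders of mine, and the default sample rate and chunk size are the Kinobo values from above, so swap in your own mic's numbers.

```python
import time


def log_filename(prefix="voice_log", when=None):
    """Build a timestamped WAV file name such as voice_log_20240101_120000.wav."""
    stamp = time.strftime("%Y%m%d_%H%M%S", when or time.localtime())
    return f"{prefix}_{stamp}.wav"


def record_voice_log(sample_rate=44100, chunk_size=512, device_index=None):
    """Listen for one phrase and save the raw audio to a WAV file."""
    import speech_recognition as sr  # lazy import; the mic needs PyAudio

    recognizer = sr.Recognizer()
    # Seconds of silence that count as the end of a phrase.
    recognizer.pause_threshold = 0.8

    mic = sr.Microphone(device_index=device_index,
                        sample_rate=sample_rate,
                        chunk_size=chunk_size)
    with mic as source:
        # Sample the room for a second so the energy threshold suits the setting.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("listening")
        # timeout: how long to wait for speech to start.
        # phrase_time_limit: the longest phrase that will be recorded.
        audio = recognizer.listen(source, timeout=5, phrase_time_limit=10)

    path = log_filename()
    with open(path, "wb") as wav_file:
        wav_file.write(audio.get_wav_data())
    return path
```

Calling `record_voice_log()` on the Pi should return the name of the saved file. Note that `listen` raises `sr.WaitTimeoutError` if nobody speaks before the timeout, so a longer-running program would want to catch that.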

True Speech Recognition with Python

Now that we “get” the settings and can record our speech, we can apply a speech-to-text service to allow our Pi to actually understand and interpret our speech.

This opens doors to things like A.I. chatbots and voice commands.

Thankfully, SpeechRecognition can be used seamlessly as a front end for various free services such as:

Google Speech (my personal default). It’s a fast and free way to get up and running and yields pleasantly surprising results for day-to-day use, especially with a solid internet connection and mic. That said, the only downside to me is that it’s an online-only service.

And PocketSphinx, which conversely is horrible out of the box, but if you can get it going, it can be used offline and can even be trained for better results. It also includes a voice recognition feature, meaning that it can be tailored to your specific voice.

And a few more notable services, including Wit.ai and Snowboy Hotword Detection, which I’m quite curious about as it looks to be an efficient and trainable offline system.

Finding the Right Listener

Google Speech works perfectly as a default service to get you started, but later on you may be interested in experimenting with other options. This simple example will give you everything you need to get a solid feel for the library’s options. It transcribes your speech and prints what each particular recognition service thinks you said, which is handy for comparing services. And always remember that the quality of your microphone works hand in hand with these options.
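
As a rough illustration of that comparison, the sketch below runs the same recording through more than one service and prints each result. The helper names `compare_services` and `listen_and_compare` are made up for this example. It assumes PyAudio for the mic and, for the offline option, that the separate pocketsphinx package is installed.

```python
def compare_services(audio, services):
    """Run each recognizer callable on the same audio and collect the results."""
    results = {}
    for name, recognize in services.items():
        try:
            results[name] = recognize(audio)
        except Exception as error:
            # One failing service (no internet, missing package, unclear speech)
            # should not stop the others from being tried.
            results[name] = f"<failed: {type(error).__name__}>"
    return results


def listen_and_compare():
    """Record one phrase and print what each service thinks was said."""
    import speech_recognition as sr  # lazy import; the mic needs PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        print("listening")
        audio = recognizer.listen(source)

    services = {
        "Google Speech": recognizer.recognize_google,  # online
        "PocketSphinx": recognizer.recognize_sphinx,   # offline
    }
    for name, text in compare_services(audio, services).items():
        print(f"{name} thinks you said: {text}")
```

Catching every exception keeps the comparison going, at the cost of hiding details. In a real project it is probably better to handle `sr.UnknownValueError` (speech not understood) and `sr.RequestError` (service unreachable or not installed) separately.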

Voice Command Example

From here, we merge our Espeak text-to-speech capabilities with our new SpeechRecognition speech-to-text library to get us this! Our first VoiceCommander!

I should note that the “time.sleep()” function is VERY important for preventing your Pi from hearing its own voice as a command, especially if you plan on using longer responses. That’s why it’s there right before the Pi starts listening. Also, it can be more practical to use print(“listening”) instead of say(“listening”) as your indicator, or even a sound effect.
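
To show how the pieces fit together, here is a minimal sketch of a voice command loop. Treat it as an assumption-laden outline rather than the original script: `say` simply shells out to the `espeak` command, the `commands` dictionary and `match_command` helper are hypothetical, and the half-second pause is a guess you will likely need to tune.

```python
import subprocess
import time


def match_command(heard, commands):
    """Return the reply for the first (lowercase) command phrase found in the text."""
    heard = heard.lower()
    for phrase, reply in commands.items():
        if phrase in heard:
            return reply
    return None


def say(text):
    """Speak text through the espeak command-line tool (assumed to be installed)."""
    subprocess.run(["espeak", text], check=False)


def run_commander(commands):
    """Listen forever, answering any phrase that matches a known command."""
    import speech_recognition as sr  # lazy import; the mic needs PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            # Pause so the Pi does not pick up the tail of its own reply.
            time.sleep(0.5)
            print("listening")
            audio = recognizer.listen(source)
            try:
                heard = recognizer.recognize_google(audio)
            except sr.UnknownValueError:
                continue  # speech was not understood, so listen again
            except sr.RequestError as error:
                print("speech service unavailable:", error)
                continue
            reply = match_command(heard, commands)
            if reply:
                say(reply)
```

For example, `run_commander({"hello": "Hi there", "lights on": "Turning the lights on"})` would answer those two phrases. Swapping the `say(reply)` call for a function that drives a servo or motor is where the fun starts.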

And with that, we now have our very own voice-commanded Raspberry Pi to make our lives a little more intriguing. From here, one can easily mix and match this functionality and apply it to servos for robotic projects, DC motors for talking cars, sensors, A.I., or anything else our wild imaginations can come up with.

Please like and share, comment if you have any questions, and please consider a donation if you found this post useful. Your gratitude aids and abets my tech obsession and helps me get more gadgets to research so I can share more knowledge with the world. Thank you for finding me. Cheers! 🙂
