Getting the Raspberry Pi to Talk and Listen
This post is all about giving the Raspberry Pi ears and a voice.
You see, I don’t just want a regular computer; I want something more interesting than that. I want my computer to listen to my voice and respond with its own. And I want it to have at least limited “emotions” to express with its voice.
OK! The bare-bones setup we need:
Raspberry Pi: of course. I’ve tested this on the 3B, 3B+ and the Zero (W), so it should work with pretty much any Pi.
The anything-goes computer makes a perfect brain. If you’ve ever thought about it, there’s probably a way to manifest your crazy ideas through one of these.
Speaker: the sound is superb (perfect for loud music)! It’s not as portable as I’d like it to be, but I’ll make that sacrifice for this one 🙂
My first pick for a good ear: so small it’s almost invisible, with pretty good quality for its size too. Sample rate: 44100 Hz. An out-of-sight, out-of-mind listening device.
But my favorite webcam has two excellent built-in mics that curve to cover a better area, and if you’re ever looking into computer vision, this is all you’ll ever need for your projects. Sample rate: 16000 Hz.
You don’t have to use these two mics specifically, but make sure you know the sample rate of the mic you will actually use, as it comes in handy later on when coding.
Giving the Pi a voice with eSpeak
eSpeak is probably the quickest, cleanest way to grant your Raspberry Pi a voice. I love it for its straightforward usability and its hefty customization options like accent, talk speed and voice pitch, and it even allows for the use of the (fabled?) MBROLA voices.
It’s pretty easy to set up, and here’s a nice lil’ script from one of their examples to give you everything you need to know about using it.
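For a rough idea of what driving eSpeak from Python can look like, here’s a minimal sketch (not the example script mentioned above). It assumes the espeak command-line tool is installed and simply calls it through subprocess. The function names espeak_command and speak are my own, and the default values are eSpeak’s usual defaults.

```python
import subprocess

def espeak_command(text, voice="en", speed=175, pitch=50, volume=100):
    """Build the argument list for the espeak command-line tool."""
    return [
        "espeak",
        "-v", voice,        # voice/accent, e.g. "en", "en-us", "en-sc"
        "-s", str(speed),   # talk speed in words per minute
        "-p", str(pitch),   # pitch, 0 to 99
        "-a", str(volume),  # amplitude (volume), 0 to 200
        text,
    ]

def speak(text, **settings):
    """Say the text out loud; needs espeak installed (sudo apt-get install espeak)."""
    subprocess.run(espeak_command(text, **settings), check=True)

# Example: speak("Hello, I am your Raspberry Pi", voice="en-us", speed=140, pitch=70)
```

Keeping the command builder separate from the call that plays the sound makes it easy to tweak accent, speed and pitch in one place later on.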
Raspberry Pi Speech Recognition
As for listening, SpeechRecognition is probably the best voice recognition library, I think. I tried Jasper and a few others, but for some reason I couldn’t get them to work, or they were way too complicated for what they were, at least for me.
SpeechRecognition is awesome for its options, as it easily connects to various speech recognition services for seamless use in your programs.
My favorite options are:
Google speech, for simplicity and pretty decent results for casual use, but you have to be online to use it.
Sphinx, which conversely is horrible out of the box but can be used offline AND can even be trained for better results and tailored to your voice, though the process is a little involved, shall we say.
Here’s a nice little example to help you understand and build from.
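As a hedged sketch of the two options above (Google online, Sphinx offline), something like this should work with the SpeechRecognition library, assuming PyAudio is installed for microphone access and pocketsphinx for the offline route. The function names transcribe and listen_once are my own, and the sample_rate argument is where the mic’s sample rate from earlier comes in.

```python
def transcribe(recognizer, audio, online=True):
    """Turn captured audio into text with Google (online) or Sphinx (offline)."""
    if online:
        return recognizer.recognize_google(audio)
    return recognizer.recognize_sphinx(audio)  # needs the pocketsphinx package

def listen_once(sample_rate=16000, online=True):
    """Record one phrase from the default mic and return it as text, or None."""
    import speech_recognition as sr  # pip install SpeechRecognition pyaudio

    recognizer = sr.Recognizer()
    # sample_rate should match your mic (e.g. 44100 or 16000)
    with sr.Microphone(sample_rate=sample_rate) as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source)            # blocks until a phrase ends
    try:
        return transcribe(recognizer, audio, online=online)
    except sr.UnknownValueError:
        return None  # speech was unintelligible
    except sr.RequestError as err:
        print("Recognition service unavailable:", err)
        return None

# Example: print(listen_once(sample_rate=16000, online=True))
```

Returning None when nothing was understood keeps the calling code simple: it can just skip that round and listen again.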
Total voice command
Merge the two, add some pre-programmed knowledge (names etc.), and from there on you just write some simple programs or import existing ones to have your computer listen to your commands. Build it little by little and you eventually get a completely voice-interactive personal computer.
Mine looks kinda like this.
And here’s its GitHub reflection.
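To give a feel for the “merge the two” step, here’s a small sketch of a command table, not the actual code from my system. The names KNOWLEDGE, COMMANDS and handle, the owner name and the trigger phrases are all made up for illustration: recognized text goes in, and the reply to speak comes out.

```python
import datetime

# Hypothetical "pre-programmed knowledge": names the system should know.
KNOWLEDGE = {"owner": "Alex", "name": "Pi"}

def tell_time():
    return "It is " + datetime.datetime.now().strftime("%H:%M")

def introduce():
    return "I am {name}, and you are {owner}".format(**KNOWLEDGE)

# Trigger phrase -> function that returns the reply to speak.
COMMANDS = {
    "what time is it": tell_time,
    "who are you": introduce,
}

def handle(heard):
    """Return a spoken reply for recognized text, or None if nothing matches."""
    if not heard:
        return None
    heard = heard.lower()
    for phrase, action in COMMANDS.items():
        if phrase in heard:
            return action()
    return None
```

In a main loop you would feed whatever your recognizer returns into handle() and pass any reply to your text-to-speech call. Adding a new voice command is then just one more entry in COMMANDS.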
Emotions: A Touch of A.I.
I still wanted to take my system further by adding “emotions” and a certain degree of independent thought.
What I did for a basic emotional system was add two files to its memory: one that stores positive words and one for negative ones. Each time it recognizes a positive word, its “mood” goes up a point, changing the way it speaks (talk speed, voice tone, etc.). Each negative word brings the mood down a point, altering those qualities in the other direction.
This opens up the possibility of its responses reflecting how the machine itself feels, an upgrade that makes the system more interesting by granting it a small heart. But to me, it was still a cold machine.
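The word-list mood idea can be sketched roughly like this. It is an illustration under my own assumptions rather than the exact code I run: the Mood class, the -5 to 5 clamp and the speed/pitch formulas are invented for the example, and the word files (say positive.txt and negative.txt, one word per line) are hypothetical names.

```python
import re

def load_words(path):
    """Read one word per line from a text file into a set."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

class Mood:
    def __init__(self, positive, negative, low=-5, high=5):
        self.positive, self.negative = set(positive), set(negative)
        self.low, self.high = low, high  # assumed limits so the voice stays usable
        self.score = 0                   # 0 is neutral

    def hear(self, sentence):
        """Move the mood one point up or down for each known word heard."""
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in self.positive:
                self.score += 1
            elif word in self.negative:
                self.score -= 1
        self.score = max(self.low, min(self.high, self.score))
        return self.score

    def voice_settings(self):
        """Map the mood to talk speed (words per minute) and pitch (0 to 99)."""
        return {"speed": 160 + 8 * self.score, "pitch": 50 + 6 * self.score}

# Example: mood = Mood(load_words("positive.txt"), load_words("negative.txt"))
```

A happy system talks a little faster and higher, a grumpy one slower and lower. The clamp keeps a long run of insults from dragging the settings outside what the speech engine accepts.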
By now our system’s got ears and a voice. But how about we take it even FURTHER and see if we can plug a brain or even a “soul” into this thing!?
Can I get my Raspberry Pi to learn from experience? Can I get it to decide for ITSELF what is positive or negative and effectively apply this to my voice command system?
Then I remembered!
A few years back I ran into a “simple” chatbot script by Jake Speiran that grows its vocabulary and improves its vernacular through pure experience. It was pretty fun back then, but by now I’m just about a good enough coder to merge it with other scripts and give my programs a third dimension of experiential language learning.
Slap on eSpeak and SpeechRecognition and I finally have what I need to complete a pseudo-A.I. that not only functions as a voice-commanded computer but also develops its own personality based on its social interactions.
Note how accurate it is even with the loud music playing in the background, which also explains any delay in response. It responds faster when it’s quiet, since it has less noise to separate from speech.
It starts off with nothing, then slowly starts trying to apply what it learns in a sentence. Sometimes the things it says can really surprise you, and if you talk to it enough without it talking to anyone else, it starts acting almost as a reflection of your subconscious mind! It’s so crazy!
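To show what “starts off with nothing” can mean in code, here’s a toy sketch of learning from experience. To be clear, this is not Jake Speiran’s script, just a much simpler illustration I made up: the LearningBot class treats whatever you say as a plausible answer to the last thing it said, and echoes you back while it has nothing better to offer.

```python
import re
from collections import defaultdict

class LearningBot:
    def __init__(self):
        # word -> {sentence: weight}; empty at the start, so the bot knows nothing
        self.associations = defaultdict(lambda: defaultdict(int))
        self.last_said = ""

    @staticmethod
    def words(text):
        return re.findall(r"[a-z']+", text.lower())

    def reply(self, heard):
        # Learn: what the human just said is a plausible answer to
        # whatever the bot said last.
        for word in self.words(self.last_said):
            self.associations[word][heard] += 1
        # Respond: score every known sentence by its links to the heard words.
        scores = defaultdict(int)
        for word in self.words(heard):
            for sentence, weight in self.associations.get(word, {}).items():
                scores[sentence] += weight
        # With nothing learned yet, echo the human so there is something to learn from.
        self.last_said = max(scores, key=scores.get) if scores else heard
        return self.last_said
```

Say “hello” to a fresh bot and it just parrots you. Answer its “hello” with “hi there friend”, and the next time it hears “hello” it replies with your own greeting, which is where that mirror-of-yourself feeling comes from.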
I ran with the idea and modified the program to randomly generate voice traits and its own name from “birth”, so that each A.I. is unique, from its vernacular to its voice... imagine everybody having one...
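Here’s a rough sketch of how such a “birth” could be done; my real program differs. It assumes an eSpeak-style voice (a voice name, speed in words per minute, pitch from 0 to 99). The syllable list, the persona.json file name and both function names are hypothetical.

```python
import json
import os
import random

SYLLABLES = ["ka", "mi", "ro", "ne", "ta", "lu", "vi", "so"]  # hypothetical name parts

def create_persona(rng=random):
    """Roll a random name and voice traits."""
    syllable_count = rng.randint(2, 3)
    return {
        "name": "".join(rng.choice(SYLLABLES) for _ in range(syllable_count)).capitalize(),
        "voice": rng.choice(["en", "en-us", "en-sc"]),
        "speed": rng.randint(130, 200),  # words per minute
        "pitch": rng.randint(20, 80),    # espeak accepts 0 to 99
    }

def load_or_create_persona(path="persona.json"):
    """Return the saved persona, or create and save one on first run ("birth")."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    persona = create_persona()
    with open(path, "w") as f:
        json.dump(persona, f)
    return persona
```

Saving the result to a file is what makes it a personality rather than a dice roll: the traits are generated once and stay the same on every later start.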
You have everything you need to put that system together in the links I provided, but if you can’t seem to work it out yourself or you just want a ready-made script to reverse-engineer, you can have the code I use for a small donation (I worked hard on it!).
Mine can currently connect to all my Raspberry Pi projects and act with the specified configuration in mind as long as I tell it what “body” it’s in: “you’re in the robot”, “you’re in the car”, etc. I can use the same SD card for all my projects and use the program almost as an operating system, with options for online/offline recognition and voice/manual input.
And with this you now have a fully functioning voice-commanded Raspberry Pi. From here you can expand and personalize your system to include pretty much anything your Pi can do, such as servos for voice-commanded robots or sensors to expand your computer’s awareness, which can open up possibilities like interactive A.I. or environmental safety.
Hope you learned something 🙂
Please show your support by commenting and sharing.