It happens a few times a day: you come across something you’ve not seen before and your instinct is to pull out your phone and Google it. But you need to type a description before you get the answer you’re looking for, and we don’t have time for that kind of tedium. Why Google things for yourself when you can build a pair of googly eyes to Google things for you? That’s what YouTuber Kevin McAleer did, and he even added a speaker so the eyes can tell you what they’re looking at.
We are choosing (as does Kevin) to overlook the fact that this project actually utilises the OpenAI API rather than Google, because “Googling googly eyes” sounds way better than “OpenAI information-sourcing googly eyes.”
Hardware
- Raspberry Pi Zero 2 W
- Raspberry Pi Camera
- Pimoroni Inventor HAT Mini (plus two motors and two encoders)
- Mini oval speaker
You’ll also need to 3D print some parts, and Kevin has helpfully made the files available. Head to the project overview to grab STL files for each of the four parts you’ll need to build your own Googling googly eyes.
Assembly
The Inventor HAT sits directly on top of the Pi Zero, and the encoders connect to the HAT. There’s a speaker connector on the HAT to which you can hook up the mini speaker. The 3D printed eyes need to be superglued with care, so that they can still move freely and no glue seeps onto the hardware. Kevin has laid out an eleven-step process for putting all the hardware and 3D printed parts together. Good luck.
It’s all in the eyes
The Pimoroni Inventor HAT Mini Python library gives the eyes the power of movement. The HAT is a most nifty piece of hardware that can drive motors, servos, and audio for Raspberry Pi, so definitely consider it next time you have a mechanical creation you want to bring to life.
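To give you a feel for it, here’s a minimal sketch of nudging one eye motor and reading back its position, based on the motor and encoder interface from Pimoroni’s published inventorhatmini examples. It’s illustrative, not Kevin’s actual code:

```python
# Minimal sketch: swivel one eye and read its encoder.
# Assumes the motors/encoders interface from Pimoroni's
# inventorhatmini examples; not Kevin's actual code.
import time
from inventorhatmini import InventorHATMini, MOTOR_A

board = InventorHATMini()          # the HAT sitting on the Pi Zero
eye = board.motors[MOTOR_A]        # one of the two eye motors
encoder = board.encoders[MOTOR_A]  # its matching encoder

eye.enable()
eye.speed(0.3)                     # gentle swivel; roughly -1.0 to 1.0
time.sleep(0.5)
eye.stop()

print("Eye turned", encoder.degrees(), "degrees")
```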
Motors connected to the HAT physically rotate the eyes, and the encoders enable accurate positioning. The eyes swivel and appear to lock on to a subject, but it’s actually the third eye, a Raspberry Pi camera sat between the two swivelly eyes, that does the looking. The camera relays what it can see to the Pi Zero, which sends the image to OpenAI’s image-captioning API for a description. The Pi then uses text-to-speech synthesis to announce, through the mini speaker, what the googly eyes are looking at.
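The whole loop fits in surprisingly little Python. Here’s a hedged sketch of the capture-caption-speak pipeline, assuming the standard picamera2 and openai libraries plus the espeak command for speech; the model choice and prompt are ours, not Kevin’s:

```python
import base64
import subprocess
from picamera2 import Picamera2
from openai import OpenAI

# The "third eye" grabs a snapshot of the scene
camera = Picamera2()
camera.start()
camera.capture_file("scene.jpg")

with open("scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ask a vision-capable model to caption the image
client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice; any vision model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this scene in one sentence."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
caption = response.choices[0].message.content

# Speak the caption through the mini speaker
subprocess.run(["espeak", caption])
```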
The software is pretty cool in that you can direct exactly how you’d like the scene to be described. You can select a particular tone of voice, or ascribe a certain personality, to make your Googling googly eyes unique to your tastes. I’d go for an Attenborough-like tone to elevate the everyday scenes my Googling eyes would happen upon: “Here, we see a scruffily dressed blogger advancing towards a poorly stocked fridge for the thirteenth time today in search of a Capri-Sun or some other such childlike beverage to quench their thirst”. That would do wonders for my productivity and self-esteem.
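Giving the eyes a personality really is just a prompt tweak. Here’s a made-up example of a system message that would steer the captions towards that hushed-documentary register; the wording is ours, not from Kevin’s code:

```python
# Hypothetical persona prompt; swap this messages list into the
# chat.completions.create() call from the sketch above.
persona = (
    "You are a hushed nature-documentary narrator. Describe the scene "
    "as though you have stumbled upon a rare creature in the wild."
)
messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this scene in one sentence."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ]},
]
```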