The internet’s a weird place. We already knew that, yet it keeps finding new ways to amaze me.
Someone thought it would be a novel idea to build Alexa into a novelty electronic fish (Big Mouth Billy Bass). Now you can ask a fish for the current weather, and the fish can tell you if it’s a good day to catch its brethren.
I take that back. The world is a weird place. And I embrace it and want to leave my mark on it.
So, when I was given an opportunity to build something similar, a robotic version of Clippy was the only natural solution. People have re-fallen in love with Clippy.
Ya girl got business cards 😎📎☝️ pic.twitter.com/1nNBYvlZ1t
— Chloe Condon 🎀 (@ChloeCondon) February 21, 2019
However, one thing was missing. A proper, physical manifestation of our favorite sentient paperclip. This is the story of that journey.
RoboClippy Mark I
RoboClippy Mark I was cute, but definitively not a paperclip. I could get the eyebrows to wiggle, but only manually.
RoboClippy Mark II
Upon creating RoboClippy Mark II, I realized that the concept of “Uncanny Valley” applies to both humans and paperclips.
RoboClippy Mark III
In RoboClippy Mark III, I realized that animatronics done wrong is nightmare fuel.
But I made progress! The eyebrows articulate, though the movement doesn’t look natural and lacks fine motor control.
Since I wanted to use motors to move LEGO, LEGO Mindstorms sounded perfect! Alas, I ran into a number of issues with the LEGO power supplies:
- LEGO Mindstorms motors use 9V; however, I wanted to power everything from USB, which only supplies 5V
- LEGO Mindstorms connectors use odd wiring, and I wanted more standard cabling
There’s a whole field of hobby electronics and motors, so I went down that path. I soon learned that not all motors are created equal. Stepper motors move in discrete steps: you send a train of pulses to tell the motor “rotate this far in that direction”. Servo motors are good for fine positional movements: you send a signal that says “go to position X”.
After some experimentation, servo motors became the natural fit. The catch is that they’re controlled with a special signal called PWM. And if you want to control multiple motors, you want a driver board that speaks a different protocol (I2C) and generates the PWM for each servo. In my case, I wanted to control 3 motors (mouth, left eyebrow, right eyebrow). This was getting more complex than I expected, but I was learning a lot and was excited.
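To give a feel for what that looks like in code, here’s a minimal sketch of driving three servos through an I2C-to-PWM driver board. It assumes a PCA9685-style 16-channel board and Adafruit’s ServoKit library; the channel numbers and angles are illustrative, not the values from my build.

```python
# Minimal sketch: three servos (mouth, left eyebrow, right eyebrow) on an
# I2C-to-PWM driver. Assumes a PCA9685-style 16-channel board and the
# adafruit-circuitpython-servokit library.
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)  # talks I2C to the driver, which generates the PWM

# Hypothetical channel assignments -- match these to your own wiring.
MOUTH, LEFT_BROW, RIGHT_BROW = 0, 1, 2

def open_mouth(degrees=45):
    kit.servo[MOUTH].angle = degrees

def close_mouth():
    kit.servo[MOUTH].angle = 0

def raise_eyebrows(degrees=30):
    kit.servo[LEFT_BROW].angle = degrees
    kit.servo[RIGHT_BROW].angle = degrees
```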
RoboClippy Mark IV
RoboClippy Mark IV was a technological breakthrough. With the help of my good friends at Bricks and Minifigs, Plano, we had a working prototype that looked realistic and could articulate its mouth.
Remember Uncanny Valley? Without the eyebrows, it looks … off.
So, how are we doing so far? We’ve got a great structure, the mouth articulates, and the eyebrows articulate! However, it’s lacking a “soul”. We want it to move its mouth when speaking, simulate Clippy’s voice, and use the eyebrows to emote.
Enter Azure Cognitive Services. It offers many services; in this case I’ll be using Azure Speech to Text and Text to Speech, so that I can give RoboClippy a voice and have it listen to what people are saying.
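Here’s a minimal sketch of both directions using the azure-cognitiveservices-speech Python SDK: speak a line through the default speaker, then transcribe one utterance from the default microphone. The subscription key and region are placeholders, and this isn’t the exact code running on RoboClippy.

```python
# Minimal sketch of Azure Text to Speech and Speech to Text
# using the azure-cognitiveservices-speech SDK.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")

# Text to Speech: give RoboClippy a voice (plays through the default speaker).
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("It looks like you're building a robot. Would you like help?").get()

# Speech to Text: listen for one utterance on the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print("Heard:", result.text)
```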
Now for the next problem: Determining when RoboClippy is speaking. It seems intuitive to have RoboClippy’s mouth move when the audio is playing and stop when it’s complete, right? Uncanny Valley wins again. If you see someone’s mouth moving when they’re not speaking (e.g. at the end of a sentence), it doesn’t look right.
So, the next option is to measure the voltage off of the soundcard/speaker, right? Again, there are more nuances to be discovered. Sound is a wave, so measuring at any single point only gets you a snapshot. Also, most microphones output -2.5V to 2.5V, while the Arduino can only read 0–5V, so we’re missing half the data! A step-up converter fixed that problem, but added additional complexity.
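One software-side way to approximate “is it speaking right now” is to look at the chunk of audio currently being played and only open the mouth when its energy crosses a threshold. This is just a sketch of that idea (not the Arduino circuit described above), and the threshold would need to be tuned by ear.

```python
# Sketch: decide whether the mouth servo should be open for a short audio chunk.
import numpy as np

def mouth_should_open(chunk: bytes, threshold: float = 500.0) -> bool:
    """chunk: ~20-50 ms of signed 16-bit PCM samples currently being played.
    threshold: RMS level above which we treat the chunk as speech; tune by ear."""
    samples = np.frombuffer(chunk, dtype=np.int16)
    if samples.size == 0:
        return False
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    return rms > threshold
```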
RoboClippy Mark V
Now we’re at RoboClippy Mark V. I’m using an Arduino to measure the sound and control the servos. All of the communication happens on my laptop, and everything is powered and controlled over USB. Unfortunately, the results were really flaky, and since it drew power from the laptop, there was a risk of the motors drawing too much current and frying it. It also took about 5-10 minutes to set up and get right each time. Major progress! But not very practical.
Inspired by the Big Mouth Billy Bass, I built a @LEGO #Clippy #robot using @Azure Huge thanks to @geekpatrol and Bricks and Minifigs, Plano pic.twitter.com/Fay08GUs4I
— Tommy Falgout (@lastcoolname) October 1, 2018
My local Makerspace had some Google AIY Voice Kits, which I experimented with. Each kit has a speaker, a microphone, and a cardboard case; all you need to supply is your own Raspberry Pi. This was exactly the packaging I needed to contain RoboClippy’s brains.
The last step is making RoboClippy “talk”. But some interesting questions arose:
- How do I know when to start listening? Wait for a user to press a button? Not a great experience.
- How do I know when to stop listening? Again, not a great experience.
- What’s the quickest way to respond? Perform S2T & T2S locally? Use a service?
- How can I best utilize Azure? This is Clippy, so using MS products makes sense.
Thankfully, someone wrote an OSS library to solve many of these problems. I also learned something about Alexa/Cortana/OK Google that I wasn’t aware of: keyword detection (aka hotword detection).
Anyone with an Amazon Dot/Echo worries that Alexa/Amazon is always listening in on us. Keyword detection means training an AI model to “wake up” and do something when it hears specific pitches/frequencies. You can even create your own keyword! Enter Snowboy, a service from Kitt.AI for making your own keyword. This allows RoboClippy to wake up on that specific pitch/frequency and only then start “really” listening on the microphone. Thankfully, the same OSS library supported Snowboy, so this was surprisingly easy to incorporate. You can even contribute to the “Hello Clippy” keyword.
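Snowboy’s Python wrapper keeps the wake-up loop short. Here’s a minimal sketch assuming the snowboydecoder module that ships with the Kitt.AI repo; the model filename is just a placeholder for whatever personal model you train.

```python
# Minimal sketch of hotword detection with Snowboy's Python wrapper.
import snowboydecoder

def on_hotword():
    # Hand off to the "really listening" path here (e.g. Azure Speech to Text).
    print("Hotword heard -- wake up, RoboClippy!")

# "hello_clippy.pmdl" is a placeholder for a model trained on snowboy.kitt.ai.
detector = snowboydecoder.HotwordDetector("hello_clippy.pmdl", sensitivity=0.5)
detector.start(detected_callback=on_hotword, sleep_time=0.03)
```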
RoboClippy Mark VI
Now, witness the power of this fully armed and operational RoboClippy.
Our RoboClippy is now MUCH more extensible:
- Google hardware for microphone and speakers
- Microsoft Azure for Text2Speech, Speech2Text, Natural Language Processing
- I2C to PWM for motor controls
- RaspberryPi for orchestration
- Power + control (ssh + Python) can be done remotely
- 5 easy-to-connect wires (4 for I2C, 1 USB for power)
- Written in Python
- Available as Open Source
Build your own Robo-Clippy
To build your own, you will need:
- Raspberry Pi 3B+ – You might be able to use an older version, but I’m not sure how Snowboy will work since it can be resource intensive.
- LEGO – Exact parts TBD, but mostly: black 1-wide bricks, black plates, grey 2-wide bricks, and grey plates.
- Azure Cognitive Services – T2S/S2T – You can sign up for a free account on Azure.
- Google AIY Voice Kit
- I2C PWM Driver – This might not be essential as the Google AIY Voice Kit allows you to connect multiple servos directly to the HAT. By the time I got to this stage, I had already integrated the board and I prefer to keep it this way because it makes cabling easier.
- Multi-color breadboard jumper wires – These are used for connecting the RPi HAT to the Servos.
- 3 Micro Servo Motors
- Snowboy – Hotword/keyword detection
- Python 3 – Glue code
In an upcoming blog, I will detail the steps necessary to create your own. If you can’t wait and want to start working on one now, feel free to email me at tommy at this domain.
If you’re interested in seeing my presentation on this story, you can view it here:
Special thanks:
- Jason and Andrea of Bricks and Minifigs, Plano, who helped design LEGO Clippy
- Greg Miller, who helped me understand the properties of sound via an oscilloscope
- Nina Zakharenko, who was the catalyst for this blog post
- Chloe Condon, who is an even bigger Clippy fan than I am and helped fuel this social rebirth