3D Robotics

Just a few years ago, this was PhD-level stuff at Stanford, and now you can do it at home for $100. These are awesome times for robotics. From Lukas Biewald at O'Reilly:

Object recognition is one of the most exciting areas in machine learning right now. Computers have been able to recognize objects like faces or cats reliably for quite a while, but recognizing arbitrary objects within a larger image has been the Holy Grail of artificial intelligence. Maybe the real surprise is that human brains recognize objects so well. We effortlessly convert photons bouncing off objects at slightly different frequencies into a spectacularly rich set of information about the world around us. Machine learning still struggles with these simple tasks, but in the past few years, it’s gotten much better.

Deep learning and a large public training data set called ImageNet have driven impressive progress in object recognition. TensorFlow is a well-known framework that makes it very easy to implement deep learning algorithms on a variety of architectures. TensorFlow is especially good at taking advantage of GPUs, which in turn are also very good at running deep learning algorithms.

Building my robot

I wanted to build a robot that could recognize objects. Years of experience building computer programs and doing test-driven development have turned me into a menace working on physical projects. In the real world, testing your buggy device can burn down your house, or at least fry your motor and force you to wait a couple of days for replacement parts to arrive.

Figure 1. Architecture of the object-recognizing robot. Image courtesy of Lukas Biewald.

The new third-generation Raspberry Pi is perfect for this kind of project. It costs $36 on Amazon.com and has WiFi, a quad-core CPU, and a gigabyte of RAM. A $6 microSD card can load Raspbian, which is basically Debian. See Figure 1 for an overview of how all the components worked together, and see Figure 2 for a photo of the Pi.

Figure 2. Raspberry Pi running in my garage. Image courtesy of Lukas Biewald.

I love the cheap robot chassis that SainSmart makes for around $11. The chassis turns by spinning the wheels at different speeds, which works surprisingly well (see Figure 3).

Figure 3. Robot chassis. Image courtesy of Lukas Biewald.
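As a rough illustration of how a chassis like this steers (not the code from the original build), the sketch below mixes a throttle value and a turn value into left and right wheel speeds. The function name, the -1.0 to 1.0 range, and the convention that a positive turn means "steer right" are all assumptions made for the example. Passing the result to a real motor driver is left out.

```python
def mix_differential(throttle, turn):
    """Convert a throttle/turn pair into (left, right) wheel speeds.

    Inputs are expected in -1.0 .. 1.0; outputs stay in the same range.
    A positive turn speeds up the left wheel, steering the robot right.
    """
    left = throttle + turn
    right = throttle - turn
    # Scale both wheels down together so neither exceeds full speed
    # while the ratio between them is preserved.
    largest = max(1.0, abs(left), abs(right))
    return left / largest, right / largest


print(mix_differential(1.0, 0.0))   # straight ahead: both wheels equal
print(mix_differential(0.0, 1.0))   # spin in place: wheels opposed
print(mix_differential(0.5, 0.25))  # gentle right turn: left wheel faster
```

Scaling both wheels by the same factor, rather than clipping each one separately, keeps the turn from flattening out at full throttle.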

The one place I spent more money when cheaper options were available is the Adafruit motor hat (see Figure 4). The DC motors run at a higher current than the Raspberry Pi can provide, so a separate controller is necessary, and the Adafruit motor hat is super convenient. Using the motor hat required a tiny bit of soldering, but the hardware is extremely forgiving, and Adafruit provides a nice library and tutorial to control the motors over I2C. Initially, I used cheaper motor controllers, but I accidentally fried my Pi, so I decided to order a better quality replacement.

Figure 4. Raspberry Pi with motor hat and camera. Image courtesy of Lukas Biewald.

A $15 camera plugs right into the Raspberry Pi and provides a real-time video feed I can use to recognize objects. There are tons of awesome cameras available. I like the infrared cameras that offer night vision.

The Raspberry Pi needs about 2 amps of current, but 3 amps is safer with the speaker we’re going to plug into it. iPhone battery chargers work awesomely for this task. Small chargers don’t actually output enough amps and can cause problems, but the Lumsing power bank works great and costs $18.

A couple of HC-SR04 sonar sensors help the robot avoid crashing into things—you can buy five for $11.
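For readers new to these sensors: an HC-SR04 reports distance as the length of an echo pulse, which covers the round trip to the obstacle and back. The sketch below only shows that conversion. The speed-of-sound constant, the function names, and the 20 cm threshold are assumptions for illustration, and reading the pulse from the GPIO pins is not shown.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at roughly 20 degrees C


def echo_to_distance_cm(echo_seconds):
    """Convert an echo pulse duration into a one-way distance in cm."""
    # The sound travels to the obstacle and back, so halve the round trip.
    return echo_seconds * SPEED_OF_SOUND_M_PER_S * 100.0 / 2.0


def is_too_close(echo_seconds, threshold_cm=20.0):
    """Return True if the obstacle is nearer than threshold_cm (hypothetical limit)."""
    return echo_to_distance_cm(echo_seconds) < threshold_cm


# A 1 ms echo corresponds to an obstacle about 17 cm away.
print(round(echo_to_distance_cm(0.001), 2))
print(is_too_close(0.001))
```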

I added the cheapest USB speakers I could find, and used a bunch of zip ties, hot glue, and foam board to keep everything together. As an added bonus, I cut up some of the packaging materials the electronics came with and drew on them to give the robots some personality. I should note here that I actually built two robots (see Figure 5) because I was experimenting with different chassis, cameras, sonar placement, software, and so forth, and ended up buying enough parts for two versions.

Figure 5. My 4WD robot (right) and his 2WD older sister. Image courtesy of Lukas Biewald.

Once the robot is assembled, it’s time to make it smart. There are a million tutorials online for getting started with a Raspberry Pi. If you’ve used Linux, everything should be very familiar.

For streaming the camera, the RPi Cam Web interface works great. It’s super configurable and by default puts the latest image from the camera in a RAM disk at /dev/shm/mjpeg/cam.jpg.
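Because the latest frame is just a file, any program on the Pi can pick it up. Here is a minimal sketch of doing so in Python, assuming the default path mentioned above. The function name and the check for the JPEG start and end markers (to skip a frame that may be caught mid-write) are illustrative choices, not part of RPi Cam Web Interface.

```python
from pathlib import Path

FRAME_PATH = Path("/dev/shm/mjpeg/cam.jpg")  # default location named above

JPEG_START = b"\xff\xd8"  # start-of-image marker
JPEG_END = b"\xff\xd9"    # end-of-image marker


def read_latest_frame(path=FRAME_PATH):
    """Return the frame's bytes, or None if it is missing or looks incomplete."""
    try:
        data = Path(path).read_bytes()
    except OSError:
        return None
    # Assumption: a complete frame starts and ends with the JPEG markers.
    if data.startswith(JPEG_START) and data.endswith(JPEG_END):
        return data
    return None


frame = read_latest_frame()
print("no frame available" if frame is None else f"{len(frame)} bytes")
```

Returning None instead of raising lets a polling loop simply try again on the next pass.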

If you want to stream the camera data to a webpage (very useful for debugging), you can install Nginx, an extremely fast open source webserver/proxy. I configured Nginx to pass requests for the camera image directly to the file location and everything else to my webserver.

http {
    server {
        location / {
            proxy_pass http://unix:/home/pi/drive.sock;
        }
        location /cam.jpg {
            root /dev/shm/mjpeg;
        }
    }
}

I then built a simple Python webserver to spin the wheels of the robot based on keyboard commands that made for a nifty remote control car.
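The original server code isn't shown, so what follows is only a guess at the general shape, using Python's standard library. The key bindings, the /drive?key=w URL scheme, the set_wheels placeholder, and the TCP port are all hypothetical. The Nginx configuration above proxies to a Unix socket, which this sketch does not do, so the proxy target would need adjusting to match.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical key bindings: key -> (left wheel, right wheel), each -1.0 .. 1.0.
KEY_COMMANDS = {
    "w": (1.0, 1.0),    # forward
    "s": (-1.0, -1.0),  # reverse
    "a": (-0.5, 0.5),   # pivot left
    "d": (0.5, -0.5),   # pivot right
}
STOP = (0.0, 0.0)


def command_for_key(key):
    """Look up wheel speeds for a key; unknown keys stop the robot."""
    return KEY_COMMANDS.get(key, STOP)


def set_wheels(left, right):
    """Placeholder: a real robot would call its motor driver here."""
    print(f"left={left:+.1f} right={right:+.1f}")


class DriveHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests such as /drive?key=w (a made-up URL scheme).
        query = parse_qs(urlparse(self.path).query)
        key = query.get("key", [""])[0]
        left, right = command_for_key(key)
        set_wheels(left, right)
        body = f"{left} {right}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_server(port=8000):
    """Serve drive commands until interrupted."""
    HTTPServer(("", port), DriveHandler).serve_forever()
```

Calling run_server() starts the loop. Treating any unrecognized key as "stop" is a deliberate choice here, so a stray request halts the robot instead of leaving it moving.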

As a side note, it’s fun to play with the sonar and the driving system to build a car that can maneuver around obstacles.

Programming my robot

Finally, it’s time to install TensorFlow. There are a couple of ways to do the installation, but TensorFlow actually comes with a makefile that lets you build it right on the system. The steps take a few hours and have quite a few dependencies, but they worked great for me.

TensorFlow comes with a prebuilt model called “inception” that performs object recognition. You can follow the tutorial to get it running.

Running tensorflow/contrib/pi_examples/label_image/gen/bin/label_image on an image from the camera will output the top five guesses. The model works surprisingly well on a wide range of inputs, but it’s clearly missing an accurate “prior,” or a sense of what things it’s likely to see, and there are quite a lot of objects missing from the training data. For example, it consistently recognizes my laptop, even at funny angles, but if I point it at my basket of loose wires it consistently decides that it’s looking at a toaster. If the camera is blocked and it gets a dark or blurry image it usually decides that it’s looking at nematodes—clearly an artifact of the data it was trained on.
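One way to picture what a "prior" would add: multiply each guess's score by a weight for how plausible that object is in the robot's surroundings, then renormalize. The sketch below is an illustration of that idea only, not something the article's robot does. The function name, the scores, and the weights are all made up.

```python
def apply_prior(guesses, prior, default_weight=1.0):
    """Reweight (label, score) guesses by a prior and renormalize.

    Labels missing from the prior keep default_weight.
    Returns the guesses sorted from most to least likely.
    """
    weighted = [(label, score * prior.get(label, default_weight))
                for label, score in guesses]
    total = sum(score for _, score in weighted)
    if total == 0:
        return []
    ranked = sorted(weighted, key=lambda pair: pair[1], reverse=True)
    return [(label, score / total) for label, score in ranked]


# Made-up scores and weights, for illustration only.
guesses = [("toaster", 0.5), ("laptop", 0.3), ("nematode", 0.2)]
prior = {"toaster": 0.1, "nematode": 0.01}  # unlikely sights in a garage

best_label, best_score = apply_prior(guesses, prior)[0]
print(best_label)
```

With these invented numbers the down-weighted toaster and nematode guesses fall behind the laptop, which is the kind of correction a sense of likely surroundings would provide.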

Figure 6. Robot plugged into my keyboard and monitor. Image courtesy of Lukas Biewald.

Finally, I connected the output to the Flite open source software package that does text to speech, so the robot can tell everyone what it’s seeing (see Figure 6).
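The glue between the recognizer and Flite can be quite small. The sketch below assumes the flite command-line tool is installed and uses its -t option to speak a string. The function names and the "I see a ..." phrasing are invented for the example, and how the label is obtained from the recognizer is not shown.

```python
import shutil
import subprocess


def describe(label):
    """Build the sentence the robot should say for a recognized label."""
    starts_with_vowel = bool(label) and label[0].lower() in "aeiou"
    article = "an" if starts_with_vowel else "a"
    return f"I see {article} {label}"


def speak(text):
    """Speak text with Flite if it is installed; return True on success."""
    if shutil.which("flite") is None:
        return False
    # `flite -t TEXT` synthesizes the given string and plays it.
    result = subprocess.run(["flite", "-t", text], check=False)
    return result.returncode == 0
```

A call such as speak(describe("laptop")) would then announce the top guess. Passing the arguments as a list avoids shell quoting problems with labels that contain spaces or punctuation.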

Testing my robot

Here are my two homemade robots running deep learning to do object recognition.

Final thoughts

From 2003 to 2005, I worked in the Stanford Robotics lab, where the robots cost hundreds of thousands of dollars and couldn’t perform object recognition nearly as well as my robots. I’m excited to put this software on my drone and never have to look for my keys again.

I’d also like to acknowledge all the people that helped with this fun project. My neighbors, Chris Van Dyke and Shruti Gandhi, helped give the robot a friendly personality. My friend, Ed McCullough, dramatically improved the hardware design and taught me the value of hot glue and foam board. Pete Warden, who works at Google, helped get TensorFlow compiling properly on the Raspberry Pi and provided amazing customer support.

Article image: Eye of Providence. (source: Bureau of Engraving and Printing on Wikimedia Commons).

  • I spent a weekend making it work on my Odroid XU4: it takes around 3 seconds to recognize a common object.

    I had better speed with DeepBelief, but it still lacks the power for real-time drone stuff. Hopefully Intel will release the newly acquired Movidius Myriad VPU (the same chip that's inside the DJI Mavic), and then we might get interesting stuff going on in the DIY world.

  • Sounds like tensorflow was another Goog project that went nowhere, so they open sourced it.  The image recognition is a demo in the tensorflow distribution.  Another triumph for Inc Magazine's top 30 under 30.

  • I am afraid sonar has nothing to do with object identification (pattern recognition, shape, volume recognition).

    Sonar is a single point distance meter.

    To recognize remote 3D objects all you need is 3D scanner to generate depth maps to extract patterns, objects.

    OK, a single camera can serve as a 3D scanner if it is freely moved on 3 axes (3D gimbal).

    Since you don't get gimbal, all you need is twin camera to generate depth maps on-the-go (live) and have GPU to postprocess pattern recognition to extract objects.

    To recognize objects you need smart digital library of 3D objects to be recognized showing topological features and colors, size, volume.

    30 years ago Stanford was hot for AI Journal, AI Conferences.

    Today, fully robotized assembly lines in the Asia-Pacific region (Japan, China, ...) manufacture millions of products daily, and 3D camera vision systems and object and pattern recognition are commonplace.

    30 years ago NIH funded development of medical image processing, pattern recognition, so CT, USG units come delivered with such computer vision technology already embedded.

    If you still have problems understanding how triangulation works in 3D single-camera vision, just close your left or right eye and watch the world around you (in close proximity first).

    OK, no 3D vision if your head is still.

    Just start swinging your head to left and right and your smart mind begins to generate depth maps, recognize objects, extract objects from 2D still image.

    A $100 Kinect can do that job too.

    There is no need to reinvent 3D vision, object recognition already in operation for 30 years.

    My $100 robot vacuum cleaner comes with 3 pairs of sonars, a 360-degree camera, 3 IR level sensors, and an IR sender/receiver to track the charging station, and features sophisticated algorithms to go back and forth through bottleneck-like spaces, finally finding the charging station with a 99% success rate.

    What is described is $1000 materials cost project requiring 3-6 months of hard mental efforts to work.

    At the same time you can just put twin-camera smartphone into your 4WD, run 3D vision app and have obstacles identified monitored via Bluetooth to 4WD Pilot directly to avoid them.

    Twin camera smartphone is really hot and works fine.

    As said above, twin camera can be replaced by a single camera swinging sideways to let triangulation based 3D vision algorithm to work.

    Great future awaits developers of autonomous model cars.
