Robust GPS-Agnostic Navigation - Visual Planning, Estimation and Control for MAVs

With all the talk of integrating drones into civilian airspace, there is a need for better safety and for GPS-agnostic navigation methods. Visual navigation and obstacle avoidance are paramount to integrating drones (micro aerial vehicles, MAVs) into our cities and elsewhere, because external navigation aids cannot be relied on in every situation, and neither can pilot experience.

Visual navigation addresses these challenges, and we present an aerial robot designed from the ground up to meet these requirements. We demonstrate several facets of the system, including visual-inertial SLAM for state estimation, dense realtime volumetric mapping, obstacle avoidance and continuous path planning.

In search of better extensibility and a better fit for my research goals, I started working as part of the open-source PX4 Autopilot development community in 2013. Aware of the limitations in the field, I started Artemis, my research project on visual navigation for aerial vehicles, in 2014. Today, I'm happy to present our intermediate (and satisfying!) results.

At its very core, Project Artemis is a research project which aims to provide improved navigation solutions for UAVs.

All of the Artemis MAVs follow the same basic, distributed system architecture. There is a high-level onboard computer and a low-level embedded flight controller, typically a PX4 autopilot board or a similar derivative. The high-level companion computer runs ROS (Robot Operating System) as its middleware, and the deeply embedded controller runs the PX4 Middleware.


Visual Navigation

Multiple cameras provide exteroceptive information about the environment, which is used for mapping and localisation. Forward-facing stereo cameras are used to compute depth images in realtime.


All cameras are synchronised in time with respect to each other, and to the IMU (Inertial Measurement Unit) of the flight controller. Depth images are inserted into a volumetric mapping framework based on an Octree representation, and a 3D map of the environment is built incrementally onboard the vehicle.
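To make the mapping step more concrete, here is a minimal sketch of how depth pixels could be turned into 3D points and accumulated as occupancy evidence. It is not the Artemis implementation: it swaps the Octree for a flat hashed voxel grid, ignores camera rotation (the pose is translation-only), and every name and constant in it is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical parameters, not taken from the Artemis stack.
VOXEL_SIZE = 0.10    # metres per voxel edge
LOG_ODDS_HIT = 0.85  # evidence added when a depth point lands in a voxel
LOG_ODDS_MAX = 3.5   # clamp so a voxel can still be revised later


def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (rows of metres; 0, None or NaN = invalid) into camera-frame points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z != z or z <= 0.0:  # z != z is true only for NaN
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxel_key(point, voxel_size=VOXEL_SIZE):
    """Integer index of the voxel that contains the point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def insert_points(grid, points, translation=(0.0, 0.0, 0.0)):
    """Add hit evidence for each point after shifting it by the camera position."""
    for p in points:
        world = tuple(c + t for c, t in zip(p, translation))
        key = voxel_key(world)
        grid[key] = min(grid[key] + LOG_ODDS_HIT, LOG_ODDS_MAX)
    return grid


def occupied(grid, key, threshold=0.0):
    """A voxel counts as occupied once its log-odds exceed the threshold."""
    return grid.get(key, 0.0) > threshold


# Usage: the map is a defaultdict(float), so unseen voxels start at log-odds 0.
voxel_map = defaultdict(float)
frame_points = backproject([[1.0, 0.0], [float("nan"), 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
insert_points(voxel_map, frame_points)
```

An Octree adds hierarchical storage and cheap free-space queries on top of this idea; the per-voxel evidence update is the part the sketch is meant to show.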

We also use a SLAM (Simultaneous Localisation and Mapping) technique on our robot. The system continuously constructs a sparse map of the environment, which is optimised in the background. Unlike GPS, visual SLAM is globally consistent and accurate to the centimetre level, and it works both indoors and outdoors. Tight fusion with time-synchronised inertial measurements greatly increases robustness and accuracy.


State Estimation


The system is designed to navigate using all available sensors in the environment, which include both GPS and vision outdoors and pure vision indoors. Since sensor availability is not guaranteed, a modular sensor fusion approach using a hybrid Kalman filter with fault detection is used to maintain a robust state estimate. The motivation for using the information from all the sensors is that even if a particular subset or module were to fail, overall system performance would not be compromised.
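As an illustration of fusion with fault detection, the sketch below gates each measurement on its normalised innovation before fusing it. It is a deliberately simplified scalar Kalman filter, not the hybrid filter used on Artemis; the class name, noise values and gate are assumptions chosen for the example.

```python
from dataclasses import dataclass

CHI2_GATE_1DOF = 3.84  # 95% chi-square threshold for a scalar measurement


@dataclass
class ScalarKalman:
    x: float         # state estimate, e.g. position along one axis (metres)
    p: float         # variance of the estimate
    q: float = 0.01  # process noise added per prediction step (hypothetical)

    def predict(self):
        """Grow the uncertainty between measurements (static motion model)."""
        self.p += self.q

    def update(self, z, r):
        """Fuse measurement z with variance r; return False if it was rejected."""
        innovation = z - self.x
        s = self.p + r  # innovation variance
        if innovation * innovation / s > CHI2_GATE_1DOF:
            return False  # inconsistent with the current estimate: skip it
        k = self.p / s  # Kalman gain
        self.x += k * innovation
        self.p *= 1.0 - k
        return True


# Usage: a plausible vision fix is fused, a wildly inconsistent GPS fix is not.
kf = ScalarKalman(x=0.0, p=1.0)
kf.predict()
vision_ok = kf.update(z=0.2, r=0.04)
gps_ok = kf.update(z=25.0, r=4.0)
```

Because each sensor is fused through its own update call, a sensor that drops out or fails the gate simply stops contributing while the others keep the estimate alive.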

Obstacle Avoidance

The global volumetric map is used to continuously compute a collision-free trajectory for the vehicle. In operator-assist mode, the motion planner only intervenes if the operator’s high-level position commands could lead to a possible collision. In autonomous modes, the planner computes optimal trajectories based on a next-best-view analysis in order to optimise 3D reconstruction. The planner sends its commands to the minimum-snap trajectory controller running on the low-level flight controller, which computes motor outputs.
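The operator-assist behaviour can be sketched as a collision check on the path to the commanded position. This is a rough sketch under stated assumptions: the map is reduced to a set of occupied voxel indices, the path is a straight line, and the "intervention" is simply holding position, whereas the real planner computes a trajectory. All function names are hypothetical.

```python
import math


def segment_is_free(occupied_voxels, start, goal, voxel_size=0.1, margin=1):
    """Sample the line start -> goal and check a cube of voxels around each sample."""
    length = math.dist(start, goal)
    steps = max(1, math.ceil(length / (0.5 * voxel_size)))
    for i in range(steps + 1):
        t = i / steps
        sample = [s + t * (g - s) for s, g in zip(start, goal)]
        cx, cy, cz = (math.floor(c / voxel_size) for c in sample)
        for dx in range(-margin, margin + 1):
            for dy in range(-margin, margin + 1):
                for dz in range(-margin, margin + 1):
                    if (cx + dx, cy + dy, cz + dz) in occupied_voxels:
                        return False
    return True


def assist(occupied_voxels, position, commanded):
    """Pass the operator's setpoint through unless the path to it is blocked."""
    if segment_is_free(occupied_voxels, position, commanded):
        return commanded
    return position  # simplest possible intervention: hold the current position


# Usage: one occupied voxel roughly 1 m ahead along the x axis.
obstacles = {(10, 0, 0)}
here = (0.05, 0.05, 0.05)
short_hop = assist(obstacles, here, (0.55, 0.05, 0.05))
blocked = assist(obstacles, here, (2.05, 0.05, 0.05))
```

The `margin` parameter inflates each sample by a few voxels, which stands in for the vehicle's physical size plus a safety buffer.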

It is important to point out that this can be achieved *today* with open-source systems, albeit with some perseverance and experience. Better documentation on how to achieve a relatively close reproduction of our results is underway. It will be made available soon via the UASys website and the PX4 Autopilot developer wiki.

Our open-sourced portions of the software stack are available here :

I will also be presenting a talk on Project Artemis and our software stack at the Embedded Linux Conference in San Diego, CA. Please attend if you'd like an in-depth view into the system's workings! The presentation will introduce the current state of the aerial vehicle market and the limitations it faces because few technological breakthroughs have reached consumer systems. Newcomers and existing developers / system integrators will get a chance to understand these limitations, and what embedded Linux systems can do for the field, including but not limited to visual (GPS-denied) navigation, mapping, obstacle avoidance and high-definition video streaming. The talk also aims to encourage the open-source development communities and to show how they can contribute to improving the state of the art, be it through cross-middleware interoperability, modular and reusable software design, or inexpensive and extensible hardware design.

Slides are available here :

Learn more about my session at and register to attend at

Stay updated! -
Website :
GitHub :
Instagram :
Twitter :
Facebook :




Comment by vorney thomas on March 31, 2016 at 4:24pm
Wow! Good job. I really like topics related to vision navigation, and your work is worth following and digging into deeply.

Comment by Moderator_Bo on March 31, 2016 at 5:49pm

Amazing, even the DIY stereo camera. I'm just wondering, could a person use something like a Minoru webcam to achieve stereo depth too? They are relatively cheap devices.

thanks, great job

Comment by Kabir on March 31, 2016 at 5:58pm

Thanks Vorney!

Comment by Kabir on March 31, 2016 at 6:15pm

Bo, Thanks!

Yeah, the DIY stereo rig took quite some time to perfect. It's a multi-fold problem, and needed quite some work:

1. The pose transform between the cameras needs to be estimated very precisely for reconstruction, otherwise the dense depth estimation cannot be maintained. I used 2 approaches here. First, the rig itself went through several redesigns until I finally decided on carbon fiber, which gives the required rigidity. Second, the relative transforms between the IMU and the cameras are estimated adaptively and online, so the system recalibrates itself continuously in the air.

2. Global shutter cameras - I'm fairly sure that the Minoru doesn't have global shutter sensors, and the visual-inertial estimator needs clean image frames to estimate the pose, otherwise the estimation breaks down from image artifacts like rolling-shutter distortion.

3. Baseline - For effective sense-and-avoid, the stereo camera system needs to have sufficient range, which isn't possible with the tiny baseline on the Minoru. Our rig has a baseline of 15 cm, calculated for the camera sensor size and lenses used.

4. IMU sync - The cameras on Artemis are triggered from the PX4 IMU, in order to get tightly synchronised inertial measurements and image frames, which the SLAM system needs to properly integrate the priors into the map. A large percentage of commercial stereo systems sync intra-camera, but cannot take an external trigger signal, which limits the usefulness.
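The effect of baseline on range in point 3 follows from the standard pinhole stereo relation, depth = focal length × baseline / disparity, which makes depth error grow with the square of the distance and shrink with the baseline. Here is a small sketch of that arithmetic. The 15 cm baseline is the one quoted above; the focal length, the 6 cm "webcam-style" baseline and the disparity error are assumed values for illustration, not measurements of any particular camera.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo: Z = f * B / d, with f and d in pixels and B in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=0.25):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


FOCAL_PX = 400.0      # assumed focal length in pixels
WIDE_BASELINE = 0.15  # metres, the rig described above
NARROW_BASELINE = 0.06  # metres, hypothetical small webcam-style baseline

for name, baseline in (("15 cm", WIDE_BASELINE), ("6 cm", NARROW_BASELINE)):
    err = depth_error(FOCAL_PX, baseline, depth_m=10.0)
    print(f"{name} baseline: depth error at 10 m is about {err:.2f} m")
```

With everything else equal, the error at a given distance scales inversely with the baseline, so the narrower rig is 2.5 times less precise at every range in this example.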

That said, there are (expensive!) commercial offerings which can achieve close-ish results, like the ZED camera (but no IMU sync is possible, so that kills a load of estimation performance) or these cameras : 

Making your own stereo system is fairly easy once you have seen someone do it and respected the tight building constraints (rigidity, etc.), and it is inexpensive (around $300 total if you can get surplus machine vision cameras like the Point Grey FireFly MV). For $300, you can have your own IMU-synced stereo rig going with a Pixhawk.

Comment by Moderator_Bo on March 31, 2016 at 6:39pm

Whoa, I'm really interested now! How do you combine IMU sync with image capture? scripting? Did you use RGBDSLAM code at all? I'm hoping to try imu integration with a Jetson. Is the IMU sync done post-process, or in real-time?

Comment by Kabir on March 31, 2016 at 7:21pm
The cameras used can be triggered to start light integration with an external trigger pulse. The entire SLAM system runs onboard in realtime, so yes, it's all done in realtime. There's zero offboard/offline processing here.

A precision timesync between the onboard computer and the PX4 system allows us to match image frames and IMU data. Drops can also be detected inside the sync driver, so it's robust.
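To give a feel for the matching and drop detection, here is a rough sketch. It is not the actual sync driver: it assumes each trigger pulse and each frame carry a shared sequence number and that the clock offset between the two systems is already known, and the class and method names are made up for the example.

```python
class TriggerMatcher:
    """Pair camera frames with trigger timestamps reported by the flight controller."""

    def __init__(self, clock_offset_ns=0):
        self.clock_offset_ns = clock_offset_ns  # flight-controller clock -> companion clock
        self.stamps = {}  # trigger sequence number -> timestamp (ns, companion clock)
        self.dropped = 0

    def on_trigger(self, seq, fc_time_ns):
        """Record the time at which the flight controller fired trigger `seq`."""
        self.stamps[seq] = fc_time_ns + self.clock_offset_ns

    def on_frame(self, seq, image):
        """Return (timestamp_ns, image), or None if no trigger stamp is known for this frame."""
        stamp = self.stamps.pop(seq, None)
        if stamp is None:
            self.dropped += 1
            return None
        # Older stamps still waiting belong to frames that never arrived.
        stale = [s for s in self.stamps if s < seq]
        for s in stale:
            del self.stamps[s]
        self.dropped += len(stale)
        return stamp, image


# Usage: frame 2 never arrives, so it is counted as a drop when frame 3 shows up.
matcher = TriggerMatcher(clock_offset_ns=1000)
for seq, t in ((1, 5000), (2, 6000), (3, 7000)):
    matcher.on_trigger(seq, t)
first = matcher.on_frame(1, "frame-1")
third = matcher.on_frame(3, "frame-3")
```

Once every frame carries the trigger time in the companion computer's clock, it can be lined up against IMU samples stamped in the same clock.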
I pushed all my timesync/trigger/IMU sync code upstream to the PX4 native stack, so you'll have the baseline framework ready if you're willing to use PX4.
I also provide a reference implementation of the camera-IMU sync driver here :

Comment by benbojangles on March 31, 2016 at 8:32pm

Well, that's amazing work. Please tell me, did you do this on your own, or was it group work?

Comment by Kabir on March 31, 2016 at 9:01pm
Haha, yes I did this alone :)

Comment by Patrick Poirier on April 1, 2016 at 7:58am

Great work, nice integration with mavros: triggerClient_ = n_.serviceClient<mavros_msgs::CommandTriggerControl>("/mavros/cmd/trigger_control");

Question: is the UDP messaging function considered a stable and precise triggering mechanism?

Comment by vorney thomas on April 1, 2016 at 9:37am

Where could I find the published paper describing your great work? Or could you post the names of the code libraries integrated or used in your workflow?

