
DAVE: Autonomous Off-Road Vehicle Control using End-to-End Learning

The DAVE project was a proof of concept in preparation for the LAGR project ("Learning Applied to Ground Robots"), sponsored by the US government. The success of DAVE contributed to the decision to launch LAGR, which started in December 2004.

We built a small off-road robot that uses an end-to-end learning system to avoid obstacles solely from visual input. The DAVE robot has two cameras with analog video transmitters. The video is transmitted to a remote computer that collects the data, runs the automatic driving system, and controls the robot by radio control.
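The remote-computer loop described above (receive stereo video, run the driving system, send a steering command over the radio link) can be sketched as follows. This is a minimal sketch, not the project's Lush code: `grab_stereo`, `predict_steering`, and `send_steering` are hypothetical callables standing in for the video receiver, the trained network, and the radio transmitter, and the normalized steering range of [-1, 1] is an assumption.

```python
import time
from typing import Callable, Tuple

import numpy as np

Frame = np.ndarray  # one camera image, e.g. shape (3, H, W) in YUV


def drive_loop(
    grab_stereo: Callable[[], Tuple[Frame, Frame]],
    predict_steering: Callable[[Frame, Frame], float],
    send_steering: Callable[[float], None],
    n_frames: int,
    target_hz: float = 10.0,
) -> None:
    """Run the capture -> predict -> transmit cycle at a fixed target rate."""
    period = 1.0 / target_hz
    for _ in range(n_frames):
        start = time.monotonic()
        left, right = grab_stereo()             # frames from the analog video receivers
        angle = predict_steering(left, right)   # trained network, run on the remote computer
        angle = max(-1.0, min(1.0, angle))      # assumed normalized steering range [-1, 1]
        send_steering(angle)                    # command relayed to the robot by radio
        remaining = period - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)               # hold roughly target_hz iterations per second


sent = []
drive_loop(lambda: (np.zeros((3, 58, 149)), np.zeros((3, 58, 149))),
           lambda l, r: 0.25, sent.append, n_frames=3, target_hz=100.0)
print(sent)  # [0.25, 0.25, 0.25]
```

Injecting the three stages as callables keeps the loop testable without hardware; a real system would also need to handle dropped or noisy frames from the analog link.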

A convolutional network takes the left and right camera images (YUV components) as input and is trained to directly predict the steering angle chosen by a human driver. Several hours of binocular video were recorded, together with the steering angle of a human driver who navigated the robot around obstacles in a wide variety of outdoor settings, for a total of 1,500 short video sequences. The convolutional net was trained end-to-end in supervised mode to map raw YUV image pairs to these steering angles. From the recorded sequences, roughly 95,000 frames were selected for training, and 31,800 frames from different sequences were held out for independent testing. The learning procedure converged after 11 passes through the training set, which took approximately 4 days of CPU time on a 3.0 GHz Xeon machine running Red Hat Linux 8.
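Two ideas in the paragraph above can be illustrated in a few lines: holding out whole sequences (so test frames never come from a sequence seen during training), and supervised regression of the human steering angle over repeated passes through the training set. This is a minimal sketch under stated assumptions: a linear model on synthetic feature vectors stands in for the convolutional network, and the squared-error loss and plain per-sample SGD are assumptions, since the source does not specify the loss or optimizer.

```python
import numpy as np


def split_by_sequence(seq_ids: np.ndarray, test_fraction: float, rng: np.random.Generator):
    """Return boolean train/test masks that keep every sequence entirely on one side."""
    unique = np.unique(seq_ids)
    rng.shuffle(unique)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(seq_ids, unique[:n_test])
    return ~test_mask, test_mask


def train_steering_regressor(x, y, epochs, lr, rng):
    """Per-sample SGD on 0.5 * (prediction - human_angle)**2 for a linear stand-in model."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):                  # one epoch = one pass through the training set
        for i in rng.permutation(len(x)):
            err = x[i] @ w + b - y[i]        # predicted minus human steering angle
            w -= lr * err * x[i]             # gradient of the squared error w.r.t. w
            b -= lr * err
    return w, b


def mse(x, y, w, b) -> float:
    return float(np.mean((x @ w + b - y) ** 2))


# Synthetic stand-in data: 50 hypothetical sequences of 20 frames, 16 features each.
rng = np.random.default_rng(0)
seq_ids = np.repeat(np.arange(50), 20)
x = rng.normal(size=(1000, 16))
y = x @ rng.normal(size=16) + 0.01 * rng.normal(size=1000)
train, test = split_by_sequence(seq_ids, 0.25, rng)
w, b = train_steering_regressor(x[train], y[train], epochs=11, lr=0.01, rng=rng)
print(mse(x[test], y[test], w, b))
```

Splitting by sequence rather than by frame matters for video: neighboring frames are nearly identical, so a frame-level split would let near-duplicates of training frames leak into the test set.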

The input to the network consisted of a stereo pair of images from the left and right cameras in YUV (luminance/chrominance) representation, at a resolution of 149 by 58 pixels. The convolutional network had a total of 3.15 million connections and 71,900 trainable parameters.
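The gap between 3.15 million connections and only 71,900 trainable parameters comes from weight sharing: every unit in a convolutional feature map reuses the same kernel. The sketch below stacks the two YUV images into a single 6-channel input and counts connections and parameters for one layer. It assumes 149 pixels wide by 58 high and a channel-first layout, and the 6-map, 3x3 layer is a hypothetical example chosen for illustration; it is not the DAVE architecture and does not reproduce the figures above.

```python
from dataclasses import dataclass

import numpy as np

# Assumed layout: width 149, height 58, channel-first (C, H, W).
WIDTH, HEIGHT = 149, 58


def stack_stereo_yuv(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Stack two (3, H, W) YUV images into one (6, H, W) network input."""
    if left.shape != (3, HEIGHT, WIDTH) or right.shape != (3, HEIGHT, WIDTH):
        raise ValueError("expected two (3, 58, 149) YUV images")
    return np.concatenate([left, right], axis=0)


@dataclass
class ConvLayer:
    """Hypothetical stride-1, unpadded convolution layer."""
    out_maps: int
    kh: int
    kw: int


def conv_stats(in_maps: int, h: int, w: int, layer: ConvLayer):
    """Return (out_h, out_w, connections, trainable parameters) for one layer."""
    out_h, out_w = h - layer.kh + 1, w - layer.kw + 1
    fan_in = layer.kh * layer.kw * in_maps           # inputs seen by one output unit
    connections = out_h * out_w * layer.out_maps * fan_in
    params = layer.out_maps * (fan_in + 1)           # shared kernel + one bias per map
    return out_h, out_w, connections, params


x = stack_stereo_yuv(np.zeros((3, HEIGHT, WIDTH)), np.zeros((3, HEIGHT, WIDTH)))
print(x.shape)  # (6, 58, 149)
print(conv_stats(6, HEIGHT, WIDTH, ConvLayer(out_maps=6, kh=3, kw=3)))
# (56, 147, 2667168, 330)
```

Even this single hypothetical layer has over 2.6 million connections but only 330 parameters, which is why a convolutional network can be trained on a relatively modest dataset.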

The entire software system was implemented in the LISP-like Lush language. Once trained, the driving system runs at roughly 10 frames per second on a 1.4GHz AMD Athlon processor.
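A rough back-of-the-envelope reading of the training and runtime figures given on this page, assuming "4 days" means 4 x 24 hours of continuous CPU time:

```latex
\begin{aligned}
\text{training presentations} &= 11 \times 95{,}000 = 1{,}045{,}000 \\
\text{training time} &\approx 4 \times 86{,}400\ \text{s} = 345{,}600\ \text{s} \\
\text{training throughput} &\approx \frac{1{,}045{,}000}{345{,}600\ \text{s}} \approx 3\ \text{frames/s} \\
\text{inference budget at } 10\ \text{frames/s} &= \frac{1}{10}\ \text{s} = 100\ \text{ms per frame}
\end{aligned}
```

The two rates are not directly comparable: training includes the backward pass and ran on a different machine than the one used for driving.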

The DAVE robot


Note: these videos are all compatible with mplayer, Kaffeine, and other media players on Linux.

The convolutional net plays backseat driver to a human operator. This video shows the internal state of the convolutional network.
[WMV 4.5MB]
A clip of the robot driving itself through a cluttered backyard (viewed from the robot's cameras)
[MPEG 11.0MB]
Same run as above, viewed from the outside.
[MPEG 11.4MB]
The robot drives itself and avoids moving obstacles (legs!)
[WMV 2.8MB]
The robot drives itself through another cluttered backyard. It avoids a car, a backhoe, and finds a narrow space between a trailer and another obstacle.
[MPEG 9.4MB]
Avoiding the legs of a picnic table.
[MPEG 3.4MB]
Dealing with highly noisy images
[MPEG 5.0MB]
Avoiding a shrub, and going right toward the bright sun.
[MPEG 4.3MB]
A 180 is in order to avoid that tree, that fence, that bike, and that other tree.
[MPEG 5.1MB]
DAVE is startled by the sun at first, but it avoids the obstacle once the sun disappears behind it.
[MPEG 5.2MB]
DAVE turns right in time to avoid that white pole in the middle of the backyard.
[MPEG 9.2MB]
Avoiding fences and trees.
[MPEG 8.4MB]
Another one of those "busy backyard" sequences.
[MPEG 2.8MB]
Yet another one of those "busy backyard" sequences.
[MPEG 8.1MB]
Yet another clip from the same backyard, just to show that DAVE didn't succeed by chance.
[MPEG 6.7MB]

The picture below shows the left and right images, the steering angle predicted by the system, and the states of the various layers of the convolutional network.

The data collection setup.

The pictures below show sample input images, together with the steering angles produced by the human driver and by the DAVE system.