If you’d like to develop your Convolutional Neural Networks using just the Compute Library and a Raspberry Pi, this step-by-step guide will show you how… and it comes complete with all the tools you’ll need to get up and running.
If you follow all the steps outlined here, by the end of the post you’ll be up and running with one of the first Deep Convolutional Neural Networks (CNN) designed to recognize 1000 different objects: AlexNet!
If you haven’t read my previous blog on how to apply a cartoon effect with the Compute Library, I’d suggest starting with that. It’s a simple example, but it will give you all the information you need to compile or cross-compile the library for Raspberry Pi.
In addition to some basic knowledge of the Compute Library, this tutorial assumes some knowledge of a CNN; you don’t need to be an expert, just have an idea of the main functions.
Everything else can be found in the following .7z file, which contains:
Please download the required files to your host machine (Debian based) or to your Raspberry Pi:
compute_library_alexnet files: https://developer.arm.com/-/media/43359E999DEF433BAF63523C529D21AD.ashx?revision=c1a232fa-f328-451f-9bd6-250b83511e01
Within the folder "alexnet_tutorial" you should have everything for this tutorial.
The requirements for your Raspberry Pi and host machine are:
In release 17.09 of the Compute Library, we introduced an important feature to make life easier for developers, and anyone else benchmarking the library: the graph API.
The graph API’s primary function is to reduce the boilerplate code, but it can also reduce errors in your code and improve its readability. It’s simple and easy-to-use, with a stream interface that’s designed to be similar to other C++ objects.
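To give a feel for what a stream interface means here, the following is a toy sketch only. `ToyGraph` and `ToyLayer` are hypothetical names invented for this illustration and are not part of the Compute Library; the sketch just shows how overloading `operator<<` lets layers be appended one after another, much like writing to a `std::ostream`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a network layer (NOT a Compute Library type).
struct ToyLayer
{
    std::string name;
};

// Hypothetical container that collects layers in the order they are streamed in.
class ToyGraph
{
public:
    // Returning *this is what allows calls to be chained: g << a << b << c;
    ToyGraph &operator<<(ToyLayer layer)
    {
        _layers.push_back(std::move(layer));
        return *this;
    }

    const std::vector<ToyLayer> &layers() const
    {
        return _layers;
    }

private:
    std::vector<ToyLayer> _layers;
};
```

With this toy class, `ToyGraph g; g << ToyLayer{"conv1"} << ToyLayer{"relu1"} << ToyLayer{"pool1"};` stores the three layers in order. The real graph API is richer than this, but the chaining idea is the same.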
At the current stage, the graph API only supports the ML functions (e.g. convolution, fully connected, activation, pooling) and can only be used if the library has been compiled with both NEON and OpenCL enabled (neon=1 and opencl=1).
Note: if your platform doesn't have OpenCL (e.g. Raspberry Pi), don't worry: the graph API will automatically fall back to NEON. However, you still need to compile the Compute Library with both NEON and OpenCL enabled.
In terms of building blocks, the graph API represents the third computation block, together with core and runtime. In terms of hierarchy, the graph API lies just above the runtime, which in turn lies above the core block.
In 2012, AlexNet shot to fame when it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual challenge that aims to evaluate algorithms for object detection and image classification.
The ILSVRC evaluates the success of image classification solutions using two important metrics: the top-5 and top-1 errors. Given a set of N images (usually called “test images”), each mapped to a target class:
For both, the top error is calculated as, "the number of times the predicted class does not match the target class, divided by the total number of test images". In other words, a lower score is better.
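The calculation can be sketched in a few lines of C++. This is a minimal illustration, assuming the usual reading of the metric: a prediction counts as a match for top-k when the target class is among the k highest-scoring classes. The function names `in_top_k` and `top_k_error` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if `target` is among the k highest-scoring classes.
bool in_top_k(const std::vector<float> &scores, std::size_t target, std::size_t k)
{
    assert(target < scores.size());
    // Count the classes that score strictly higher than the target class.
    const std::size_t better = static_cast<std::size_t>(
        std::count_if(scores.begin(), scores.end(),
                      [&](float s) { return s > scores[target]; }));
    return better < k;
}

// Top-k error: number of test images whose target class is NOT in the
// top k predictions, divided by the total number of test images.
double top_k_error(const std::vector<std::vector<float>> &all_scores,
                   const std::vector<std::size_t> &targets, std::size_t k)
{
    assert(all_scores.size() == targets.size() && !targets.empty());
    std::size_t misses = 0;
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        if (!in_top_k(all_scores[i], targets[i], k))
        {
            ++misses;
        }
    }
    return static_cast<double>(misses) / static_cast<double>(targets.size());
}
```

Calling `top_k_error(scores, targets, 1)` gives the top-1 error and `top_k_error(scores, targets, 5)` the top-5 error, where `scores` holds one vector of class scores per test image.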
The authors of AlexNet – Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever of the SuperVision group – achieved a top-5 error around 16%, which was a staggeringly good result back in 2012. To put it into context, until that year no one had been able to go under 20%. AlexNet was also more than 10% better than the runner up.
After 2012, more accurate and deeper CNNs began to proliferate, as the graph below shows.
AlexNet is made up of eight trainable layers: five convolution layers and three fully connected layers. All the trainable layers are followed by a ReLU activation function, except for the last fully connected layer, where the Softmax function is used.
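For readers who have not met these two functions before, here is a minimal C++ sketch of both. It is a plain reference version written for this post, not the Compute Library implementation; the softmax subtracts the largest input before exponentiating, a common trick to avoid overflow.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// ReLU: negative values are clamped to zero, positive values pass through.
float relu(float x)
{
    return std::max(0.0f, x);
}

// Softmax: turns raw scores into positive values that sum to 1.
std::vector<float> softmax(const std::vector<float> &logits)
{
    std::vector<float> out(logits.size());
    if (logits.empty())
    {
        return out;
    }
    // Subtracting the maximum does not change the result but keeps exp() in range.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i)
    {
        out[i] = std::exp(logits[i] - max_logit);
        sum += out[i];
    }
    for (float &v : out)
    {
        v /= sum;
    }
    return out;
}
```

In a classifier like AlexNet, the softmax output can be read as one confidence value per class, which is what the top-1 and top-5 rankings are taken from.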
Besides the trainable layers, the network also has:
If you look at the table above, you’ll notice that some convolution layers are actually ‘grouping convolutions’ – an efficient engineering trick that allows the acceleration of the network over two GPUs, without sacrificing accuracy.
If the group size is set to two, the first half of the filters will be connected just to the first half of the input feature maps; the second half will connect to the second half.
The grouping convolution not only allows you to spread the workload over multiple GPUs, it also halves the number of MACs needed for the layer when the group size is two.
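The saving is easy to check with a little arithmetic. The sketch below counts the multiply-accumulates of a convolution layer, ignoring bias; `ConvShape`, `conv_macs` and `first_input_channel` are hypothetical helpers written for this illustration, and filters are assumed to be split evenly across groups.

```cpp
#include <cassert>
#include <cstdint>

// Output size, kernel size and channel counts of one convolution layer.
struct ConvShape
{
    std::uint64_t out_w;
    std::uint64_t out_h;
    std::uint64_t kernel_w;
    std::uint64_t kernel_h;
    std::uint64_t in_channels;
    std::uint64_t out_channels;
};

// Each output element needs kernel_w * kernel_h * (in_channels / groups) MACs,
// because a filter only sees the input feature maps of its own group.
std::uint64_t conv_macs(const ConvShape &s, std::uint64_t groups)
{
    assert(groups > 0 && s.in_channels % groups == 0 && s.out_channels % groups == 0);
    const std::uint64_t outputs          = s.out_w * s.out_h * s.out_channels;
    const std::uint64_t macs_per_output  = s.kernel_w * s.kernel_h * (s.in_channels / groups);
    return outputs * macs_per_output;
}

// Index of the first input feature map connected to a given filter.
std::uint64_t first_input_channel(const ConvShape &s, std::uint64_t groups, std::uint64_t filter)
{
    assert(groups > 0 && filter < s.out_channels);
    const std::uint64_t filters_per_group  = s.out_channels / groups;
    const std::uint64_t channels_per_group = s.in_channels / groups;
    return (filter / filters_per_group) * channels_per_group;
}
```

As an example, take a layer with a 27x27 output, 5x5 kernels, 96 input feature maps and 256 filters (the shape commonly quoted for AlexNet's second convolution; treat these numbers as an assumption, since they are not listed in this post). `conv_macs` gives 447,897,600 MACs with one group and 223,948,800 with two, and `first_input_channel` shows filters 0 to 127 starting at input map 0 while filters 128 to 255 start at input map 48.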
A C++ implementation of AlexNet using the graph API is proposed in examples/graph_alexnet.cpp.
To run the AlexNet example we need four command line arguments:
./graph_alexnet <target> <cnn_data> <input_image> <labels>
Where:
In the following sections, I describe the key aspects of this example.
In order to use the graph API we need to include three header files:
// Contains the definitions for the graph
#include "arm_compute/graph/Graph.h"
// Contains the definitions for the nodes (convolution, pooling, fully connected)
#include "arm_compute/graph/Nodes.h"
// Contains the utility functions for the graph, such as the accessors for the input, trainable and output nodes.
// The accessors are presented later, when we talk about the graph.
#include "utils/GraphUtils.h"
A pre-processing stage is needed for preparing the input RGB image before feeding the network, so we’re going to subtract the channel means from each individual colour channel. This operation will centre the red, green and blue channels around the origin.
For simplicity, we’ve already hard-coded the mean values to use in the example:
constexpr float mean_r = 122.68f; /* Mean value to subtract from red channel */
constexpr float mean_g = 116.67f; /* Mean value to subtract from green channel */
constexpr float mean_b = 104.01f; /* Mean value to subtract from blue channel */
If you’ve not heard of mean subtraction pre-processing before, have a look at the Compute Image Mean section on the Caffe website.
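To make the operation concrete, here is a minimal standalone sketch of mean subtraction. It assumes an interleaved 8-bit RGB buffer (R, G, B, R, G, B, ...), which may differ from the layout the example uses internally; `subtract_channel_means` is a hypothetical helper, and only the three mean values are taken from the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean values copied from the AlexNet example.
constexpr float mean_r = 122.68f;
constexpr float mean_g = 116.67f;
constexpr float mean_b = 104.01f;

// Converts interleaved 8-bit RGB pixels to floats and subtracts the
// per-channel mean, so each channel ends up centred around zero.
std::vector<float> subtract_channel_means(const std::vector<std::uint8_t> &rgb)
{
    const float means[3] = { mean_r, mean_g, mean_b };
    std::vector<float> out(rgb.size());
    for (std::size_t i = 0; i < rgb.size(); ++i)
    {
        // i % 3 selects the channel: 0 = red, 1 = green, 2 = blue.
        out[i] = static_cast<float>(rgb[i]) - means[i % 3];
    }
    return out;
}
```

A mid-grey pixel such as (123, 117, 104) comes out very close to (0, 0, 0), while brighter or darker pixels land on either side of the origin.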
The body of the network is described through the graph API.
The graph consists of three main parts:
As you will notice from the example, the Tensor objects (input and output) and all the trainable layers accept an input function called "accessor".
If you are curious to know how the accessor works, take a look at utils/GraphUtils.h, where you can find a few ready-to-use accessors for your Tensor objects and trainable layers.
Now it is time to turn on your Raspberry Pi and test AlexNet with the same images.
Note: the following steps assume you are in the home directory of your Raspberry Pi or host machine.
On your Raspberry Pi, enter the following commands:
# Install unzip
sudo apt-get install unzip
# Download the zip file with the AlexNet model, input images and labels
wget <url to archive>
# Create a new folder
mkdir assets_alexnet
# Unzip
unzip compute_library_alexnet.zip -d assets_alexnet
If you are compiling natively on your Raspberry Pi, use the following instructions. If you’re cross-compiling, see the appropriate section below.
On your Raspberry Pi:
# Clone Compute Library
git clone https://github.com/Arm-software/ComputeLibrary.git
# Enter ComputeLibrary folder
cd ComputeLibrary
# Native build the library and the examples
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native -j2
Once the library has been compiled, we are ready to classify our go-kart!
export LD_LIBRARY_PATH=build/
PATH_ASSETS=../assets_alexnet
./build/examples/graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
If you’re cross-compiling, on your host machine:
# Clone Compute Library
git clone https://github.com/Arm-software/ComputeLibrary.git
# Enter ComputeLibrary folder
cd ComputeLibrary
# Build the library and the examples
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 os=linux arch=armv7a -j4
# Copy the example and dynamic libraries to the Raspberry Pi
scp build/examples/graph_alexnet build/libarm_compute.so build/libarm_compute_core.so build/libarm_compute_graph.so <username_raspberrypi>@<ip_addr_raspberrypi>:Desktop
where:
Open the SSH session from your host machine:
ssh <username_raspberrypi>@<ip_addr_raspberrypi>
Within the SSH session:
cd Desktop
# The binary and the libraries were copied directly into Desktop
export LD_LIBRARY_PATH=.
PATH_ASSETS=../assets_alexnet
./graph_alexnet 0 $PATH_ASSETS $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
Whether or not you’re building the library natively, the output should look like this:
Congratulations – you got there! I hope you had fun and, more importantly, I hope this will help you to develop even more exciting and performant intelligent vision solutions on Arm.
Ciao for now!
Gian Marco
To find this tutorial, and many other resources, visit the Machine Learning Developer Community.
I don't know whether you solved the problem, but I had the same one. When I add "--" in front of each parameter, I can see the prediction result. For example: data=$PATH_ASSETS becomes --data=$PATH_ASSETS
help help help!
When I test, my test passes, but I don't get the top-5 predictions. What's the matter?
./build/examples/graph_alexnet data=$PATH_ASSETS image= $PATH_ASSETS/go_kart.ppm labels $PATH_ASSETS/labels.txt
./build/examples/graph_alexnet
Threads : 1
Target : NEON
Data type : F32
Data layout : NHWC
Tuner enabled? : false
Tuner file :
Fast math enabled? : false
Test passed
Using today's version of ACL, the command line looks a bit different:
./build/examples/graph_alexnet --data=$PATH_ASSETS --image=$PATH_ASSETS/go_kart.ppm --labels=$PATH_ASSETS/labels.txt
Like ArthurLiu above, I had the same problem when compiling 18.05 with both NEON and OpenCL (my platform doesn't have OpenCL): I get the same error when executing graph_alexnet. If I compile with NEON only, the graphs are generated correctly and I can execute graph_alexnet. I'm starting to believe that the note at the start of the post, which says the library must be compiled with both NEON and OpenCL (even if you don't have OpenCL) for the graph API to work, may no longer be true with 18.05.
What is the speed of image classification with AlexNet on a Raspberry Pi with the Compute Library?