Bernoulli Institute, Autonomous Perceptive Systems Research

PhD project: Comparative study between deep learning and bag of visual words for animal recognition


Name: Emmanuel Okafor

Supervisors:
Prof. dr. L.R.B. (Lambert) Schomaker
Dr. M.A. (Marco) Wiering

Summary of PhD project:

The main objective of this project is to use both deep convolutional neural networks (CNNs) and classical feature descriptors for recognizing animals in images.

The steps involved in realizing this objective are explained below.

1. Development of Modified Versions of Deep Convolutional Neural Networks

A CNN creates feature maps by convolving its vector-valued input with a bank of learned filters at each convolutional layer. A rectified linear unit (ReLU) non-linearity then computes the activations of the convolved feature maps, and the result is normalized using local response normalization (LRN). The normalized output is reduced with a spatial pooling strategy (maximum or average pooling). Within the fully connected layers before the classification layer, a dropout regularization scheme randomly sets a fraction of the unit activations to zero during training to reduce overfitting. Finally, a softmax activation function in the last fully connected layer produces the class probabilities used to assign image labels. In these experiments, modified versions of the AlexNet and GoogLeNet architectures are used, which reduce the number of neurons in some of the architectural layers. A schematic block diagram illustrating the modified AlexNet architecture is shown in Fig 1.
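The convolution–ReLU–LRN–pooling–dropout–softmax pipeline described above can be sketched in plain NumPy. This is a minimal illustrative sketch only: the image size, the 3x3 filter bank, the five output classes, and the dropout rate are assumptions for the example, not the configuration used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, filters):
    # convolve a single-channel input with a bank of filters ("valid" mode)
    kh, kw = filters.shape[1:]
    H, W = x.shape
    out = np.zeros((filters.shape[0], H - kh + 1, W - kw + 1))
    for f in range(filters.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(x[i:i + kh, j:j + kw] * filters[f])
    return out

def relu(x):
    # rectified linear unit non-linearity
    return np.maximum(x, 0.0)

def lrn(x, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # local response normalization across neighbouring feature maps
    F = x.shape[0]
    sq = x ** 2
    out = np.empty_like(x)
    for f in range(F):
        lo, hi = max(0, f - n // 2), min(F, f + n // 2 + 1)
        out[f] = x[f] / (k + alpha * sq[lo:hi].sum(axis=0)) ** beta
    return out

def max_pool(x, s=2):
    # non-overlapping spatial maximum pooling
    F, H, W = x.shape
    H2, W2 = H // s, W // s
    return x[:, :H2 * s, :W2 * s].reshape(F, H2, s, W2, s).max(axis=(2, 4))

def softmax(z):
    # class probabilities in the final fully connected layer
    e = np.exp(z - z.max())
    return e / e.sum()

# one convolution-ReLU-LRN-pooling stage followed by a classifier layer
image = rng.standard_normal((16, 16))
filters = rng.standard_normal((4, 3, 3))   # 4 hypothetical 3x3 filters
maps = max_pool(lrn(relu(conv2d(image, filters))))
features = maps.ravel()

# dropout: randomly zero activations during training (here with p = 0.5)
mask = rng.random(features.shape) > 0.5
dropped = features * mask

W_fc = rng.standard_normal((5, features.size))  # 5 hypothetical classes
probs = softmax(W_fc @ dropped)
```

A real network stacks several such stages and learns the filters and weights by backpropagation; the sketch only shows how one forward pass flows through the named operations.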

Fig 1: Block Diagram Illustrating the Classification of a Lion using AlexNet Architecture

2. Development of Variants of Bag of Visual Words

Variants of the bag of visual words (BOW) approach, such as plain BOW and BOW built on Histogram of Oriented Gradients features (HOG-BOW), are studied in these experiments. The BOW setup involves extracting patches of features from an image and constructing a codebook with an unsupervised learning algorithm such as K-means clustering. The final feature vector for an image is then computed by assigning its patches to the codebook, using either a soft assignment scheme or sparse ensemble learning methods. The feature vectors from the training images and their corresponding labels are fed into an L2-regularized support vector machine (L2-SVM) to train a model, which is then evaluated on the test images of a given animal dataset. A pictorial illustration of the BOW setup is shown in Fig 2.
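The BOW pipeline above can likewise be sketched in NumPy: sample patches, build a codebook with K-means, and turn each image into a soft-assignment histogram over the visual words. The patch size, codebook size, and the Gaussian-like soft-assignment weighting are illustrative assumptions; the thesis setup and the final L2-SVM training step are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_patches(img, size=6, n=200):
    # sample n random square patches and flatten them into descriptors
    H, W = img.shape
    ys = rng.integers(0, H - size, n)
    xs = rng.integers(0, W - size, n)
    return np.stack([img[y:y + size, x:x + size].ravel()
                     for y, x in zip(ys, xs)])

def kmeans(X, k=16, iters=10):
    # build the visual codebook with plain K-means clustering
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

def bow_feature(patches, centers, beta=1e-2):
    # soft assignment: each patch votes for every visual word with a
    # weight that decays with its distance to the codebook centre
    d = ((patches[:, None, :] - centers[None]) ** 2).sum(-1)
    w = np.exp(-beta * d)
    w /= w.sum(axis=1, keepdims=True)
    h = w.sum(axis=0)
    return h / np.linalg.norm(h)

# codebook from training patches, then one feature vector per image;
# these vectors and their labels would be fed to an L2-regularized SVM
train_imgs = [rng.standard_normal((32, 32)) for _ in range(3)]
codebook = kmeans(np.vstack([extract_patches(im) for im in train_imgs]))
feature = bow_feature(extract_patches(train_imgs[0]), codebook)
```

For HOG-BOW, the raw pixel patches would be replaced by HOG descriptors before clustering; the rest of the pipeline is unchanged.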

Fig 2: Block Diagram Illustrating the Bag of Visual Words (BOW) with an SVM Classifier on Wild-Anim Dataset
Last modified: 26 January 2024, 3:42 p.m.