Exploring the Usefulness of Adding Auxiliary Preprocessed Image Layers With Convolutional Neural Networks

Jordan Goetze
Dr. Anne Denton

Dept. of Computer Science
North Dakota State University
Fargo, North Dakota 58103

jordan.goetze@ndsu.edu
anne.denton@ndsu.edu

Terminology

Per-pixel image classifications

Useful for:

Scene labeling
Inferring relationships in an image.

Terminology

Land-Use Classification

Potential uses:

Approximating crop yields by year
Tracking changes in land use
Tracking changes in forestry and vegetation

Terminology

Orthoimagery: Aerial or satellite imagery that has been corrected for displacements such as building tilt and scale variations caused by terrain relief.

Terminology

Convolutional Neural Network (CNN)

Useful for:

Scene labeling
Image classification

Terminology

Auxiliary Preprocessed Image Layer

Example: Normalized Difference Vegetation Index (NDVI)

$ NDVI = \frac{NIR-RED}{NIR+RED} $
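A minimal NumPy sketch of the NDVI formula above. The function name `ndvi`, the `eps` parameter, and the choice to report 0 where both bands are zero are assumptions made for illustration; the slides do not say how that case was handled.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Return NDVI = (NIR - RED) / (NIR + RED) for two equally shaped bands."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    undefined = np.abs(denom) < eps
    # Assumption: where both bands are (near) zero the ratio is undefined,
    # so report 0 there instead of dividing by zero.
    safe = np.where(undefined, 1.0, denom)
    return np.where(undefined, 0.0, (nir - red) / safe)
```

For non-negative band values the result lies between -1 and 1, so it has to be rescaled before it can be stored alongside 8-bit image layers.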

Introduction

Introduction: What do CNNs learn?

Are there certain kinds of patterns they cannot learn, or learn too slowly to be effective?

Does the inclusion of image layers that are generated from the original source image help or hurt the model?

Introduction: Technology, Imagery and Data

The cost of satellites and drones is decreasing.
The cost of orthoimagery is decreasing.
The amount of available orthoimagery is increasing.
The amount and quality of labeled or annotated orthoimagery has not kept pace.

Introduction: Available Data

National Agricultural Imagery Program (NAIP):

Provides imagery spanning the majority of the continental United States
1 meter Ground Sample Distance (GSD)
Red, Green, Blue, and NIR image layers

Introduction: Available Data

National Agricultural Statistics Service (NASS) Land-Use Classifications:

Continental United States
Low spatial resolution compared to NAIP imagery
- 1 NASS pixel represents a 50 square meter area in the NAIP imagery
Poor-quality classifications

Introduction: NASS - Mislabeled Pixels

Introduction: NASS - Clipped Organic Features

Introduction: NASS - Poor Representation of Fine Features

Introduction: Goal

Test whether including auxiliary image layers, generated from the base RGB, NIR, and NDVI layers, helps or hurts the model's classification accuracy and the visual quality of its classifications.

Previous Work

Previous Work: Selecting a CNN Model

Most previous work on orthoimagery focuses on roads or buildings.
Little research into identifying agricultural features
Per-pixel classifications of orthoimagery fall under the realm of scene recognition.
- Orthoimagery features tend to lack well-defined boundaries
The SegNet model provides a relatively efficient and effective approach to scene recognition.

Previous Work: SegNet

Deep Convolutional Encoder-Decoder Network
Produces good results when applied to the CamVid dataset.

Previous Work: SegNet Demo

Data Set Preprocessing

Data Set Preprocessing: Data Sets

Images: NAIP Imagery
Ground Truths: NASS Land-use classifications

Images clipped into 256x256 swatches
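One way the clipping step could look, as a sketch only. The function name `clip_swatches`, the non-overlapping tiling, and dropping partial tiles at the right and bottom edges are assumptions; the slides only state the 256x256 swatch size.

```python
import numpy as np

def clip_swatches(image, size=256):
    """Split an (H, W, C) array into non-overlapping size x size swatches."""
    height, width = image.shape[:2]
    swatches = []
    # Assumption: tiles that would run past the image edge are skipped.
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            swatches.append(image[top:top + size, left:left + size])
    return swatches
```

The same function would be applied to the ground-truth raster so that every image swatch has a matching label swatch.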

Data Set Preprocessing: Data Sets

Ground Truth data simplified into two classes: Water, and Not-Water

Not-Water	Water
93%	7%
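The two-class simplification can be sketched as a lookup against a set of class codes. The names `to_water_mask` and `class_fractions` are hypothetical, and `water_codes` is a placeholder parameter: the slides do not list which NASS class codes were treated as water.

```python
import numpy as np

def to_water_mask(labels, water_codes):
    """Map a raster of land-use class codes to 1 (Water) or 0 (Not-Water)."""
    labels = np.asarray(labels)
    # water_codes is supplied by the caller; its contents are an assumption.
    return np.isin(labels, list(water_codes)).astype(np.uint8)

def class_fractions(mask):
    """Return the share of pixels in each of the two classes."""
    water = float(np.mean(mask))
    return {"Not-Water": 1.0 - water, "Water": water}
```

Running `class_fractions` over the full set of label swatches is how a balance such as the 93% / 7% split above would be measured.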

Data Set Preprocessing: Gradient Image Layer

Using a sliding window, a gradient value is computed as the largest difference in pixel intensity within the window, scaled to a value between 0 and 255.

The sliding window operates over one of the original image layers to produce an intensity value for each pixel in the image, excluding a small border.

Data Set Preprocessing: Gradient Image Layer

Input Image Layer: NDVI
Window Size: 8x8 pixels
Output Image Size: 248x248 pixels
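A rough sketch of the gradient layer under stated assumptions. The function name `gradient_layer` is hypothetical. Scaling by the largest spread in the image is an assumption, since the slides only say the value is scaled to 0-255. An 8x8 window over 256 pixels yields 249 positions per axis, so the last row and column are dropped here to match the stated 248x248 output; how the original implementation placed its border is not specified.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gradient_layer(layer, window=8):
    """Largest intensity difference per window, scaled to 0..255."""
    layer = np.asarray(layer, dtype=np.float64)
    windows = sliding_window_view(layer, (window, window))
    # Spread = max - min of the pixel intensities inside each window.
    spread = windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1))
    # Assumption: trim one row and column so that 256 -> 248.
    spread = spread[:-1, :-1]
    peak = spread.max()
    if peak == 0:
        return np.zeros(spread.shape, dtype=np.uint8)
    # Assumption: scale relative to the largest spread found in this image.
    return np.round(spread / peak * 255).astype(np.uint8)
```

Flat regions such as open water should produce values near 0, while edges between surface types produce values near 255.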

Data Set Preprocessing: Regression Image Layer

Using a sliding window, compute the slope of the line obtained by linear regression between two bands.

The sliding window operates over two of the original image layers to produce an intensity value for each pixel in the image, excluding a small border.

Data Set Preprocessing: Regression Image Layer

The implementation is based on the paper Multi-scalar Analysis of Geospatial Agricultural Data for Sustainability, which introduces a means of allowing larger sliding windows without the computational cost of scanning for them.

Data Set Preprocessing: Regression Image Layer

Input Image Layer 1: Red
Input Image Layer 2: NIR
Window Size: 8x8 pixels
Output Image Size: 248x248 pixels
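A naive sketch of the per-window regression slope, not the multi-scalar method from the cited paper. The function name `regression_layer` is hypothetical. Treating the first band (Red) as the independent variable, returning 0 where the window has no variance, and trimming to 248x248 are all assumptions. The sketch returns the raw slope; the slides do not say how slopes were mapped to image intensities.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def regression_layer(x_band, y_band, window=8):
    """Least-squares slope of y_band against x_band inside each window."""
    x = sliding_window_view(np.asarray(x_band, dtype=np.float64), (window, window))
    y = sliding_window_view(np.asarray(y_band, dtype=np.float64), (window, window))
    mean_x = x.mean(axis=(-2, -1))
    mean_y = y.mean(axis=(-2, -1))
    # slope = cov(x, y) / var(x), both computed per window.
    cov = (x * y).mean(axis=(-2, -1)) - mean_x * mean_y
    var = (x * x).mean(axis=(-2, -1)) - mean_x * mean_x
    slope = np.zeros_like(cov)
    # Assumption: windows with (near) zero variance in x get slope 0.
    np.divide(cov, var, out=slope, where=var > 1e-12)
    # Assumption: trim one row and column so that 256 -> 248.
    return slope[:-1, :-1]
```

This version materializes every window, which is affordable for 8x8 windows on 256x256 swatches but scales poorly with window size.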

Data Set Preprocessing: Data Set Size

Dataset of 2,000 images.

Generating the auxiliary images takes ~1.5 hours per type.

Training a model takes 4-6 hours for 3 epochs.

Model

Model: SegNet

Kernel Size 7x7

Model: SegNet

Max pool + Index Unraveling

Example with 4 down-sample & up-sample layers
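The pairing of max pooling with stored indices can be illustrated on a single 2D feature map. This is a NumPy sketch of the general idea, not the model's actual layers; the function names are hypothetical, and the input height and width are assumed to be divisible by the pool size `k`.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records where each maximum came from."""
    h, w = x.shape
    # Rearrange into one flat row of k*k values per pooling block.
    blocks = (x.reshape(h // k, k, w // k, k)
               .transpose(0, 2, 1, 3)
               .reshape(h // k, w // k, k * k))
    idx = blocks.argmax(axis=-1)
    pooled = np.take_along_axis(blocks, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool(pooled, idx, k=2):
    """Place each pooled value back at its recorded position, zeros elsewhere."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    np.put_along_axis(blocks, idx[..., None], pooled[..., None], axis=-1)
    return (blocks.reshape(ph, pw, k, k)
                  .transpose(0, 2, 1, 3)
                  .reshape(ph * k, pw * k))
```

In an encoder-decoder network, each down-sample layer keeps its `idx` so that the matching up-sample layer can restore values to their original positions before further convolutions fill in the zeros.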

Model: Custom SegNet Variant

Kernel Size 3x3

3 down-sample & up-sample layers

Model Variants

Control (No Aux Image Layer): Our control model. The base Custom SegNet Variant previously described. Takes RGB, NIR, and NDVI image layers as input.
Gradient Model: A modified version of the control that takes an additional layer generated via the Gradient process previously described.
Regression Model: A modified version of the control that takes an additional layer generated via the Regression process previously described.

Training & Evaluation

Training

Trained on 90% of available image swatches
- 1,800 image swatches
Batches of 15
3 Epochs
Checkpoints are saved every 100 steps

Evaluation

Evaluation is done on the remaining 10% of available image swatches
- 200 image swatches
The checkpoint with the highest evaluation accuracy is selected

Special Note

Training and Testing sets are generated once and then remain the same for all models.
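A fixed-seed split is one simple way to keep the training and testing sets identical across models. The sketch below is illustrative: the function name `split_swatches` and the seed are hypothetical. The step arithmetic assumes one step is one batch of 15 swatches, which the slides do not state explicitly.

```python
import random

def split_swatches(n_swatches=2000, train_fraction=0.9, seed=0):
    """Shuffle swatch ids once with a fixed seed and split them."""
    ids = list(range(n_swatches))
    random.Random(seed).shuffle(ids)
    cut = round(n_swatches * train_fraction)
    return ids[:cut], ids[cut:]

train_ids, test_ids = split_swatches()

# Assumption: one training step processes one batch of 15 swatches.
steps_per_epoch = len(train_ids) // 15          # 1,800 / 15
total_steps = steps_per_epoch * 3               # 3 epochs
checkpoint_steps = list(range(100, total_steps + 1, 100))
```

Under these assumptions there are 120 steps per epoch and 360 steps in total, so checkpoints would fall at steps 100, 200, and 300.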

Analysis

Analysis: Evaluation Accuracy

Model Type	Accuracy
Control (No Aux Image Layer)	92.3680%
Regression	86.0488%
Gradient	93.3380%

93% of the data set is Not-Water

Analysis: Per-Class Evaluation Accuracy

Model Type	Not-Water Accuracy	Water Accuracy
Control (No Aux Image Layer)	96.8892%	37.6272%
Regression	89.7641%	40.8237%
Gradient	98.5329%	30.2602%

93% of the data set is Not-Water
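The note about class balance matters because overall accuracy can look strong while the minority class is mostly missed. A small sketch of how overall and per-class accuracy could be computed from label masks follows; the function name `accuracies` and the 0/1 encoding (0 = Not-Water, 1 = Water) are assumptions.

```python
import numpy as np

def accuracies(truth, pred):
    """Return overall accuracy and a dict of per-class accuracies."""
    truth = np.asarray(truth).ravel()
    pred = np.asarray(pred).ravel()
    overall = float(np.mean(truth == pred))
    per_class = {}
    # Assumed encoding: 0 = Not-Water, 1 = Water.
    for name, value in (("Not-Water", 0), ("Water", 1)):
        members = truth == value
        # Per-class accuracy: share of that class's pixels labeled correctly.
        per_class[name] = (float(np.mean(pred[members] == value))
                           if members.any() else float("nan"))
    return overall, per_class
```

For example, with 93 Not-Water pixels and 7 Water pixels, a prediction of all Not-Water gives an overall accuracy of 0.93 but a Water accuracy of 0, which is why the per-class figures are reported alongside the overall ones.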

Analysis: Gradient Model

Gains per-class Not-Water accuracy over the control at the cost of per-class Water accuracy.

Analysis: Regression Model

Gains per-class Water accuracy over the control at the cost of per-class Not-Water accuracy.

Analysis: Regression Model

The regression model's water classifications tend to respond strongly where there is dense vegetation along the shoreline of a water body.

Analysis: Regression Model

Control	Underestimates water areas
Gradient	Further underestimates water areas
Regression	Vastly overestimates water areas