View on GitHub

Vision_with_Lost_Glasses

(Computer) Vision with Lost Glasses

(Computer) Vision with Lost Glasses

Modelling how the brain deals with noisy input

Open In Colab

Objective

Imagine you lost your spectacle and the world around you is completely blurred out. As you stumble around, you see a small animal walk towards you. Can you figure out what it is? Probably yes right?

In this situation, or in foggy/night-time conditions, visual input is of poor quality; images are blurred and have low contrast and yet our brains manage to recognize it. Is it possible to model the process? Does previous experience help?

Introduction

Computer vision is one of the main areas where Artificial Neural Networks are frequently deployed to classify images. How these networks deal with various aberrations in the images is an ongoing field of research.

In this project, we use the ASIRRA dataset that contains images of dogs and cats to test the binary classification efficiency of a downscaled AlexNet, one of the most famous Artificial Neural Network used in computer vision using cross-entropy as loss function, ADAM optimizer, and early stopping to prevent overfitting and minimize training duration.

This network is trained on different variations of the dataset which are pathophysiologically analogous to various visual distortions of the human visual system like full or partial color blindness, blurriness. We use gaussian noise (of radius 2 and 5) and decrease the resolution to 64x64 pixels to simulate vision with lost glasses, color removal as an analog for partial color blindness, greyscale for full-color blindness, and some additional artificial errors such as salt and pepper, and speckle.

We also see the heat activation maps of the various error types to get a visual idea of the feature extraction being done in the network for various errors. We expect to see differences in accuracy and the number of epochs taken to reach suitable accuracy for different errors and understand why this Artificial Neural Network may perform better on specific variations.

The Asirra dataset

The CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof) challenge is the motivation behind the creation of this dataset.

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but studies have shown that people can accomplish it quickly and accurately.

Reference: Dataset can be found here (https://www.kaggle.com/c/dogs-vs-cats)

Project Model

Deep Learning Models of the Ventral Visual Stream

Lets try to model the visual categorisation task using AlexNet as model of the ventral visual stream.

Create Test Neural Network (AlexNet, 2012 with Batch Normalization and a downscaling factor)

We add Batch Normalization to make training faster and add a scaling factor to test the architechture in a relatively smaller and computationally feasible model. The parameter downscale scales down the number of channels in the network by the factor given by the value. For computational feasibility on Google Colab we use an Alexnet downscaled by a factor of 2.

Hypothesis

Training patterns

Accuracy

Gaussian Noise - radius 5

Blue color deletion

Intermediate layer output

Blue color blindness - intermediate layer output

Performance

Performance of Naive Learner on different noise

Performance of Expert Learner on different noise

Naive Learner vs Expert Learner

Conclusion

Thank You :)

Sai Ganesh N