Computer Vision in iOS – CoreML+Keras+MNIST

Hello world! It has been quite a while since my last blog on object recognition on iPhone – Computer Vision in iOS – Object Recognition. I have been experimenting a lot with a YOLO implementation on the iPhone 7 and lost track of time. I will discuss how to implement YOLO (object detection) in my next blog, but this one, though it covers only digit recognition, will help you understand how to write your own custom network from scratch using Keras and convert it to a CoreML model. Since you will be learning and experimenting with a lot of new things, I felt it was better to stick with a simple network with predictable results than to work with deep(errrr….) networks.

Problem Statement:

Given a 28×28 image of a handwritten digit, build a model that can predict the digit with high accuracy.

Pipeline Setup:

Before reading further, you need a machine with macOS 10.13, iOS 11, and Xcode 9.

We need to set up a working environment on our machines for training, testing, and converting custom deep learning models to CoreML models. If you read the documentation of coremltools – link – they suggest using virtualenv. I personally recommend Anaconda over virtualenv. If you prefer Anaconda, check this past blog of mine, which walks step by step through setting up a conda environment for deep learning on Mac machines – TensorFlow Diaries- Intro and Setup. At present, Apple’s coremltools require Python 2.7 for the environment setup. Open Terminal and type the following commands to set up the environment.

$ conda create -n coreml python=2.7
$ source activate coreml
(coreml) $ conda install pandas matplotlib jupyter notebook scipy scikit-learn opencv
(coreml) $ pip install tensorflow==1.1
(coreml) $ pip install keras==2.0.4
(coreml) $ pip install h5py
(coreml) $ pip install coremltools
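
Once the installs finish, a quick optional sanity check from inside the coreml environment confirms that the pinned versions were picked up (a minimal sketch; the expected versions match the pip pins above):

# Optional sanity check: run inside the 'coreml' environment
import tensorflow
import keras
import coremltools

print(tensorflow.__version__)  # expect 1.1.x
print(keras.__version__)       # expect 2.0.4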

Designing & Training the network:

For this part of the code, you can either create a Python file and follow along or check the Jupyter notebook I wrote, which contains both the code and documentation.

  • First, let us import the necessary libraries and make sure that the Keras backend is TensorFlow.
import numpy as np

import keras

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import np_utils

# Make sure the image dimension ordering is TensorFlow-style (channels last)
from keras import backend as K
K.set_image_dim_ordering('tf')
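
If you want to double-check that the configuration took effect, the backend exposes simple getters (optional):

# Optional: verify the backend and image dimension ordering
print(K.backend())             # should print 'tensorflow'
print(K.image_dim_ordering())  # should print 'tf', i.e. channels-last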
  • Now let us prepare the dataset for training and testing.
# Define some variables
num_rows = 28
num_cols = 28
num_channels = 1
num_classes = 10

# Import data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape to (N, 28, 28, 1) and scale pixel values to [0, 1]
X_train = X_train.reshape(X_train.shape[0], num_rows, num_cols, num_channels).astype(np.float32) / 255
X_test = X_test.reshape(X_test.shape[0], num_rows, num_cols, num_channels).astype(np.float32) / 255

# One-hot encode the labels
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)
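
At this point the arrays should have the shapes below (the standard MNIST split has 60,000 training and 10,000 test images); printing them is a cheap way to catch reshape mistakes:

# Quick shape check
print(X_train.shape)  # (60000, 28, 28, 1)
print(y_train.shape)  # (60000, 10)
print(X_test.shape)   # (10000, 28, 28, 1)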
  • Design the model for training.
# Model: three convolution + max-pooling blocks, then a fully connected classifier
model = Sequential()

model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(128, (1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
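
Before training, model.summary() is a handy way to confirm that the layer output shapes and parameter counts match the design above:

# Print a layer-by-layer overview of the network
model.summary()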
  • Train the model.
# Training
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
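
Once training finishes, you can evaluate the model on the held-out test set (the validation data above already tracked this per epoch; a network like this typically lands around 98–99% on MNIST, though exact numbers vary from run to run):

# Evaluate on the test set
scores = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy: %.2f%%' % (scores[1] * 100))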
  • Prepare the model for inference by removing the dropout layers. (Keras already disables dropout at inference time, but stripping the layers keeps the converted model clean.)
# Prepare model for inference: strip the Dropout layers.
# Iterate over a copy of the list, since removing items from
# the list being looped over would skip layers.
for k in list(model.layers):
    if isinstance(k, keras.layers.Dropout):
        model.layers.remove(k)
  • Finally, save the model.
model.save('mnistCNN.h5')

Keras to CoreML:

To convert the model from Keras to CoreML, we need a few additional steps. Our deep learning model expects a 28×28 normalised grayscale image as input and produces probabilities for the class predictions as output. Let us also add a little more metadata to the model, such as the license and author.

import coremltools

output_labels = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
scale = 1/255.
coreml_model = coremltools.converters.keras.convert('./mnistCNN.h5',
                                                    input_names='image',
                                                    image_input_names='image',
                                                    output_names='output',
                                                    class_labels=output_labels,
                                                    image_scale=scale)

coreml_model.author = 'Sri Raghu Malireddi'
coreml_model.license = 'MIT'
coreml_model.short_description = 'Model to classify handwritten digits'

coreml_model.input_description['image'] = 'Grayscale image of a handwritten digit'
coreml_model.output_description['output'] = 'Predicted digit'

coreml_model.save('mnistCNN.mlmodel')
  • After executing the above code, you should see a file named ‘mnistCNN.mlmodel’ in your current directory.
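
If you want to sanity-check the converted model before opening Xcode, coremltools can run predictions on the .mlmodel directly (macOS only). Here is a minimal sketch, assuming a hypothetical 28×28 grayscale test image named sample_digit.png:

# Minimal sketch: test the converted model with coremltools (macOS only)
# 'sample_digit.png' is a hypothetical grayscale image of a digit
from PIL import Image
import coremltools

mlmodel = coremltools.models.MLModel('./mnistCNN.mlmodel')
img = Image.open('sample_digit.png').convert('L').resize((28, 28))  # grayscale 28x28
prediction = mlmodel.predict({'image': img})
print(prediction['classLabel'])  # predicted digit label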

Congratulations! You have designed your first CoreML model. With this knowledge, you can design any custom model using Keras and convert it into a CoreML model.

iOS app:

Most of the content from here on is focused on app development, and I will explain only a few important things. If you want a step-by-step walkthrough of the pipeline setup for using CoreML in an iOS app, I suggest visiting my previous blog – Computer Vision in iOS – Object Recognition – before reading further. The whole code is available online in the github repo.

Similar to the Object Recognition app, I added a custom view named DrawView for writing digits with finger swipes (most of the code for this view was inspired by Apple’s Metal example projects). I added two buttons named ‘Clear’ and ‘Detect’, whose names reflect their functionality. As we discussed in the previous blog, CoreML requires the image in CVPixelBuffer format, so I added helper code that converts the drawn image into the required format. Apple’s Vision API can take care of all these image-format conversions automatically, but it consumes an additional 20% CPU compared to the method I propose. That 20% CPU usage will matter when you are designing a heavily ML-oriented real-time application 😛.

Here are the results of the working prototype of my app –

[Slideshow: working prototype of the app]

Source Code:

If you like this blog and want to play with the app, the code is available here – iOS-CoreML-MNIST.

 
