Faster Style Transfer – PyTorch & CuDNN

In our previous blog, we showed how to build a mini Deep Learning pipeline to train models using PyTorch. MNIST is great for rapidly prototyping and testing low-level ideas in Deep Learning, and it needs very little compute for training and testing. In this blog, let us move beyond MNIST and tackle another interesting Deep Learning challenge, Style Transfer, and look at some stats on how we can achieve real-time inference on a desktop using only Python.

NOTE: If you would like to dive into the code right away, the code for this project is available at – LINK.

To solve Artistic Style Transfer with PyTorch, let us use a dataset of larger magnitude, such as MS-COCO. Since there are many good blogs/articles online explaining Style Transfer, I would like to focus this blog on the tweaks that get efficient performance in both training and inference. The rest of this blog is organized as follows – we quickly go through a naive definition of Style Transfer, convert the code provided in the PyTorch examples into the pipeline we discussed in the ‘Intro To PyTorch’ blog, train the model with minimal hyper-parameter tuning, and save it. Finally, we load the saved model in inference mode and use a webcam feed to perform (maybe) real-time style transfer.

What is Style Transfer?

Artistic Style Transfer

As the picture above shows, in a style transfer application we train a network to render the input (content) image in the desired style. If you are interested in learning more, feel free to read the PyTorch tutorial on Neural Style.

Model Training

  • Clone the repo: StyleTransfer-PyTorch
  • Download the dataset from MS-COCO website and put it in data/ directory and use styles of your choice.
  • Your directory structure should look like this before training.
├── (Entry point to the application)
├──  (Net class for init/train/test DL models)
├── models/ (Directory containing different DL arch.)
|   ├──
|   ├── (Style Transfer Network)
|   ├── 
|   └── ...
├── loaders/ (Directory containing data/model loaders)
|   ├──
|   ├── (DataLoader class for loading data)
|   └── (ModelLoader class for loading models)
├── data/ (Directory containing data)
|   ├── coco/
|   |   └──train2014/ (Content images!)
|   └── styles/ (Directory for styles)
├── LICENSE  (License of your choice)
└── (Documentation for Setup, Running & Results)
  • Create the Anaconda environment using the instructions given in – Intro To PyTorch.
  • Run – python --phase train – to train the model.

A few things to consider for faster training –

  • Batch Size: You can play with the train_batch_size argument for faster training.
  • Workers for Data Loading: (NOTE: this does not work on Windows.) You can increase the num_workers argument to raise the number of concurrent workers for data pre-processing.
  • (Optional) Image Pre-processing backend: (NOTE: this might involve a hectic setup on Windows; an alternative is to use OpenCV instead of Pillow.) In PyTorch, we use the TorchVision module to ease image pre-processing. By default, TorchVision uses the Pillow backend. You can replace it with Pillow-SIMD to make your image pre-processing faster.
  • CuDNN backend: Make sure to set your backend to CuDNN if you are running your training on an Nvidia GPU. Also, set the CuDNN benchmark flag to True for optimal performance in both training and inference (benchmark mode pays off when input sizes stay fixed, as they do here, since CuDNN can pick the best convolution algorithms once and reuse them). The following code block will do the magic.
if torch.cuda.is_available():
    # Sanity check - empty the CUDA cache
    torch.cuda.empty_cache()
    # Enforce the CuDNN backend
    torch.backends.cudnn.enabled = True
    torch.backends.cudnn.benchmark = True

On an RTX 2060 GPU, training for 2 epochs with a batch size of 6 takes roughly 40 minutes.

Please note that the above metrics are for Windows 10. In my case, data loading is the bottleneck. You can speed up the DataLoader either by setting num_workers > 1 or by storing mini-batches of pre-processed intermediate tensor representations of the images in pickle or h5py format. Since a deep dive into the DataLoader is outside the scope of this blog, we can review it in the future.
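To make the caching idea concrete, here is a minimal sketch of storing pre-processed mini-batches with pickle; the random arrays stand in for decoded and resized MS-COCO images, and the shapes and paths are illustrative only:

```python
import os
import pickle
import tempfile

import numpy as np

# Fake "decoded + resized" content images (HWC, UInt8), standing in
# for the JPEGs under the data/ directory
frames = [np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8) for _ in range(12)]

cache_dir = tempfile.mkdtemp()
batch_size = 6
for i in range(0, len(frames), batch_size):
    batch = np.stack(frames[i:i + batch_size])              # NHWC
    batch = batch.transpose(0, 3, 1, 2).astype(np.float32)  # Float32 NCHW, model-ready
    with open(os.path.join(cache_dir, f"batch_{i // batch_size}.pkl"), "wb") as fp:
        pickle.dump(batch, fp)

# At training time, loading a cached batch skips JPEG decode + resize entirely
with open(os.path.join(cache_dir, "batch_0.pkl"), "rb") as fp:
    batch0 = pickle.load(fp)
```

The trade-off is disk space: Float32 NCHW batches are much larger than the source JPEGs, so this works best when the dataset (or an epoch's worth of batches) fits comfortably on a fast local drive.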

Model Inference

Here comes my favorite part! Real-time inference of Deep Learning models is probably one of the least covered areas in open-source blogs; honestly, I have seen very little documentation on this topic online. Since this area deserves attention, considering the growing adoption of Deep Learning models in production, I thought it would be a good idea to discuss it. In the following sections, we will start with a very simple inference pipeline and tweak each of its elements to get the desired real-time experience. Since these topics may be new to many readers, I will try to keep things concise and clear. Further, to keep it simple, everything is implemented using only Python.

Rule #1: If you are using a laptop instead of a desktop or cloud VM, make sure it is connected to a constant power source and running in High Performance mode. Running on battery can starve the GPU of power and lead to throttling.

Simple Inference Pipeline – Webcam

Let us write a simple inference pipeline that uses the webcam feed as input to the style transfer model. The inference pipeline for this scenario looks as follows:

Initialize Webcam
while True:
    FETCH frame from webcam
    PREPROCESS the frame
    INFERENCE by passing frame into DL Model
    RENDER results on screen

The final code for this section is available at – LINK.

1. Initial Inference Pipeline

# Load model in eval mode

# Setup content transform
content_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.mul(255))
])

# Initialize the camera
camera = cv2.VideoCapture(0)

with torch.no_grad():
    while True:
        # Fetch
        _, frame =
        # Preprocess
        content_image = content_transform(frame)
        content_image = content_image.unsqueeze(0).cuda()
        # Predict
        output = model(content_image)
        # Postprocess
        output = output.cpu().detach()[0].clamp(0, 255).numpy().transpose(1, 2, 0).astype("uint8")
        # Render results
        cv2.imshow('Frame', output)
        # Exit on ESC
        k = cv2.waitKey(1)
        if k == 27:
            break
If you run the above block, you will achieve a performance of roughly 15.7 FPS (i.e., roughly 63.5 milliseconds per frame). There are a few ways to optimize this: either keep optimizing the model architecture by trying several layer combinations, layer fusions, etc., or first check how optimized your end-to-end pipeline is. In this blog, we will emphasize optimizing the end-to-end pipeline. A detailed analysis of all the experiments is provided in the Jupyter Notebook – LINK.

2. Optimize Preprocessing

In preprocessing, we convert the webcam frame from UInt8 HWC format to a tensor representation (Float32 NCHW format, where N is the number of examples, H the height of the image, W the width, and C the number of channels). In the original implementation, we used the TorchVision transforms with the Pillow backend, which takes roughly 8.6 milliseconds per frame. We can speed up some of these Ops by replacing Pillow with OpenCV+NumPy Ops. Upon further investigation, we observed that converting UInt8 to Float32 is the most costly step in the preprocessing phase, while transferring UInt8 to the GPU is faster than transferring Float32 (a quarter of the bytes).

Rule #2: Minimize data transfers between CPU and GPU. Note that GPU clock speeds are lower than the CPU’s, so design your computations wisely!

Upon testing a few ideas, we concluded that transferring UInt8 to the GPU and converting it to Float32 on the GPU is much faster than converting UInt8 to Float32 on the CPU and then transferring the Float32 to the GPU. The following code block summarizes this idea.

# Preprocess
frame = frame.swapaxes(1, 2).swapaxes(0, 1)  # HWC -> CHW
frame = frame[np.newaxis, :, :, :]           # CHW -> NCHW (add batch dim)
content_image = torch.from_numpy(frame)      # Numpy -> Torch Tensor
content_image = content_image.cuda()         # CPU (UInt8) -> GPU (Byte)
content_image = content_image.type(torch.cuda.FloatTensor)  # Byte -> Float32, on GPU
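The size argument behind Rule #2 is easy to verify with plain NumPy: a UInt8 frame crosses the PCIe bus with a quarter of the bytes of its Float32 version (the numbers below assume a 640×480 RGB frame):

```python
import numpy as np

# One 640x480 RGB frame in NCHW layout
frame = np.zeros((1, 3, 480, 640), dtype=np.uint8)

print(frame.nbytes)                     # 921600 bytes cross the bus as UInt8
print(frame.astype(np.float32).nbytes)  # 3686400 bytes as Float32, i.e. 4x more
```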

A detailed explanation of the above Ops with runtimes can be seen in the following notebook. In the next section, we will see how to optimize the post-processing phase of the pipeline.

3. Optimize Post-processing

Thanks to Python, our entire post-processing can be written in a single line of code: output = output.cpu().detach()[0].clamp(0, 255).numpy().transpose(1,2,0).astype("uint8"). Here is what this line does:

  • cpu() – Copy the Float32 tensor from GPU to CPU.
  • detach()[0] – Stop tracking gradients (refer this) and drop the batch dimension (NCHW to CHW).
  • clamp(0, 255) – Clamp the values to the valid image pixel range.
  • numpy() – Convert the Torch tensor to a NumPy array.
  • transpose(1,2,0) – A variant of swapaxes that converts CHW to HWC.
  • astype("uint8") – NumPy’s way to change the data type of the array.

This initial implementation takes around 4 milliseconds. That might seem fast, but if we design the Ops properly, we can get the entire post-processing to run in 0.5 milliseconds. Here is how: clamp() is a per-element operation that can be trivially parallelized, so simply push this Op to the GPU. From the preprocessing optimization, we have seen that type conversions are faster on GPU than CPU, so convert the tensor from Float to Byte on the GPU before transferring it to the CPU. Finally, do the rest of the Ops on the CPU. Here is the summary in a single line of code – output = output.clamp(0, 255).type(torch.cuda.ByteTensor).cpu().detach()[0].numpy().transpose(1,2,0).
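To see why the op order matters, here is a CPU-only NumPy stand-in for the optimized post-processing (random values replace the real model output; in the real pipeline the clamp and byte conversion run on the GPU). Note that the clamp must come before the byte conversion, since a plain uint8 cast wraps out-of-range values instead of saturating them:

```python
import numpy as np

# Stand-in for the model's Float32 NCHW output; values spill outside
# [0, 255], which is exactly why clamping comes first
output = np.random.uniform(-50.0, 300.0, (1, 3, 480, 640)).astype(np.float32)

# Same op order as the optimized one-liner: clamp, narrow to bytes,
# drop the batch dim, then CHW -> HWC for cv2.imshow
img = np.clip(output, 0, 255).astype(np.uint8)[0].transpose(1, 2, 0)
```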

4. Async Webcam Frame Extraction

Know your hardware limits! A typical consumer webcam delivers frames at only 30 FPS; there is no way to speed that up. But by moving frame extraction to a separate thread, you can hide the overhead of fetching frames from the webcam. That saves up to 33.3 milliseconds per frame, which can be spent on costlier operations such as model inference. See the VideoCaptureAsync class to understand the implementation. There are many other ways to implement asynchronous webcam frame extraction, but this article, LINK, does a really good job of explaining them.

Rule #3: Try to keep data loading on a separate thread. You can use that time for heavy compute such as model inference!
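Here is a minimal sketch of the idea, with a fake camera standing in for cv2.VideoCapture so it runs anywhere (the real VideoCaptureAsync class linked above differs in details): a background thread keeps overwriting the latest frame, and read() returns immediately instead of blocking until the sensor delivers the next frame.

```python
import threading
import time

class AsyncCapture:
    """Threaded frame grabber sketch; `source` is anything with a cv2-style read()."""

    def __init__(self, source):
        self.source = source
        self.grabbed, self.frame = source.read()  # prime with one frame
        self.running = False
        self.lock = threading.Lock()

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self._update, daemon=True)
        self.thread.start()
        return self

    def _update(self):
        # Keep grabbing frames in the background; the main loop never waits
        while self.running:
            grabbed, frame = self.source.read()
            with self.lock:
                self.grabbed, self.frame = grabbed, frame

    def read(self):
        # Return the most recent frame immediately (no 33 ms block)
        with self.lock:
            return self.grabbed, self.frame

    def stop(self):
        self.running = False
        self.thread.join()

class FakeCamera:
    """Stands in for cv2.VideoCapture(0) so the sketch runs without hardware."""
    def __init__(self):
        self.count = 0
    def read(self):
        time.sleep(0.01)  # emulate sensor latency
        self.count += 1
        return True, self.count

cam = AsyncCapture(FakeCamera()).start()
time.sleep(0.1)            # meanwhile, the grabber thread keeps pulling frames
ok, latest = cam.read()    # returns instantly with the freshest frame
cam.stop()
```

The lock matters: without it, the consumer can read a frame reference while the grabber is mid-update. Dropping stale frames (rather than queueing them) is the right policy here, since style transfer only ever wants the newest frame.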

5. Putting it all together

Here is the final reference implementation combining everything we discussed above. It speeds up our inference from 15.7 FPS to 21.3 FPS, which might not seem like a big speed-up, but we have now optimized almost every phase of the inference pipeline except the model architecture itself. I consider model optimization a somewhat advanced topic and am leaving it for future blogs.

# Load model in eval mode

# Setup content transform
content_transform = transforms.Compose([
    transforms.Lambda(lambda x: x.mul(255))
])

# Initialize the camera - Async
camera = VideoCaptureAsync(0)

with torch.no_grad():
    while True:
        # Fetch (returns the latest frame immediately)
        _, frame =
        # Preprocess (Optimized)
        frame = frame.swapaxes(1, 2).swapaxes(0, 1)  # HWC -> CHW
        frame = frame[np.newaxis, :, :, :]           # CHW -> NCHW
        content_image = torch.from_numpy(frame)
        content_image = content_image.cuda()
        content_image = content_image.type(torch.cuda.FloatTensor)
        # Predict
        output = model(content_image)
        # Postprocess - Optimized
        output = output.clamp(0, 255).type(torch.cuda.ByteTensor).cpu().detach()[0].numpy().transpose(1, 2, 0)
        # Render results
        cv2.imshow('Frame', output)
        # Exit on ESC
        k = cv2.waitKey(1)
        if k == 27:
            break

You can further optimize this implementation using CUDA Streams or asyncio, but I am leaving those topics for you to explore. Also, if you design your model architecture to get the most benefit out of CuDNN, you can achieve upwards of 47 FPS; I leave this as an exercise for the reader to experiment with. The following video shows the style transfer application running in real time. Please switch to “HD” before playing, as WordPress’s default video encoding deteriorates the quality.

Real-time Style Transfer (640×480 @ 47 FPS) [NOTE: Play in HD for better experience]

Thank you for reading this post, if you want to stay up-to-date with my future articles, you can subscribe by entering your email below.


The source code for this blog is available at LINK.


OpenCV in Android – An Introduction (Part 2/2)

In my previous post, I explained how to integrate OpenCV on Android. In this post, let us integrate the camera into our app so we can do some live testing later. If you are visiting this blog for the first time, I recommend reading OpenCV in Android – An Introduction (Part 1/2) before continuing. By the end of this blog you will have a basic app ready for testing any of your Computer Vision algorithms on images acquired from the camera!

  • In order to use the camera in our app, we need to give our app permission to access the device camera. Open ‘app/src/main/AndroidManifest.xml’ and add the following lines of code.
    <uses-permission android:name="android.permission.CAMERA" />

    <supports-screens android:resizeable="true"
        android:smallScreens="true"
        android:normalScreens="true"
        android:largeScreens="true"
        android:anyDensity="true" />

    <uses-feature android:name="android.hardware.camera"
        android:required="false" />
    <uses-feature android:name="android.hardware.camera.autofocus"
        android:required="false" />
    <uses-feature android:name="android.hardware.camera.front"
        android:required="false" />
    <uses-feature android:name="android.hardware.camera.front.autofocus"
        android:required="false" />
  • Let us add a button in our main activity to navigate to a new activity that uses the camera. Add the following code to ‘src/main/res/layout/activity_main.xml’.
        android:text="OpenCV Camera"
  • After adding the button, create an intent to a new activity named ‘OpenCVCamera’ in your MainActivity class by adding the following code.
        // Button to call OpenCV Camera Activity
        Button cameraInit = (Button) findViewById(;
        cameraInit.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                Intent i = new Intent(getApplicationContext(), OpenCVCamera.class);
                startActivity(i);
            }
        });
  • Now add a new Empty Activity by Right Click -> New -> Activity -> Empty Activity. Name the activity OpenCVCamera. Edit the layout of your new activity to add a camera view using the code below.
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android=""



  • Now add the following code into your file to see some action, then try running the app on your device. I will explain the specifics later in this blog.
package com.example.sriraghu95.opencvandroid_anintroduction;

import android.os.Bundle;
import android.util.Log;
import android.view.SurfaceView;
import android.view.WindowManager;

import org.opencv.android.BaseLoaderCallback;
import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.LoaderCallbackInterface;
import org.opencv.android.OpenCVLoader;
import org.opencv.core.Mat;

public class OpenCVCamera extends AppCompatActivity implements CameraBridgeViewBase.CvCameraViewListener2 {

    private static final String TAG = "OpenCVCamera";
    private CameraBridgeViewBase cameraBridgeViewBase;

    private BaseLoaderCallback baseLoaderCallback = new BaseLoaderCallback(this) {
        @Override
        public void onManagerConnected(int status) {
            switch (status) {
                case LoaderCallbackInterface.SUCCESS:
                    // OpenCV loaded - start the camera view
                    cameraBridgeViewBase.enableView();
                    break;
                default:
                    super.onManagerConnected(status);
                    break;
            }
        }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        setContentView(R.layout.activity_open_cv_camera); // use the layout name of your new activity
        cameraBridgeViewBase = (CameraBridgeViewBase) findViewById(;
        cameraBridgeViewBase.setVisibility(SurfaceView.VISIBLE);
        cameraBridgeViewBase.setCvCameraViewListener(this);
    }

    @Override
    public void onResume() {
        super.onResume();
        if (!OpenCVLoader.initDebug()) {
            Log.d(TAG, "Internal OpenCV library not found. Using OpenCV Manager for initialization");
            OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_3_1_0, this, baseLoaderCallback);
        } else {
            Log.d(TAG, "OpenCV library found inside package. Using it!");
            baseLoaderCallback.onManagerConnected(LoaderCallbackInterface.SUCCESS);
        }
    }

    @Override
    public void onCameraViewStarted(int width, int height) {
    }

    @Override
    public void onCameraViewStopped() {
    }

    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        return inputFrame.rgba();
    }
}
  • If everything works fine, your screen should look like the figure below. If your app shows a warning related to camera permissions, go to Settings and make sure the camera permission for the app is enabled. 🙂
  • But what is exactly happening here? First you imported some necessary Android and OpenCV classes for your app. To allow OpenCV to communicate with the Android camera functionality, we implemented CvCameraViewListener2. The variable ‘CameraBridgeViewBase cameraBridgeViewBase’ acts as a bridge between the camera and OpenCV. BaseLoaderCallback tells us whether OpenCV has been loaded into our app or not. We also need the helper functions onResume, onCameraViewStarted, onCameraViewStopped and onCameraFrame to handle the events of the app.
  • With this, your development environment for Computer Vision application development on Android is ready. I made some final edits to the app to make the camera view a Full Screen Activity and added some more event handlers. The code for the same can be accessed through the following GitHub repo – LINK!

What’s next? In the next blog, I will discuss how we can write our own custom C++ code for doing fun computer vision experiments using OpenCV on Android!

Wanna say thanks?

Like this blog? Found this blog useful and you feel that you learnt something at the end? Feel free to buy me a coffee 🙂 A lot of these blogs wouldn’t have been completed without the caffeine in my veins 😎


OpenCV in Android – An Introduction (Part 1/2)

Hello world! I am very excited to write this particular blog on setting up OpenCV in Android Studio. There are many solutions online for setting up OpenCV using Eclipse, the Android NDK, etc., but I didn’t find a single reliable source for doing the same setup using Android Studio. So we (V. Avinash and I) finally came up with a feasible solution for setting up a native development environment in Android for designing Computer Vision applications using OpenCV and C++!!!

A quick intro about me: I am a Computer Vision enthusiast with nearly 4 years of theoretical and practical experience in the field, and I am quite comfortable implementing CV algorithms in Matlab and Python. Over the years, the field has been moving rapidly from mere academic interest to industrial interest, but most of its standard algorithms are not optimized to run in real time (60 FPS), nor designed specifically for mobile platforms. This caught my interest, and I have been working on it since Summer 2016. In my free time from being a research assistant, I think about techniques and hacks for optimizing existing algorithms for mobile, and about how to acquire (and play with) 3D data from a 2D camera.

Before starting this project, I am assuming that you already have basic setup of Android Studio up and running on your machines and you have decent experience working on it.

  • If you don’t already have Android Studio, you can download and install it from the following link.
  • Once you have Android Studio up and running, you can download OpenCV for Android from the following link. After downloading, extract the contents of the zip file and move them to a specific location, say ‘/Users/user-name/OpenCV-android-sdk’. I am currently using Android Studio v2.2.3 and OpenCV v3.2.
  • Now start the Android Studio and click on ‘Start a new Android Studio project’. This will open a new window. Specify your ‘Application Name’, ‘Company Domain’ and ‘Project Location’. Make sure you select the checkbox ‘Include C++ Support‘. Now click Next!
  • In the ‘Targeted Android Devices’ window, select ‘Phone and Tablet’ with Minimum SDK: ‘API 21: Android 5.0 (Lollipop)’. Click Next.
  • In the Activity selection window select ‘Empty Activity’ and click Next.
  • In the Activity customization window leave everything as it is without any edits and click Next.
  • In the Customize C++ Support, select C++ Standard: Toolchain Default and leave all the other checkboxes unchecked (for now, but you are free to experiment) and click Finish!
  • Android Studio will take some time to load the project with the necessary settings. Since you are developing an app that depends on the camera of your mobile, you can’t test it on an emulator. You need to connect an Android phone (with developer options enabled) to your computer and select the device when you press the debug option. After running the application, you should see the following on your mobile if everything works fine!
  • At this point of the project you have your basic native-development (C++ support) enabled in your app. Now let us start integrating OpenCV into your application.
  • Click on File -> New -> Import Module. In the pop-up window, give path to your ‘OpenCV-android-sdk/sdk/java’ directory and click on OK. You can find the module name as ‘openCVLibrary320’ and click Next, Finish to complete the importing.
  • Now, go to “openCVLibrary320/build.gradle” and change the following variables to those in the “app/build.gradle”: compileSdkVersion, buildToolsVersion, minSdkVersion, and targetSdkVersion. Sync the project after editing the gradle files. My “openCVLibrary320/build.gradle” file looks like this!
apply plugin: ''

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.2"

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 25
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }
}
  • Add a new folder named ‘jniLibs’ to “app/src/main/” by right click -> New -> Directory. Copy the directories in ‘OpenCV-android-sdk/sdk/native/libs/’ into the jniLibs folder in your app. After the import, remove all *.a files from the imported directories. At the end, you should have 7 directories with files in them.
  • Now go to ‘app/CMakeLists.txt’ and link OpenCV as follows (see the lines after the [EDIT] comments for the quick changes):
# Sets the minimum version of CMake required to build the native
# library. You should either keep the default value or only pass a
# value of 3.4.0 or lower.

cmake_minimum_required(VERSION 3.4.1)

# [EDIT] Set Path to OpenCV and include the directories
# pathToOpenCV is just an example to how to write in Mac.
# General format: /Users/user-name/OpenCV-android-sdk/sdk/native
set(pathToOpenCV /Users/sriraghu95/OpenCV-android-sdk/sdk/native)

# Creates and names a library, sets it as either STATIC
# or SHARED, and provides the relative paths to its source code.
# You can define multiple libraries, and CMake builds it for you.
# Gradle automatically packages shared libraries with your APK.

add_library( # Sets the name of the library.
             native-lib

             # Sets the library as a shared library.
             SHARED

             # Provides a relative path to your source file(s).
             # Associated headers in the same location as their source
             # file are automatically included.
             src/main/cpp/native-lib.cpp )

# [EDIT] Similar to above lines, add the OpenCV library
add_library( lib_opencv SHARED IMPORTED )
set_target_properties( lib_opencv PROPERTIES IMPORTED_LOCATION /Users/sriraghu95/Documents/Projects/ComputerVision/OpenCVAndroid-AnIntroduction/app/src/main/jniLibs/${ANDROID_ABI}/ )

# Searches for a specified prebuilt library and stores the path as a
# variable. Because system libraries are included in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.

find_library( # Sets the name of the path variable.
              log-lib

              # Specifies the name of the NDK library that
              # you want CMake to locate.
              log )

# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in the
# build script, prebuilt third-party libraries, or system libraries.

target_link_libraries( # Specifies the target library.
                       native-lib

                       # Links the target library to the log library
                       # included in the NDK.
                       ${log-lib} lib_opencv ) #EDIT

  • Edit ‘app/build.gradle’ to set the cppFlags, refer to the jniLibs source directories, and make some other minor changes. You can refer to the code below and replicate the same for your project. All new changes made to the pre-existing code are followed by “//EDIT” comments.
apply plugin: ''

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.2"
    defaultConfig {
        applicationId "com.example.sriraghu95.opencvandroid_anintroduction"
        minSdkVersion 21
        targetSdkVersion 25
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner ""
        externalNativeBuild {
            cmake {
                cppFlags "-std=c++11 -frtti -fexceptions" //EDIT
                abiFilters 'x86', 'x86_64', 'armeabi', 'armeabi-v7a', 'arm64-v8a', 'mips', 'mips64' //EDIT
            }
        }
    }
    sourceSets {
        main {
            jniLibs.srcDirs = ['/Users/sriraghu95/Documents/Projects/ComputerVision/OpenCVAndroid-AnIntroduction/app/src/main/jniLibs'] //EDIT: Use your custom location to jniLibs. Path given is only for example purposes.
        }
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), ''
        }
    }
    externalNativeBuild {
        cmake {
            path "CMakeLists.txt"
        }
    }
}

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    androidTestCompile('', {
        exclude group: '', module: 'support-annotations'
    })
    compile ''
    testCompile 'junit:junit:4.12'
    compile project(':openCVLibrary320') //EDIT
}
  • Once you are done with all the above steps, sync the Gradle files and go to src/main/cpp/native-lib.cpp. To make sure the project setup is done properly, start including OpenCV headers in native-lib.cpp; they should not raise any errors.

#include <jni.h>
#include <string>

extern "C"
JNIEXPORT jstring JNICALL
Java_com_example_sriraghu95_opencvandroid_1anintroduction_MainActivity_stringFromJNI(
        JNIEnv *env,
        jobject /* this */) {
    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}
  • Now make sure all your Gradle files are in sync and rebuild the project once to check that there are no errors in your setup.

By the end of this blog, we have finished setting up OpenCV in your Android project. This is a pre-requisite for any type of Android application you want to build using OpenCV. Considering that there are two ways of using OpenCV in your application, (a) processing images from your personal library on the mobile and (b) real-time processing of the live feed from the camera, I think this is the best place to stop this part of the blog.

In my next post, I will focus on how to use the camera in your application and do some simple processing on the data you acquire from it.

Next: OpenCV in Android – An Introduction (Part 2/2)

Source Code: Link

[New] Android Application: Link



OpenCV in iOS – Face Detection

Hi, after quite a break I am back to my blog to post some new things related to optimized computer vision algorithms on mobile platforms. I have been experimenting with Android recently to come up with the easiest setup for OpenCV to start developing (I will post about it in my next blog). In this post, I will explain how to do face detection in almost real time using OpenCV’s Haar cascades. This is not an advanced tutorial on detection/object recognition, but it will help you start working on your custom classification problems. Let us dive in!

A quick note before diving in: this blog expects that you have already read my previous blogs on OpenCV in iOS (An Introduction, The Camera) so that you have the starter code up and running.

In this blog post, we are going to detect the faces and eyes from live video stream of your iOS device’s camera. Now start following the steps mentioned below!

  1. Import necessary frameworks into the project: opencv2, AVFoundation, Accelerate, CoreGraphics, CoreImage, QuartzCore, AssetsLibrary, CoreMedia, and UIKit frameworks.
  2. Rename ViewController.m to to start coding in Objective-C++.
  3. Add necessary haarcascade files from ‘<opencv-folder>/data/haarcascades/’ directory into your supporting files directory of Project. You can do this by right-click on Supporting Files and select ‘Add files to <your-project name>’
  4. Open and start adding the following lines of code to enable Objective-C++, and let us also define some colors to draw with, to mark faces and eyes on the image.
  5. Now you need to edit the ViewController interface to initialise the parameters for the live view, the OpenCV wrappers that get camera access through AVFoundation, and the cascade classifiers.
  6. In the ViewController implementation’s viewDidLoad method, write the following code to set up the OpenCV view.
  7. The tricky part is reading the cascade classifiers inside the project. Follow the steps suggested below to do the same and start the videoCamera!
  8. Once the videoCamera is started, each image is processed inside the processImage method!
  9. Now the code is complete! Please note that I am not covering the specific math behind Haar cascade detection, as I feel there are many blogs out there that explain it really well. For the code related to this blog, you can contact me via e-mail (Contact). The screenshot of the execution of my code is placed below!

    Screen Shot

OpenCV in iOS – The Camera

Hello everyone, this is my second blog post in the ‘OpenCV in iOS’ series. Before starting this tutorial, it is recommended that you complete the ‘OpenCV in iOS – An Introduction‘ tutorial. In this blog post, I will explain how to use the camera inside your iOS app. To set up the application in Xcode, please complete up to step 6 in the ‘OpenCV in iOS – An Introduction‘ tutorial before proceeding to the steps below!

  1. In this app, we need some additional frameworks to include in our project. They are listed as follows –
    • Accelerate.framework
    • AssetsLibrary.framework
    • AVFoundation.framework
    • CoreGraphics.framework
    • CoreImage.framework
    • CoreMedia.framework
    • opencv2.framework
    • QuartzCore.framework
    • UIKit.framework
  2. We already know how to add ‘opencv2.framework‘ from the previous blog post. I will go through the process of adding one of the above-mentioned frameworks (e.g., AVFoundation.framework); likewise you can add the rest. To add ‘AVFoundation.framework‘, go to ‘Linked Frameworks and Libraries‘ and click on the ‘+’ sign. Choose ‘AVFoundation.framework‘ and click on ‘Add‘.


  3. Now your project navigator area should look like this.


  4. It’s time to get our hands dirty! 🙂 Open ‘ViewController.h‘ and write the following lines of code.

    Screen Shot 2016-07-24 at 10.38.20 pm
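Since the code in this post lives only in screenshots, here is a minimal sketch of what ‘ViewController.h‘ could look like at this step (the variable names videoCamera and liveView are my own choices, not necessarily the ones in the screenshot):

```objc
// ViewController.h
// Import the OpenCV header before any Apple header to avoid macro clashes.
// cap_ios.h declares CvVideoCamera and the CvVideoCameraDelegate protocol
// (it lives under opencv2/highgui/ in the OpenCV 2.4.x series).
#import <opencv2/highgui/cap_ios.h>
#import <UIKit/UIKit.h>

@interface ViewController : UIViewController <CvVideoCameraDelegate>
{
    CvVideoCamera *videoCamera;   // grabs frames from the device camera
    UIImageView *liveView;        // view in which the live frames are shown
}
@end
```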

  5. Now go to the ‘ViewController.mm‘ file and add some lines so that C++ code can be included along with the Objective-C code.

    Screen Shot 2016-07-24 at 10.55.23 pm

  6. Let us initialise some variables for accessing the camera and for showing its live output.

    Screen Shot 2016-07-24 at 10.58.52 pm

  7. Now setup the live view such that it fills the whole app screen.

    Screen Shot 2016-07-24 at 5.36.54 pm.png

  8. Initialise the Camera parameters and start capturing! 

    Screen Shot 2016-07-24 at 11.02.01 pm.png
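As the exact code is only in the screenshots, here is a hedged sketch of what steps 6–8 amount to in ‘ViewController.mm‘ (the preset, orientation and FPS values are my example choices; the CvVideoCamera properties themselves are part of OpenCV’s iOS API):

```objc
// ViewController.mm
- (void)viewDidLoad
{
    [super viewDidLoad];

    // Step 7: a live view that fills the whole app screen.
    liveView = [[UIImageView alloc] initWithFrame:self.view.bounds];
    [self.view addSubview:liveView];

    // Step 8: initialise the camera parameters and start capturing.
    videoCamera = [[CvVideoCamera alloc] initWithParentView:liveView];
    videoCamera.delegate = self;
    videoCamera.defaultAVCaptureDevicePosition = AVCaptureDevicePositionBack;
    videoCamera.defaultAVCaptureSessionPreset = AVCaptureSessionPreset640x480;
    videoCamera.defaultAVCaptureVideoOrientation = AVCaptureVideoOrientationPortrait;
    videoCamera.defaultFPS = 30;
    [videoCamera start];
}
```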

  9. But wait! We still have to do one more step before actually testing our app. If you observe the line “@implementation ViewController”, you will find a warning: “Method ‘processImage:’ in protocol ‘CvVideoCameraDelegate’ not implemented”. To know more about CvVideoCameraDelegate, refer to this link. Coming back to our tutorial, we have to add the following lines of code to get rid of that warning.

    Screen Shot 2016-07-24 at 11.19.11 pm
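A sketch of the delegate method that silences the warning (this is the standard CvVideoCameraDelegate signature; the empty body is mine):

```objc
#ifdef __cplusplus
// Called for every captured frame; `image` can be modified in place and the
// modified frame is what gets displayed in the live view.
- (void)processImage:(cv::Mat &)image
{
    // Leave the frame untouched for now -- an effect is added in step 11.
}
#endif
```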

  10. And now we are ready to run the app! For this application, we have to run and test on an actual iPad/iPhone, because we need access to the device’s camera. Now we can see the live view of our camera! 🙂


  11. Let’s give some basic instaTouch! to our app 😉. Add the following lines of code in the ‘processImage‘ method and run the application on your device.

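The slideshow with the original effect is not reproduced here, so below is one plausible stand-in for an instaTouch!-style effect (my own choice, not necessarily the effect from the original post): an inverted grayscale look applied inside ‘processImage‘.

```objc
- (void)processImage:(cv::Mat &)image
{
    cv::Mat gray;
    cv::cvtColor(image, gray, CV_BGRA2GRAY);   // drop colour information
    cv::bitwise_not(gray, gray);               // invert intensities for a "negative" look
    cv::cvtColor(gray, image, CV_GRAY2BGRA);   // convert back to the frame's format
}
```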

With this we are coming to the end of this tutorial! 🙂 We have learnt how to access the camera inside the app and apply some live operations on the video. Though this is a basic tutorial, it will act as a precursor for many Augmented Reality type applications! 😀 We will try to get into the next level of Computer Vision app development in our next tutorial! Till then, stay tuned… 🙂 Feel free to comment your suggestions/doubts related to this tutorial.

The SOURCE CODE for this tutorial is available at the following GITHUB LINK.


OpenCV in iOS – An Introduction

Hello World! This is my first official blog post related to Computer Vision. I guess the title gives you a glimpse of what we are trying to achieve in this session.

What is OpenCV?

OpenCV is an open-source, BSD-licensed computer vision library which is available on all major platforms (Android, iOS, Linux, Mac OS X, Windows) and is primarily written in C++ (with bindings available for Python, Java and even MATLAB). You can check the documentation of OpenCV at docs.opencv.org.

In this session, we are trying to design a basic iOS app which uses OpenCV for the image processing part. I will be using Xcode v7.2.1 and OpenCV v2.4.13.

  1. Download opencv2.framework from the following link. Unzip the downloaded file and keep it in your workspace.
  2. Now open Xcode and start the new project by clicking on ‘Create a new Xcode project‘ in the left column of the following image.

    Screen Shot 2016-07-21 at 6.04.27 pm

  3. This will take you to a new window where you can select the template for your new project. Make sure that you select the ‘Single View Application‘ under the iOS -> Application and click ‘Next‘.

    Screen Shot 2016-07-21 at 6.11.46 pm

  4. Choose the name you want for the application and fill it in the ‘Product Name‘ field. Choose ‘Language‘ as Objective-C and ‘Devices‘ as Universal. Now click ‘Next‘, choose the location of your workspace and click on ‘Create‘. From now onwards, I will refer to the project folder location as <project_folder>.
  5. Now move the unzipped version of opencv2.framework to “<project_folder>/” . Now go to settings of your Xcode project and select General -> Linked Frameworks and Libraries -> click on ‘+’ sign (to add a new framework to the project) -> Click on ‘Add Other…’ -> browse to “<workspace>/<project_folder>/opencv2.framework” -> Select it and click ‘Open’. Now your project navigator area will look like this.

    Screen Shot 2016-07-23 at 10.51.06 am.png

  6. In this tutorial, we will be using Objective-C and C++ to design the app. To achieve this, we have to rename the ‘ViewController.m‘ file to ‘ViewController.mm‘. This simple naming convention will notify Xcode that we will be mixing Objective-C and C++.

    Screen Shot 2016-07-23 at 11.01.45 am.png

  7. Now let’s start the coding part 🙂! Right now, your ‘ViewController.mm‘ file should look like this.

    Screen Shot 2016-07-23 at 11.08.36 am

  8. Now add the following code, so that we can mix C++ code in here.

    Screen Shot 2016-07-23 at 12.42.24 pm
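The usual trick shown at this step is to guard the C++-only OpenCV import so the file still parses correctly; a minimal sketch of the top of ‘ViewController.mm‘ could look like this:

```objc
// ViewController.mm
// OpenCV headers are C++, so import them only when compiling as Objective-C++,
// and before any Apple header to avoid MIN/MAX macro clashes.
#ifdef __cplusplus
#import <opencv2/opencv.hpp>
#endif
#import "ViewController.h"
```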

  9. Let us setup the view.

    Screen Shot 2016-07-23 at 12.44.33 pm

  10. Now let us implement the ViewController part. We have to set up the imageView such that it takes the entire app screen. Let us also add a small piece of code to correct the aspect ratio of the image.

    Screen Shot 2016-07-23 at 12.55.33 pm

  11. Now add our imageView as a sub view.

    Screen Shot 2016-07-23 at 1.04.24 pm
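Steps 9–11 together amount to something like the sketch below (imageView is assumed to be declared as an instance variable of the ViewController; the names are my own):

```objc
- (void)viewDidLoad
{
    [super viewDidLoad];

    // Step 10: make the imageView fill the entire app screen, and let
    // UIViewContentModeScaleAspectFit preserve the image's aspect ratio.
    imageView = [[UIImageView alloc] initWithFrame:self.view.bounds];
    imageView.contentMode = UIViewContentModeScaleAspectFit;

    // Step 11: add our imageView as a subview of the main view.
    [self.view addSubview:imageView];
}
```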

  12. Before moving to next part, copy an image of your choice to “<workspace>/<project_folder>/” location and go to project navigator area in Xcode and select ‘Supporting Files‘ and click on ‘Add Files to “<your_project_name>”… ‘. Navigate to <project_folder> location and select the image and click on ‘Add‘ button. You can find the image under the ‘Supporting Files‘ folder.

    Screen Shot 2016-07-23 at 1.07.06 pm

  13. Now, let us write code to read the image and display it on the screen. If the image file is not present, let us display some error message.

    Screen Shot 2016-07-23 at 1.20.16 pm
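A minimal sketch of this step (the file name test_image.jpg is my placeholder for whatever image you added in step 12):

```objc
// Step 13: read the bundled image and display it, or log an error if missing.
UIImage *image = [UIImage imageNamed:@"test_image.jpg"];
if (image != nil) {
    imageView.image = image;
} else {
    NSLog(@"Error: could not load the image file.");
}
```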

  14. Let us run the application and see the results. I am using the iPhone 6s Plus simulator to check my results. Voila! It worked 😀 (I am attaching a screenshot of the simulator). If nothing shows on the screen, you can check the messages in the Debug area of Xcode.

    Simulator Screen Shot 23-Jul-2016 1.25.01 pm

  15. But wait, we didn’t use any OpenCV functionality till now! Before applying any OpenCV functions, we have to convert the image from UIImage to OpenCV’s Mat datatype. And for displaying the result on the screen, we have to convert the Mat back to a UIImage.
    1. Convert the image from RGB to Grayscale:


    2. Apply Gaussian Blur to the above Grayscale image: (you can observe the blurred version of the above result here)


    3. Apply Canny Edge detection on the above result:

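OpenCV ships helper functions UIImageToMat and MatToUIImage (declared in opencv2/highgui/ios.h in the 2.4 series) for exactly this conversion. Here is a sketch of the whole pipeline from step 15, with my own example values for the blur kernel and the Canny thresholds:

```objc
#import <opencv2/highgui/ios.h>   // UIImageToMat / MatToUIImage helpers

// UIImage -> cv::Mat, then (1) RGB(A) -> grayscale
cv::Mat cvImage, gray;
UIImageToMat(image, cvImage);
cv::cvtColor(cvImage, gray, CV_RGBA2GRAY);

// (2) Gaussian blur to suppress noise before edge detection
cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.2);

// (3) Canny edge detection
cv::Mat edges;
cv::Canny(gray, edges, 50, 120);

// cv::Mat -> UIImage so the result can be shown on screen
imageView.image = MatToUIImage(edges);
```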


With this, we are coming to the end of the very first tutorial of ‘OpenCV in iOS’. The SOURCE CODE for this tutorial is available at the following GITHUB LINK. I hope you enjoyed this tutorial. 🙂

Where to go next?
After this tutorial, you can check my next blog post about how to use the camera inside your app using OpenCV’s CvVideoCameraDelegate at OpenCV in iOS – The Camera.