Blog

Computer Vision in iOS – Object Recognition

Problem Statement: Given an image, can a machine accurately predict what is there in that image?

Why is this so hard? If I show an image to a human and ask what is in it, (s)he can tell exactly which objects are present, where the picture was taken, what makes it special, and (if people appear in it) what they are doing and what they are about to do. For a computer, a picture is nothing but a bunch of numbers, so it cannot understand its semantics the way a human does. If the question – why is it so hard? – is still ringing in your head, then let me ask you to write an algorithm to detect (just) a cat!

Let us start with some basic assumptions – every cat has two ears, an oval face with whiskers on it, a cylindrical body, four legs and a curvy tail! Perfect 🙂 We have our initial assumptions to start writing code! Assume we have written the code (say, 50 lines of if-else statements) to find primitives in an image which, when combined, form a cat that looks roughly like the figure below (PS: Don’t laugh 😛 )

Screen Shot 2017-06-08 at 8.00.50 PM

OK, let us test the performance on some real-world images. Can our algorithm accurately detect the cat in this picture?

tabby-cat-names

If you think the answer is yes, I would suggest you think again. If you carefully observe the cat image made of primitive shapes, we have actually coded to find a cat that is turning towards its left. OK! No worries! Write the exact same if-else conditions for a cat turning towards its right 😎 . Just an extra 50 lines of conditions. Good! Now we have the cat detector! Can we detect the cat in this image? 😛

maxresdefault

Well, the answer is no 😦 . So, to tackle these types of problems we move from basic conditionals to Machine Learning/Deep Learning. Machine Learning is a field where machines learn to perform specific tasks that previously only humans were capable of doing. Deep Learning is a subset of Machine Learning in which we train very deep neural network architectures. A lot of researchers have already worked on this problem, and there are some popular neural network architectures that do this specific task.

The real problem lies in porting such a network to a mobile device and making it run in real time. This is not an easy task: convolutions in a CNN are a costly step, and then there is the sheer size of the neural network (forget about it 😛 ). Companies like Google and Apple, and a few research labs, have put heavy focus on optimizing the size and performance of neural networks, and at last we have some decent results, with neural networks running at decent speed on mobile phones. Still, there is a lot of amazing research to be done in this field. After Apple’s WWDC ’17 keynote, building an app that solves this particular problem has turned from a one-year effort into a single-night effort. Enough theory and facts, let us dive into the code!

To follow this blog from here, you need to have the following things ready:

  1. macOS 10.13 (a.k.a. macOS High Sierra)
  2. Xcode 9
  3. iOS 11 on your iPhone/iPad.
  4. Download pre-trained Inception-v3 model from Apple’s developer website – https://developer.apple.com/machine-learning/
  5. (Optional) Follow my previous blog to setup camera in your app – Computer Vision in iOS – Core Camera

Once you have satisfied all the above requirements, let us move to adding Machine Learning model into our app.

  • First of all, create a new Xcode ‘Single View App’ project, select the language as ‘Swift’, set your project name and wait for Xcode to create the project. Go to the Build Settings of the app and change Swift Compiler – Language – Swift Language Version from Swift 4 to Swift 3.2.

Screen Shot 2017-06-12 at 10.54.28 AM

  • In this particular project, I am moving from my traditional CameraBuffer pipeline to a newer one, so that object detection runs constantly at 30 FPS asynchronously. We use this approach to make sure the user won’t feel any lag in the system (hence, a better user experience!). First add a new Swift file named ‘PreviewView.swift’ and add the following code to it.
import UIKit
import AVFoundation

class PreviewView: UIView {
    var videoPreviewLayer: AVCaptureVideoPreviewLayer {
        return layer as! AVCaptureVideoPreviewLayer
    }

    var session: AVCaptureSession? {
        get {
            return videoPreviewLayer.session
        }
        set {
            videoPreviewLayer.session = newValue
        }
    }

    override class var layerClass: AnyClass {
        return AVCaptureVideoPreviewLayer.self
    }
}
  • Now let us add camera functionality to our app. If you followed my previous blog under the optional prerequisite, most of the content here will look pretty obvious and easy. First go to Main.storyboard and add a ‘View’ as a child object to the existing View.

Screen Shot 2017-06-15 at 7.28.33 AM

  • After dragging and dropping it into the existing View, go to ‘Show the Identity Inspector’ in the right-side inspector of Xcode and, under ‘Custom Class’, change the class from UIView to ‘PreviewView’. If you recall, PreviewView is nothing but the new Swift file we added in a previous step, in which we subclass UIView.

Screen Shot 2017-06-15 at 7.44.09 AM

  • Make the View full screen, set its content mode to ‘Aspect Fill’, and add a Label under it as a child to show the predicted classes. Add IBOutlets for both the View and the Label in the ViewController.swift file.
  • Your current ViewController.swift file should look like this –
import UIKit

class ViewController: UIViewController {

    @IBOutlet weak var previewView: PreviewView!
    @IBOutlet weak var predictLabel: UILabel!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

}
  • Let us initialise some parameters for the session. The session should use frames from the camera, start running when the view appears, and stop running when the view disappears. We also need to make sure that we have permission to use the camera and, if permission has not been given, ask for it before the session starts. Hence, we should make the following changes to our code!
import UIKit
import AVFoundation

class ViewController: UIViewController {

    @IBOutlet weak var previewView: PreviewView!
    @IBOutlet weak var predictLabel: UILabel!

    // Session - Initialization
    private let session = AVCaptureSession()
    private var isSessionRunning = false
    private let sessionQueue = DispatchQueue(label: "session queue", attributes: [], target: nil)
    private var permissionGranted = false

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.

        // Set some features for PreviewView
        previewView.videoPreviewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill
        previewView.session = session

        // Check for permissions
        checkPermission()

        // Configure Session in session queue
        sessionQueue.async { [unowned self] in
            self.configureSession()
        }
    }

    // Check for camera permissions
    private func checkPermission() {
        switch AVCaptureDevice.authorizationStatus(forMediaType: AVMediaTypeVideo) {
        case .authorized:
            permissionGranted = true
        case .notDetermined:
            requestPermission()
        default:
            permissionGranted = false
        }
    }

    // Request permission if not given
    private func requestPermission() {
        sessionQueue.suspend()
        AVCaptureDevice.requestAccess(forMediaType: AVMediaTypeVideo) { [unowned self] granted in
            self.permissionGranted = granted
            self.sessionQueue.resume()
        }
    }

    // Start session
    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        sessionQueue.async {
            self.session.startRunning()
            self.isSessionRunning = self.session.isRunning
        }
    }

    // Stop session
    override func viewWillDisappear(_ animated: Bool) {
        sessionQueue.async { [unowned self] in
            if self.permissionGranted {
                self.session.stopRunning()
                self.isSessionRunning = self.session.isRunning
            }
        }
        super.viewWillDisappear(animated)
    }

    // Configure session properties
    private func configureSession() {
        guard permissionGranted else { return }

        session.beginConfiguration()
        session.sessionPreset = AVCaptureSessionPreset1280x720

        guard let captureDevice = AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: .back) else { return }
        guard let captureDeviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return }
        guard session.canAddInput(captureDeviceInput) else { return }
        session.addInput(captureDeviceInput)

        let videoOutput = AVCaptureVideoDataOutput()

        videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sample buffer"))
        videoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA]
        videoOutput.alwaysDiscardsLateVideoFrames = true
        guard session.canAddOutput(videoOutput) else { return }
        session.addOutput(videoOutput)

        session.commitConfiguration()
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

}

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

    }
}
  • Don’t forget to add ‘Privacy – Camera Usage Description’ to Info.plist (an example entry is shown right after this list) and run the app on your device. The app should show camera frames on screen with just ~5% CPU usage 😉 Not bad! Now, let us add the Inception v3 model to our app.
  • If you didn’t download the Inception v3 model yet, download it from the link provided above. After this step, you should have a file named ‘Inceptionv3.mlmodel’.
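  • For reference, the ‘Privacy – Camera Usage Description’ entry mentioned above corresponds to the NSCameraUsageDescription key; viewed as source code, a minimal Info.plist entry (the description string is just a placeholder you should adapt) looks like this:
<key>NSCameraUsageDescription</key>
<string>This app uses the camera to run live object recognition.</string>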

Screen Shot 2017-06-12 at 11.01.47 AM

  • Drag and drop the ‘Inceptionv3.mlmodel’ file into your Xcode project. After importing the model, click on it; this is how the ‘*.mlmodel’ file looks in Xcode.

Screen Shot 2017-06-12 at 11.05.28 AM

  • What information does the ‘*.mlmodel’ file convey? At the top of the file you can see some metadata such as its name, size, author and license information, and a description of the network. Then come the ‘Model Evaluation Parameters’, which describe what input the model expects and what its output looks like. Now let us set up our ViewController.swift file to send images into the model for predictions.
  • Apple has made Machine Learning very easy through its Core ML framework. All we have to do is ‘import CoreML’ and initialise a model variable from the class Xcode generates for the ‘*.mlmodel’ file.
import UIKit
import AVFoundation
import CoreML

class ViewController: UIViewController {

    @IBOutlet weak var previewView: PreviewView!
    @IBOutlet weak var predictLabel: UILabel!

    // Session - Initialization
    private let session = AVCaptureSession()
    private var isSessionRunning = false
    private let sessionQueue = DispatchQueue(label: "session queue", attributes: [], target: nil)
    private var permissionGranted = false

    // Model
    let model = Inceptionv3()

    override func viewDidLoad() { //...
  • The fun part begins now 🙂 . If we consider every Machine Learning/Deep Learning model as a black box (i.e., we don’t know what is happening inside), then all we should care about is whether, given certain inputs to the black box, we get the desired outputs (PC: Wikipedia). But we can’t feed just any type of input to the model and expect the desired output. If the model was trained on 1D signals, the input should be shaped into 1D before being sent into the model. If the model was trained on 2D inputs (e.g., CNNs), the input should be a 2D signal. The dimensions and size of the input should match the model’s input parameters.

Blackbox3D-withGraphs

  • The Inception v3 model takes as input a 3-channel RGB image of size 299x299x3. So, we should resize our image before passing it into the model. Add the following code at the end of the ViewController.swift file to resize the image to the desired dimensions 😉 (a quick usage example follows the extension).
extension UIImage {
    func resize(_ size: CGSize)-> UIImage? {
        UIGraphicsBeginImageContext(size)
        draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        let image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return image
    }
}
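  • As a quick sanity check, the extension can be exercised like this (a minimal sketch: ‘someImage’ is a placeholder for any UIImage you already have, and 299x299 matches Inception v3’s expected input):
let inputSize = CGSize(width: 299, height: 299)
let resized = someImage.resize(inputSize) // optional UIImage of size 299x299, ready for CVPixelBuffer conversion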
  • In order to pass the image into the Core ML model, we need to convert it from UIImage to CVPixelBuffer. To do that, I am adding some Objective-C code and linking it to the Swift code using a bridging header. If you have no clue about bridging headers or combining Objective-C with Swift code, I would suggest checking out this blog – Computer Vision in iOS – Swift+OpenCV
  • ImageConverter.h
#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>

@interface ImageConverter : NSObject

+ (CVPixelBufferRef) pixelBufferFromImage: (CGImageRef) image;

@end
  • ImageConverter.m
#import "ImageConverter.h"

@implementation ImageConverter
+ (CVPixelBufferRef)pixelBufferFromImage:(CGImageRef)image {

    CGSize frameSize = CGSizeMake(CGImageGetWidth(image), CGImageGetHeight(image));
    CVPixelBufferRef pixelBuffer = NULL;
    CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault, frameSize.width, frameSize.height, kCVPixelFormatType_32BGRA, nil, &pixelBuffer);
    if (status != kCVReturnSuccess) {
        return NULL;
    }

    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    void *data = CVPixelBufferGetBaseAddress(pixelBuffer);
    CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(data, frameSize.width, frameSize.height, 8, CVPixelBufferGetBytesPerRow(pixelBuffer), rgbColorSpace, (CGBitmapInfo) kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst);
    CGContextDrawImage(context, CGRectMake(0, 0, CGImageGetWidth(image), CGImageGetHeight(image)), image);

    CGColorSpaceRelease(rgbColorSpace);
    CGContextRelease(context);
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

    return pixelBuffer;
}
@end
  • iOS-CoreML-Inceptionv3-Bridging-Header.h
//
//  Use this file to import your target's public headers that you would like to expose to Swift.
//

#import "ImageConverter.h"
  • Now make some final changes to the ViewController.swift file. In this step, we resize the input image, convert it into a CVPixelBuffer and pass it to the Core ML model to predict the results (note that the snippet references a modelInputSize constant, which is defined right after the code).
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let ciImage = CIImage(cvImageBuffer: imageBuffer)
        guard let uiImage = UIImage(ciImage: ciImage).resize(modelInputSize),
            let cgImage = uiImage.cgImage,
            let pixelBuffer = ImageConverter.pixelBuffer(from: cgImage)?.takeRetainedValue(),
            let output = try? model.prediction(image: pixelBuffer) else {
                return
        }
        DispatchQueue.main.async {
            self.predictLabel.text = output.classLabel
        }
    }
}
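  • Note that the snippet above references a modelInputSize constant that never appears earlier in this post. A minimal assumption, added as a property of ViewController next to the model, would be:
    // Assumed constant: Inception v3 expects a 299x299 RGB input.
    let modelInputSize = CGSize(width: 299, height: 299)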
  • Here are some results of the app running on an iPhone 7.
  • The results look convincing, but I should not judge the accuracy, since I did not train the network myself. What I care about is the performance of the app on the mobile phone! With the current implementation of the pipeline, while profiling the application, the CPU usage of the app stays below 30%. Thanks to Core ML, the whole deep learning computation is offloaded to the GPU; the only tasks of the CPU are some basic image processing, passing the image to the GPU, and fetching the predictions back. There is still a lot of scope to improve the coding style of the app, and I welcome any suggestions/advice from you. 🙂

    Source code:

    If you like this blog and want to play with the app, the code for this app is available here – iOS-CoreML-Inceptionv3

Computer Vision in iOS – Swift+OpenCV

Hello all, I realised that it has been quite a while since I posted my last blog – Computer Vision in iOS – Core Camera. In that blog, I discussed how we can set up the camera in our app without using OpenCV. Since the app is written in Swift 3, it is very easy for budding iOS developers to understand what is going on in the code. I thought of going a step further and designing some basic image processing algorithms from scratch. After designing a few, I realised that it is quite hard for me to explain even a simple RGB to grayscale conversion without scaring the readers. So, I thought of taking a few steps back and integrating OpenCV into the Swift version of our computer vision app, in the hope that it helps readers rapidly prototype proofs of concept. Many people have already discussed how to integrate OpenCV into Swift-based apps; the main purpose of this blog post is to introduce you to the data structure of an image and to explain why we implement certain things the way we do.

Before starting this one, it is advised that you read this blog on setting up Core Camera using Swift.

  • Start by creating a new Xcode Project, select Single View Application. Name your project and organisation, set language as Swift.
  • To remove some constraints related to UI/UX: since most real-time vision apps lock themselves to either Portrait or Landscape Left/Right orientation throughout their usage, go to General -> Deployment Info and uncheck all unnecessary orientations for the app.

Screen Shot 2017-06-04 at 1.25.13 PM

  • Go to Main.storyboard and add the Image View to your app by drag-and-drop from the following menu to the storyboard.

Screen Shot 2017-06-04 at 1.29.39 PM

  • Go to “Show the Size Inspector” on the top-right corner and make the following changes.

Screen Shot 2017-06-04 at 1.35.40 PM

  • Now add some constraints to the Image View.

Screen Shot 2017-06-04 at 1.37.37 PM

  • After the above settings, you can observe that the Image View fills the whole screen on the app. Now go to ‘Show the attributes inspector’ on the top right corner and change ‘Content Mode’ from Scale To Fill to ‘Aspect Fill’.

Screen Shot 2017-06-04 at 1.40.18 PM

  • Now add an IBOutlet to the ImageView in the ViewController.swift file. Also add a new Swift file named ‘CameraBuffer.swift’ and copy the code shown in the previous blog, and change your ViewController.swift file as shown there as well. Now if you run your app, you can see a portrait-mode camera app running at ~30 FPS. (Note: don’t forget to add the camera usage permission to Info.plist.)
  • Let us dive into adding OpenCV to our app. First, add the OpenCV framework to the project; if you have been following my blogs from the start, this should be easy for you.
  • Let us get into some theoretical discussion. (Disclaimer: it is totally fine to skip this bullet point if you only want the app working.) What is an image? From the signals-and-systems perspective, an image is a 2D discrete signal where each pixel holds a value between 0-255 representing a specific gray level (0 represents black and 255 corresponds to white). To understand this better, refer to the picture shown below (PC: Link). Now you might wonder what adds color to the image if each pixel stores only a gray value. If you look at any documentation online, you can see that a color image is usually referred to as an RGB or RGBA image. The R, G, B in an RGB image refer to the Red, Green and Blue channels of the image, where each channel is a 2D grayscale signal with values between 0-255. The A channel in an RGBA image represents the alpha channel, i.e., the opacity of each pixel. In OpenCV, an image is generally represented as a Matrix in BGR or BGRA format. In our code, we get access to every frame captured by the camera in UIImage format. Hence, to do any image processing on these frames we have to convert them from UIImage to cv::Mat, run whatever processing is required, and convert the result back to UIImage to view it on the screen.

lincoln_pixel_values


  • Add a new file -> ‘Cocoa Touch Class’, name it ‘OpenCVWrapper’ and set the language to Objective-C. Click Next and select Create. When prompted to create a bridging header, click the ‘Create Bridging Header’ button. Now you can observe that three files are created: OpenCVWrapper.h, OpenCVWrapper.m, and -Bridging-Header.h. Open ‘-Bridging-Header.h’ and add the following line: #import "OpenCVWrapper.h"
  • Go to ‘OpenCVWrapper.h’ file and add the following lines of code. In this tutorial, let us do the simple RGB to Grayscale conversion.
#import <Foundation/Foundation.h>
#import <UIKit/UIKit.h>

@interface OpenCVWrapper : NSObject

- (UIImage *) makeGray: (UIImage *) image;

@end

  • Rename OpenCVWrapper.m to “OpenCVWrapper.mm” for C++ support and add the following code.
#import "OpenCVWrapper.h"

// import necessary headers
#import <opencv2/core.hpp>
#import <opencv2/imgcodecs/ios.h>
#import <opencv2/imgproc/imgproc.hpp>

using namespace cv;

@implementation OpenCVWrapper

- (UIImage *) makeGray: (UIImage *) image {
    // Convert UIImage to cv::Mat
    Mat inputImage; UIImageToMat(image, inputImage);
    // If input image has only one channel, then return image.
    if (inputImage.channels() == 1) return image;
    // Convert the default OpenCV's BGR format to GrayScale.
    Mat gray; cvtColor(inputImage, gray, CV_BGR2GRAY);
    // Convert the GrayScale OpenCV Mat to UIImage and return it.
    return MatToUIImage(gray);
}

@end

  • Now make some final changes to ViewController.swift to see the grayscale image on screen.
import UIKit

class ViewController: UIViewController, CameraBufferDelegate {

    var cameraBuffer: CameraBuffer!
    let opencvWrapper = OpenCVWrapper();
    @IBOutlet weak var imageView: UIImageView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
        cameraBuffer = CameraBuffer()
        cameraBuffer.delegate = self
    }

    func captured(image: UIImage) {
        imageView.image = opencvWrapper.makeGray(image)
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

}
  • Here are the final screenshots of the working app. Hope you enjoyed this blog post. 🙂

Computer Vision in iOS – Core Camera

Computer Vision on mobile is fun! Here are a few reasons why I personally love computer vision on mobile compared to traditional desktop-based systems.

  1. You do not have to buy a web camera or a high-resolution camera that has to be connected to a computer through a USB cable.
  2. Since you generally connect a webcam through a USB cable, the application you are designing can only be tested inside a circle whose radius equals the length of the cable 😛 .
  3. If you want your system to be portable, you might have to buy a Raspberry Pi or Arduino and connect your webcam to it to do some processing on the frames it fetches. (My roommates & besties during my bachelor’s did some extensive coding on microprocessors and microcontrollers, and I know exactly how hard it is.)
  4. If I want to skip the above step and still make my system portable, I literally have to carry the CPU with me 😛

While discussing the disadvantages of running CV algorithms on traditional desktop systems, you may already be inferring the advantages of mobile-based pipelines. A mobile phone is easily portable, it is fully equipped with a CPU, a GPU and various DSP modules that can be utilised depending on the application, and it has a high-resolution camera 😉 The only disadvantage of current mobile computer vision is that you can’t directly take an algorithm that runs almost in real time on a computer, drop it onto a mobile, and expect the same results. Optimisation plays a key role in mobile computer vision. Mobile battery is limited, hence the energy usage of your algorithm matters! If you are designing a heavy CV-based system, you can’t schedule all the operations on the CPU; you might need to come up with new strategies that reduce CPU usage!

Halting the discussion I started for no specific reason 🙄 , let us get into the topic this blog is actually dedicated to 😀 .

In this blog, I will design an application in Swift and initialise the camera without using OpenCV. The main idea is inspired by the following article by Boris Ohayon. Here I build on his idea and customise it for the applications that I will design in the future. If at any point you are clueless about the camera pipeline, you can read that article (link provided above) and follow along with this tutorial.

  • Without wasting any more time create a new ‘Single View Application’ with your desired product name and set the language as ‘Swift’.
  • Add an Image View in the Main.storyboard and reference it in ViewController.swift.
  • Create a new file named CameraBuffer.swift and add the following code
import UIKit
import AVFoundation

protocol CameraBufferDelegate: class {
    func captured(image: UIImage)
}

class CameraBuffer: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Initialise some variables
    private var permissionGranted = false
    private let sessionQueue = DispatchQueue(label: "session queue")

    private var position = AVCaptureDevicePosition.back
    private let quality = AVCaptureSessionPreset640x480
    private let captureSession = AVCaptureSession()
    private let context = CIContext()

    weak var delegate: CameraBufferDelegate?

    override init() {
        super.init()
        checkPermission()
        sessionQueue.async { [unowned self] in
            self.configureSession()
            self.captureSession.startRunning()
        }
    }

    private func checkPermission() {
        switch AVCaptureDevice.authorizationStatus(forMediaType: AVMediaTypeVideo) {
        case .authorized:
            permissionGranted = true
        case .notDetermined:
            requestPermission()
        default:
            permissionGranted = false
        }
    }

    private func requestPermission() {
        sessionQueue.suspend()
        AVCaptureDevice.requestAccess(forMediaType: AVMediaTypeVideo) { [unowned self] granted in
            self.permissionGranted = granted
            self.sessionQueue.resume()
        }
    }

    private func configureSession() {
        guard permissionGranted else { return }
        captureSession.sessionPreset = quality
        guard let captureDevice = selectCaptureDevice() else { return }
        guard let captureDeviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return }
        guard captureSession.canAddInput(captureDeviceInput) else { return }
        captureSession.addInput(captureDeviceInput)

        do {
            var finalFormat = AVCaptureDeviceFormat()
            var maxFps: Double = 0
            let maxFpsDesired: Double = 0 //Set it at own risk of CPU Usage
            for vFormat in captureDevice.formats {
                var ranges      = (vFormat as AnyObject).videoSupportedFrameRateRanges as!  [AVFrameRateRange]
                let frameRates  = ranges[0]
                
                if frameRates.maxFrameRate >= maxFps && frameRates.maxFrameRate <= maxFpsDesired {
                    maxFps = frameRates.maxFrameRate
                    finalFormat = vFormat as! AVCaptureDeviceFormat
                }
            }
            if maxFps != 0 {
                let timeValue = Int64(1200.0 / maxFps)
                let timeScale: Int32 = 1200
                try captureDevice.lockForConfiguration()
                captureDevice.activeFormat = finalFormat
                captureDevice.activeVideoMinFrameDuration = CMTimeMake(timeValue, timeScale)
                captureDevice.activeVideoMaxFrameDuration = CMTimeMake(timeValue, timeScale)
                captureDevice.focusMode = AVCaptureFocusMode.autoFocus
                captureDevice.unlockForConfiguration()
            }
            print(maxFps)
        }
        catch {
            print("Something was wrong")
        }
        
        let videoOutput = AVCaptureVideoDataOutput()
        videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sample buffer"))
        guard captureSession.canAddOutput(videoOutput) else { return }
        captureSession.addOutput(videoOutput)
        guard let connection = videoOutput.connection(withMediaType: AVFoundation.AVMediaTypeVideo) else { return }
        guard connection.isVideoOrientationSupported else { return }
        guard connection.isVideoMirroringSupported else { return }
        connection.videoOrientation = .portrait
        connection.isVideoMirrored = position == .front
    }
    
    private func selectCaptureDevice() -> AVCaptureDevice? {
        return AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: position)
    }
    
    private func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
        let ciImage = CIImage(cvPixelBuffer: imageBuffer)
        guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
        return UIImage(cgImage: cgImage)
    }
    
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        guard let uiImage = imageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
        DispatchQueue.main.async { [unowned self] in
            self.delegate?.captured(image: uiImage)
        }
    }
}
  • And the ViewController.swift file should look like this:
import UIKit

class ViewController: UIViewController, CameraBufferDelegate {

    var cameraBuffer: CameraBuffer!

    @IBOutlet weak var imageView: UIImageView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
        cameraBuffer = CameraBuffer()
        cameraBuffer.delegate = self
    }

    func captured(image: UIImage) {
        imageView.image = image
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

}
  • After this, you can build and run the app on your mobile and see that it works like a charm.
  • So what new things did I implement in the above code? I converted the code to Swift 3.0, added a block through which you can raise the FPS from 30 to as high as 240, and ran rigorous tests to make sure the camera pipeline never goes beyond 10% CPU usage on the iPhone for any realistic application.
  • If your application needs a higher FPS, you can set it by changing the variable ‘maxFpsDesired’. But change it only if you need an FPS greater than 30. By default, the FPS will fluctuate between 24-30; if you force the FPS to a fixed number, it won’t be exactly equal to the number you set, and the CPU usage also increases drastically. But if the application you want to try doesn’t have any other costly computations, you can play with higher FPS.
  • How do you count the FPS of your app? You can go fancy and code an FPS counter into the app (a minimal sketch follows this list), but I would suggest running the app in profiling mode and choosing ‘Core Animation’ in Instruments to check the FPS of your app 😉
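  • If you do want a simple in-app counter, here is a minimal sketch (my own addition, not part of the CameraBuffer code above): count the frames delivered to the delegate and print the rate roughly once per second, for example by calling tick() at the top of captured(image:).
import Foundation
import QuartzCore

class FPSCounter {
    private var frameCount = 0
    private var windowStart = CACurrentMediaTime()

    // Call once per delivered frame.
    func tick() {
        frameCount += 1
        let elapsed = CACurrentMediaTime() - windowStart
        if elapsed >= 1.0 {
            print(String(format: "FPS: %.1f", Double(frameCount) / elapsed))
            frameCount = 0
            windowStart = CACurrentMediaTime()
        }
    }
}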

TensorFlow Diaries- Intro and Setup

Why has a blog that focuses on Computer Vision suddenly shifted to Machine Learning? This question might be ringing in the heads of many active readers of my blog, so let me start by answering it 😉

It is quite funny to note that Computer Vision actually started as a summer project at MIT (1966) under Prof. Marvin Minsky. He laid out some tasks to be completed as part of that summer project, and those tasks are essentially what the field has been working on for the past 40-50 years. In Computer Vision, we are trying to teach computers how to see and perceive the environment like a human does. This includes recognising objects, understanding complex scenes, locomotion in an environment while avoiding obstacles, 3D reconstruction, etc., and a lot of these require Machine Learning algorithms along with 3D geometry and applied mathematics. So, Computer Vision actually requires knowledge of Machine Learning! If you look at the top conference publications in Computer Vision, such as CVPR and ICCV, you can see a lot of papers using Machine Learning and Deep Learning to produce state-of-the-art results. The rest is history! Assuming you all know what Deep Learning is, let us dive into the intro to TensorFlow and how to set up your machine.

Why TensorFlow?

You might be wondering why I chose this specific library over other existing libraries such as Keras, Theano, Torch etc. Though TensorFlow was released as open-source software only recently, it has been adopted by the community in huge numbers. Many people are using it in their development environments/pipelines to design and train custom networks for solving complex real-world problems, researchers have started using it because of its flexibility to design and test new types of networks, some universities have already included TensorFlow in their Machine Learning course curricula, and so on. The reason why I chose TensorFlow is the following: once I design a network in Python+TensorFlow, I can easily export it to any platform or production-level software (in my case, Android and iOS apps) and test it in real time.

But before doing any of those fancy tasks, the first thing is to set up your development environment for TensorFlow. I have carefully tested every possible scenario and am documenting the best and easiest way to set up the environment, keeping in view that you will be playing with state-of-the-art networks.

In order to proceed further, you should have a Linux-based system running Ubuntu 16.04. Sorry, Windows and Mac users: you can also follow the steps given below, but I can’t guarantee solutions to any problems that might arise during installation. 🙂

  • Install Anaconda: Download the Anaconda installer for Python 3.6 (64-bit) from this link and follow the instructions on their webpage to install it on your OS. After the whole process is complete, you might have to restart your system for the installation to take effect. Open a terminal, type ‘python’ and check the output (the exact version and platform strings will differ on your machine):
$ python
Python 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
  • The main advantage of Anaconda is that it comes with a list of packages at installation time. We can also create a separate working environment for every project we work on, so that we avoid package version conflicts. So let us start creating the environment for our Deep Learning projects. Open a terminal and create an environment named ‘dl’ (or any name of your choice). Note: to use TensorFlow you need Python 3.5. The nice thing about Anaconda is that you can specify the Python version for an environment while creating it (even if your Anaconda installation itself is Python 3.6).
$ conda create -n dl python=3.5
  • Enter environment and download the necessary packages:
$ source activate dl
(dl)$ conda install pandas matplotlib jupyter notebook scipy scikit-learn
  • If you observe, I didn’t include ‘numpy’ in the list of packages installed in the environment. This is because TensorFlow installs NumPy as part of its own installation. TensorFlow comes in two flavours: you can install it with CPU support or with GPU support. Enter the command below in the terminal to install the CPU version of TensorFlow:
(dl)$ pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl
  • If you want to install the GPU version of TensorFlow, please make sure that you have CUDA 8.0 and cuDNN 5.1 set up on your machine along with their environment variables. Then run the following command instead of the one above:
(dl)$ pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
  • With this, your system is ready for Machine Learning/Deep Learning development. For Computer Vision + Machine Learning, you might also need the OpenCV package for Python. Follow these instructions in case you want to develop computer vision applications using OpenCV and Python.
(dl)$ conda install -c https://conda.binstar.org/menpo opencv3
  • For people who love playing with depth sensors and want to do real-time deep learning, stay tuned for my next blog post on librealsense and Intel RealSense cameras. For starters, if you have already installed librealsense from source, you can use it from Python via the pyrealsense package. You can easily install it in your Anaconda environment using the following command.
(dl)$ pip install pyrealsense
  • Finally, you can exit your environment using ‘source deactivate’ and enter it again using ‘source activate dl’.
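  • To confirm that the installation works, you can run a quick check from inside the ‘dl’ environment (a minimal sketch; the printed version should be 1.1.0 for the wheels installed above):
import tensorflow as tf

print(tf.__version__)  # expect 1.1.0 for the wheels above

# Build and run a tiny graph to make sure the runtime itself works.
hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))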

In this blog, you have created your working environment for Machine Learning using TensorFlow. In my next series of blogs, I will introduce some basic networks used in Computer Vision so that you can learn how to code networks in TensorFlow while applying Deep Learning to Computer Vision.

OpenCV in Android – Native Development (C++)

Hello World!

In my previous blogs, I introduced you to how to set up OpenCV for building Computer Vision applications on the Android platform. In this post, let us do some C++ coding to develop our own custom filters to apply to the images we capture. In case you are visiting this blog for the first time and want to learn how to set up the development pipeline for Android, you can visit my previous blogs (links provided below):

  1. OpenCV in Android – An Introduction (Part 1/2)
  2. OpenCV in Android – An Introduction (Part 2/2)

If you want to skip those blogs and dive straight into developing computer vision applications, you can download the source code for this blog from the following link: https://github.com/r4ghu/OpenCVAndroid-AnIntroduction.

Get ready to code!

  • Let us first warm up by implementing a very simple edge-detection filter.
  • Create a new empty activity named ‘EdgeDetection’ and add most of the code from the OpenCVCamera.java file into it. At the end, the EdgeDetection.java file should look like this:
package com.example.sriraghu95.opencvandroid_anintroduction;

import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;
import android.view.SurfaceView;
import android.view.WindowManager;

import org.opencv.android.BaseLoaderCallback;
import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.LoaderCallbackInterface;
import org.opencv.android.OpenCVLoader;
import org.opencv.core.Mat;

public class EdgeDetection extends AppCompatActivity implements CameraBridgeViewBase.CvCameraViewListener2 {

    private static final String TAG = "EdgeDetection";
    private CameraBridgeViewBase cameraBridgeViewBase;

    private BaseLoaderCallback baseLoaderCallback = new BaseLoaderCallback(this) {
        @Override
        public void onManagerConnected(int status) {
            switch (status) {
                case LoaderCallbackInterface.SUCCESS:
                    cameraBridgeViewBase.enableView();
                    break;
                default:
                    super.onManagerConnected(status);
                    break;
            }
        }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        setContentView(R.layout.activity_edge_detection);
        cameraBridgeViewBase = (CameraBridgeViewBase) findViewById(R.id.camera_view);
        cameraBridgeViewBase.setVisibility(SurfaceView.VISIBLE);
        cameraBridgeViewBase.setCvCameraViewListener(this);
    }

    @Override
    public void onPause() {
        super.onPause();
        if (cameraBridgeViewBase != null)
            cameraBridgeViewBase.disableView();
    }

    @Override
    public void onResume(){
        super.onResume();
        if (!OpenCVLoader.initDebug()) {
            Log.d(TAG, "Internal OpenCV library not found. Using OpenCV Manager for initialization");
            OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_3_1_0, this, baseLoaderCallback);
        } else {
            Log.d(TAG, "OpenCV library found inside package. Using it!");
            baseLoaderCallback.onManagerConnected(LoaderCallbackInterface.SUCCESS);
        }
    }

    public void onDestroy() {
        super.onDestroy();
        if (cameraBridgeViewBase != null)
            cameraBridgeViewBase.disableView();
    }

    @Override
    public void onCameraViewStarted(int width, int height) {

    }

    @Override
    public void onCameraViewStopped() {

    }

    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        return inputFrame.gray();
    }
}
  • The only changes we made to the code above are that we replaced ‘return inputFrame.rgba()’ in the onCameraFrame method with ‘return inputFrame.gray()’ and pointed setContentView at the new layout. Now add the following code to the ‘activity_edge_detection.xml’ file to set the layout of the window in our app.
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    xmlns:opencv="http://schemas.android.com/apk/res-auto"
    android:id="@+id/activity_opencv_camera"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context="com.example.sriraghu95.opencvandroid_anintroduction.EdgeDetection">

    <org.opencv.android.JavaCameraView
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:visibility="gone"
        android:id="@+id/camera_view"
        opencv:show_fps="true"
        opencv:camera_id="any"/>

</RelativeLayout>
  • Add the following lines of code to ‘AndroidManifest.xml’ to specify the theme of our activity.
<activity android:name=".EdgeDetection"
    android:screenOrientation="landscape"
    android:theme="@style/Theme.AppCompat.Light.NoActionBar.FullScreen">
</activity>
  • Now just add a button to the MainActivity that opens the EdgeDetection activity (the same way the OpenCVCamera button was added in the previous post), run the app on your mobile, and test it. You should now see a grayscale image at ~30 FPS running on your mobile 🙂
    Screenshot_20170420-043738
  • But wait, we didn’t write any C++ so far! Let us write our first custom function in the ‘native-lib.cpp’ file. At a high level, this function should take an image, do some processing on it and return it to be shown on the screen. The general skeleton of C++ native code looks like this:
extern "C"
JNIEXPORT void JNICALL Java_com_example_sriraghu95_opencvandroid_1anintroduction_EdgeDetection_detectEdges(
        JNIEnv*, jobject /* this */,
        jlong gray) {
    cv::Mat& edges = *(cv::Mat *) gray;
    cv::Canny(edges, edges, 50, 250);
}
  • It starts with the linkage specification extern “C”, JNIEXPORT & JNICALL, the return data type (here, void), the method name, and the input data types (here, jlong). In this scenario, we pass the memory location of the image to avoid unnecessary duplication. We then apply cv::Canny to perform edge detection on the image. Feel free to browse through the hyperlinks and read more about them; explaining those concepts is beyond the scope of this blog and I might explain them in detail in future posts.
  • We need to add a few lines of code inside the onCameraFrame method of EdgeDetection.java to apply edge detection to every frame. Also, add a line below the onCameraFrame method declaring the native detectEdges method.
    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        Mat edges = inputFrame.gray();
        detectEdges(edges.getNativeObjAddr());
        return edges;
    }

    public native void detectEdges(long matGray);
  • Now, build the project and test it on your mobile! The results should look like this!

Screenshot_20170420-050722

  • With this, you have the basic setup for C++ development in Android using OpenCV. Let us go a step further and design a simple filter that produces a cartoon effect on the image. Without going into much detail, the C++ code for it looks like this.
extern "C"
JNIEXPORT void JNICALL Java_com_example_sriraghu95_opencvandroid_1anintroduction_EdgeDetection_cartoonify(
        JNIEnv*, jobject /* this */,
        jlong gray, jlong rgb) {
    const int MEDIAN_BLUR_FILTER_SIZE = 7;
    const int LAPLACIAN_FILTER_SIZE = 5;
    const int EDGES_THRESHOLD = 30;
    int repetitions = 5;
    int kSize = 9;
    double sigmaColor = 9;
    double sigmaSpace = 7;

    cv::Mat& edges = *(cv::Mat *) gray;
    cv::medianBlur(edges, edges, MEDIAN_BLUR_FILTER_SIZE);
    cv::Laplacian(edges, edges, CV_8U, LAPLACIAN_FILTER_SIZE);
    cv::Mat mask; cv::threshold(edges, mask, EDGES_THRESHOLD, 255, CV_THRESH_BINARY_INV);

    cv::Mat& src = *(cv::Mat *) rgb;
    cv::cvtColor(src,src,CV_RGBA2RGB);
    cv::Size size = src.size();
    cv::Size smallSize;
    smallSize.width = size.width/4;
    smallSize.height = size.height/4;
    cv::Mat smallImg = cv::Mat(smallSize, CV_8UC3);
    resize(src, smallImg, smallSize, 0, 0, CV_INTER_LINEAR);

    cv::Mat tmp = cv::Mat(smallSize, CV_8UC3);

    for(int i=0; i<repetitions;i++){
        bilateralFilter(smallImg, tmp, kSize, sigmaColor, sigmaSpace);
        bilateralFilter(tmp, smallImg, kSize, sigmaColor, sigmaSpace);
    }

    cv::Mat bigImg;
    resize(smallImg, bigImg, size, 0, 0, CV_INTER_LINEAR);
    cv::Mat dst; bigImg.copyTo(dst,mask);
    cv::medianBlur(dst, src, MEDIAN_BLUR_FILTER_SIZE-4);
}
  • After writing the above piece of code in native-lib.cpp file you can call it in your own custom class and see the results. Here is a screenshot of the above code’s result:

Screenshot_20170425-134718.png

  • The above filter is actually trying to create a cartoon effect of the captured image.

Things to ponder:

In this blog you have seen how to integrate custom C++ code into your application. But if you observe carefully, the simple cartoon filter consumes a lot of computation time, and the frame rate drops to ~1.2 FPS. Can you think of how to speed up the algorithm, or come up with a better one, to do the same task in real time? Think about it 😉

 

OpenCV in Android – An Introduction (Part 2/2)

In my previous post, I explained how to integrate OpenCV on Android. In this post, let us integrate the camera into our app so we can do some live testing later. If you are visiting this blog for the first time, I recommend you read OpenCV in Android – An Introduction (Part 1/2) before reading the current post. By the end of this blog you will have a basic app ready for testing any of your computer vision algorithms on images acquired from the camera!

  • In order to use the camera in our app, we need to declare permission for the app to access the camera on the device. Open ‘app/src/main/AndroidManifest.xml’ and add the following lines of code.
    <uses-permission android:name="android.permission.CAMERA" />

    <supports-screens android:resizeable="true"
        android:smallScreens="true"
        android:normalScreens="true"
        android:largeScreens="true"
        android:anyDensity="true" />

    <uses-feature
        android:name="android.hardware.camera"
        android:required="false" />
    <uses-feature
        android:name="android.hardware.camera.autofocus"
        android:required="false" />
    <uses-feature
        android:name="android.hardware.camera.front"
        android:required="false" />
    <uses-feature
        android:name="android.hardware.camera.front.autofocus"
        android:required="false" />
  • Let us add a button in our main activity to navigate to a new activity that uses a camera. Add the following code to ‘src/main/res/layout/activity_main.xml’
    <Button
        android:text="OpenCV Camera"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:id="@+id/cameraInit"/>
  • After adding the button, create an intent to a new activity named ‘OpenCVCamera’ in your  MainActivity class by adding the following code.
        // Button to call OpenCV Camera Activity
        Button cameraInit = (Button) findViewById(R.id.cameraInit);
        cameraInit.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                Intent i = new Intent(getApplicationContext(),OpenCVCamera.class);
                startActivity(i);
            }
        });
  • Now add a new Empty Activity via Right Click -> New -> Activity -> Empty Activity. Name the activity OpenCVCamera. Edit the layout of your new activity to add the camera view using the code below.
<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    xmlns:opencv="http://schemas.android.com/apk/res-auto"
    android:id="@+id/activity_opencv_camera"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingBottom="@dimen/activity_vertical_margin"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    tools:context="com.example.sriraghu95.opencvandroid_anintroduction.OpenCVCamera">

    <org.opencv.android.JavaCameraView
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:visibility="gone"
        android:id="@+id/camera_view"
        opencv:show_fps="true"
        opencv:camera_id="any"/>

</RelativeLayout>

  • Now add the following code into your OpenCVCamera.java file to see some action. After adding the following code try running the app on your device. I will explain the specifics in the later part of this blog.
package com.example.sriraghu95.opencvandroid_anintroduction;

import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.util.Log;
import android.view.SurfaceView;
import android.view.WindowManager;

import org.opencv.android.BaseLoaderCallback;
import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.LoaderCallbackInterface;
import org.opencv.android.OpenCVLoader;
import org.opencv.core.Mat;

public class OpenCVCamera extends AppCompatActivity implements CameraBridgeViewBase.CvCameraViewListener2 {

    private static final String TAG = "OpenCVCamera";
    private CameraBridgeViewBase cameraBridgeViewBase;

    private BaseLoaderCallback baseLoaderCallback = new BaseLoaderCallback(this) {
        @Override
        public void onManagerConnected(int status) {
            switch (status) {
                case LoaderCallbackInterface.SUCCESS:
                    cameraBridgeViewBase.enableView();
                    break;
                default:
                    super.onManagerConnected(status);
                    break;
            }
        }
    };

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        setContentView(R.layout.activity_opencv_camera);
        cameraBridgeViewBase = (CameraBridgeViewBase) findViewById(R.id.camera_view);
        cameraBridgeViewBase.setVisibility(SurfaceView.VISIBLE);
        cameraBridgeViewBase.setCvCameraViewListener(this);
    }

    @Override
    public void onResume(){
        super.onResume();
        if (!OpenCVLoader.initDebug()) {
            Log.d(TAG, "Internal OpenCV library not found. Using OpenCV Manager for initialization");
            OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_3_1_0, this, baseLoaderCallback);
        } else {
            Log.d(TAG, "OpenCV library found inside package. Using it!");
            baseLoaderCallback.onManagerConnected(LoaderCallbackInterface.SUCCESS);
        }
    }
    @Override
    public void onCameraViewStarted(int width, int height) {

    }

    @Override
    public void onCameraViewStopped() {

    }

    @Override
    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        return inputFrame.rgba();
    }
}
  • If everything works fine, your screen should look like the figure below. If your app shows a warning related to camera permissions, go to Settings and make sure that the camera permission for the app is enabled. 🙂
  • But what exactly is happening here? First, you imported some necessary Android and OpenCV classes for your app. To allow OpenCV to communicate with the Android camera functionality, we implemented CvCameraViewListener2. The variable ‘CameraBridgeViewBase cameraBridgeViewBase’ acts as a bridge between the camera and OpenCV. BaseLoaderCallback tells us whether OpenCV has been loaded into our app or not. We also need the helper methods onResume, onCameraViewStarted, onCameraViewStopped and onCameraFrame to handle the events of the app.
  • With this, you have the basic setup of your development environment for computer vision application development on Android. I made some final edits to the app to turn the camera view into a full-screen activity and added some more event handlers. The code for the same can be accessed through the following GitHub repo – LINK !

What’s next? In the next blog, I will discuss how we can write our own custom C++ code for doing fun computer vision experiments using OpenCV on Android!

OpenCV in Android – An Introduction (Part 1/2)

Hello world! I am very excited to write this particular blog on setting up OpenCV in Android Studio. There are many solutions online that cover setting up OpenCV using Eclipse, the Android NDK, etc., but I didn’t find a single reliable source for doing the same setup using Android Studio. So we (V. Avinash and I) finally came up with a feasible solution with which you can set up a native development environment in Android for designing Computer Vision applications using OpenCV and C++!!!

A quick intro about me: I am a Computer Vision enthusiast with nearly 4 years of theoretical and practical experience in the field, and I am quite good at implementing CV algorithms in Matlab and Python. Over the years, the field has been developing rapidly from mere academic interest to industrial interest. But most of the standard algorithms in this field are not really optimized to run in real time (60 FPS), nor are they designed specifically for the mobile platform. This caught my interest, and I have been working on it since Summer 2016. In my free time outside of being a research assistant, I think about techniques and hacks for optimizing existing algorithms for mobile, and about how to acquire (and play with) 3D data from a 2D camera.

Before starting this project, I assume that you already have a basic setup of Android Studio up and running on your machine and that you have decent experience working with it.

  • If you don’t already have Android Studio, you can download and install it from the following link.
  • Once you have Android Studio up and running, you can download OpenCV for Android from the following link. After downloading, extract the contents of the zip file and move them to a convenient location, say ‘/Users/user-name/OpenCV-android-sdk’. I am currently using Android Studio v2.2.3 and OpenCV v3.2.
  • Now start Android Studio and click on ‘Start a new Android Studio project’. This will open a new window. Specify your ‘Application Name’, ‘Company Domain’ and ‘Project Location’. Make sure you select the checkbox ‘Include C++ Support’. Now click Next!
  • In the ‘Targeted Android Devices’ window, select ‘Phone and Tablet’ with Minimum SDK: ‘API 21: Android 5.0 (Lollipop)’. Click Next.
  • In the Activity selection window select ‘Empty Activity’ and click Next.
  • In the Activity customization window leave everything as it is without any edits and click Next.
  • In the ‘Customize C++ Support’ window, select C++ Standard: ‘Toolchain Default’, leave all the other checkboxes unchecked (for now, but you are free to experiment) and click Finish!
  • Android Studio will take some time to load the project with the necessary settings. Since you are developing an app that depends on the camera of your phone, you can’t test it on an emulator. You need to connect your Android phone (with developer options enabled) to your computer and select the device when you press the run/debug option. After running the application, you should see the template’s ‘Hello from C++’ screen on your phone if everything works fine!
  • At this point, you have basic native development (C++ support) enabled in your app. Now let us start integrating OpenCV into your application.
  • Click on File -> New -> Import Module. In the pop-up window, give the path to your ‘OpenCV-android-sdk/sdk/java’ directory and click on OK. The module name will appear as ‘openCVLibrary320’; click Next and then Finish to complete the import.
  • Now, go to “openCVLibrary320/build.gradle” and change the following values to match those in “app/build.gradle”: compileSdkVersion, buildToolsVersion, minSdkVersion, and targetSdkVersion. Sync the project after editing the gradle files. My “openCVLibrary320/build.gradle” file looks like this!
apply plugin: 'com.android.library'

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.2"

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 25
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }
}
  • Add a new folder named ‘jniLibs’ to “app/src/main/” by right click -> New -> Directory. Copy the directories in ‘OpenCV-android-sdk/sdk/native/libs/’ into the jniLibs folder in your app. After copying, remove all *.a files from the imported directories. At the end, you should have 7 ABI directories, each containing a libopencv_java3.so file.
  • Now go to ‘app/CMakeLists.txt’ and link OpenCV by making the changes shown below (the lines marked with [EDIT] are the quick changes):
# Sets the minimum version of CMake required to build the native
# library. You should either keep the default value or only pass a
# value of 3.4.0 or lower.

cmake_minimum_required(VERSION 3.4.1)

# [EDIT] Set Path to OpenCV and include the directories
# pathToOpenCV is just an example to how to write in Mac.
# General format: /Users/user-name/OpenCV-android-sdk/sdk/native
set(pathToOpenCV /Users/sriraghu95/OpenCV-android-sdk/sdk/native)
include_directories(${pathToOpenCV}/jni/include)

# Creates and names a library, sets it as either STATIC
# or SHARED, and provides the relative paths to its source code.
# You can define multiple libraries, and CMake builds it for you.
# Gradle automatically packages shared libraries with your APK.

add_library( # Sets the name of the library.
             native-lib

             # Sets the library as a shared library.
             SHARED

             # Provides a relative path to your source file(s).
             # Associated headers in the same location as their source
             # file are automatically included.
             src/main/cpp/native-lib.cpp )

# [EDIT] Similar to above lines, add the OpenCV library
add_library( lib_opencv SHARED IMPORTED )
set_target_properties(lib_opencv PROPERTIES IMPORTED_LOCATION /Users/sriraghu95/Documents/Projects/ComputerVision/OpenCVAndroid-AnIntroduction/app/src/main/jniLibs/${ANDROID_ABI}/libopencv_java3.so)

# Searches for a specified prebuilt library and stores the path as a
# variable. Because system libraries are included in the search path by
# default, you only need to specify the name of the public NDK library
# you want to add. CMake verifies that the library exists before
# completing its build.

find_library( # Sets the name of the path variable.
              log-lib

              # Specifies the name of the NDK library that
              # you want CMake to locate.
              log )

# Specifies libraries CMake should link to your target library. You
# can link multiple libraries, such as libraries you define in the
# build script, prebuilt third-party libraries, or system libraries.

target_link_libraries( # Specifies the target library.
                       native-lib

                       # Links the target library to the log library
                       # included in the NDK.
                       ${log-lib} lib_opencv) #EDIT

  • Edit ‘app/build.gradle’ to set the cppFlags, point to the jniLibs source directory and make a few other minor changes. You can refer to the code below and replicate the same in your project. All new changes made to the pre-existing code are marked with the comment “//EDIT”.
apply plugin: 'com.android.application'

android {
    compileSdkVersion 25
    buildToolsVersion "25.0.2"
    defaultConfig {
        applicationId "com.example.sriraghu95.opencvandroid_anintroduction"
        minSdkVersion 21
        targetSdkVersion 25
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        externalNativeBuild {
            cmake {
                cppFlags "-std=c++11 -frtti -fexceptions" //EDIT
                abiFilters 'x86', 'x86_64', 'armeabi', 'armeabi-v7a', 'arm64-v8a', 'mips', 'mips64' //EDIT
            }
        }
    }

    //EDIT
    sourceSets {
        main {
            jniLibs.srcDirs = ['/Users/sriraghu95/Documents/Projects/ComputerVision/OpenCVAndroid-AnIntroduction/app/src/main/jniLibs'] //EDIT: Use your custom location to jniLibs. Path given is only for example purposes.
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
    externalNativeBuild {
        cmake {
            path "CMakeLists.txt"
        }
    }
}

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    androidTestCompile('com.android.support.test.espresso:espresso-core:2.2.2', {
        exclude group: 'com.android.support', module: 'support-annotations'
    })
    compile 'com.android.support:appcompat-v7:25.1.1'
    testCompile 'junit:junit:4.12'
    compile project(':openCVLibrary320') //EDIT
}
  • Once you are done with all the above steps, sync Gradle and go to src/main/cpp/native-lib.cpp. To make sure the project is set up properly, start including OpenCV headers in native-lib.cpp; they should not raise any errors.
#include <jni.h>
#include <string>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>

extern "C"
jstring
Java_com_example_sriraghu95_opencvandroid_1anintroduction_MainActivity_stringFromJNI(
        JNIEnv *env,
        jobject /* this */) {
    std::string hello = "Hello from C++";
    return env->NewStringUTF(hello.c_str());
}
  • Now make sure all your gradle files are in sync and rebuild the project once to check that there are no errors in your setup. As an extra sanity check, you can also call an OpenCV function from native code, as sketched below.
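This sketch is not from the original post; it just shows what a first native function that actually calls OpenCV could look like once the setup above compiles. The method name grayFromJNI and its Java-side declaration are hypothetical, and the mangled package/class prefix follows the example project above, so it has to match your own package name.

#include <jni.h>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Hypothetical native method MainActivity.grayFromJNI(long matAddr).
// The Java side would pass mat.getNativeObjAddr(); here we wrap that
// address as a cv::Mat and convert the frame to grayscale in place.
extern "C"
void
Java_com_example_sriraghu95_opencvandroid_1anintroduction_MainActivity_grayFromJNI(
        JNIEnv *env,
        jobject /* this */,
        jlong matAddr) {
    cv::Mat &frame = *(cv::Mat *) matAddr;
    if (frame.channels() == 4) {
        cv::cvtColor(frame, frame, cv::COLOR_RGBA2GRAY);  // camera frames are usually RGBA
    } else if (frame.channels() == 3) {
        cv::cvtColor(frame, frame, cv::COLOR_RGB2GRAY);
    }
}

On the Java side this would be declared as ‘public native void grayFromJNI(long matAddr);’ and called with mat.getNativeObjAddr(), which is exactly the kind of wiring the next post builds towards.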

By the end of this blog, we have finished setting up OpenCV in your Android project. This is a prerequisite for any type of Android application you want to build using OpenCV. Considering that there are two main ways of using OpenCV in your application, a) processing images from your phone’s own library and b) real-time processing of the live feed from the camera, I think this is the best place to stop this part of the blog.

In my next post, I will focus on how to use the camera in your application and do some simple processing on the data that you acquire from it.

Next: OpenCV in Android – An Introduction (Part 2/2)

Source Code: Link

[New] Android Application: Link

OpenCV in iOS – Face Detection

Hi, after a quiet break I am back to my blog to post some new things related to optimized computer vision algorithms on mobile platforms. I have been experimenting with Android recently to come up with the easiest setup of OpenCV to start developing (I will be posting about it in my next blog). In this post, I will explain how to do face detection in almost real time using OpenCV’s Haar cascades. This is not an advanced tutorial on detection/object recognition, but it will help you start working on your own custom classification problems. Let us dive in!

A quick note before diving in: this blog expects that you have already read my previous blogs on OpenCV in iOS (An Introduction, The Camera) so that you have the starter code up and running.

In this blog post, we are going to detect faces and eyes in the live video stream from your iOS device’s camera. Now start following the steps mentioned below!

  1. Import necessary frameworks into the project: opencv2, AVFoundation, Accelerate, CoreGraphics, CoreImage, QuartzCore, AssetsLibrary, CoreMedia, and UIKit frameworks.
  2. Rename ViewController.m to ViewController.mm to start coding in Objective-C++.
  3. Add the necessary haarcascade files from the ‘<opencv-folder>/data/haarcascades/’ directory into the Supporting Files group of your project. You can do this by right-clicking on Supporting Files and selecting ‘Add Files to <your-project name>’.
  4. Open ViewController.mm, add the lines needed to mix Objective-C and C++, and define some colors to draw around the detected faces and eyes on the image.
  5. Now edit the ViewController interface to declare the parameters for the live view, the OpenCV wrapper that gets camera access through AVFoundation, and the cascade classifiers.
  6. In the ViewController implementation’s viewDidLoad method, write the code to set up the OpenCV camera view.
  7. The tricky part is reading the cascade classifiers from inside the project bundle; once they are loaded, start the videoCamera!
  8. Once the videoCamera is started, each frame is processed inside the processImage method; a sketch of that detection logic is given after this list.
  9. Now the code is complete! Please note that I am not covering the specific math behind Haar cascade detection, as I feel there are many blogs out there that explain it really well. For the code related to this blog, you can contact me via e-mail (Contact). A screenshot of my code in action is placed below!

    img_1742
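Since the code itself was shared as screenshots, here is a minimal C++ sketch of the detection logic that runs on every frame inside processImage. It assumes the two cascades were already loaded (for example in viewDidLoad) from the bundled XML files; the names face_cascade and eye_cascade and the drawing colors are illustrative, not necessarily the exact ones from my project.

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <vector>

// Illustrative globals: load these once, e.g. face_cascade.load(pathToFaceXml)
// and eye_cascade.load(pathToEyeXml), using the bundled haarcascade files.
cv::CascadeClassifier face_cascade;   // haarcascade_frontalface_alt.xml
cv::CascadeClassifier eye_cascade;    // haarcascade_eye.xml

// The per-frame work: detect faces, then eyes inside each face,
// and draw rectangles on the frame that is rendered to the live view.
void detectAndDraw(cv::Mat &image) {
    cv::Mat gray;
    if (image.channels() == 4)
        cv::cvtColor(image, gray, cv::COLOR_BGRA2GRAY);
    else
        cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);  // helps detection under uneven lighting

    std::vector<cv::Rect> faces;
    face_cascade.detectMultiScale(gray, faces, 1.1, 2,
                                  cv::CASCADE_SCALE_IMAGE, cv::Size(60, 60));

    for (size_t i = 0; i < faces.size(); i++) {
        cv::rectangle(image, faces[i], cv::Scalar(0, 255, 0, 255), 2);  // face in green

        // Look for eyes only inside the detected face region.
        cv::Mat faceROI = gray(faces[i]);
        std::vector<cv::Rect> eyes;
        eye_cascade.detectMultiScale(faceROI, eyes, 1.1, 2,
                                     cv::CASCADE_SCALE_IMAGE, cv::Size(20, 20));
        for (size_t j = 0; j < eyes.size(); j++) {
            cv::Rect eye(faces[i].x + eyes[j].x, faces[i].y + eyes[j].y,
                         eyes[j].width, eyes[j].height);
            cv::rectangle(image, eye, cv::Scalar(255, 0, 0, 255), 2);   // eyes in blue
        }
    }
}

In the actual app this function body sits inside the Objective-C++ processImage: method, which receives the camera frame as a cv::Mat reference.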

How to publish your game online?

This blog is intended for aspiring game designers who want to create an online portfolio of their games. I have recently been experimenting a lot with developing games in Unity. But what is the point of developing a game if others (at least your friends) don’t get a chance to play it? I searched online for ways to host my game for free. Some of the approaches I found involve spending money out of your pocket, and others involve uploading your game to Dropbox/Google Drive and sharing the link for people to play it. None of these appealed to me much. I wanted a way for people to play my game right in their browser. After discussing with some of my friends and searching online to see whether GitHub could support it, I finally came up with a feasible solution!

If you are familiar with Unity 3D, you should know that through Unity you can build a game for any platform you want. You will also need a GitHub account to host your game.

  1. After completing your game, go to File -> Build Settings…
  2. This will take you to a new window named ‘Build Settings’. Select WebGL, select all the scenes that you want in your build, and click on ‘Build’.
  3. Type the name you want to call your game (say <gameNameFolder>) in the ‘Save As’ section and click on the Save button. By default this folder will be saved under the folder where your game project is saved.
  4. Open a terminal and navigate to the <gameNameFolder> location.
    • cd <gameNameFolder>/
    • open index.html
  5. The above commands should open your game in a new tab in your default browser. Check whether your game is working properly, and if it does, move on to the next step.
  6. Upload the <gameNameFolder> as a new repository to GitHub (refer to the GitHub docs).
  7. After successfully uploading your project on github, open terminal and write the following commands.
    • cd <gameNameFolder>/
    • git checkout -b gh-pages
    • git push origin gh-pages
  8. That’s it… You can find your game at ‘https://<your-github-username>.github.io/<gameNameFolder>’.
  9. Happy gaming! Share the link with your friends anywhere in the world so they can play in their web browser 🙂

OpenCV in iOS – The Camera

Hello everyone, this is my second blog post in the ‘OpenCV in iOS’ series. Before starting this tutorial, it is recommended that you complete the ‘OpenCV in iOS – An Introduction‘ tutorial. In this blog post, I will explain how to use the camera inside your iOS app. For setting up the application in Xcode, please complete up to step 6 in the ‘OpenCV in iOS – An Introduction‘ tutorial before you proceed to the steps mentioned below!

  1. In this app, we need to include some additional frameworks in our project. They are listed as follows –
    • Accelerate.framework
    • AssetsLibrary.framework
    • AVFoundation.framework
    • CoreGraphics.framework
    • CoreImage.framework
    • CoreMedia.framework
    • opencv2.framework
    • QuartzCore.framework
    • UIKit.framework
  2. We already know how to add ‘opencv2.framework‘ from the previous blog post. I will go through the process of adding one of the above-mentioned frameworks (e.g. AVFoundation.framework); you can add the rest likewise. To add ‘AVFoundation.framework‘, go to ‘Linked Frameworks and Libraries‘ and click on the ‘+’ sign. Choose ‘AVFoundation.framework‘ and click on ‘Add‘.

    Screen Shot 2016-07-23 at 11.36.53 pm

  3. Now your project navigator area should look like this.

    Screen Shot 2016-07-24 at 10.36.20 pm

  4. It’s time to get our hands dirty! 🙂 Open ‘ViewController.h‘ and write the following lines of code.

    Screen Shot 2016-07-24 at 10.38.20 pm

  5. Now go to the ‘ViewController.mm‘ file and add a few lines so that C++ code can be used along with the Objective-C code.

    Screen Shot 2016-07-24 at 10.55.23 pm

  6. Let us initialise some variables for camera access and for the live output from the camera.

    Screen Shot 2016-07-24 at 10.58.52 pm

  7. Now set up the live view so that it fills the whole app screen.

    Screen Shot 2016-07-24 at 5.36.54 pm.png

  8. Initialise the camera parameters and start capturing!

    Screen Shot 2016-07-24 at 11.02.01 pm.png

  9. But wait! We still have to do one more step before actually testing our app. If you observe the line “@implementation ViewController”, you will find a warning “Method ‘processImage:’ in protocol ‘CvVideoCameraDelegate’ not implemented”. To know more about CvVideoCameraDelegate, refer to this link. Coming back to our tutorial, we have to add the following lines of code to get rid of that warning.

    Screen Shot 2016-07-24 at 11.19.11 pm

  10. And now we are ready to run the app! For this application, we have to run and test on an actual iPhone/iPad because we need access to the device’s camera. Once it runs, we can see the live view from our camera! 🙂

    IMG_1713.jpg

  11. Let’s give some basic instaTouch! to our app 😉. Add a few lines of code inside the ‘processImage‘ method and run the application on your device (a minimal sketch of such a filter is given just below).

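Here is a minimal C++ sketch of the kind of simple filter you could put inside processImage; the grayscale edge effect below is just one illustrative example and not necessarily the exact effect from my app.

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// One possible body for processImage: the camera hands us a BGRA frame,
// we replace it with an edge map and it is rendered back to the live view.
void applySimpleFilter(cv::Mat &image) {
    cv::Mat gray, edges;
    cv::cvtColor(image, gray, cv::COLOR_BGRA2GRAY);     // CvVideoCamera frames are BGRA
    cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.5);  // smooth before edge detection
    cv::Canny(gray, edges, 80, 120);                    // find edges
    cv::cvtColor(edges, image, cv::COLOR_GRAY2BGRA);    // back to BGRA for display
}

Inside the Objective-C++ processImage: method you can call these OpenCV functions directly on the image argument it receives.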

With this we are coming to the end of this tutorial! 🙂 We have learnt how to access the camera inside the app and apply some live operations on the video. Though this is a basic tutorial, it will act as a precursor to many Augmented Reality type applications! 😀 We will try to get into the next level of Computer Vision app development in our next tutorial! Till then, stay tuned… 🙂 Feel free to comment with your suggestions/doubts related to this tutorial.

The SOURCE CODE for the following tutorial is available at the following GITHUB LINK.