Hire the author: Martin K

Image citation: here. GitHub link for this project: Computer Vision.


Computer vision is the process of turning an image or video feed into a series of data points, which the computer then uses to identify and understand what is happening in the image or video. To do this, a computer vision system must be able to analyze and identify features in an image, such as shapes, colors, textures, and patterns.

There are two things to consider when developing an image recognition app with computer vision: the quality and the quantity of the images. High-quality images with good lighting improve the algorithm's performance and help it reach high accuracy. However, there must also be enough images for the algorithm to learn from before it can make reliable predictions.

First, it’s important to understand what image recognition is. Image recognition is the process of identifying objects and scenes in images.

Two Methods

Roughly speaking, there are two major ways to get started with image recognition:

1) Using an open-source model from a deep learning library

2) Doing it manually

We can create deep learning models that run on mobile devices using TensorFlow Lite (TFLite), which is designed specifically for mobile and edge deployment. After creating a deep learning model in TensorFlow, developers can use the TensorFlow Lite Converter to convert it into a format that runs on mobile devices.

In this tutorial, we’ll use a custom dataset and a convolutional neural network (CNN) to develop a TensorFlow model that recognizes photos on Android.

Rather than building a CNN from the ground up, we’ll leverage a pre-trained model and apply transfer learning to tailor it to our new dataset. The pre-trained model we’ll use is MobileNetV1, and the app will run a version of it on the device. Once we’ve developed our TensorFlow model with transfer learning, we’ll use the TFLite converter to produce a mobile-ready version of it. That model is then used in an Android app that recognizes photos taken with the camera. The project created in this tutorial can be found on GitHub; it has been thoroughly tested and is up and running.

Photo Preview


What is Computer Vision?

Computer vision is a branch of artificial intelligence that deals with how computers can understand the contents of images. Computer vision techniques are used in many fields, such as image recognition, video surveillance, digital pathology, medical imaging, and robotics.

Developing an image recognition app is not easy. First, you need certain computer vision skills. Second, you need to know how to program in a language such as Python or C++. Finally, you need to understand the basics of machine learning algorithms such as neural networks.

What is Image Recognition?

Image recognition is a computer vision task that identifies and categorizes the different components of images and videos. Image recognition models are trained to take an image as input and assign it one or more labels. The set of possible output labels is referred to as the target classes. Image recognition models can also output a confidence score, which reflects how confident the model is that an image belongs to a particular class.
If you wanted to build an image recognition model that automatically detects whether or not a dog is present in a given image, for example, the pipeline might look like this:

  • Image recognition model that has been trained on photos labelled as “dog” or “not dog”
  • Input to the model: Image
  • Model output: A confidence score indicating the likelihood that the image contains that class of object (i.e. a dog).
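
To make the last step of that pipeline concrete, here is a minimal plain-Kotlin sketch that turns a model's per-class scores into a single label and confidence. No real model is involved: the Prediction class, the topLabel function, and the scores are hypothetical and exist only for illustration.

```kotlin
// Hypothetical result type: one label plus the model's confidence in it.
data class Prediction(val label: String, val confidence: Float)

// Picks the class with the highest score from a model's per-class output.
fun topLabel(labels: List<String>, scores: FloatArray): Prediction {
    require(labels.size == scores.size) { "need exactly one score per label" }
    val best = scores.indices.maxByOrNull { scores[it] }
        ?: throw IllegalArgumentException("no scores")
    return Prediction(labels[best], scores[best])
}

fun main() {
    val labels = listOf("dog", "not dog")   // the target classes
    val scores = floatArrayOf(0.87f, 0.13f) // made-up model output for one image
    println(topLabel(labels, scores))       // Prediction(label=dog, confidence=0.87)
}
```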

For more information about TensorFlow Lite for image recognition on Android, click here.

Step by Step Procedure

Now I’ll walk you through the steps of creating your app. I’m assuming you’ve already downloaded Android Studio and are familiar with the fundamentals of Kotlin.
For the demo app, I’ve created a GitHub repository. You may either follow along with this tutorial or simply clone the GitHub source. Use this code in any way you like. Keep in mind that sharing is a fantastic thing to do: I’d love to see your most recent Android TensorFlow projects on GitHub.

Step 1: Requirements for TensorFlow

The following is a list of the components required to do image recognition on Android.

  • Dependency: TensorFlow Lite (the preview version works just fine, but you can also use the latest version)
  • TensorFlow initialization (the initTensorFlowAndLoadModel() method in AppActivity)
  • TensorFlowImageClassifier (the Classifier class in this project)
  • Classifier interface (IClassifier)

You can easily copy the Classifier interface and the activity code; to view these files, follow this link. Later in this tutorial, I’ll give you more detailed instructions. So let’s get started.

Step 2: Adding the required dependencies

It’s always a good idea to start by gathering the essential dependencies. We’ll need the following items for this use case:

  • TensorFlow Lite Android library
  • The CameraKit library and a few supporting dependencies.

So, in your module’s build.gradle file, add the following dependencies:

apply plugin: 'com.android.application'
apply plugin: 'kotlin-android'
apply plugin: 'kotlin-android-extensions'

android {
    compileSdkVersion 28
    defaultConfig {
        applicationId "com.kimani.ktflite"
        minSdkVersion 21
        targetSdkVersion 28
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
    // Keep model files uncompressed so they can be memory-mapped from assets.
    aaptOptions {
        noCompress "tflite"
        noCompress "lite"
    }
}

dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk7:$kotlin_version"
    implementation 'com.android.support:appcompat-v7:28.0.0'
    implementation 'com.android.support:exifinterface:28.0.0'
    implementation 'com.wonderkiln:camerakit:0.13.0'
    implementation 'org.tensorflow:tensorflow-lite:+'
    implementation 'com.wang.avi:library:2.1.3'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'com.android.support.test:runner:1.0.2'
    androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'
}


CameraKit is an open-source camera library for Android. It provides a set of helpful APIs for interacting with the device’s camera, which makes developing software for the many sorts of camera hardware across the notoriously fragmented Android landscape a little easier, and it makes it simple to include a dependable camera in your app. According to its authors, it ensures consistent capture results, scalability, and a wide range of camera options.

Set up

The next step is to create a layout.
For the purposes of this article, I’ll assume that your app has exactly one activity and one goal in life: to serve as a camera preview. So, open the layout file for your activity and add the following view:

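<!-- The width and height are placeholders; size the preview to suit your layout. -->
<com.wonderkiln.camerakit.CameraView
    android:id="@+id/cameraView"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:layout_gravity="center" />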

(assume your layout file’s root tag is ConstraintLayout) 

As you might have guessed, this is the view that shows the camera preview on the user’s screen. To keep things simple, let’s lock the screen orientation to portrait for now. In the AndroidManifest.xml file, locate the activity tag and add the screenOrientation attribute to it:

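<activity android:name="com.kimani.ktflite.AppActivity"
    android:screenOrientation="portrait">
    <!-- keep the activity's existing intent filters here -->
</activity>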

Let’s move on to the activity now that we’ve completed the difficult phase.

Step 3: Initializing TensorFlow in the Classifier class

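The Classifier class wraps the TensorFlow Lite Interpreter: it loads the model and its labels, converts a Bitmap into the model’s input format, runs inference, and returns the most likely labels. Its create() method is called once, when the app starts, via the AppActivity’s onCreate() method (see Step 4). To set it up, paste the following code into the Classifier class. If you don’t understand the code, please post a comment.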

package com.kimani.ktflite

import android.content.res.AssetManager
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import java.io.BufferedReader
import java.io.FileInputStream
import java.io.InputStreamReader
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.util.PriorityQueue

// Wraps a quantized MobileNet TensorFlow Lite model (uint8 input and output).
class Classifier(
    var interpreter: Interpreter? = null,
    var inputSize: Int = 0,
    var labelList: List<String> = emptyList()
) : IClassifier {

    companion object {
        private const val MAX_RESULTS = 3   // number of labels returned
        private const val BATCH_SIZE = 1    // one image per inference
        private const val PIXEL_SIZE = 3    // R, G, B, one byte each
        private const val THRESHOLD = 0.1f  // minimum confidence to report a label

        fun create(assetManager: AssetManager,
                   modelPath: String,
                   labelPath: String,
                   inputSize: Int): Classifier {
            val classifier = Classifier()
            classifier.interpreter = Interpreter(classifier.loadModelFile(assetManager, modelPath))
            classifier.labelList = classifier.loadLabelList(assetManager, labelPath)
            classifier.inputSize = inputSize
            return classifier
        }
    }

    override fun recognizeImage(bitmap: Bitmap): List<IClassifier.Recognition> {
        val byteBuffer = convertBitmapToByteBuffer(bitmap)
        // The quantized model outputs one uint8 score per label.
        val result = Array(1) { ByteArray(labelList.size) }
        interpreter!!.run(byteBuffer, result)
        return getSortedResult(result)
    }

    override fun close() {
        interpreter?.close()  // release the interpreter's native resources
        interpreter = null
    }

    // Memory-maps the .tflite file from assets (it must be stored uncompressed; see aaptOptions).
    private fun loadModelFile(assetManager: AssetManager, modelPath: String): MappedByteBuffer {
        val fileDescriptor = assetManager.openFd(modelPath)
        val inputStream = FileInputStream(fileDescriptor.fileDescriptor)
        val fileChannel = inputStream.channel
        val startOffset = fileDescriptor.startOffset
        val declaredLength = fileDescriptor.declaredLength
        return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength)
    }

    // Reads one label per line.
    private fun loadLabelList(assetManager: AssetManager, labelPath: String): List<String> {
        val labelList = ArrayList<String>()
        BufferedReader(InputStreamReader(assetManager.open(labelPath))).use { reader ->
            while (true) {
                val line = reader.readLine() ?: break
                labelList.add(line)
            }
        }
        return labelList
    }

    // Packs the ARGB pixels of an inputSize x inputSize bitmap into RGB bytes.
    private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
        val byteBuffer = ByteBuffer.allocateDirect(BATCH_SIZE * inputSize * inputSize * PIXEL_SIZE)
        byteBuffer.order(ByteOrder.nativeOrder())
        val intValues = IntArray(inputSize * inputSize)
        bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)
        var pixel = 0
        for (i in 0 until inputSize) {
            for (j in 0 until inputSize) {
                val value = intValues[pixel++]
                byteBuffer.put((value shr 16 and 0xFF).toByte())  // red
                byteBuffer.put((value shr 8 and 0xFF).toByte())   // green
                byteBuffer.put((value and 0xFF).toByte())         // blue
            }
        }
        return byteBuffer
    }

    // Converts uint8 scores to [0, 1] and returns up to MAX_RESULTS labels above THRESHOLD.
    private fun getSortedResult(labelProbArray: Array<ByteArray>): List<IClassifier.Recognition> {
        // Highest confidence is polled first.
        val pq = PriorityQueue(MAX_RESULTS,
            compareByDescending<IClassifier.Recognition> { (_, _, confidence) -> confidence })
        for (i in labelList.indices) {
            val confidence = (labelProbArray[0][i].toInt() and 0xff) / 255.0f
            if (confidence > THRESHOLD) {
                pq.add(IClassifier.Recognition("" + i,
                    if (labelList.size > i) labelList[i] else "Unknown",
                    confidence))
            }
        }
        val recognitions = ArrayList<IClassifier.Recognition>()
        val recognitionsSize = minOf(pq.size, MAX_RESULTS)
        for (i in 0 until recognitionsSize) {
            recognitions.add(pq.poll())
        }
        return recognitions
    }
}
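
The post-processing in getSortedResult() is easy to try outside Android. The sketch below reproduces it on a plain JVM, assuming a hypothetical Recognition data class in place of IClassifier.Recognition and made-up scores. It also shows why each raw score is masked with 0xff: Kotlin’s Byte is signed, so a quantized score of 200 arrives as -56.

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for IClassifier.Recognition so this runs on a plain JVM.
data class Recognition(val id: String, val title: String, val confidence: Float)

fun topResults(
    scores: ByteArray,        // raw uint8 scores from a quantized model
    labels: List<String>,
    threshold: Float = 0.1f,
    maxResults: Int = 3
): List<Recognition> {
    // Highest confidence is polled first.
    val pq = PriorityQueue(compareByDescending<Recognition> { it.confidence })
    for (i in scores.indices) {
        // Mask to recover the unsigned 0..255 value, then scale to [0, 1].
        val confidence = (scores[i].toInt() and 0xFF) / 255.0f
        if (confidence > threshold) {
            pq.add(Recognition(i.toString(), labels.getOrElse(i) { "Unknown" }, confidence))
        }
    }
    return List(minOf(pq.size, maxResults)) { pq.poll() }
}

fun main() {
    val labels = listOf("cat", "dog", "car", "tree")
    // Unsigned values 255, 200, 20, 128; 200 and 128 become negative as signed bytes.
    val scores = byteArrayOf(255.toByte(), 200.toByte(), 20, 128.toByte())
    topResults(scores, labels).forEach { println("${it.title}: ${it.confidence}") }
}
```

This prints cat, dog, and tree in that order; car (20 / 255, about 0.08) falls below the 0.1 threshold.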

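Loading the model can take a moment, so create() is meant to be called off the UI thread; AppActivity runs it on a background executor, as described in Step 4. The constants in the companion object are the parameters for our model: MAX_RESULTS is the maximum number of labels returned, BATCH_SIZE is the number of images per inference, PIXEL_SIZE is the number of color channels (one byte each for red, green, and blue), and THRESHOLD is the minimum confidence a label needs in order to be reported.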
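
As a rough sketch of what convertBitmapToByteBuffer() does, the function below packs ARGB pixels (the format Bitmap.getPixels() returns) into the RGB byte layout a uint8-quantized model takes, sized as BATCH_SIZE × inputSize × inputSize × PIXEL_SIZE. It works on a plain IntArray so it runs without Android; packRgb is a hypothetical name, and the final rewind() is only a convenience for reading the buffer back, not something the tutorial’s code does.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Packs ARGB_8888 pixels into RGB bytes, dropping alpha: [batch, size, size, 3].
fun packRgb(argbPixels: IntArray, size: Int, batchSize: Int = 1, pixelSize: Int = 3): ByteBuffer {
    require(argbPixels.size == size * size) { "expected ${size * size} pixels" }
    val buffer = ByteBuffer.allocateDirect(batchSize * size * size * pixelSize)
    buffer.order(ByteOrder.nativeOrder())
    for (argb in argbPixels) {
        buffer.put((argb shr 16 and 0xFF).toByte()) // red
        buffer.put((argb shr 8 and 0xFF).toByte())  // green
        buffer.put((argb and 0xFF).toByte())        // blue
    }
    buffer.rewind() // reset the position so the bytes can be read from the start
    return buffer
}

fun main() {
    println(1 * 224 * 224 * 3) // bytes for one 224 x 224 RGB image: 150528
    val pixels = intArrayOf(
        0xFF102030.toInt(), 0xFFFFFFFF.toInt(), 0xFF000000.toInt(), 0x80FF0000.toInt()
    )
    val buf = packRgb(pixels, size = 2)
    // [16, 32, 48, 255, 255, 255, 0, 0, 0, 255, 0, 0]
    println((0 until buf.capacity()).map { buf.get(it).toInt() and 0xFF })
}
```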
Layers in a convolutional neural network perform mathematical operations on the image. Typical layers include convolutional, pooling, and fully connected layers. Transfer learning lets you reuse network architectures that were pre-trained on standard image datasets. You can start by constructing your own network, but you’ll soon see that pre-trained networks perform significantly better, so begin with one of the pre-trained models.
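
To make “mathematical operations on the image” concrete, here is a minimal sketch of the core operation of a convolutional layer: sliding a small kernel over the image and summing the element-wise products (strictly a cross-correlation, which is what deep learning libraries usually compute under the name convolution). It assumes no padding and a stride of 1; conv2d is a hypothetical helper, and real layers also add a bias, use many kernels, and learn the kernel values during training.

```kotlin
// Valid (no padding, stride 1) 2D cross-correlation of one channel with one kernel.
fun conv2d(image: Array<FloatArray>, kernel: Array<FloatArray>): Array<FloatArray> {
    val kh = kernel.size
    val kw = kernel[0].size
    val outH = image.size - kh + 1
    val outW = image[0].size - kw + 1
    return Array(outH) { y ->
        FloatArray(outW) { x ->
            var sum = 0f
            for (i in 0 until kh) {
                for (j in 0 until kw) sum += image[y + i][x + j] * kernel[i][j]
            }
            sum
        }
    }
}

fun main() {
    val image = arrayOf(
        floatArrayOf(1f, 2f, 3f),
        floatArrayOf(4f, 5f, 6f),
        floatArrayOf(7f, 8f, 9f)
    )
    // Responds to left-to-right increases in brightness (right column minus left column).
    val kernel = arrayOf(floatArrayOf(-1f, 1f), floatArrayOf(-1f, 1f))
    conv2d(image, kernel).forEach { println(it.joinToString()) } // prints "2.0, 2.0" twice
}
```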

Step 4: Adding the App Activity

We will start by adding an executor, a single background thread on which the model is loaded and later released so that this work stays off the UI thread. Add the following line at the top of the activity:

private val executor = Executors.newSingleThreadExecutor()

In the activity’s onDestroy method, we’ll release the classifier and shut the executor down, since neither is needed any more:

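override fun onDestroy() {
    super.onDestroy()
    // Close the classifier on the executor's thread, then stop accepting new work.
    executor.execute { classifier.close() }
    executor.shutdown()
}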
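
To see why onDestroy() hands close() to the executor instead of calling it directly, here is a small plain-JVM sketch. A single-thread executor runs tasks one at a time in submission order, so a close task queued in onDestroy() cannot run before the model loading queued earlier has finished, and shutdown() still lets already-queued tasks complete. The string tasks are placeholders for the real calls, and runInOrder is a hypothetical helper.

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Records the order in which tasks run on a single-thread executor.
fun runInOrder(): List<String> {
    val executor = Executors.newSingleThreadExecutor()
    val log = Collections.synchronizedList(mutableListOf<String>())

    executor.execute { log.add("load model") } // stands in for initTensorFlowAndLoadModel()
    executor.execute { log.add("classify") }   // any work submitted afterwards
    executor.execute { log.add("close") }      // stands in for classifier.close()

    // shutdown() rejects new tasks but still runs the ones already queued.
    executor.shutdown()
    executor.awaitTermination(5, TimeUnit.SECONDS)
    return log.toList()
}

fun main() {
    println(runInOrder()) // [load model, classify, close]
}
```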

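You might’ve noticed, dear reader, that we haven’t started the camera anywhere yet. Don’t worry, I haven’t forgotten about it: with CameraKit, the preview is started in the activity’s onResume() method by calling cameraView.start() and stopped in onPause() with cameraView.stop(). But first, let’s talk about something else.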

Image analyzer

We need a way to process the photos we get from the camera in order to do anything interesting with them, unless you consider showing a camera preview intriguing enough. This is where the Classifier comes into play.

Call initTensorFlowAndLoadModel() from the AppActivity class’s onCreate() method so that TensorFlow Lite is initialized when your app starts. To accomplish this, paste the following code into the AppActivity class. If you don’t understand the code, please post a comment.

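private fun initTensorFlowAndLoadModel() {
    executor.execute {
        try {
            classifier = Classifier.create(assets, MODEL_PATH, LABEL_PATH, INPUT_SIZE)
            // Show the "detect" button only once the model is ready.
            makeButtonVisible()
        } catch (e: Exception) {
            throw RuntimeException("Error initializing TensorFlow!", e)
        }
    }
}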

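Because initTensorFlowAndLoadModel() does its work on the executor, declare executor = Executors.newSingleThreadExecutor() at the top of this class; this keeps model loading off the UI thread. MODEL_PATH, LABEL_PATH, and INPUT_SIZE in the companion object are the parameters for our model: the quantized MobileNet v1 model file, its label file, and the 224-pixel input size the model expects. Here is the complete AppActivity: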

package com.kimani.ktflite

import android.app.Dialog
import android.graphics.Bitmap
import android.os.Bundle
import android.support.v7.app.AppCompatActivity
import android.text.method.ScrollingMovementMethod
import android.view.LayoutInflater
import android.view.View
import android.view.Window
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import com.wonderkiln.camerakit.*
import java.util.concurrent.Executors

class AppActivity : AppCompatActivity() {
    lateinit var classifier: Classifier
    private val executor = Executors.newSingleThreadExecutor()
    lateinit var textViewResult: TextView
    lateinit var btnDetectObject: Button
    lateinit var btnToggleCamera: Button
    lateinit var imageViewResult: ImageView
    lateinit var cameraView: CameraView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main) // layout name assumed; use your activity's layout
        cameraView = findViewById(R.id.cameraView)
        imageViewResult = findViewById<ImageView>(R.id.imageViewResult)
        textViewResult = findViewById(R.id.textViewResult)
        textViewResult.movementMethod = ScrollingMovementMethod()
        btnToggleCamera = findViewById(R.id.btnToggleCamera)
        btnDetectObject = findViewById(R.id.btnDetectObject)

        // Dialog that shows a loading indicator, then the captured image and its labels.
        val resultDialog = Dialog(this)
        val customProgressView = LayoutInflater.from(this).inflate(R.layout.result_dialog_layout, null)
        resultDialog.requestWindowFeature(Window.FEATURE_NO_TITLE)
        resultDialog.setContentView(customProgressView)
        val ivImageResult = customProgressView.findViewById<ImageView>(R.id.iViewResult)
        val tvLoadingText = customProgressView.findViewById<TextView>(R.id.tvLoadingRecognition)
        val tvTextResults = customProgressView.findViewById<TextView>(R.id.tvResult)
        // The Loader Holder is used due to a bug in the Avi Loader library
        val aviLoaderHolder = customProgressView.findViewById<View>(R.id.aviLoaderHolderView)

        cameraView.addCameraKitListener(object : CameraKitEventListener {
            override fun onEvent(cameraKitEvent: CameraKitEvent) { }
            override fun onError(cameraKitError: CameraKitError) { }
            override fun onImage(cameraKitImage: CameraKitImage) {
                var bitmap = cameraKitImage.bitmap
                // The model expects an INPUT_SIZE x INPUT_SIZE image.
                bitmap = Bitmap.createScaledBitmap(bitmap, INPUT_SIZE, INPUT_SIZE, false)
                aviLoaderHolder.visibility = View.GONE
                tvLoadingText.visibility = View.GONE
                val results = classifier.recognizeImage(bitmap)
                ivImageResult.setImageBitmap(bitmap)
                tvTextResults.text = results.toString()
                tvTextResults.visibility = View.VISIBLE
                ivImageResult.visibility = View.VISIBLE
            }
            override fun onVideo(cameraKitVideo: CameraKitVideo) { }
        })

        btnToggleCamera.setOnClickListener { cameraView.toggleFacing() }
        btnDetectObject.setOnClickListener {
            cameraView.captureImage() // the photo is delivered to onImage()
            resultDialog.show()
            tvTextResults.visibility = View.GONE
            ivImageResult.visibility = View.GONE
        }
        resultDialog.setOnDismissListener {
            tvLoadingText.visibility = View.VISIBLE
            aviLoaderHolder.visibility = View.VISIBLE
        }

        initTensorFlowAndLoadModel()
    }

    override fun onResume() {
        super.onResume()
        cameraView.start()
    }

    override fun onPause() {
        cameraView.stop()
        super.onPause()
    }

    override fun onDestroy() {
        super.onDestroy()
        executor.execute { classifier.close() }
        executor.shutdown()
    }

    private fun initTensorFlowAndLoadModel() {
        executor.execute {
            try {
                classifier = Classifier.create(assets, MODEL_PATH, LABEL_PATH, INPUT_SIZE)
                makeButtonVisible()
            } catch (e: Exception) {
                throw RuntimeException("Error initializing TensorFlow!", e)
            }
        }
    }

    private fun makeButtonVisible() {
        runOnUiThread { btnDetectObject.visibility = View.VISIBLE }
    }

    companion object {
        private const val MODEL_PATH = "mobilenet_quant_v1_224.tflite"
        private const val LABEL_PATH = "labels.txt"
        private const val INPUT_SIZE = 224
    }
}
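
onImage() shrinks the captured photo with Bitmap.createScaledBitmap(bitmap, INPUT_SIZE, INPUT_SIZE, false) because the model only accepts 224 × 224 input. With filtering turned off, the scaling is roughly nearest-neighbour sampling. The sketch below shows that idea on a plain pixel array; resizeNearest is a hypothetical helper that approximates, rather than reproduces, Android’s exact sampling.

```kotlin
// Nearest-neighbour resize of a row-major pixel array (one Int per pixel).
fun resizeNearest(src: IntArray, srcW: Int, srcH: Int, dstW: Int, dstH: Int): IntArray {
    require(src.size == srcW * srcH) { "pixel count does not match dimensions" }
    return IntArray(dstW * dstH) { idx ->
        val x = idx % dstW
        val y = idx / dstW
        val sx = x * srcW / dstW // source column for this output column
        val sy = y * srcH / dstH // source row for this output row
        src[sy * srcW + sx]
    }
}

fun main() {
    // A 4x2 image scaled down to 2x1 keeps columns 0 and 2 of row 0.
    val src = intArrayOf(1, 2, 3, 4, 5, 6, 7, 8)
    println(resizeNearest(src, 4, 2, 2, 1).toList()) // [1, 3]
}
```

Note that scaling straight to a square also changes the aspect ratio of non-square photos.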

Future Directions

Naturally, the code presented in this article only scratches the surface; you can do many more things with image recognition. For instance, you could swap the classification model for an object detection model, which also reports the bounding box of each detected object, and use those on-screen positions to build a variety of augmented reality apps. You could also take a slightly different path and try other machine learning APIs, such as ML Kit, by changing only the code that processes the captured image. The sky is the limit!

Learning Strategies and Tools

Before developing the image recognition app on Android, I first had to learn about the TensorFlow Lite model format and how to create a deep convolutional neural network that works. I did this by using TensorFlow to apply transfer learning from a pre-trained model to a new dataset.

The model produced by transfer learning was then converted into a TensorFlow Lite model, which the Android Studio project uses to predict the class labels of new photos. I am now able to set up TensorFlow models and use them efficiently.

TensorFlow is a multipurpose machine learning framework: it can be used for anything from training huge models across clusters in the cloud to running models locally on an embedded system like your phone. This article showed how to use TensorFlow Lite to run an image recognition model on an Android device.

Reflective Analysis

After implementing this project, my theoretical knowledge of computer vision, specifically image recognition on Android, has increased, because I could see what was happening. The most common mistake people make is feeding string labels into the training data instead of integer class indices, which causes training to fail with an error.
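
One way to avoid that mistake is to map each class name to an integer index before training, and to keep the index-to-name list (the same idea as labels.txt) so predictions can be turned back into names. Here is a minimal Kotlin sketch with a hypothetical encodeLabels helper and made-up labels; real training pipelines usually provide their own utilities for this.

```kotlin
// Maps string class names to integer indices; also returns the index -> name list.
fun encodeLabels(labels: List<String>): Pair<List<Int>, List<String>> {
    val classes = labels.distinct().sorted() // stable, reproducible ordering
    val index = classes.withIndex().associate { (i, name) -> name to i }
    return labels.map { index.getValue(it) } to classes
}

fun main() {
    val (encoded, classes) = encodeLabels(listOf("dog", "cat", "dog", "bird"))
    println(classes) // [bird, cat, dog]
    println(encoded) // [2, 1, 2, 0]
}
```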

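We can evaluate the recall and precision for each label by storing the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) obtained from the model’s predictions. Precision is the ratio of true positives to all positive predictions, TP / (TP + FP), and recall is the ratio of true positives to all actual positives, TP / (TP + FN).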
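
A minimal sketch of those two formulas for a single label, using made-up counts. Counts, precision, and recall are hypothetical names, and returning 0.0 when a denominator is zero is one convention among several.

```kotlin
// Per-label confusion counts gathered from the model's predictions.
data class Counts(val tp: Int, val fp: Int, val tn: Int, val fn: Int)

// TP / (TP + FP): how many of the predicted positives were correct.
fun precision(c: Counts): Double =
    if (c.tp + c.fp == 0) 0.0 else c.tp.toDouble() / (c.tp + c.fp)

// TP / (TP + FN): how many of the actual positives were found.
fun recall(c: Counts): Double =
    if (c.tp + c.fn == 0) 0.0 else c.tp.toDouble() / (c.tp + c.fn)

fun main() {
    val dog = Counts(tp = 8, fp = 2, tn = 85, fn = 5) // made-up counts for illustration
    println("precision=${precision(dog)} recall=${recall(dog)}") // precision=0.8, recall = 8/13
}
```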


Thank you for your time if you got this far. I hope you found the article useful; if you did, and are interested in more tutorials around this topic, leave a comment!
Remember to check out the code on GitHub.
