Tuesday 4 March 2014

Clearing the haar

Hi folks,
             In this edition of the blog I will be writing about a powerful feature of OpenCV: Haar classifier training. This method lets us train a machine to recognize any object of our choice by learning the object's features. Though the method is fairly easy to execute and gives good results, the process is extremely time consuming. For our project, we are interested in getting the machine to recognize a human palm, which will let us define gestures through continuous detection. After the previous post introducing OpenCV, where we explored several possibilities for hand gesture detection, we settled on the Haar cascade training method.

So how do we train a machine to learn to recognize a particular object on its own? This sure sounds intriguing. It is achieved by showing the machine what the object looks like, what it does NOT look like, and any special features of the object that stand out. Once the machine has learned the object's features, the trained data can be applied to input images to check for the presence of the object.

As mentioned earlier, our intention is to detect a human palm. We collected around 80 positive images, all cropped to the same aspect ratio (0.64), and around 1000 negative images. It is important that all the positive images are cropped to the same ratio and contain only the object of interest, preferably with varied backgrounds and lighting. Negative images must not contain the object of interest. The more image samples, the better the results (and the longer the training takes!). Once we had the images, we were all set to start the training process. The steps we followed are derived from this blog, which proved to be extremely helpful. The width and height used for training were 64 and 100 respectively. Here are a few examples of positive images

 

and negative images included everything (well, almost!) that is not a human palm.
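Since every positive image has to share the same aspect ratio and we train at 64x100, we batch-resized our crops before anything else. Here is a minimal sketch of that step, assuming OpenCV's Python bindings (cv2) are installed; the raw_positives input directory is a hypothetical name, while positive_images matches the directory used in the steps below.

import glob
import os
import cv2  # OpenCV Python bindings

RAW_DIR = "raw_positives"    # hypothetical directory holding the raw crops
OUT_DIR = "positive_images"  # directory used in the training steps below
W, H = 64, 100               # same width/height used later for training

if not os.path.exists(OUT_DIR):
    os.makedirs(OUT_DIR)

for path in glob.glob(os.path.join(RAW_DIR, "*.jpg")):
    img = cv2.imread(path)
    if img is None:
        continue  # skip unreadable files
    resized = cv2.resize(img, (W, H))  # enforce the 0.64 (64/100) ratio
    cv2.imwrite(os.path.join(OUT_DIR, os.path.basename(path)), resized)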

Steps to get training started:

Assuming that OpenCV is installed, the necessary scripts can be found here:
     git clone https://github.com/Srip/HaarTraining.git

Below are the steps for training :
1. Organize the data
    Save all the positive images in a directory, say positive_images and all the negative images in negative_images.

2. Once we have the images, all cropped to the same ratio, let's save the relative paths of all the images to a file.

    find ./positive_images -iname "*.jpg" > positives.txt
    find ./negative_images -iname "*.jpg" > negatives.txt

3. The next step is to create samples from the existing positive images (2000 samples for our cascade classifier). For this we use a Perl script found in this blog and the OpenCV tool "opencv_createsamples". Both of these can be found in the repository mentioned earlier.

    mkdir samples
    perl createsamples.pl positives.txt negatives.txt samples 2000 "opencv_createsamples -bgcolor 0\
    -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 64 -h 100"

4. Once the samples are created, we will need to merge these samples into a single *.vec file. For this purpose, we use the mergevec.cpp file.

    cp mergevec.cpp ~/opencv-2.4.6.1/apps/haartraining
    cd ~/opencv-2.4.6.1/apps/haartraining
    g++ -I. -o mergevec mergevec.cpp cvboost.cpp cvcommon.cpp cvsamples.cpp\
    cvhaarclassifier.cpp cvhaartraining.cpp `pkg-config --libs --cflags opencv`

5. Once compilation is done, copy the resulting executable to the directory where all the scripts and images are stored. It can then be used to merge all the *.vec files generated in the samples directory.
 
    find ./samples -name '*.vec' > samples.txt
    ./mergevec samples.txt samples.vec

6. Now for the final training command. opencv_traincascade is the tool used to train the machine to detect objects. The meaning of the various arguments in the command can be understood here. Once you know what each parameter does, appropriate values can be set.

    mkdir classifier
    opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt -numStages 20\
    -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1500 -numNeg 992 -w 64 -h 100\
    -mode ALL -precalcValBufSize 2048 -precalcIdxBufSize 3072

The width and height parameters should have the same values as those used when creating the samples. Now this is going to take A LOT of time (about a week!) to execute, and the final result will be in the form of an XML document. Since an XML file is generated for each stage of the classifier and these are later combined into a single file, it is called a cascade classifier. A great advantage of this is that the program can be stopped at any point and, on restarting, training continues from the last completed stage.
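Once training finishes, opencv_traincascade leaves the combined classifier in classifier/cascade.xml (the -data directory given above). As a quick sanity check, here is a minimal detection sketch using OpenCV's Python bindings; the scaleFactor, minNeighbors and minSize values are illustrative starting points, not the exact parameters we use.

import cv2

# Load the trained cascade written by opencv_traincascade
palm_cascade = cv2.CascadeClassifier("classifier/cascade.xml")

cap = cv2.VideoCapture(0)  # built-in webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detection window proportions follow the 64x100 training size
    palms = palm_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5, minSize=(64, 100))
    for (x, y, w, h) in palms:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("palm detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()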

Conclusion:

  The training process took around 9 days to complete. The results are not bad, but the rate of false positives is quite high. Hence we have decided to train the machine a third time with more positive and negative images. As for the results, here is a screenshot.


Until next time.


Useful links:

A complete walkthrough of Haar training can be found here and here.
Important OpenCV documentation on the cascade classifier and cascade classifier training.



Saturday 22 February 2014

Introducing Multicast Addressing and Device Discovery

Hey folks,

               Our next pit stop introduces the concept of multicast addressing and how we plan to use it to send and receive data among various devices.
             Imagine a project group leader assigning a task to the group, and all the other members carrying out the instruction. Afterwards, each person reports their progress to everyone else in the group during a formal meeting over the course of the project. In layman's terms, that is exactly the concept of multicast.





In depth about multicasting 

Here are a few insights about multicast addressing in technical terms.

             Based on our research and Wikipedia, a multicast address is a logical identifier for a group of hosts in a computer network that are available to process datagrams or frames intended to be multicast for a designated network service.
             Multicast addressing can be used in the link layer (Layer 2 in the OSI model), for example Ethernet multicast, and at the network layer (Layer 3) for Internet Protocol version 4 (IPv4) or version 6 (IPv6) multicast. For IPv4, the group address range is 224.0.0.0 to 239.255.255.255.
So anyone can pick one of the addresses in that range and form a group of their own. However, a few addresses are reserved for specific purposes, which you can find here.

For more detail on the specific purpose of each reserved multicast address, the link above provides a thorough explanation.
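As a small aside (not part of our device discovery code), Python 3's standard ipaddress module can be used to check whether a given address actually falls in that multicast range:

import ipaddress

for addr in ["224.0.0.1", "239.255.255.255", "192.168.1.5"]:
    ip = ipaddress.ip_address(addr)
    # is_multicast is True only for 224.0.0.0 - 239.255.255.255 in IPv4
    print(addr, "multicast?", ip.is_multicast)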









Device Discovery Protocol


Here are a few FAQs which may help you on your way through...

Question 1 :  Whom do we send the data to?
Answer      :  Whoever I want to!

Question 2 :  What information do I need to know about whom to send the data to?
Answer      :  Get to know their IP address within the local multicast group.

What now, once I get to know the IP address...?

Question 3 :  How do we send the data to that IP address?
Answer      :  Socket programming.


Now we begin with a step-by-step implementation of our device discovery protocol...
 

# Step 1 : Device Discovery to fetch IP Address 

import socket
import sys
import struct
from select import *
import subprocess , platform



# The commands dictionary maps the operating system (key) to the shell command (value) used to get our own IP address


commands = {
    'Darwin': "ifconfig  | grep -E 'inet.[0-9]' | grep -v '127.0.0.1' | awk '{ print $2}'",
    'Linux': "/sbin/ifconfig  | grep 'inet addr:'| grep -v '127.0.0.1' | cut -d: -f2 | awk '{ print $1}'"
}


# Protocol messages to identify the beginning and ending of the device discovery protocol 
JOINING_MESSAGE = "Hi"

CLOSING_MESSAGE = "Bye"


# get_ip_address() forks a subprocess and returns this machine's IP address
def get_ip_address():

    proc = subprocess.Popen( commands[platform.system()], shell=True, stdout=subprocess.PIPE ) 

    return proc.communicate()[0].replace('\n', '')



# Step 2 : Creating a socket to start communicating with other devices


Assuming we have set MCAST_GRP = '224.3.29.X' and MCAST_PORT = 10000

#Create a  datagram socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

The AF_INET constant represents the address (and protocol) family and is used as the first argument to socket().
The SOCK_DGRAM constant represents the datagram socket type and is used as the second argument to socket().

The bind() system call is used to specify the local association (local address, local port).


sock.bind(('', MCAST_PORT))
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)




# Step 3 : Create a list to be dynamically populated with the IP addresses in the local network

multicast_group_ip = []
t = 1

while True:
    t = t + t
    if t > 10:
        t = 1

We create three lists (read, write and error) and pass them to the select API.
The last argument is a timeout value.
rlist, wlist, xlist = select([sock], [], [sock], t)


Note the use of the select API here.
select() is a blocking call, so we give it a timeout according to our needs.
The protocol we implement for the timeout doubles it on every quiet round. Initially, when t = 1 sec, we send a multicast message. The next message is sent at t = 2 sec, then at t = 4 sec and t = 8 sec, after which t crosses the threshold of 10 sec and we reset it back to t = 1 sec.




if rlist:
    # The socket is readable: receive messages from other active devices
    print >>sys.stderr, '\nwaiting to receive message'
    data, address = sock.recvfrom(1024)

    # Print the received data
    # Once data is received, send an acknowledgement message back to the sender
    print >>sys.stderr, 'received %s bytes from %s' % (len(data), address)
    print >>sys.stderr, 'sending acknowledgement to', address
    sock.sendto('ack', address)

else:
    # The select call timed out: send a joining message to the multicast group
    print >>sys.stderr, 'sending "%s"' % JOINING_MESSAGE
    sent = sock.sendto(JOINING_MESSAGE, (MCAST_GRP, MCAST_PORT))






# Step 4 : Close the socket

# Send a closing message to the group indicating that you are leaving
sock.sendto(CLOSING_MESSAGE, (MCAST_GRP, MCAST_PORT))
print "Sending closing message", CLOSING_MESSAGE
sock.close()



We hope we have conveyed the crux of the program's logic.
You too can implement your own multicast client-server programs with ease.
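For reference, here is a compact, self-contained sketch that puts the join/listen/announce pieces together in one place. The group address 224.3.29.71 and the 2-second timeout are only example values, not the exact ones we use.

import socket
import struct
from select import select

MCAST_GRP = '224.3.29.71'   # example address from the 224.0.0.0/4 multicast range
MCAST_PORT = 10000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', MCAST_PORT))

# Join the multicast group on all interfaces
mreq = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Announce ourselves, then alternate between listening and re-announcing
sock.sendto(b"Hi", (MCAST_GRP, MCAST_PORT))
while True:
    rlist, _, _ = select([sock], [], [sock], 2)
    if rlist:
        data, address = sock.recvfrom(1024)
        print("received %r from %s" % (data, address))
        sock.sendto(b"ack", address)
    else:
        sock.sendto(b"Hi", (MCAST_GRP, MCAST_PORT))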

Feel free to leave comments and share our blog on Facebook or Google+.

Till then, stay tuned folks ! :)





Wednesday 12 February 2014

Diving into OpenCV for Object Detection

OpenCV: vision for computers O_O So shall we say that computers are the new humans?


Could computers and humans interact through gestures? Indeed yes, and in fact in many ways. So we thought of using simple hand movements to get responses from the computer.
Check out a few interesting tutorials here which we went through.

Object Detection Using OpenCV

So the next question that arises is HOW? How do we detect the object (the hand)?

*Motion detection through background subtraction. By taking the difference between consecutive frames, motion can be detected, and that difference can be used to get the shape of the object in motion. This has a few disadvantages, the main one being that it cannot detect an object that is stationary.
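Here is a minimal sketch of the frame-differencing idea (not the exact code we tried), assuming OpenCV's Python bindings and the built-in webcam; the threshold value 25 is an arbitrary illustrative choice.

import cv2

cap = cv2.VideoCapture(0)
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The difference between consecutive frames highlights moving regions
    diff = cv2.absdiff(gray, prev_gray)
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    cv2.imshow("motion", motion_mask)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()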




*Disparity mapping
Disparity refers to the difference in the location of an object between two corresponding images as seen by the left and right eye, which is created by the parallax of stereo cameras. This technique is very useful for estimating depth, so an object like a hand, which is closest to the camera during gestures, stands out clearly. It is highly accurate and may be the next big thing in computer vision as plain cameras are replaced with Kinects and other such powerful devices. However, this technique is not possible with the single web camera presently available on most computers.
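To give a feel for how a disparity map is computed, here is a small sketch using OpenCV's block-matching stereo matcher; it assumes OpenCV 3 or newer Python bindings and a pair of rectified left/right images saved under the hypothetical names left.jpg and right.jpg.

import cv2

# Rectified left/right views of the same scene (hypothetical example files)
left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Block-matching stereo: nearer objects (such as a hand) get larger disparity
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

# Scale to 0-255 so the map can be saved and viewed as an image
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("disparity.png", disp_vis.astype("uint8"))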



*Haar-Cascade classifiers
This idea, proposed by Viola and Jones, is used to rapidly detect any object, like human faces, eyes, hands and many more, using AdaBoost classifier cascades that are based on Haar-like features rather than raw pixels. This was the stepping stone in face detection. A Haar-like feature is defined over a detection window containing rectangular regions at specific locations. The pixel intensities are summed up in each region and the difference between these sums is calculated. The detection window is then moved over the input image, the Haar-like features are calculated for each subsection, and the resulting values are compared with the learned values that separate objects from non-objects.
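To make the rectangle-difference idea concrete, here is a toy NumPy sketch that computes one two-rectangle Haar-like feature with an integral image, so each rectangle sum costs only four lookups. This is only an illustration of the concept, not code from OpenCV or the actual Viola-Jones implementation.

import numpy as np

def rect_sum(ii, x, y, w, h):
    # Sum of pixels inside the rectangle (x, y, w, h) via four integral-image lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Toy 8x8 grayscale patch with random intensities
patch = np.random.randint(0, 256, (8, 8)).astype(np.float32)

# Integral image padded with a leading row and column of zeros
ii = np.zeros((9, 9), dtype=np.float32)
ii[1:, 1:] = patch.cumsum(axis=0).cumsum(axis=1)

# Two-rectangle feature: sum of the left half minus sum of the right half of the patch
left = rect_sum(ii, 0, 0, 4, 8)
right = rect_sum(ii, 4, 0, 4, 8)
print("feature value:", left - right)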



Implementation of Haar cascades


Before continuing, note that we are talking about just the built-in camera and not a Kinect, a stereo camera or any other external source.

We worked on the above-mentioned ideas to get to know them well, maybe not completely, but let us tell you it was indeed very helpful. As already mentioned, the Haar approach gives better accuracy with the built-in camera we are using, so we are building our own Haar cascade for the hand and using it to detect an open palm.
By combining detection and tracking, simple gestures can be defined and corresponding actions can be taken. There are many other techniques available for detecting objects, such as colour-based detection, and a lot more to explore and learn. Tracking can be done using Meanshift, Camshift and other such techniques, which will be explained in forthcoming posts.

Stay in touch folks, because we will shortly be explaining how to build your own Haar cascade and covering tracking techniques.
Until then see-ya.





Saturday 1 February 2014

When it all began

Hi folks,

Human-computer interaction has always been an area where a lot of interesting work has been carried out, with the intention of providing users an interface that is more intuitive than the existing ones. As we think of intuitive interaction with machines, the first thought that strikes our minds is, "why not interact with machines as we do with each other?". To enable this, we intend to build an application that will make everyday tasks easier to execute and more natural, requiring less conventional machine interaction.


The ideation phase 


Three people, different interests, loads of hypothetical ideas along with a few feasible ones. Picking an idea to implement was a challenge. 
How many of you use pen drives, hard disks, USB cables or even a cluttered mail inbox to transfer files across devices? I think a lot of us do. As a solution, we settled on developing an application which can recognize human gestures as commands, interpret them, and perform tasks such as sharing documents, images and other media files across devices. As simple as that!

Opening with OpenCV


Human gesture recognition needs image processing. When searching the internet for gesture recognition and image processing, the best results that pop up are OpenCV and MATLAB. Since OpenCV is open source, we could start our project right away. Installation of OpenCV was a smooth process; we followed this link. For a group of people who are very new to image processing, understanding the bulk of OpenCV's APIs can be challenging. By walking through sample code and a few interesting projects that we found on the internet, we are slowly getting acquainted with it. This is just the beginning. There is a lot more to be explored, experimented with and learnt :)

Find a few interesting pieces of related work here: Swp, Flutter, MIT Media Labs TUI