The Imaginative Universal

Studies in Virtual Phenomenology -- @jamesashley

Helpful vs Creepy Face Recognition

February 13
by James Ashley 13. February 2013 14:03

mr_mall

One of the interesting potential commercial uses for the Kinect for Windows sensor is as a realtime tool for collecting information about people passing by.  The face detection capabilities of the Kinect for Windows SDK lends itself to these scenarios.  Just as Google and Facebook currently collect information about your browsing habits, Kinects can be set up in stores and malls to observe you and determine your shopping habits.

There’s just one problem with this.  On the face of it, it’s creepy.

To help parse what is happening in these scenarios, there is a sophisticated marketing vocabulary intended to distinguish “creepy” face detection from the useful and helpful kind.

First of all, face detection on its own does little more than detect that there is a face in front of the camera.  The face detection algorithm may go even further and break down parts of the face into a coordinate system.  Even this, however, does not turn a particular face into an token that can be indexed and compared against other faces. 

Turning an impression of a face into some sort of hash takes us to the next level and becomes face recognition rather than merely detection.  But even here there is parsing to be done.  Anonymous face recognition seeks to determine generic information about a face rather than specific, identifying information.  Anonymous face recognition provides data about a person’s age and gender – information that is terribly useful to retail chains. 

Consider that today, the main way retailers collect this information is by placing a URL at the bottom of a customer’s receipt and asking them to visit the site and provide this sort of information when the customer returns home.  The fulfillment rate on this strategy is obviously horrible.

Being able to collect these information unobtrusively would allow retailers to better understand how inventory should be shifted around seasonally and regionally to provide customers with the sorts of retail items they are interested in.  Power drills or perfume?  The Kinect can help with these stocking questions.

But have we gotten beyond the creepy factor with anonymous face recognition?  It actually depends on where you are.  In Asia, there is a high tolerance for this sort of surveillance.  In Europe, it would clearly be seen as creepy.  North America, on the other hand, is somewhere between Europe and Asia on privacy issues.  Anonymous face recognition is non-creepy if customers are provided with a clear benefit from it – just as they don’t mind having ads delivered to their browsers as long as they know that getting ads makes other services free.

Finally, identity face recognition in retail would allow custom experiences like the virtual ad delivery system portrayed in the mall scene from The Minority Report.  Currently, this is still considered very creepy.

At work, I’ve had the opportunity to work with NEC, IBM and other vendors on the second kind of face recognition.  The surprising thing is that getting anonymous face recognition working correctly is much harder than getting full face recognition working.  It requires a lot of probabilistic logic as well as a huge database of faces to get any sort of accuracy when it comes to demographics.  Even gender is surprisingly difficult.

Identity face recognition, on the other hand, while challenging, is something you can have in your living room if you have an XBox and a Kinect hooked up to it.  This sort of face recognition is used to log players automatically into their consoles and can even distinguish different members of the same family (for engineers developing facial recognition software, it is an irritating quirk of fate that people who look alike also tend to live in the same house).

If you would like to try identity face recognition out, you can try out the Luxand Face SDK.  Luxand provides a 30-day trial license which I tried out a few months ago.  The code samples are fairly good.  While Luxand does not natively support Kinect development, it is fairly straightforward to turn data in the Kinect’s rgb stream into images which can then be compared against other images using Luxand.

I used Luxand’s SDK to compare anyone standing in front of the Kinect sensor with a series of photos I had saved.  It worked fairly well, but unfortunately only if one stood directly in front of the sensor and about a foot or two in front of it (which wasn’t quite what we needed at the time).  The heart of the code is provided below.  It simply takes color images from Kinect and compares it against a directory of photos to see if a match can be found.  It could be used as part of a system for unlocking a computer when the proper user stands in front of it (though you can probably think of better uses – just try to avoid being creepy).

void _sensor_ColorFrameReady(object sender
    , ColorImageFrameReadyEventArgs e)
{
    using (var frame = e.OpenColorImageFrame())
    {
        var image = frame.ToBitmap();
        this.image2.Source = image.ToBitmapSource();
        LookForMatch(image);
    }
 
}
 
private bool LookForMatch(System.Drawing.Bitmap currentImage)
{
 
        if (currentImage == null)
            return false;
        IntPtr hBitmap = currentImage.GetHbitmap();
        try
        {
        FSDK.CImage image = new FSDK.CImage(hBitmap);
        FSDK.SetFaceDetectionParameters(false, false, 100);
        FSDK.SetFaceDetectionThreshold(3);
        FSDK.TFacePosition facePosition = image.DetectFace();
        if (facePosition.w != 0)
        {
 
            FaceTemplate template = new FaceTemplate();
            template.templateData = 
                ExtractFaceTemplateDataFromImage(image);
            bool match = false;
            FaceTemplate t1 = new FaceTemplate();
            FaceTemplate t2 = new FaceTemplate();
            float best_match = 0.0f;
            float similarity = 0.0f;
            foreach (FaceTemplate t in faceTemplates)
            {
 
                t1 = t;
                FSDK.MatchFaces(ref template.templateData
                    , ref t1.templateData, ref similarity);
                float threshold = 0.0f;
                FSDK.GetMatchingThresholdAtFAR(0.01f
                    , ref threshold);
 
                if (similarity > best_match)
                {
                    this.textBlock1.Text = similarity.ToString();
                    best_match = similarity;
                    t2 = t1;
                    if (similarity > _targetSimilarity)
                        match = true;
                }
            }
            if (match && !_isPlaying)
            {
                return true;
            }
            else
            {
                return false;
            }
        }
        else
            return false;
    }
    finally
    {
        DeleteObject(hBitmap);
        currentImage.Dispose();
    }
}
 
private byte[] ExtractFaceTemplateDataFromImage(FSDK.CImage cimg)
{
    byte[] ret = null;
    Luxand.FSDK.TPoint[] facialFeatures;
    var facePosition = cimg.DetectFace();
    if (0 == facePosition.w)
    {
    }
    else
    {
        bool eyesDetected = false;
        try
        {
            facialFeatures = 
                cimg.DetectEyesInRegion(ref facePosition);
            eyesDetected = true;
        }
        catch (Exception ex)
        {
            return cimg.GetFaceTemplateInRegion(ref facePosition);
        }
 
        if (eyesDetected)
        {
            ret = 
                cimg.GetFaceTemplateUsingEyes(ref facialFeatures);
        }
        else
        {
            ret = cimg.GetFaceTemplateInRegion(ref facePosition);
        }
    }
    return ret;
    cimg.Dispose();
}
 

Tags:

Kinect

3Gear Systems Kinect Handtracking API Unwrapping

October 25
by James Ashley 25. October 2012 17:35

WP_000542

I’ve been spending this last week setting up the rig for the beta hand detection API recently published by 3Gear Systems.  There’s a bit of hardware required to position the two Kinects correctly so they face down at a 45 degree angle.  The Kinect mounts from Amazon arrived within a day and were $6 each with free shipping since I never remember to cancel my Prime membership.  The aluminum parts from 80/20 were a bit more expensive but came to just a little above $100 with shipping.  We already have lots of Kinects around the Razorfish Emerging Experiences Lab, so that wasn’t a problem.

WP_000543

80/20 surprisingly doesn’t offer a lot of instruction on how to put the parts of the aluminum frame together so it took me about half-an-hour of trial-and-error to figure it out.  Then I found this PDF explaining what the frame should end up looking like deep-linked on the 3Gear website and had to adjust the frame to get the dimensions correct.

WP_000544

I wanted to use the Kinect for Windows SDK and, after some initial mistakes, realized that I needed to hook up our K4W Kinects rather than the Kinect for Xbox Kinects to do that.  When using OpenNI rather than K4W (the SDK supports either) you can use either the Xbox Kinect or the Xtion sensor.

My next problem was that although the machine we were building on has two USB Controllers, one of them wasn’t working, so I took a trip to Fry’s and got a new PCI-E USB Controller which ended up not working.  So on the way home I tracked down a USB Controller from a brand I recognized, US Robotics, and tried again the next day.  Success at last!

WP_000545

Next I started going through the setup and calibration steps here.  It’s quite a bit of command line voodoo magic and requires very careful attention to the installation instructions – for instance, install the C++ redistributable and Java SE.

WP_000546

After getting all the right software installed I began the calibration process.  A paper printout of the checkerboard pattern worked fine.  It turns out that the software for adjusting the angle of the Kinect sensor doesn’t work if the sensor is on its side facing down so I had to click-click-click adjust it manually.  That’s always a bit of a scary sound.

WP_000554

Pretty soon I was up and running with a point cloud visualization of my hands.  The performance is extremely good and the rush from watching everything working is incredible.

WP_000556

Of the basic samples, the rotation_trainer programmer is probably the most cool.  It allows one to rotate a 3D model around the Y-axis as well as around the X-axis.  Just this little sample opens up a lot of cool possibilities for HCI design.

WP_000557

From there my colleagues and I moved on to the C++ samples.  According to Chris Twigg from 3Gear, this 3D chess game (with 3D physics) was written by one of their summer interns.  If an intern can do this in a month … you get the picture.

I’m fortunate to get to do a lot of R&D in my job at Razorfish – as do my colleagues.  We’ve got home automation parts, arduino bits, electronic textiles, endless Kinects, 3D walls, transparent screens, video walls, and all manner of high tech toys around our lab.  Despite all that, playing with the 3Gear software has been the first time in a long time that we have had that great sense of “gee-whiz, we didn’t know that this was really possible.”

Thanks, 3Gear, for making our week!

Tags: , ,

Kinect

Two Years of Kinect

October 17
by James Ashley 17. October 2012 12:41

As we approach the second anniversary of the release of the Kinect sensor, it seems appropriate to take inventory of how far we have come. Over the past two months, I have had the privilege of being introduced to several Kinect-based tools and demos that exemplify the potential of the Kinect and provide an indication of where the technology is headed.

restOnDesk

One of my favorites is a startup in San Francisco called 3Gear Systems. 3Gear have conquered the problem of precise finger detection by using dual Kinects. Whereas the original Kinect was very much a full-body sensor intended for bodies up to twelve feet away from the camera, 3Gear have made the Kinect into a more intimate device. The user can pick up digital objects in 3D space, move them, rotate them, and even draw free hand with her finger. The accuracy is amazing. The founders, Robert Wang, Chris Twigg and Kenrick Kin, have just recently released a beta of their finger-precise gesture detection SDK for developers to try out and instructions on purchasing and assembling a rig to take advantage of their software. Here’s a video demonstrating their setup and the amazing things you will be able to do with it.

oblong

Mastering the technology is only half the story, however. Oblong Industries has for several years been designing the correct gestures to use in a post-touch world. This TED Talk by John Underkoffler, Oblong’s Chief Scientist, demonstrates their g-speak technology using gloves to enable precision gesturing. Lately they’ve taken off the gloves in order to accomplish similar interactions using Kinect and Xtion sensors. The difficulty, of course, is that gestural languages can have accents just as spoken languages do. Different people perform the same gesture in different ways. On top of this, interaction gestures should feel intuitive or, at least, be easy for users to discover and master. Oblong’s extensive experience with gestural interfaces has aided them greatly in overcoming these types of hurdles and identifying the sorts of gestures that work broadly.

brekel-kinect-pro-face

The advent of the Kinect is also having a large impact on independent film makers.  While increasingly powerful software has allowed indies to do things in post-production that, five years ago, were solely the provenance of companies like ILM, the Kinect is finally opening up the possibility of doing motion capture on the cheap.  Few have done more than Jasper Brekelmans to help make this possible.  His Kinect Pro Face software, currently sold for $99 USD, allows live streaming of Kinect face tracking data straight into 3D modeling sofrtware.  This data can then be mapped to 3D models to allow for realtime digital puppetry. 

Kinect Pro Face is just one approach to translating and storing the data streams coming out of the Kinect device.  Another approach is being spearheaded by my friend Joshua Blake at Infostrat.  His company’s PointStreamer software treats the video, depth and audio feeds like any other camera, compressing the data for subsequent playback.  PointStreamer’s preferred playback mode is through point clouds which project color data onto 3D space generated using the depth data.  These point cloud playbacks can then be rotated in space, scrubbed in time, and generally distorted in any way we like.  This alpha-stage technology demonstrates the possibility of one day recording everything in pseudo-3D.

Tags: , , ,

Kinect