Helpful vs Creepy Face Recognition


One of the interesting potential commercial uses for the Kinect for Windows sensor is as a real-time tool for collecting information about people passing by.  The face detection capabilities of the Kinect for Windows SDK lend themselves to these scenarios.  Just as Google and Facebook currently collect information about your browsing habits, Kinects can be set up in stores and malls to observe you and determine your shopping habits.

There’s just one problem with this.  On the face of it, it’s creepy.

To help parse what is happening in these scenarios, there is a sophisticated marketing vocabulary intended to distinguish “creepy” face detection from the useful and helpful kind.

First of all, face detection on its own does little more than detect that there is a face in front of the camera.  The face detection algorithm may go even further and break down parts of the face into a coordinate system.  Even this, however, does not turn a particular face into a token that can be indexed and compared against other faces.

Turning an impression of a face into some sort of hash takes us to the next level and becomes face recognition rather than merely detection.  But even here there is parsing to be done.  Anonymous face recognition seeks to determine generic information about a face rather than specific, identifying information.  Anonymous face recognition provides data about a person’s age and gender – information that is terribly useful to retail chains. 

Consider that today, the main way retailers collect this information is by placing a URL at the bottom of a customer’s receipt and asking the customer to visit the site and provide this sort of information after returning home.  The fulfillment rate on this strategy is obviously horrible.

Being able to collect this information unobtrusively would allow retailers to better understand how inventory should be shifted around seasonally and regionally to provide customers with the sorts of retail items they are interested in.  Power drills or perfume?  The Kinect can help with these stocking questions.

But have we gotten beyond the creepy factor with anonymous face recognition?  It actually depends on where you are.  In Asia, there is a high tolerance for this sort of surveillance.  In Europe, it would clearly be seen as creepy.  North America, on the other hand, is somewhere between Europe and Asia on privacy issues.  Anonymous face recognition is non-creepy if customers are provided with a clear benefit from it – just as they don’t mind having ads delivered to their browsers as long as they know that getting ads makes other services free.

Finally, identity face recognition in retail would allow custom experiences like the virtual ad delivery system portrayed in the mall scene from Minority Report.  Currently, this is still considered very creepy.

At work, I’ve had the opportunity to work with NEC, IBM and other vendors on the second kind of face recognition.  The surprising thing is that getting anonymous face recognition working correctly is much harder than getting full face recognition working.  It requires a lot of probabilistic logic as well as a huge database of faces to get any sort of accuracy when it comes to demographics.  Even gender is surprisingly difficult.

Identity face recognition, on the other hand, while challenging, is something you can have in your living room if you have an Xbox and a Kinect hooked up to it.  This sort of face recognition is used to log players automatically into their consoles and can even distinguish different members of the same family (for engineers developing facial recognition software, it is an irritating quirk of fate that people who look alike also tend to live in the same house).

If you would like to try identity face recognition out, take a look at the Luxand FaceSDK.  Luxand provides a 30-day trial license which I tried out a few months ago.  The code samples are fairly good.  While Luxand does not natively support Kinect development, it is fairly straightforward to turn data from the Kinect’s RGB stream into images which can then be compared against other images using Luxand.

I used Luxand’s SDK to compare anyone standing in front of the Kinect sensor with a series of photos I had saved.  It worked fairly well, but unfortunately only if one stood directly in front of the sensor, about a foot or two away (which wasn’t quite what we needed at the time).  The heart of the code is provided below.  It simply takes color images from the Kinect and compares them against a directory of photos to see if a match can be found.  It could be used as part of a system for unlocking a computer when the proper user stands in front of it (though you can probably think of better uses – just try to avoid being creepy).

void _sensor_ColorFrameReady(object sender
    , ColorImageFrameReadyEventArgs e)
{
    using (var frame = e.OpenColorImageFrame())
    {
        // OpenColorImageFrame returns null if the frame arrived late or the sensor stopped.
        if (frame == null)
            return;

        var image = frame.ToBitmap();
        this.image2.Source = image.ToBitmapSource();
        LookForMatch(image);
    }
}
 
private bool LookForMatch(System.Drawing.Bitmap currentImage)
{
    if (currentImage == null)
        return false;

    IntPtr hBitmap = currentImage.GetHbitmap();
    try
    {
        FSDK.CImage image = new FSDK.CImage(hBitmap);
        FSDK.SetFaceDetectionParameters(false, false, 100);
        FSDK.SetFaceDetectionThreshold(3);
        FSDK.TFacePosition facePosition = image.DetectFace();

        // A width of zero means no face was found in the current frame.
        if (facePosition.w == 0)
            return false;

        FaceTemplate template = new FaceTemplate();
        template.templateData =
            ExtractFaceTemplateDataFromImage(image);
        if (template.templateData == null)
            return false;

        bool match = false;
        float best_match = 0.0f;
        float similarity = 0.0f;

        // Compare the live template against the template of each saved photo.
        foreach (FaceTemplate t in faceTemplates)
        {
            FaceTemplate candidate = t;
            FSDK.MatchFaces(ref template.templateData
                , ref candidate.templateData, ref similarity);

            // FSDK can suggest a matching threshold for a given false acceptance rate;
            // the hard-coded _targetSimilarity field is used below instead.
            float threshold = 0.0f;
            FSDK.GetMatchingThresholdAtFAR(0.01f
                , ref threshold);

            if (similarity > best_match)
            {
                this.textBlock1.Text = similarity.ToString();
                best_match = similarity;
                if (similarity > _targetSimilarity)
                    match = true;
            }
        }

        return match && !_isPlaying;
    }
    finally
    {
        DeleteObject(hBitmap);
        currentImage.Dispose();
    }
}
 
private byte[] ExtractFaceTemplateDataFromImage(FSDK.CImage cimg)
{
    byte[] ret = null;
    Luxand.FSDK.TPoint[] facialFeatures;
    var facePosition = cimg.DetectFace();

    // A width of zero means no face was detected, so null is returned.
    if (facePosition.w != 0)
    {
        try
        {
            facialFeatures =
                cimg.DetectEyesInRegion(ref facePosition);
        }
        catch (Exception)
        {
            // If the eyes cannot be located, fall back to the face region alone.
            return cimg.GetFaceTemplateInRegion(ref facePosition);
        }

        // Eye positions produce a more accurate template when they are available.
        ret = cimg.GetFaceTemplateUsingEyes(ref facialFeatures);
    }

    // The caller owns the CImage, so it is not disposed here.
    return ret;
}
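
The ToBitmap and ToBitmapSource calls in the color frame handler are extension methods that are not part of the Kinect SDK or of Luxand.  If you want to reproduce the sample, here is a minimal sketch of what those helpers might look like – my own reconstruction of the usual conversion code, not the exact helpers from the original project:

public static class FrameExtensions
{
    [System.Runtime.InteropServices.DllImport("gdi32.dll")]
    private static extern bool DeleteObject(IntPtr hObject);

    // Copy the 32-bit BGRA pixels of a ColorImageFrame into a GDI+ bitmap.
    public static System.Drawing.Bitmap ToBitmap(this ColorImageFrame frame)
    {
        var pixels = new byte[frame.PixelDataLength];
        frame.CopyPixelDataTo(pixels);

        var bitmap = new System.Drawing.Bitmap(frame.Width, frame.Height,
            System.Drawing.Imaging.PixelFormat.Format32bppRgb);
        var data = bitmap.LockBits(
            new System.Drawing.Rectangle(0, 0, frame.Width, frame.Height),
            System.Drawing.Imaging.ImageLockMode.WriteOnly,
            bitmap.PixelFormat);
        System.Runtime.InteropServices.Marshal.Copy(
            pixels, 0, data.Scan0, pixels.Length);
        bitmap.UnlockBits(data);
        return bitmap;
    }

    // Wrap the GDI+ bitmap in a WPF BitmapSource so it can be shown in an Image control.
    public static System.Windows.Media.Imaging.BitmapSource ToBitmapSource(
        this System.Drawing.Bitmap bitmap)
    {
        IntPtr hBitmap = bitmap.GetHbitmap();
        try
        {
            return System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap(
                hBitmap,
                IntPtr.Zero,
                System.Windows.Int32Rect.Empty,
                System.Windows.Media.Imaging.BitmapSizeOptions.FromEmptyOptions());
        }
        finally
        {
            DeleteObject(hBitmap);
        }
    }
}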
 

3Gear Systems Kinect Handtracking API Unwrapping


I’ve been spending this last week setting up the rig for the beta hand detection API recently published by 3Gear Systems.  There’s a bit of hardware required to position the two Kinects correctly so they face down at a 45 degree angle.  The Kinect mounts from Amazon arrived within a day and were $6 each with free shipping since I never remember to cancel my Prime membership.  The aluminum parts from 80/20 were a bit more expensive but came to just a little above $100 with shipping.  We already have lots of Kinects around the Razorfish Emerging Experiences Lab, so that wasn’t a problem.


80/20 surprisingly doesn’t offer a lot of instruction on how to put the parts of the aluminum frame together, so it took me about half an hour of trial and error to figure it out.  Then I found this PDF explaining what the frame should end up looking like deep-linked on the 3Gear website and had to adjust the frame to get the dimensions correct.


I wanted to use the Kinect for Windows SDK and, after some initial mistakes, realized that I needed to hook up our K4W Kinects rather than the Kinect for Xbox sensors to do that.  When using OpenNI rather than K4W (the 3Gear SDK supports either), you can use either the Xbox Kinect or the Xtion sensor.

My next problem was that although the machine we were building on has two USB Controllers, one of them wasn’t working, so I took a trip to Fry’s and got a new PCI-E USB Controller which ended up not working.  So on the way home I tracked down a USB Controller from a brand I recognized, US Robotics, and tried again the next day.  Success at last!


Next I started going through the setup and calibration steps here.  It’s quite a bit of command line voodoo magic and requires very careful attention to the installation instructions – for instance, install the C++ redistributable and Java SE.


After getting all the right software installed I began the calibration process.  A paper printout of the checkerboard pattern worked fine.  It turns out that the software for adjusting the angle of the Kinect sensor doesn’t work if the sensor is on its side facing down so I had to click-click-click adjust it manually.  That’s always a bit of a scary sound.


Pretty soon I was up and running with a point cloud visualization of my hands.  The performance is extremely good and the rush from watching everything working is incredible.


Of the basic samples, the rotation_trainer program is probably the coolest.  It allows one to rotate a 3D model around the Y-axis as well as around the X-axis.  Just this little sample opens up a lot of cool possibilities for HCI design.


From there my colleagues and I moved on to the C++ samples.  According to Chris Twigg from 3Gear, this 3D chess game (with 3D physics) was written by one of their summer interns.  If an intern can do this in a month … you get the picture.

I’m fortunate to get to do a lot of R&D in my job at Razorfish – as do my colleagues.  We’ve got home automation parts, arduino bits, electronic textiles, endless Kinects, 3D walls, transparent screens, video walls, and all manner of high tech toys around our lab.  Despite all that, playing with the 3Gear software has been the first time in a long time that we have had that great sense of “gee-whiz, we didn’t know that this was really possible.”

Thanks, 3Gear, for making our week!

Two Years of Kinect

As we approach the second anniversary of the release of the Kinect sensor, it seems appropriate to take inventory of how far we have come. Over the past two months, I have had the privilege of being introduced to several Kinect-based tools and demos that exemplify the potential of the Kinect and provide an indication of where the technology is headed.


One of my favorites is a startup in San Francisco called 3Gear Systems. 3Gear have conquered the problem of precise finger detection by using dual Kinects. Whereas the original Kinect was very much a full-body sensor intended for bodies up to twelve feet away from the camera, 3Gear have made the Kinect into a more intimate device. The user can pick up digital objects in 3D space, move them, rotate them, and even draw freehand with her finger. The accuracy is amazing. The founders, Robert Wang, Chris Twigg and Kenrick Kin, have just recently released a beta of their finger-precise gesture detection SDK for developers to try out and instructions on purchasing and assembling a rig to take advantage of their software. Here’s a video demonstrating their setup and the amazing things you will be able to do with it.


Mastering the technology is only half the story, however. Oblong Industries has for several years been designing the correct gestures to use in a post-touch world. This TED Talk by John Underkoffler, Oblong’s Chief Scientist, demonstrates their g-speak technology using gloves to enable precision gesturing. Lately they’ve taken off the gloves in order to accomplish similar interactions using Kinect and Xtion sensors. The difficulty, of course, is that gestural languages can have accents just as spoken languages do. Different people perform the same gesture in different ways. On top of this, interaction gestures should feel intuitive or, at least, be easy for users to discover and master. Oblong’s extensive experience with gestural interfaces has aided them greatly in overcoming these types of hurdles and identifying the sorts of gestures that work broadly.


The advent of the Kinect is also having a large impact on independent filmmakers.  While increasingly powerful software has allowed indies to do things in post-production that, five years ago, were solely the province of companies like ILM, the Kinect is finally opening up the possibility of doing motion capture on the cheap.  Few have done more than Jasper Brekelmans to help make this possible.  His Kinect Pro Face software, currently sold for $99 USD, allows live streaming of Kinect face tracking data straight into 3D modeling software.  This data can then be mapped to 3D models to allow for real-time digital puppetry.

Kinect Pro Face is just one approach to translating and storing the data streams coming out of the Kinect device.  Another approach is being spearheaded by my friend Joshua Blake at Infostrat.  His company’s PointStreamer software treats the video, depth and audio feeds like any other camera, compressing the data for subsequent playback.  PointStreamer’s preferred playback mode is through point clouds which project color data onto 3D space generated using the depth data.  These point cloud playbacks can then be rotated in space, scrubbed in time, and generally distorted in any way we like.  This alpha-stage technology demonstrates the possibility of one day recording everything in pseudo-3D.

What’s In Kinect for Windows SDK 1.5?


Microsoft has just published the next release of the Kinect SDK: http://www.microsoft.com/en-us/kinectforwindows/develop/developer-downloads.aspx  Be sure to install both the SDK and the Toolkit.

This release is backwards compatible with the 1.0 release of the SDK.  This is important, because it means that you will not have to recompile applications you have already written with the Kinect SDK 1.0.  They will continue to work as is.  Even better, you can install 1.5 right over 1.0 – the installer will take care of everything and you don’t have to go through the messy process of tracking down and removing all the components of the previous install.

I do recommend upgrading your applications to 1.5 if you are able, however.  There are improvements to tracking as well as the depth and color data.

Additionally, several things developers asked for following the initial release have been added.  Near-mode, which allows the sensor to work as close as 40cm, now also supports skeleton tracking (previously it did not). 

Partial Skeleton Tracking is now also supported.  While full body tracking made sense for Xbox games, it made less sense when people were sitting in front of their computer or even simply in a crowded room.  With the 1.5 SDK, applications can be configured to ignore everything below the waist and just track the top ten skeleton joints.  This is also known as seated skeleton tracking.
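
The switches for both features live on the sensor’s streams.  Something along these lines should turn them on (a quick sketch from memory, so double-check the property names against the 1.5 documentation):

KinectSensor sensor = KinectSensor.KinectSensors[0];

// Near mode: the depth camera now reports from roughly 40cm out.
sensor.DepthStream.Enable();
sensor.DepthStream.Range = DepthRange.Near;

// Seated mode: track only the ten upper-body joints, and keep tracking in near range.
sensor.SkeletonStream.Enable();
sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
sensor.SkeletonStream.EnableTrackingInNearRange = true;

sensor.Start();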

Kinect Studio has been added to the toolkit.  If you have been working with the Kinect on a regular basis, you have probably developed several workplace traumas never dreamed of by OSHA as you tested your applications by gesticulating wildly in the middle of your co-workers.  Kinect Studio allows you to record color, depth and skeleton data from an application and save it off.  Later, after making necessary tweaks to your app, you can simply play it back.  Best of all, the channel between your app and Kinect Studio is transparent.  You do not have to implement any special code in your application to get record and play-back to work.  They just do!  Currently Kinect Studio does not record voice – but we’ll see what happens in the future.

Besides partial skeleton tracking, skeleton tracking now also provides rotation information.  A big complaint with the initial SDK release was that there was no way to find out if a player/user is turning his head.  Now you can – along with lots of other tossing and turning: think Kinect Twister.
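
As a rough sketch of what this looks like in code (the skeletons array here is assumed to have been copied out of a SkeletonFrame with CopySkeletonDataTo):

foreach (Skeleton skeleton in skeletons)
{
    if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
        continue;

    // BoneOrientations is indexed by the joint at the end of each bone.
    BoneOrientation headBone = skeleton.BoneOrientations[JointType.Head];

    // Rotation relative to the parent bone; AbsoluteRotation gives it in camera space.
    Vector4 rotation = headBone.HierarchicalRotation.Quaternion;
}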

Those are things developers asked for.  In the SDK 1.5 release, however, we also get several things no one was expecting.  The Face Tracking Library (part of the toolkit) allows devs to track 87 distinct points on the face.  Additional data is provided indicating the location of the eyes, the vertices of a square around a player’s face (I used to jump through hoops with OpenCV to do this), as well as face gesture scalars that tell you things like whether the lower lip is curved upwards or downwards (and consequently whether a player is smiling or frowning).  Unlike libraries such as OpenCV (in case you were wondering), the face tracking library uses RGB as well as depth and skeleton data to perform its analysis.
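
In code, the shape of the API looks roughly like the sketch below (from the Microsoft.Kinect.Toolkit.FaceTracking assembly; I am writing the Track call from memory, so verify the parameter order against the toolkit’s face tracking sample):

// colorPixels (byte[]), depthPixels (short[]) and skeleton are assumed to have been
// copied out of the most recent AllFramesReady event.
FaceTracker faceTracker = new FaceTracker(sensor);

FaceTrackFrame faceFrame = faceTracker.Track(
    sensor.ColorStream.Format, colorPixels,
    sensor.DepthStream.Format, depthPixels,
    skeleton);

if (faceFrame.TrackSuccessful)
{
    var faceRect = faceFrame.FaceRect;  // the square around the player's face

    // Animation units are the "face gesture scalars" – e.g. lip corners up or down.
    var animationUnits = faceFrame.GetAnimationUnitCoefficients();
    float lipCornerDepressor = animationUnits[AnimationUnit.LipCornerDepressor];
    bool probablyFrowning = lipCornerDepressor > 0;
}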


The other cool thing we get this go-around is a sample application called Avateering that demonstrates how to use the Kinect SDK 1.5 to animate a 3D Model generated by tools like Maya or Blender.  The obvious way to use this, though, would be in common motion capture scenarios.  Jasper Brekelmans has taken this pretty far already with OpenNI and there have been several cool samples published on the web using the K4W SDK (you’ll notice that everyone reuses the same model and basic XNA code).  The 1.5 Toolkit sample takes this even further by, first, having smoother tracking and, second, by adding joint rotation to the mocap animation.  The code is complex and depends a lot on the way the model is generated.  It’s a great starting point, though, and is just crying out for someone to modify it in order to re-implement the Shape Game from v1.0 of the SDK.

The Kinect4Windows team has shown that it can be fast and furious as it continues to build on the momentum of the initial release.

There are some things I am still waiting for the community (rather than K4W) to build, however.  One is a common way to work with point clouds.  KinectFusion has already demonstrated the amazing things that can be done with point clouds and the Kinect.  It’s the sort of technical biz-wang that all our tomorrows will be constructed from.  Currently PCL has done some integration with certain versions of OpenNI (the versioning issues just kill me).  Here’s hoping PCL will do something with the SDK soon.

The second major stumbling block is a good gesture library – ideally one built on machine learning.  GesturePak is a good start, though I have my doubts about using a pose approach to gesture recognition as a general purpose solution.  It’s still worth checking out while we wait for a better solution, however.

In my ideal world, a common gesture idiom for the Kinect and other devices would be the responsibility of some of our best UX designers in the agency world.  Maybe we could even call them a consortium!  Once the gestures are hammered out, they would be passed on to engineers who would use machine learning to create decision trees for recognizing these gestures, much as the original skeleton tracking for Kinect was done.  Then we would put devices out in the world and they would stream data to people’s Google glasses and … but I’m getting ahead of myself.  Maybe all that will be ready when the Kinect 2.5 SDK is released.  In the meantime, I still have lots to chew on with this release.

Quick Guide to moving from the Kinect SDK beta 2 to v1

If you had been working with the beta 2 of the Kinect SDK prior to February 1st, you may have felt dismay at the number of API changes that were introduced in v1.

After porting several Kinect applications from the beta 2 to v1, however, I finally started to see a pattern to the changes.  For the most part, it is simply a matter of replacing one set of boilerplate code with another.  Any unique portions of the code can for the most part be left alone.

In this post, I want to demonstrate five simple code transformations that will ease your way from the beta 2 to the Kinect SDK v1.  I’ll do it boilerplate fragment by boilerplate fragment.

1. Namespaces have been shifted around.  Microsoft.Research.Kinect.Nui is now just Microsoft.Kinect.  Fortunately Visual Studio makes resolving namespaces relatively easy, so we can just move on.

2. The Runtime type, the controller object for working with data streams from the Kinect, is now called a KinectSensor type.  Grabbing an instance of it has also changed.  You used to just new up an instance like this:

Runtime nui = new Runtime();

Now you instead grab an instance of the KinectSensor from a static array containing all the KinectSensors attached to your PC. 

KinectSensor sensor = KinectSensor.KinectSensors[0];
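
In practice, the collection can also contain sensors that are plugged in but not yet usable, so it is worth filtering on the Status property rather than blindly taking index zero.  Here is a small sketch (it assumes a using System.Linq directive):

KinectSensor sensor = KinectSensor.KinectSensors
    .FirstOrDefault(k => k.Status == KinectStatus.Connected);

if (sensor == null)
{
    // No usable Kinect is attached; bail out or prompt the user.
    return;
}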

3. Initializing a KinectSensor object to start reading the color stream, depth stream or skeleton stream has also changed.  In the beta 2, the initialization procedure just didn’t look very .NET-y.  In v1, this has been cleaned up dramatically.  The beta 2 code for initializing a depth and skeleton stream looked like this:

_nui.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        _nui_SkeletonFrameReady
        );
_nui.DepthFrameReady += 
    new EventHandler<ImageFrameReadyEventArgs>(
        _nui_DepthFrameReady
        );
_nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex | RuntimeOptions.UseSkeletalTracking);
_nui.DepthStream.Open(ImageStreamType.Depth
    , 2
    , ImageResolution.Resolution320x240
    , ImageType.DepthAndPlayerIndex);
     

 

In v1, this boilerplate code has been altered so the Initialize method goes away, roughly replaced by a Start method.  The Open methods on the streams, in turn, have been replaced by Enable.  The DepthAndPlayerIndex data is made available simply by having the skeleton stream enabled.  Also note that the event argument types for the depth and color streams are now different.  Here is the same code in v1:

sensor.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        sensor_SkeletonFrameReady
        );
sensor.DepthFrameReady += 
    new EventHandler<DepthImageFrameReadyEventArgs>(
        sensor_DepthFrameReady
        );
sensor.SkeletonStream.Enable();
sensor.DepthStream.Enable(
    DepthImageFormat.Resolution320x240Fps30
    );
sensor.Start();

4. Transform Smoothing: it used to be really easy to smooth out the skeleton stream in beta 2.  You simply turned it on.

nui.SkeletonStream.TransformSmooth = true;

In v1, you have to create a new TransformSmoothParameters object and pass it to the skeleton stream’s Enable method.  Unlike the beta 2, you also have to initialize the values yourself since they all default to zero.

sensor.SkeletonStream.Enable(
    new TransformSmoothParameters() 
    {   Correction = 0.5f
    , JitterRadius = 0.05f
    , MaxDeviationRadius = 0.04f
    , Smoothing = 0.5f });

5. Stream event handling: handling the ready events from the depth stream, the video stream and the skeleton stream also used to be much easier.  Here’s how you handled the DepthFrameReady event in beta 2 (skeleton and video followed the same pattern):

void _nui_DepthFrameReady(object sender
    , ImageFrameReadyEventArgs e)
{
    var frame = e.ImageFrame;
    var planarImage = frame.Image;
    var bits = planarImage.Bits;
    // your code goes here
}

For performance reasons, the newer v1 code looks very different and the underlying C++ API leaks through a bit.  In v1, we are required to open the image frame and check to make sure something was returned.  Additionally, we create our own array of bytes (for the depth stream this has become an array of shorts) and populate it from the frame object.  The PlanarImage type which you may have gotten cozy with in beta 2 has disappeared altogether.  Also note the using keyword to dispose of the ImageFrame object. The transliteration of the code above now looks like this:

void sensor_DepthFrameReady(object sender
    , DepthImageFrameReadyEventArgs e)
{
    using (var depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            var bits =
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(bits);
            // your code goes here
        }
    }
}

 

I have noticed that many sites and libraries that were using the Kinect SDK beta 2 still have not been ported to Kinect SDK v1.  I certainly understand the hesitation given how much the API seems to have changed.

If you follow these five simple translation rules, however, you’ll be able to convert approximately 80% of your code very quickly.

The right way to do Background Subtraction with the Kinect SDK v1


MapDepthFrameToColorFrame is a beautiful method introduced rather late into the Kinect SDK v1.  As far as I know, it primarily has one purpose: to make background subtraction operations easier and more performant.

Background subtraction is a technique for removing any pixels in an image that are not the primary actors.  Green Screening – which, if you are old enough to have seen the original Star Wars when it came out, is known to you as Blue Screening – is a particular implementation of background subtraction in the movies which has actors performing in front of a green background.  The green background is then subtracted from the final film and another background image is inserted in its place.

With the Kinect, background subtraction is accomplished by comparing the data streams rendered by the depth camera and the color camera.  The depth camera will actually tell us which pixels of the depth image belong to a human being (with the pre-condition that Skeleton Tracking must be enabled for this to work).  The pixels represented in the depth stream must then be compared to the pixels in the color stream in order to subtract out any pixels that do not belong to a player.  The big trick is each pixel in the depth stream must be mapped to an equivalent pixel in the color stream in order to make this comparison possible.
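
Concretely, each 16-bit value in the depth stream packs the player index into its lowest three bits and the depth in millimeters into the remaining bits, so pulling a pixel apart looks like this (using the same variable names as the code below):

short rawDepthValue = depthBits[depthIndex];

// 0 means no player at this pixel; nonzero values identify a tracked player.
int playerIndex = rawDepthValue & DepthImageFrame.PlayerIndexBitmask;

// The actual distance from the sensor in millimeters.
int depthInMillimeters = rawDepthValue >> DepthImageFrame.PlayerIndexBitmaskWidth;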

I’m going to first show you how this was traditionally done (and by “traditionally” I really mean in a three to four month period before the SDK v1 was released) as well as a better way to do it.  In both techniques, we are working with three images: the image encoded in the color stream, the image encoded in the depth stream, and the resultant “output” bitmap we are trying to reconstruct pixel by pixel.

The traditional technique goes through the depth stream pixel by pixel and tries to extrapolate that same pixel location in the color stream one at a time using the MapDepthToColorImagePoint method.

var pixelFormat = PixelFormats.Bgra32;
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            for (int depthY = 0; depthY < depthFrame.Height
                ; depthY++)
            {
                for (int depthX = 0; depthX < depthFrame.Width
                    ; depthX++
                    , outputIndex += outputBytesPerPixel)
                {
                    var depthIndex = 
                        depthX + (depthY * depthFrame.Width);
 
                    var playerIndex = 
                        depthBits[depthIndex] &
                        DepthImageFrame.PlayerIndexBitmask;
 
                    var colorPoint = 
                        sensor.MapDepthToColorImagePoint(
                        depthFrame.Format
                        , depthX
                        , depthY
                        , depthBits[depthIndex]
                        , colorFrame.Format);
 
                    var colorPixelIndex = (colorPoint.X 
                        * colorFrame.BytesPerPixel) 
                        + (colorPoint.Y * colorStride);
 
                    output[outputIndex] = 
                        colorBits[colorPixelIndex + 0];
                    output[outputIndex + 1] = 
                        colorBits[colorPixelIndex + 1];
                    output[outputIndex + 2] = 
                        colorBits[colorPixelIndex + 2];
                    output[outputIndex + 3] = 
                        playerIndex > 0 ? (byte)255 : (byte)0;
 
                }
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
 
        }
 
    }
 
};

You’ll notice that we are traversing the depth image by going across pixel by pixel (the inner loop) and then down row by row (the outer loop).  The width of a bitmap row in bytes, for reference, is known as its stride.  Then inside the inner loop, we map each depth pixel to its equivalent color pixel in the color stream by using the MapDepthToColorImagePoint method.

It turns out that these calls to MapDepthToColorImagePoint are rather expensive.  It is much more efficient to simply create an array of ColorImagePoints and populate it in one go before doing any looping.  This is exactly what MapDepthFrameToColorFrame does.  The following example uses it in place of the iterative MapDepthToColorImagePoint method.  It has an added advantage in that, instead of having to iterate through the depth stream column by column and row by row, I can simply go through the depth stream pixel by pixel, removing the need for nested loops.

var pixelFormat = PixelFormats.Bgra32;
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
 
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            var colorCoordinates =
                new ColorImagePoint[depthFrame.PixelDataLength];
            sensor.MapDepthFrameToColorFrame(depthFrame.Format
                , depthBits
                , colorFrame.Format
                , colorCoordinates);
 
            for (int depthIndex = 0;
                depthIndex < depthBits.Length;
                depthIndex++, outputIndex += outputBytesPerPixel)
            {
                var playerIndex = depthBits[depthIndex] &
                    DepthImageFrame.PlayerIndexBitmask;
 
                var colorPoint = colorCoordinates[depthIndex];
 
                var colorPixelIndex = 
                    (colorPoint.X * colorFrame.BytesPerPixel) +
                                    (colorPoint.Y * colorStride);
 
                output[outputIndex] = 
                    colorBits[colorPixelIndex + 0];
                output[outputIndex + 1] = 
                    colorBits[colorPixelIndex + 1];
                output[outputIndex + 2] = 
                    colorBits[colorPixelIndex + 2];
                output[outputIndex + 3] = 
                    playerIndex > 0 ? (byte)255 : (byte)0;
 
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
        }
 
    }
 
};
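
One caveat that applies to both versions: the mapped coordinates can occasionally land a pixel or so outside the bounds of the color frame along the edges, which produces an IndexOutOfRangeException when indexing into colorBits.  A cheap guard is to clamp the mapped point before using it – something like this inside the loop:

var colorPoint = colorCoordinates[depthIndex];

// Clamp the mapped coordinates so edge pixels cannot index outside the color frame.
int colorX = Math.Max(0, Math.Min(colorPoint.X, colorFrame.Width - 1));
int colorY = Math.Max(0, Math.Min(colorPoint.Y, colorFrame.Height - 1));

var colorPixelIndex = (colorX * colorFrame.BytesPerPixel)
    + (colorY * colorStride);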

Why the Kinect for Windows Sensor Costs $249.99

 

This post is purely speculative.  I have no particular insight into Microsoft strategy.  Now that I’ve disqualified myself as any sort of authority on this matter, let me explain why the $249.99 price tag for the new Kinect for Windows sensor makes sense.

The new Kinect for Windows sensor went on the market earlier this month  for $249.99.  This has caused some consternation and confusion since the Kinect for Xbox sensor only costs $150 and sometimes less when bundled with other Xbox products.

Officially the Kinect for Windows sensor is the sensor you should use with the Kinect for Windows SDK – the libraries that Microsoft provides for writing programs that take advantage of the Kinect.  Prior to the release of the v1 of the SDK, there was the Kinect SDK beta and then the beta 2.  These could be used in non-commercial products and research projects with the original Xbox sensor.

By license, if you want to use the Kinect for Windows SDK publicly, however, you must use the Kinect for Windows hardware.  If you previously had a non-commercial product running with the Kinect for Xbox sensor and the beta SDK and want to upgrade to the v1 SDK, you will also need to upgrade your hardware to the more expensive model.  In other words, you will need to pay an additional $249.99 to get the correct hardware.  The one exception is for development.  You can still use the less expensive version of the sensor for development.  Your users must use the more expensive version of the sensor once the application is deployed.

I can make this even more complicated.  If you want to use one of the non-Microsoft frameworks + drivers for writing Kinect enabled applications such as OpenNI, you are not required to use the new Kinect for Windows hardware.  Shortly after the release of the original Kinect for Xbox sensor in 2010, Microsoft acknowledged that efforts to create drivers and APIs for the sensor were okay and they have not gone back on that.  You are only required to purchase the more expensive hardware if you are using the official Microsoft drivers and SDK.

So what is physically different between the new sensor and the old one?  Not much, actually.  The newer hardware has different firmware, for one thing.  The newer firmware allows depth detection as near as 40 cm. The older firmware only allowed depth detection from 80 cm.  However, the closer depth detection can only be used when the near mode flag is turned on.  Near mode is from 40 cm to 300 cm while the default mode is from 80 cm to 400 cm. In v1 of the SDK, near mode = true has the unfortunate side-effect of disabling skeleton tracking for the entire 40 cm to 300 cm range.

Additionally, the newer firmware identifies the hardware as Kinect for Windows hardware.  The Kinect for Windows SDK checks for this.  For now, the only real effect this has is that if the full SDK is not installed on a machine (i.e. a non-development machine) a Kinect for Windows application will not work with the old Xbox hardware.  If you do have the full SDK installed, then you can continue to develop using the Xbox sensor.  For completeness, if a Kinect for Windows application is running on a machine with the Kinect for Windows hardware and the full SDK is not installed on that machine, the application will still work.

The other difference between the Kinect for Windows sensor and the Kinect for Xbox sensor is that the USB/power cord is slightly different.  It is shorter and, more importantly, is designed for the peculiarities of a PC.  The Kinect for Xbox sensor’s USB/power cord was designed for the peculiarities of the Xbox USB ports.  Potentially, then, the Kinect for Windows sensor will just operate better with a PC than the Kinect for Xbox sensor will.

Oh.  And by the way, you can’t create Xbox games using the Kinect for Windows SDK and XNA.  That’s not what it is for.  It is for building PC applications running on Windows 7 and, eventually, Windows 8.

So, knowing all of this, why is Microsoft forcing people to dish out extra money for a new sensor when the old one seems to work fine?

Microsoft is pouring resources into developing the Kinect SDK.  The hacker community has asked them to do this for a while, actually, because they 1) understand the technologies behind the Kinect and 2) have experience building APIs.  This is completely in their wheelhouse.

The new team they have built up to develop the Kinect SDK is substantial and – according to rumor – is now even larger than the WPF and Silverlight teams put together.  They have now put out an SDK that provides pretty much all the features provided by projects like OpenNI but have also surpassed them with superior skeleton recognition and speech recognition.  Their plans for future deliverables, from what I’ve seen, will take all of this much further.  Over the next year, OpenNI will be left in the dust.

How should Microsoft pay for all of this?  A case can be made that they ought to do this for free.  The Kinect came along at a time when people no longer considered Microsoft to be a technology innovator.  Their profits come from Windows and then Office, while their internal politics revolve around protecting these two cash cows.  The Kinect proved to the public at large (and investors) not only that all that R&D money over the years had been well spent but also that Microsoft could still surprise us.  It could still do cool stuff and hadn’t completely abdicated technology and experience leadership to the likes of Apple and Google.  Why not pour money into the Kinect simply for the sake of goodwill?  How do you put a price on a Microsoft product that actually makes people smile?

Yeah, well.  Being a technology innovator doesn’t mean much to investors if those innovations don’t also make money.  The prestige of a product internally at Microsoft also depends on how much money your team wields.  To the extent that money is power, the success of the Kinect for non-gaming purposes depends on the ability of the new SDK to generate revenue.  Do you remember the inversion from the musical Camelot, when King Arthur says that Might Makes Right should be turned around in Camelot into Right Makes Might?  The same sort of inversion occurs here.  We’ve grown used to the notion that Money can make anything Cool.  The Kinect will test the notion, within Microsoft, that Cool can also make Money.

So how should Microsoft make that money?  They could have opted to charge developers for a license to build on their SDK.  I’m grateful they didn’t, though.  This would have ended up being a tax on community innovation.  Instead, developers are allowed to develop on the Kinects they already have if they want to (the $150 Kinect).

Microsoft opted to invest in innovation.  They are giving the SDK away for free.  And now we all wait for someone to build a killer Kinect for Windows app.  Whoever does that will make a killing.  This isn’t anything like building phone apps or even Metro apps for the Windows 8 tablet.  We’re talking serious money.  And Microsoft is betting on someone coming along and building that killer app in order to recoup its investment since Microsoft won’t start making money until there is an overriding reason for people to start buying the Kinect for Windows hardware (e.g. that killer app).

This may not happen, of course.  There may never be a killer app to use with the Kinect for Windows sensor.  But in this case Microsoft can’t be blamed for hampering developers in any way.  They aren’t even charging us a developer fee the way the Windows Phone marketplace or IOS developer program does.  Instead, with the Kinect for Windows pricing, they’ve put their full faith in the developer community.  And by doing this, Microsoft shows me that they can, in fact, occasionally be pretty cool.

Changes in Kinect SDK Beta 2


To celebrate the one year anniversary of the Kinect, Microsoft has launched a new Kinect website and released the Beta 2 version of the Kinect for Windows SDK: http://www.kinectforwindows.org/ .

This is not the commercial license we have been waiting for (it is reported to be coming in early 2012) but truly the next best thing.  The Beta 2 SDK introduces many performance improvements over the Beta 1 that was released in June. 

With the improvements come some alterations to the basic syntax for instantiating the core objects of both the Nui and Audio namespaces, and these changes will likely affect your code.  In particular, I want to cover one substantial change in the Nui namespace and one substantial change in the Audio namespace.

In the Beta 1, it was standard to instantiate a Nui.Runtime object in order to configure applications to read the depth, color and skeleton streams.  In a WPF application, the code looked like this:

        Microsoft.Research.Kinect.Nui.Runtime _nui;

        public MainWindow()
        {
            InitializeComponent();

            this.Unloaded += (s,e) => _nui.Uninitialize();
            _nui = new Runtime();

            _nui.Initialize(RuntimeOptions.UseColor 
                | RuntimeOptions.UseDepth);
            _nui.VideoFrameReady += _nui_VideoFrameReady;
            _nui.DepthFrameReady += _nui_DepthFrameReady;

            _nui.VideoStream.Open(ImageStreamType.Video, 2
                , ImageResolution.Resolution640x480
                , ImageType.Color);
            _nui.DepthStream.Open(ImageStreamType.Depth, 2
                , ImageResolution.Resolution320x240
                , ImageType.Depth);

        }

In the Beta 2, the Nui.Runtime has been obsoleted.  Instead, the Runtime type provides a static collection called Kinects that returns a Runtime object for each Kinect connected to the PC.  Additionally, in the new Beta, Runtime objects cannot be retrieved and configured in the window’s constructor as they could be in the Beta 1.  Instead, this must be done in the Loaded event handler.  This has ended up breaking a lot of my Beta 1 code.  Fortunately the refactor is fairly easy.  The following code assumes there is only one Kinect sensor plugged into the PC.

        Microsoft.Research.Kinect.Nui.Runtime _nui;

        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += (s, e) =>
                {
                _nui = Runtime.Kinects[0];
                _nui.Initialize(RuntimeOptions.UseColor);
                _nui.VideoFrameReady += _nui_VideoFrameReady;
                _nui.VideoStream.Open(ImageStreamType.Video
                , 2
                , ImageResolution.Resolution1280x1024
                , ImageType.Color);
                };
        }

A central difficulty in working with the audio namespace is that the KinectAudioSource DMO, which is as fundamental to audio processing for the Kinect as the Nui.Runtime is to video and skeletal processing, must be instantiated on a thread running in the multithreaded apartment (MTA) model.  WPF applications, unfortunately, run on an STA thread.  This required a bit of additional wiring up in the Beta 1 to create the KinectAudioSource object on a separate thread.

With the Kinect for Windows SDK Beta 2, this is implicitly taken care of for us.  If you have already been working with the Beta 1 audio namespace, you will finally be able to take out the workarounds you created to program against the audio stream along with all the additional care required for dealing with a multithreaded application.
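
For reference, the Beta 1 workaround looked roughly like this – a sketch of the pattern rather than the exact code, with _keepListening standing in for whatever flag your app uses to stop the loop:

// Beta 1: KinectAudioSource has to be created and read on an MTA thread,
// since the WPF UI thread is STA.
var audioThread = new Thread(() =>
{
    using (var source = new KinectAudioSource())
    using (var audioStream = source.Start())
    {
        var buffer = new byte[4096];
        while (_keepListening)
        {
            int count = audioStream.Read(buffer, 0, buffer.Length);
            // hand the samples off to a recognizer or recorder here
        }
    }
});
audioThread.SetApartmentState(ApartmentState.MTA);
audioThread.Start();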

[Thanks to Clint Rutkas of Coding4Fun for correcting me on the difference between obsolescence and breaking changes. The new Runtime instantiation process is not technically a breaking change.]

Book Announcement

My colleague Jarrett Webb and I have reached the halfway point on our book for Apress this week, so it finally seems safe to announce that we are working on Beginning Kinect Programming with the Kinect SDK, to be released towards the end of the year.  My original plan for September and October was to blog feverishly on the Kinect SDK, but given the tyranny of the book schedule, I’m finding it hard not to put everything related to the Kinect into the book instead.  Towards the end of the process, however, we’ll likely have odds and ends that do not make sense in a beginner book but that, nevertheless, want to be said.  Those odds and ends will eventually make their way here.

In the meantime, the big news is WinRT.  Besides the fact that everyone and their neighbor is already writing their reflections on BUILD, obviating any need for me to contribute, I’ve actually already written a BUILD summary for Razorfish, which should appear on www.emergingexperiences.com shortly.

I can’t resist commenting, all the same, that while I have javascript and C# under my belt, C++ is in my near future.  No love or time for F#, sadly.

Why ReMIX?


On August 6th, the Atlanta developer community will be hosting ReMIX South, a conference for designers and developers.  This is the second year the conference has been held in Atlanta.  Early Bird tickets can be purchased at http://remixsouth.eventbrite.com .  The official website is at www.remixsouth.com .  Tickets are only $30 through June 28th.

There are lots of great conferences throughout the year such as MIX, An Event Apart and TechEd.  These all tend to be extremely expensive, however.  At the other end of the spectrum are community events such as MadExpo, CodeStock, the various code camps and DevLink.  These are great, inexpensive grassroots level events.  Anyone can speak and the agendas tend to be more or less random.

ReMIX is an attempt to create something in between these two extremes.  We created an event that has the level of speakers you would typically see at the two-to-three-thousand-dollar conferences but at the price of a community event.

We do this by spending much of our time throughout the year at all of these other conferences trying to recruit speakers for ReMIX South.  We spend half the year discussing who is a top speaker, who is a rising speaker, and what topics have become important in 2011.  In other words, we spend the majority of our effort simply planning out our speakers the way a painter mixes colors or a chef blends flavors.

We do this in order to provide what is, to our minds, a unique and satisfying experience for our attendees.  Of all the speakers we reached out to, only two of our must haves couldn’t make it: Bill Buxton and Robby Ingebretsen – both had prior engagements.

We keep prices low through very generous sponsorship as well as being very frugal with your money – though controversial, we don’t provide lunch or t-shirts.  While these are standard for most conferences, we found that we can cut the price in half simply by leaving them out.  We also keep your bottom line low by choosing a central location, The Marriott at Perimeter Center, which has free parking.

(As an aside, when I was at An Event Apart, I paid the same amount for my parking as the price of one early bird ticket to ReMIX: $30.)

The other thing we try to do at ReMIX is to provide a designer event that is friendly toward developers, as well as a developer event that is friendly toward designers.

It is also an event that we try to make welcoming for both Microsoft stack as well as non-Microsoft stack developers.  We understand that, depending on where you come from, our agenda will always seem to lean too much in one direction or the other.  To our thinking, this is a good thing.  We want to bring the different communities together. 

Non-Microsoft developers will get a bit of exposure to a world they tend to stay away from, while Microsoft stack developers will have their minds expanded to a world they are not familiar with.  At the end of the day, we all leave knowing a little more about our craft than we did when we came and have a broader understanding of what our craft entails.  Everyone moves out of their comfort zone and becomes stronger for it.

And if you don’t want to have your mind expanded, that’s cool, too.  We have enough sessions to keep anyone inside their comfort zone, if that’s what they want.

Here is what we are offering this year:

The Keynote

Albert Shum is one of the most fascinating people currently working at Microsoft.  He is part of the revolution within Microsoft that transformed their mobile strategy and placed, for once, design at the center of a new technology offering.  Albert led the design team that created the much discussed “Metro” language used originally on Windows Phone and now, according to well-placed rumors, on the Surface 2 and Windows 8.  If you are still confused about what “Metro” actually is, this is your best opportunity to find out – he’ll be at the conference all day and is very approachable.

The Web Track

This is the track we are perhaps proudest of.  If you are a Microsoft stack developer, then you might think of HTML5 as a zombie-like infestation that is taking over and displacing all the technologies you are used to working with.

On the other hand, if you aren’t part of the Microsoft world, you are probably perplexed when people make a big deal about HTML5 and wonder if they are talking about CSS3 + jQuery.

So what we are offering in the Web track at ReMIX is a bunch of non-Microsoft stack web developers explaining HTML5 to both Microsoft and non-Microsoft developers.  Brilliant, right?

J. Cornelius of CoffeeCup Software will start by presenting “HTML5: Yes Really.”  The title is a joke and if you don’t get it, then you really need to attend.  He will provide the opening overview of HTML5.

John Agan, who builds amazing web experiences for Epic Labs, will teach us about jQuery.

Josh Netherton of MailChimp will school us on CSS3 in “More Than Just Rounded Corners.” 

Finally, we’ve invited August de los Reyes of Artefact to speak.  If you aren’t familiar with August, he happens to have given the most impressive talk at MIX11 this year – and it was only ten minutes long!  You can find a video of his MIX presentation here.  At ReMIX South, August will be presenting an expanded version of his talk, 21st Century Design.

Mobile / Tablet Track

The past year has been spent pursuing the code-once dream for mobile development using tools like Mono, PhoneGap, the Adaptive Web and, most recently, HTML5.  If you’ve been following the trends in iPhone, Android and Windows Phone, you’ll know that this has been a rocky and occasionally treacherous path.  Not only do the different tools not always work … ahem … perfectly, but the rise of tablets is also making it clear that designing and developing for non-desktop computers is a lot more complex than just working with different form-factors.  We’ve invited several Microsoft as well as non-Microsoft stack people to walk us through the variegated world of mobile and tablet development.

Douglas Knudson, Technical Architect for Universal Mind and organizer of the Atlanta Flex User Group, will show us how to use Adobe Air to target multiple mobile and tablet platforms.

Luke Hamilton, Creative Director at Razorfish, will speak on “The Interface Revolution” and cover how to work with all the new devices we are being confronted with as technology keeps progressing.

Shawn Wildermuth, a well known trainer and expert in Windows Phone development, will walk us through the new features being introduced in Windows Phone Mango.

Jeremy Likness will talk about his experience working with Silverlight for tablets.  He will also discuss what we currently know about Windows 8, which is being promoted as a tablet platform that uses both HTML5 as well as a XAML-based language for development.

Rob Cameron, who was also with us last year, is a Microsoft Architect Evangelist.  He will talk, among other things, about developing games for Windows Phone using XNA.

Windows Phone Garage

No pretense of non-Microsoft material here.  Starting in mid- to late-August, the Windows Phone Marketplace will start accepting Mango apps.  This full day dev garage will get you ready for that.  Just bring a laptop and we’ll take care of the rest – by the end of the day you will have an application ready to start making you money. 

mixheadshot_reasonably_small

Unlike other phone garages, this one will be surrounded by top talent in development and design as well as several Windows Phone development MVPs.  If you would like their help, we’ll set up a sign-up sheet so you can arrange to get one-on-one advice about your app.

UX Track

The UX Track has always made the ReMIX conference stand apart from other conferences.  This is really the place where we invite speakers to talk broadly about a variety of topics which we place, loosely, under the UX rubric.

Let me point out, first of all, that all of our speakers are amazing.  These are our rock stars. Rick Barraza has become an institution helping developers understand UX and design as well as trying to help devs and designers to work together.  He spoke at MIX this year.  Jenn Downs is simply cool and MailChimp, her company, is widely lauded for breaking new ground in connecting with customers by being hip, playful, cheeky and, of course, extremely useful.  MailChimp has pretty much been invited to speak at every major conference this year.  Zach Pousman and James Chittenden were both extremely popular speakers at last year’s ReMIX.  Zach is an expert in both academic and practical UX, while James is a UX Architect for Microsoft Consulting – you probably didn’t even know there was such a thing.  We are very lucky to have them back.  Designers think Matthias Shapiro is a designer while developers assume he is a developer since he has been so effective in bridging both worlds.  His talk on Motion is a must see.

If you have spent your careers as developers and have never been exposed to the world of UX and design, then the best favor I can do for you is to recommend that you spend your whole Saturday in this track.  You’ll thank me for it.  Really, you will.

Kinect Track

This is very exciting for us.  The Kinect Track is our opportunity to take a new technology, bring together some of the leading experts on developing for the Kinect, and hold the first conference event about the Kinect.  Other conferences are beginning to have one or two Kinect talks apiece, if they have any.  At ReMIX, we provide a full day of Kinect content.

All of our Kinect speakers come for the most part from the pages of the KinectHacks website. 

Jarrett Webb is the creator of KinectShop, an application that has given us the best picture so far of how the Kinect and related technologies will one day be used in retail.  He is generously providing the introductory talk on developing for the Kinect.

Zahoor Zafrulla is a PhD candidate at Georgia Tech.  He is making breakthroughs in using the Kinect sensor for education.  His particular interest is in using the Kinect to teach American Sign Language.

Steve Dawson and Alex Nichols wrote the DaVinci Kinect application in November of 2010 – shortly after drivers for building Kinect applications for the PC became available.  It was one of the first apps recognized for successfully pulling off the ‘Minority Report’ effect, and Microsoft later asked them to present it at the E3 conference to demonstrate what Kinect hacking is all about.

Josh Blake is the best known figure in the Kinect world.  Besides being widely recognized as an authority on Natural User Interface concepts, he is also the founder of the OpenKinect community.  There are few people who know more about the growth and future potential of the Kinect technology than Josh.

The final Kinect session of the day will be a panel discussion moderated by Josh Blake with panelists Albert Shum, Rick Barraza, Luke Hamilton and Zahoor Zafrulla.  They will be discussing the influence of TV shows and movies on how we envision the future of technology as well as what the future of technology will actually look like.