Kinect and the Atlanta Film Festival

Tomorrow, I will be appearing at the Atlanta Film Festival on a panel moderated by Elizabeth Strickler of the Georgia State Digital Arts Entertainment Lab.  The panel is called Post Production: How to Hack a Kinect to Make Your Own Motion Controlled Content and will be at the Landmark Midtown Art Cinema on March 28th at 2:45.  The other panelists are Ryan Kellogg, creative lead for Vivaki’s Emerging Experiences group, and Tara Walker, a Microsoft Kinect evangelist.

Quick Guide to moving from the Kinect SDK beta 2 to v1

If you had been working with the beta 2 of the Kinect SDK prior to February 1st, you may have felt dismay at the number of API changes that were introduced in v1.

After porting several Kinect applications from the beta 2 to v1, however, I finally started to see a pattern to the changes.  For the most part, it is simply a matter of replacing one set of boilerplate code with another.  The unique portions of your code can largely be left alone.

In this post, I want to demonstrate five simple code transformations that will ease your way from the beta 2 to the Kinect SDK v1.  I’ll do it boilerplate fragment by boilerplate fragment.

1. Namespaces have been shifted around.  Microsoft.Research.Kinect.Nui is now just Microsoft.Kinect.  Fortunately Visual Studio makes resolving namespaces relatively easy, so we can just move on.

2. The Runtime type, the controller object for working with data streams from the Kinect, is now called a KinectSensor type.  Grabbing an instance of it has also changed.  You used to just new up an instance like this:

Runtime nui = new Runtime();

Now you instead grab an instance of the KinectSensor from a static collection containing all the KinectSensors attached to your PC.  Indexing into it as shown here assumes that at least one sensor is attached.

KinectSensor sensor = KinectSensor.KinectSensors[0];

3. Initializing a KinectSensor object to start reading the color stream, depth stream or skeleton stream has also changed.  In the beta 2, the initialization procedure just didn’t look very .NET-y.  In v1, this has been cleaned up dramatically.  The beta 2 code for initializing a depth and skeleton stream looked like this:

_nui.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        _nui_SkeletonFrameReady
        );
_nui.DepthFrameReady += 
    new EventHandler<ImageFrameReadyEventArgs>(
        _nui_DepthFrameReady
        );
_nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex
    | RuntimeOptions.UseSkeletalTracking);
_nui.DepthStream.Open(ImageStreamType.Depth
    , 2
    , ImageResolution.Resolution320x240
    , ImageType.DepthAndPlayerIndex);

In v1, this boilerplate code has been altered so the Initialize method goes away, roughly replaced by a Start method.  The Open methods on the streams, in turn, have been replaced by Enable.  The DepthAndPlayerIndex data is made available simply by having the skeleton stream enabled.  Also note that the event argument types for the depth and color streams are now different.  Here is the same code in v1:

sensor.SkeletonFrameReady += 
    new EventHandler<SkeletonFrameReadyEventArgs>(
        sensor_SkeletonFrameReady
        );
sensor.DepthFrameReady += 
    new EventHandler<DepthImageFrameReadyEventArgs>(
        sensor_DepthFrameReady
        );
sensor.SkeletonStream.Enable();
sensor.DepthStream.Enable(
    DepthImageFormat.Resolution320x240Fps30
    );
sensor.Start();

4. Transform Smoothing: it used to be really easy to smooth out the skeleton stream in beta 2.  You simply turned it on.

nui.SkeletonEngine.TransformSmooth = true;

In v1, you have to create a new TransformSmoothParameters object and pass it to the skeleton stream’s Enable method.  Unlike the beta 2, you also have to initialize the values yourself since they all default to zero.

sensor.SkeletonStream.Enable(
    new TransformSmoothParameters() 
    {   Correction = 0.5f
    , JitterRadius = 0.05f
    , MaxDeviationRadius = 0.04f
    , Smoothing = 0.5f });

5. Stream event handling: handling the ready events from the depth stream, the video stream and the skeleton stream also used to be much easier.  Here’s how you handled the DepthFrameReady event in beta 2 (skeleton and video followed the same pattern):

void _nui_DepthFrameReady(object sender
    , ImageFrameReadyEventArgs e)
{
    var frame = e.ImageFrame;
    var planarImage = frame.Image;
    var bits = planarImage.Bits;
    // your code goes here
}

For performance reasons, the newer v1 code looks very different and the underlying C++ API leaks through a bit.  In v1, we are required to open the image frame and check to make sure something was returned.  Additionally, we create our own array of bytes (for the depth stream this has become an array of shorts) and populate it from the frame object.  The PlanarImage type which you may have gotten cozy with in beta 2 has disappeared altogether.  Also note the using keyword to dispose of the ImageFrame object. The transliteration of the code above now looks like this:

void sensor_DepthFrameReady(object sender
    , DepthImageFrameReadyEventArgs e)
{
    using (var depthFrame = e.OpenDepthImageFrame())
    {
        if (depthFrame != null)
        {
            var bits =
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(bits);
            // your code goes here
        }
    }
}

I have noticed that many sites and libraries that were using the Kinect SDK beta 2 still have not been ported to Kinect SDK v1.  I certainly understand the hesitation given how much the API seems to have changed.

If you follow these five simple translation rules, however, you’ll be able to convert approximately 80% of your code very quickly.

The right way to do Background Subtraction with the Kinect SDK v1

MapDepthFrameToColorFrame is a beautiful method introduced rather late into the Kinect SDK v1.  As far as I know, it primarily has one purpose: to make background subtraction operations easier and more performant.

Background subtraction is a technique for removing any pixels in an image that do not belong to the primary actors.  Green Screening (which, if you are old enough to have seen the original Star Wars when it came out, is known to you as Blue Screening) is a particular implementation of background subtraction in the movies, in which actors perform in front of a green background.  The green background is then subtracted from the final film and another background image is inserted in its place.

With the Kinect, background subtraction is accomplished by comparing the data streams rendered by the depth camera and the color camera.  The depth camera will actually tell us which pixels of the depth image belong to a human being (with the pre-condition that Skeleton Tracking must be enabled for this to work).  The pixels represented in the depth stream must then be compared to the pixels in the color stream in order to subtract out any pixels that do not belong to a player.  The big trick is each pixel in the depth stream must be mapped to an equivalent pixel in the color stream in order to make this comparison possible.
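
To make the idea of "player pixels" concrete, here is a minimal sketch of how a single v1 depth value can be unpacked.  It assumes the v1 packing in which the low three bits of each depth value carry the player index and the remaining bits carry the distance in millimeters.  The DepthPixel class is hypothetical, and its two constants are stand-ins for DepthImageFrame.PlayerIndexBitmask and DepthImageFrame.PlayerIndexBitmaskWidth so that the snippet compiles without the SDK.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Kinect SDK.
public static class DepthPixel
{
    // Assumed to mirror DepthImageFrame.PlayerIndexBitmask (low three bits).
    public const int PlayerIndexBitmask = 7;

    // Assumed to mirror DepthImageFrame.PlayerIndexBitmaskWidth.
    public const int PlayerIndexBitmaskWidth = 3;

    // 0 means "no player"; any other value identifies a tracked player.
    public static int GetPlayerIndex(short raw)
    {
        return raw & PlayerIndexBitmask;
    }

    // Distance in millimeters once the player bits are shifted away.
    public static int GetDepthMillimeters(short raw)
    {
        // Mask to 16 bits first so a negative short does not sign-extend.
        return (raw & 0xFFFF) >> PlayerIndexBitmaskWidth;
    }

    // True when the pixel belongs to a player and should be kept.
    public static bool IsPlayer(short raw)
    {
        return GetPlayerIndex(raw) > 0;
    }
}
```

In the examples that follow, only the player index matters: a pixel is kept when it is greater than zero and made transparent otherwise.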

I’m going to first show you how this was traditionally done (and by “traditionally” I really mean in a three to four month period before the SDK v1 was released) as well as a better way to do it.  In both techniques, we are working with three images: the image encoded in the color stream, the image encoded in the depth stream, and the resultant “output” bitmap we are trying to reconstruct pixel by pixel.

The traditional technique goes through the depth stream pixel by pixel and tries to extrapolate that same pixel location in the color stream one at a time using the MapDepthToColorImagePoint method.

var pixelFormat = PixelFormats.Bgra32;
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            for (int depthY = 0; depthY < depthFrame.Height
                ; depthY++)
            {
                for (int depthX = 0; depthX < depthFrame.Width
                    ; depthX++
                    , outputIndex += outputBytesPerPixel)
                {
                    var depthIndex = 
                        depthX + (depthY * depthFrame.Width);
 
                    var playerIndex = 
                        depthBits[depthIndex] &
                        DepthImageFrame.PlayerIndexBitmask;
 
                    var colorPoint = 
                        sensor.MapDepthToColorImagePoint(
                        depthFrame.Format
                        , depthX
                        , depthY
                        , depthBits[depthIndex]
                        , colorFrame.Format);
 
                    var colorPixelIndex = (colorPoint.X 
                        * colorFrame.BytesPerPixel) 
                        + (colorPoint.Y * colorStride);
 
                    output[outputIndex] = 
                        colorBits[colorPixelIndex + 0];
                    output[outputIndex + 1] = 
                        colorBits[colorPixelIndex + 1];
                    output[outputIndex + 2] = 
                        colorBits[colorPixelIndex + 2];
                    output[outputIndex + 3] = 
                        playerIndex > 0 ? (byte)255 : (byte)0;
 
                }
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
 
        }
 
    }
 
};

You’ll notice that we traverse the depth image pixel by pixel across each row (the inner loop) and then row by row down the image (the outer loop).  The number of bytes in one row of a bitmap, which is its pixel width multiplied by its bytes per pixel, is known as its stride; this is the value held in colorStride.  Inside the inner loop, we map each depth pixel to its equivalent pixel in the color stream by using the MapDepthToColorImagePoint method.
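
The index arithmetic in the loop above can be isolated into a small sketch.  The PixelIndex class and its method names are hypothetical and exist only for illustration; the calculation itself is the same one used for colorStride and colorPixelIndex in the listing.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Kinect SDK.
public static class PixelIndex
{
    // Number of bytes in one row of the image.
    public static int Stride(int widthInPixels, int bytesPerPixel)
    {
        return widthInPixels * bytesPerPixel;
    }

    // Offset of the first byte of pixel (x, y) in a tightly packed buffer.
    public static int ByteOffset(int x, int y
        , int widthInPixels, int bytesPerPixel)
    {
        return (x * bytesPerPixel)
            + (y * Stride(widthInPixels, bytesPerPixel));
    }
}
```

For a 640 pixel wide image at 4 bytes per pixel, each row occupies 2560 bytes, so the pixel at x = 10, y = 2 starts at byte (10 * 4) + (2 * 2560) = 5160.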

It turns out that these calls to MapDepthToColorImagePoint are rather expensive.  It is much more efficient to simply create an array of ColorImagePoints and populate it in one go before doing any looping.  This is exactly what MapDepthFrameToColorFrame does.  The following example uses it in place of the iterative MapDepthToColorImagePoint method.  It has an added advantage in that, instead of having to iterate through the depth stream column by column and row by row, I can simply go through the depth stream pixel by pixel, removing the need for nested loops.

var pixelFormat = PixelFormats.Bgra32;
WriteableBitmap target = new WriteableBitmap(depthWidth
    , depthHeight
    , 96, 96
    , pixelFormat
    , null);
var targetRect = new System.Windows.Int32Rect(0, 0
    , depthWidth
    , depthHeight);
var outputBytesPerPixel = pixelFormat.BitsPerPixel / 8;
 
sensor.AllFramesReady += (s, e) =>
{
 
    using (var depthFrame = e.OpenDepthImageFrame())
    using (var colorFrame = e.OpenColorImageFrame())
    {
        if (depthFrame != null && colorFrame != null)
        {
            var depthBits = 
                new short[depthFrame.PixelDataLength];
            depthFrame.CopyPixelDataTo(depthBits);
 
            var colorBits = 
                new byte[colorFrame.PixelDataLength];
            colorFrame.CopyPixelDataTo(colorBits);
            int colorStride = 
                colorFrame.BytesPerPixel * colorFrame.Width;
 
            byte[] output =
                new byte[depthWidth * depthHeight
                    * outputBytesPerPixel];
 
            int outputIndex = 0;
 
            var colorCoordinates =
                new ColorImagePoint[depthFrame.PixelDataLength];
            sensor.MapDepthFrameToColorFrame(depthFrame.Format
                , depthBits
                , colorFrame.Format
                , colorCoordinates);
 
            for (int depthIndex = 0;
                depthIndex < depthBits.Length;
                depthIndex++, outputIndex += outputBytesPerPixel)
            {
                var playerIndex = depthBits[depthIndex] &
                    DepthImageFrame.PlayerIndexBitmask;
 
                var colorPoint = colorCoordinates[depthIndex];
 
                var colorPixelIndex = 
                    (colorPoint.X * colorFrame.BytesPerPixel) +
                                    (colorPoint.Y * colorStride);
 
                output[outputIndex] = 
                    colorBits[colorPixelIndex + 0];
                output[outputIndex + 1] = 
                    colorBits[colorPixelIndex + 1];
                output[outputIndex + 2] = 
                    colorBits[colorPixelIndex + 2];
                output[outputIndex + 3] = 
                    playerIndex > 0 ? (byte)255 : (byte)0;
 
            }
            target.WritePixels(targetRect
                , output
                , depthFrame.Width * outputBytesPerPixel
                , 0);
 
        }
 
    }
 
};
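
Once the output buffer carries an alpha of 0 for every non-player pixel, a replacement background can be dropped in behind the player.  In WPF the simplest route is probably to layer the target bitmap over another image, but the substitution can also be done by hand.  The following is a rough sketch under stated assumptions: the Compositor class is hypothetical, and both buffers are assumed to be Bgra32 (4 bytes per pixel) with identical dimensions.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Kinect SDK or WPF.
public static class Compositor
{
    // Returns a copy of foreground in which every fully transparent
    // pixel (alpha == 0) has been replaced by the background pixel
    // at the same position.
    public static byte[] InsertBackground(byte[] foreground
        , byte[] background)
    {
        if (foreground.Length != background.Length)
        {
            throw new ArgumentException(
                "Buffers must be the same size.");
        }

        const int bytesPerPixel = 4; // Bgra32 assumed
        var result = (byte[])foreground.Clone();

        for (int i = 0; i + bytesPerPixel <= result.Length
            ; i += bytesPerPixel)
        {
            // The alpha channel is the fourth byte of each pixel.
            if (foreground[i + 3] == 0)
            {
                Buffer.BlockCopy(background, i
                    , result, i, bytesPerPixel);
            }
        }
        return result;
    }
}
```

The returned array could then be passed to WritePixels in place of output.  Because the listings above only ever write an alpha of 0 or 255, no blending is attempted here.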