Kinect v2 Community Projects

bogart

One of the great strengths of the original Kinect sensor was the community that gathered around it almost by happenstance.  The same thing is currently happening with the Kinect for Windows v2 – even though the non-XBox version of the hardware has yet to be released.  Going into this v2 release, Microsoft took the prescient step of reaching out to creative coders, researchers and digital agency types (that’s me) to give them pre-release versions of the hardware to start playing with.

Here are just a few of the things they’ve come up with:

wieden-kennedy/Cinder-Kinect2 – Stephen Schieberl’s Kinect v2 wrapper for Cinder

englandrp/Kinect2-Data-Transmitter – a Unity3D plugin for Kinect v2

rfilkov/kinect2-unity-example-with-ms-sdk – another Unity3D plugin for Kinect v2 (also, I believe, using the Data Transmitter strategy)

OpenKinect/libfreenect2 – Josh Blake, Theo Watson, et al.’s open source drivers for Kinect v2 (in progress, but this will allow it to run on operating systems other than Windows 8 – for instance, on a Mac)

MarcusKohnert/Kinect.ReactiveV2 – a reactive library for Kinect v2

DevHwan/K4Wv2OpenCVModule – OpenCV bridge for Kinect v2

http://k4wv2heartrate.codeplex.com/ – Dwight Goins’ sample implementation of heart rate detection using Kinect v2

… and then there are twice as many in the works I’ve heard about through the grapevine.

The walled garden approach to software doesn’t work anymore, and the Microsoft Kinect for Windows team seems to have embraced that in a big way.  Not only are people experimenting with the new hardware but they are even making their code publicly available – free as in beer – in order to foster the community.

This is a philosophical stance that in some ways harkens back to one of Bill Gates’ early intuitions when he was building the Microsoft Corporation.  At some point, he realized that he couldn’t be the smartest person in the room forever.  What he could do, though, was to gather the best people he could find and drive them to be their best.  He would contribute by clearing the roadblocks and guiding these people toward his goals.  This, more or less, was also how Steve Jobs went about adding value to his company and to contemporary culture.

The community currently building up around Kinect v2 is like that but with a difference.  The goal isn’t to lead anyone in a particular direction.  Instead, the objective is to open up tools / toys to allow people to discover their own goals.  Each community member contributes to something bigger than herself by making it possible for other people to do something new and original – whether this turns out to be an app, an art installation, a better way of shopping, an improved layout for visualizing spreadsheets – whatever.

So what’s so bad about walled gardens?

Quite simply, they stifle innovation.  The Microsoft I’d grown used to in the double naughts was all about best practices and guidelines and “components” and sealed classes. 

Ultimately, Microsoft did everything it could to minimize support calls.  Developers were given a certain way to do things – whether this was a good way or not – and if they went off the reservation (sorry, left the walled garden) they were typically on their own: no callbacks from MS and a lot of abuse on support forums asking ‘wtf are you doing that for?’

And I can understand all that – support calls suck – but the end result of this approach was that innovation started occurring more and more outside of Microsoft platforms.  Microsoft, in turn, became a ‘use case’ culture.  Instead of opening up their APIs like everyone else was doing, their most common response to requests was ‘what’s your use case’ followed by ‘we’ll get back to you on that.’

The logic of this was very simple.  Microsoft in the 00’s was about standardization of programming practices.  If you’re the sort of person who wants to innovate, however, you don’t want to do the ‘standard’ – by definition you don’t want to do what everyone else is doing.  So you looked for platforms with open APIs and tried to find ways to do things with the APIs that no one else was doing, i.e., you hacked those APIs.

And Microsoft, traditionally, hasn’t liked people using their products in ways they are not intended to be used – they haven’t liked hacking.

The original Kinect sensor changed all that.  It took a moment, but as videos of people using a hacked driver to read the Kinect sensor streams started showing up all over the place, the Grinch’s heart grew three sizes that day.  MS was getting instant street cred by simply letting people do what they were doing anyway and giving a thumbs up to it.  Overnight, Microsoft was once again recognized as an innovative company (they always have been, really, but that wasn’t the public perception).

Which is why v2 of libfreenect is so exciting.  It’s a project that will, ultimately, allow you to use the Kinect on a Mac.

To put things into context, PrimeSense (the provider of the depth technology behind Kinect v1) got bought out by Apple last year.  OpenNI, the alternative open source library and drivers for the Kinect that PrimeSense sponsored, was suddenly put in jeopardy, and an announcement circulated that the OpenNI site would be coming down in April 2014.  So…

The anti-Microsoft is currently bringing OpenNI inside its walled garden.  Meanwhile Microsoft is providing devices to the people writing libfreenect, which will allow people to use Kinect devices outside of Microsoft’s not-so-walled garden. 

How do you like them Apples?

Quick Reference: Kinect 1 vs Kinect 2


This information is preliminary: Kinect for Windows SDK 2.0 has not been released in final form and some of this may change.  Some things, such as the missing tilt motor and the supported USB standard, are probably impossible to change.

| Feature | Kinect for Windows 1 | Kinect for Windows 2 |
| --- | --- | --- |
| Color Camera | 640 x 480 @ 30 fps | 1920 x 1080 @ 30 fps |
| Depth Camera | 320 x 240 | 512 x 424 |
| Max Depth Distance | ~4.5 m | 8 m |
| Min Depth Distance | 40 cm (near mode) | 50 cm |
| Depth Horizontal Field of View | 57 degrees | 70 degrees |
| Depth Vertical Field of View | 43 degrees | 60 degrees |
| Tilt Motor | yes | no |
| Skeleton Joints Defined | 20 joints | 25 joints |
| Full Skeletons Tracked | 2 | 6 |
| USB Standard | 2.0 | 3.0 |
| Supported OS | Win 7, Win 8 | Win 8 |
| Price | $249 | $199 |

Razzle Dazzle

kinect for XBox One

People continue to ask what the difference is between the Kinect for XBox One and the Kinect for Windows v2.  I had to wait to unveil the Thanksgiving miracle to my children, but now I have some pictures to illustrate the differences.

side by side

On the sensors distributed through the developer preview program (thank you Microsoft!) there is a sticker along the top covering up the XBox embossing on the left.  There is an additional sticker covering up the XBox logo on the front of the device.  The power/data cables that come off of the two sensors look a bit like tails.  Like the bodies of the sensors, the tails are identical.  These sensors plug directly into the XBox One.  To plug them into a PC, you need an additional adapter that draws power from a power cord, sends data to a USB 3.0 cable, and passes both of these through the special plugs shown in the picture below.

usb

So what’s with those stickers?  It’s a pattern called razzle dazzle (and sometimes razzmatazz).  In World War I, it was used by the British navy as a form of camouflage for warships.  Its purpose is to confuse rather than conceal – to obfuscate rather than occlude.

war razzle dazzle

Microsoft has been using it not only on the Kinect for Windows devices but also on the developer units of the XBox One and its controllers that went out six months ago.

This is a technique of obfuscation popular with auto manufacturers who need to test their vehicles but do not want competitors or media to know exactly what they are working on.  At the same time, automakers do use this peculiar pattern to let their competitors and the media know that they are, in fact, working on something.

car razzle dazzle

What we are here calling razzle dazzle was, in a simpler age, called the occult.  Umberto Eco demonstrates in his fascinating exploration of the occult, Foucault’s Pendulum, that the nature of hidden knowledge is to make sure other people know you have hidden knowledge.  In other words, having a secret is no good if people don’t know you have it.  Dr. Strangelove expressed it best in Stanley Kubrick’s classic film:

Of course, the whole point of a Doomsday Machine is lost if you keep it a secret!

A secret, however, loses its power if it is ever revealed.  This has always been the difficulty of maintaining mystery series like The X-Files and Lost.  An audience is put off if all you ever do is constantly tease them without telling them what’s really going on. 

magic

By the same token, the reveal is always a bit of a letdown.  Capturing bigfoot and finding out that it is some sort of hairy hominid would be terribly disappointing.  Catching the Loch Ness Monster – even discovering that it is in fact a plesiosaur that survived the extinction of the dinosaurs – would be deflating compared to the sweetness of having it exist as a pure potential we don’t even believe in.

This letdown even applies to the future and new technologies.  New technologies are like bigfoot in the way they disappoint when we finally get our hands on them.  The initial excitement is always short-lived and is followed by a peculiar depression.  Such was the case in an infamous blog post by Scott Hanselman called Leap Motion Amazing, Revolutionary, Useless – but known informally as his Dis-kinect post – which is an odd and ambivalent blend of snarky and sympathetic.  Or perhaps snarky and sympathetic is simply our constant stance regarding the always impending future.

bigfoot

The classic bad reveal – the one that traumatized millions of idealistic would-be Jedi – is the quasi-scientific explanation of midichlorians in The Phantom Menace.  The offences are many – not least because the mystery of the force is simply shifted to magic bacteria that pervade the universe and live inside sentient beings – an explanation that explains nothing but does allow the force to be quantified in a midichlorian count.

The midichlorian plot device highlights an important point.  Explanations, revelations and unmaskings do not always make things easier to understand, especially when it’s something like the force that, in some sense, is already understood intuitively.  Every child already knows that by being good, one ultimately gets what one wants and gets along with others.  This is essentially the lesson of that ancient Jedi religion – by following the tenets of the Jedi, one is able to move distant objects with one’s will, influence people, and be one with the universe.  An over-analysis of this premise of childhood virtue destroys rather than enlightens.

the force razzle dazzle

The force, like virtue itself, is a kind of razzle dazzle – by obfuscating it also brings something into existence – it creates a secret.  In attempts to explain the potential of the Kinect sensor, people often resort to images of Tom Cruise at the Desk of the Future or Picard on the holodeck.  The true emotional connection, however, is with that earlier (and adolescent) fantasy awakened by A New Hope of moving things by simply wanting them to move, or changing someone’s mind with a wave of the hand and a few words – these are not the droids you are looking for.  Ben Kenobi’s trick in turn has its primordial source in the infant’s crying and waving of the arms as a way to magically make food appear. 

It’s not coincidental, after all, that Kinect sensors have always had both a depth sensor to track hand movements as well as a virtual microphone array to detect speech.

Kinect for Windows v2 First Look

WP_20131123_001

I’ve had a little less than a week to play with the new Kinect for Windows v2 so far, thanks to the developer preview program and the Kinect MVP program.  The original unboxing video is on Vimeo.  So far it is everything Kinect developers and designers have been hoping for – full HD through the color camera and a much improved depth camera as well as USB 3.0 data throughput. 

Additionally, much of the processing is now occurring on the GPU rather than the onboard chip or your computer’s CPU.  While amazing things were possible with the first Kinect for Windows sensor, most developers found themselves pushing the performance envelope at times and wishing they could get just a little more resolution or just a little more data speed.  Now they will have both.

20131126_110049

At this point the programming model has changed a bit between Kinect for Windows v1 and Kinect for Windows v2.  While knowing the original SDK will definitely give you a leg up, a bit of work will still need to be done to port Kinect v1 apps to the new Kinect v2 SDK when it is eventually released.

What I actually find confusing is the naming.  With the first round of devices that came out in 2010-11, we had the Kinect for XBox and the Kinect for Windows.  It makes sense that the follow-up to Kinect for XBox is the “Kinect for XBox One”.  But the follow-up to Kinect for Windows is “Kinect for Windows v2”, so we end up with the Kinect for XBox One as the correlate to K4W2.  Furthermore, by “Windows” we mean Windows 8 (now 8.1), so to be truly accurate we really should be calling the newest Windows sensor K4W8.1v2.  For convenience, I’ll just be calling it the “new Kinect” for a while.

WP_20131123_004

What’s different between the new Kinect for XBox One and the Kinect for Windows v2?  It turns out not a lot.  The Kinect for XBox One has a special USB 3.0 connector that carries both power and data from the XBox One.  Because it is a non-standard connector, it can’t be plugged straight into a PC (unlike the original Kinect, which had a standard USB 2.0 plug).

To make the new Kinect work with a PC, then, requires a special breakout board.  This board serves as an adapter with three ports – one for the Kinect, one for a power source, and one for a standard USB 3.0 cable.

We can also probably expect the firmware on the two versions of the new Kinect sensor to diverge over time, as occurred with the original Kinect.

kinec2_skel

Skeleton detection is greatly improved with the new Kinect.  Not only are more joints now detected, but many of the jitters developers became used to working around are now gone.  The new SDK recognizes up to six skeletons rather than just two.  Finally, because of the improved time-of-flight depth camera, which replaces the PrimeSense technology used in the previous hardware, the accuracy of the skeleton detection is much better and includes excellent hand detection.  Grip recognition as well as lasso recognition (two fingers used to draw) are now available out of the box – even in this early alpha version of the SDK.
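
To give a flavor of the new programming model, here is a minimal sketch of reading hand states with the v2 SDK.  Since the SDK is still in alpha, take the type and member names here as my best guess at the API shape rather than gospel – they may change before release.

using System;
using Microsoft.Kinect; // Kinect for Windows SDK 2.0 (developer preview)

class HandStateSample
{
    static Body[] _bodies;

    static void Main()
    {
        // The v2 API replaces Start/Stop with Open/Close.
        KinectSensor sensor = KinectSensor.GetDefault();
        _bodies = new Body[sensor.BodyFrameSource.BodyCount]; // up to 6
        BodyFrameReader reader = sensor.BodyFrameSource.OpenReader();
        reader.FrameArrived += OnBodyFrameArrived;
        sensor.Open();
        Console.ReadLine();
        sensor.Close();
    }

    static void OnBodyFrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
        using (BodyFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;
            frame.GetAndRefreshBodyData(_bodies);
            foreach (Body body in _bodies)
            {
                if (body == null || !body.IsTracked) continue;
                // Grip and lasso recognition come out of the box.
                if (body.HandRightState == HandState.Closed)
                    Console.WriteLine("Grip");
                else if (body.HandRightState == HandState.Lasso)
                    Console.WriteLine("Lasso");
            }
        }
    }
}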

WP_20131123_005

I won’t hesitate to say – even this early in the game – that the new hardware is amazing and is leaps and bounds better than the original sensor.  The big question, though, is whether it will take off the way the original hardware did.

If you recall, when Microsoft released the first Kinect sensor they didn’t have immediate plans to use it for anything other than a game controller – no SDK, no motor controller, not a single luxury.  Instead, creative developers, artists, researchers and hackers figured out ways to read the raw USB data and started manipulating it to create amazingly original applications that took advantage of the depth sensor – and they posted them to the Internet.

Will this happen the second time around?  Microsoft is endeavoring to do better this time by getting an SDK out much earlier.  As I mentioned above, the alpha SDK for Kinect v2 is already available to people in the developer preview program.  The trick will be in attracting the types of creative people that were drawn to the Kinect three years ago – the kind of creative technologists Microsoft has always had trouble attracting toward other products like Windows Phone and Windows tablets.

My colleagues and I at Razorfish Emerging Experiences are currently working on combining the new Kinect with other technologies such as Oculus Rift, Google Glass, Unity 3D, Cinder, Leap Motion and 4K video.  Like a modern day scrying device (or simply a mad scientist’s experiment) we hope that by simply mixing all these gadgets together we’ll get a glimpse at what the future looks like and, perhaps, even help to create that future.

Ghost Hunting with Kinect

Paranormal Activity 4

I don’t usually try to undersell the capabilities of the Kinect.  Being a Microsoft Kinect for Windows MVP, I actually tend to promote all the things that Kinect currently does and one day will do.  In fact, I have a pretty big vision of how Kinect, Kinect 2, Leap Motion, Intel’s Perceptual Computing camera and related gestural technologies will change the way we interact with our environment.

Having said that, let me just add that Kinect cannot find ghosts.  It might reveal bugs in the underlying Kinect software – but it cannot find ghosts.

Nevertheless, “experts” are apparently using Kinect sensors to reveal the presence of ghosts.  Here’s a clip from Travel Channel’s Ghost Adventures.  It’s an episode called Cripple Creek and you’ll want to skip ahead to about 3:50 (ht to friend Josh Blake for finding this).

The logic of this is based on some very sophisticated algorithms the Kinect uses to identify “skeletons” – or outlines of the human form.  The current Kinect can spot two skeletons at a time, tracking up to 20 joints on each.  Additionally, it has a “seated mode” that allows it to identify partial skeletons from about the waist up – this tends to be a little more dodgy, though.  All of this skeleton information is provided primarily to allow developers to create games that track the human body and, typically, animate an onscreen avatar that emulates the player’s movements.
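
For the record, “seated mode” is nothing more exotic than a flag a developer flips on the skeleton stream.  Here is a minimal sketch against the v1 SDK (the sensor lookup is illustrative):

using System.Linq;
using Microsoft.Kinect; // Kinect for Windows SDK v1

KinectSensor sensor = KinectSensor.KinectSensors
    .FirstOrDefault(s => s.Status == KinectStatus.Connected);
if (sensor != null)
{
    // Default mode tracks all 20 joints on up to two skeletons;
    // seated mode restricts tracking to the 10 upper-body joints.
    sensor.SkeletonStream.Enable();
    sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
    sensor.Start();
}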

The underlying theory behind using it for ghost hunting runs like this: since the Kinect will typically register a skeleton when someone passes in front of the sensor, it follows that if the Kinect registers a skeleton, someone – or something – must have passed in front of it.

skeleton

Unfortunately, this is not really the case.  There are lots of forum posts from developers asking how to work around peculiarities with the Kinect skeletons while anyone who has played a Kinect game on XBox has probably noticed that the sensor will occasionally provide false positives (which for gaming, is ultimately better than false negatives).  In fact, even my dog would sometimes register as a skeleton when he ran in front of me while I was playing. 

Perhaps you’ve also noticed that in an oddly shaped room, Kinect is prone to register false speech commands.  This happens to me especially when I’m trying to watch my favorite ghost hunting show on Netflix – probably because of the feedback from the television itself (which the Kinect tends to be very good at cancelling out if you take the trouble to configure it according to instructions – but I don’t).  I know this isn’t a ghost pausing my TV show, though, because the Kinect isn’t set up to hear anything I don’t hear.  Just because the Kinect emulates some human features – like following simple voice commands like “Play” and “Pause” – doesn’t mean it’s something from The Terminator, The Matrix or Minority Report.  It is no more psychic than I am and it doesn’t have super hearing.

Kinect 2 IR

Similarly, skeleton tracking on Kinect isn’t specially fitted to see invisible things.  It uses a combination of an infrared camera and a color camera to collect data which it interprets as a human structure.  But these cameras don’t see anything the human eye can’t see with the lights on.  Those light photons that are being collected by the sensors still have to bounce off of something visible, even if you can’t see the light beams themselves.  Perhaps part of the illusion is that, because we can’t see the infrared light being emitted and collected by the Kinect, people assume that what it detects also can’t be seen?

Here’s another episode of Ghost Adventures on location at the haunted Tuolumne Hospital.  It’s especially remarkable because the Kinect here is doing exactly what it is expected to do.  As the subject lifts himself off the bed, he separates his outline from the background and Kinect for Windows’ “seated mode” identifies his partial skeleton from approximately the waist up.  The intrepid ghost hunters then scream out “It was in your gut!”  Television gold.

Apparently the use of unfamiliar (and misunderstood) technology provides a veneer of seriousness to what these people do on their shows.  Another piece of weird technology all these shows use is something called EVP – electronic voice phenomena.  Here the idea is that you put out a tape recorder or digital recorder and let it run for a while – often with a white noise machine in the background.  Then you play it back later and you start hearing things you didn’t hear at the time.  The trick is that if you run these recordings through software intended to clean up audio and tease out voices, it will remarkably discover voices that you never heard at the time – which must, therefore, be the voices of ghosts.

I can’t help feeling, however, that it isn’t the world of extrasensory phenomena that is mysterious and baffling to us.  It’s all the crazy new technologies appearing every day that are truly supernatural and overwhelming.  Perhaps tying all of these frightening technologies to our traditional myths and collective superstitions is just a way of making sense of it all and normalizing it.

Book Review: Augmented Reality with Kinect

4384OT_Mini

Rui Wang’s Augmented Reality with Kinect from Packt Publishing is my new favorite book about the Kinect sensor.  It’s a solid 5 out of 5 for me, and if you want to learn how to use the Kinect for Windows SDK 1.5 and above with C++, then this is the book for you.  That said, however, it is also an incredibly frustrating software programming book.

The first issue I have with it is that it isn’t really about Augmented Reality, as such.  The way AR fits in is simply that the central project created in the course of the book is a Fruit Ninja-style game using Kinect and with a player overlay.  AR seems very much incidental to the book.

What it actually is, is an intro book to C++ and the Kinect for Windows SDK.  This is a much needed resource in the Kinect community and one I have long been on the lookout for.  I’m not sure why the publisher decided to add this “AR” twist to the concept for the book.  It really wasn’t necessary.

Second, the book’s tool chain is Visual Studio 2012, C++, Kinect for Windows SDK 1.5 and OpenGL.  One of these is not like the others!  In the second chapter, we are then told that the book covers OpenGL rather than DirectX because “…it is only used under Windows currently, and can hardly support languages except C/C++ and C#.”  Hmmm.

With those reservations out of the way, this is a really fine book about programming for the Kinect sensor.  C++ is the right way to do vision processing and this is a great introduction to the topic.  Along the way, it even includes a nice overview of face tracking.

Kinect PowerPoint Mapper

I just published a Kinect mapping tool for PowerPoint allowing users to navigate through a PowerPoint slide deck using gestures.  It’s here on CodePlex: https://k4wppt.codeplex.com/.  There are already a lot of these out there, by the way – one of my favorites is the one Josh Blake published.

So why did I think the world needed one more? 

kinect_for_windows_fig1

The main thing is that, prior to the release of the Kinect SDK 1.7, controlling a slide deck with a Kinect was prone to error and absurdity.  Because they were almost universally written for the swipe gesture, prior PowerPoint controllers using Kinect had a tendency to recognize any sort of hand waving as a navigation event.  Consequently, as a speaker innocently gesticulated through his point, the slides would begin to wander on their own.

The Kinect for Windows team added the grip gesture as well as the push gesture in the SDK 1.7.  It took several months of machine learning work to get these recognizers to work effectively in a wide variety of circumstances.  They are extremely solid at this point.

The Kinect PowerPoint Mapper I just uploaded to CodePlex takes advantage of the grip gesture to implement a grab-and-throw for PowerPoint navigation.  This effectively disambiguates navigation gestures from other symbolic gestures a presenter might use during the course of a talk.

I see the Kinect PowerPoint Mapper serving several audiences:

1. It’s for people who just want a more usable Kinect-navigation tool for PowerPoint.

2. It’s a reference application for developers who want to learn how they can pull the grip and the push recognizers out of the Microsoft Kinect controls and use them in combination with other gestures.  (A word of warning, though – while double grip is working really well in this project, double push seems a little flaky.)  One of the peculiarities of the underlying interfaces is that the push notification is a state, when for most purposes it needs to be an event.  The grip, on the other hand, is basically a pair of events (grip and ungrip) which need to be transposed into states.  The source code for the Mapper demonstrates how these translations can be implemented; a sketch of the idea follows this list.

3. The Mapper is configuration-based, so users can actually use it with PC apps other than PowerPoint simply by remapping gestures to keystrokes.  The current mappings in KinectKeyMapper.exe.config look like this:

    <add key="DoubleGraspAction" value="{F5}" />
    <add key="DoublePushAction" value="{Esc}" />
    <add key="RightSwipeWithGraspAction" value="{Right}" />
    <add key="LeftSwipeWithGraspAction" value="{Left}" />
    <add key="RightSwipeNoGraspAction" value="" />
    <add key="LeftSwipeNoGraspAction" value="" />
    <add key="RightPush" value="" />
    <add key="LeftPush" value="" />
    <add key="TargetApplicationProcessName" value="POWERPNT"/>

Behind the scenes, this is basically translating gesture recognition algorithms (some complex, some not so much) into keystrokes.  To have a gesture mapped to a different keystroke, just change the value associated with the gesture – making sure to include the squiggly brackets.  If the value is left blank, the gesture will not be read.  Finally, the TargetApplicationProcessName tells the application which process to send the keystroke to if there are multiple applications open at the same time.  To find a process name in Windows, just go into the Task Manager and look under the Processes tab.  The process name for all currently running applications can be found there – just remove the dot-E-X-E at the end of the name.

4. The project ought to be extended as more gesture recognizers become available from Microsoft or as people just find good algorithms for gesture recognizers over time.  Ideally, there will ultimately be enough gestures to map onto your favorite MMO.  A key mapper created by the media lab at USC was actually one of the first Kinect apps I started following back in 2010.  It seemed like a cool idea then and it still seems cool to me today.
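
And here, as promised, is a rough sketch of the two translations from item 2 combined with the config-driven keystroke dispatch from item 3.  The InteractionHandPointer type comes from the interaction stream that ships with SDK 1.7; the method names and wiring are illustrative rather than the Mapper’s actual source.

using System;
using System.Configuration;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Windows.Forms; // SendKeys
using Microsoft.Kinect.Toolkit.Interaction; // SDK 1.7 interaction stream

class GestureKeyDispatcher
{
    bool _isGripped;   // grip/ungrip *events* folded into a state
    bool _wasPressed;  // push *state* edge-detected into an event

    // Call once per InteractionHandPointer delivered by the interaction stream.
    public void OnHandPointer(InteractionHandPointer hand)
    {
        // The SDK reports grip as a pair of events; remember them as a state
        // so swipe recognizers can ask "is the hand gripped right now?"
        if (hand.HandEventType == InteractionHandEventType.Grip)
            _isGripped = true;
        else if (hand.HandEventType == InteractionHandEventType.GripRelease)
            _isGripped = false;

        // The SDK reports push as a state (IsPressed); fire on the rising
        // edge only, so one push produces one keystroke.
        if (hand.IsPressed && !_wasPressed)
            SendMappedKey("DoublePushAction"); // illustrative gesture name
        _wasPressed = hand.IsPressed;
    }

    [DllImport("user32.dll")]
    static extern bool SetForegroundWindow(IntPtr hWnd);

    // Look up the keystroke mapped to a gesture in the .config file and
    // send it to the configured target process.
    static void SendMappedKey(string gestureKey)
    {
        string keystroke = ConfigurationManager.AppSettings[gestureKey];
        if (string.IsNullOrEmpty(keystroke)) return; // blank = gesture ignored

        string processName =
            ConfigurationManager.AppSettings["TargetApplicationProcessName"];
        Process[] targets = Process.GetProcessesByName(processName);
        if (targets.Length == 0) return;

        SetForegroundWindow(targets[0].MainWindowHandle);
        SendKeys.SendWait(keystroke); // e.g. "{Right}" advances the deck
    }
}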

Free XBox One with purchase of Kinect

the difference between the Kinect and Kinect for Windows

It’s true.  In November, Microsoft will release the Kinect 2 for approximately $500.  The new Kinect2 comes with HD video at 30 fps (we currently get 640×480 with the Kinect1), much improved skeleton tracking and improved audio tracking.  One of the most significant changes is in depth tracking.  Instead of the PrimeSense structured light technology used in the Kinect1, the Kinect2 uses the more traditional and more accurate time-of-flight technology.  Since most time-of-flight depth cameras start at around $1K, getting this in a Kinect2 along with all the other features for half that price is pretty amazing.

But the deal doesn’t stop there.  If you buy the Kinect for XBox, you automatically get an XBox for free!  You actually can’t even buy the XBox on its own.  You can only get it if you buy the Kinect2.

How do they give the new XBox One away for free you may ask?  Apparently the price of the XBox One will be subsidized through game sales.  Since the games for XBox will tend to have some sort of Kinect capability – enabled by the requirement that you can’t get the XBox on its own – the expectation seems to be that these unique games will get enough sales that, through volume, the cost of producing the XBox One will eventually be recouped.

But what if you aren’t interested in gaming?  What if – like at my company, Razorfish – you are mainly interested in building commercial interfaces and artistic experiences with the Kinect technologies?

In this case, Microsoft will be providing another version of the Kinect (one assumes that it will be called something like Kinect2 for Windows or perhaps K4W2 – its Star Wars droid name) that has a USB 3 adapter that will plug into a PC.  And because it is for people who are not interested in gaming, it will probably cost a bit less than $500 to make up for the fact that it doesn’t come with a free XBox One and won’t ever recoup that hardware cost from non-gamers.  By the way, this version of the Kinect sensor will be released some time – perhaps months? – following the K4X1 November release.

Finally, to make the distinction between the two kinds of Kinect2s clear, the Kinect2 for XBox will not plug into a PC and Kinect2 for Windows will not plug into an XBox.  It’s just cleaner that way.

With the original Kinect, there was quite a bit of confusion introduced by the fact that when it was released it used a typical USB connector that could be plugged into either the XBox 360 or a PC.  This turned out to be a great thing for Microsoft because it set off an amazing flood of creativity among hackers who started building their own frameworks and drivers to read the USB data and then build applications on top of it. 

Overnight, this grassroots Kinect Hacks movement made Microsoft cool again.  There is currently talk going around that the USB connector on the Kinect was simply fortuitous.  I’m pretty sure, however, that it was prescient – at least on someone’s part – and the intent was – again on someone’s part if not everyone’s – to provide the sort of platform that could be taken advantage of to build more than games.

As Microsoft moved forward with the development of the Kinect SDK as a platform for developers to build Kinect applications on, they decided that this should be coupled with a special version of the “Kinect” called Kinect for Windows that would carry special firmware supporting near mode.  Additionally, the commercial version of the hardware (which was pretty much the same as the gaming version of the hardware) required a special dongle (see photo above) that would help regulate the power on PCs.  The biggest difference between the two Kinects, however, was the licensing terms and the price.  Basically, if you wanted to use Kinect technology commercially with the Kinect SDK, you needed to use the Kinect for Windows sensor, which carried a higher, un-subsidized price.

This, naturally, caused a lot of confusion.  People wondered why Microsoft was overcharging for the commercial version of the sensor when, with a Copernican frame of mind, they might just as easily have asked why Microsoft was undercharging for the gaming version of the sensor.

With the Kinect2 sensors, all of this confusion is removed by fiat since the gaming version and commercial version now have different connectors.  From a hardware standpoint, rather than merely a legal one, you cannot use your gaming sensor with a PC.

Of course, you could also perform a Copernican revolution on my framing above and suggest that it isn’t the XBox One that is being subsidized through the purchase of the Kinect2 but rather the Kinect2 that is being subsidized through the purchase of the XBox One.

It’s all a bit of an accounting trick, isn’t it?  Basically the money has to come from somewhere.  Given that Microsoft received a lot of free, positive PR from the Kinect hacking movement, it would be cool if they gave a little back and made the non-gaming Kinect2 sensor more accessible. 

On the other hand, it is already the case that a time-of-flight camera for under $500, along with all the other features loaded onto the Kinect2, is a pretty amazing deal for weekend coders, installation artists, and retailers.

In any case, it gives me peace of mind to think of the Kinect2 sensor as a $500 device that comes with a free XBox One.  A lot of the angst I might otherwise feel about pricing simply melts away.  Though if Microsoft felt like subsidizing the price of the K4W2 sensor with some of the excess money they make off of Sharepoint licenses, I’d be cool with that, too.

Kinect Application Project Template

Over the past year, every time I start a new Kinect for Windows project, I’ve basically just copied the infrastructure code from a previous project.  The starting point was the code my friend Jarrett Webb and I wrote for our book Beginning Kinect Programming with the Microsoft Kinect SDK, but I’ve made incremental improvements to this code as needed and based on pointers I’ve found in various places.  I finally realized that I’d made enough changes and it was time to just turn this base code into a project template for myself and my colleagues at work.  Realizing that there wasn’t a Kinect Application project template available yet on the Visual Studio Gallery, I uploaded it there, also.

The cool thing about templates uploaded to the gallery is that anyone with Visual Studio can now install it from the IDE.  If you select Tools | Extension Manager … and then search for “Kinect” under the Online Gallery, you should see something like this.  From here you can install the Kinect Application project template to your computer.

Kinect Application Project Template

If you then create a new project and look under C# | Windows, you will be able to build a Kinect WPF application with a bit of a head start.  Here are some key features:

1. Initialization Code

All the initialization code and Kinect stream event handlers are stubbed out in the InitSensor method.  All you need to do is uncomment the streams you want to use.  Additionally, the event handler code is stubbed out with the proper pattern for opening and disposing of frame objects.  Whatever you need to do with the image, depth and skeleton frames can be done inside those using statements.  This code also uses the latest agreed-upon best practices for efficiently managing streamed data as of the 1.7 SDK.

void sensor_ColorFrameReady(object sender
    , ColorImageFrameReadyEventArgs e)
{
    // Frames can come back null when a frame is skipped or the
    // sensor has stopped, so always check before using one.
    using (ColorImageFrame frame = e.OpenColorImageFrame())
    {
        if (frame == null)
            return;

        // Allocate the pixel buffer once and reuse it on every frame.
        if (_colorBits == null) _colorBits = 
            new byte[frame.PixelDataLength];
        frame.CopyPixelDataTo(_colorBits);

        // TODO: replace with your own processing of _colorBits.
        throw new NotImplementedException();
    }
}

2. Disposal Code

Whatever you enable in the InitSensor method you will need to disable and dispose of in the DeInitSensor method.  Again, this just requires uncommenting the appropriate lines.  The DeInitSensor method also implements a disposal pattern that is somewhat popular now: the sensor is shut down on a background thread rather than on the main thread.  I’m not sure if this is a best practice as such, but it resolves a problem many C# developers were running into when shutting down their Kinect-enabled applications.
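
Here is the shape of that shutdown as a minimal sketch – the streams being disabled mirror whatever you enabled in InitSensor, and the actual template code may differ in the details:

using System.Threading;
using Microsoft.Kinect;

private void DeInitSensor(KinectSensor sensor)
{
    if (sensor == null)
        return;

    // Stopping the sensor can block, so do it off the UI thread.
    // This sidesteps the hang many apps hit when stopping the Kinect
    // while the main window is closing.
    ThreadPool.QueueUserWorkItem(_ =>
    {
        sensor.ColorStream.Disable();
        sensor.DepthStream.Disable();
        sensor.SkeletonStream.Disable();
        sensor.Stop();
    });
}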

3. Status Changed Code

The Kinect can actually be disconnected in mid-process or simply not be on when you first run an application.  It is also surprisingly common to forget to plug the Kinect’s power supply in.  Generally, your application will just crash in such situations.  If you properly handle the KinectSensors.StatusChanged event, however, your application will just start up again when you get the sensor plugged back in.  A pattern for doing this was first introduced in the KinectChooser component in the Developer Toolkit.  A lightweight version of this pattern is included in the Kinect Application Project Template.

void KinectSensors_StatusChanged(object sender
    , StatusChangedEventArgs e)
{
    // Tear down cleanly when the sensor is unplugged...
    if (e.Status == KinectStatus.Disconnected)
    {
        if (_sensor != null)
        {
            DeInitSensor(_sensor);
        }
    }

    // ...and spin everything back up when it is reconnected.
    if (e.Status == KinectStatus.Connected)
    {
        _sensor = e.Sensor;
        InitSensor(_sensor);
    }
}

4. Extension Methods

While most people were working on controls for the Kinect 4 Windows SDK, Clint Rutkas and the Coding4Fun guys brilliantly came up with the idea of developing extension methods for handling the various Kinect streams.

The extension methods included with this template provide lots of conversions from bitmap byte arrays to BitmapSource types (useful for WPF image controls) and vice versa.  This makes something like displaying a color stream easy, which otherwise can be rather hairy.  The snippet below assumes there is an image control in the MainWindow named canvas.

using (ColorImageFrame frame = e.OpenColorImageFrame())
{
    if (frame == null)
        return;

    if (_colorBits == null) _colorBits = 
        new byte[frame.PixelDataLength];
    frame.CopyPixelDataTo(_colorBits);

    // new line
    this.canvas.Source = 
        _colorBits.ToBitmapSource(PixelFormats.Bgr32, 640, 480);
}

More in line with the original Coding4Fun Toolkit, the extension methods also make some very difficult scenarios trivial – for instance background subtraction (also known as green screening), skeleton drawing, and player masking.  These methods should make it easier to quickly mock up a demo or even show off the power of the Kinect in the middle of a presentation using just a few lines of code.

private void InitSensor(KinectSensor sensor)
{
    if (sensor == null)
        return;

    sensor.ColorStream.Enable();
    sensor.DepthStream.Enable();
    sensor.SkeletonStream.Enable();
    sensor.Start();
    this.canvas.Source = sensor.RenderActivePlayer();
}

Again, this code assumes there is an image control in MainWindow named canvas.  You’ll want to put this code in the InitSensor method to ensure that it is called again if your Kinect sensor accidentally gets dislodged.  To create a simple background subtraction image, enable the color, depth and skeleton streams and then call the RenderActivePlayer extension method.  By stacking another image beneath the canvas image, I create an effect like this:

me on tatooine

Here are some overloads of the RenderActivePlayer method and the effects they create.  I’ve removed Tatooine from the background in the following samples.


canvas.Source = sensor.RenderActivePlayer(System.Drawing.Color.Blue);

blue_man


canvas.Source = sensor.RenderActivePlayer(System.Drawing.Color.Blue
                , System.Drawing.Color.Fuchsia);

blue_fuschia


canvas.Source = sensor.RenderActivePlayer(System.Drawing.Color.Transparent
                , System.Drawing.Color.Fuchsia);

trasnparent_fuschia


And so on.  There’s also this one:

canvas.Source = sensor.RenderPredatorView();

predator


… as well as this oldie but goodie:

canvas.Source = sensor.RenderPlayerSkeleton();

skeleton_view

The base method uses the colors (and quite honestly most of the code) from the Kinect Toolkit that goes with the SDK.  As with the RenderActivePlayer extension method, however, there are lots of overloads so you can change all the colors if you wish to.

canvas.Source = sensor.RenderPlayerSkeleton(System.Drawing.Color.Turquoise
    , System.Drawing.Color.Indigo
    , System.Drawing.Color.IndianRed
    , trackedBoneThickness: 1
    , jointThickness: 10);

balls


Finally, you can also layer all these different effects:

canvas.Source = sensor.RenderActivePlayer();
canvas2.Source = sensor.RenderPlayerSkeleton(System.Drawing.Color.Transparent);

everything

PRISM, Xbox One Kinect, Privacy and Semantics

loose lips

It’s interesting that at one time getting people to keep quiet was a priority for the government.  During World War II the government promoted a major advertising campaign to remind people that “loose lips sink ships.”  During war time (back when wars were temporary affairs), it was standard practice to suppress the flow of information and censor personal letters to ensure that useful information would not fall into enemy hands.  In a sense, privacy and national security were one.

Recent leaks about the NSA’s PRISM program suggest that things have dramatically changed.  We’ve realized for several years now that our cell phone service providers, our social networks, and our search engines are constantly tracking our physical and digital movements and mining that data for marketing.  We basically have traded our privacy for convenience in the same way that we accept ads on TV and on the Internet in exchange for free content. 

The dark side of all this is that this information is also passed along to third parties we didn’t even know about – at least until we start getting junk mail in our inboxes for products we have no interest in.

What we only suspected, until now, was that the infrastructure built to support these transactions of personal information for services was also of interest to our government, and that we are sharing our identifying information not only with content providers, service providers, spammers and junk mailers but also with the United States security apparatus.  Now that all that information has been collected, the government wants to mine it also.

We don’t live in a police state today.  I don’t belong to either the far right wing or the far left wing – I’m neither an occupier nor a tea partay kind of guy – so I also don’t believe we are even close to slipping into a police state in the near future.  I’m not concerned that the government is using, or ever will use, this information to track me down, and I am pretty confident that all this data mining will mainly be used only to track down terrorists and to send me unwanted emails.  And yet, it bugs me on a visceral level that people are going through my stuff, whatever that ethereal stuff actually is.

Gears of War

The main argument against this cooties feeling about my privacy is that only metadata is being inspected and not actual content.  Unfortunately, this seems like a porous boundary to me.  To paraphrase Hegel’s overarching criticism of Kant, whenever we draw a line we also necessarily have to cross over it at the same time.  From everything I know about software, the only way to gather metadata is to inspect the content in order to generate metadata about it.  For instance, when a government computer system listens to phone traffic in order to pick out key words and constellations of words, it still has to listen to all the other words first in order to pick out what it is interested in. 

Moreover, according to Slate, the data mining being done by PRISM is incredibly broad:

It appears the National Security Agency’s sweeping surveillance is not something only Verizon customers should be concerned about. The agency has also reportedly obtained access to the central servers of major U.S. Internet companies as part of a secret program that involves the monitoring of emails, file transfers, photos, videos, chats, and even live surveillance of search terms.

The semantics of privacy today, as defined under the regime of the NSA, doesn’t mean no one is listening to what you are saying – it just means no one cares.  The best way to protect one’s privacy today is simply to be boring.

At the same time that all these revelations about PRISM were coming out (in fact on the very same day), Microsoft released a brief about privacy concerns around the new Xbox One’s Kinect peripheral.  Here’s an attempted explanation of the brief on Windows Phone Central I found particularly fascinating:

A lot of people feared that the Kinect would be able to listen to you when the Xbox One was off. Apparently, when off, the Xbox One is only listening for one command in its low-power state: “Xbox On”. It’s nice to know that you’re in control when the Kinect is on, off or paused. Some games though will require Kinect functionality (again, at the discretion of the game developers/publisher). That’s up to you to play or not play those games.


The author’s reassurance is based on a semantic sleight-of-hand.  The Kinect is not listening to you, according to the author, because it “is only listening for one command.”  This is an honest mistake, but a dangerous one.  In fact, in order to listen for one command, the Kinect has to have that microphone turned on and listening to everything anyone is saying.  What it is actually doing is only acting on one command – and hopefully throwing away everything else.  Additionally, I have a bit of experience with Microsoft’s speech recognition technology both on the Kinect and on the PC, and the “low-power state” modifier doesn’t particularly make sense.  It takes a similar amount of effort to identify insignificant data as it does to identify significant data, AFAIK.  (There’s always the possibility that the Xbox Kinect has an on-board language processor just to listen for this one command, separate from the rest of its speech recognition processing chain – but I haven’t heard about anything like that so far.)

Halo IV

The original Microsoft brief called Privacy by Design, upon which I assume the Windows Phone Central post is based, doesn’t play this particular semantic game – though it plays another.  At the same time, it also seems particularly and intentionally vague about certain points.

The semantic game in Microsoft’s Privacy post is around the term ‘design’.  Does design here refer to the hardware design, the software architecture, the usability design or the marketing campaign?  These are all things that are encompassed by the term design and, in the linked article, privacy could be referring to any of them.  If it refers to the marketing campaign and UX, as it probably does, this doesn’t actually provide me any guarantees of privacy.  All it tells me is that Microsoft doesn’t initially intend to use the new Kinect sitting in my living room to collect random conversations.  ‘Design’ may refer to the initial software architecture, but this doesn’t provide us with any particular guarantees since any post-release software update can change the way the software works.

To put this another way, the article describes Microsoft’s intent but doesn’t provide any guarantees.  Is there anything in the hardware that will prevent speech data from being mined in the future?  Probably not.  In that case, is there anything in the licensing that prevents Microsoft from mining this data?  Microsoft’s privacy brief doesn’t even touch on this.

So should you be concerned?  Totally – and here’s why.  In its pursuit of security, the NSA has instituted an infrastructure that performs better and better the more information it is fed.  Do terrorists play Xbox?  I have no idea.  Would the NSA want all that data anyways? 

Call of Duty

Hypothetically, the new Xbox One and the Kinect can collect this information on us.  Here’s how.  According to recent Microsoft announcements, the Xbox One must be connected to the Internet once every 24 hours in order to play games on it.  The new Kinect is designed to always be on, and I am obligated to have it (I can’t buy an Xbox One without it).  Even when my Xbox One is off, my Kinect is still on, listening for a command to turn it on.  The infrastructure is there and the NSA’s PRISM project is a monster that is hungry for it.

To be clear, I don’t think Microsoft is particularly interested in collecting this data.  Microsoft has no particular use for the typically rather boring conversations I have in my living room.  They won’t be gleaning any particularly useful marketing information from my conversations either. 

Nevertheless, I think it would be extremely forward looking of Microsoft to explain what they have put in place to prevent the government from ever issuing a request for this data and getting it the way they have already gotten other data, so far, from Verizon, AT&T, Microsoft, Yahoo, Google, Facebook, AOL, Skype, YouTube, and Apple.

Has Microsoft designed a mechanism, either through hardware or through a customer agreement they won’t/can’t rescind in the future, that will future-proof my privacy?