How HoloLens Sensors Work


[hardware specs were released this week. This post is now updated to reflect the final specs.]

In addition to a sophisticated AR display, the Micosoft HoloLens contains a wide array of sensors that constantly collect data about the user’s external and internal environments. These sensors are used to synchronize the augmented reality world with the real world as well as respond to commands. The HoloLens’s sensor technology can be thought of as a combination of two streams of research: one from the evolution of the Microsoft Kinect and the other from developments in virtual reality positioning technology. While what follows is almost entirely just well-informed guesswork, we can have a fair degree of confidence in these guesses based on what is already known publicly about the tech behind the Kinect and well documented VR gear like the Oculus Rift.

While this article will provide a broad survey of the HoloLens sensor hardware, the reader can go deeper into this topic on her own through resources like the book Beginning Kinect Programming by James Ashley and Jarrett Webb, Oliver Kreylos’s brilliant Doc-OK blog, and the perpetually enlightening Oculus blog.

Let’s begin with a list of the sensors believed to be housed in the HoloLens HMD:

  1. Gyroscope
  2. Magnetometer
  3. Accelerometer
  4. Internal facing eye tracking cameras (?)
  5. Ambient Light Detector (?)
  6. Microphone Array (4 (?) mics)
  7. Depth sensors Grayscale Cameras (4)
  8. RGB cameras (1)
  9. Depth sensor (1)

The first three make up an Inertial Measurement Unit often found in head-mounted displays for AR as well as VR. The eye tracker is technology that became commercialized by 3rd parties like Eye Tribe following the release of the Kinect but not previously used in Microsoft hardware – though it isn’t completely clear that there is any sort of eye tracking being used. There is a small sensor at the front that some people assume is an ambient light detector. The last three are similar to technology found in the Kinect.

microphone array
copyright Adobe Stock

I want to highlight the microphone array first because it was always the least understood and most overlooked feature of Kinect. The microphone array is extremely useful for speech recognition because it can distinguish between vocal commands from the user and ambient noise. Ideally, it should also be able to amplify speech from the user so commands can be heard even over a noisy room. Speech commands will likely be lit up by integrating the mic array with Microsoft’s cloud-based Cortana speech rec technology rather than something like the Microsoft Speech SDK. Depending on how the array is oriented, it may also be able to identify the direction of external sounds. In future iterations of HoloLens, we may be able to marry the microphone array’s directional capabilities with the RGB cameras and face recognition to amplify speech from our friends through the biaural audio speakers built into HoloLens.

copyright Microsoft

Eye tracking cameras are part of a complex mechanism allowing the human gaze to be used in order to manipulate augmented reality menus. When presented with an AR menu, the user can gaze at buttons in the menu in order to highlight them. Selection then occurs either by maintaining the gaze or by introducing an alternative selection mechanism like a hand press – which would in turn use the depth camera combined with hand tracking algorithms. Besides being extremely cool, eye tracking is a NUI solution to a problem many of us have like encountered with the Kinect on devices like Xbox. As responsive as hand tracking can be using a depth camera, it still has lag and jitteriness that makes manipulation of graphical user interface menus tricky. There’s certainly an underlying problem in trying to transpose one interaction paradigm, menu manipulation, into another paradigm based on gestures. Similar issues occur when we try to put interaction paradigms like a keyboard on a touch screen — it can be made to work, but isn’t easy. Eye tracking is a way to remove friction when using menus in augmented reality. It’s fascinating, however, to imagine what else we could use it for in future HoloLens iterations. It can be used to store images and environmental data whenever our gaze dwells for a threshold amount of time on external objects. When we want to recall something we saw during the day, the HoloLens can bring it back to us: that book in the book store, that outfit the guy in the coffee shop was wearing, the name of the street we passed on the way to lunch. As we sleep each night, perhaps these images can be analyzed in the cloud to discover patterns in our daily lives of which we were previously unaware.

Kinect has a feature called coordinate mapping which allows you to compare pixels from the depth camera and pixels from the color camera. Because the depth camera stream contained information about pixels belonging to human beings and those that did not, the coordinate mapper could be used to identify people in the RGB image. The RGB image in turn could be manipulated to do interesting things with the human-only pixels such as background subtraction and selective application of shaders such that these effects would appear to follow the player around. HoloLens must do something similar but on a vastly grander scale. The HoloLens must map virtual content onto 3D coordinates in the world and make them persist in those locations even as the user twists and turns his head, jumps up and down, and moves freely around the virtual objects that have been placed in the world. Not only must these objects persist, but in order to maintain the illusion of persistence there can be no perceivable lag between user movements and redrawing the virtual objects on the HoloLens’s two stereoscopic displays – perhaps no more than 20 ms of delay.

This is a major problem for both augmented and virtual reality systems. The problem can be broken up into two related issues: orientation tracking and position tracking. Orientation tracking determines where we are looking when wearing a HMD. Position tracking determines where we are located with respect to the external world.

head orientation tracking
copyright Adobe Stock: Sergey Niven

Orientation tracking is accomplished through a device known as an Inertial Measurement Unit which is made up of a gyroscope, magnetometer and accelerometer. The inertial unit of measure for an Inertial Measurement Unit (see what I did there?) is radians per second (rad/s), which provides the angular velocity of any head movements. Steve LaValle provides an excellent primer on how the data from these sensors are fused together on the Oculus blog. I’ll just provide a digest here as a way to explain how HoloLens is doing roughly the same thing.

The gyroscope is the core head orientation tracking device. It measures angular velocity. Once we have the values for the head at rest, we can repeatedly check the gyroscope to see whether our head has moved and in which direction it has moved. By comparing the velocity of that movement as well as the direction and comparing this to the amount of time that has passed, we can determine how the head is currently oriented compared to its previous orientation. In fact the Oculus does this one thousand times per second and we can assume that HoloLens is collecting data at a similarly furious rate.

Over time, unfortunately, the gyroscope’s data loses precision – this is known as “drift.” The two remaining orientation trackers are used to correct for this drift. The accelerometer performs an unexpected function here by determining the acceleration due to the force of gravity. The accelerometer provides the true direction of “up” (gravity pulls down so the acceleration we feel is actually upward, as in a rocket ship flying directly up) which can be used to correct the gyroscope’s misconstrued impression of the real direction of up. “Up,” unfortunately, doesn’t provide all the correction we need. If you turn your head right and left to make the gesture for “no,” you’ll notice immediately that knowing up in this case tells us nothing about the direction in which your head is facing. In this case, knowing the direction of magnetic north would provide the additional data needed to correct for yaw error – which is why a magnetometer is also a necessary sensor in HoloLens.

position tracking
copyright Adobe Stock

Even though the IMU, made up of a gyroscope, magnetometer and accelerometer, is great for determining the deltas in head orientation from moment to moment, it doesn’t work so well for determining diffs in head position. For a beautiful demonstration of why this is the case, you can view Oliver Kreylos’s video Pure IMU-Based Positional Tracking is a No-Go. For a very detailed explanation, you should read Head Tracking for the Oculus Rift by Steven LaValle and his colleagues at Oculus.

The Oculus Rift DK2 introduced a secondary camera for positional tracking that sits a few feet from the VR user and detects IR markers on the Oculus HMD. This is known as outside-in positional tracking being the external camera determines the location of the goggles and passes it back to Oculus software. This works well for the Oculus mainly because the Rift is a tethered device. The user sits or stands in a place near to the computer that runs the experience and cannot stray far from there.

There are some alternative approaches to positional tracking which allow for greater freedom of movement. The HTC Vive virtual reality system, for instance, uses two stationary devices in a setup called Lighthouse. Instead of stationary cameras like the Oculus Rift uses, these Lighthouse boxes are stationary emitters of infrared light that the Vive uses to determine it’s position in a room with respect to them. This is sometimes called an inside-out positional tracking solution because the HMD is determining it’s location relative to known external fixed positions.

Google’s Project Tango is another example of inside-out positional tracking that uses the sensors built into handheld devices (smart phones and tablets) in order to add AR and VR functionality to applications. Because these devices aren’t packed into IMUs, they can be laggy. To compensate, Project Tango uses data from onboard device cameras to determine the orientation of the room around the device. These reconstructions are constantly compared against previous reconstructions in order to determine both the device’s position as well as its orientation with respect to the room surfaces around it.

It is widely assumed that HoloLens uses a similar technique to correct for positional drift from the Inertial Measurement Unit. After all, HoloLens has four depth IR grayscale (?) cameras built into it. The IMU, in this supposition, would provide fast but drifty positional data while the combination of data from the four depth grayscale cameras and an RGB cameras provide possibly slower (we’re talking in milliseconds, after all) but much more accurate positional data. Together, this configuration provides inside-out positional tracking that is truly tether-less. This is, in all honestly, a simply amazing feat and almost entirely overlooked in most overviews of the HoloLens.

The secret sauce that integrates camera data into an accurate and fast reconstruction of the world to be used, among other things, for position tracking is called the Holographic Processing Unit – a chip the Microsoft HoloLens team is designing itself. I’ve heard from reliable sources that fragments from Stonehenge are embedded in each chip to make this magic work.

AR wordart

On top of this, the depth sensors, IR cameras, and RGB cameras will likely be accessible as independent data streams that can be used for the same sorts of functionality for which they have been used in Kinect applications over the past four years: art, research, diagnostic, medical, architecture, and gaming. Though not discussed previously, I would hope that complex functionality we have become familiar with from Kinect development like skeleton tracking and raw hand tracking will also be made available to HoloLens developers.

Such a continuity of capabilities and APIs between Kinect and HoloLens, if present, would make it easy to port the thousands of Kinect experiences the creative and software communities have developed over the years leading up to HoloLens. This sort of continuity was, after all, responsible for the explosion of online hacking videos that originally made the Kinect such an object of fascination. The Kinect hardware used a standard USB connector that developers were able to quickly hack and then pass on to –- for the most part –- pre-existing creative applications that used less well known, less available and non-standard depth and RGB cameras. The Kinect connected all these different worlds of enthusiasts by using common parts and common paradigms.

It is my hope and expectation that HoloLens is set on a similar path.

[This post has been updated 11/07/15 following opportunities to make a closer inspection of the hardware while in Redmond, WA. for the MVP Global Summit. Big thanks to the MPC and HoloLens groups as well as the Emerging Experiences MVP program for making this possible.]

[This post has been updated again 3/3/15 following release of final specs.]

Augmented Reality without Helmets


Given current augmented reality technologies like Magic Leap and HoloLens, it has become a reflexive habit to associate augmented reality with head-mounted displays.

This tendency has always been present and has to undergo constant correction as in this 1997 paper by the legendary Ron Azuma that provides a survey of AR:

“Some researchers  define AR in a way that requires the use of Head-Mounted Displays (HMDs). To avoid limiting AR to specific technologies, this survey defines AR as systems that have the following three characteristics:


1) Combines real and virtual

2) Interactive in real time

3) Registered in 3-D


“This definition allows other technologies besides HMDs while retaining the essential components of AR.”

Azuma goes on to describe the juxtaposition of real and virtual content from Who Framed Roger Rabbit as an illustrative example of AR as he has defined it. Interestingly, he doesn’t cite the holodeck from Star Trek as an example of HMD-less AR – probably because it is tricky to use fantasy future technology to really prove anything.

Nevertheless, the holodeck is one of the great examples of the sort of tetherless AR we all ultimate want. It often goes under the name “hard AR” and finds expression in Vernor Vinge’s Hugo winning Rainbow’s End.

The Star Trek TNG writers were always careful not to explain too much about how the holodeck actually worked. We get a hint of it, however, in the 1988 episode Elementary, Dear Data in which Geordi, Data and Dr. Pulaski enter the holodeck in order to create an original Sherlock Holmes adventure for Data to solve. This is apparently the first time Dr. Pulaski has seen a state-of-the-art holodeck implementation.

Pulaski:  “How does it work? The real London was hundreds of square kilometers in size.”


Data:  “This is no larger than the holodeck, of course, so the computer adjusts by placing images of more distant perspective on the holodeck walls.”


Geordi:  “But with an image so perfect that you’d actually have to touch the wall to know it was there. And the computer fools you in other ways.”

What fascinates me about this particular explanation of holodeck technology is that it sounds an awful lot like the way Microsoft Research’s RoomAlive project works.


RoomAlive uses a series of coordinated projectors, typically calibrated using Kinects, to project realtime interactive content on the walls of the RoomAlive space using a technique called projection mapping.

You might finally notice some similarities between the demos for RoomAlive and the latest gaming demos for HoloLens.


These experiences are cognates rather than derivations of one another. The reason so many AR experiences end up looking similar, despite the technology used to implement them (and more importantly regardless of the presence or absence of HMDs), is because AR applications all tend to solve the same sorts of problems that other technologies, like virtual reality, do not.


According to an alternative explanation, however, all AR experiences end up looking the same because all AR experiences ultimately borrow their ideas from the Star Trek holodeck.

By the way, if you would like to create your own holodeck-inspired experience, the researchers behind RoomAlive have open sourced their code under the MIT license. Might I suggest a Sherlock Holmes themed adventure?

How Hololens Displays Work


There’s been a lot of debate concerning how the HoloLens display technology works. Some of the best discussions have been on reddit/hololens but really great discussions can be found all over the web. The hardest problem in combing through all this information is that people come to the question at different levels of detail. A second problem is that there is a lot of guessing involved and the amount of guessing going on isn’t always explained. I’d like to correct that by providing a layered explanation of how the HoloLens displays work and by being very up front that this is all guesswork. I am a Microsoft MVP in the Kinect for Windows program but do not really have any insider information about HoloLens I can share and do not in any way speak for Microsoft or the HoloLens team. My guesses are really about as good as the next guy’s.

High Level Explanation


The HoloLens display is basically a set of transparent screens placed just in front of the eyes. Each eyepiece or screen lets light through and also shows digital content the way your monitor does. Each screen shows a slightly different image to create a stereoscopic illusion like the View Master toy does or 3D glasses do at 3D movies.

A few years ago I worked with transparent screens created by Samsung that were basically just LCD screens with their backings removed. LCDs work by suspending liquid crystals between layers of glass. There are two factors that make them bad candidates for augmented reality head mounts. First, they require soft backlighting in order to be reasonably useful. Second, and more importantly, they are too thick.

At this level of granularity, we can say that HoloLens works by using a light-weight material that displays color images while at the same time letting light through the displays. For fun, let’s call this sort of display an augmented reality combiner, since it combines the light from digital images with the light from the real world passing through it.


Intermediate Level Explanation

Light from the real world passes through two transparent pieces of plastic. That part is pretty easy to understand. But how does the digital content get onto those pieces of plastic?


The magic concept here is that the displays are waveguides. Optical fiber is an instance of a waveguide we are all familiar with. Optical fiber is a great method for transferring data over long distances because is is lossless, bouncing light back and forth between its reflective internal surfaces.


The two HoloLens eye screens are basically flat optical fibers or planar waveguides. Some sort of image source at one end of these screens sends out RGB data along the length of the transparent displays. We’ll call this the image former. This light bounces around the internal front and back of each display and in this manner traverses down its length. These light rays eventually get extracted from the displays and make their way to your pupils. If you examine the image of the disassembled HoloLens at the top, it should be apparent that the image former is somewhere above where the bridge of your nose would go.


Low Level Explanation

The lowest level is where much of the controversy comes in. In fact, it’s such a low level that many people don’t realize it’s there. And when I think about it, I pretty much feel like I’m repeating dialog from a Star Trek episode about dilithium crystals and quantum phase converters. I don’t really understand this stuff. I just think I do.

In the field of augmented reality, there are two main techniques for extracting light from a waveguide: holographic extraction and diffractive extraction. A holographic optical element has holograms inside the waveguide which route light into and out of the waveguide. Two holograms can be used at either end of the microdisplay: one turns the originating image 90 degrees from the source and sends it down the length of the waveguide. Another intercepts the light rays and turns them another ninety degrees toward the wearer’s pupils.

A company called TruLife Optics produces these types of displays and has a great FAQ to explain how they work. Many people, including Oliver Kreylos who has written quite a bit on the subject, believe that this is how the HoloLens microdisplays work. One reason for this is Microsoft’s emphasis on the terms “hologram” and “holographic” to describe their technology.

On the other hand, diffractive extraction is a technique pioneered by researchers at Nokia – for which Microsoft currently owns the patents and research. Due to a variety of reasons, this technique falls under the semantic umbrella of a related technology called Exit Pupil Expansion. EPE literally means making an image bigger (expanding it) so it covers as much of the exit pupil as possible, which means your eye plus every area your pupil might go to as you rotate your eyeball to take in your field of view (about a 10mm x 8mm rectangle or eye box). This, in turn, is probably why measuring the interpupillary distance is a large aspect of fitting people for the HoloLens.


Nanometer wide structures or gratings are placed on the surface of the waveguide at the location where we want to extract an image. The grating effectively creates an interference pattern that diffracts the light out and even enlarges the image. This is known as SRG or surface relief grating as shown in the above image from

Reasons for believing HoloLens is using SRG as its way of doing EPE include the Nokia connection as well as this post from Jonathan Lewis, the CEO of TruLife, in which Lewis states following the original HoloLens announcement that it isn’t the holographic technology he’s familiar with and is probably EPE. There’s also the second edition of Woodrow Barfield’s Wearable Computers and Augmented Reality in which Barfield seems pretty adamant that diffractive extraction is used in HoloLens. Being a professor at the University of Washington, which has a very good technology program as well as close ties to Microsoft, he may know something about it.

On the other hand, it doesn’t get favored or disfavored in this Microsoft patent clearly talking about HoloLens that ends up discussing both volume holograms (VH) as well as surface relief grating (SRG). I think HL is more likely to be using diffractive extraction rather than holographic extraction, but it’s by no means a sure thing.


Impact oN Field of View

An important aspect of these two technologies is that they both involve a limited field of view based on the ways we are bouncing and bending light in order to extract it from the waveguides. As Oliver Kreylos has eloquently pointed out, “the current FoV is a physical (or, rather, optical) limitation instead of a performance one.” In other words, any augmented reality head mounted display (HMD) or near eye display (NED) is going to suffer from a small field of view when compared to virtual reality devices. This is equally true of the currently announced devices like HoloLens and Magic Leap, the currently available AR devices like those by Vuzix and DigiLens, and the expected but unannounced devices from Google, Facebook and Amazon.  Let’s call this the keyhole problem (KP).


The limitations posed by KP are a direct result of the need to use transparent displays that are actually wearable. Given this, I think it is going to be a waste of time to lament the fact that AR FOVs are smaller than we have been led to expect from the movies we watch. I know Iron Man has already had much better AR for several years with a 360 degree field of view but hey, he’s a superhero and he lives in a comic book world and the physical limitations of our world don’t apply to him.

Instead of worrying that tech companies for some reason are refusing to give us better augmented reality, it probably makes more sense to simply embrace the laws of physics and recognize that, as we’ve been told repeatedly, hard AR is still several years away and there are many technological breakthroughs still needed to get us there (let’s say five years or even “in the Windows 10 timeframe”).

In the meantime, we are being treated to first generation AR devices with all that the term “first generation” entails. This is really just as well because it’s going to take us a lot of time to figure out what we want to do with AR gear, when we get beyond the initial romantic phase, and a longer amount of time to figure out how to do these experiences well. After all, that’s where the real fun comes in. We get to take the next couple of years to plan out what kinds of experiences we are going to create for our brave new augmented world.

Terminator Vision


James Cameron’s film The Terminator introduced an interesting visual effect that allowed audiences to get inside the head and behind the eyes of the eponymous cyborg. What came to be called terminator vision is now a staple of science fiction movies from Robocop to Iron Man. Prior to The Terminator, however, the only similar robot-centric perspective shot seems to have been in the 1973 Yul Brynner thriller Westworld. Terminator vision is basically a scene filmed from the T-800’s point-of-view. What makes the terminator vision point-of-view style special is that the camera’s view is overlaid with informatics concerning background data, potential dialog choices, and threat assessments.


But does this tell us anything about how computers actually see the world? With the suspension of disbelief typically required to enjoy science fiction, we can accept that a cyborg from the future would need to identify threats and then have contingency plans in case the threat exceeds a certain threshold. In the same way, it makes sense that a cyborg would perform visual scans and analysis of the objects around him. What makes less sense is why a computer would need an internal display readout. Why does the computer that performs this analysis need to present the data back to itself to read on its own eyeballs?


Looked at from another way, we might wonder how the T-800 processes the images and informatics it is displaying to itself inside the theater of its own mind. Is there yet another terminator inside the head of the T-800 that takes in this image and processes it? Does the inner terminator then redisplay all of this information to yet another terminator inside its own head – an inner-inner terminator? Does this epiphenomenal reflection and redisplaying of information go on ad infinitum? Or does it make more sense to simply reject the whole notion of a machine examining and reflecting on its own visual processing?


I don’t mean to set up terminator vision as a straw man in this way just so I can knock it down. Where terminator vision falls somewhat short in showing us how computers see the world, it excels in teaching us about how we human beings see computers. Terminator vision is so effective as a story telling trope because it fills in for something that cannot exist. Computers take in data, follow their programming, perform operations and return results. They do not think, as such. They are on the far side of an uncanny valley, performing operations we might perform but more quickly and without hesitation. Because of this, we find it reassuring to imagine that computers deliberate in the same way we do. It gives us pleasure to project our own thinking processes onto them. Far from being jarring, seeing dialog options appear on Arnold Schwartzenegger’s inner vidscreen like a 1990’s text-based computer game is comforting because it paves over the uncanny valley between humans and machines.

Virtual Names for Augmented Reality (Or Why “Mixed-Reality” is a Bad Moniker)


It’s taken about a year but now everyone who’s interested can easily distinguish between augmented reality and virtual reality. Augmented reality experiences like the one provided by HoloLens combine digital and actual content. Virtual reality experiences like that provided by Oculus Rift are purely digital experiences. Both have commonalities such as stereoscopy, head tracking and object positioning to create the illusion that the digital objects introduced into a user’s field of view have a physical presence and can be walked around.

Sticklers may point out that there is a third kind of experience called a head-up display in which informatics are displayed at the top corner of a user’s field of view to provide digital content and text. Because head-up display devices like the now passe Google Glass do not overlay digital content on top of real world content, but instead displays them more or less side-by-side, it is not considered augmented reality.

Even with augmented reality, however, a distinction can be drawn between informational content and digital content made up of 3D models. The informational type of augmented reality, as in the picture of my dog Marcie above, is often called the Terminator view, after the first-person (first-cyborg?) camera perspective used as a story telling device in the eponymous movie. The other type of augmented reality content has variously been described inaccurately as holography by marketers or, more recently, mixed reality.

The distinction is being drawn largely to distinguish what might be called hard AR from the more typical 2D overlays on smart phones that help you find a pizza restaurant. Mixed reality is a term intended to emphasize the point that not all AR is created equal.

Abandoning the term “augmented reality” in favor of “mixed reality” to describe HoloLens and Magic Leap, however, seems a bit drastic and recalls Gresham’s Law, the observation that bad money drives out good money. When the principle is generalized, as Albert Jay Knock did in his brilliant autobiography Memoirs of a Superfluous Man, it simply means that counterfeit and derivative concepts will drive out the authentic ones.

This is what appears to be happening here. Before the advent of the iPhone, researchers were already working on augmented reality. The augmented reality experiences they were building, in turn, were not Terminator vision style. Early AR projects like KARMA from 1992 were like the type of experiences that are now being made possible in Redmond and Hollywood, Florida. Terminator vision apps only came later with the mass distribution of smart phones and the fact that flat AR experiences are the only type of AR those devices can support.

I prefer the term augmented reality because it contains within itself a longer perspective on these technologies. Ultimately, the combination of digital and real content is intended to uplift us and enhance our lives. If done right, it has the ability to re-enchant everyday life. Compared to those aspirations, the term “mixed reality” seems overly prosaic and fatally underwhelming.

I will personally continue to use the moniker “mixed reality” as a generic term when I want to talk about both virtual reality and augmented reality as a single concept. Unless the marketing juggernaut overtakes me, however, I will give preference to the more precise and aspirational term “augmented reality” when talking about HoloLens, Magic Leap and cool projects like RoomAlive.

The Mvp Program REORG Explained Through Gamification

Today chief Microsoft evangelist Steve Guggenheimer announced dramatic changes to the MVP program on his blog.


In case you are unfamiliar with the MVP program, it is a recognition Microsoft gives to members of the developer community and is generally understood as a mark of expertise in a particular Microsoft technology (e.g. Windows Phone, Outlook, Kinect for Windows). In truth, though, there are many ways to get the MVP recognition without necessarily being an expert in any particular technology and running user groups or helping at coding events are common ways.  The origins of the program go back to the days when it was given out for answering forum questions about Microsoft technologies. This is a good way to understand the program – it is a reward of sorts for people who are basically helping their communities out as well as helping Microsoft out. Besides the status conferred by the award, the MVP includes an annual subscription to MSDN and an annual invitation to the Redmond campus for the MVP summit. Depending on the discipline as well as marketing cycles, you may also have access to regular calls with particular product teams. MVPs also have to renew every year by explaining what they’ve done in the prior 12 months to help the developer, IT or consumer community.

As with any sort of status thingy that confers a sense of self-worth and may even affect income, it is occasionally a source of turmoil, stress and drama for people. Like soccer mom levels of drama. For instance, occasionally a product category like Silverlight will just disappear and that particular discipline has to be scrapped. The people who are Silverlight MVPs will typically feel hurt by this and understandably feel slighted.  They didn’t become suddenly unworthy, after all, simply because the product they had poured so much energy into isn’t around anymore.

Some products are hot and some are not, while others start off hot then become not. If you were one of those Silverlight MVPs, you probably would like to point out that you are in fact worthy and know lots of other things but had been ignoring other technical interests in order to promote just Silverlight. You probably would feel that it is unjust to be punished for overinvesting in one technology.

In response to situations such as this, the Microsoft MVP program is undergoing a re-organization.

I’ll quote the synoptic statement from Steve’s post:

Moving forward, the MVP Award structure will shift to encompass the broad array of community contributions across technologies. For our Developer and IT Pro oriented MVPs, we’re moving from 36 areas of technical expertise to a set of 10 broader categories that encompass a combined set of 90 technology areas—including open source technologies.


The best most fun way to understand this is in terms of Dungeons & Dragons. To do so it is important that I first try to explain the difference between class-based role playing games and skills-based RPGs. Diablo is a great example of a class-based RPG. You choose from a handful of classes like barbarian, demon hunter or monk, and based on that your skills are pretty much picked out for you. At the opposite extreme is a game like Fallout where you have full control over how to upgrade your abilities; the game doesn’t prescribe how you should play at all. In the middle are RPGs like World of Warcraft which has cross-class skills but also provides a boost to certain skills depending on what class you initially choose. Certain class/skill combinations are advisable, but none are proscribed. You have freedom to play the game the way you want – for instance as an elf hunter with mutton chops and a musket. Totally do-able.

Dungeons & Dragons is a game that changes its rules every so often and causes lots of consternation whenever it does so. One of the corrections happened between D&D 3.0 and D&D 3.5 when the game went from a simplified class-based system to a more open skills based system. This allowed players a lot more freedom in how they customized their characters who could now gain skills that aren’t traditionally tied to their class.

The MVP program is undergoing the same sort of correction, moving from a class-based gaming system to a skills-based gaming system. Instead of just being a Silverlight MVP, you can now be an fifth-level druid with Javascript and handle animals skills, or a third-level Data Platform MVP with interests in IoT,  Azure machine learning, alchemy, light armor and open lock. You can customize the MVP program to fit your style of play rather than letting the program prescribe what sort of tech things you need to be working on.

This, I believe, will help meliorate the problem of people basing their self-worth on a fixed idea of what their MVP-ness means or the bigger problem of comparing their MVP-ness to the MVP-nesses of others. Going forward, one’s MVP-ness is whatever one makes of it. And that’s a good thing.

The Next Big Thing in Depth Sensors


Today Orbbec3D, my employer, announced a new depth sensor called the Orbbec Persee. We are calling it a camera-computer because instead of attaching a sensor to a computer, we’ve taken the much more reasonable approach of putting the computer inside our sensor. It’s basically a 3D camera that doesn’t require a separate computer hooked up to it because it has its own onboard CPU while maintaining a small physical footprint. This is the same sort of move being made by products like the HoloLens. 

Unlike the Oculus Rift which requires an additional computer or Google Glass which needs a CPU on a nearby smartphone, the Persee falls into a category of devices that perform their own computing and that you can program as well as load pre-built software on.

For retail scenarios like advanced proximity detection or face recognition, this means compact installations. One of the major difficulties with traditional “peripheral” cameras is placing computers in such a way that they stay hidden while also allowing for air circulation and appropriate heat dissipation. Having done this multiple times, I can confirm that this is a very tricky problem and typically introduces multiple points of failure. The Persee removes all those obstacles and allows for sleek fabricated installs at a great price.

What has me truly excited about the Persee is Orbbec’s efforts to cater to the creative coding community and the way that the creative community has taken to it. These people are my heroes and having them give our product the nod means the world to me. People like Golan Levin, Phoenix Perry, Kyle McDonald, James George, Greg Borenstein, and Elliot Woods.

The device is OpenNI compatible but also provides it’s own middleware to add new capabilities and fill both creative and commercial needs (<– this is the part I’m working on).


Is it a replacement for Kinect? In my opinion, not really because they do different things. The Kinect will always be my first love and is the full package, offering high-rez video, depth and a 3D mic. It is primarily a gaming peripheral. The Orbbec Persee fills a very different niche and competes with devices like the Asus Xtion and Intel RealSense as realtime collectors of volumetric data – in the way your thermometer collects thermal data. What distinguishes the Persee from its competitors is that it is an intelligent device and not just a mere peripheral. It is a first class citizen in the Internet of Things — to invoke magical marketing thinking – where each device in the web of intelligent objects not only reports its status but also reflects, processes and adjusts its status. It’s the extra kick that makes the Internet of Things not just a buzzword, but also a step along the path toward non-Skynet hard AI. It’s the next big thing.