The Problem with Comparing Depth Camera Resolutions

September 15, 2015 James Ashley

We all want to have an easy way to compare different depth cameras to one another. Where we often stumble in comparing depth cameras, however, is in making the mistake of thinking of them in the same way we think of color cameras or color displays.

When we go to buy a color television or computer monitor, for instance, we look to the pixel density in order to determine the best value. A display that supports 1920 by 1080 has roughly 2.25 times the pixel density of a 1280 by 720 display (2,073,600 pixels versus 921,600). The first is considered high definition resolution while the second is commonly thought of as standard definition. From this, we have a rule of thumb that HD is 2.25 times denser than SD. With digital cameras, we similarly look to pixel density in order to compare value. A 4 megapixel camera is roughly twice as good as a 2 megapixel camera, while an 8 MP camera is four times as good. There are always other factors involved, but for quick evaluations the pixel density trick seems to work. My phone happens to have a 41 MP camera and I don’t know what to do with all those extra megapixels – all I know is that it is over 20 times as good as that 2 megapixel camera I used to have and that makes me happy.

When Microsoft’s Kinect 2 sensor came out, it was tempting to compare it against the Kinect v1 in a similar way: by using pixel density. The Kinect v1 depth camera had a resolution of 320 by 240 depth pixels. The Kinect 2 depth camera, on the other hand, had an increased resolution of 512 by 424 depth pixels. Comparing the total depth pixels provided by the Kinect v1 to the total provided by the Kinect 2, 76,800 vs 217,088, many people arrived at the conclusion that the Kinect 2’s depth camera was roughly three times better than the Kinect v1’s.

Another feature of the Kinect 2 is a greater field of view for the depth camera. Where the Kinect v1 has a field of view of 57 degrees by 43 degrees, the Kinect 2 has a 70 by 60 degree field of view. The new Intel RealSense 3D F200 camera, in turn, advertises an improved depth resolution of 480 by 360 depth pixels with an increased field of view of roughly 90 degrees by 72 degrees.

What often gets lost in these feature comparisons is that our two different depth camera attributes, resolution and field of view, can actually affect each other. Increased pixel resolution is only really meaningful if the field of view stays the same between different cameras. If we increase the field of view, however, we are in effect diluting the resolution of each pixel by trying to stuff more of the real world into the pixels we already have.

It turns out that 3D math works slightly differently from regular 2D math. To understand this better, imagine a sheet of cardboard held a meter out in front of each of our two Kinect sensors. How much of each sheet is actually caught by the Kinect v1 and the Kinect 2?

To derive the area of the inner rectangle captured by the Kinect v1 in the diagram above, we will use a bit of trigonometry. The field of view of the Kinect v1 is 57 degrees horizontal by 43 vertical. To get right triangles to work with, however, we will need to bisect these angles. For instance, half of 43 is 21.5. The tangent of 21.5 degrees times the 1 meter adjacent side (since the cardboard sheet is 1 m away) gives us an opposite side of .39 meters. Since this is only half of that rectangle’s side (because we bisected the angle) we multiply by two to get the full vertical side, which is .78 meters. Using the same technique for the horizontal field of view, we capture a horizontal side of 1.09 meters.

Using the same method for the sheet of cardboard in front of the Kinect 2, we discover that the Kinect 2 captures a rectangular surface that is 1.4 meters by 1.15 meters. If we now calculate the area on the cardboard sheets in front of each camera and divide by each camera’s resolution, we discover that far from being three times better than the Kinect v1, each pixel caught by the Kinect 2 depth camera holds about two thirds as much of the real world as each pixel of the Kinect v1. That makes the Kinect 2 roughly 1.5 times as dense, not three times. It is still a better camera, but not what one would think by comparing resolutions alone.
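For readers who would rather let the computer do the trigonometry, here is a minimal sketch of the cardboard calculation in C#. The class and method names (DepthFootprint, SideLength, AreaPerPixel) are my own inventions for illustration and are not part of any Kinect SDK; the field of view and resolution figures are the ones quoted earlier in this post.

```csharp
using System;

static class DepthFootprint
{
    // Length (in meters) of one side of the rectangle seen at the given distance
    // for one field-of-view angle: bisect the angle, take the tangent, then double it.
    public static double SideLength(double fovDegrees, double distanceMeters)
    {
        double halfAngleRadians = (fovDegrees / 2.0) * Math.PI / 180.0;
        return 2.0 * distanceMeters * Math.Tan(halfAngleRadians);
    }

    // Area (in square meters) of flat surface that lands on a single depth pixel.
    public static double AreaPerPixel(double horizontalFov, double verticalFov,
        int pixelsWide, int pixelsHigh, double distanceMeters)
    {
        double area = SideLength(horizontalFov, distanceMeters)
                    * SideLength(verticalFov, distanceMeters);
        return area / (pixelsWide * pixelsHigh);
    }
}
```

Calling DepthFootprint.AreaPerPixel(57, 43, 320, 240, 1.0) for the Kinect v1 and DepthFootprint.AreaPerPixel(70, 60, 512, 424, 1.0) for the Kinect 2 gives two per-pixel areas whose ratio comes out at about 1.5, rather than the factor of three suggested by pixel counts alone. Swapping in the numbers for another camera is a one-line change.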

This was actually a lot of math in order to make a simple and mundane point: it all depends. Depth pixel resolutions do not tell us everything we need to know when comparing different depth cameras. I invite the reader to compare the true density of the RealSense 3D camera to the Kinect 2 or Xtion Pro Live camera if she would like.

On the other hand, it might be worth considering the range of these different cameras. The RealSense F200 cuts off at about a meter whereas the Kinect cameras only start performing really well at about that distance. Another factor is, of course, the accuracy of the depth information each camera provides. A third factor is whether one can improve the performance of a camera by throwing on more hardware. Because the Kinect 2 is GPU bound, it will actually work better if you simply add a better graphics card.

For me, personally, the most important question will always be how good the SDK is and how strong the community around the device is. With good language and community support, even a low quality depth camera can be made to do amazing things. An extremely high resolution depth camera with a weak SDK, alternatively, might in turn make a better paperweight than a feature forward technology solution.

[I’d like to express my gratitude to Kinect for Windows MVPs Matteo Valoriani and Vincent Guigui for introducing me to this geometric bagatelle.]

Emgu, Kinect and Computer Vision

June 11, 2015 James Ashley

Last week saw the announcement of the long awaited OpenCV 3.0 release, the open source computer vision library originally developed by Intel that allows hackers and artists to analyze images in fun, fascinating and sometimes useful ways. It is an amazing library when combined with a sophisticated camera like the Kinect 2.0 sensor. The one downside is that you typically need to know how to work in C++ to make it work for you.

This is where EmguCV comes in. Emgu is a .NET wrapper library for OpenCV that allows you to use some of the power of OpenCV on .NET platforms like WPF and WinForms. Furthermore, all it takes to make it work with the Kinect is a few conversion functions that I will show you in this post.

Emgu gotchas

The first trick is just doing all the correct things to get Emgu working for you. Because it is a wrapper around C++ classes, there are some not so straightforward things you need to remember to do.

1. First of all, Emgu downloads as an executable that extracts all its files to your C: drive. This is actually convenient since it makes sharing code and writing instructions immensely easier.

2. Any CPU isn’t going to cut it when setting up your project. You will need to specify your target CPU architecture since C++ isn’t as flexible about this as .NET is. Also, remember where your project’s executable is being compiled to. For instance, an x64 debug build gets compiled to the folder bin/x64/Debug, etc.

3. You need to grab the correct OpenCV C++ library files and drop them in the appropriate target folder for your project. Basically, when you run a program using Emgu, your executable expects to find the OpenCV libraries in its root directory. There are lots of ways to do this, such as setting up pre-build events to copy the necessary files. The easiest way, though, is to just go to the right folder, e.g. C:\Emgu\emgucv-windows-universal-cuda 2.4.10.1940\bin\x64, copy everything in there and paste it into the correct project folder, e.g. bin/x64/Debug. If you do a straightforward copy/paste, just remember not to Clean your project or Rebuild your project since either action will delete all the content from the target folder.

4. Last step is the easiest. Reference the necessary Emgu libraries. The two base ones are Emgu.CV.dll and Emgu.Util.dll. I like to copy these files into a project subdirectory called libs and use relative paths for referencing the dlls, but you probably have your own preferred way, too.
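If the manual copy/paste in step 3 keeps getting wiped out by a Clean or Rebuild, one alternative is to copy the native files in code before the first Emgu call. What follows is only a sketch under a few assumptions: NativeLibraryHelper and its two methods are hypothetical names of mine, not part of Emgu, and the "x64" and "x86" folder names are assumed to match the layout of your Emgu install.

```csharp
using System;
using System.IO;

static class NativeLibraryHelper
{
    // True when the running process matches the architecture folder ("x64" or "x86")
    // that the native OpenCV files were taken from.
    public static bool MatchesArchitecture(string architectureFolder)
    {
        string current = Environment.Is64BitProcess ? "x64" : "x86";
        return string.Equals(current, architectureFolder, StringComparison.OrdinalIgnoreCase);
    }

    // Copies every .dll in sourceDir into targetDir, overwriting existing copies.
    // Returns the number of files copied.
    public static int CopyNativeLibraries(string sourceDir, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        int copied = 0;
        foreach (string file in Directory.GetFiles(sourceDir, "*.dll"))
        {
            string destination = Path.Combine(targetDir, Path.GetFileName(file));
            File.Copy(file, destination, true);
            copied++;
        }
        return copied;
    }
}
```

You would call CopyNativeLibraries with the Emgu bin folder from step 3 as the source and AppDomain.CurrentDomain.BaseDirectory as the target, ideally guarded by MatchesArchitecture so that a mismatched build fails with a clear message instead of a cryptic load error. A build event that copies the same files is just as valid; this version simply keeps everything in C#.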

WPF and Kinect SDK 2.0

I’m going to show you how to work with Emgu and Kinect in a WPF project. The main difficulty is simply converting between image types that Kinect knows and image types that are native to Emgu. I like to do these conversions using extension methods. I provided these extensions in my first book Beginning Kinect Programming about the Kinect 1 and will basically just be stealing from myself here.

I assume you already know the basics of setting up a simple Kinect program in WPF. In MainWindow.xaml, just add an image to the root grid and call it rgb:

<Grid> 
    <Image x:Name="rgb"></Image> 
</Grid>

Make sure you have a reference to the Microsoft.Kinect 2.0 dll and put your Kinect initialization code in your code behind:

KinectSensor _sensor;
ColorFrameReader _rgbReader;


private void InitKinect()
{
    _sensor = KinectSensor.GetDefault();
    _rgbReader = _sensor.ColorFrameSource.OpenReader();
    _rgbReader.FrameArrived += rgbReader_FrameArrived;
    _sensor.Open();
}

public MainWindow()
{
    InitializeComponent();
    InitKinect();
}

protected override void OnClosing(System.ComponentModel.CancelEventArgs e)
{
    if (_rgbReader != null)
    {
        _rgbReader.Dispose();
        _rgbReader = null;
    }
    if (_sensor != null)
    {
        _sensor.Close();
        _sensor = null;
    }
    base.OnClosing(e);
}

Kinect SDK 2.0 and Emgu

You will now just need the extension methods for converting between Bitmaps, BitmapSources, and IImages. In order to make this work, your project will additionally need to reference the System.Drawing dll:

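using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Emgu.CV;
using Microsoft.Kinect;

// Conversions between Kinect frames, GDI+ bitmaps, WPF bitmap sources and Emgu images.
static class extensions
{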

    [DllImport("gdi32")]
    private static extern int DeleteObject(IntPtr o);


    public static Bitmap ToBitmap(this byte[] data, int width, int height
        , System.Drawing.Imaging.PixelFormat format = System.Drawing.Imaging.PixelFormat.Format32bppRgb)
    {
        var bitmap = new Bitmap(width, height, format);

        var bitmapData = bitmap.LockBits(
            new System.Drawing.Rectangle(0, 0, bitmap.Width, bitmap.Height),
            ImageLockMode.WriteOnly,
            bitmap.PixelFormat);
        Marshal.Copy(data, 0, bitmapData.Scan0, data.Length);
        bitmap.UnlockBits(bitmapData);
        return bitmap;
    }

    public static Bitmap ToBitmap(this ColorFrame frame)
    {
        if (frame == null || frame.FrameDescription.LengthInPixels == 0)
            return null;

        var width = frame.FrameDescription.Width;
        var height = frame.FrameDescription.Height;

        var data = new byte[width * height * PixelFormats.Bgra32.BitsPerPixel / 8];
        frame.CopyConvertedFrameDataToArray(data, ColorImageFormat.Bgra);

        return data.ToBitmap(width, height);
    }

    public static BitmapSource ToBitmapSource(this Bitmap bitmap)
    {
        if (bitmap == null) return null;
        IntPtr ptr = bitmap.GetHbitmap();
        var source = System.Windows.Interop.Imaging.CreateBitmapSourceFromHBitmap(
        ptr,
        IntPtr.Zero,
        Int32Rect.Empty,
        System.Windows.Media.Imaging.BitmapSizeOptions.FromEmptyOptions());
        DeleteObject(ptr);
        return source;
    }

    public static Image<TColor, TDepth> ToOpenCVImage<TColor, TDepth>(this ColorFrame image)
        where TColor : struct, IColor
        where TDepth : new()
    {
        var bitmap = image.ToBitmap();
        return new Image<TColor, TDepth>(bitmap);
    }

    public static Image<TColor, TDepth> ToOpenCVImage<TColor, TDepth>(this Bitmap bitmap)
        where TColor : struct, IColor
        where TDepth : new()
    {
        return new Image<TColor, TDepth>(bitmap);
    }

    public static BitmapSource ToBitmapSource(this IImage image)
    {
        var source = image.Bitmap.ToBitmapSource();
        return source;
    }
   
}
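One detail worth knowing about the byte[] to Bitmap conversion above: the single Marshal.Copy only lines up if each bitmap row is tightly packed, and GDI+ pads every row to a multiple of four bytes. The sketch below shows the arithmetic. BitmapLayout and its methods are hypothetical helpers of mine, not part of System.Drawing, and they assume pixel formats whose bit depth is a multiple of 8.

```csharp
using System;

static class BitmapLayout
{
    // Number of bytes per bitmap row once the row is padded to a 4-byte boundary.
    public static int ComputeStride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // True when rows carry no padding, so the whole buffer can be copied in one call.
    public static bool IsTightlyPacked(int width, int bitsPerPixel)
    {
        return ComputeStride(width, bitsPerPixel) == width * bitsPerPixel / 8;
    }
}
```

For the 32-bit format used here the rows are always tightly packed, since four bytes per pixel is already a multiple of four, which is why the one-shot copy is safe. If you switch the default to a 24-bit format, check IsTightlyPacked first and copy row by row when it returns false.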

Kinect SDK 2.0 and Computer Vision

Here is some basic code to use these extension methods to extract an Emgu IImage type from the ColorFrame object each time Kinect sends you one and then convert the IImage back into a BitmapSource object:

void rgbReader_FrameArrived(object sender, ColorFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            var format = PixelFormats.Bgra32;
            var width = frame.FrameDescription.Width;
            var height = frame.FrameDescription.Height;
            var bitmap = frame.ToBitmap();
            var image = bitmap.ToOpenCVImage<Bgr,byte>();

            //do something here with the IImage 
            //end doing something


            var source = image.ToBitmapSource();
            this.rgb.Source = source;

        }
    }
}

You should now be able to plug in any of the sample code provided with Emgu to get some cool CV going. As an example, in the code below I use the Haarcascade algorithms to identify heads and eyes in the Kinect video stream. I’m sampling the data every 10 frames because the Kinect is sending 30 frames a second while the Haarcascade code can take as long as 80ms to process. Here’s what the code would look like:

int frameCount = 0;
List<System.Drawing.Rectangle> faces;
List<System.Drawing.Rectangle> eyes;

void rgbReader_FrameArrived(object sender, ColorFrameArrivedEventArgs e)
{
    using (var frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            var format = PixelFormats.Bgra32;
            var width = frame.FrameDescription.Width;
            var height = frame.FrameDescription.Height;
            var bitmap = frame.ToBitmap();
            var image = bitmap.ToOpenCVImage<Bgr,byte>();

            //do something here with the IImage 
            int frameSkip = 10;
            //every 10 frames

            if (++frameCount == frameSkip)
            {
                long detectionTime;
                faces = new List<System.Drawing.Rectangle>();
                eyes = new List<System.Drawing.Rectangle>();
                DetectFace.Detect(image, "haarcascade_frontalface_default.xml", "haarcascade_eye.xml", faces, eyes, out detectionTime);
                frameCount = 0;
            }

            if (faces != null)
            {
                foreach (System.Drawing.Rectangle face in faces)
                    image.Draw(face, new Bgr(System.Drawing.Color.Red), 2);
                foreach (System.Drawing.Rectangle eye in eyes)
                    image.Draw(eye, new Bgr(System.Drawing.Color.Blue), 2);
            }
            //end doing something


            var source = image.ToBitmapSource();
            this.rgb.Source = source;

        }
    }
}
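The frame-skipping bookkeeping above can also be pulled out into a small helper so that the handler only asks a yes or no question. This is a sketch rather than anything shipped with Emgu or the Kinect SDK; FrameThrottle and ShouldProcess are names I made up for the purpose.

```csharp
using System;

// Lets one frame through out of every `interval` frames.
sealed class FrameThrottle
{
    private readonly int _interval;
    private int _count;

    public FrameThrottle(int interval)
    {
        if (interval < 1) throw new ArgumentOutOfRangeException("interval");
        _interval = interval;
    }

    // Call once per incoming frame; returns true on every interval-th call.
    public bool ShouldProcess()
    {
        if (++_count < _interval) return false;
        _count = 0;
        return true;
    }
}
```

With a field such as new FrameThrottle(10), the detection branch in the handler becomes if (throttle.ShouldProcess()) { ... }. At 30 frames a second that works out to three detections per second, which leaves each one roughly 333ms to finish, comfortably more than the 80ms worst case mentioned above.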

HoloCoding resources from Build 2015

May 1, 2015 James Ashley

The Hololens team has stayed extremely quiet over the past 100 days in order to have a greater impact at Build. Alex Kipman was the cleanup batter on the first day keynote at Build with an amazing overview of realistic Hololens scenarios. This was followed by Hololens demos as well as private tutorials on using Hololens with a version of Unity 3D. Finally there were sessions on Hololens and a pre-recorded session on using the Kinect with the brand new RoomAlive Toolkit.

Here are some useful links:

Day One Keynote (fast forward to 2:28:20)
Developing for Hololens with Alex Kipman
Case Studies of Hololens App Development
Hacking Augmented Reality with Kinect with Ben Lower (Kinect) and Andy Wilson (MSR IllumiRoom)

There were two things I found particularly interesting in Alex Kipman’s first day keynote presentation.

The first was the ability of the onstage video to actually capture what was being shown through the hololens but from a different perspective. The third person view of what the person wearing the hololens saw even worked when the camera moved around the room. Was this just brilliant After Effects work perfectly synced to the action onstage? Or were we seeing a hololens-enabled camera at work? If the latter – this might be even more impressive than the hololens itself.

Second, when demonstrating the ability to pin movies to the wall using Hololens gestures, why was the new Mission Impossible trailer used for the example? Wouldn’t something from, say, The Matrix have been much more appropriate?

Perhaps it was just a licensing issue, but I like to think there was a subtle nod to the inexplicable and indirect role Tom Cruise has played in the advancement of Microsoft’s holo-technologies. Minority Report and the image of Cruise wearing biking gloves with his arms raised in the air, conductor-like, was the single most powerful image invoked when Microsoft first introduced the Kinect sensor. As most people know by now, Alex Kipman was the man responsible not only for carrying the Kinect, née Project Natal, to success, but now for guiding the development of the Hololens. Perhaps showing Tom Cruise onstage at Build was a subtle nod to this implicit relationship.

The HoloCoder’s Bookshelf

April 30, 2015 James Ashley

Professions are held together by touchstones such as a common jargon that both excludes outsiders and reinforces the sense of inclusion among insiders based on mastery of the jargon. On this level, software development has managed to surpass more traditional practices such as medicine, law or business in its ability to generate new vocabulary and maintain a sense that those who lack competence in using the jargon simply lack competence. Perhaps it is part and parcel of new fields such as software development that even practitioners of the common jargon do not always understand each other or agree on what the terms of their profession mean. Stack Overflow, in many cases, serves merely as a giant professional dictionary in progress as developers argue over what they mean by de-coupling, separation of concerns, pragmatism, architecture, elegance, and code smell.

Cultures, unlike professions, are held together not only by jargon but also by shared ideas and philosophies that delineate what is important to the tribe and what is not. Between a profession and a culture, the members of a professional culture, in turn, share a common imaginative world that allows them to discuss shared concepts in the same way that other people might discuss their favorite TV shows.

This post is an experiment to see what the shared library of augmented reality and virtual reality developers might one day look like. Digital reality development is a profession that currently does not really exist but which is already being predicted to be a multi-billion dollar industry by 2020.

HoloCoding, in other words, is a profession that exists only virtually for now. As a profession, it will envelop concerns much greater than those considered by today’s software developers. Whereas contemporary software development is mostly about collecting data, reporting on data and moving data from point A to points B and C, spatial software development will be more concerned with environments and will have to draw on complex mathematics as well as design and experiential psychology. The bookshelf of a holocoder will look remarkably different from that of a modern data coder. Here are a few ideas regarding what I would expect to find on a future developer’s bookshelf in five to ten years.

1. Understanding Media by Marshall McLuhan – written in the 60’s and responsible for concepts such as ‘the global village’ and hot versus cool media, McLuhan pioneered the field of media theory. Because AR and VR are essentially new media, this book is required reading for understanding how these technologies stand side-by-side with or perhaps will supplant older media.

2. Illuminations by Walter Benjamin – while the whole work is great, the essay ‘The Work of Art in the Age of Mechanical Reproduction’ is a must read for discussing how traditional notions about creativity fit into the modern world of print and now digital reproduction (which Benjamin did not even know about). It also deals at an advanced level with how human interactions work on stage versus film and the strange effect this creates.

3. Sketching User Experiences by Bill Buxton – this classic was quickly adopted by web designers when it came out. What is sometimes forgotten is that the book largely covers the design of products and not websites or print media – products like those that can be built with HoloLens, Magic Leap and Oculus Rift. Full of insights, Buxton helps his readers to see the importance of lived experience when we design and build technology.

4. Bergsonism by Gilles Deleuze – though Deleuze is probably most famous for his collaborations with Felix Guattari, this work on the philosophical meaning of the term ‘virtual reality’, not as a technology but rather as a way of approaching the world, is a gem.

5. Passwords by Jean Baudrillard – what Deleuze does for virtual reality, Baudrillard does for other artifacts of technological language in order to show their place in our mental cosmology. He also discusses virtual reality along the way, though not as thoroughly.

6. Mathematics for 3D Game Programming and Computer Graphics by Eric Lengyel – this is hardcore math. You will need this. You can buy it used online for about $6. Go do that now.

7. Linear Algebra and Matrix Theory by Robert Stoll – this is a really hard book. Read the Lengyel before trying this. This book will hurt you, by the way. After struggling with a page of this book, some people end up buying the Manga Guide to Matrix Theory thinking that there is a fun way to learn matrix math. Unfortunately, there isn’t and they always come back to this one.

8. Phenomenology of Perception by Maurice Merleau-Ponty – when it first came out, this work was often seen as an imitation of Heidegger’s Being and Time. It may be the case that it can only be truly appreciated today when it has become much clearer, thanks to years of psychological research, that the mind reconstructs not only the visual world for us but even the physical world and our perception of 3D spaces. Merleau-Ponty pointed this out decades ago and moreover provides a phenomenology of our physical relationship to the world around us that will become vitally important to anyone trying to understand what happens when more and more of our external world becomes digitized through virtual and augmented reality technologies.

9. Philosophers Explore the Matrix – just as The Matrix is essential viewing for anyone in this field, this collection of essays is essential reading. This is the best treatment available of a pop theme being explored by real philosophers – actually most of the top American philosophers working on theories of consciousness in the 90s. Did you ever think to yourself that The Matrix raised important questions about reality, identity and consciousness? These professional philosophers agree with you.

10. Snow Crash by Neal Stephenson – sometimes to understand a technology, we must extrapolate and imagine how that technology would affect society if it were culturally pervasive and physically ubiquitous. Fortunately Neal Stephenson did that for virtual reality in this amazing book that combines cultural history, computer theory and a fast paced adventure.

What is a HoloCoder?

April 26, 2015 James Ashley

holodeck

Over the past few years we’ve seen the rapid release of innovative consumer technologies that are all loosely related by their ability to scan 3D spaces, interact with 3D spaces or synthesize 3D spaces. These include the Kinect sensor, Leap Motion, Intel Perceptual Computing, Oculus Rift, Google Glass, Magic Leap and HoloLens. Additional related general technologies include projection mapping and 3D printing. Additional related tools include Unity 3D and the Unreal Engine.

Despite a clear family resemblance between all of these technologies, it has been difficult to clearly define what that relationship is. There has been a tendency to categorize all of them as simply being “bleeding edge”, “emerging” or “future”. The problem with these descriptors is that they are ultimately relative to the time at which a technology is released and are not particularly helpful in defining what holds these technologies together in a common gravitational pull.

definitions

I basically want to address this problem by engaging in a bit of word magic. Word magic is a sub-category of magical thinking and is based on a form of psychological manipulation. If you have ever gone out to Martin Fowler’s Bliki then you’ve seen the practice at work. One of the great difficulties of software development is anticipating the unknown: the unknown involved in requirements, the unknown related to timelines, and the unknown concerned with the correct tactics to accomplish tasks. In a field with a limited history and a tendency not to learn from other related fields, the fear of the unknown can utterly cripple projects.

Martin Fowler’s endless enumeration of “patterns” on his bliki takes this on directly by giving names to the unknown. If one reads his blog carefully, however, it quickly becomes clear that most, though not all, of these patterns are illusory: they are written at such an abstract level that they fail to provide any prescriptive advice on how to solve the problems they are intended to address. What they do provide, however, is a sense of relief that there is a “name” that can be used to plug up the hole opened up in time by the fear of the unknown. Solutions architects can return to their teams (or their managers) and pronounce proudly that they have found a pattern to solve the outstanding problem that is hanging over everyone – all that remains is to determine what each “name” actually means.

In this sense, the whole world of software architecture – which Glassdoor ranked as the 11th best job of 2015 – is a modern priesthood devoted to prophetic interpretations of “design patterns”.

I similarly want to use word magic to define the sort of person that works with the sorts of technology I listed at the top of this article. I think I can even do it quite simply with familiar imagery.

A holocoder is someone who works with technologies that are inspired by and/or anticipate the Star Trek Holodeck.

interpretations

The part of the definition that states “inspired by and/or anticipate” may seem strange but it is actually quite essential. It is based on a specific temporal-cybernetic theory concerning the dissemination of ideas which I will attempt to describe but which is purely optional with respect to the definition.

But first: how can a theory be both essential and optional? This is an issue that Niels Bohr, one of the fathers of quantum mechanics, tackled frequently. In the early 30’s Bohr was travelling through eastern Europe on a lecture tour. During part of the tour, a former student met him at his inn and noticed him nailing a horse shoe over the door of his room. “Professor Bohr”, he asked, “what are you doing?” Niels Bohr replied, “The Inn Keeper informed me that a horse shoe over the door will bring me luck.” The student was scandalized by this. “But Herr Professor,” the student objected, “surely a physicist and intellectual such as yourself does not believe in these silly superstitions.” “Of course not,” Bohr answered. “But the Inn Keeper reassured me that the horse shoe will bring me luck whether I believe in it or not.”

Here is the optional theory of the Holodeck. Certain technologies, it seems to me, can have such an influence that they shape the way we think about the world. We have seen many examples of this in our past such as the printing press, the automobile, the personal computer and the cell phone. Furthermore we anticipate the advent of similar major technologies in our future. These technologies have what is called a “psychic resonance” and change the very metaphors we use to describe our world. To give a simple example, whereas we originally used mental metaphors to explain computers in terms of “memory”, “processing” and even “computing”, today we use computer metaphors to help explain how the brain works. The arrival of the personal computer caused a shift and a reversal in what semioticians call the relationship between the explanans and the explanandum.

wesley in the holodeck

Psychic impact is transmitted over carriers called “memes”. Memes are basically theoretical constructs that are phenomenally identical to what we call “ideas” but behave like viruses. Memes travel through air as speech and along light waves as images in order to spread themselves from host to host. Traditionally the psychic impact of a meme is measured by the meme’s density over a given space. Besides density, the psychic impact can also be measured based on the total volume of space it is able to infect. Finally, the effectiveness of a meme can also be measured based on its ability to spread into the future. For instance, works of literature and cultural artifacts such as religions and even famous sayings are examples of memes that have effectively infected the future despite a distance of thousands of years between the point of origin of the infection and the temporal location of the target.

While the natural habitat of bacteria like E. coli is in the gastrointestinal tract, the natural habitat of memes is in the brain and this leads to a fascinating third form of mimetic transmission. At the level of microtubules in the brain where memes happen to live, we enter the Planck scale in which classical physics does not apply in the way that it does at the macro level. At this scale, effects like quantum entanglement create spooky behaviors such as quantum communication. While theoretically people still cannot communicate with each other in time since that level of semiotics is still governed by classical physics, there is an opening for mimetic viruses to actually be transmitted backwards in time as if they were entering a transporter in one brain and rematerialized in another brain in the past. This allows for a third manner of mimetic spread: in space, forward in time, and finally backwards in time.

Riker in the Holodeck

As an aside, and as I said above, this is an _optional_ theory of psychic impact through time. A common and totally valid criticism is that it appeals to quantum mystery which tends to be misused to justify anything from ghosts to religious cults. The problem with appeals to “quantum mystery” is that this simply provides a name for a problem rather than prescribing actual ways to make predictions or anticipate behavior. In other words, like Martin Fowler’s bliki, it is word magic that provides interpretations of things but not actual solutions. Against such criticisms, however, it should be pointed out that I am explicitly engaged in an exercise in word magic, in which case using certain techniques of word magic – such as quantum mystery – is perfectly legitimate and even natural.

Through quantum entanglement acting on memes at the microtubule level, a technology from our possible future which resembles the Star Trek holodeck has such a large psychic impact that it resonates backwards in time until it reaches and inhabits the brains of the writers of a futuristic science fiction show in the late 80’s and is introduced into the show as the Holodeck. Through television transmissions, the holodeck meme is then broadcast to millions of teenagers who eventually enter the tech industry, become leaders in the tech industry, and eventually decide to implement various aspects of the holodeck by creating better and better 3D sensors, 3D simulation tools and 3D visualization technologies – both augmented and virtual. In other words, the Holodeck reaches backwards in time to inspire others in order to effectively give birth to itself, ex nihilo. Those that have been touched by the transmission are what I am calling holocoders.

and/or

Alternatively, this theory of where holocoders come from can be taken as a metaphor only. In this case, holocoders are not people being pulled toward a common future but instead people being pushed forward from a common past. Holocoders are people inspired directly or indirectly by a television show from the late 80’s that involved a large room filled with holograms that could be used for entertainment as well as research. Holocoders work on any or all of the wide variety of technologies that could potentially be combined to recreate that imagined experience.

the dreamatorium

Anyways, that’s my theory and I’m sticking to it. More importantly, these technologies are deeply entangled and deserve a good name, whether you want to go with holocoding or something else (though the holodeck people from the future highly encourage you to use the terms “holocoder”, “holocoding” and “holodeck”).

appendix

There are two other important instances of environment simulators which for whatever reason do not have the same impact as the Star Trek holodeck but are nevertheless worth mentioning.

The first is the X-Men Danger Room which is an elaborate obstacle course involving holograms as well as robots used to train the X-Men. While the Danger Room goes back to the 60’s, the inclusion of holograms actually didn’t happen until the early 90’s, and so actually comes after the Star Trek environment simulator.

Clifford D. Simak published Way Station in 1963 (and won a Hugo award for it). It actually anticipates two Star Trek technologies – transporters as well as an environment simulator. Enoch Wallace, the hero of the story, works the earth relay station for intergalactic aliens who transport travelers over vast distances by sending them over shorter hops between the way stations of the title. Because he is so isolated in his job, the aliens support him by allowing him to pursue a hobby. Because Wallace enjoys hunting, the aliens build for him an environment simulator that lets him do big game hunting for dinosaurs.

One Kinect to rule them all: Kinect 2 for XBox One

April 5, 2015 James Ashley

Yes. That’s a bit of a confusing title, but it seems best to lay out the complexity upfront. So far there have been two generations of the Kinect sensor which combine a color camera, a depth sensing camera, an infrared emitter (basically used for the depth sensing camera) and a microphone array which works as a virtual directional shotgun microphone. Additional software called the Kinect SDK then allows you to write programs that read these data feeds as well as interpolating them into 3D animated bodies that are representations of people’s movements.

Microsoft has just announced that they will stop producing separate versions of the Kinect v2, one for Windows and one for the XBox One, and will instead encourage developers to purchase the Kinect for Windows Adapter to plug their Kinects for XBox One into a PC. In fact, the adapter has been available since last year, but this just makes it official. All in all this is a good thing. With the promise that Universal Windows Apps will be portable to XBox, it makes much more sense if the sensors – and more importantly the firmware installed on them – are exactly the same whether you are on a PC running Windows 8/10 or an XBox running XBox OS.

This announcement also vastly simplifies the overall Kinect hardware story. Up to this point, there weren’t just two generations of Kinect hardware but also two versions of the current Kinect v2 hardware, one for the Xbox and one for Windows (for a total of four different devices). The Kinect hardware, both in 2010 and in 2013, has always been built first as a gaming device. In each case, it was then adapted to be used on Windows machines, in 2012 and 2014 respectively.

The now discontinued Kinect for Windows v2 differed from the Kinect for the Xbox One in both hardware and software. To work with Windows machines, the Kinect for Windows v2 device uses a specialized power adapter to pump additional power to the hardware (there is a splitter in the adapter that attaches the hardware to both a USB port and a wall plug). The Xbox One, being proprietary hardware, is able to pump enough juice to its Kinect sensor without needing a special adapter. Additionally, the firmware for the original Kinect for Windows v1 sensor diverged over time from the Kinect for Xbox’s firmware – which led to differences in how the two versions of the hardware performed. It is now clear that this will not happen with Kinect v2.

Besides the four hardware devices and their respective firmware, the loose term “Kinect” can also refer to the software APIs used to incorporate Kinect functionality into a software program. Prior to this, there was a Kinect for Windows SDK 1.0 through 1.8 that was used to program against the original Kinect for Windows sensor. For the Kinect for XBox One with the Kinect for Windows Adapter, you will want to use the Kinect for Windows SDK 2.0 (“for Windows” is still part of the title for now, even though you will be using it with a Kinect for XBox One, though of course you can still use it with the Kinect for Windows v2 sensor if you happen to have bought one of those prior to their discontinuation). There are also other SDKs floating around such as OpenNI and Libfreenect.

[Much gratitude to Kinect MVP Bronwen Zande for helping me get the details correct.]

Unity 5 and Kinect 2 Integration

March 27, 2015 James Ashley

pointcloud

Until just this month one of the best Kinect 2 integration tools was hidden, like Rappaccini’s daughter, inside a walled garden. Microsoft released a Unity3D plugin for the Kinect 2 in 2014. Unfortunately, Unity 4 only supported plugins (bridges to non-Unity technology) if you owned a Unity Pro license which typically cost over a thousand dollars per year.

On March 3rd, Unity released Unity 5 which includes plugin support in their free Personal edition – making it suddenly very easy to start building otherwise complex experiences like point cloud simulations that would otherwise require a decent knowledge of C++. In this post, I’ll show you how to get started with the plugin and start running a Kinect 2 application in about 15 minutes.

(As an aside, I always have trouble keeping this straight: Unity has plugins, openFrameworks has addons, while Cinder has blocks. Visual Studio has extensions and add-ins as well as NuGet packages after a confusing few years of rebranding efforts. There may be a difference between them but I can’t tell.)

1. First you are going to need a Kinect 2 and the Unity 5 software. If you already have a Kinect 2 attached to your XBox One, then this part is easy. You’ll just need to buy a Kinect Adapter Kit from the Microsoft store. This will allow you to plug your XBox One Kinect into your PC. The Kinect for Windows 2 SDK is available from the K4W2 website, though everything you need should automatically install when you first plug your Kinect into your computer. You don’t even need Visual Studio for this. Finally, you can download Unity 5 from the Unity website.

2. The Kinect 2 plugin for Unity is a bit hard to find. You can go to this Kinect documentation page and scroll half-way down to find the link called Unity Pro Packages. Alternatively, here is a direct link to the most current version of the plugin as of this writing.

3. After you finish downloading the zip file (currently called KinectForWindows_UnityPro_2.0.1410.zip), extract it to a known location. I like to use $\Documents\Unity. Inside you will find three plugins as well as two sample scenes. The three Kinect plugins are the basic one, a face recognition plugin, and a gesture builder plugin, each wrapping functionality from the Kinect 2 SDK.

4. Fire up Unity 5 and create a new project in your known folder. In my case, I’m creating a project called “KinectUnityProject” in the $\Documents\Unity folder where I extracted the Kinect plugins and related assets.

5. Now we will add the Kinect plugin into our new project. When the Unity IDE opens, select Assets from the top menu and then select Import Package | Custom Package …

6. Navigate to the folder where you extracted the KinectforWindows_Unity components and select the Kinect2.0.xxxxx.unitypackage file. That’s our plugin along with all the scripts needed to build a Kinect-enabled Unity 5 application. After clicking on “Open”, an additional dialog window will open up in the Unity IDE called “Importing Package” with lots of files checked off. Just click on the “Import” button at the lower right corner of the dialog to finish the import process. Two new folders will now be added to your Project window under the Assets folder called Plugins and Standard Assets. This is the baseline configuration for any Kinect project in Unity.

7. Now we’ll get a Kinect with Unity project quickly going by simply copying one of the sample projects provided by the Microsoft Kinect team. Go into file explorer and copy the folder called “KinectView” out of the KinectforWindows_Unity folder where you extracted the plugins and paste it into the Assets directory in your project folder. Then return to the Unity 5 IDE. A warning message will pop up letting you know that there are compatibility issues between the plugin and the newest version of Unity and that files will automatically be updated. Go ahead and lie to the Unity IDE. Click on “I Made a Backup.”

8. A new folder has been added to your Project window under Assets called KinectView. Select KinectView and then double click on the MainScene scene contained inside it. This should open up your Kinect-enabled scene inside the game window. Click on the single arrow near the top center of the IDE to see your application in action. The Kinect will automatically turn on and you should see a color image, an infrared image, a rendering of any bodies in the scene and finally a point cloud simulation.

9. To build the app, select File | Build & Run from the top menu. Select Windows as your target platform in the next dialog and click the Build & Run button at the lower right corner. Another dialog appears asking you to select a location for your executable and a name. After selecting an executable name, click on Save in order to reach the final dialog window. Just accept the default configuration options for now and click on “Play!”. Congratulations. You’ve just built your first Kinect-enabled Unity 5 application!
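The file-handling parts of steps 3 and 7 above can also be scripted. What follows is only a rough sketch in Python, not anything shipped with the plugin: the function names are made up for illustration, and it assumes the archive contains a folder named KinectView somewhere inside it, as described in step 7. Importing the .unitypackage itself (steps 5 and 6) still happens inside the Unity IDE.

```python
import shutil
import zipfile
from pathlib import Path


def extract_plugin_zip(zip_path, dest_dir):
    """Extract the downloaded archive and return any .unitypackage files found.

    Hypothetical helper: the archive's internal layout is not assumed,
    so the extracted tree is searched recursively.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("*.unitypackage"))


def copy_sample_into_project(extracted_dir, project_dir, sample_name="KinectView"):
    """Copy a sample folder (step 7) into the Unity project's Assets directory."""
    matches = [p for p in Path(extracted_dir).rglob(sample_name) if p.is_dir()]
    if not matches:
        raise FileNotFoundError(f"no folder named {sample_name!r} under {extracted_dir}")
    target = Path(project_dir) / "Assets" / sample_name
    # copytree creates the Assets folder if needed and fails if the sample
    # has already been copied, which avoids silently overwriting edits.
    shutil.copytree(matches[0], target)
    return target
```

You would call extract_plugin_zip with the path to the downloaded zip and your known location, then copy_sample_into_project with that same location and your project folder. Unity will still show the compatibility warning from step 7 the next time the IDE gets focus.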

The Next Book

March 3, 2015 James Ashley

min_lib

The development community deserves a great book on the Kinect 2 sensor. Sadly, I no longer feel I am the person to write that book. Instead, I am abandoning the Kinect book project I’ve been working on and off over the past year in order to devote myself to a book on the Microsoft holographic computing platform and HoloLens SDK. I will be reworking the material I’ve so far collected for the Kinect book as blog posts over the next couple of months.

As anyone who follows this blog will know, my imagination has of late been captivated and ensorcelled by augmented reality scenarios. The book I intend to write is not just a how-to guide, however. While I recognize the folly of this, my intention is to write something that is part technical manual and part design guide, part math tutorial, part travel guide and part cookbook. While working on the Kinect book I came to realize that it is impossible to talk about gestural computing without entering into a dialog with Maurice Merleau-Ponty’s Phenomenology of Perception and Umberto Eco’s A Theory of Semiotics. At the same time, a good book on future technologies should also cover the renaissance in theories of consciousness that occurred in the mid-90’s and which culminated with David Chalmers’ masterwork The Conscious Mind. Descartes, Bergson, Deleuze, Guattari and Baudrillard obviously cannot be overlooked either in a book dealing with the topic of the virtual, though I can perhaps elide a bit.

A contemporary book on technology can no longer stay within the narrow limits of a single technology as was common 10 or so years ago. Things move at too fast a pace and there are so many different ways to accomplish a given task that choosing between them depends not only on that old saw ‘the right tool for the job’ but also on taste, extended community and prior knowledge. To write a book on augmented reality technology, even when sticking to one device like the HoloLens, will require covering and uncovering to the uninitiated such wonderful platforms as openFrameworks, Cinder, Arduino, Unity, the Unreal Engine and WPF. It will have to cover C#, since that is by and large the preferred language in the Microsoft world, but also help C# developers to overcome their fear of modern C++ and provide a roadmap from one to the other. It will also need to expose the underlying mathematics that developers need to grasp in order to work in a 3D world – and astonishingly, software developers know very little math.

Finally, as holographic computing is a wide new world and the developers who take to it will be taking up a completely new role in the workforce, the book will have to find its way to the right sort of people who will have the aptitude and desire to take up this mantle. This requires a discussion of non-obvious skills such as a taste for cooking and travel, an eye for the visual, a grounding in architecture and an understanding of how empty spaces are constructed, a general knowledge of literary and social theory. The people who create the next world, the augmented world, cannot be mere engineers. They will also need to be poets and madmen.

I want to write a book for them.

The Coming Holo Wars and How to Survive Them

February 6, 2015 James Ashley

this is the way the RL world ends

cloud atlas: new seoul

We are the holo men,

We are the stuffed men.

Leaning together

Headpiece filled with straw. Alas!

Our dried voices, when

We whisper together,

Are quiet and meaningless

As wind in dry grass

Or rat’s feet over broken glass

In our dry cellar.

— T. S. Eliot

“Disruptive technology” is one of the most over-used phrases in contemporary marketing hyper-speech. Borrowing liberally from previous generations’ research into the nature of political and scientific revolutions (Leon Trotsky, Georges Sorel, Thomas Kuhn), self-promoting second raters have pillaged the libraries of these scholars of disruption and have co-opted their intellects in the service of filling the world with useless gadgets and vaporware. When everything is a disruptive technology, nothing is.

Just as Sorel drew on historical examples of general strikes to form his narrative of idealized proletarian revolution and Kuhn identified three examples of scientific revolution: the transition from the Ptolemaic to the Copernican model of the solar system, the abandoning of phlogiston theory, and the shift from Newtonian to relativistic physics – to distill his theory of the “paradigm shift”, we can similarly take one step back in order to find the treasure hidden in the morass of marketing opportunism.

There have been three* major shakeups in the tech sector over the past several decades; each one was marked by the invocation of the “war” metaphor, the leveraging of large sums of money and massive shifts in the fortunes of well known companies.

The PC Wars – the commoditization of the personal computer in the 80s led to the diminishing of IBM and a surprising victor, Microsoft, which realized that the key to winning the PC Wars lay not with the hardware but with the operating system that made the hardware accessible. Following that model, the mid- to late-90s saw the rise of the Internet, various attempts to create portal solutions, and a pitched battle between Netscape and Microsoft to produce the dominant browser.

The Browser Wars – the Browser Wars saw the rise and fall of companies like Yahoo! and AOL and the eventual victor turned out not to be the best browser but the best search engine: Google. More recently we’ve been going through the Mobile Wars in which Apple has been the clear winner – but also Amazon, Twitter and Facebook.

The Mobile Wars – covering both the rise of smart phones as well as tablet devices, the Mobile Wars have borne fruit in the way we view consumer experiences, have shifted software development from desktop to web development, have made JavaScript a first class language, have made responsive design the de facto standard, have made the freelance creative designer the Renaissance person of the 21st century, and perhaps most important have accelerated geolocation technology. Geolocation, as will be shown below, is a key player in the next technology war.

between the idea and the reality

jupiter ascending

Shape without form, shade without color,

Paralyzed force, gesture without motion;

As a devotee of Adam Sandler movies, I was pleased to see him teamed with Judd Apatow and Seth Rogen in 2009’s Funny People. Adam Sandler movies are up there with “Pretty Woman” and “Dumb and Dumber” in the cable industry as movies that can be shown at any time of day and still be guaranteed to draw viewers. There is a false moment in the middle of the movie, however, in which Adam Sandler and Seth Rogen are flown out to perform at a private party for MySpace. What’s MySpace, you ask? It was a social network that was crushed in the dust by Facebook, of which you have probably heard, along with other even more obscure networks like Friendster and Bebo. MySpace is portrayed in the movie as an up-and-coming social network through a last-gasp cross-marketing placement with Universal Studios.

A major characteristic of today’s tech wars is that we do not remember the losers. It does not even matter how big these corporations were during their period of being winners. Once they are gone, it is as if they are completely erased from the timeline, their reputations liquidated in the same fashion as their Aeron chairs and stock options.

To be a winner in the tech wars is to be a survivor of the tech wars. This applies not just to corporations but also to the marketing, business and technical people who are carried in the wake of rising and falling technology trends. IT groups across the US now face the problem of trends they have ignored finally reaching the C-levels as they are being asked about their mobile strategies and why their applications are not designed to be responsive – and perhaps even why they continue to be written in VB6 or Delphi.

These casualties of the Mobile Wars must be wondering what choices they could have made differently over the past several years and what choices they should be making over the next. How does one survive the conflict that comes after the Mobile Wars?

between the motion and the act

2001 a space odyssey

Those who have crossed

With direct eyes, to death’s other kingdom

Remember us — if at all — not as lost

Violent souls, but only

As the holo men,

The stuffed men.

Surviving and even thriving in the coming Holo Wars is possible if you keep an eye out for the contours of future history – if you know what is coming. The first key is knowing who the major players are: Microsoft, Facebook, Google – though there is no guarantee any of them will still be standing when the Holo Wars are over.

Microsoft has catapulted to the front of the Holo Wars with its announcement of the HoloLens on January 21st. HoloLens is the brainchild of Alex Kipman, who also spearheaded the product development of the Kinect. It is expected to be built on some of the technology developed for the Kinect v2 sensor combined with new holographic display technology – possibly involving eye movement tracking – that has yet to be revealed.

Facebook became a participant in the Holo Wars when it bought Palmer Luckey’s company Oculus VR in mid-2014. The Oculus Rift, a virtual reality headset, is basically two mobile display screens placed in front of a user’s eyeballs in order to show stereoscopic digital visualizations. The key to this technology is John Carmack’s ingenious use of sensors to track and anticipate head movements to rotate and skew images in a realistic way in the virtual world revealed by the Rift.

Google participates in several ways. Even though the explorer program is now closed, Google Glass arrived with great fanfare and created excitement around the fashion and consumer uses of this heads-up display technology. Following Google’s major investment in Rony Abovitz’s Magic Leap in October 2014, a maker of mysterious augmented reality technology, it now appears that this is the more likely future direction of Google Glass or whatever it is eventually called. Magic Leap, in turn, has added some amazing names to its payroll including Gary Bradski of OpenCV fame and Neal Stephenson, the author of Snow Crash. The third leg of Google’s investment in a holographic future is the expertise in geolocation it has acquired over the past decade.

The next key to surviving the Holo Wars is to understand what skills will be needed when the fighting starts. The first skill is a deeper knowledge of computer graphics. Since the rise of the graphical user interface, software development platforms have increasingly abstracted away the details of generating pixels and managing human-computer interactions. Future demands for spatially aware pixels will force developers to relearn basic mathematical concepts, linear algebra, trigonometry and matrix math.

In addition to mathematics, machine learning will be important as a way of making overwhelming amounts of data manageable. Modern computer interactions are relatively simple. Users sit in one place, in a fixed position respective to the machine, and rarely deviate from this position. Input is passed through transducers that reduce desire and intent into simple signals. Digital reality experiences, on the other hand, not only receive gestural information which must be interpreted but also physical orientation, world coordinates, facial expressions and speech commands. A basic knowledge of Bayesian probability and stochastic calculus will be part of the tool chest of anyone who wants to successfully navigate the Holo joblists of the future.

To reforge ourselves with skills for surviving the next seven years, designers must also become better programmers and software programmers must become more creative. The freelance creative, a job role that expanded dramatically during the Mobile Wars, will have an even brighter future in a world pervaded by augmented reality experiences. In order to make the shift, however, creatives will need to move beyond their comfort zone of creating PSDs in Photoshop and learn motion graphics as well as basic computer programming. Programmers likewise will need to move beyond the conceit that coding is an inherently creative activity; moving data around from point A to point B is no more creative than moving books around a sprawling Amazon warehouse and then packing them up for shipping is poetic.

Real creative coding involves learning how to construct digital-to-physical experiences with Arduino, how to program self-generating visual algorithms with Processing, how to create 3D worlds in Unity and how to create complex visual interactions with openFrameworks and Cinder. These activities will become the common vocabulary of the future programmers of augmented experiences. Hiring managers and recruiters will expect to find them on resumes and, without them, otherwise experienced tech workers will be unhireable or worse, relegated to maintaining legacy web applications.

not with a bang but a whimper

enders game

The eyes are not here

There are no eyes here

In this valley of dying stars

In this holo valley

This broken jaw of our lost kingdoms

In this last of meeting places

We grope together

How can one tell if these prescriptions for the future Holo Wars are real and actionable or simply more marketing hype attempting to take advantage of people’s natural gullibility regarding technical gadgets? Aren’t we always being burned by overly optimistic portrayals of the future that never come to pass? Where are our flying cars? Where are our remote work locations?

In order for the Holo Wars to play out, certain milestones need to be achieved. Consequently, if you start seeing these milestones realized, you will know that you are in fact living through a fight over the next disruptive technology that will destroy some major tech corporations while affirming others at the apex of the tech world, one that will also reward those who have positioned themselves with useful skills for this future economy and punish those who have not. These milestones are: technology, monetization, persistent holographic objects, belief circles, overlapping dissociative realities.

Technology: the first phase is occurring now with the three major players discussed above and several additional players such as Metaio, Qualcomm and Samsung engaged in building up consumer augmented reality hardware and supporting technologies such as geolocation and gestural interfaces.

Monetization: innovation costs money. The initial hardware and infrastructure effort will likely be subsidized by the major players. Over time, the monetization model will likely follow what we see on the internet with “free” consumer experiences being subsidized by ads. There will be a struggle between premium subscription based experiences offering to remove the ads while providing better, higher resolution experiences with better content. These portal solutions will also contend against free and low-cost plug-in content provided by hackers and freelance creatives. How this plays out will depend largely on whether the premium content providers will be able to block out independents through standards and compatibility issues as well as whether hackers will find ways to overcome these roadblocks. There is also the possibility that some of the players might be looking at a much longer game and will foster an open AR content generation community rather than attempt to crush it. If the AR economy opens up in this way, a new service sector will grow made up of one set of people generating digital worlds for another set to live in.

Persistent Holographic Objects: virtual worlds are typically subjective experiences. They can be made inter-subjective, as they are in MMOs, by creating virtual topology in which people co-exist and co-operate. In augmented worlds, on the other hand, shared topology is an inherent feature. AR shared topology is called reality. In order to make AR worlds truly inter-subjective, rather than simply objective or subjective, shared holo objects must be part of the experience. Persistent holo objects such as a digital fountain, a digital garden, or a digital work of art will have a set location and orientation in the world. AR players will need to travel to these locations physically in order to experience them. Unlike private AR or VR experiences in which each player views copies of the same digital object, with a shared experience each player can be said to be looking at the same persistent holo object from different points of view. In order to achieve persistent holographic objects, we will require finer grained geolocation than we currently have. AR gear must also be improved to become more usable in direct sunlight.

Belief Circles: a healthy indie creative fringe-economy and persistent holographic objects will make it possible to customize intersubjective experiences. People have a natural tendency to form cliques, parties and communities. Belief circles, a term coined by Vernor Vinge, will provide coherent community experiences for different guilds based on shared interests and shared aspirations. Users will opt in and out of various belief circles as they see fit. The same persistent holographic objects may appear differently to members of different circles and yet be recognized as sharing a common space and perhaps a common purpose. For instance, the holosign in front of the local Starbucks will have a permanent location and consistent semantic purpose, in AR space, but a polymorphic appearance. To paraphrase a truism, beauty will be in the eye of one’s belief circle.

Overlapping Dissociative Realities: divergent intersubjectivities will produce both a greater awareness of synchronicity – and a sense of deja vu as AR content is copied freely into multiple locations — as well as an increased sense of cognitive dissonance. Consider the example of going into Starbucks for coffee. The people waiting in line will likely each be members of varying belief circles and consequently will be having different experiences of the wait. This is not a large departure since we typically do not care about what other people in line are doing and even avoid paying attention unless they take too long making a selection. In this case, divergent belief circles make it easier to follow our natural instinct to avoid each other. Everyone in the holo valley is anonymous if they want to be. When one arrives at the head of the line, however, something more interesting happens. Even though the customer and the barista likely belong to different belief circles, they must interact, communicate, and perform an economic exchange; these two creatures from different worlds. What will that be like? Will one then lift a corner of the holo lenses in order to rub a sore eye only to discover that this isn’t a Starbucks at all but really a Dunkin’ Donuts which had silently bought out the other chain in a hostile takeover the previous week? Will your coffee taste any different if it looks exactly the same?

* 1996 was witness to a small skirmish between OpenGL and Direct3D that has subsequently come to be known as the API Wars. While the API Wars have had long lasting ripples, I don’t see them as having the tectonic effect of the other historical phenomena I am describing – plus anyways Thomas Kuhn only provides three major examples of his thesis and I wanted to stick to that particular design pattern.

[Much gratitude to Joel and Nate for collaborating on these scenarios over a highly entertaining lunch.]

Top 21 HoloLens Ideas

January 26, 2015 James Ashley

The image above is a best guess at the underlying technology being used in Microsoft’s new HoloLens headset. It’s not even that great a guess since the technology appears to still be in the prototype stage. On the other hand, the product is tied to the Windows 10 release date, so we may be seeing a consumer version – or at the very least a dev version – sometime in the fall.

Here are some things we can surmise about HoloLens:

a) the name may change – HoloLens is a good product name but isn’t quite where we might like it to be, in a league with Kinect, Silverlight or Surface for branding genius. In fact, Surface was such a good name, it was taken from one product group and simply given to another in a strange twist on the build vs buy vs borrow quandary. On the other hand, HoloLens sounds more official than the internal code name, Baraboo — isn’t that a party hippies throw themselves in the desert?

johnny mnemonic

b) this is augmented reality rather than virtual reality. Facebook’s Oculus Rift, which is an immersive fully digital experience, is an example of virtual reality. Fictional examples include the OASIS from Ernest Cline’s Ready Player One, the Metaverse from Neal Stephenson’s Snow Crash, William Gibson’s Cyberspace and the VR simulation from The Lawnmower Man. Augmented reality involves overlaying digital experience on top of the real world. This can be accomplished using holography, transparent displays, or projectors. A great example of projector based AR is the RoomAlive project by Hrvoje Benko, Eyal Ofek and Andy Wilson at Microsoft Research. HoloLens uses glasses or a head-rig – depending on how generous you feel – to implement AR. Magic Leap – with heavy investment from Google – appears to be doing the same thing. The now dormant Google Glass was neither AR nor VR, but was instead a heads-up display.

kgirl

c) HoloLens uses Kinect technology under the plastic covers. While the depth sensor in the Kinect v2 has a field of view of 70 degrees by about 60 degrees, the depth capability in HoloLens is reported to include a field of view of 120 degrees by 120 degrees. This indicates that HoloLens will be using the Time-of-Flight technology used in Kinect v2 rather than the structured light from Kinect v1. This setup requires an IR emitter and a depth camera, combined with sophisticated timing and phase technology, to calculate depth efficiently and relatively inexpensively.
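To make the "timing and phase" part a little more concrete, here is a minimal sketch of the textbook continuous-wave Time-of-Flight relationship, in which distance is recovered from the phase shift between emitted and returned IR light. Nothing here is confirmed about HoloLens: the function names and the 80 MHz modulation frequency are assumptions chosen purely for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def phase_to_distance(phase_shift_rad, modulation_hz):
    """Distance implied by the phase shift of a modulated IR signal."""
    # The light travels out and back, hence 4*pi rather than 2*pi.
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz):
    """Farthest distance measurable before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)


# Assumed modulation frequency, for illustration only.
MODULATION_HZ = 80e6

half_cycle_distance = phase_to_distance(math.pi, MODULATION_HZ)
max_range = unambiguous_range(MODULATION_HZ)
```

The trade-off this exposes is that a higher modulation frequency gives finer depth steps but a shorter range before the phase wraps around, which is one reason real sensors are often described as combining several frequencies.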

hands

d) the depth camera is being used for multiple purposes. The first is for gesture detection. One of the issues that faced both Oculus and Google Glass was that they were primarily display technologies. But a computer monitor is useless without a keyboard or mouse. Similarly, Oculus and Glass needed decent interaction metaphors. Glass relied primarily on speech commands and tapping and clicking. Oculus had nothing until their recent acquisition of NimbleVR. NimbleVR provides a depth camera optimized for hand and finger detection over a small range. This can be mounted in front of the Oculus headset. Conceptually, this allows people to use hand gestures and finger manipulations in front of the device. A virtual hand can be created as an affordance in the virtual world of the Oculus display, allowing users to interact with virtual objects and virtual interactive menus in virtro.

The depth sensor in HoloLens would work in a similar way except that instead of a virtual hand as affordance, it’s just your hand. You will use your hand to manipulate digital objects displayed on the AR lenses or to interact with AR menus using gestures.

An interesting question is how many IR sensors are going to be on the HoloLens device. From the pictures that have been released, it looks like we will have a color camera and a depth sensor for each eye, for a total of two depth cameras and two RGB cameras located near the joint between the lenses and the headband.

holo_minecraft

e) HoloLens is also using depth data for 3D reconstruction of real world surfaces. These surfaces are then used as virtual projection surfaces for digital textures. Finally, the blitted image is displayed on the transparent lenses.

ra1

ra_2

This sort of reconstruction is a common problem in projection mapping scenarios. A great example of applying this sort of reconstruction can be found in the Halloween edition of Microsoft Research’s RoomAlive project. In the first image above, you are seeing the experience from the correct perspective. In the second image, the image is captured from a different perspective than the one that is being projected. From the incorrect perspective, it can be seen that the image is actually being projected on multiple surfaces – the various planes of the chair as well as the wall behind it – but foreshortened digitally and even color corrected to make the image appear cohesive to a viewer sitting at the correct position. One or more Kinects must be used to calibrate the projections appropriately against these multiple surfaces. If you watch the full video, you’ll see that Kinect sensors are used to track the viewer as she moves through the room and the foreshortening / skewing occurs dynamically to adjust to her changing position.

The Minecraft AR experience being used to show the capabilities of HoloLens requires similar techniques. The depth sensor is required not only to calibrate and synchronize the digital display to line up correctly with the table and other planes in the room, but also to constantly adjust the display as the player moves around the room.

eye-tracking

f) are the display lenses stereoscopic or holographic? At this point no one is completely sure, though indications are that this is something more than the stereoscopic display technique used in the Oculus Rift. While a stereoscopic display will create the illusion of depth and parallax by creating a different image for each lens, something holographic would actually be creating multiple images per lens and smoothly shifting through them based on the location of each pupil staring through its respective lens and the orientation and position of the player’s head.

One way of achieving this sort of holographic display is to have multiple layers of lenses pressed against each other and to use interference to shift the light projected into each pupil as the pupil moves. It turns out that the average person’s pupils typically move around rapidly in saccades, mapping and reconstructing images for the brain, even though we do not realize this motion is occurring. Accurately capturing these motions and shifting digital projections appropriately to compensate would create a highly realistic experience typically missing from stereoscopic reconstructions. It is rumored in the industry that Magic Leap is pursuing this type of digital holography.

On the other hand, it has also been reported that HoloLens is equipped with eye-tracking cameras on the inside of the frames, apparently to aid with gestural interactions. It would be extremely interesting if Microsoft’s route to achieving true holographic displays involved eye-tracking combined with a high display refresh rate rather than coherent light interference display technology as many people assume. Or, then again, it could just be stereoscopic displays after all.
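For the plain stereoscopic case, the depth illusion comes down to drawing the same point at slightly different horizontal positions for each eye. The sketch below uses a simple pinhole projection to show how that offset (the disparity) shrinks as an object moves farther away. The 63 mm interpupillary distance and the focal length in pixels are assumed values for illustration, not specifications of any headset.

```python
def project_for_eye(point_x_m, point_z_m, eye_x_m, focal_px):
    """Horizontal image position of a point as seen from one eye (pinhole model)."""
    return focal_px * (point_x_m - eye_x_m) / point_z_m


def stereo_pair(point_x_m, point_z_m, ipd_m=0.063, focal_px=1000.0):
    """Return (left, right) image positions for a point in front of the viewer."""
    # Eyes sit half the interpupillary distance to either side of centre.
    left = project_for_eye(point_x_m, point_z_m, -ipd_m / 2.0, focal_px)
    right = project_for_eye(point_x_m, point_z_m, ipd_m / 2.0, focal_px)
    return left, right


# A point straight ahead, first at 1 metre and then at 2 metres.
near_left, near_right = stereo_pair(0.0, 1.0)
far_left, far_right = stereo_pair(0.0, 2.0)

near_disparity = near_left - near_right
far_disparity = far_left - far_right
```

Doubling the distance halves the disparity, which is the cue the brain reads as depth. A display that also reacted to pupil position, as speculated above, would need more than this fixed two-view model.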

occlusion

g) occlusion is generally considered a problem for interactive experiences. For augmented reality experiences, however, it is a feature. Consider a physical-to-digital interaction in which you use your finger/hand to manipulate a holographic menu. The illusion we want to see is of the hand coming between the player’s eyes and the digital menu. The player’s hand should block and obscure portions of the menu as he interacts with it.

The difficulty with creating this illusion is that the player’s hand isn’t really between the menu and the eyes. Really, the player’s hand is on the far side of the menu, and the menu is being displayed on the HoloLens between the player’s eyes and his hand. Visually, the hologram of the menu will bleed through and appear on top of the hand.

In order to re-establish the illusion of the menu being located on the far side of the hand, we need depth-sensors to accurately map an outline of the hand and arm and then cut a hand and arm shape out of the menu where the hand should be occluding it. This process has to be repeated as the hand moves in real-time and it’s kind of a hard problem.
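The cut-out step can be sketched in a few lines. This is a toy version under some loud assumptions: the depth map is already registered pixel-for-pixel with the display, depth is in millimetres with 0 meaning "no reading", and the menu sits at a single fixed virtual distance. The function name `occlude_menu` and the sample data are hypothetical, and a real system would also have to handle noise and latency.

```python
def occlude_menu(menu_pixels, depth_mm, menu_depth_mm):
    """Blank out menu pixels wherever something real is nearer than the menu."""
    result = []
    for menu_row, depth_row in zip(menu_pixels, depth_mm):
        row = []
        for pixel, depth in zip(menu_row, depth_row):
            # A depth of 0 is treated as "no reading", so it never occludes.
            hand_in_front = 0 < depth < menu_depth_mm
            row.append(None if hand_in_front else pixel)
        result.append(row)
    return result


# A tiny 2x3 "menu" placed at a virtual distance of 600 mm (assumed value).
menu = [["M", "M", "M"],
        ["M", "M", "M"]]
depth = [[0, 400, 900],
         [450, 450, 1200]]

visible_menu = occlude_menu(menu, depth, menu_depth_mm=600)
```

Pixels set to `None` are simply not drawn, so the real hand shows through the transparent lens at those positions and appears to sit in front of the menu.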

borg

h) misc sensors : best guess is that in addition to depth sensors, color cameras and eye-tracking cameras, we’ll also get a directional microphone, gyroscope, accelerometer and magnetometer. Some sort of 3D sound has been announced, so it makes sense that there is a directional microphone or microphone array to complement it. This is something that is available on both the Kinect v1 and Kinect v2. The gyroscope, accelerometer and magnetometer are also guesses – but the Oculus hardware has them to track quick head movements, head position and head orientation. It makes sense that HoloLens will need them also.

bono

i) the current form factor looks a little big – bigger than the Magic Leap is supposed to be but smaller than the current Oculus dev units. The goal – really everyone’s goal, from Microsoft to Facebook to Google – is to continue to bring down the size of sensors so we can eventually have heavy glasses rather than light-weight head gear.

j) vampires, infrared sensors and transparent displays are all sensitive to direct sunlight. This consideration can affect the viability of some AR scenarios.

k) like all innovative technologies, the success of HoloLens will depend primarily on what people use it for. The myth of the killer app is probably not very useful anymore, but the notion that you need an app store to sell a device is a generally accepted universal constant. The success of the HoloLens will depend on what developers build for it and what consumers can imagine doing with it.

Top 21 Ideas

Many of these ideas are borrowed from other VR and AR technology. In most cases, HoloLens will simply provide a better way to implement these notions. These ideas come from movies, from art installations, and from many years working at an innovative marketing agency where we prototyped these ideas day in and day out.

1. Shopping

Amazon made one click shopping make sense. Shopping and the psychology of shopping changes when we make it more convenient, effectively turning instant gratification into a marketing strategy. Using HoloLens AR, we can remodel a room with virtual furniture and then purchase all the pieces on an interactive menu floating in the air in front of us when we find the configuration we want. We can try and buy virtual clothes. With a wave of the hand we can stock our pantry, stock our refrigerator … wait, come to think of it, with decent AR, do we even need furniture or clothes anymore?

2. Gaming

IllumiRoom was a Microsoft project that never quite made it to product but was a huge hit on the web. The notion was to extend the XBox One console with projections that reacted to what was occurring in the game but could also extend the visuals of the game into the entire living room. IllumiRoom (which I was fortunate enough to see live the last time I was in Redmond) also uses a Kinect sensor to scan the room in order to calibrate projection mapping onto surfaces like bookshelves, tables and potted plants. As you can guess, this is the same team that came up with RoomAlive. A setup that includes a $1,500 projector and a Kinect is a bit complicated, especially when a similar effect can now be created using a single unit HoloLens.

hud

The HoloLens device could also be used for in-game Heads-Up notifications or even as a second screen. It would make a lot of sense if XBox integration is on the roadmap and would set XBox apart as the clear leader in the console wars.

3. Communication

‘nuff said.

4. Home Automation

clapper

Home automation has come a long way and you can now easily turn lights on and off with your smart phone from miles away. Turning your lights on and off from inside your own house may still involve actually touching a light switch. Devices like the Kinect have the limitation that they can only sense a portion of a room at a time. Many ideas have been thrown out to create better gesture recognition sensors for the home, including using wifi signals that go through walls to detect gestures in other rooms. If you were actually wearing a gestural device around with you, this would no longer be a problem. Point at a bulb, make a fist, “put out the light, and then put out the light” to quote the Bard.

5. Education

Microsoft-future-vision

While cool visuals will make education more interesting, the biggest benefit of HoloLens for education is simple access. Children in rural areas in the US have to travel long distances to achieve a decent education. Around the world, the problem of rural education is even worse. What if educators could be brought to the children instead? This is one of the stated goals of Facebook’s purchase of Oculus Rift and HoloLens can do the same job just as well and probably better.

6. Medical Care

Technology can be used for interesting diagnostic and rehabilitation functions. The depth sensors that come with HoloLens will no doubt be used in these ways eventually. But like education, one of the great problems in medical care right now is access. If we can’t bring the patient to the doctor, let’s bring the GP to the patient and do regular check ups.

7. Holodeck

matrix-i-know-kung-fu

The RoomAlive project points the way toward building a Holodeck. All we have to do is replace Kinect sensors with HoloLens sensors, projectors with holographic displays, and then try not to break the HoloLens strapped to our heads as we learn Kung Fu.

8. Windows

window

Have you ever wished you could look out your window and be somewhere else? HoloLens can make that happen. You’ll have to block out natural light by replacing your windows with sheetrock, but after that HoloLens can give you any view you want.

But why stop at windows? You can digitize all your walls if you want, and HoloLens’ depth technology will take care of the rest.

9. Movies and Television

vr-cinema-3d

Oculus Rift and Samsung Gear VR have apps that let you watch movies in your own virtual theater. But wouldn’t it be more fun to watch a movie with your entire family? With HoloLens we can all be together on the couch but watch different things. They can watch Barney on the flatscreen while I watch an overlay of Meet the Press superimposed on the screen. Then again, with HoloLens maybe I could replace my expensive 60” plasma TV with a piece of cardboard and just watch that instead.

10. Therapy

It’s commonly accepted that white noise and muted colors relax us. Controlling our environment helps us to regulate our inner states. Behavioral psychology is based on such ideas and the father of behavioral psychology, B. F. Skinner, even created the Skinner box to research these ideas – though I personally prefer Wilhelm Reich’s Orgone box. With 3D audio and lenses that extend over most of your field of view, HoloLens can recreate just the right experience to block out the world after a busy day and just relax. shhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.

11. Concerts

burning-man-festival-nevada

Once a year in the Nevada desert a magical music festival is held called Baraboo. Or, I don’t know, maybe it’s held in Tennessee. In any case, getting to festivals is really hard and usually involves being around people who aren’t wearing enough deodorant, large crowds, and buying plastic bottles of water for $20. Wouldn’t it be great to have an immersive festival experience without all the things that get in the way? Of course, there are those who believe that all that other stuff is essential to the experience. They can still go and be part of the background for me.

12. Avatars

Gamification is a huge buzzword at digital marketing agencies. Undergirding the hype is the realization that our digital and RL experiences overlap and that it sometimes can be hard to find the seams. Vernor Vinge’s 2001 novella Fast Times at Fairmont High draws out the implications of this merging of physical and digital realities and the potential for the constant self reinvention we are used to on the internet bleeding over into the real world. Why continue with the name your parents gave you when you can live your AR life as ByteMst3r9999? Why be constrained by your biological appearance when you can project your inner self through a fun and bespoke avatar representation? AR can ensure that other people only see you the way that you want them to.

13. Blocking Other People’s Avatars

BlackBlock

The flip side of an AR society invested in an avatar culture is the ability to block people who are griefing us. Parents can call a time out and block their children for ten-minute periods. Husbands can block their wives. We could all start blocking our co-workers on occasion. For serious offenses, people face permanent blocking as a legal sanction for bad behavior by the game masters of our augmented reality world. The concept was brilliantly played out in the recent Black Mirror Christmas special starring Jon Hamm. If you haven’t been keeping up with Black Mirror, go check it out. I’ll wait for you to finish.

14. Augmented Media

fiducial

Augmented reality today typically involves a smart phone or tablet and a fiducial marker. The fiducial is a tag or bar code that indicates to the app on your phone where an AR experience should be placed. Typically you’ll find the fiducial in a magazine ad that encourages you to download an app to see the hidden augmented content. It’s novel and fun. The problem involves having to hold up your tablet or phone for a period of time just to see what is sometimes a disappointing experience. It would be much more interesting to have these augmented media experiences always available. HoloLens can be always on and searching for these types of augmented experiences as you read the latest New Yorker or Wired. They needn’t be confined to ads, either. Why can’t the whole magazine be filled with AR content? And why stop at magazines? Comic books with additional AR content would change the genre in fascinating ways (Marvel’s online version already offers something like this, though rudimentary). And then imagine opening a popup book where all the popups are augmented, a children’s book where all the illustrations are animated, or a textbook that changes on the fly and updates itself every year with the latest relevant information – a Kindle on steroids. You can read about that possibility in Neal Stephenson’s The Diamond Age – only available in non-augmented formats for now.

15. Terminator Vision

robocop

This is what we thought Google Glass was supposed to provide – but then it didn’t. That’s okay. With vision recognition software and the two RGB cameras on HoloLens, you’ll never forget a name again. Instant information will appear telling you about your surroundings. Maps and directions will appear when you gesture for them. Shopping associates will no longer have to wing it when engaging with customers. Instead, HoloLens will provide them with conversation cues and decision trees that will help the associate close the sale efficiently and effectively. Dates will be more interesting as you pull up the publicly available medical, education and legal histories of anyone who is with you at dinner. And of course, with the heartbeat monitor and ability to detect small fluctuations in skin tone, no one will ever be able to lie to you again, making salary negotiations and buying a car a snap.

16. Wealth Management

With instant tracking of the DOW, S&P and NASDAQ along with a gestural interface that goes wherever you go, you can become a day trader extraordinaire. Lose and gain thousands of dollars with a flick of your finger.

17. Clippit

clippit

Call him Jarvis if it helps. Some sort of AI personal assistant has always been in the cards. Immersive AR will make it a reality.

18. Impossible UIs

minority_report

phone

3dtouch

cloud atlas floating computer

I don’t watch movies the way other people do. Whenever I go to see a futuristic movie, I try to figure out how to recreate the fantasy user experiences portrayed in it. Minority Report is an easy one – it’s a large area display, possibly projection, with Kinect-like gestural sensors. The communication device from the Total Recall reboot is a transparent screen and either capacitive touch or more likely a color camera doing blob recognition. The 3D touchscreen from Pacific Rim has always had me stumped. Possibly some sort of Leap Motion device attached to a Pepper’s Ghost display? The one fantasy UX I could never figure out until I saw HoloLens is the “Orison” computer made up of floating disks in Cloud Atlas. The Orison screens are clearly digital devices in a physical space – beautiful, elegant, and the sort of intuitive UX for which we should strive. Until now, they would have been impossible to recreate. Now, I’m just waiting to get my hands on a dev device to try to make working Orison displays.

19. Wiki World

wikipedia

Wiki World is a simple extension of terminator vision. Facts floating before your eyes, always available, always on. No one will ever have to look up the correct spelling for a word again or strain his memory for a sports statistic. What movie was that actor in? Is grouper ethical to eat? Is JavaScript an object-oriented language? Wiki World will make memorization obsolete and obviate all arguments – well, except for edit wars between Wikipedia editors, of course.

20. Belief Circles

wwc

Belief circles are a concept from Vernor Vinge’s Hugo Award-winning novel Rainbows End. Augmented reality lends itself to self-organizing communal affiliations that will create inter-subjective realities that are shared. Some people will share sci-fi themes. Others might go the MMO route and share a fantasy setting with a fictional history, origin story, guilds and factions. Others will prefer football. Some will share a common religion or political vision. All of these belief circles will overlap and interpenetrate. Taking advantage of these self-generating belief circles for content creation and marketing will open up new opportunities for freelance creatives and entrepreneurs over the next ten years.

21. Theater of Memory

Giulio Camillo’s memory theater belongs to a long tradition of mnemonic technology going back to Roman times and used by orators and lawyers to memorize long speeches. The scholar Frances Yates argued that it also belonged to another Renaissance tradition of neoplatonic magic that has since been superseded by science in the same way that memory technology has been superseded by books, magazines and computers. What Frances Yates – and after her Ioan Couliano – tried to show, however, was that in dismissing these obsolete modes of understanding the world, we also lose access to a deeper, metaphoric and humanistic way of seeing the world and are the poorer for it. The theater of memory is like Feng Shui – also discredited – in that it assumes that the way we construct our surroundings also affects our inner lives and that there is a sympathetic relationship between the macrocosm of our environment and the microcosm of our emotional lives. I’m sounding too new agey so I’ll just stop now. I will be creating my own digital theater of memory as soon as I can, though, as a personal project just for me.