Recent rumors circulating around Pokémon Go suggest that Niantic will delay the next major update until next year. It was previously believed that additional game elements, creatures and levels beyond level 40 would arrive sometime in December.
A large gap between releases like this would seem to leave the door open for copycat games to move into the opening Niantic is providing. And maybe this wouldn’t be such a bad thing. While World of Warcraft is the most successful MMORPG, for instance, it certainly wasn’t the first. Dark Age of Camelot, Everquest, Asheron’s Call and Ultima Online all preceded it. What WoW did was perhaps to collect the best features of all these games while also riding the right graphics card cycle to success.
A similar student-becomes-the-master trope can play out for other franchise owners, since the only things that seem to be required to get a game similar to Pokémon going are a pre-existing storyline (like WoW had) and 3D assets either available or easily created to go into the game. With Azure and AWS cloud computing easily available, even infrastructure isn’t the challenge it was when the early MMORPGs were starting. Possible franchise holders that could make the leap into geographically-aware augmented reality games include Disney, WoW itself, Yu-Gi-Oh!, Magic: The Gathering, and Star Wars.
Imagine going to the park one day and asking someone else face down in their phone whether they know where the Bulbasaur showing up on the nearby tracker is, and having them not know what you are talking about because they are looking for Captain Hook or a Jawa on theirs?
This sort of experience is exemplary of what Vernor Vinge calls belief circles in his book about augmented reality, Rainbows End. Belief circles describe groups of people who share a collaborative AR experience. Because they also share a common real life world with others, their belief circles may conflict with other people’s belief circles. What’s even more peculiar is that members of different belief circles do not have access to each other’s augmented worlds – a peculiar twist on the problem of other minds. So while a person in H.P. Lovecraft’s belief circle can encounter someone in Terry Pratchett’s Discworld belief circle at a Starbucks, it isn’t at all clear how they will ultimately interact with one another. Starbucks itself may provide virtual assets that can be incorporated into either belief circle in order to attract customers from different worlds and backgrounds – basically multi-tier marketing of the future. Will different things be emphasized in the store based on our self-selected belief circles? Will our drinks have different names and ingredients? How will trademark and copyright laws impact the ability to incorporate franchises into the multi-aspect branding of coffee houses, restaurants and other mall stores?
But most of all, how will people talk to each other? One of the great pleasures of playing Pokémon Go today is encountering and chatting with people I otherwise wouldn’t meet and having a common set of interests that trump our political and social differences. Belief circles in the AR future of five to ten years may simply encourage the opposite trend of community Balkanization into interest zones. Will high concept belief circles based on art, literature and genre fiction simply devolve into Democrat and Republican belief circles at some point?
Pokémon Go is the first big augmented reality hit. It also challenges our understanding of what augmented reality means. While it has AR modes for catching as well as battling pokémon, it feels like an augmentation of reality even when these modes are disabled.
Pokémon Go in large part is a game overlaid on top of a maps application. Maps apps, in turn, are an augmentation overlaid on top of our physical world, tracking our position inside a digital representation of streets and roads. More than anything else, Pokémon Go is the fully successful cartography of Jorge Luis Borges’s story On Exactitude in Science, prominently cited in Baudrillard’s monograph Simulacra and Simulation.
Pokémon Go’s digital world is also the world’s largest game world. Games like Fallout 4 and Grand Theft Auto V boast of worlds that encompass 40 sq miles and 50 sq miles, respectively. Pokémon Go’s world, on the other hand, is co-extensive with the mapped world (or the known world, as we once called it). It has a scale of one kilometer to one kilometer.
Pokémon Go is an augmented reality game even when we have AR turned off. Players share the same geographic space we do but live, simultaneously, in a different world revealed to them through their portable devices. It makes the real world more interesting – so much so that the sedentary will participate in exercise while the normally cautious will risk sunburn and heatstroke in post-climate change summers around the world in order to participate in an alternative reality. In other words, it shapes behavior by creating new goals. It creates new fiscal economies in the process.
Which is all a way of saying that Pokémon Go does what marketing has always wanted to do. It generates a desire for things that, up to a month ago, did not exist.
A desire for things which, literally, do not exist today.
What more fitting moniker to describe a desire for something that does not exist than Pokémonography. Here are some pics from my personal collection.
This is the first of a multipart blog series covering re-reads of popular media about Virtual and Augmented Reality. In future installments, I plan to cover classics like William Gibson’s Neuromancer, Vernor Vinge’s Rainbows End, Ridley Scott’s Blade Runner and the anime series Ghost in the Shell. The worn premise of the series is that our collective vision of the future was formed long ago and we are, in many ways, simply walking the path others have set for us in their imaginations. That being the case, the best way to navigate our own futures is by raiding popular fringe culture in order to find the blueprints. In other words, this is an excuse to revisit some of my favorite books, movies and anime. Each entry in the series will provide a summary of the work, an overview of the AR or VR technology represented, and an analysis of the impact of the work on contemporary virtual world technology – in other words, what lessons can be drawn from the work.
Non-spoilerish summary of the work
Ernst Cline’s novel takes place in a dystopic 2044 where the world economy has collapsed, corporations have taken over, and the hacker hero, 17 year old Wade Watts, spends most of his life plugged into a virtual reality world called the OASIS while waiting to graduate from high school. He has also spent the past five years of his life as a Gunter – someone on a quest to find the video game easter egg left behind by the creator of the OASIS, James Halliday, somewhere inside his massive online virtual universe. To whomever discovers his easter egg, James Halliday has bequeathed his vast fortune of hundreds of billions of dollars.
The secret to solving Halliday’s puzzles turns out to be an understanding of Halliday’s love for the 80’s, the decade in which he grew up, and an encyclopedic understanding of the popular movies, music, and video games of the 80’s as well as a decent familiarity with D&D. Wade, along with a motley band of friends, fights an evil mega-corporation for control of Halliday’s inheritance as well as control of his online world.
How it works
The VR hardware consists of a high-res virtual reality stereoscopic headset and haptic gloves connected to a custom game console – basically the Oculus Rift or HTC Vive with much higher resolution. An internet connection is required. Instead of servers, the shared simulation runs on some sort of peer-to-peer networked computing system that borrows compute time from the machines of all the players. The virtual world itself is a single shared world rather than a series of shards. Everyone who plays, which turns out to be almost everyone in the world, is in the OASIS at the same time.
Monetization turns out to be a major aspect of the plot. The OASIS is not subscription based and does not have ads. Instead, Halliday’s company makes its income through in-app purchases and transportation fees for teleportation from one part of the OASIS universe to another.
This is because the OASIS universe is huge and consists of thousands of planets filled mostly with user created content. To get from one planet to another requires an in-game spaceship or travel through a transporter. Different worlds, and even sectors of OASIS space, are governed by different themes. This is perhaps the most interesting part of Ernest Cline’s VR universe. The OASIS is ultimately a pastiche of imaginary worlds from science fiction and fantasy. The Star Wars sector is just next door to the Star Trek sector of space. Firefly has its own area. The Lord of the Rings, Dragonriders of Pern and World of Warcraft each have at least one planet devoted to them. Different sectors of VR space even work under different physical laws, so magic will not work in some while technology will not work in others.
Cline’s VR universe doesn’t involve world-building, as such, but rather a huge mash-up project to recover and preserve all past efforts at world-building.
Ponder the lessons to be learned at your own risk!!!!
The eighties were my decade and a book devoted to recovering obscure details about it is inherently fun for me. In fact, it feels like it is written for me.
That said, the notion of a future overwhelmed by nostalgia is troubling. Not only is the OASIS effectively a museum for retro-futurism, but the quest at the center of the book is an attempt by a Howard Hughes figure to make others obsess over his teenage years as much as he does.
What if virtual reality is an old man’s game? We all hope that future technology will create new worlds and open up new possibilities, but what if all the potential for VR and AR is ultimately overwhelmed by the obsessions of the past and guided by what we have wanted VR to be since Star Wars first appeared on movie house screens?
What if this is the ultimate paradox of emerging technologies: that new technology is always created to solve the problems and fill the appetites of yesterday? We can make our first lesson from the history of VR a paraphrase of George Santayana’s famous saying.
Maxim 1 – In the virtual world, those who can’t let go of the past are doomed to repeat it.
[Update 4/23 – this turns out to be just a re-appropriation of an extension name. Kinect studio doesn’t recognize the HoloLens XEF format and vice-versa.]
The HoloLens documentation reveals interesting connections with the Kinect sensor. As most people by now know, the man behind the HoloLens, Alex Kipman, was also behind the Kinect v1 and Kinect v2 sensors.
One of the more interesting features of the Kinect was its ability to perform a scan and then play that scan back later like a 3D movie. The Kinect v2 even came with a recording and playback tool for this called Kinect Studio. Kinect Studio v2 serialized recordings in a file format known as the eXtended Event File format. This basically recorded depth point information over time – along with audio and color video if specified.
Now a few years later we have HoloLens. Just as the Kinect included a depth camera, the HoloLens also has a depth camera that it uses to perform spatial mapping of the area in front of the user. These spatial maps are turned into simulations that are then combined with code so that in the final visualization, 2D apps appear to be pinned to globally fixed positions while 3D objects and characters seem to be aware of physical objects in the room and interact with them appropriately.
Deep in the documentation on the HoloLens emulator is fascinating information about the ability to play back previously scanned rooms in the emulator. If you have a physical headset, it turns out you can also record surface reconstructions using the Windows Device Portal.
The serialization format, it turns out, is the same one that is used in Kinect Studio v2: *.xef .
An interesting fact about the XEF format is that Microsoft never released any documentation of it. When I open up a saved xef file in Notepad++, this is what it looks like:
Microsoft also never released a library to deserialize depth data from the xef format, which forced many people trying to make recordings to come up with their own, idiosyncratic formats for saving depth information.
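In the meantime, poking at the undocumented bytes yourself is about the only option. Here’s a minimal hex-dump sketch – the `recording.xef` filename is just a placeholder, and nothing here assumes any knowledge of the actual format:

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII lines, Notepad++-style."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {ascii_part}")
    return "\n".join(lines)

# Usage: inspect the first bytes of a recording ('recording.xef' is hypothetical)
# with open("recording.xef", "rb") as f:
#     print(hex_dump(f.read(64)))
```

Comparing dumps of Kinect Studio recordings against HoloLens room captures is a quick way to check whether the shared extension really means a shared layout.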
Hopefully, now that the same format is being used across devices, Microsoft will be able to finally release a lib for general use – and if not that, then at least a spec of how xef is formed.
The HTC Vive, Oculus Rift and Microsoft HoloLens all opened for pre-orders in 2016 with plans to ship in early April (or late March in the case of the Oculus). All have run into fulfillment problems creating general confusion for their most ardent fans.
I won’t try to go into all the details of what each company originally promised and then what each company has done to explain their delays. I honestly barely understand it. Oculus says there were component shortages and is contacting people through email to update them. Oculus also refunded some shipping costs for some purchasers as compensation. HTC had issues with their day one ordering process and is using its blog for updates. Microsoft hasn’t acknowledged a problem but is using its developer forum to clarify the shipping timeline.
Maybe it’s time to acknowledge that spinning up production for expensive devices in relatively small batches is really, really hard. Early promises from 2015 followed by CES in January 2016 and then GDC in March probably created an artificial timeline that was difficult to hit.
On top of this, internal corporate pressure has probably also driven each product group to hype to the point that it is difficult to meet production goals. HTC probably has the most experience with international production lines for high tech gear and even they stumbled a bit.
Maybe it’s also time to stop blaming each of these companies as they reach out for the future. All that’s happened is that some early adopters aren’t getting to be as early as they want to be (including me, admittedly).
As William Gibson said, “The future is already here — it’s just not very evenly distributed.”
These experience sketches are an initial concept exploration for a pen and paper role playing game like Dungeons & Dragons augmented by mixed reality devices. The first inklings for this were worked out on the HoloLens forums and I want to thank everyone who was kind enough to offer their creative suggestions there.
I’ve always felt that the very game mechanics that make D&D playable are also one of the major barriers to getting a game going. The D&D mechanics were eventually translated into a variety of video games that made progressing through an adventure much easier. Instead of spending half an hour or more working out all the math for a battle, a computer can do it in a fraction of the time and also throw in nice particle effects to boot.
What gets lost in the process is the storytelling element as well as the social aspect of playing. So how do we reintroduce the dungeon master and socialization elements of role playing without having to deal with all the bookkeeping?
With augmented reality, we can do much of the math and bookkeeping in the background, allowing players to spend more time talking to each other, making bad puns and getting into their characters. Instead of physical playing pieces, I think we could use flat player bases imprinted with QR codes that identify the characters. 3D animated models can be virtually projected onto the bases. Players can then move their bases on the underlying (virtual) hex maps and the holograms will stay correctly oriented on their bases.
All viewpoints are calibrated for the different players so everyone sees the same things happening – just from different POVs. The experience can also be enhanced with voice commands so when your magic user says “magic missile” everyone gets to see appropriate particle effects on the game table shooting from the magic user character’s hands at a target.
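Snapping a tracked base position to the underlying hex map is the one piece of this that is pure math. A sketch using standard axial hex coordinates with cube rounding – the cell size and units are made up for illustration:

```python
import math

def world_to_hex(x, y, size):
    """Snap a tracked (x, y) table position to the nearest pointy-top hex cell."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size   # fractional axial coordinates
    r = (2 / 3 * y) / size
    # Round in cube coordinates so the result is always a valid cell.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)
```

With this, a base nudged slightly off-center still reads as occupying one unambiguous hex, which is what the rules engine needs for movement and range calculations.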
I feel that the dice should be physical objects. The feel of the various dice and the sound of the dice are an essential component of pen and paper role playing. Instead, I want to use computer vision to calculate the outcomes, present digital visualizations of successful and unsuccessful rolls over the physical dice, and then perform automatic calculations in the background based on the outcome.
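The “computer vision calculates the outcomes” step can be prototyped on a plain binary grid: threshold the camera frame, then count connected blobs of bright pixels to read the die face. A dependency-free sketch, where the `face` grid stands in for a real thresholded image:

```python
def count_pips(grid):
    """Count connected blobs of 1s in a binary grid (4-connectivity flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    pips = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] and not seen[i][j]:
                pips += 1
                stack = [(i, j)]        # flood-fill this pip
                while stack:
                    a, b = stack.pop()
                    if 0 <= a < rows and 0 <= b < cols and grid[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        stack.extend([(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)])
    return pips

# A thresholded "three" face:
face = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
```

A real pipeline would of course need the thresholding, perspective correction and multi-die segmentation in front of this, but the bookkeeping that follows the roll is exactly this simple.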
The player character’s stats and relevant info should over over him on the game table. As rolls are made, the health points and stats should update automatically.
While some of the holograms on the table are shared between all players, some are only for the dungeon master. As players move from area to area, opening them up through a visual fog of war, the dungeon master will be able to see secret content like the location of traps and the stats for NPCs. It may also be cool to enable remote DMs who appear virtually to host a game. The thought here is that a good DM is hard to find and in high demand. It would be interesting to use AR technology to invite celebrity DMs or paid DMs to help get a regular game going.
When I started planning out how a holographic D&D game would work, there was still some confusion over the visible range of holograms with HoloLens. I was concerned that digital D&D pieces would fade or blur at close ranges – but this turns out not to be true. The main concern seems to be that looking at a hologram less than a meter away for extended periods of time will trigger vergence-accommodation mismatch for some people. In a typical D&D game, however, this shouldn’t be a problem since players can lean forward to move their pieces and then recline again to talk through the game.
AR can also be used to help with calorie control for that other important aspect of D&D – snack foods and sodas.
Please add your suggestions, criticisms and observations in the comments section. The next step for me is creating some prototypes in Unity of gameplay. I’ll post these as they become ready.
And just in case it’s been a long time and you don’t remember what’s so fun about role playing games, here’s an episode of the web series TableTop with Wil Wheaton, Chris Hardwick and Sam Witwer playing the Dragon Age role playing game.
There seems to be some confusion over what the HoloLens’s depth sensor does, on the one hand, and what the HoloLens’s waveguides do, on the other. Ultimately the HoloLens HPU fuses these two streams into a single image, but understanding the component pieces separately is essential for creating good holographic UX.
From what I can gather (though I could be wrong), HoloLens uses a single depth camera similar to the Kinect to perform spatial mapping of the surfaces around a HoloLens user. If you are coming from Kinect development, this is the same thing as surface reconstruction with that device. Different surfaces in the room are scanned by the depth cameras. Multiple passes of the scan are stitched together and merged over time (even as you walk through the room) to create a 3D mesh of the many planes in the room.
These surfaces can then be used to provide collision detection information for 3D models (holograms) in your HoloLens application. The range of the depth spatial mapping cameras is 0.85 meters to 3.1 meters. This means that if a user wants to include a surface that is closer than 0.85 M in their HoloLens experience, they will need to lean back.
The functioning of the depth camera shouldn’t be confused with the functioning of the HoloLens’s four “environment aware” cameras. These cameras are used to help nail down the orientation of the HoloLens headset in what is known as inside-out positional tracking. You can read more about that in How HoloLens Sensors Work. It is probably the case that the depth camera also handles finger tracking while the four environment aware cameras remain devoted to positional tracking.
The depth camera spatial mapping in effect creates a background context for the virtual objects created by your application. These holograms are the foreground elements.
Another way to make the distinction – based on technical functionality rather than on the user experience – is to think of the depth camera’s surface reconstruction data as input, and holograms as output. The camera is a highly evolved version of the keyboard while the waveguide displays are modern CRT monitors.
It has been misreported that the minimum distance for virtual objects in HoloLens is also 0.85 meters. This is not so.
The minimum range for hologram placement is more like 10 centimeters. The optimal range for hologram placement, however, is 1.25 m to 5 m. In UWP, the ideal range for placing a 2D app in 3D holographic space is 2 m.
Microsoft also discusses another range they call the comfort zone for holograms. This is the range where vergence-accommodation mismatch doesn’t occur (one of the causes of VR sickness for some people). The comfort zone extends from 1 m to infinity.
| Range Name | Minimum Distance | Maximum Distance |
| --- | --- | --- |
| Depth Camera | 0.85 m (2.8 ft) | 3.1 m (10 ft) |
| Hologram Placement | 0.1 m (4 inches) | infinity |
| Optimal Zone | 1.25 m (4 ft) | 5 m (16 ft) |
| Comfort Zone | 1.0 m (3 ft) | infinity |
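The ranges above map naturally onto a small helper an app could use to decide how to treat a hologram at a given distance. This is only a sketch, with thresholds taken straight from the figures quoted above:

```python
def hologram_zones(distance_m):
    """Return which of the documented HoloLens ranges a distance falls in."""
    zones = []
    if distance_m >= 0.1:
        zones.append("placeable")          # hologram placement: 0.1 m to infinity
    if 0.85 <= distance_m <= 3.1:
        zones.append("depth-mapped")       # inside the depth camera's range
    if distance_m >= 1.0:
        zones.append("comfortable")        # no vergence-accommodation mismatch
    if 1.25 <= distance_m <= 5.0:
        zones.append("optimal")
    return zones
```

An app might, for instance, start fading a hologram once `"comfortable"` drops out of the returned list as the user walks toward it.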
The most interesting zone, right now, is of course that short range inside of 1 meter. That 1 meter min comfort distance basically prevents any direct interactions between the user’s arms and holograms. The current documentation even says:
Assume that a user will not be able to touch your holograms and consider ways of gracefully clipping or fading out your content when it gets too close so as not to jar the user into an unexpected experience.
When a user sees a hologram, they will naturally want to get a closer look and inspect the details. When human beings look at something up close, they typically want to reach out and touch it.
Tactile reassurance is hardwired into our brains and is a major tool for human beings to interact with the world. Punting on this interaction, as the HoloLens documentation does, sidesteps a basic psychological need – understandably, perhaps, in these early days of designing for augmented reality.
We can pretend for now that AR interactions are going to be analogs to mouse click (air-tap) and touch-less display interactions. Eventually, though, that 1 meter range will become a major UX problem for everyone.
[Updated 4/6 after realizing I probably have the functioning of the environment aware cameras and the depth camera reversed.]
[Updated 4/21 – nope. Had it right the first time.]
[Note: this post is entirely my own opinion and purely conjectural.]
Best current guesses are that the HoloLens field of view is somewhere between 32 degrees and 40 degrees diagonal. Is this a problem?
We’d definitely all like to be able to work with a larger field of view. That’s how we’ve come to imagine augmented reality working. It’s how we’ve been told it should work from Vernor Vinge’s Rainbow’s End to Ridley Scott’s Prometheus to the Iron Man trilogy – in fact going back as far as Star Wars in the late 70’s. We want and expect a 160-180 degree FOV.
So is the HoloLens’ field of view (FOV) a problem? Yes it is. But keep in mind that the current FOV is an artifact of the waveguide technology being used.
What’s often lost in the discussions about the HoloLens field of view – in fact the question never asked by the hundreds of online journalists who have covered it – is what sort of trade-off was made so that we have the current FOV.
A common internet rumor – likely inspired by a video by tech evangelist Bruce Harris taken a few months ago – is that it has to do with cost of production and consistency in production. The argument is borrowed from chip manufacturing and, while there might be some truth in it, it is mostly a red herring. An amazingly comprehensive blog post by Oliver Kreylos in August of last year went over the evidence as well as related patents and argued persuasively that while increasing the price of the waveguide material could improve the FOV marginally, the price difference was prohibitively expensive and ultimately nonsensical. At the end of the day, the FOV of the HoloLens developer unit is a physical limitation, not a manufacturing limitation or a power limitation.
But don’t other AR headset manufacturers promise a much larger FOV? Yes. The Meta 2 (shown below) has a 90 degree field of view. The way the technology works, however, involves two LED screens that are then viewed through plastic positioned at 45 degrees to the screens (technically known as a beam splitter, informally known as a piece of glass) that reflects the image into the user’s eyes at approximately half the original brightness while also letting the real world in front of the user (though half of that light is also scattered). This is basically the same technique used to create ghostly images in the Haunted Mansion at Disneyland.
The downside of this increased FOV is that you are losing a lot of brightness through the beam splitter. You are also losing light over the distance it travels through the plastic to reach your eyes. The result is a see-through “hologram”.
But is this what we want? See-through holograms? The visual design team for Iron Man decided that this is indeed what they wanted for their movies. The translucent holograms provide a cool ghostly effect, even in a dark room.
The Princess Leia hologram from the original Star Wars, on the other hand, is mostly opaque. That visual design team went in a different direction. Why?
My best guess is that it has to do with the use of color. While the Iron Man hologram has a very limited color palette, the Princess Leia hologram uses a broad range of facial tones to capture her expression – and also so that, dramatically, Luke Skywalker can remark on how beautiful she is (which obviously gets messed up by Return of the Jedi). Making her transparent would simply wash out the colors and destroy much of the emotional content of the scene.
The idea that opacity is a pre-requisite for color holograms is confirmed in the Star Wars chess scene on the Millennium Falcon. Again, there is just enough transparency to indicate that the chess pieces are holograms and not real objects (digital rather than physical).
So what kind of holograms does the HoloLens provide, transparent or near-opaque? This is something that is hard to describe unless you actually see it for yourself, but the HoloLens “holograms” will occlude physical objects when they are placed in front of them. I’ve had the opportunity to experience this several times over the last year. This is possible because these digital images use a very large color palette and, more importantly, are extremely intense. In fact, because the HoloLens display technology is currently additive, this occlusion effect actually works best with bright colors. As areas of the screen become darker, they actually appear more transparent.
Bigger field of view = more transparent, duller holograms. Smaller field of view = more opaque, brighter holograms.
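This trade-off follows from the additive nature of the display: the waveguides can only add light to what the room already provides, never subtract it. A quick numeric sketch of why dark hologram pixels read as transparent:

```python
def additive_blend(room, hologram):
    """Combine a room pixel and a hologram pixel on an additive display (8-bit RGB)."""
    return tuple(min(255, r + h) for r, h in zip(room, hologram))

bright_wall = (200, 200, 200)
print(additive_blend(bright_wall, (0, 0, 0)))        # black pixel: the wall shows through unchanged
print(additive_blend(bright_wall, (255, 255, 255)))  # intense pixel: clamps to solid white
```

A black hologram pixel adds nothing, so the real wall dominates; only a pixel bright enough to saturate the sum can visually occlude it.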
I believe Microsoft made the bet that, in order to start designing the AR experiences of the future, we actually want to work with colorful, opaque holograms. The trade-off the technology seems to make in order to achieve this is a more limited field of view in the HoloLens development kits.
At the end of the day, we really want both, though. Fortunately we are currently only working with the Development Kit and not with a consumer device. This is the unit developers and designers will use to experiment and discover what we can do with HoloLens. With all the new attention and money being spent on waveguide displays, we can optimistically expect to see AR headsets with much larger fields of view in the future. Ideally, they’ll also keep the high light intensity and broad color palette that we are coming to expect from the current generation of HoloLens technology.
Microsoft is releasing an avalanche of information about HoloLens this week. Within that heap of gold is, finally, clearer information on the actual hardware in the HoloLens headset.
I’ve updated my earlier post on How HoloLens Sensors Work to reflect the updated spec list. Here’s what I got wrong:
1. Definitely no internal eye tracking camera. I originally thought this is what the “gaze” gesture was. Then I thought it might be used for calibration of interpupillary distance. I was wrong on both counts.
2. There aren’t four depth sensors. Only one. I had originally thought these cameras would be used for spatial mapping. Instead just the one depth camera is, and it maps a 75 degree cone out in front of the headset, with a range of 0.8 M to 3.1 M.
3. The four cameras I saw are probably just grayscale cameras – and it’s these cameras, along with the IMU and some cool algorithms, that are being used to do inside-out position tracking.
Here are the final sensor specs:
- 1 IMU
- 4 environment understanding cameras
- 1 depth camera
- 1 2MP photo / HD video camera
- Mixed reality capture
- 4 microphones
- 1 ambient light sensor
The mixed reality capture is basically a stream that combines digital objects with the video stream coming through the HD video camera. It is different from the on-stage rigs we’ve seen, which can calculate the mixed-reality scene from multiple points of view; the mixed reality capture is from the user’s point of view only. It can be used for streaming to additional devices like your phone or TV.
Here are the final display specs:
- See-through holographic lenses (waveguides)
- 2 HD 16:9 light engines
- Automatic pupillary distance calibration
- Holographic Resolution: 2.3M total light points
- Holographic Density: >2.5k radiants (light points per radian)
I’ll try to explain “light points” in a later post – if I can ever figure it out.