VR / AR Fundamentals — Prologue

Michael Naimark
13 min read · Jan 26, 2018

Science with Attitude.

[March 2019: This article, and the entire VR/AR Series, is now available bilingually in Chinese and English courtesy of NYU Shanghai.]

These presentations will be posted weekly beginning next week.

Greetings! And welcome to the first in a series of six weekly pieces on Virtual Reality and Augmented Reality “fundamentals”. They’re meant to be fast, brief, holistic, and accessible to non-engineers.

The motivation is practical. I get calls from novice VR/AR producers and from check-writing executives worried about expensive camera stitching, or wanting awesome spatial sound, or feeling obliged to always fill the entire 360 degree view. But all too often they are unaware of zero-parallax points and computational photography, of phase differences and binaural audio, or of creative alternatives to full 360s from the arts community. It’s like, for VR and AR, no one knows, or apparently cares, what’s under the hood. The results are often avoidable disasters and thwarted creativity, not good for embryonic systems anywhere, where small changes can have lasting effects.

This is an easy one for me: I’m teaching it right now, as visiting faculty at NYU Shanghai. Over the next six weeks, I’ll be giving these presentations here and will post them in sync with the class. They’ll be a little scruffy and unfinished, which is fine. Your feedback is welcome!

Be positively critical (and have a good bullshit detector)

One more thing. I’m instilling in the students the value of being “positively critical”: to be simultaneously enthusiastic, sensitive, and kind while being distanced, direct, and fearless. And, as in the “rules to live by” of the late great Red Burns, and of the brilliant Laurie Anderson and her luminous husband Lou Reed, to also have a good bullshit detector.

So I will call things as I see them. Positively.

Today is the prologue. The presentations begin next week.

Prologue

I. The Dark Side: The Fog of VR

1. Delirium

I saw the best minds of my generation destroyed by madness. (Allen Ginsberg, Howl)

Twice.

I’ve seen this now twice, at least regarding VR.

It seems something very strange happens when “virtual” and “reality,” two already loaded words in philosophy and critical theory circles, are placed together. It’s not unlike the ring in The Hobbit — “MIIINE!” — or the monster in the Frankenstein legend — “LIIIFE!”. When something is positioned as the greatest, most significant, world-changing thing ever, even the best minds can become, well, a little insane.

The first time was during the “VR 1.0” wave in the early 1990s. We were all convinced that VR was going to change the world. We were seduced by the press spotlight. And, being closer to the 1960s, we saw the connection between VR and counterculture. Significantly, the excitement was more conceptual than financial, and we were, by and large, tripping, perhaps in a good way. Most of us had survival strategies to continue work, often in universities, corporate labs, day jobs, and the arts.

The second time was triggered, as the birth-legend goes, by Facebook’s $2B acquisition of Oculus in 2014. That’s a lot of money for a small group of young male gamers. VR meetups were sold out. VR expos felt obliged to rent lasers and smoke machines. Panels like “VR in 100 Years,” while maybe amusing, had absolutely no bearing or relevance on anything useful. Several tech-bro scandals in the VR community surfaced.

The bottom line: VR hasn’t met its market expectations.

Maybe it took billions of wasted dollars and some deep breathing to reassess and recalibrate. Maybe AR will do better.

2. Naïveté

A couple years ago, a VR accelerator of a VR tech giant asked on its online application: “Years of VR experience?” There were three options on the pulldown menu: “<1 Year, 1–2 Years, and >2 Years.” I found this a bit alarming and contacted the head, a friend, to let him know. He thanked me and the next day, the pulldown menu had changed to six options: “<1, 1–2, 2–5, 6–10, 11–20, and 20+ years”. Perhaps better, but nevertheless reflecting a rather innocent, some might say clueless, perspective about timeframes.

“Beginner’s Mind” is a phrase heard often in Silicon Valley, particularly at the tech giants. The idea is to keep the mind open, eager, and without any preconceptions. The phrase comes from Zen Buddhism and was popularized by Shunryu Suzuki, who wrote a book called Zen Mind, Beginner’s Mind and founded the San Francisco Zen Center. I doubt most of the people who use the phrase know that. Beginner’s Mind is often used to justify ignoring prior art and expertise.

In 2010, the San Francisco Zen Center realized the irony, particularly around Silicon Valley, and organized a series of workshops called “The Expert Mind,” quoting cognitive linguist George Lakoff: “To be an expert is always to be a beginner, because expertise opens up new worlds constantly.”

I’ve had more than one debate with tech giant leadership about how Beginner’s Mind is often used as an excuse for lack of rigor and not doing homework. At the very least, I’d argue, it’s very, very costly.

But there’s a bigger issue. To say “I’m working in VR or AR and nothing before 5 years ago is relevant” is to ignore the creativity inherent in making abstract connections. Here’s an example.

360 degree dialogue tests from VR Cinematography Studies for Google (2016)

When the subjects were both directly opposite the camera looking “through you” (1), it wasn’t very convincing. And when they were relatively close to each other (4), why bother using 360 VR? The middle clips were most interesting.

from “Lawrence of Arabia,” an early widescreen movie (1962)

As television penetrated American households, Hollywood responded by finding ways to make enhanced movie theater experiences. One such way was widescreen (others such as 3D, 70mm, and, alas, 360 are equally interesting). Director David Lean exploited this new format to compose shots like this, literally impossible beforehand. In 1963, Lawrence of Arabia won seven Academy Awards including Best Picture, Best Director, and Best Cinematography.

Making abstract connections equals creativity.

In the academic media arts community, we often talk about how to encourage students to make connections to their current work from prior art and from other disciplines. The more abstractly they can connect, the more strongly it will inform their work, and the stronger their work will be. It’s that simple.

3. Hype

Gartner is a multibillion dollar research and advisory firm for the tech sector which annually produces a much-anticipated hype cycle graph. Here’s the latest one. Note where VR and AR are. Those poor suckers working on Deep Learning, Quantum Computing, and Smart Dust may not be ready for the nosedive into the Trough of Disillusionment.

Several years ago, my colleagues at USC and I pitched a novel method for integrating photos into 3D models such as Google Earth. One pitch was to a prominent and seasoned VC known for early round funding. He watched our demo and said “That’s a solid 4X idea,” meaning that he believed investors would have a solid chance of getting a fourfold return on their investment. Then he said:

We don’t do 4X (ROI). We’re interested in 40X.

We’ve heard this regularly since, and some would say it’s become the norm for high tech investing. Now consider the implications, as it relates to the Gartner Cycle and hype: pitches which are impossible to support with data, a gladiator mentality (usually by 20-something alpha male tech bros), greed-driven more than vision-driven, out of touch with the real world. Some might call this behavior psychotic.

And woe is the poor entrepreneur who pitches that investors can double their money in a year. Old guard say tech investing used to be like that, but those days are gone.

As someone who’s spent several decades mostly based in San Francisco and working regularly in Silicon Valley and Hollywood, I’ll call this one out: This is bullshit.

The Gartner graph condones and perpetuates overshoot, as if it’s a law of nature and investors and entrepreneurs play along. Imagine if your home thermostat worked like this, where it first turns the heat way up then way down before hitting the desired setting. What would you call it? BROKEN.

I actually suspect that 4Xers have a better chance of hitting 40X than 40Xers, since they are more dedicated, impassioned, and resourceful. But we’ll need the Business School study of this first.

And so it has gone with VR and AR.

II. Highest-Level Things to Know about VR and AR

Here’s my take on “highest-level” things to know about VR and AR which are often overlooked.

1. VR and AR both have 3 fundamentally different methods for experience: hand-held mobile, headset, and external large-scale display.

VR today has largely been equated with headsets. Historically, audiovisual immersion was achieved through public space venues going back to 19th Century Cycloramas, evolving into “special venue” cinema such as Imax and CircleVision, and continuing with “location-based” VR today which uses projection. A surprise to many has been the popularity of “magic window” VR, in which a smart phone is either waved or thumb-scrolled for a 360 degree experience. Magic window VR is obviously more convenient and accessible, no additional gear needed, but less immersive with its small field of view and monoscopic image. The VR headset people appreciate the wide, stereoscopic field of view and are currently caught in a chicken-egg situation between immersive content and inexpensive, high-quality hardware.

AR today is largely equated with hand-held mobile. Beyond Pokemon Go, mobile AR is white hot, with big players making cheap, powerful, easy-to-use tools such as Google’s ARCore and Apple’s ARKit. Like magic window VR, mobile AR may be less immersive but is most convenient. AR external large-scale display is mainly the domain of projection mapping (and I can’t resist humbly pointing out that an early art project of mine is noted as the second instance ever, between Disney’s Haunted Mansion (#1) and Ramesh Raskar’s landmark work at University of North Carolina (#3)).

As for AR headsets, we’ll get to them in a moment.

2. There appears to be consensus that AR and VR headsets will be the same thing and look like everyday eyeglasses, but not for about 5 years.

Once the display technologies for immersive headsets are perfected, the only difference between AR and VR is whether the displays allow see-through (for AR) or become opaque (for VR). I’m hearing “5 years” as a projected timeline for such things, in terms of quality and price for mass adoption.

I’m also hearing that most people inside the tech giants believe everyone will wear such AR/VR glasses all of their waking hours, just like eyeglasses today, with no reason to ever take them off. Should we take bets?

3. The biggest question today regarding AR headsets is whether see-through opacity is doable.

Promotional image from AR headset company Magic Leap, the most well-funded startup in VR/AR today ($1.9B)

Note how the whale in this Magic Leap image (which is a simulation) appears plenty real, perfectly integrated into the actual space. Particularly note how it’s not transparent or ghost-like but appears opaque, visually blocking everything behind it. The question of whether that’s possible through any thin material (as opposed to “pass-through” via a video camera), at least in the foreseeable future, is contestable.

Magic Leap claims to do 3 interesting things. First, they use “retinal scanning” to display the image close to the eyes. Retinal scanning is something new, in that it’s unlike any other display, but not that new, and Magic Leap did not invent it. Second, they claim to be able to change the focus of the displays, so that the lenses inside our eyes actually change focus like they do when viewing something close or far away. (The technical term for focus is accommodation, which we’ll be discussing next week.) This kind of variable accommodation is indeed new, and Magic Leap has several patents for it. The few outsiders who have seen demos have verified it, and it sounds amazing. Their third claim, see-through opacity like the whale above, is more difficult to assess. True see-through opacity would work regardless of how bright the background is. Conversely, a display with no see-through opacity may look like it’s working if the background is dim. Currently, Magic Leap controls all demos, so the jury is out until the product reaches developers, announced for early 2018.
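Why the brightness of the background matters can be shown with a little arithmetic. What follows is a minimal sketch, not a model of Magic Leap’s actual optics. It assumes that a see-through display without opacity can only add its own light to the light already arriving from the room, while a display with true opacity can also block that light. The function names, the 0-to-1 luminance scale, and the sample values are all illustrative assumptions.

```python
def additive_display(background: float, virtual: float) -> float:
    """Assumed model of a see-through display with no opacity.

    The display can only add its own light to the light arriving from the
    room, so the result is never darker than the background. Luminance is
    on an illustrative scale from 0.0 (black) to 1.0 (saturated).
    """
    return min(1.0, background + virtual)


def occluding_display(background: float, virtual: float, alpha: float) -> float:
    """Assumed model of a display with true see-through opacity.

    alpha is how much of the background is blocked at this pixel:
    0.0 blocks nothing, 1.0 blocks everything behind the virtual object.
    """
    return alpha * virtual + (1.0 - alpha) * background


def compare(background: float, virtual: float) -> dict:
    """Perceived luminance of one virtual pixel under both assumed models."""
    return {
        "background": background,
        "additive": additive_display(background, virtual),
        "occluding": occluding_display(background, virtual, alpha=1.0),
    }


if __name__ == "__main__":
    whale = 0.2  # illustrative value for a fairly dark virtual object
    for label, room in [("dim room", 0.05), ("bright room", 0.9)]:
        print(label, compare(room, whale))
```

Under these assumptions, the two models give nearly the same result in the dim room, which is why a demo in controlled lighting cannot settle the question. In the bright room the additive model saturates and the whale washes out, while the occluding model still shows it at its intended darkness.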

4. VR Cinema and VR Games come from different lineages and different cultures.

With almost no exceptions, donning a VR headset today and looking at VR titles will be driven in one of two very different ways. Either it’s “cinematic VR,” playing frame after frame of linear material, or it’s “VR games” driven by a game engine. Cinematic VR may be 360 degrees and may be stereoscopic or even volumetric (which we’ll discuss in two weeks) but the only interaction is PLAY, PAUSE, REWIND, etc., with the exception of branching or cutting from one linear clip to another linear clip. The attraction of cinematic VR, simply put, is that “it looks real” in that it’s photorealistic and was shot by a camera.

VR games, on the other hand, are driven by models, usually 3D models rather than 2D video frames, and can offer full interactive control. You can move the virtual chair, blow up the virtual watermelon, or fight with the virtual soldier. But, by and large, they look “cartoonish.”

The split between the real-but-linear offerings of cinematic VR and the interactive-but-cartoonish offerings of VR games is a huge juncture in lineage, culture, technology, style, and content. It’s a sensitive topic and often misunderstood. (By now, many of the cinematic VR readers are thinking “we can do interactive” via branching and many of the VR games readers are thinking “don’t call our high-end animation ‘cartoons’!”) Much has to do with understanding the cultural differences between Hollywood and Silicon Valley.

Clearly, the hybrid region between cinema and games is a place of interest. Importing camera-based imagery into model-based game engines offers truly “new media” opportunities. Our students have recently made some provocative headway here, which we’ll show soon. “Live” and “social” VR / AR also fall in this hybrid region (which we’ll address in the final module).

The takeaway is that understanding the lineages and cultures of cinema and games, above and beyond the actual technology, is important and a strategic advantage moving forward.

III. My Approach

How do you know I’m not a movie?

I once asked this of a class of 10-year-olds. I set up a portable screen, stood in front of it, and asked. Silence. I asked again. Silence. Finally I asked “DO you know I’m not a movie?” and heads nodded vigorously. “OK, so how do you know?”

Replies began. “Your legs are below the screen.” “You’re standing in front of the screen.” “You’re answering our questions.” One kid from the rear: “I can tell you’re not an actor.”

One can liken knowing or not knowing how you know I’m not a movie to a mixing board: If all of the “sliders” that determine sensory inputs are known (such as spatial resolution, temporal resolution, etc.), and if all of the sliders are turned up to “10,” then by definition, the representation will be indistinguishable from the subject.
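The mixing board idea can be sketched in a few lines of code. This is only an illustration of the analogy: the slider names beyond spatial and temporal resolution, and all of the settings, are hypothetical.

```python
SLIDER_MAX = 10  # "turned up to 10" in the mixing board analogy


def is_indistinguishable(sliders: dict) -> bool:
    """True only when every known slider is at the maximum setting."""
    return bool(sliders) and all(
        level >= SLIDER_MAX for level in sliders.values()
    )


def giveaways(sliders: dict) -> list:
    """Names of the sliders that still give the representation away."""
    return sorted(
        name for name, level in sliders.items() if level < SLIDER_MAX
    )


if __name__ == "__main__":
    # Hypothetical settings for a movie of a person on a portable screen.
    movie = {
        "spatial_resolution": 7,
        "temporal_resolution": 6,
        "field_of_view": 3,      # "Your legs are below the screen."
        "interactivity": 0,      # "You're answering our questions."
    }
    print(is_indistinguishable(movie))
    print(giveaways(movie))
```

Each of the kids’ replies amounts to naming a slider that is not yet at 10.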

My approach is to break things down to the most simple and basic elements, starting with the most simple:

Module 1: Audiovisual Resolution & Fidelity

How can you fool one static eye and one static ear, looking at a framed image, to believe the image is real?

Consider an apple. Should be simple, no? (I polled the class and about half said yes and half said no.) Me? Yes, I think this is possible.

Consider a human face, or the kitty. For a variety of reasons (we’ll discuss next week), maybe iffy.

Topics to understand include pixels, color, dynamic range, accommodation, orthoscopy, sound, and motion.

Module 2: Audiovisual Spatiality & Immersion

Next, how can you then fool two static eyes and two static ears, looking at a framed image, to believe the representation is real? More difficult, but there’s a rich history and new technologies associated with stereoscopic imagery and stereophonic sound. Even more difficult, fooling moving eyes and ears, a particularly hot topic today relating to “light field” and “volumetric video.” And even more difficult, fooling the eyes and ears when the scene is frameless, or panoramic.

OK, so now we can understand what it takes to fool the eyes and ears into believing a representation is real.

Module 3: Other Senses (Haptic, Smell, Taste, Mind)

Next, how do we fool our other senses, such as touch and pressure — haptics — and how can we fool our sense of smell and taste. There’s considerable work, and histories, in all of these areas.

Shall we include mind? Is the mind a sensory organ? We’ll discuss what’s out there in 3 weeks.

So now, we understand what it takes to fool all of our senses. But we are a ghost: seeing, hearing, feeling a virtual or augmented world with no effect or control.

Module 4: Input and Interactivity

Next, we ask what are the mechanisms for adding input and interactivity. Topics include input devices and sensors, vision and sound input, UI/UX and the dynamics of interactivity, and issues around control and illusion of control.

So now you can be totally fooled. All of your senses are affected and all of your effectors are sensed.

But you might be in a “dead,” unpopulated world, like a non-networked video game.

Module 5: Live & Social

Finally, we add the elements of liveness and social interaction with other people in a fully immersive, multi-sensory, interactive, virtual or augmented world. Topics include live webcams, teleconferencing, MUDs, MOOs, and MMORPGs, Second Life, and what’s happening at the tech giants today as we speak.

But we really need to start with a peephole. :)

That’s next week. See you then!

Thanks to Jiayi Wang for the modules sketches, Golan Levin for the mixing board sketch, Charlotte San Juan for editorial coaching, and Anna Greenspan for reviewing.

Michael Naimark

Michael Naimark has worked in immersive and interactive media for over four decades.