Big Tech’s Race to the Ground (View)

Michael Naimark
May 24, 2022


Same old (yet fixable) problems.
Bold new game-changing possibilities.
Billions of dollars at stake.

Google Earth “ground-level view” from a 3D model (above) and Google Street View from a camera-based 2D image (below).

Have you ever wondered why Google has both Google Earth and Google Street View? The reason is that Google Earth is, like most video games, a 3D computer model, with unconstrained navigation but often with distortions, or “artifacts,” resulting in a less-than-real look. Street View, like a movie, is a sequence of photographic images with a photorealistic look but constrained navigation, limited to morphing from one frame to the next, often in some very clever ways. Certainly, if Google could combine the best of both of these worlds into a single experience, it would.

But it can’t. Neither can Apple, Facebook/Meta, Microsoft, or anyone else, at least not with the uncompromising navigability of Google Earth combined with the photorealism of Street View. There has been impressive progress in making rich, photorealistic 3D models from thousands of harvested photos using machine learning, but the ground-level view shown above is the best we have available today.

Why is the ground-level view so important? Well, it’s where most of us spend most of our lives, and having ground-level 3D Earth models offers some unprecedented opportunities for working, playing, learning, and creative expression. Some are described below, and there are certainly many more. But these are not without challenges, some of which mirror similar challenges already encountered in search engines and social media. Maybe this time we can do better.

Who’s doing what?

In terms of 3D Earth models, Google Earth has been the gold standard since it launched in 2001. Over the past year, Apple has quietly built out Apple Maps with visually rich 3D models of several cities, but it only lets you zoom down to around 100 feet above the ground. Its own version of Street View, called Look Around and so far barely promoted, surpasses Street View in seamlessness and speed. Microsoft’s Azure Digital Twins is a service that allows customers to create their own digital 3D representations of real-world places, things, and people, but so far there has been no indication of an integrated Earth model. Earlier this year, Epic Games, maker of the powerful Unreal Engine, announced a collaboration with Cesium, a 3D geospatial software company, ostensibly to make its games more real-world and photorealistic.

Facebook/Meta is perhaps taking the most ambitious approach to modeling the ground view. Its vision, called Project Aria and Livemaps, is to build a giant collective 3D Earth model, indoors as well as outdoors, to help you navigate around a city or find missing objects in your home. Project Aria has developed a pair of glasses for data collection. The company emphasizes that the glasses are a research tool, not a consumer AR device, and that it wants to understand privacy and policy issues “way before we bring AR glasses into the world.” According to its announcement last fall, Project Aria glasses “will be made available to about 3,000 Meta employees and contractors, as well as external, paid research participants.” Just how ambitious is Project Aria? We don’t know, but we do know that Facebook/Meta’s VR/AR division now employs over 17,000 people, more than one-fifth of the entire company, up by 7,000 in a little more than a year.

Facebook/Meta’s Project Aria video, simulating 3D Earth mapping with dots outside and tags inside.

Niantic, the Google spinoff that makes Pokémon Go, deserves special mention on several fronts. This month it announced plans to launch its Visual Positioning System for AR devices, a system that will build city-scale 3D models that could place augmented imagery in front of and behind actual objects such as city buildings. Notably, the hugely successful Pokémon Go relies entirely (for now) on “mobile AR,” a ubiquitous but non-immersive form of AR that uses the direction and position sensors inside a smartphone. Though Niantic hasn’t said for sure, it’s likely that the data to build city-scale models will come from users’ smartphones. Equally noteworthy, Niantic’s CEO John Hanke, who originally headed Google Earth, has called the Metaverse a “dystopian nightmare,” but more on that later.

Niantic’s city-scale Visual Positioning System can place cartoon clouds and flying ships behind actual buildings.

Same old (yet fixable) challenges

How we make and use 3D Earth models is shaping up to pose challenges analogous to those of social media and search engines, but with significantly different spins. Perhaps we can get a head start by confronting these challenges early on, and perhaps doing so will help inform how we address their older counterparts.

Privacy. Despite Facebook/Meta’s stated priority of respecting privacy in Project Aria, not everyone feels comfortable, especially given the various privacy laws and policies relating to public spaces and “The Commons.” Much of the privacy challenge boils down to how you feel about “hiding a camera” (or at least not disclosing an operating camera) in public, especially if the camera is being worn on your face. Many people working in the tech community, and indeed many people working on wearable cameras, believe hidden wearable cameras are inevitable and have a “get over it” attitude, or worse, a “trust the little red light” attitude. Alternative solutions exist.

One solution worth considering? Lens caps.

Lens caps, or some equally physical gesture: physically visible and possibly electromechanically controlled. The first big tech company to incorporate something like this will win the kudos of the world.

(I have some unique experience here: many years ago, an art-based research project of mine into “camera zapping” was featured in a New York Times story, and in the ensuing two days its website received over 600,000 hits and I received several purported death threats. Even now, more than twenty years later, this work comes up #1 on Google searches for “camera zapper,” and I still receive a regular trickle of email from people seeking advice. The lesson: people are very concerned about cameras.)

Interoperability. Another challenge in the race to make ground-level 3D Earth models is interoperability. Much has already been said about interoperability and the Metaverse. Most of it focuses on being able to take your paid-for digital asset, such as a sword or hat or avatar, from one company’s 3D virtual world to another’s. But interoperability for 3D Earth models is different, because they all share a common coordinate system: latitude, longitude, altitude, and time. Some of the actual places are privately owned and some are not. One of the earliest (and inevitable) examples of creative mobile AR was virtual graffiti. Given the common nature of Earth coordinates, interoperability may best be addressed by a nonprofit umbrella organization, ideally proactively and sooner rather than later.

Snapchat’s AR collaboration with Jeff Koons in New York’s Central Park, tagged by graffiti artist Sebastian Errazuriz.
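Because every 3D Earth model indexes the same planet, one way to picture cross-vendor interoperability is a shared anchor record keyed on latitude, longitude, altitude, and time. The sketch below is purely illustrative. The `GeoAnchor` record and the `same_place` helper are hypothetical and do not represent any company's or standards body's format. It assumes WGS84 coordinates (the datum GPS uses) and converts them to Earth-centered, Earth-fixed (ECEF) meters, so that anchors from two different services can be compared directly.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

# WGS84 ellipsoid constants (the datum used by GPS).
WGS84_A = 6378137.0                  # semi-major axis, meters
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


@dataclass(frozen=True)
class GeoAnchor:
    """Hypothetical vendor-neutral record pinning AR content to the Earth."""
    lat_deg: float
    lon_deg: float
    alt_m: float           # height above the WGS84 ellipsoid, in meters
    time_utc: datetime     # when the content is meant to exist
    content_id: str        # hypothetical "service/item" identifier


def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float) -> tuple[float, float, float]:
    """Convert WGS84 latitude/longitude/altitude to Earth-centered, Earth-fixed x, y, z (meters)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def same_place(a: GeoAnchor, b: GeoAnchor, tolerance_m: float = 1.0) -> bool:
    """True if two anchors, possibly from different services, lie within tolerance_m meters."""
    pa = geodetic_to_ecef(a.lat_deg, a.lon_deg, a.alt_m)
    pb = geodetic_to_ecef(b.lat_deg, b.lon_deg, b.alt_m)
    return math.dist(pa, pb) <= tolerance_m


if __name__ == "__main__":
    t = datetime(2022, 5, 24, tzinfo=timezone.utc)
    # Two hypothetical services tagging an arbitrary point in Central Park, 0.5 m apart vertically.
    tag_a = GeoAnchor(40.7829, -73.9654, 30.0, t, "service-a/graffiti-1")
    tag_b = GeoAnchor(40.7829, -73.9654, 30.5, t, "service-b/graffiti-7")
    print(same_place(tag_a, tag_b))  # True
```

This sketch compares position only. A shared scheme would also have to reconcile the time coordinate, as well as questions of who may place content on privately owned versus public places.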

Ownership. When you take a picture with your smartphone, who owns the spatial metadata describing where your camera is pointing? Your image and this metadata, we’re told, can be anonymized and used to build a giant 3D Earth model. Computer vision people will say that the spatial metadata isn’t even needed, since good vision algorithms can “find the pose” of your camera based entirely on the contents of the image. Still, your images are part of the Big Dataset that 3D Earth model builders currently get for free. Image and metadata ownership falls into the same category as the harvesting of social media streams and search terms, and it needs to be addressed as such.

Bold new game-changing possibilities

Predictably, the race for ground-level views appears to be driven by mobile applications now, such as street directions, and by wearable AR headsets in the future, for uses such as games and information. The possibilities discussed next are less predictable and intended to be more provocative, possibly with very high impact and nicely surprising outcomes.

Having your own VR Camera. If AR headsets need to collect location data via outward-facing cameras, we may suddenly realize that we can also record or stream what we see. One intriguing design we’ll be seeing more and more is “pass-through” AR: basically a VR headset with two tiny fisheye-like cameras in front, positioned to match our eyes. With high enough resolution and a wide enough field of view, pass-through AR will be “invisible,” like looking through conventional optical glasses. And because it’s digital rather than optical, pass-through augmented imagery can easily be made to appear opaque using standard video effects. By contrast, all current optical AR headsets suffer from augmented imagery that appears transparent and “ghost-like.”

Optical AR headset (Magic Leap) and pass-through AR headset (Varjo)

An unintended consequence of pass-through AR may be the ability to stream or record what you see and hear. The result could be stereoscopic 3D video, given the two cameras, with binaural stereo sound, given two microphones, and it could fill the field of view of any VR or AR headset. The viewer may need to be “slaved” to the producer’s head, though there are some clever ways around that.

Today, not many people own VR cameras. Good ones cost thousands of dollars. What happens if VR cameras become “free?” Would people make first-person POV movies in VR? VR TikTok videos? Live-streaming VR? Two-way live “eyeball” exchanges?

Before the video camcorder, the only way to make home movies was with film, such as Super 8 film. Film cartridges recorded three minutes and cost $20, plus another $20 for processing at a lab (overnight at best), and editing was performed by hand using transparent tape. Then came the luggable, black-and-white, reel-to-reel video “Portapak” in the late 1960s for $2,000, followed by an increasingly cheaper and better array of hand-held camcorders for several hundred dollars, and eventually by 4K video cameras in today’s smartphones, along with easy-to-use digital editing tools.

As the ratio of consumption tools to production tools dropped from thousands-to-one to essentially one-to-one (or lower), a creative explosion resulted. It led to YouTube, Instagram, and TikTok. It’s tempting to believe that VR and AR content will also explode when everyone with an AR headset has a VR camera for free, and that the much-touted “killer titles” will snowball.

VR Virtual Travel. “VR virtual travel” seems to have been on everyone’s list of VR killer titles since the beginning. The scenario is that you are at home or school, immersed in a real-world travel or tourist destination via a VR headset. You may be there with friends, who may be sitting next to you or elsewhere, wearing their own VR headsets. You may also have some degree of control over navigation and some interaction with guides, animals, and objects.

Michael Naimark, May 2014.

What we’ve seen so far are largely VR extensions of documentary movies, which are linear and not interactive, and VR extensions of 3D models and games, with varying degrees of photorealism and interactivity. You can find curated lists from National Geographic, TravelMag, and Oculus.

Like so much so far with VR and AR, hype has exceeded actuality. Are these VR virtual travel experiences really “Just Like Being There?” Of course not. But can they offer powerful experiences which are more immersive than documentaries and coffee-table books? Yes.

Ground-level views of 3D Earth Models can connect all travel experiences together. You would be a Superman or Superwoman, leaping up and over from one place to another. This sensation is more than a gimmick. People who spend hours “leaping” around Google Earth today report feeling strong visceral body sensations which often remain with them as they go to bed at night, like after a day of swimming or a night of dancing. Google Earth in VR is even more visceral and compelling, even without anything “happening.”

It’s worth mentioning that AR for travel, where travelers and tourists are physically at a destination with AR glasses or a location-based AR kiosk, can be an especially powerful experience, as if it lets us feel the sacredness of places special to us. Years ago, the most-used “for example” in the research community was an AR kiosk, like coin-operated binoculars, in front of the Bastille in Paris, where viewers could watch a re-enactment of the Storming. It has, of course, since been realized (2016).

AR-like VR kiosk showing a historical re-enactment, by Timescope, Paris.

Virtual AR. Perhaps the most intriguing application for ground-level 3D Earth models is that they allow all AR to be viewed in VR, using a matching viewpoint in the model.

With a good ground-level 3D Earth model, all AR can be experienced as VR, anywhere.

Take Pokémon Go. Onsite participants currently play using mobile AR on their smartphones, which at some point will be replaced with AR headsets. Other participants could play remotely with their VR headsets. Given an accurate, matching 3D Earth model, everything onsite participants see, remote participants at home see too, and they can interact with it as well. They may even appear as avatars to the onsite participants.
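The geometric core of this “virtual AR” idea is that onsite and remote participants run the same calculation. An AR object anchored at Earth coordinates is placed relative to a viewpoint, and a remote VR viewer simply adopts a viewpoint inside the 3D Earth model that matches an onsite one. Below is a minimal sketch under stated assumptions: WGS84 coordinates and a standard East-North-Up (ENU) local frame. The `Viewpoint` type, the `object_offset_enu` function, and the creature's coordinates are hypothetical names and values chosen for illustration. Head orientation, rendering, and errors in aligning the model with the real world are all ignored.

```python
import math
from dataclasses import dataclass

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_E2 = WGS84_F * (2 - WGS84_F)


def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float) -> tuple[float, float, float]:
    """Convert WGS84 latitude/longitude/altitude to Earth-centered, Earth-fixed meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    return ((n + alt_m) * math.cos(lat) * math.cos(lon),
            (n + alt_m) * math.cos(lat) * math.sin(lon),
            (n * (1 - WGS84_E2) + alt_m) * math.sin(lat))


@dataclass(frozen=True)
class Viewpoint:
    """Hypothetical viewer position: physical for onsite players, virtual (in the model) for remote ones."""
    lat_deg: float
    lon_deg: float
    alt_m: float


def object_offset_enu(view: Viewpoint, obj_lat: float, obj_lon: float,
                      obj_alt: float) -> tuple[float, float, float]:
    """Return the (east, north, up) offset in meters from the viewer to an Earth-anchored object."""
    vx, vy, vz = geodetic_to_ecef(view.lat_deg, view.lon_deg, view.alt_m)
    ox, oy, oz = geodetic_to_ecef(obj_lat, obj_lon, obj_alt)
    dx, dy, dz = ox - vx, oy - vy, oz - vz
    lat, lon = math.radians(view.lat_deg), math.radians(view.lon_deg)
    # Rotate the ECEF difference vector into the viewer's local East-North-Up frame.
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    return east, north, up


onsite = Viewpoint(40.7829, -73.9654, 10.0)
# A remote VR player's virtual seat in the 3D Earth model, matched to the onsite player's position.
remote = Viewpoint(onsite.lat_deg, onsite.lon_deg, onsite.alt_m)
# Hypothetical AR creature roughly 11 m north of and 2 m above the viewpoint.
creature = (40.7830, -73.9654, 12.0)

if __name__ == "__main__":
    print(object_offset_enu(onsite, *creature) == object_offset_enu(remote, *creature))  # True
```

Because the remote viewpoint is defined in the same Earth coordinates as the onsite one, both players get the same offset to the creature. That shared result is what would let a remote player see, and act on, what the onsite player sees, provided the 3D Earth model is accurately aligned with the real place.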

Will it be “just like being there” for these virtual participants? No. But could it convey a strong visceral sense of place? Maybe. It may even encourage some of them to go there someday, just as high-quality coffee-table books of a sacred or special place may inspire an actual pilgrimage.

Even without being physically present, virtual AR is ultimately about engaging with, rather than escaping from, Planet Earth. In that sense, it agrees in principle with Niantic CEO John Hanke’s assertion that the Metaverse, as it is popularly envisioned today, is a dystopian nightmare.

Billions of Dollars at Stake

VR and AR haven’t gone exactly to plan so far, and it borders on amusing how many fundamental questions remain unanswered and are the source of confusion and internal disputes. Is less-than-360-degree imagery sometimes adequate? Will people want to wear future AR/VR headsets all of their waking hours? Is a small box tethered to a headset, providing more power, worth the trade-off? Has the COVID pandemic changed the playing field?

These are all billion-dollar questions, as is how the “race to the ground” for good 3D Earth models plays out. This race, though, has a unique freshness and healthiness. Why? Because it’s about Planet Earth, and that broadens the field of players and possibilities.

Michael Naimark’s recent projects include producing a six-hour intensive presentation series on mediated presence (check availability) (2022); spending four years developing a VR/AR curriculum and directing research into live, online, tele-immersion (and a cheap hack) at NYU Shanghai (2017–21); and producing VR Cinematography Studies as Google’s first-ever VR Resident Artist (2015–2016).

For over four decades, Michael has consulted and directed projects with support from Apple, Disney, Microsoft, Atari, Panavision, Lucasfilm, and Paul Allen’s Interval Research Corporation; and from National Geographic, UNESCO, the Rockefeller Foundation, NY MoMA, the Banff Centre, Ars Electronica, and the Paris Metro.
