With David Santiano, Deyin Mia Zhang, Xinran Azena Fan, and Yunru Casey Pan, NYU Shanghai; Barak Chamo, New York; Paolo Salvagione, Sausalito.
We’ve developed a cheap and simple way to mount a smartphone directly above a laptop display and dedicate it primarily to “Speaker View” during live teleconferences. Though many potential audiences exist, our target audience is college students who spend hours in classes where a host teacher gives presentations or leads seminars.
We were pleasantly surprised by how well a quick prototype worked. We looked extensively into the world of smartphone clamps, brackets, and mounts, and found nothing optimized for our needs, so we designed our own. It has room for improvement, but if it works for us here at NYU Shanghai, it should work for you elsewhere.
Many of you have access to 3D printers, so we are freely sharing our designs and look forward to your feedback and improvements.
We recognize that many of you expect to remain online-only through the spring and, hopefully, no longer. Thus, this is a MacGyver-style hack designed for rapid deployment. It is informed by a pre-existing three-year project at NYU Shanghai on immersive one-on-one teleconferencing.
What It Is
It’s a sliding two-part bracket that holds almost any smartphone, with or without a case, directly above the laptop’s camera. With Zoom, the current setup simply requires logging into the same account on both the laptop and the smartphone, then minimizing the Speaker or Matrix View during Screen Share on the laptop and swiping to Speaker View on the smartphone. The laptop camera remains in use; the smartphone serves only as a display.
Our current version is “Rev 8” of a resin-printed 3D model. We’ve also used a 3D filament printer but prefer the strength and heft of resin. For the strap, we’re using dual-lock velcro and exploring reusable stretchable nylon cable ties. Attaching the bracket to the back of the laptop display is tricky. We decided early on to use neither anything opaque (our students are very proud and protective of the expressive “real estate” of their laptop backs) nor magnets (even if fears about them are unwarranted). We’re currently using mini suction-cup pads and have found that they stick better to some surfaces than others. A reusable clear adhesive film is another option, and it completely solves these problems while being minimally invasive.
Why It Works
Ponder for a moment what it’s like to have a face-to-face interaction with someone.
Digitally representing in-person face-to-face interactions poses a host of challenges. First, the spatial and temporal resolutions should be high, higher than for representing many other things like apples, houses, or mountains, perhaps because human faces are so intricate, nuanced, and emotive. (Consider how disconcerting it is to watch a remote television news interview showing a high-res anchor and a noticeably lower-res interviewee side by side.)
Ideally, face representations should be 3D stereoscopic (the left and right eyes each seeing their own viewpoint), 3D “multi-view” (showing different viewpoints as the viewer’s head moves), orthoscopically correct (a fancy way of saying “not too big” and “not too small” but properly scaled), and properly accommodated (the viewer’s eyes focus at the same virtual distance as the representation). Between two teleconferencers, there should also be spatial consistency, lighting consistency (even catch-lighting consistency), and, of course, eye contact. All of these characteristics are achievable (we’ve explored them in our research), but at varying costs, and some matter more than others.
Our hack does not deliver all of these characteristics, but it noticeably edges ahead on some. For example, even though smartphones are 2D rather than 3D, they typically have double the pixel density of laptop displays. The represented face may not be orthoscopically correct (it’s still too small), but it’s generally larger than a face displayed beside a screen share on a laptop. Also, a face displayed on a smartphone directly above the laptop camera is closer to the camera than a face displayed on the laptop, so the gaze comes closer to true eye contact. There’s also a surprisingly pleasant effect, for viewers at both ends, when their eyes move between the laptop view and the speaker view with the camera in between.
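The eye-contact point comes down to simple geometry: the apparent gaze error is roughly the angle, seen from the viewer, between the camera lens and the eyes of the displayed face, or atan(offset / viewing distance). The short sketch below computes that angle. The distances are hypothetical, illustrative values, not measurements of our bracket.

```python
import math


def gaze_offset_deg(offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle in degrees between looking at the camera and looking at the on-screen face.

    offset_cm: distance, in the display plane, from the camera lens to the displayed eyes.
    viewing_distance_cm: distance from the viewer's eyes to the display.
    """
    return math.degrees(math.atan2(offset_cm, viewing_distance_cm))


# Hypothetical values for illustration only (not measured on our prototype).
VIEWING_DISTANCE_CM = 50.0
LAPTOP_FACE_OFFSET_CM = 12.0  # face beside a screen share, below the camera
PHONE_FACE_OFFSET_CM = 5.0    # face on a phone mounted just above the camera

if __name__ == "__main__":
    laptop = gaze_offset_deg(LAPTOP_FACE_OFFSET_CM, VIEWING_DISTANCE_CM)
    phone = gaze_offset_deg(PHONE_FACE_OFFSET_CM, VIEWING_DISTANCE_CM)
    print(f"laptop: {laptop:.1f} deg, phone: {phone:.1f} deg")
```

With these assumed numbers, moving the face from the laptop screen to the phone cuts the gaze offset from roughly 13.5 degrees to roughly 5.7 degrees. Shrinking the offset, rather than trying to hide it, is all the bracket does.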
Perhaps not surprisingly, separating the face view from the screen share view (often a screen full of text, diagrams, bullets, or code) feels good. It’s as if experiencing a face and experiencing a presentation activate different parts of the brain, and displaying them separately lightens the neurological load.
How We Got There
In spring 2018, we proposed an internal research project at NYU Shanghai called “Telewindow.” The idea was to surround a desktop display with special depth cameras, lights, mics, and speakers, plus some special optics in front, then build two of them, connect them, and see how close we could come to faithfully replicating an in-person one-on-one interaction. “By the way, we’re going to fail,” I promised our funder, reminding her that companies like Cisco and Microsoft have spent years and a great deal of money on this holy grail. But given new developments in cameras, depth sensing, computational photography, and machine learning, we felt that it would be an exciting space to freely explore.
During the summers of 2018 and 2019, we worked with recent graduates from Interactive Media Arts and Computer Science from NYU Shanghai, Abu Dhabi, and New York.
Then spring 2020 arrived. The covid pandemic was erupting (and alas, we were in China), and it was clear that much of the world would be going online. We decided to focus what we had learned on something practical for college students and their online classes, since a) they were “us,” and b) we could safely assume nearly all of them had a laptop and a smartphone.
We initially explored the predictable high-tech paths (computational view synthesis, re-lighting, and gaze correction; streaming volumetric video, multi-camera point-cloud merging, and autostereoscopic “glasses-free” 3D; and spatial sound) and tracked some exciting (and some not-so-exciting) new work before trying out our hack, which we had initially dismissed as uninteresting and trivial. After all, it’s simply a matter of display “real estate.” What’s the big deal?
Until we tried it.
One of the joys of working with emerging media is how counterintuitive the results can be. Things that you’re certain should work don’t, and others that seem to have no chance of success do. The only way to know is through creative experimentation. “Exploration over exploitation” (Alison Gopnik). “Art as unsupervised research” (Benjamin Bratton). “Working under the radar” (Red Burns). That artsy, cool-driven research can pivot to focused, needs-driven solutions and produce novel results is a narrative to be encouraged.
Try it yourself!
We’re grateful for support for Telewindow from NYU’s Technology-Enhanced Education Fund (Clay Shirky) and the NYU Shanghai Library (Xiaojing Zu), and especially grateful for a “rapid deployment” Special Research Grant from NYU Shanghai Vice Chancellor Jeffrey Lehman for this project.
Students and recent graduates who have worked with us from NYU Shanghai, Abu Dhabi, and New York include: Ada Zhao, Barak Chamo, Cameron Ballard, Danxiaomeng Ivy Huang, Deyin Mia Zhang, Diana Yaming Xu, Grace Huang, Jingtian Zong, Mateo Juvera Molina, Nathalia Lin, Runxin Bruce Luo, Tianyu Michael Zhang, Xincheng Huang, Xinran Azena Fan, Yufeng Mars Zhao, Yunru Casey Pan, and Zhanghao Chen.
The current team includes students Deyin Mia Zhang and Xinran Azena Fan, recent graduate Yunru Casey Pan, and artist-designers and external contractors Barak Chamo and Paolo Salvagione.
Ignazio Moresco, designer and NYU ITP alum, gave our team early inspiration and practical advice.
Marie Sester, a third-year online Psychology PhD student, convinced us to revisit this project physically after we had dismissed it intellectually.
David Santiano is NYU Shanghai Research Associate for Telepresence.
Michael Naimark is NYU Shanghai Visiting Associate Arts Professor.
Michael and David teach VR/AR Fundamentals, now in its fourth year.