The Natural User Interface | Gestures, Forgivability, Learnability
A review of current research and journal articles looking at the Natural User Interface and its learnability / usability across multiple demographics.
Cameron Friedlander
June 3, 2010
I called over to my 19-month-old son the other day and asked him how big elephants grow; he stretched his arms out as wide as they could go. I followed this up by having him size up an ant; he brought his hands together and made a tiny gesture to show its size. His first interaction with his environment was with his hands. He would point to any item he saw fit and gesture until I told him its name. This was well before he started developing any sort of vocabulary. These gestures seemed to be rooted in his learning: the more things I pointed out, the more he began to interact with them. It came down to his hands, his main communication platform before his vocabulary started to kick in. The gesture.
When we look at learnability in HCI across multiple user interfaces, we see a progression. The graphical desktop interface (GUI) is far better than the command line (CLI) for the average user, yet it is still built upon old ideas, principles, and values dating from when the personal computer first arrived on the market. This desktop metaphor, as it is frequently called, was there to help usher in generations of new users who were not familiar with this new computing device, the personal computer: a single input device, the mouse, backed up by design patterns and metaphors to help us feel safe. However, as the average user's demographics and familiarity change in a world surrounded by information and computers, there has been a want, a need, to interact with information differently, naturally. This has ushered in the trend of the Natural User Interface, or NUI. The sales of the Apple iPad, which is by no means the only NUI device on the market, are forcing this conversation to the table. But why are users adopting this device? Is there anything natural about the NUI? What are the critics saying about gestures? How integral are gestures to human behavior? Is this device naturally learnable and potentially easier to use?
The NUI / gestural interface is clearly in its infancy, but it is apparent that the NUI / tablet is in our near future. If there is any doubt as to how the tablet and the NUI are being embraced by our society, look no further than the sales of netbooks. Fortune magazine recently compared netbook sales today with those of one year ago. A year ago netbook sales were growing by 641%, whereas now they are growing by only 5%, a staggering decline over a one-year period. The article looks at the influence of the tablet and its effect on the market across desktops, laptops, and iPods. In 2008, in the wake of CES, Stephen Prentice of Gartner published the article "Gestural Computing: The End of the Mouse." He discusses the trends of the time and recommends that businesses do the following: start interacting with gestural devices immediately, whether it's a Nintendo Wii or an iPhone; immediately suspend all skepticism of gestural computing; and, lastly, "relax," because this is going to be a "long-term trend." He summarizes the movement as a paradigm shift: while gestural computing won't replace the keyboard overnight, it eventually will. He even looks at enterprise systems and states that the NUI could be integrated into them within three to five years of 2008. In essence, the NUI isn't going anywhere. But why? Why the NUI, and why gestures?
Dan Saffer, author of Designing Gestural Interfaces, set out to write a textbook on gestures since there didn't appear to be a single source of information on the topic. As a comparison he used the art of learning music, which is done through gestures of the hands: imagine how difficult it would be to learn the cello if one could not use one's hands to express the music and notes. To him, "gestures create meaning" (Designing Gestural Interfaces, pg. xv). However, the gestures integrated into a user interface need to support the motion, mechanics, and limitations of the physical body. He suggests that would-be designers of a NUI study the body to get a better understanding of these constraints and limitations.
The Microsoft Surface Table [2] was an early pioneer on the NUI front, and by no means flawless, but its designers gave careful consideration to the metaphors used to create a learnable environment, one that lends itself to discoverability instead of instruction and rote learning. In Using Metaphors to Create a Natural User Interface for Microsoft Surface (2010), Hofmeester and Wixon explore these metaphors and the usability testing they did to bring them to life. They looked at the idea of a magnet board, known as 'the hub': a communication space / bulletin board. The next metaphor was 'the circle', used to literally define the physical space within which a user is able to move in the environment. They enhanced 'the circle' with a metaphor called 'my personal moon', defined by a user's solar system of planets (circles) that users can manipulate contextually; similar items become orbiting planets. Lastly they looked at 'fireside chats': how people gather around a campfire to participate, tell stories, and explore thoughts. Together, the magnet board, the circle, the personal moon, and the fireside chat formed their set of candidate metaphors. They designed a new interface around these concepts and tested them to see which ones would rise to the surface. They settled on a combination of 'the circle' and 'my personal moon' to define the metaphor for the concept Sphere; with circles they could show a clear sense of hierarchy that users could recognize. They combined this spherical functionality with magnets, so that properties, applications, and contextual menus would stick to their related Spheres. So what are the basic gestural interactions used with these metaphors?
Saffer opens his chapter on design patterns (Chapter 3) with the following quote:
"It is the tactile sense that demands the greatest interplay of all the senses."
—Marshall McLuhan, interviewed in Playboy, March 1969
Never underestimate the power of touch. In my own user observation with the iPad [3], I worked with a young adult who had never used a paint program such as Adobe Photoshop. The end result was that she was able to learn the basic tools of the paint program easily and have fun doing it. Because she was able to paint with her hands, there was something far more natural about the experience than using a mouse. Even as she explored the menu looking for an item, she found new functions along the way. The observation therefore yielded more than one kind of learning: not only was she teaching herself how to use the program and having fun, she was learning new metaphors and design concepts for later recall. Saffer's gestural design metaphors/patterns are basic, but they highlight the crux of the gestural interface. Take the tap: this pattern can be used in multiple ways, and users tend to have little difficulty when its function switches; it's about context. This basic gesture carries over to my own observations of my 19-month-old son, whose first interactive physical gesture was to point. When I purchased an Apple iPad for him, it was only natural that he wanted to tap it, and he easily understood this concept. The tap is used not only to open / activate but also to select an item. The drag appears to be the most basic but also the most natural gesture: the tactile idea that one can touch something with a finger and move it. In the YouTube video A 2.5 Year-Old Has A First Encounter with An iPad (Telstarlogistics, 2010), the young girl easily picks up on this within minutes of her introduction to the iPad (at the 1:40 mark).
The flicking gesture surprised me only in the sense that I never expected my son to learn it without being taught, perhaps by accident (forgivability). By combining a quick version of select, drag, and release, he noticed that items would fly across the screen, furthering the relationship between the physical world and the interactive one. Simply tapping the item again would halt it, naturally. More complex gestures that I have not introduced to him, but that are basic to gestural devices, are the pinch, to shrink, and the spread, to enlarge. While adults appear to have no issue with these, I have yet to test, or show, this concept to my son.
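To make the difference between a tap, a drag, and a flick more concrete, here is a minimal sketch of how software might tell them apart from a single finger's trace. It is not taken from Saffer or from any shipping toolkit: the names `TouchSample` and `classify_stroke` and both threshold values are hypothetical, chosen only for illustration. A flick is modeled, as described above, as a quick select, drag, and release: a drag whose final movement before lift-off is fast.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot

# Hypothetical thresholds for illustration only; real platforms tune these per device.
TAP_MAX_DISTANCE = 10.0   # points: ending closer than this to the start counts as a tap
FLICK_MIN_SPEED = 1000.0  # points per second over the last segment before lift-off


@dataclass
class TouchSample:
    """One position report for a single finger (hypothetical structure)."""
    x: float
    y: float
    t: float  # seconds since touch-down


def classify_stroke(samples: list[TouchSample]) -> str:
    """Label one finger's trace, from touch-down to lift-off, as 'tap', 'drag', or 'flick'."""
    if len(samples) < 2:
        return "tap"  # a touch with no recorded movement
    first, last = samples[0], samples[-1]
    # Forgiving choice: a touch that ends close to where it began is a tap,
    # however long the finger rested on the screen.
    if hypot(last.x - first.x, last.y - first.y) < TAP_MAX_DISTANCE:
        return "tap"
    # A flick is a drag released quickly: measure speed over the final segment only.
    prev = samples[-2]
    dt = last.t - prev.t
    release_speed = hypot(last.x - prev.x, last.y - prev.y) / dt if dt > 0 else 0.0
    return "flick" if release_speed >= FLICK_MIN_SPEED else "drag"


if __name__ == "__main__":
    tap = [TouchSample(0, 0, 0.0), TouchSample(2, 1, 0.1)]
    drag = [TouchSample(0, 0, 0.0), TouchSample(100, 0, 0.5), TouchSample(200, 0, 1.0)]
    flick = [TouchSample(0, 0, 0.0), TouchSample(50, 0, 0.1), TouchSample(150, 0, 0.15)]
    for name, trace in [("tap", tap), ("drag", drag), ("flick", flick)]:
        print(name, "->", classify_stroke(trace))
```

Treating any stationary touch as a tap, regardless of duration, is one small example of the forgivability discussed in this essay: the user is not punished for resting a finger a little too long. Halting a flicked item with a second tap would belong to whatever code animates the item, which this sketch leaves out.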
How integral is the gesture to learning and to learnability? The assumption, I can only guess, is that gesturing helps people learn. What would happen if we weren't allowed to gesture? Would this hinder us? Help us? Cook, Mitchell, and Goldin-Meadow set out to answer that question (Gesturing Makes Learning Last, 2008). They knew from observing adults that gesturing is a basic part of human behavior, noting that even people who have been blind since birth use their hands to tell stories. In a study focused on children, they saw that kids who were allowed to gesture during the learning phase did significantly better than kids who were not. In later recall, four weeks on, the children who were allowed and encouraged to gesture maintained their learning 85% of the time, while kids who could not gesture fared much worse, at 33%. The data suggested that gestures supported knowledge change during learning, although the researchers could not say how they did so; but that was not the point. The point was simply to see the short-term and long-term impact on learning with and without gestures.
However, are gestures easily learnable, or must there be a common set of gestures, or a common environment in which a child grows up, in order to know what to do? Does extending one's arms mimic flying, and is this culturally relevant on a global scale? In the paper "Children's Intuitive Gestures in Vision-Based Action Games," the authors set out to create a fully immersive 3D gestural environment with which a child could interact. They wanted to test how easily children could pick up on the gestures needed to complete the game. They found that by using subtle animation to mimic the gestures they were looking for within the game, the children were able to pick up on them intuitively without having to run through a tutorial. So if the bird needed to fly, the sight of it flapping its wings pointed children in the right direction if they didn't know how. The other fascinating result of this game testing was that the avatar the children controlled didn't always have to mimic their exact gestural movement: if the bird (called QuiQui) needed to swim, children used different swimming styles, such as the doggy-paddle, breaststroke, and crawl, and it made no difference that the bird was actually swimming sideways on the screen. The children sought forgivability within the interface itself, as mentioned earlier. An item for further exploration would be the connection between learnability and a forgivable interface.
But what of the single point-and-click of the traditional desktop? How do children fare using common gestures, such as drag-and-drop, through these traditional means? A spelling game on the Apple iPad [1] encourages kids to place a finger on a particular letter of a word and drag it to the correct position. In my own observation of my child with this game, he was able to quickly figure out how to move the letters around, and to toss letters across the screen by flicking instead of dragging. In a study of kindergartners and first graders (Drag-and-drop errors in young children's use of the mouse, 2005), Afke Donker and Pieter Reitsma set out to determine why the drag-and-drop motion had been all but abandoned in traditional desktop educational software for children. The initial hypothesis was that it was too hard for a child to maintain pressure on the mouse button while moving the mouse over a particular distance, so perhaps a device could be designed to make this easier. However, the research showed that selecting and dropping were the actions with the least amount of forgiveness; much like a young child eating with a fork, it is harder to stab a particular item than to pick it up outright. So while the initial hypothesis was disproved, the problems persisted. Whether the mouse is a device children can easily digest and understand appears to be an issue; but what if a child could actually touch the item, as in the iPad game?
Looking at the opposite end of the spectrum, how do older generations fare with a NUI device? Are they able to comprehend the design metaphors behind the gestures easily? In the observational YouTube video Virginia's new iPad (Sackr, 2010), a 100-year-old begins to read books and write again using the iPad; the NUI doesn't seem to be an issue at all. Does this learnable, forgivable interface actually help her? In 2006 a group of researchers / software engineers evaluated a group of elderly people and teens using a new tabletop tool for photograph manipulation (Apted, Kay, Quigley, Tabletop Sharing of Digital Photographs for the Elderly). They ran the study in multiple parts, sometimes explaining the new interface and sometimes giving participants a one-page instruction sheet. Overall, the teens completed the tasks more quickly, but all of the elderly participants were able to complete the tasks with limited frustration. This device was a custom solution, neither a Microsoft Surface table nor an Apple product, so there appeared to be certain flaws in some of the new logic the researchers were creating. They did note that two-handed gestures appeared to be harder for the elderly to handle, but it was not disclosed whether this was a flaw in the design of the observation or an issue that persisted universally across all gestures. The main takeaway was that even with this somewhat flawed gestural device, the participants were all able to complete the tasks and recall them.
Does the NUI device have a natural learnability? A common complaint from the Nielsen Norman Group (NNG) is the lack of visual cues and menus (Norman and Nielsen, Gestural Interfaces: A Step Backwards in Usability). However, a recent YouTube sensation, approaching 1 million views, showcases a 2½-year-old using a tablet for the first time. She appears to master the basic gestures of the interface, as well as various applications, within the first five minutes (A 2.5 Year-Old Has A First Encounter with An iPad, Telstarlogistics). Likewise, my 19-month-old, with no instruction, was able to master drag, swipe, and flick. With a little further instruction he has gotten the hang of basic drag-and-drop functionality. There are issues and inherent flaws within the Apple iPad's NUI design, and it's no wonder, because the platform is so young. Unlike the Microsoft Surface table, which is built for many simultaneous touch points, much of the iPad's interaction expects a single action / touch at a time. In observing my own son, as well as the YouTube video (A 2.5 Year-Old Has A First Encounter with An iPad, Telstarlogistics), I noticed that kids like to rest one hand on the screen while performing an action / gesture with the other hand. This flaw can sometimes frustrate the child, who struggles to comprehend why an action that used to work no longer does; note that within a week my son was resting the tablet on the ground to limit this unwanted feedback.
So why is NNG so against the gestural interface, given its learnability, ageless adoption, and market momentum? Nielsen's take appears to focus more on the flaws of the iPad than on gestural systems as a whole. In iPad Usability: First Findings From User Testing (May 2010), he looks at inconsistent design across applications: pressing a photo in one app might do something different than in another. Can we not say the same of the web's metaphors as well? Nielsen went into more detail about how the tab bar at the bottom of the screen will, of course, be blocked by your arm when using the iPad. Saffer even states in Designing Gestural Interfaces that placing contextually related menu items where the hand will cover them is an obvious no-no; however, the iPad's tab bar does not serve that function. Nielsen also lists other flaws related to forgivability: various gestures can be performed in multiple ways to complete the same action, or can lead to unintended feedback. As mentioned in Children's Intuitive Gestures in Vision-Based Action Games, a child could swim, but there are many ways to actually swim. Can this be a benefit? By not requiring things to be done exactly one way (with no forgiveness or strict constraints), could it make the device easier for younger and older demographics to adopt, and so enhance its learnability? When observing my son, I noticed that his swiping gesture took on different manifestations as he worked it out (a long hard swipe, a tap, a flick in a particular direction), but it always had the same result: success. As my son got better at the gestures he would hit them more dead on, but by experimenting and learning through success, he never reached a moment of frustration so complete that he stopped using the device. This is the crux of learnability within a gestural environment. Teaching an elderly adult to use a desktop interface is quite a chore; getting them to come back to it is an almost impossible feat.
Yet here is Virginia, 100 years old, writing limericks and reading books on her iPad. What's even more interesting is that Nielsen doesn't explore the positive aspects of the device at all; he dismisses it outright due to its flawed architecture. Donald Norman goes so far as to identify a crisis within the HCI community and generalizes it to all gestural interfaces. Why is this?
Norman's take in Natural User Interfaces Are Not Natural is more high-level. He makes various comparisons to instances where gestures do not follow nature directly. In one example he highlights the Nintendo Wii bowling game, with its frequent, natural side effect of players releasing the controller as if it were a ball. He states that although he sees a place for gestures, they are neither better nor worse than the keyboard and mouse. His argument is that both are unnatural, and therefore gestures should follow the conventions of the keyboard and mouse. Really? Is the gesture not part of how we learn, as discussed in Gesturing Makes Learning Last by Cook, Mitchell, and Goldin-Meadow? Where exactly does the mouse fit into all of this; what is the comparable metaphor? Norman does have respect for gestures, but in a later article, to be published in Interactions magazine, he gets bolder: "Gestural Interfaces: A Step Backwards in Usability." He opens this article with the words "One step forward, two steps back." The implication is that we are now worse off with the trend toward gestural devices; on May 28th, 2010, at around 12 p.m. Central, @NNgroup started tweeting this. His concerns are as follows, again attacking gestures through the usability of the iPad, as if it were the sole gestural device (see Nielsen's iPad Usability: First Findings From User Testing): there is no set of established guidelines; Apple and Google ignore established conventions; and the developer community building gestural applications ignores the rules of conventional HCI. Is there not an inherent conflict of interest here? Can we, and should we, heed the calls of Nielsen and Norman, seeing how invested they are in the GUI? Is the NUI not innovative and emerging? Where is the root of innovation? Is it in universal design principles?
In a 2009 BBC interview (Cellan-Jones, Rory. "Listening to Mr. iPhone.") with the iPhone designer Jonathan Ive, Ive stated that Apple is solely about the product and innovation. If you spend too much time defining everything, you end up with nothing: "If you have to spend time institutionalizing that, talking about it, you end up chasing your tail." How does Apple manage to be so customer-centric with its devices? "'We don't do focus groups,' he said firmly, explaining that they resulted in bland products designed not to offend anyone." The article itself quotes Henry Ford, who said that if he had asked people what they wanted, they would have said "a faster horse." In essence, the iPad is in its infancy; it's a child we will look back on and learn from. NNG should not be bashing an early pioneer in this foray into new interface design. It's not as if Apple is without rules when it comes to the NUI. In fact, Apple is very protective of it, so much so that it refuses to put Adobe Flash on its devices because doing so would lead to usability chaos. A gesture that means one thing can be overridden to mean another; Steve Jobs speaks to this in his open letter to Adobe (Thoughts on Flash, 2010), where he discusses how gestures and the meanings behind them can be overridden within the Flash framework, creating a bad user experience. It's a start.
So what does the future of gestural computing look like? We can see its learnability across multiple demographics and in various contexts, from education to gaming, browsing, reading, and other everyday tasks. Sales of these devices are increasing tremendously, putting a halt to other devices such as the netbook, a miniature PC built on the existing point-and-click desktop metaphor, which seems even more flawed on a tiny device. Clearly some forefathers of usability take issue with this new interface, but at the same time there does seem to be respect for it. Recent technology developments in 3D, particularly augmented reality, may in fact bring on the next revolution or paradigm shift in interface design. Augmented reality (AR) allows users to literally augment their physical reality in various ways using a device such as a phone or webcam: either by manipulating elements that appear in their webcam view but do not physically exist, or by navigating their surroundings with a phone that displays contextual information using GPS and its live video feed. Nils Petersen and Didier Stricker, in Continuous Natural User Interface: Reducing the Gap Between Real and Digital World, seek to bring gestures and augmented reality together to create the next generation of interfaces, the continuous natural user interface (CNUI). The issue with augmented reality is that users need to break the paradigm to switch between interacting with reality and interacting with AR; to use real-world gestures within an augmented reality environment, one would need gloves and other devices. Their proposed experimental system takes real-world artifacts, including gestures, and brings them into the AR experience. Essentially, if you can take items from the physical space and bring them into the AR space without stepping out of context and switching paradigms (for example, moving from the NUI to a desktop just to scan a document), we would be one step closer to a literal augmented reality.
By employing the same natural gestures of the physical world and allowing items to be scanned on the fly through what they call "The Hub," they are able to transfer elements from one space to the next; as a user you are able to manipulate them the same way you would, say, a piece of paper in the real world. Henderson and Feiner aim to do a similar thing within the AR space (Opportunistic Controls: Leveraging Natural Affordances as Tangible User Interfaces for Augmented Reality, 2008) by focusing on similarities in how we manipulate real-world objects that will be augmented with contextual information. In this particular study they looked at motors, which have various needs and affordances that must make their way into the AR space. Examples would be dials, knobs, screws, and the like, which the user could manipulate within the virtual AR space while receiving contextual aid. They call these opportunistic controls (OC) and aim to use them across multiple verticals and categories. So as the NUI begins its journey to becoming the dominant interface, people are already thinking about its next step: augmented reality. It is not hard to imagine a future where we travel through physical space with augmented support, using literal gestures to manipulate the physical space, backed up by virtual gestures (NUI) to manipulate objects, information, or affordances in the virtual space (AR). The future is coming, Nielsen and Norman; are you with us?
[1] FirstWords Animals <http://itunes.apple.com/us/app/firstwords-animals/id294536447?mt=8>
[2] Microsoft Surface <http://www.microsoft.com/surface>
[3] Learnability and the NUI / iPad <http://posterous.nullintovoid.com/user-observation-learnability-and-the-nui-ipa>
Petersen, Nils, and Stricker, Didier. “Continuous Natural User Interface: Reducing the Gap between Real and Digital World.” Proceedings of the 2009 8th IEEE International Symposium on Mixed and Augmented Reality. Oct 2009
Cook, Susan Wagner, and Mitchell, Zachary, and Goldin-Meadow, Susan. "Gesturing Makes Learning Last." University of Rochester, Department of Brain and Cognitive Sciences. Cognition, Vol. 106(2), Feb. 2008. pp. 1047-1058.
Norman, Donald A. “The Way I See It: Natural User Interfaces Are Not Natural.” Interactions. May 2010
Apted, Trent, and Kay, Judy, and Quigley, Aaron. “Tabletop Sharing of Digital Photographs for the Elderly.” Proceedings of the SIGCHI conference on Human Factors in computing systems. Apr 2006
Henderson, Steve J., and Feiner, Steven. "Opportunistic Controls: Leveraging Natural Affordances as Tangible User Interfaces for Augmented Reality." Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology. Oct 2008
Höysniemi, Johanna, and Hämäläinen, Perttu, and Turkki, Laura, and Rouvi, Teppo. "Children's Intuitive Gestures in Vision-Based Action Games." Communications of the ACM, Volume 48, Issue 1 (January 2005), pp. 44-50.
Hofmeester, Kay, and Wixon, Dennis. “Using metaphors to create a natural user interface for Microsoft Surface.” Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems. April 2010
Saffer, Dan. Designing Gestural Interfaces. Sebastopol, CA. O’Reilly Media, Inc., 2009.
Prentice, Stephen. "Gestural Computing: The End of the Mouse." Gartner. Feb. 2008
Donker, Afke, and Reitsma, Pieter. “Drag-and-drop errors in young children’s use of the mouse.” PI Research. 26 July 2005
Cellan-Jones, Rory. "Listening to Mr. iPhone." BBC News. 1 July 2009. 31 May 2010 <http://www.bbc.co.uk/blogs/technology/2009/07/listening_to_mr_iphone.html>
Nielsen, Jakob. “iPad Usability: First Findings From User Testing.” Useit.com Alertbox, 10 May 2010. 20 May 2010 <http://www.useit.com/alertbox/ipad.html>
Norman, Donald, and Nielsen, Jakob. "Gestural Interfaces: A Step Backwards in Usability." Nielsen Norman Group. 30 May 2010 <http://jnd.org/dn.mss/gestural_interfaces_a_step_backwards_in_usability_1.html>
Jobs, Steve. "Thoughts on Flash." Apple. April 2010. 31 May 2010 <http://www.apple.com/hotnews/thoughts-on-flash/>
Beecher, Fred. "Usability Ain't Everything – A Response to Jakob Nielsen's iPad Usability Study." It's All About Interaction. 26 May 2010. 30 May 2010 <http://johnnyholland.org/2010/05/26/usability-ain%E2%80%99t-everything-a-resp...>
Buckleitner, Warren. "Is the iPad a Child's Best Friend." The New York Times: Gadgetwise. 30 Jan 2010. 30 May 2010 <http://gadgetwise.blogs.nytimes.com/2010/01/30/is-the-ipad-a-kids-best-friend/>
Elmer-DeWitt, Philip. "How the iPad gobbles up netbook sales." CNN Money: Fortune Magazine. 6 May 2010. 31 May 2010 <http://tech.fortune.cnn.com/2010/05/06/how-the-ipad-gobbles-up-netbook-sales/>
"Virginia's new iPad." Video. YouTube. Sackr. 7 April 2010. 30 May 2010 <http://www.youtube.com/watch?v=ndkIP7ec3O8>.
"A 2.5 Year-Old Has a First Encounter with An iPad." Video. YouTube. telstarlogistics. 5 April 2010. 30 May 2010 <http://www.youtube.com/watch?v=pT4EbM7dCMs>.