Turning Research into Practice: Characteristics of Display-Based Interaction

Marita Franzke*

Institute of Cognitive Science
University of Colorado, Boulder, CO 80304

*The author can now be reached at:
U S WEST Technologies,
4001 Discovery Drive, Boulder, CO 80303
e-mail: mfranzk@advtech.uswest.com

Abstract

This research investigates how several characteristics of display-based systems support or hinder the exploration and retention of the functions needed to perform tasks in a new application. In particular it is shown how the combination of the type of interface action, the number of interaction objects presented on the screen, and the quality of the label associated with these objects interact in supporting discovery and retention of the functionality embedded in those systems. An experiment is reported which provides empirical evidence for Polson & Lewis's CE+ theory of exploratory learning of computer systems [11]. It also extends this theory and therefore leads to a refinement of the cognitive walkthrough procedure that was derived from it. The study uses an experimental method that combines observations from realistically complex task scenarios with a detailed analysis of the observed performance.

Keywords:

exploration, retention, display-based systems, direct manipulation, cognitive theory, cognitive walkthrough, experimental method.

Introduction

Researchers today agree that learning by guided exploration is not only a preferred mode of knowledge acquisition, but also one of the more successful ones [3,13]. The study reported here builds on this assumption and investigates the process of task-oriented exploration at a small grain size. It asks how design decisions embedded in commercially available display-based systems assist in exploratory search, how exploratory performance is related to forgetting, and how systems could be designed to support exploration and retention better.

Characteristics of discretionary use of software

At present, a growing number of users of new software are computer-literate. These users will have used a variety of computer applications before encountering a new one, and are therefore more likely to attempt learning new functionality by exploration rather than by reading manuals and tutorials [13,14]. These discretionary users may also use systems sporadically, since many applications support very specialized functions that are not needed every day. Graphing applications are one example of these specialized systems. If applications are only used occasionally, retention of once-discovered functionality becomes a usability issue. Hence, discovery and retention of functionality are two important aspects of discretionary use of software.

This means that the design of application software should support the exploratory activities of computer-literate users, even for applications that are more complex than the often-cited example of a walk-up-and-use system, the ATM. Secondly, interfaces should provide ample cues to casual users, so that once-discovered functionality can be easily remembered or rediscovered after a longer interval of non-use.

Graphical user interfaces (GUIs), or display-based systems in our terminology, were once hailed as solving these two problems of explorability and retainability simply by displaying relevant information to the user [e.g. 15]. Early empirical work, however, showed that exploration of display-based systems may be difficult, if not impossible, at least for novice users [2]. A cognitive model of exploratory behavior with computer systems will help to explain why these computer users failed to discover important functionality.

Exploration

Polson & Lewis's [11, 12] CE+ addresses the problem of explorability of interfaces in the framework of a theory of problem solving. It is assumed that task-centered exploration of a new interface is guided by general task goals, such as "make a new graph from the data in spreadsheet". Having formed such a goal, users will search the interface for an object that promises progress towards that goal. This search is guided by label-goal matches. If an interface object (such as a menu item, a button, or an icon) displays a label that matches the goal (for example, 'graph', or 'chart'), users will select it, given that they know how to take an action on this particular object. After executing an action, users evaluate the system feedback. If they conclude that they made progress towards their goal, the search-action cycle continues, until they have completed their goal in this manner.
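The search-action cycle described above can be illustrated with a minimal sketch. It is not part of CE+ itself: the class `InterfaceObject`, the function `explore`, the exact-match rule for labels, and the example screens are all assumptions made for illustration, and the evaluation of system feedback is reduced to simply moving on to the next screen.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class InterfaceObject:
    label: str   # text displayed on the object (hypothetical example data)
    action: str  # action needed to operate it, e.g. "menu-select"


def explore(
    goal_terms: Set[str],
    screens: Sequence[Sequence[InterfaceObject]],
    known_actions: Set[str],
) -> Tuple[List[str], bool]:
    """Follow label-goal matches screen by screen.

    Returns the labels selected so far and whether the goal was completed.
    Assumption: a label "matches" when it equals one of the goal terms.
    """
    path: List[str] = []
    for objects in screens:
        match = next(
            (o for o in objects
             if o.label.lower() in goal_terms and o.action in known_actions),
            None,
        )
        if match is None:
            # No matching label, or the required action is unknown:
            # the exploratory search fails at this step.
            return path, False
        path.append(match.label)
    return path, True


menu_bar = [
    InterfaceObject("File", "menu-select"),
    InterfaceObject("Edit", "menu-select"),
    InterfaceObject("Graph", "menu-select"),
]
graph_menu = [InterfaceObject("New Graph", "menu-select")]
print(explore({"graph", "new graph"}, [menu_bar, graph_menu], {"menu-select"}))
```

In this sketch the search stops either because no label matches the goal or because the user does not know the required action, which correspond to failure points two and three of the theory.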

There is empirical evidence both for the importance of well-matched labels [e.g., 4] and for the importance of differentiating and recognizing interface objects and knowing which actions are available on them [2]. The CE+ theory identifies four critical points at which the exploratory search can fail: (1) users can form an inadequate goal, (2) they might not find the correct interface object (because of poor label match), (3) users may not know how to execute an action on the relevant interface object, and (4) they may receive inappropriate and misleading feedback. The experiment reported here focuses on points two and three. It asks whether the discovery of the appropriate interface object is also dependent on the number of distracting interface objects, and on the type of action that will be necessary. In particular, its aim is to identify whether the goodness of the label, the type of action, and the number of interface objects interact. The result of the experiment should lead to a refinement of the model of the search process.

Retention

Several empirical studies have shown that recall of important parts of display-based interfaces is poor even for frequent users of these systems, who can demonstrate virtually flawless performance in using these same features in the context of the application [9, 10]. Theoretical accounts of display-based interaction are therefore based on the assumption that recognition rather than recall of commands and interface objects drives the interaction [e.g., 7, 8, 10]. The experiment described here compares performance at a short and a longer retention interval. This will answer the question of whether display-based computer skill is indeed robust against forgetting. It also concerns itself with the question of whether the particular design features listed above (label match, number of interface objects, and type of action) will have an effect on the retainability of particular interactions.

Turning research into practice I: the cognitive walkthrough

The cognitive theory of Polson and Lewis [11] discussed above has previously been extended into a method for the evaluation of walk-up-and-use interfaces, the cognitive walkthrough [12, 17]. This method helps interface designers and engineers to evaluate a system or system specification by decomposing various user goals into chains of interface actions (such as menu selections, button presses, etc.). For each goal and for each action in accomplishment of a goal, the evaluator considers a series of criteria, derived from the four-step cognitive model: (1) whether the user will have trouble forming an appropriate task goal, (2) whether an action is clearly available and whether the label associated with the action matches the users' representation of the task goal, (3) whether the user will know how to execute an action, and (4) whether the system feedback is clearly interpretable.

The method has been introduced into industrial use and has received various criticisms in the literature [e.g. 16,17]. Most of these reviews center on procedural aspects of the method which have been taken into account in newer versions of the cognitive walkthrough [17]. Some other criticisms concern the method's ability to detect a broad range of usability problems. One of the points brought forward [16] is that the method is too narrowly focused on identifying linguistic difficulties, namely mismatches between the users' (linguistic) representation of a task goal, and the (linguistically expressed) label of the goal. The method gives little guidance in determining whether a graphical layout of the screen design is more or less conducive to finding an important object. Furthermore, it is difficult to estimate whether a particular interface action (a button press, for example) will be known to a group of users. This is especially problematic when the new application is for a large and relatively diverse user group. By extending CE+, the experimental results reported in this paper can also be used to add more specific evaluation criteria to the walkthrough procedure. In particular, the results will inform us about how the number and grouping of interface objects affect search, and will provide us with information about the relative difficulty of a range of different interface actions. The results of the retention trials will also show whether difficulties identified with the walkthrough procedure will only be problematic during exploration or also for application after longer time delays. This information will be integrated into the walkthrough in the discussion section.

Overview of the study

In the experiment, experienced users of Macintosh systems learned a new application, one of four graphing systems (see below). They were asked to create a graph and make several modifications to the default graph that the system brought up. The subjects participated in two of these trials, each time with different data and superficially different modifications. The first of these trials was the exploration trial, in which the subjects had to discover the necessary functions on their own. The second trial (experienced performance) was administered either after a short (a few minutes) or a long (one week) retention interval. Comparison between these two delay conditions allowed for an observation of the effect of forgetting on performance.

METHOD

Subjects

Thirty-three males and forty-three females participated in this experiment. The subjects had an average of 2.8 years of Macintosh experience, and were familiar with an average of 3 different Macintosh applications. The majority of subjects (72%) had additional experience with PC's. On the average, subjects had 1.6 years experience with PC's, and knew 1.8 PC applications. None of the subjects had used any graphing applications before. The age range was 15 to 44 years, with an average of 25 years. Subjects were paid $15 for participation. The data of four subjects were excluded from the analysis, because of failures to complete the task in the exploration phase, for a total of 72 subjects.

Design and Materials

Subjects were randomly assigned to one of eight experimental groups (four interfaces by two delay conditions). The interfaces were CGI, CGIII, EXC tool, and EXC menu. Subjects were assigned to one interface and used it throughout the whole experiment. Half of the subjects performed the second (experienced) trial after a short delay (approximately ten minutes), and the other half after a long (one week) delay. Subjects worked on an Apple Macintosh IIcx with a 13'' color monitor, set to black and white. The screen interactions were videotaped over the subjects' shoulders and an audio-track was recorded.

Task and Procedure

Tasks: Subjects were provided with a HyperCard stack of instructions for both tasks. They were told which subgoals to complete in which order. Specifically, they had to (1) create a line graph from data in a file provided to them. Then they received a sample graph and instructions on what formatting changes to perform to match the default graph to the sample graph. Subjects were instructed to (2) move the legend to a different location, (3) change the font size of the legend text, (4) change the line and symbol style of the plotted graph, (5) change the font and style of graph title, (6/7) change the font and style of both axes, and (8/9) edit the title and x-axis title content. The instructions provided subjects with detailed information on what to do, but no hints on how to accomplish it. The tasks in the exploratory and experienced session were isomorphic, but subjects were provided with different sets of data and different sample graphs in both cases. The instructions, and sample and default graphs were identical in all interface conditions. The instructions were provided in a HyperCard stack, which overlapped with the application window. If the subjects wanted to read the instructions, they had to click on the stack to bring it to the front (see FIGURE 1). This procedure allowed us to account for the time subjects spent in reading the instructions.

FIGURE 1 Example of instruction card.

Procedure: On arrival, subjects filled out a brief questionnaire about their computing background. After this they completed a simple editing task in which they were warmed up to the window switching procedure involving the instructions. They then started the first graphing task. The experimenter stayed in the room with the subject during these tasks and provided brief, action-oriented hints if subjects had not made any progress toward the next correct action for more than two minutes. After completion, half of the subjects received another editing task (as distracter), and the second graphing task (the experienced trial). The other half of the subjects did these two tasks (the second editing and graphing tasks) one week later.

For each correct action step in fulfillment of the tasks we will report on the action time (time to find the correct action - time to view instructions) and number of hints needed. The results are reported in two sections: First, global results that concern overall performance measures are summarized. Second, results from the detailed analyses are reported. In that section a description of the coding of the design parameters and their effect on the subjects' performance is given. A more complete set of analyses can be found in [6].
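The action time measure defined above (time to find the correct action minus time spent viewing instructions) can be made concrete with a small sketch. The function name `action_time` and the representation of instruction viewing as a list of (start, end) intervals in seconds are assumptions for illustration; the original coding was done from videotape and its exact procedure is not described here.

```python
from typing import Iterable, Tuple


def action_time(
    step_start: float,
    step_end: float,
    instruction_intervals: Iterable[Tuple[float, float]],
) -> float:
    """Time for one action step, excluding time spent reading instructions.

    All values are in seconds. Only the part of each instruction-viewing
    interval that overlaps the step window is subtracted.
    """
    viewing = 0.0
    for begin, end in instruction_intervals:
        overlap = min(end, step_end) - max(begin, step_start)
        if overlap > 0:
            viewing += overlap
    return (step_end - step_start) - viewing


# A 60-second step during which the instruction stack was in front
# from 10 s to 25 s, and again from 55 s until after the step ended.
print(action_time(0.0, 60.0, [(10.0, 25.0), (55.0, 70.0)]))  # 40.0
```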

RESULTS

Global Results

Effects of Training and Delay: Mixed two-factorial MANOVA's (trial, repeated; condition, between-subjects) with a covariate controlling for Macintosh experience were performed on action times and the number of hints. See FIGURE 2 for an illustration of the group means underlying these analyses.

There was a main effect of trial for action times (F(1, 59) = 95.83, p. < .01) and for hints (F(1, 54) = 63.76, p. < .01). Subjects' performance showed a sharp improvement between the first and the second trial across interfaces and delay conditions (see FIGURE 2). Overall, subjects were able to cut their action times in half, from an average of about fifteen to an average of about seven minutes for the whole task. This result shows that interface-literate users were able to discover functionality in a new system in a reasonable amount of time, without extensive use of external help, and were able to use the discovered methods efficiently in a second trial.

There were no significant main effects of delay for action times (F(1, 59) = .24, p. > .05) or for hints (F(1, 54) = 2.21, p. > .05). However, the interaction between trial and condition was significant for action times (F(1, 59) = 4.64, p. < .05). This result shows that the overall performance time decrease was influenced by the delay. The performance time decrease was smaller when the second trial happened after a one-week delay (see FIGURE 2).

There was no significant interaction between trial and delay condition for the number of hints (F(1, 54) = .15, p. > .10). These results indicate that while learning effects are strong, as can be expected against the background of theories of skill acquisition [e.g., 1], forgetting plays a surprisingly small role in performance with display-based systems. If users have not used a new system for a longer time period, they need more time to perform the same tasks. However, the observed forgetting effects are relatively small when compared to the large savings between the first and second trial, and their significance could be debated on practical grounds. In a previous study that investigated performance on a transfer task, we had found that forgetting may play a role only in complex interactions, but not in simple ones [5]. The analyses reported below investigate the exact locus of the delay effects, and try to relate the difficulties in exploration and the observed forgetting effects to several design parameters.

Overall performance analyses also showed significant differences between the four systems, but these differences disappeared when the number of action steps for the task was controlled for [6].

FIGURE 2. Effect of delay on second task. Long = one week delay, short = ten minutes delay between first and second task.

Detailed Analyses and Results

If the current task artifact analysis is to inform further designs of display-based applications, we need to refine our level of analysis. For any type of design recommendations we need to know which type of interactions were difficult to discover during the exploration phase (first task), and which interactions were responsible for the forgetting effects observed in the global results.

Analysis by Subgoals: To answer these questions, the analysis was first taken to the level of subgoals. For this level of analysis we will only report results for the action times to reduce complexity in the presentation.

Simple ANOVA's on action times associated with trial 1 (exploration) and on differences between the delay conditions on trial 2 show significant effects due to subgoals (F(9,630) = 49.00, p. < .01 for exploration times and F(9,621) = 3.41, p. < .01 for the differences between delay conditions). For an illustration of this effect see FIGURE 3. Separate analyses of variance were performed on the overall action times associated with each subgoal, testing whether there were differences between the two delay conditions. The arrows in FIGURE 3 point to the subgoals where delay differences were statistically as well as practically significant, F(1,70) = 13.27, p. < .01 for 'create graph', F(1,70) = 11.15, p. < .01 for 'change legend', and F(1,70) = 10.95, p. < .01 for 'edit title 1'.

FIGURE 3. Action times during task 1 (exploration) and task 2. Arrows indicate significant delay effects.

An inspection of the graph in FIGURE 3 illustrates that these subgoals were also associated with particularly long exploration times. All three subgoals introduced situations in which subjects had to discover and learn a completely new method. They had never created a graph before, they had to acquire a general method for modifying objects in the graph (change legend), and they had to discover another method for editing text associated with graph objects. Along these lines we suggest that long exploration times and some forgetting effects due to long retention delays may appear in situations where completely new methods need to be discovered. In all other subgoals, where transfer of old (move) or newly learned methods (e.g. change title font) was possible, exploration times were lower, and no forgetting due to the delay appeared. One exception to this rule appears to be the subgoal 'changing line and symbol style', where performance on the second task was very poor for both conditions.

Analysis by Action Steps: To determine exploration and retention difficulties further, the analysis was finally taken down to the level of individual action steps. For this, we included the individual action steps that comprised the three subgoals of interest in the regression analyses described below. FIGURE 4(a/b) provides an example of the level of detail of this analysis. FIGURE 4a displays the first action step for the subgoal 'create graph' for Cricket Graph III: select menu-bar item 'graph'. FIGURE 4b shows the second action step: select menu-bar item 'new graph'. For each action step (e.g. 'click on menu-bar item 'graph'', FIGURE 4a) three variables were encoded: (1) the type of action, (2) the semantic distance between goal and label, and (3) the number of objects competing for attention. For the type of action we recorded what type of interaction needed to be performed by the subject (e.g., menu bar selection, button click, move operation, tool selection, etc.).

FIGURE 4. First two action-steps to create a graph in CGIII.

For the semantic distance it was assumed that subjects represented their active goal in terms of the task description. For example, if the assumed goal was to 'create a graph', then a menu item with the label 'graph' was defined as a semantic difference of 0, a menu item with the label 'chart' received a value of 1 (synonym), the label 'drawing tools' a value of 2 (semantically related, but inference required), and the label 'file' a value of 3 (no direct semantic link, connection has to be learned). Finally, for the objects competing for attention, each object in the relevant object group was counted, as well as the number of competing object groups (e.g. for FIGURE 4a: 0 competing menu items (greyed out) + 1 for the menu bar + 1 for column labels in the spreadsheet + 1 for spreadsheet entries = 3).

Exploration and Experienced Performance. The design parameters derived above were used in three sets of simple and multiple regressions, to explore their effects on the action times. Details on these analyses and their results are reported in [6]. Here, we will focus on the illustration and discussion of the significant effects.
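The coding scheme for the three design parameters can be written down as a small data structure. This is only an illustrative sketch: the names `SEMANTIC_DISTANCE`, `CodedStep`, and `competing_objects` are hypothetical, and the category keys paraphrase the verbal definitions given in the text rather than reproduce an actual coding manual.

```python
from dataclasses import dataclass
from typing import Sequence

# Semantic distance between goal and label, as defined in the text:
# 0 = identical term, 1 = synonym, 2 = related (inference required),
# 3 = no direct semantic link (connection has to be learned).
SEMANTIC_DISTANCE = {"identical": 0, "synonym": 1, "related": 2, "unrelated": 3}


def competing_objects(objects_in_group: int, competing_groups: Sequence[str]) -> int:
    """Count objects in the relevant group plus one for each competing group."""
    return objects_in_group + len(competing_groups)


@dataclass
class CodedStep:
    action_type: str        # e.g. "menu bar selection", "button click"
    semantic_distance: int  # value from SEMANTIC_DISTANCE
    n_objects: int          # result of competing_objects()


# The FIGURE 4a example: 0 competing (greyed-out) menu items, plus the menu
# bar, the spreadsheet column labels, and the spreadsheet entries.
step = CodedStep(
    action_type="menu bar selection",
    semantic_distance=SEMANTIC_DISTANCE["identical"],  # label 'graph', goal 'create a graph'
    n_objects=competing_objects(0, ["menu bar", "column labels", "spreadsheet entries"]),
)
print(step)
```

Under these assumptions the FIGURE 4a step is coded with a semantic distance of 0 and three competing objects, matching the count given in the text.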

FIGUREs 5-9 demonstrate that the three coded design parameters indeed had the predicted effects on subjects' action time performance. FIGURE 5 shows that there was an effect of label quality: action times were longer with a larger semantic difference between the labeling of an object and the subjects' representation of the goal.

FIGURE 5. Time per action step by label quality and trial.

FIGURE 6. Time per action by number of objects and trial.

Furthermore, there was a three-way interaction between these two design parameters (number of objects and semantic difference) and trial, so that search time during trial one due to semantic difference was elevated additionally if there were many objects to search (FIGURE 7). If the semantic differences between labels and goals were small, there was no effect of the number of objects on display, and in fact, the action times during trial one did not seem to be very different from the action times during trial two.

There was also a significant effect of the type of interaction on action times that varied with trial. During trial one (exploration) some interactions were especially difficult to discover; FIGURE 8 displays the types of interactions in order of difficulty. Some of the more difficult interactions were dragging and dropping of items, clicking on tool icons, and double- and single-clicking on objects.

FIGURE 7. Time per action by number of objects, quality of label and trial.

On the low end of the action time spectrum were selections of buttons, selections of menu items and of lists in dialog boxes. None of the more difficult interactions were labeled in any meaningful way. The easier interactions, however, were associated with labels. It seems that in searching a new application for methods, subjects oriented themselves towards reading labels, rather than blindly trying direct manipulation operations on objects or icons. The order of difficulty of these interaction types also explains the strong correlation between semantic distance and interaction type that was reported in the beginning of this section.

FIGURE 8. Time per action by type of interaction and trial.

Finally, we observed a triple interaction of type of interaction, the number of objects on display, and trial. If there were many objects to search, action times during trial one were inflated especially for interactions that were difficult to discover. FIGURE 9 displays this effect for several interaction types. Unfortunately the combinatorial set of all action types with all numbers of display objects was not complete in our sample.

FIGURE 9. Type of interaction by number of objects and trial.

Exploration Summary: We found that the three coded design parameters, semantic difference, number of objects on display, and type of interaction, had effects on action times. Action times during the exploration trial were affected more strongly than times during later performance (trial two). Once subjects had acquired a new action they seemed to be able to use this knowledge immediately in further interactions. Additionally, there were interactions between some of the design parameters. The pattern of these interactions suggests that subjects first search interface objects with labels, and only consider unlabeled choices (e.g. iconic tools) later. Second, well-labeled objects will be considered and found faster, and the number of objects in the display does not seem to affect search times for such situations. If the relevant object is labeled poorly, more items will be considered in depth, so that the number of objects to search will elevate the action times even further.

Delay Performance. In the analyses by subgoals we noted that performance on subgoals where new methods had to be discovered was more prone to performance decrease due to the delay manipulation. The following analyses investigate these effects at a lower level of detail. Only the actions from subgoals (3) create graph, (4) change legend, and (9) edit title 1, from trial two were included in these analyses. As with exploration performance, we found strong effects of all three parameters on the differences between the two delay conditions. The same factors that led to poor performance during search also led to poorer performance after a one-week delay between the exploration and second session. Subjects were more prone to forgetting when they had to use direct manipulation techniques, when the action was poorly labeled, and when there were many objects to search. In no case was this performance close to the long search times during exploration, however. In other words, the more the correct interpretation of the display depends on retrieval of specifically learned information from memory (learned meanings of labels, learned interaction techniques), the stronger the influence of forgetting on performance.

Overall Summary: We were able to show that exploration and delay performance varied by subgoal, and the results suggest that longer exploration times and poorer delay performance are both associated with subgoals where new methods have to be learned. Analysis of action times and numbers of hints provided at the level of individual action steps showed that the type of interaction, the semantic match between label and goal, and the number of objects on the display are all reliable predictors of action times and the numbers of hints provided. Furthermore, all three design parameters interacted with trial, so that they influenced performance most clearly during the exploration trial. An analysis of the interactions between the parameters, where possible, suggested that the number of objects to search impairs exploration performance only when a simple semantic match between the label and the goal is not possible. Finally, we found that the three design parameters also influence performance after a week-long delay. After a one-week delay, performance is worse (action times are longer, and more hints have to be provided) when interaction objects are unlabeled or poorly labeled, and when many objects have to be searched.

DISCUSSION

In this section we will first discuss the theoretical implications of the results. Then the extended model of exploratory search will be used to refine the cognitive walkthrough evaluation criteria.

Lessons about exploratory search

First, we have found that discretionary users of display-based systems are able to learn a new application by task-oriented exploration in a reasonable amount of time, if support in the form of occasional hints is available. On the average, the subjects in our study were able to get through the first task in about fifteen minutes, and needed approximately six hints to do so. Furthermore, they were able to use the knowledge acquired during this exploratory sequence in a second task, where they cut their performance time in half, and needed only about one hint per trial.

We have also found ample evidence for the label-following strategy suggested by Polson and colleagues [4, 11]. Action times, which can be interpreted as search times for the correct action, were clearly related to the quality of the labels provided in the interface. If there was a clear overlap between the subjects' goal description and the label in the interface, search times were minimal, even for the exploration trial (in the order of ten to twenty seconds per action). However, if the match between the label and goal was less evident, search times were greatly increased (up to 90 seconds).

Furthermore, we saw that this result is modified by the number of simultaneously displayed objects that the users have to search. If the label matches the goal representation well, the number of objects competing for attention does not alter the search time significantly. Even with as many as ten objects on screen, search times are no longer than thirty seconds. If the label match is poor, however, the number of objects on screen considerably changes the search times, so that a combination of poor matches and many objects can produce search times up to two minutes.

Finally, there was evidence to suggest that the particular type of interaction required also influences the ease with which an action will be discovered. The subjects in this study, who all had previous MS Word experience, considered labeled actions, such as menu-interactions, button-presses and the like, first, with search times under 30 seconds. They took longer to discover direct manipulation interactions on unlabeled objects, such as double-clicks on graph objects, drag-and-drop operations, etc., and the more alternative objects were presented on the screen, the longer were the search times (up to two minutes).

This suggests the following scan-search process for exploratory search of interfaces (an extension of step 2 in Polson & Lewis's CE+): After forming a task goal, users will quickly scan the interface for interface objects containing a semantically 'promising' label. If such a label is provided, subjects will find it quickly; it seems to 'pop out' at the user, no matter how many distracting objects are on the screen. If no good match with any provided label is possible, a slower, more careful evaluation of the interpretation of labels has to follow. This process is sensitive to the number of options to consider. Finally, when all labeled choices had been considered and found inappropriate, the users in this study turned their attention to unlabeled action choices and tried direct manipulation operations on display objects. Again, this is a slow process that is sensitive to the number of simultaneously displayed options.
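The three stages of this scan-search process can be sketched as a simple ordered procedure. The sketch is an illustration under stated assumptions, not a fitted model: the names `DisplayObject` and `scan_search` are hypothetical, a 'promising' label is approximated by exact membership in a set of goal terms, and the returned count of examined objects is only a stand-in for search time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass
class DisplayObject:
    label: Optional[str]     # None for unlabeled objects (icons, graph elements)
    is_target: bool = False  # True for the object that performs the correct action


def scan_search(objects: Sequence[DisplayObject], goal_terms: Set[str]) -> Tuple[str, int]:
    """Return the stage at which the target is found and an assumed search cost."""
    labeled = [o for o in objects if o.label is not None]
    unlabeled = [o for o in objects if o.label is None]

    # Stage 1: quick scan. A well-matched label "pops out", so the assumed
    # cost does not depend on how many other objects are displayed.
    for o in labeled:
        if o.is_target and o.label.lower() in goal_terms:
            return ("quick scan", 1)

    # Stage 2: careful evaluation of labeled objects, one at a time.
    # The cost grows with the number of labels that must be considered.
    for n, o in enumerate(labeled, start=1):
        if o.is_target:
            return ("careful evaluation", n)

    # Stage 3: only after all labeled choices are rejected, try direct
    # manipulation on unlabeled objects.
    for n, o in enumerate(unlabeled, start=1):
        if o.is_target:
            return ("direct manipulation", len(labeled) + n)

    return ("not found", len(objects))


display = [DisplayObject("File"), DisplayObject("Edit"), DisplayObject("Graph", True)]
print(scan_search(display, {"graph"}))
```

The ordering of the stages encodes the qualitative pattern in the results: a good label match is insensitive to display size, whereas poor matches and unlabeled targets become more costly as the number of displayed objects grows.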

Lessons about retention

Our results show that retention of newly discovered functions is indeed quite good in interaction with display-based applications, even after a week-long delay between the first and the second use of the system. Subjects were able to cut their performance times in half, and even though there were differences in performance dependent on the length of the delay, these were relatively small in comparison to the great overall improvement from the first to the second trial. The detailed analyses showed that forgetting occurs mainly in the context of completely new subgoals, and can be related to the failure to retrieve newly learned information from memory [8].

Turning research into practice II: refinement of the cognitive walkthrough

In the introduction we pointed out how the four-step processing model of Polson and Lewis was translated into a four-step evaluation method of human-computer interaction. We also pointed to some of the criticisms of the cognitive walkthrough method, namely that it does not provide guidelines about what makes an action 'clearly available' to a user, and which types of actions will be known and considered by a broad class of users.

The results offer answers to these two concerns. First, the goodness of the label match determines whether an action will seem readily available to a user. Good label matches assure that users will discover the correct action choice. This points to the necessity of good participatory design: The better the designers understand their users' task goals and language, the better they will be able to provide labels that match the users' expectations. If a label match cannot be ensured (because of a large variability in the user population), the number of options to consider should be kept to a minimum. The more options the user has to ponder, the longer it will take him or her to find the correct choice, and the more likely it is that he or she will make the wrong choice. A design team that is unsure about the labeling of an action should therefore consult the user population for suggestions, or try to minimize the alternative choices for this action step of the interaction.

Second, direct manipulation actions will be considered last by a user who is familiar with standard off-the-shelf systems such as MS Word or MS Excel, which are mainly menu-driven. If a design team finds that an action sequence necessitates the use of direct manipulation, they should again ask carefully how many alternative actions are available at the time. As a general guideline, missing labels (direct manipulation) or poor label-goal matches, in combination with a large number of alternative actions to search, will lead to long search times and user frustration. In our case, the experimenter had to intervene in situations like this, because the subjects were unable to make any progress on their own. Design teams should also ask careful questions about the characteristics of the user group that they are designing for. We have additional evidence [6] to suggest that users might be more willing to consider direct manipulation operations when they are working in a system context that uses this type of interaction frequently. FIGURE 10 summarizes these evaluation recommendations for each step in an action sequence. The numbered questions are the walkthrough questions as summarized in [17]; the bulleted questions under step 2, together with the note that follows them, are the refinements derived from this study.

1) Will the user try to achieve the right effect (form the right goal)?
2) Will the user notice that the correct action is available?
   o Is the action labeled or unlabeled?
   o If the action is labeled, is there a poor match between the label and the users' representation?
   o If in doubt, talk to your users!
   o Are there more than 10 screen actions to be considered at this time?
   o Is there a combination of labeling problems and a large number of objects?
   The more of these questions are answered with NO, the more 'obvious' the correct choice will appear to the user.
3) Will the user associate the correct action with the effect trying to be achieved?
4) If the correct action is performed, will the user see that progress is being made?

FIGURE 10. Original walkthrough questions and refinements.
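The refinement questions for step 2 can also be applied mechanically to each action step of a walkthrough. The following is a minimal sketch under stated assumptions: the function `step2_concerns` and its parameters are hypothetical, and the judgments it takes as input (whether the action is labeled, whether the label matches poorly) still have to come from the evaluator or, better, from the users. Only the threshold of 10 screen actions is taken from FIGURE 10.

```python
from typing import List

MAX_ACTIONS = 10  # threshold from the refinement question in FIGURE 10


def step2_concerns(labeled: bool, poor_match: bool, n_actions: int) -> List[str]:
    """List the step-2 concerns raised by one action step (empty list = none)."""
    concerns: List[str] = []

    # An unlabeled action (direct manipulation) or a poorly matching label
    # both count as a labeling problem.
    labeling_problem = (not labeled) or poor_match
    if not labeled:
        concerns.append("action is unlabeled (direct manipulation)")
    elif poor_match:
        concerns.append("label matches the users' representation poorly")

    # Many simultaneously available actions slow down careful search.
    many_actions = n_actions > MAX_ACTIONS
    if many_actions:
        concerns.append("more than 10 screen actions to consider")

    # The combination is the case that led to long search times in the study.
    if labeling_problem and many_actions:
        concerns.append("labeling problem combined with a large number of objects")

    return concerns
```

An empty result corresponds to a step whose correct action should appear 'obvious'; a step that raises the combined concern is a candidate for relabeling or for reducing the number of alternatives.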

Finally, checking an interface with the cognitive walkthrough in this way should not only help identify and fix problems that occur during exploration (the purpose for which the walkthrough was originally designed), but should also help to smooth intermittent use of systems. Actions that were easy to discover in the first place will either be easily remembered after long delays, or they will be trivial to rediscover.

Conclusions

The current study presents new insights into the nature of display-based interaction, namely that learning by task-oriented exploration is a possibility for interface-literate users, both in the initial learning phase and in the use of this knowledge in later trials. We showed that forgetting plays a relatively small role in the use of display-based systems, and pointed to some of its possible causes. Furthermore, we presented evidence for a scan-search strategy in the service of exploring new applications, a strategy that depends on the good labeling of actions as well as on the number of action options that need to be considered. Direct manipulations on unlabeled objects proved to be especially difficult to discover. These results were discussed in the context of the CE+ theory of exploration with computer systems, and the findings were integrated into the suite of evaluation criteria embedded in the cognitive walkthrough method.

Acknowledgments

Thanks to Peter Polson, Clayton Lewis, John Rieman, Evelyn Ferstl, Sharon Irving, and the members of the CHI'94 doctoral consortium for supporting this work in various ways. The anonymous reviewers of the CHI'95 conference provided appreciated editorial help as well as many thoughtful comments. This work was supported by NSF grant IRI 9116640.

References

[1] Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
[2] Carroll, J.M., & Mazur, S.A. (1986). LisaLearning. IEEE Computer, 19, 35-49.
[3] Charney, D., Reder, L., & Kusbit, G. (1990). Goal setting and procedure selection in acquiring computer skills: a comparison of problem solving and learner exploration. Cognition and Instruction, 7, 323-342.
[4] Engelbeck, G. (1986). Exceptions to generalizations: implications for formal models of human-computer interaction. Unpublished master's thesis, University of Colorado, Department of Psychology, Boulder.
[5] Franzke, M. & Rieman, J. (1993). Natural Training Wheels: Learning and Transfer between two Versions of a Computer Application. In T. Grechenig & Tscheligi (Eds.). Lecture Notes in Computer Science: Human Computer Interaction. Vienna Conference VCHCI. Berlin, FRG: Springer-Verlag.
[6] Franzke, M. (1994). Exploration and experienced performance with display-based systems. Unpublished dissertation, University of Colorado, Department of Psychology, Boulder.
[7] Howes, A. (1994). A model for the acquisition of menu knowledge by exploration. In Proceedings of CHI '94, Boston, MA: ACM, 445-451.
[8] Kitajima, M. & Polson, P.G. (1994). A model-based analysis of errors in HCI. In Conference Companion of CHI '94, Boston, MA: ACM, 301-302.
[9] Mayes, J.T., Draper, S.W., McGregor, A.M., & Oatley, K. (1988). Information flow in a user interface: the effect of experience and context on the recall of MacWrite screens. In D.M. Jones & R. Winder (Eds.), People and Computers IV, Cambridge UK: Cambridge University Press, 191-220.
[10] Payne, S. (1991). Display-based action at the user interface. International Journal of Man-Machine Studies, 35, 275-289.
[11] Polson, P., & Lewis, C. (1990). Theory-based design for easily learned interfaces. Human-Computer Interaction, 5, 191-220.
[12] Polson, P., Lewis, C., Rieman, J., & Wharton, C. (1992). Cognitive walkthroughs: A method for theory-based evaluation of interfaces. International Journal of Man-Machine Studies, 36, 741-773.
[13] Rieman, J.F. (submitted). A field study of exploratory learning strategies. Submitted to ACM Transactions on Human Computer Interaction.
[14] Santhanam, R., & Wiedenbeck, S. (1993). Neither novice nor expert: the discretionary user of software. International Journal of Man-Machine Studies, 38, 201-229.
[15] Shneiderman, B. (1983). Direct manipulation: a step beyond programming languages. IEEE Computer, 57-69.
[16] Wharton, C., Bradford, J., Jeffries, R., and Franzke, M. (1992). Applying cognitive walkthroughs to more complex interfaces: Experiences, issues, and recommendations. Proceedings CHI'92 Conference, Monterey, CA, 381-388.
[17] Wharton, C., Rieman, J.R., Lewis, C., and Polson, P. (1994). The cognitive walkthrough method: A practitioner's guide. In J. Nielsen and R. Mack (Eds.), Usability Inspection Methods, John Wiley, NY.