Cognitive Issues in Autonomous Spacecraft Control Operations: An Investigation of Software-Mediated Decision Making in a Scaled Environment

 

 

 

Chapter 1.  Introduction

 

     As advances in technology are applied in various complex, partially automated domains, the human controller becomes increasingly distant from the controlled process.   This greater physical and psychological distance may have both helpful and harmful effects on human performance.  Decision making, for example, may be helped by greater objectivity but hurt by a lack of involvement. The research described here investigated human decision making in a situational context modeled after advanced, unmanned spacecraft operations.

1.1 Background

     For over 30 years, the National Aeronautics and Space Administration (NASA) has been placing unmanned spacecraft in low-Earth orbit for various scientific purposes, such as observations of weather systems and oceanographic features. Until very recently, ground monitoring and control of these spacecraft have required the presence of human operators on a 24-hour-a-day basis. As advances in technology have produced spacecraft that can operate more reliably than their predecessors, continuous human monitoring and control of on-board systems are no longer needed (e.g., Abedini, Moriarta, Biroscak, Losik, & Malina, 1995; Aked & Pylyser, 1996).  However, the performance effects of removing humans from the continuous monitoring loop have not received any appreciable experimental attention.

     Advanced capabilities take process-control engineers and analysts into a new on-call paradigm, where they are still part of the system but are needed only when the so-called "intelligent" automation calls for human help. In this model (Murphy & Norman, 1998; Patterson & Woods, 1997), human analysts do not need to be physically present at any particular support facility, and they are not tasked with monitoring mission operations.  Their role is analogous to that of an on-call physician (Murphy & Norman, 1998).

     Cognitive issues arise, however, when the on-call analyst must intervene because a problem exceeds the scope of the autonomous capabilities. The present research was designed to investigate a sub-set of the cognitive issues faced by decision makers in such environments.

 

 

1.2  Definition of Terms

•      Autonomous system – a natural or artificial entity that exercises control over its own behavior and is able to make decisions without requiring specific outside intervention (cf. Takeuchi & Naito, 1995); an artificially autonomous system consists of hardware and software designed to support such functions as limited fault detection and resolution.

•      Artificially intelligent software process (i.e., agent) – a programmed capability that exhibits the following characteristics: autonomy, persistence, migration, cloning/spawning, collaboration, and learning (Truszkowski, 1996a, b):

Autonomy – capability to perform functions in the background, without requiring human intervention for routine functioning

Persistence – capability to support inter-process activities that extend over significant time periods

Migration – capability to shift some of their task load across distributed nodes

Cloning/Spawning – capability to copy themselves in support of parallel processing

Collaboration – capability to interact with other software processes and with users

Learning – capability to add case-by-case experience and user feedback to its knowledge base, showing improved performance over time

Such a process is called an "agent" in the operational environment at NASA-Goddard (e.g., Truszkowski & Odubiyi, 1994).[1]

 

•      Lights-out operations – a term from the field, denoting autonomous, unmanned ground control operations; with enough software autonomy built into the on-board and ground systems, control room personnel can turn out the lights and let the system run itself until such time as human intervention is needed; represents a level of automation beyond supervisory control; referred to here as the on-call model (Murphy & Norman, 1998; Patterson & Woods, 1997).

 

•      Out-of-the-loop (OOL) – describes the on-call analyst's lack of moment-to-moment involvement with the controlled process; the on-call analyst is out of the traditional feedback loop that provides continuous updates on system status.

•      Supervisory control – a level of automation that relieves the human operator of manual interaction with the monitored process, until such time as the automated monitoring capabilities signal that human intervention is required; present in the control room, the supervisory controller essentially monitors the system that monitors the controlled process (Moray, 1986; Sheridan, e.g., 1976, 1988a,b, 1997). The supervisory controller is present at the operational site.

 

In the remainder of this discussion, the term advanced software process (ASP) is used to refer to a software process with the characteristics listed above under "artificially intelligent software process." Although "agent" is the term used by designers and engineers in the field, it seems to go beyond metaphor in over-anthropomorphizing software processes (Shneiderman, 1997).  Although some authors state that "computerized procedures can be viewed as cognitive agents with their own monitoring strategies and intentions" (Kontogiannis & Hollnagel, 1998), the term ASP is used here to avoid the pitfalls of anthropomorphizing, primarily the danger of attributing human-like judgment to an inanimate tool.

1.3  Literature Review

The cognitive issues in human interaction with autonomous systems have received little direct research attention. The key cognitive issues are reflected in the following topics for review:

•      Effects of automation on human performance

•      Trust versus over-reliance on automation

•      Passive monitoring in supervisory control

•      Cognitive demands in autonomous, ASP-based systems

•      Limitations of decision making

•      Information-display needs in on-call situations

•      Performance effects of spatial visualization ability (SVA)

 

Previous research in these areas helped in formulating the hypotheses investigated in the present experimental context.

 

 

 

1.3.1  Effects of Automation on Human Performance 

 

When a task that humans could perform or have performed is allocated to the computer, that task is said to have been automated (e.g., Parasuraman, 2000). Wickens (1992) sorts the various purposes of automation into the following categories (pp. 531-532):

1.  Performing functions that the human operator cannot perform because of inherent limitations

2.  Performing functions that the human operator can do but performs poorly or at the cost of high workload

3.  Augmenting or assisting performance in areas in which humans show limitations

Automation that fulfills these purposes may be implemented at various levels, from intermediate gradations of semi-automation to full automation (cf. Endsley, 1997; Mitchell & Sundstrom, 1997; Parasuraman, 1997, 2000; Sheridan & Verplanck, 1978). In various domains, full automation of selected task groupings allows advanced software processes to function at some level of independence from direct human interaction.

Many authors have noted that, for all its promise, automation can negatively affect human performance (e.g., Bainbridge, 1987; Ballas, Heitmeyer, & Perez, 1991; Billings, 1990; Bowers, Deaton, Oser, Prince, & Kolb, 1995; Harris, Hancock, Arthur, & Caird, 1995; Hollnagel, 1992; Hollnagel & Woods, 1983; Hopkin, 1987, 1988, 1991; Kontogiannis & Hollnagel, 1998; Mitchell, 1983; Mosier, Skitka, Heers, & Burdick, 1998; Narborough-Hall, 1987; Norman, 1988; O'Hara, 1993; Parasuraman, 1997, 2000; Reason, 1990; Sarter & Woods, 1995a,b, 1997; Shaiken, 1986; Swain, 1987; Wei, Macwan, & Wieringa, 1998; Wiener, 1987; Wiener & Curry, 1980; Woods, 1994; Woods, Sarter, & Billings, 1997). Based on their findings, many of these authors have suggested that the uncritical acceptance of automation may make successful human intervention difficult, if not error prone, when such intervention inevitably becomes necessary. This possibility is of critical importance to human performance in an on-call environment.

 The present research was designed to investigate potential effects of ASP-based automation on human decision making and task-completion time. The negative effects of automation found in previous research suggest, for example, that automated identification of problems and automated selection of visual displays may degrade human decision-making performance.

1.3.2  Trust versus Over-reliance on Automation 

 

The issue of too little or too much trust in the automation has sparked considerable discussion and research (e.g., de Keyser, 1986; Eidelkind, 1995; Lee & Moray, 1992; Muir, 1987, 1994; Muir & Moray, 1996; Zuboff, 1988).  Noting that some human decision makers may not trust automated tools at all, while others may place too much trust in automated aids, Muir (1987) extends models of interpersonal trust to model human trust in automation.

Citing Sheridan and Hennessey (1984), Muir (1987) suggests that operators in supervisory-control environments, "especially novices, may be biased toward distrust" of the automation (1987, p. 534). Thus, experimental subjects, who do not have time to develop a professional level of expertise, may be biased toward distrust in the automation. Experts, however, may become accustomed to accepting machine diagnoses and problem resolutions if the automation has been highly reliable over a long period of time.

Indeed, research conducted by Mosier, Skitka, Heers, and Burdick (1998) found that "increased experience decreased the likelihood of catching the automation failures" for current pilots of commercial glass-cockpit (highly automated) aircraft (p. 58).  In related research, experienced flight dispatchers and pilots accepted flawed recommendations made by an apparently omniscient computer (Smith, McCoy, & Layton, 1997). Experts, then, may tend to be more trusting than novices are of highly familiar or supposedly competent automation.

In experiments on trust, Muir and other researchers have evaluated her 1987/1994 model of trust in the context of supervisory-control environments.  In their analysis of operators' self-reported trust ratings, Lee and Moray (1992) found that level of trust was affected by overall system performance as well as by faults that disrupted system performance.  Their experimental data indicate that an operator's use of automatic controllers depends not on trust alone, but on a complex relationship between trust and self-confidence.

In research conducted at NASA-Goddard, Eidelkind (1995) investigated the role of trust in subjects' willingness to delegate a detection task to a semi-autonomous software process. This study is one of the first to explore these issues in the context of ASP-based systems.

Eidelkind suggests that overly high trust in "a supposedly reliable system" leads to operators taking themselves out of the control loop (p. 47).  This is the issue of complacency: operators can become too trusting and overly reliant on the automation (cf. Wiener, 1987). In a similar vein, Eidelkind discusses the potential costs of delegating tasks to advanced software processes (p. 8):

With the elimination of monitoring, delegation may lead to an actual and/or perceived loss of control over the task by the operator.  In the case of perceived loss, even if the agent is highly reliable, feeling 'out-of-the-loop' may cause unexpected problems...  If the operator's mental model of the agent system's state is actually hindered [by being out of the loop], a total loss of control can occur.  When this occurs, the operator either fails to recognize agent breakdowns or, after positive recognition, lacks the ability to retake manual control...

 

This is the classic view on the dangers of taking the operator out of the loop.

Because Eidelkind's subjects were engaged in monitoring, they were operating more in a supervisory-control mode than in an on-call mode. In the on-call mode, the out-of-the-loop metaphor is no longer appropriate because the on-call analyst has never been completely in the loop (Murphy & Norman, 1998).  The key issue becomes one of providing displays that support rapid, accurate situation assessment and effective intervention by an on-call analyst.

To investigate the demands on the human and the information/display requirements in the on-call model, the experimental design must specifically NOT allow subjects to monitor anything. The current research was designed primarily to compare performance under monitoring (i.e., supervisory control) and non-monitoring (i.e., lights-out, on-call) conditions. Specifically, the question is whether on-call subjects will respond quickly and effectively when they have not been monitoring system status.

1.3.3  Passive Monitoring in Supervisory Control 

 

Supervisory control has been implemented widely in the control of continuous processes (e.g., oil refining, nuclear power generation), control of vehicles (e.g., air-, sea-, and spacecraft), and robotic manufacturing systems.  This paradigm is the norm in many of today's complex, automated systems, both civilian and military.  Although these systems have been criticized for taking the operator too far out of the control loop (e.g., by Mitchell, 1983), a key characteristic of supervisory control is that humans are continually present and routinely monitoring 24-hour-a-day system operations.

Cognitive issues arise in the supervisory-control paradigm because the operator serves as a passive monitor for long periods of time (e.g., Bushman & Mitchell, 1986; Lee & Moray, 1992; Mitchell, 1981, 1983; Mitchell & Saisi, 1987; Moray, 1986; Sheridan, 1976, 1988b; Wickens & Kessel, 1979).  Vigilance and alertness are known to decline quickly under such conditions (e.g., Mackworth, 1948, 1950; Moray, 1986; Thackray, 1980; Thackray & Touchstone, 1989). When the signals to be detected (i.e., the targets) are infrequent, intermittent, and unpredictable, detection performance declines markedly during the first 30 minutes (Wickens, 1984).  Given research findings on the vigilance decrement, researchers have been concerned that supervisory-control operators will not be ready to respond efficiently and effectively when called upon to deal with an anomaly (e.g., Mitchell, 1983; Wiener, 1987).

A major issue, the so-called out-of-the-loop performance problem, is described concisely by Endsley and Kiris (1995, p. 381):

System operators working with automation have been found to have a diminished ability both to detect system errors and subsequently to perform tasks manually in the face of automation failures, compared with operators who manually perform the same tasks.

 

These authors attribute observed decrements in out-of-the-loop performance to loss of situation awareness (SA) and loss of manual skills.

The between-subjects study conducted by Endsley and Kiris (1995) included five experimental conditions: 

1) manual

2) decision support

3) consensual AI

4) monitored AI

5) full automation

The findings of this study provide strong supporting evidence for the classical view of the effects of the supervisory-control role on operator performance:  Decisions took longer and understanding of system state was lower in the fully-automated condition. These findings support the earlier findings of Parasuraman and his colleagues, who furthered the empirical study of the cognitive effects of increased automation (Parasuraman, Molloy, & Singh, 1993).

Complacency is another cognitive issue associated with the supervisory-control paradigm (e.g., Molloy & Parasuraman, 1996; Parasuraman, Molloy, & Singh, 1993; Riley, 1994).  The concern has been that operators will become overly trusting of the automation and will tend to think that any signal of a problem is really a false alarm (Wiener, 1987; Hopkin, 1988). Complacency has been cited as a contributing cause of the vigilance decrement in monitors of automated systems, i.e., the operator who trusts too much in the automation is less likely to think there is a need to check on what is happening (Bergeron, 1981; Endsley & Kiris, 1995; Parasuraman, Molloy, & Singh, 1993). 

Complex interrelationships among factors such as trust, reliance on automation, and situation awareness underlie human performance in supervisory-control systems, including the more highly automated, transitional versions that incorporate some autonomous software processes.  The present research was designed to investigate and explicate possible relationships between software-reported confidence in the automated problem diagnosis and two dependent variables: subjects' self-reported confidence[2] in their own decisions and subjects' self-reported reliance on the automated diagnosis. 

 

 

 

1.3.4  Cognitive Demands in Autonomous, ASP-based Systems

 

The literature on autonomous spacecraft systems tends to focus on the technical, engineering issues involved in achieving autonomy (e.g., Harvey, 1996; Lecouat & De Saint Vincent, 1996; Klein, Kulp, & Rashkin, 1996) and how to develop adaptive, ASP-based systems (e.g., Barber, Goel, Liu, Macfadzean, Martin, & Ramaswamy, 1997).  Spacecraft engineers generally assume that mission analysts will be able to solve any problem requiring human expertise.  Little or nothing has been said about demands of any kind that might be placed on human cognitive capabilities by this kind of situation. It is assumed that it will be possible to page the expert and that the expert will be able to deal with the problem, perhaps even from home at 2:00 a.m. (e.g., Abedini, Moriarta, Biroscak, Losik, & Malina, 1995; Aked & Pylyser, 1996; Hucteau, 1996). Issues of situation assessment and decision making typically go unmentioned in the engineering literature.

These issues have been articulated from a human-factors perspective (e.g., Truszkowski, 1996b). As described in that presentation, software agents occupy a virtual position between the human-computer interface and the command-and-control system.  Instead of dealing directly with sensor information and historical trends in spacecraft health-and-safety data, the analyst will need information on how the ASPs came to the conclusion they did about the alleged anomaly.

A problem is that the ASPs will not be 100% reliable, and the analyst will need some way of validating the ASPs' conclusions. The cognitive demands associated with verifying automated decisions are described by Kontogiannis and Hollnagel (1998, p. 255, italics added):

Checking the reliability of automated decisions is not an easy task... Operators may have to undertake additional verification tasks, such as deciding what data have been consulted by the system, how the system handles unreliable data, whether the system has perceived the state of the process correctly, and so on.

 

In the present experimental task, subjects were required to decide whether the system's perception of the onboard situation was correct; that is, the subjects were required to validate the simulated spacecraft's automated decisions.

As noted by Bainbridge (1997), it is sometimes "necessary... to interpret the situation, rather than to assume that the situation is only and exactly what can currently be sensed in the environment" (p. 352).  The possibility that information provided by ASPs may be in error or may be incomplete imposes cognitive demands and raises performance issues for the analysts who are charged with fault resolution.

Publications based on a NASA-Goddard field study articulate the key cognitive issues associated with increased mission autonomy. For example, Murphy, Norman, Truszkowski, and Grubb (1997) note that empirical research is needed on the cognitive and human-performance effects of lights-out (on-call) automation. The present experimental environment was developed to begin investigating these effects.

1.3.5  Limitations in Decision Making

 

Human judgment and decision making have been widely studied from the rationalist, normative perspective, with the common finding that people are non-optimal decision makers (e.g., Kahneman & Tversky, 1972, 1973, 1982; Tversky & Kahneman, 1973, 1974). Human decision-makers do not consider all alternatives and their consequences but tend to rely on what has worked in the past.

As shown in the many studies summarized by Von Winterfeldt and Edwards (1986), "human inference is routinely conservative" (p. 533).  This is thought to be the case because people do not fully extract the information available in cues that can reduce uncertainty, i.e., diagnostic cues, and because they consider all cues as "equally informative" (Wickens, 1992, p. 270).  As suggested by Wickens (1992), this may occur because of a general need to reduce the load on short-term memory.

Given the limitations of human decision making, it would seem that automated aids might be called for.  Designers cannot assume, however, that an automated decision aid will improve performance. For example, an evaluation of a fault-detection aid for a nuclear power plant simulation found that

aided operators performed better when they had to diagnose malfunctions caused by multiple failures, and when they had to diagnose malfunctions which they did not practice during their training (Sassen, Buiel, & Hoegee, 1994).

 

In the cited study, having the aid did not improve subjects' diagnosis of malfunctions caused by single failures or their diagnosis of practiced problems; nor did having the automated aid increase the aided group's confidence in their diagnoses.

A further limitation of decision making is people's tendency to fall back on general rules-of-thumb or heuristics when asked to make probability estimates or judgments.  These educated guesses or short cuts will work sometimes, under some circumstances, but come with no guarantee of making a correct decision.  It is thought that people tend to settle on heuristic solutions in part because more complex strategies impose greater demands on limited working memory (e.g., Reason, 1990; Wickens, 1992). Over 25 heuristics and biases have been identified (Kahneman, Slovic, & Tversky, 1982; Sage, 1981). A few examples of heuristics follow (Tversky & Kahneman, 1974):

•      Anchoring – in revising a hypothesis or belief, the tendency to shift up or down only slightly from a mental setpoint established by the first item of evidence, the anchor

•      Availability – judging the probability of A by the ease of bringing instances of A to mind

•      Representativeness – judging the probability of A by the extent to which it resembles B

 

The present research examines anchoring and adjustment from the displayed level of agent confidence in the problem diagnosis. 

Decision making is further constrained by a low correlation between confidence in one's decision and the accuracy of the decision (Plous, 1993).  People may be extremely confident of a very wrong answer. As Loftus (1979) warns, "One should not take high confidence as an absolute guarantee of anything" (p. 101). In Loftus's research context, the eyewitness may be very sure in identifying the wrong person as a criminal. Although the results from eyewitness testimony bear on recall accuracy, they may be understood in terms of judgment and decision accuracy as well: The eyewitness must make a metacognitive judgment about his or her level of certainty and decide whether recall certainty is sufficient.

The present research investigated subjects' confidence in their judgments of the accuracy of ASP-based anomaly reports. The expectation was that subjects' confidence in their decision would not be positively correlated with their accuracy, i.e., that previous findings would be replicated.
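One way to quantify the expected relationship between confidence and accuracy is a point-biserial correlation, that is, Pearson's r with accuracy coded as 0 or 1. The following is a minimal sketch under that assumption. The function name and the four data points are hypothetical and purely illustrative; they are not results from this study, and the text above does not state which statistic was actually used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; with ys coded 0/1 this is the point-biserial r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical confidence ratings (percent) and accuracy (1 = correct, 0 = incorrect).
confidence = [50, 60, 70, 80]
correct = [1, 0, 1, 0]
r = pearson_r(confidence, correct)
print(round(r, 3))  # -0.447
```

A value of r at or below zero, as in this made-up example, would be consistent with the expectation that confidence does not track accuracy.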

1.3.6  Information-Display Needs in On-Call Situations

 

     An assumption made in planning for lights-out operations holds that fault-diagnosis-and-resolution aids should take the form of or incorporate capabilities for providing analysts with two-dimensional (2-D) and three-dimensional (3-D) displays of mission data (e.g., bar graphs, line graphs, pie charts).  Such displays seem intuitively superior to tabular displays, e.g., the rows and columns of numbers traditionally presented to operators in mission control rooms.

     Under some conditions, however, tables are superior to graphs in supporting performance (Meyer, Shinar, & Leiser, 1997).  Thus, it is unclear whether graphical displays will improve users' accuracy in evaluating problems reported by autonomous software processes. The need for graphics may depend on the nature of the problem (Boehm-Davis, Holt, Koll, Yastrop, & Peters, 1989).

The literature on textual versus graphical display design has generally been interpreted to support the conclusion that graphical displays are beneficial to tasks with spatial components, but not to purely symbolic tasks (e.g., Benbasat & Todd, 1993; Vessey, 1991). Research findings demonstrate that different graphical formats have different effects on judgment and decision making (e.g., MacGregor & Slovic, 1986; Meyer, Shinar, & Leiser, 1997). Tasks best supported by textual displays include determining specific numerical values (for example, 3752.0964).  Tasks best supported by graphics include making comparisons that do not require highly specific values; determining the highest/lowest, biggest/smallest component; and extracting trends from data gathered over time (e.g., Boehm-Davis, Holt, Koll, Yastrop, & Peters, 1989; Dickinson, DeSanctis, & McBride, 1986).

As noted by Boehm-Davis and her colleagues (1989) and Shneiderman (1998), choosing an appropriate visual display depends on an understanding of the task to be performed and the kind of data available. In the domain of spacecraft control, where the primary tasks are monitoring for anomalous conditions and diagnosis and resolution of anomalies, it does not appear that anything more sophisticated than a 2-D trend display is considered necessary or operationally suitable by current operational personnel (Murphy, Norman, & Moshinsky, 1999). This seems to be the case even though the performance benefits of highly graphical and dynamic data displays have been demonstrated in NASA-like supervisory-control environments (e.g., Mitchell & Saisi, 1987).

In contrast to NASA's expectation that lights-out analysts will need graphical displays, experience on the Extreme Ultraviolet Explorer (EUVE) mission has been that textual problem files and optionally available trend displays are adequate to support the paged analyst's process of anomaly isolation and resolution (Stroozas, personal communications, March 26, 1998; April 16, 1998).  It is, thus, unclear whether visual formats other than 2-D trend displays will be needed in any lights-out environment.[3]

Whether implicit or explicit, an objective of all information-display efforts is to reduce requirements for users to remember commands, operations, and navigational paths (Norman, 1994).  The most promising approaches are those that "make apparent hidden links and logical contingencies, and... that allow the user to perform spatial and intermediate operations on the interface rather than in the head" (Norman, 1994, p. 203).  Effective visual displays help to overcome individual differences in spatial abilities that favor one group of users over another.

 The current research compared various 2-D display techniques for their effectiveness in supporting subjects' performance on the experimental tasks. The expectation was that tables and bar charts would be superior to line graphs in supporting performance because the experimental task did not require trend detection.

 

 

 

 

 

1.3.7  Performance Effects of Spatial Visualization Ability

 

Spatial visualization ability has been defined in similar ways by various researchers, for example:

•      the ability to deal with complex visual problems that require imagining the relative movements of internal parts of a visual image (Pellegrino & Hunt, 1991, p. 205)

•      the ability to manipulate or transform the image of spatial patterns into other arrangements (Ekstrom, French, Harmon, & Dermen, 1976, p. 173)

•      the mental manipulation of spatial information to determine how a given spatial configuration would appear if portions of that configuration were to be rotated, folded, repositioned, or otherwise transformed (Salthouse, Babcock, Skovronek, Mitchell, & Palmon, 1990, p. 128).

Mental manipulation and transformation are common elements of these overlapping definitions.

SVA is becoming widely recognized as "the primary cognitive factor driving differences in performance using computers" (Norman, 1994, p. 195): Those with high SVA perform well on computer tasks, but those with low SVA perform poorly. SVA was investigated as a mediator of performance in the present research because of its significant relationship to performance in other computer-based research (e.g., Alonso, 1998; Butler, 1990; Norman & Butler, 1989; Vicente, Hayes, & Williges, 1987).

Because low-SVA people are less able to rely on the mental imagery that seems to come naturally to high-SVA people, tasks that impose moderate processing demands on high-SVA subjects may impose high demands on low-SVA subjects (Alonso, 1998; Salthouse, Babcock, Skovronek, Mitchell, & Palmon, 1990).  The strong implication drawn by these researchers is that low-SVA subjects will approach the limits of working memory sooner than will high-SVA subjects.

In a computer-based environment, where navigation of large, intricate data bases and complex menu structures is inescapable (cf. Norman, 1991b), low-SVA subjects may be at a distinct and widening disadvantage (Alonso, 1998; Norman, 1994). This disadvantage may be especially incapacitating when a low-SVA user is faced with a user interface low in apparency, i.e., one that hides the relationships among displayed and non-displayed objects and possible user operations (e.g., Alonso & Norman, 1996). In contrast, a highly apparent or intuitive user interface makes underlying contingencies transparent to the user.

The present research investigated the influence of SVA on performance that was supported by various 2-D approaches to information display. Based on previous findings in the literature, the expectation was that low-SVA subjects would be slower and less accurate compared to the high-SVA subjects.

1.4  Research Design

The overall research design is summarized in Figure 1.

1.4.1  Independent Variables

In this 2 X 3 X 3 X 2 factorial design, some variables were manipulated between-subjects and some within-subjects. The level-of-human-involvement variable was manipulated between subjects to reflect the real-world distinction between supervisory control and on-call environments: The key difference is that human operators are present and monitoring system operations under a supervisory-control paradigm, but they are not present in the control room and typically not monitoring system operations in an on-call context.

Both software-agent confidence and display-selection mode were manipulated within subjects.

 

 

 

 

 

 

                          Human Involvement Prior to Alert

                          Monitoring             |   No Monitoring

Display-Selection Mode    Automated   Manual     |   Automated   Manual

Display Type              Table   Bar   Line     |   Table   Bar   Line

Agent Confidence          90      70    50       |   90      70    50

                          Figure 1.  Research design

 


These manipulations reflected real-world conditions under which human operators are exposed to varying levels of software reliability and varying levels of automation within the same system (cf. Parasuraman, 2000). Software-based versus manual display selection reflected the likelihood that real-world automation of display selection might be feasible in some cases but not in others. Treating display selection as a within-subjects variable allowed observation of the effects of a varying level of automation on subjects' performance.

     A fourth independent variable, type of display, was manipulated within subjects.  The display types were table, bar chart, and line graph.  For each practice and test problem, the data were available to the simulation in the form of a table, a bar chart, or a line graph.  In the software-based mode of display selection, the choice among the three types was made at random.  In the manual-selection mode, subjects were given the choice of displaying the situational data in tabular form, in a bar chart, or in a line graph.
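The crossing of the four independent variables can be enumerated as a quick check on the design. The following is a minimal sketch and not part of the MOCHA software: the identifier names are hypothetical, and the level labels are taken from Figure 1.

```python
from itertools import product

# Levels as listed in Figure 1; all identifier names are illustrative assumptions.
INVOLVEMENT = ["monitoring", "on-call"]               # between subjects
SELECTION_MODE = ["automated", "manual"]              # within subjects
DISPLAY_TYPE = ["table", "bar chart", "line graph"]   # within subjects
AGENT_CONFIDENCE = [90, 70, 50]                       # within subjects (percent)


def design_cells():
    """Return every cell of the fully crossed four-factor design."""
    return list(product(INVOLVEMENT, SELECTION_MODE, DISPLAY_TYPE, AGENT_CONFIDENCE))


def within_subject_cells():
    """Return the cells that a single subject could encounter (within-subjects factors only)."""
    return list(product(SELECTION_MODE, DISPLAY_TYPE, AGENT_CONFIDENCE))


print(len(design_cells()))          # number of cells in the full design
print(len(within_subject_cells()))  # number of cells per involvement group
```

Full crossing yields 36 cells, 18 within each involvement group. The sketch says nothing about how individual test problems were allocated to cells.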

1.4.2  Dependent Variables

     Subjects' performance was measured in terms of speed (time to respond to an alert, time to complete a task) and the accuracy of their situational assessments.  Response time was measured from the onset of an alert to the subject's acknowledgment of the alert.  Task-completion time was defined as the time from acknowledgment of an alert to submission of a decision, minus the time for entering a free-text rationale for the decision. Accuracy was determined by comparison of the subject's problem diagnosis with an answer key.
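The two timing measures defined above reduce to simple differences between logged events. The sketch below illustrates that arithmetic under stated assumptions: the record layout, the field names, and the use of seconds as the unit are hypothetical and are not taken from the MOCHA implementation.

```python
from dataclasses import dataclass


@dataclass
class TrialTimestamps:
    """Hypothetical per-problem log record; all times in seconds."""
    alert_onset: float        # when the alert appeared
    acknowledged: float       # when the subject acknowledged the alert
    submitted: float          # when the subject submitted a decision
    rationale_seconds: float  # time spent typing the free-text rationale


def response_time(t: TrialTimestamps) -> float:
    """Time from alert onset to acknowledgment of the alert."""
    return t.acknowledged - t.alert_onset


def task_completion_time(t: TrialTimestamps) -> float:
    """Time from acknowledgment to submission, minus rationale-entry time."""
    return (t.submitted - t.acknowledged) - t.rationale_seconds


# Made-up illustrative values, not data from the experiment.
trial = TrialTimestamps(alert_onset=0.0, acknowledged=4.5,
                        submitted=94.5, rationale_seconds=30.0)
print(response_time(trial))         # 4.5
print(task_completion_time(trial))  # 60.0
```

Subtracting the rationale-entry time keeps typing speed from inflating the measure of how long the decision itself took.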

     Another dependent variable, type of display selected in manual-selection mode, was recorded as (1) table, (2) bar chart, or (3) line graph (timeline). A confidence rating was collected for each decision made by a subject.  The confidence ratings were reported on a scale from zero to 100 percent, at 10-unit intervals (e.g., 50, 60, 70, 80).

Prior to the experimental session, subjects completed a short questionnaire on their attitudes toward automation and their preferences for graphical versus tabular displays.  After the experimental session, subjects completed a questionnaire designed to measure their attitudes toward the simulated software and their reliance on agent confidence. Both questionnaires presented statements that subjects rated on a nine-point scale for their level of agreement or disagreement (1 = strongly disagree, 9 = strongly agree). Responses to the pre- and post-test questionnaires provided input to a repeated-measures analysis of changes in attitudes toward automation.

1.5  Hypotheses  

 

1.5.1 Hypothesis 1 (H1)

Monitoring subjects will be quicker to submit their answers and more accurate in comparison to subjects in the on-call group.

Rationale:  This hypothesis is based on similar predictions in the literature on monitoring behavior and supervisory control.  This result is expected because the on-call subjects will have been engaged in other, unrelated activities between problems.  It should take them longer to get back into the problem-solving process and to submit their answers to problem alerts.  According to this line of thinking, accuracy is likely to suffer from forgetting due to interference by the distractor task, which the monitoring subjects will not have experienced.  The on-call model predicts, however, that performance of the on-call subjects will be at least as good as performance of the monitoring subjects.

 

 

1.5.2  Hypothesis 2 (H2)

Subjects who are given the option of selecting the display type for more than half of the test tasks will choose tables and bar charts more often than they choose line graphs (timelines).

     Rationale: Subjects will find it easier to extract the values that they need from tables and bar charts as compared to line graphs. 

1.5.3  Hypothesis 3 (H3)

Decision accuracy will be better for tables and bar charts than it will be for line graphs.

     Rationale:  Tables and bar charts are better suited for supporting the comparison of actual values with normal ranges.  Tables give exact values, which the subject can then compare with the values given in the agent rationale.  The height of the bars in a bar chart gives a visual cue to relative value and directly supports comparison with other bars.  Line graphs, however, are better suited to detection of trends over time because each line connects the points at which observations were taken for specific components.

1.5.4  Hypothesis 4 (H4)

The displayed level of agent confidence will serve as an anchor from which subjects will adjust their own level of confidence in their answer to a specific problem. 

Rationale: When given a value in a problem statement, subjects use that value as an "anchor" and adjust their estimate in either direction from the anchor (Kahneman, Slovic, & Tversky, 1982).  The assumption here is that the value given for agent confidence will serve as an anchor from which subjects will adjust their own confidence level.

1.5.5  Hypothesis 5 (H5)

a.  There will be no significant relationship between subjective confidence ratings and accuracy.

b.  Subjective confidence will be negatively (inversely) correlated with task-completion times.

Rationale: a. Published research often reports a lack of correlation between subjective confidence and decision accuracy (e.g., Fischhoff, Slovic, & Lichtenstein, 1977; Loftus, 1979; Plous, 1993). In some situations, people tend to be overconfident, that is, certain of wrong answers. Highly confident eyewitnesses can identify the wrong suspect.  Although domain experts are typically well calibrated, non-experts who are high in confidence will not necessarily be high in accuracy or have low response times. b.  Low confidence logically implies the need to spend more time solving a problem or making a decision.  Thus, low confidence should be associated with high task-completion times.
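The inverse relationship predicted in part (b) could be checked with a correlation coefficient computed over paired confidence ratings and task-completion times. The sketch below assumes a Pearson product-moment correlation, which the text does not specify; the function name and the sample values are hypothetical and are not data from the experiment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values, NOT data from the experiment.
confidence = [50, 60, 70, 80, 90]              # percent, 10-unit steps
completion_seconds = [120, 100, 95, 70, 60]    # task-completion times

r = pearson_r(confidence, completion_seconds)
print(r < 0)  # a negative r is the pattern that H5b predicts
```

A negative coefficient would be consistent with H5b; a coefficient near zero between confidence and accuracy would be consistent with H5a.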

1.5.6  Hypothesis 6 (H6)

Self-reported reliance on the automation will be higher in the on-call condition than in the monitoring condition.

Rationale:  Low human involvement makes the subject more dependent on the software.   The on-call subject will be forced to think that the automation is reliable.  The monitoring subject may be less reliant on the software because of greater familiarity with the normal operations of the simulated MOCHA system.

1.5.7  Hypothesis 7 (H7)

On-call subjects will report that automated systems need less monitoring than will subjects in the monitoring condition.

Rationale:  Based on their experience with MOCHA, the on-call subjects can be expected to realize that it is not necessary for them to be paying full attention to MOCHA in order to solve the problems.  However, the monitoring subjects can be expected to develop the attitude that monitoring is necessary because that is what they were required to do.  To some extent, this would justify the severe boredom that they experienced, i.e., if I had to go through that, it must be necessary. People tend to base their assessment of what is needed on their own experiences in similar situations. 

1.5.8  Hypothesis 8 (H8)

High-SVA subjects will perform better in both the monitoring and on-call conditions (i.e., their response times will be shorter and their accuracy better than those of low-SVA subjects).

Rationale:  High SVA gives an advantage over low SVA for faster completion of computer-based tasks (Vicente, Hayes, & Williges, 1987). Since this effect has been found for various computer-based tasks (e.g., Alonso, 1998) and has been described as pervasive (Norman, 1994), there is every reason to expect to find similar effects in the MOCHA environment.

 

 


Chapter 2. Method

2.1  Participants     

Undergraduate psychology students at the University of Maryland were randomly assigned to the monitoring or on-call conditions and received class credit for their participation.  Fifteen subjects participated in pilot studies and 83 in the experiment.  Data were analyzed for 42 men and 41 women.  Ages ranged from 18 to 28 years old (mean = 19.9, standard deviation = 2.2).  

Class status ranged from freshman to senior (28 freshmen, 29 sophomores, 15 juniors, and 11 seniors). Levels of other demographic variables are summarized in Table 1:

Table 1

Self-reported Experience on a Nine-point Scale (1 = Novice, 9 = Expert) (N = 83)

Experience with...    Minimum    Maximum    Mean    Std. Deviation
Computers             1.00       9.00       5.74    1.71
Tables                3.00       9.00       5.87    1.58
Line graphs           3.00       9.00       6.23    1.52
Bar graphs            3.00       9.00       6.51    1.60

 

Subjects rated the usefulness of graphics at a mean of 6.81 on a nine-point scale (1 = Useless, 9 = Very useful), with a standard deviation of 1.63 (min = 1, max = 9).

2.2  Materials

A consent form approved by the University of Maryland's Institutional Review Board (IRB) was used to document subjects' informed willingness to participate in the MOCHA experiment (Appendix A).  A paper-and-pencil, pre-test survey of attitudes toward automation (Appendix A) included 12 statements, which subjects rated on a nine-point scale (1 = disagree, 9 = agree). A standard test of spatial-visualization ability (SVA)[4] was programmed for on-line presentation to subjects. (See Figure 8 in Appendix B for a sample problem.) Background training materials (Appendix C) were developed by the experimenter to give subjects a basic overview of spacecraft-control operations and to define terms that they would encounter in the experiment. 

An on-line survey of consumer attitudes and expenditures (Appendix D) was used for the distractor task in the on-call condition. Items in the distractor survey were adapted from such sources as USA Today, Information Week, Time Digital, and questionnaires developed by the U. S. Census Bureau. A paper-and-pencil survey of post-test attitudes toward automation included 11 items (Appendix A), which subjects rated on a nine-point scale (1 = disagree, 9 = agree). The post-test items generally substituted "MOCHA" for the term "automation" in some pre-test items, but items were added asking about the extent to which subjects relied on the agent's confidence in making their decisions and in setting their own confidence.

2.3  Simulation Environment

With support from the NASA grant, a simulation for use in experimental research was developed in the Laboratory for Automation Psychology. The experimenter dubbed the simulation "MOCHA", an acronym for Mars Observer Calls Home Again.[5] The design of MOCHA was informed by interviews with personnel at NASA-Goddard Space Flight Center and by materials gathered from the World Wide Web about the Mars Observer and other unmanned spacecraft (e.g., NASA, 1993, 1997). 

Both the design of MOCHA and the experimental conditions were influenced by the Lights-Out Ground Operations System (LOGOS) Program at NASA-Goddard and by discussions with LOGOS personnel.  Other sources were consulted for technical information on spacecraft components and operations (e.g., Carraway, 1996; Fleeter, 1996; Lord, 1996; Marshall, Landshof, & van der Ha, 1996; Morrison, 1996; Neal, Lewis, & Winter, 1995). Personnel at the Johns Hopkins Applied Physics Laboratory described their personal experiences with day-to-day operations and spacecraft anomalies on the NEAR and MSX missions. The experimenter also drew on over 10 years of experience conducting human factors analyses, including cognitive task analyses, in NASA-Goddard's mission control centers, as documented in, e.g., Fischer and Murphy, 1983; Murphy and Mitchell, 1986; Sheppard, Murphy, and Stewart, 1985, 1991; Stewart and Murphy, 1984.

The resulting MOCHA simulation roughly mimics some of the characteristics of actual autonomous spacecraft operations. As such, it represents a "scaled" environment, which retains key features of the real environment but reduces real-world complexity (Ehret, Gray, & Kirschenbaum, 2000). Reflecting some of the capabilities envisioned for the LOGOS prototype, the MOCHA concept includes a simulated fault-detection-and-resolution ASP that is, by definition, capable of detecting and resolving simple anomalies (i.e., system faults). The problem statements imply that software of this kind is onboard MOCHA, but there is no actual fault-detection software. Again, by definition, this ASP operates autonomously until it is unable to resolve an on-board problem. As do its real-world counterparts, the fault-detection ASP alerts the analyst/subject when there is a situation that it cannot handle. Examples of problem alerts and other MOCHA screens are provided in Appendix B (Figures 7 through 16).

As illustrated in Appendix B, MOCHA's user interface (UI) provides graphical views of the system data associated with problem reports. One view of the data for each problem is either chosen for presentation by another simulated ASP or requested by the subject. The UI provides a means for the subject to view parameter values and trends in spacecraft health-and-safety data.   The UI permits the subject to select a response to each alert, to enter free-text explanations of responses, and to enter subjective confidence in each response.  Subjects are permitted to change their response an unlimited number of times before submitting it.

The MOCHA simulation was developed on a Windows NT platform in VisualCafé, a development tool for building Java™ applets and applications (Symantec, 1995). MOCHA displays were presented to subjects on a 20-inch monitor. On-call subjects accessed displays for the distractor task via the World Wide Web using the Netscape browser on a 17-inch, color-synchronized Macintosh monitor. The MOCHA simulation and the distractor survey are available from the Laboratory for Automation Psychology.

2.4  Procedure

2.4.1  Pilot Studies

To evaluate whether subjects were able to perform the experimental tasks and to fine-tune the design, several pilot studies were conducted. Fifteen students participated in these iterative studies. Because it was found that the training portion of the session was taking over one hour, the training materials were condensed, and the number of practice problems was reduced from ten to six.  For the same reason, the number of test problems was reduced from 18 to 10. These changes made it possible to complete an entire session in 90 to 120 minutes. The pilot studies indicated that subjects were able to develop a strategy for solving the problems a