Simulated RN

Virtual Healthcare Agent

Iowa State University HCI 598 Capstone Project by William Morse

Evaluation

Usability testing.

Full M5 Documentation

Task Scenario Table

The Simulated RN will simulate a discharge instruction process to accommodate a diverse array of primary and secondary user needs. In a Gestalt-like manner, the configuration of system tasks is designed to support a unified whole process and not just the needs of a particular user group. The arrangement of prototype task components is derived from the analysis of all stakeholders.


All user tasks are initiated by the virtual healthcare agent, which requires the user to provide a touch-based response in a modal dialogue box. The prototype is designed to wait for a period of time (e.g., Task 1: 2 minutes; Tasks 2 and 3: 5 minutes); if the system receives no response in that time, the virtual agent initiates a waiting-for-response cue. For usability testing, all responses are stored in a database, together with the time required to respond, as usability metrics.
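The wait-and-log behavior described above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: the `ResponseRecord` fields, the `record_response` helper, and the choice to count a timed-out prompt as a "failure to make a selection" are all assumptions made for the example. Only the timeout values come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Example timeouts given in the text: Task 1 waits 2 minutes,
# Tasks 2 and 3 wait 5 minutes (expressed here in seconds).
TIMEOUT_SECONDS = {1: 120, 2: 300, 3: 300}


@dataclass
class ResponseRecord:
    """One stored row per prompt (hypothetical schema for illustration)."""
    task: int
    selection: Optional[str]             # None when no touch input arrived in time
    seconds_to_respond: Optional[float]  # None when the prompt timed out
    failed: bool                         # True when the waiting cue would fire


def record_response(task: int, selection: Optional[str], elapsed: float) -> ResponseRecord:
    """Classify one prompt outcome against the task's timeout."""
    limit = TIMEOUT_SECONDS[task]
    if selection is None or elapsed >= limit:
        # Assumption: no response within the window counts as a failure,
        # and this is where the agent would start its waiting-for-response cue.
        return ResponseRecord(task, None, None, failed=True)
    return ResponseRecord(task, selection, elapsed, failed=False)
```

For instance, `record_response(1, "yes", 10.0)` yields a successful record with a 10-second response time, while `record_response(1, None, 120.0)` yields a failure record.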


 

Task 1

Task Description: The virtual healthcare agent gives the user instructions outlining the intended workflow, including how to provide user input and navigate. The first task is to determine whether the user understands the basic instructions by answering the question, "Are you ready to begin?"

User Task: Allow users to navigate the interface when they are prepared, or to locate desired information.

Related Requirements: Communication, intended workflow, natural speech, simplified navigation, touch-based user inputs

Task 2

Task Description: Educational content will load and the virtual healthcare agent will provide additional instructions and teaching material. The Simulated RN shall monitor user preferences. The system can provide information related to user interest, making the experience patient-centric.

The second task is designed to determine a user's preference for feeding the infant (e.g., breastfeeding, formula feeding). The user's selection will be stored and subsequent educational content will correspond to this selection. For example, bowel movements correlate with feeding preferences. If a user selects formula feeding, the bowel movement section will emphasize information related to the previous selection (patient-centric).

User Task: Offer various instructional opportunities for the user to explore.

Related Requirements: Instructional framework

Task 3

Task Description: The virtual healthcare agent will provide additional instructions, education and multimedia content. The third task is another user preference. The Simulated RN will ask the question, "Would you like to watch an education video related to feeding baby?" The user will select either yes or no. The interface shall provide educational content to accommodate user learning styles, visual aids and preference.

User Task: Accommodate the user's learning styles by providing them with learning options.

Related Requirements: Instructional framework, multisensory, accommodate various user learning styles, visual aids

Usability Performance Metrics

  1. Count frequency related to failure to make a selection.
  2. Count frequency related to the accuracy of a selection (e.g., correct or incorrect intention of selection).
  3. Time required to make a selection (in seconds).

All of the tasks require touch-based user input or responses.

 

 

Methods

Selection

Five associates with varying degrees of relationship to the target audience were invited to participate in this study. Participants formed a generalized nonprobability convenience sample selected to represent the primary user group (mothers of newborns) and the secondary user group (maternity experts).


Participants

Five participants (1 male, 4 females) volunteered to take part in the usability study. One participant was a physician, two participants were RNs (1 from labor and delivery), and two participants represented mothers of recent newborns (having given birth within the past year). Therefore, two participants represented the primary user group (mothers of newborns), while three participants represented the secondary user group (maternity experts). All participants were English-speaking. Two participants were aged 20-29, two were aged 30-39, and one was aged 40-49. No control group was used for this study.


Procedures

A usability study was conducted to explore and assess the functionality of the Simulated RN prototype. The study was exploratory, seeking to gather information about the prototype and to assess its preliminary design utility. The study required a user-based test to measure a set of representative tasks in the Simulated RN prototype interface. Each participant was scheduled for an individual 20-minute session, and all sessions were conducted on the same day.


A controlled setting was used to conduct each session. The testing environment was a small office room with a desk and two chairs. On the desk sat the Simulated RN prototype interface, installed on a Dell Inspiron 2320 all-in-one computer. The same volume and screen settings were used for all participants. Participants used the Dell Inspiron's touch screen for all user inputs; no keyboard or mouse was used.


The test moderator sat next to the participant during the test sessions. The test moderator introduced the session and provided the participants with a brief set of instructions. During the session, the frequency count and time measures were obtained automatically by the interface and stored in a database for each participant. In addition, an accuracy measure was obtained after each user touch input: the test moderator asked a follow-up question to verify the participant's intention. Participants confirmed verbally, and their confirmation was recorded on paper.


Finally, a post-session self-reported qualitative survey was handed out to each participant to gather subjective data related to interface features. The self-reported measures asked the participants to respond using a rating scale.


Measures

User Study

The measurement criteria were based on the measurable goals in Nielsen's Usability Engineering (1993): time to learn how to operate the system, speed of user performance, rate of errors made by users, and the user's satisfaction with the system. All data were used to summarize information related to these measurable goals in an attempt to determine system usability.


Quantitative Usability Measures

Quantitative measures were derived through these data sources:

  1. Count frequency related to failure to make a selection;
  2. Count frequency related to the accuracy of a selection (e.g., correct or incorrect intention of selection);
  3. Time required to make a selection in seconds.

Qualitative Usability Measures

Qualitative measures focused on attitudes toward the agent and satisfaction with the concept. These measures were self-reported items used to assess overall satisfaction with the agent, ease of system use, and preference for a human or an agent. A survey instrument was created to capture responses to these items.



 



Results

Quantitative Results

The Task Analysis Table reflects participant data related to their touch interactions with the system interface. The metrics include frequency counts and measures of time in seconds. The table is organized by participant.


Task Analysis Table

 

                Task 1                        Task 2                        Task 3
                Failure   Accuracy  Time      Failure   Accuracy  Time      Failure   Accuracy  Time
Participant 1   0         1         10        0         1         8         0         1         7
Participant 2   0         1         12        0         1         7         0         1         7
Participant 3   0         1         13        0         1         9         0         1         8
Participant 4   0         1         10        0         1         7         0         1         8
Participant 5   0         1         10        0         1         7         0         1         7
Totals          0         5         55        0         5         38        0         5         37
Mean (in sec)                       11                            7.6                           7.4



The Task Summary Table reflects participant data related to their touch interactions with the system interface. The metrics include frequency counts and measures of time in seconds. The table is organized by task.


Task Summary Table

 

        Participants  Failure Count (failed/attempts)  Accuracy Count (correct/attempts)  Mean Time Required Selecting (in seconds)
Task 1  5             0 (0 of 5) 0%                    5 (5 of 5) 100%                    11
Task 2  5             0 (0 of 5) 0%                    5 (5 of 5) 100%                    7.6
Task 3  5             0 (0 of 5) 0%                    5 (5 of 5) 100%                    7.4
Mean    5             0%                               100%                               8.67

Task 1: All users were successful and accurate with their input and averaged 11 seconds to respond.
Task 2: All users were successful and accurate with their input and averaged 7.6 seconds to respond.
Task 3: All users were successful and accurate with their input and averaged 7.4 seconds to respond.
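The totals and means reported in the two tables can be reproduced from the per-participant times. The snippet below is only an illustrative check: the times are copied from the Task Analysis Table, while the variable names are invented for this example and are not part of the prototype.

```python
from statistics import mean

# Selection times in seconds, copied from the Task Analysis Table
# (one value per participant, in participant order 1-5).
times = {
    "Task 1": [10, 12, 13, 10, 10],
    "Task 2": [8, 7, 9, 7, 7],
    "Task 3": [7, 7, 8, 8, 7],
}

# Per-task totals and means, matching the "Totals" and "Mean" rows.
task_totals = {task: sum(values) for task, values in times.items()}
task_means = {task: mean(values) for task, values in times.items()}

# The summary table's overall figure is the mean of the three task means.
overall_mean = mean(list(task_means.values()))

for task in times:
    print(f"{task}: total {task_totals[task]} s, mean {task_means[task]:.1f} s")
print(f"Mean across tasks: {overall_mean:.2f} s")
```

Because every task has the same number of participants, the mean of the three task means (8.67 seconds) equals the mean of all fifteen individual times.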



Qualitative Results

The Participant Information Table reflects participant demographic data. The metrics include frequency counts and scaled replies.

Participant Information Table



Gender      Male: 1 (20%)    Female: 4 (80%)
Age         <20: 0 (0%)    20-29: 2 (40%)    30-39: 2 (40%)    40-49: 1 (20%)
User group  Primary (mothers of newborns within the past year): 2 (40%)    Secondary (maternity experts): 3 (60%)

 

 

 



The General Questions Table reflects participant survey data. The metrics include frequency counts and scaled replies.


 


 

Discussion

The prototype performed well and provided a preliminary level of usability, validated through evidence gathered in the user-based test. All five participants successfully completed all three tasks without failure or error. In addition, all participants agreed to some extent that the system was easy to use and navigate.


The mean time to respond across all tasks was 8.67 seconds, falling from an average of 11 seconds on Task 1 to 7.6 seconds on Task 2 and 7.4 seconds on Task 3. The time required to complete tasks trended downward from task to task, perhaps indicating that the users were becoming more familiar with how to operate the system. Prolonged system use could therefore lead to a further increase in task efficiency.
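The size of the downward trend can be expressed as a relative change between consecutive tasks. The snippet below is a small worked calculation on the reported task means; the `percent_change` helper is introduced here purely for illustration.

```python
# Mean selection times in seconds, as reported in the Task Summary Table.
mean_times = [11.0, 7.6, 7.4]


def percent_change(before: float, after: float) -> float:
    """Relative change from one mean to the next, in percent."""
    return (after - before) / before * 100


# Compare each task's mean with the following task's mean.
changes = [percent_change(a, b) for a, b in zip(mean_times, mean_times[1:])]

for i, change in enumerate(changes, start=1):
    print(f"Task {i} -> Task {i + 1}: {change:.1f}%")
```

The mean falls by roughly 31% between Task 1 and Task 2 and by about 3% between Task 2 and Task 3, so most of the reduction occurs after the first interaction.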


In terms of Nielsen's Usability Engineering measurable goals (time to learn how to operate the system, speed of user performance, and rate of errors), the data indicate that the system's functionality supports the user's tasks. However, the Simulated RN Survey results on user satisfaction with the system suggest that there is more work to be done.


The virtual healthcare agent performed well: four of five participants claimed they were satisfied working with the virtual agent. Furthermore, four of five participants agreed to some extent that the Simulated RN would improve patient satisfaction. In terms of high agency and high behavioral realism, three of five participants had difficulty understanding the computer-generated speech and three of five disagreed that the movements appeared natural. When asked whether they would choose a human agent or a virtual agent, three of five participants chose a human. Each of these three participants cited problems related to the quality of speech.


The main implications for the prototype relate to body language and movements (high behavioral realism) and, most of all, to the quality of the computer-generated speech (high agency). All of the areas studied are critically dependent on communication. Surprisingly, the apparent deficiency in the quality of speech had no influence on user performance. This area is significant and requires more emphasis in future design and testing.


In one sense, the data suggest that high agency and high behavioral realism require more attention in future development. Even so, all participants agreed that the Simulated RN would improve patient comprehension, and four of five participants suggested it would enhance the patient discharge process. As one participant stated, “I think with some fine tuning this could really be useful.”


In conclusion, the prototype process and usability study aimed to determine required changes in the structure of the design. These results show the Simulated RN prototype design has feasible utility as a product. However, as systems are characterized by increasing levels of automation, the virtual agent's activities are increasingly driven by the user's response to the system. The relationship between task performance and user satisfaction will require more work to improve the overall usability. Limitations of this study include a small sample size. In addition, the use of a convenience sample may provide some face validity, but could also skew results and affect interobserver reliability. An empirical investigation will be needed in the future to demonstrate the effectiveness of the proposed design on other aspects related to patient acquisition of knowledge, comprehension and satisfaction.