An exploration of the role of visuals and users’ imagery and verbal preferences on goal attainment and coaching chatbot adoption

Terblanche, Nicky; Prywes, Yaron

doi:10.24384/p15v-jq34

International Journal of Evidence Based Coaching and Mentoring
2025, Vol. 23(1), pp.205-221. DOI: 10.24384/p15v-jq34

Academic Paper

An exploration of the role of visuals and users’ imagery and verbal preferences on goal attainment and coaching chatbot adoption

Nicky Terblanche ✉ (Stellenbosch Business School, Stellenbosch University, South Africa)
Yaron Prywes (Cglobal Consulting)

Abstract

Artificial intelligence chatbots could scale coaching; however, user adoption is a challenge. We investigate the effect of images on chatbot adoption and coaching efficacy by comparing a text-only coachbot (TextBot, n=126) with a text+images bot (ImageBot, n=116). We measure goal attainment and technology adoption one week apart, as well as users’ preferences for imagery and verbal modes of communication. Perceived goal attainment increased at T2 for both bots. If “Correct word usage” was important, users found the TextBot to be less “fun”. Users with lower “Imagination” also found the TextBot easier to use, and were more likely to use it. We introduce the concept of “AI coach customization”.

Keywords

artificial intelligence chatbots, chatbots, coaching chatbots, AI coaching, technology adoption, UTAUT, visual verbal preference, goal attainment, visual coaching

Article history

Accepted for publication: 03 January 2025
Published online: 03 February 2025

Citation

Terblanche, N. and Prywes, Y. (2025) 'An exploration of the role of visuals and users’ imagery and verbal preferences on goal attainment and coaching chatbot adoption', International Journal of Evidence Based Coaching and Mentoring, 23 (1), pp.205-221. DOI: 10.24384/p15v-jq34 (Accessed: 4 April 2025).

Introduction

The advent of large language models (LLMs) and their use in chatbots such as ChatGPT have propelled forward an already vibrant chatbot scene. Some estimate that there are hundreds of thousands of currently active chatbots (Johnson, 2018). Although slow to start, chatbots are becoming more popular in the domain of organizational coaching. Organizational coaching is "a human development process that involves structured, focused interaction and the use of appropriate strategies, tools, and techniques to promote desirable and sustainable change for the benefit of the coachee and potentially for other stakeholders" (Bachkirova, Cox, & Clutterbuck, 2014, p. 1).

Non-directive coaching chatbots differ from traditional service chatbots. Their purpose is not to provide answers to users’ questions or requests, but rather to use question and reflective frameworks to facilitate users’ own sense making (Terblanche & Kidd, 2022). For this type of chatbot to be effective, regular engagement is necessary to continuously promote users’ reflective practice since reflection is a crucial ingredient in raising self-awareness and personal growth. One mechanism to encourage chatbot usage is to design the chatbot to appeal to users’ perceptions of ease of use and fun (Venkatesh et al., 2016). The way a chatbot interacts with users plays a significant role in the adoption of this technology and one important aspect of chatbot adoption is its modality of interaction. Interaction modalities refer to the combination of message characteristics and the channel with which they are conveyed (McGuire, 2012).

Despite the existence of general guidelines for designing AI chatbot coaches (for example Strohmann et al., 2023; TerStal et al., 2020; Terblanche, 2020), research on technology adoption factors and coaching chatbot efficacy relating to interaction modalities is still limited. Examples of recent studies that investigated the role of chatbot communication modalities include comparing voice chatbots to text-only chatbots (Terblanche, Wallace & Kidd, 2023) and typing commands versus selecting from a list of options presented to the user (Mai et al., 2022). Both these studies found differences in users’ preferences for chatbot communication modalities.

Visual images have been shown empirically to increase people’s engagement relative to a “words only” approach in a number of coaching-adjacent contexts. In education settings, studies suggest people prefer visuals over a “words alone” approach. For example, audio-visual presentations are rated as more enjoyable and interesting if accompanied by pictures (Levie & Lentz, 1982). In ethnography, photo elicitation – the use of photographs in interviews – has been found to stimulate longer and more comprehensive interviews. Collier, who coined the phrase photo elicitation, noted that using photos while interviewing had a “compelling effect upon the informant … to stimulate and release emotional statements about the informant’s life” (Harper, 2002). Finally, in health education settings, people’s emotional response to pictures has been shown to affect whether they increase or decrease target behaviors. Patients who had a positive emotional response to illustrations were more likely to increase target behaviors (Houts et al., 2006).

To summarize, there are reasons to believe that visual images may positively influence user adoption and engagement with a coaching chatbot. Similar to a presenter giving a talk, an ethnographer conducting an interview, or a doctor educating a patient, the use of a visual image in a coaching chatbot may amplify the impact of the message being sent.

Given this background, the questions that this study asks are: 1) Does adding images to a coaching chatbot increase goal attainment and technology adoption? 2) Do users’ imagination and verbal preference affect goal attainment and adoption when comparing a text-only with a text+image coaching chatbot? Finding answers to these questions may assist with the increased adoption and efficacy of AI chatbot coaching in organizations.

We apply three theoretical lenses to study this phenomenon: dual-code theory, goal theory and technology adoption theory.

Theory and hypotheses

Dual code theory

Dual-code theory (DCT) (Paivio, 1971; Sadoski & Paivio, 2012) postulates that human cognition involves two distinct systems for processing information, one specializing in verbal information and the other in non-verbal information, particularly imagery. Combining written and visual information has been shown to enhance comprehension, problem-solving, and learning outcomes across different educational domains (Clark & Paivio, 1991; Mayer, 2017). Visual information is also typically perceived as easier, faster, and more enjoyable than written information (Grabe, 2020).

An important nuance is that dual-code theory also suggests that not all users will respond to visual information in the same way. Paivio (1990), for example, noted that people have preferred “modes of thinking” where some individuals prefer an imaginal mode while other individuals prefer a verbal mode. Individuals with a preference for an imaginal mode of thinking often use mental pictures to solve problems. Individuals with a preference for a verbal mode of thinking attest that most of their thinking is verbal, as if they are talking to themselves. Neurophysiological differences between people with different cognitive styles have been detected (e.g., Jawed, Amin, Malik & Faye, 2018; Paivio, 2014), and eye-tracking studies have found different eye-movement patterns between people with higher or lower scores on the visual-verbal dimension (Koc-Januchta et al., 2019).

Goal theory

Goal theory, with its robust foundation in empirical research and practical application, posits that goal setting is inherently motivating, linking the challenge of a goal with performance and the effort required (Locke & Latham, 2002). This theory is underpinned by five key principles for setting goals: specificity and clarity; an appropriate level of difficulty; initial and ongoing commitment; consistent feedback on progress; and manageable complexity (Locke & Latham, 1990). Goals represent the personal desire for specific outcomes (Austin et al., 1996). To improve the likelihood of achieving goals, several strategies are advocated, such as writing down goals, assigning clear metrics and timelines, and committing to these goals in a public manner (Locke & Latham, 2002).

In coaching, goal theory is employed to enhance self-regulation (Grant, 2006). The coaching process involves setting goals, creating and implementing action plans, ongoing monitoring of progress, and adjusting goals or plans as necessary based on feedback and achievements. Previous research on chatbot coaching has shown that a chatbot coach is comparable to a human coach in terms of helping clients with goal attainment (Terblanche et al., 2022). In the Terblanche et al. (2022) study, a goal attainment coaching chatbot helped users achieve their goals over a period of 10 months at a rate similar to people who received human coaching. They also found that goal attainment increased as time passed. A recent pilot study by Isaacson et al. (2024) found that even one coaching session with a goal attainment AI chatbot coach improved goal progress by 10%. Therefore, we expect that, irrespective of whether text or text+imagery is used as the communication modality:

H1a: The goal attainment for TextBot and ImageBot users will both be higher at T2 compared to T1

Using images has been shown to improve comprehension, problem-solving, and learning outcomes (Clark & Paivio, 1991). Therefore, we expect that when adding images to a goal-setting chatbot conversation:

H1b: The ImageBot users will have higher goal attainment compared to TextBot users

Technology adoption

Technology adoption is a mature research field that studies the factors that influence the uptake of a technology. Derived from the original Technology Acceptance Model (TAM) (Davis et al., 1989), the Unified Theory of the Acceptance and Use of Technology (UTAUT) has emerged as the primary framework to study technology adoption (Venkatesh et al., 2003). The UTAUT version 2 considers the role that performance expectancy, effort expectancy, social influence, facilitating conditions, hedonic motivation, price value and habit have on the user’s attitude towards the technology and ultimate motivation to use it (Venkatesh et al., 2012). For the purpose of the present study, we omit price value and habit due to the novelty of AI chatbot coaches.

Performance expectancy is the degree to which an individual believes that using the system will help them perform better (Venkatesh et al., 2003). Performance expectancy has been shown to positively affect technology adoption in service-oriented chatbots (Almahri et al., 2020; Kasilingam, 2020; Kim et al., 2019; Kuberkar & Singhal, 2020; Melian-Gonzalez et al., 2021) as well as coaching chatbots (Terblanche & Kidd, 2022). Adding images to a chatbot should explain the purpose and functionality of the chatbot more clearly by placing emphasis on certain keywords and concepts like “privacy” (Bateman, 2014). We therefore expect that:

H2a: The ImageBot will be perceived to perform better (performance expectancy) than the TextBot

Effort expectancy is the level of ease associated with using a technology (Venkatesh et al., 2003, p. 450). Davis et al. (1989) found that people are more likely to use an application if they perceive it as easy to use. In previous studies, a positive relationship between effort expectancy and the intention to use the technology was found with service-oriented chatbots (Almahri et al., 2020; Kuberkar & Singhal, 2020). It is possible that adding imagery to a chatbot may enhance comprehension of the dialog, allowing users to spend less energy on deciphering instructions. However, in this case, both chatbots contain the same amount of text and the instructions are relatively straight-forward. As such, the amount of effort required to interact with the ImageBot may actually be higher as users need to scroll past numerous embedded images. Therefore, we expect that:

H2b: The TextBot will be perceived to be easier to use (effort expectancy) compared to the ImageBot

Social influence describes the extent to which people think other people believe they should use a technology (Venkatesh et al., 2003) and captures people’s perception of their status within a group (Moore & Benbasat, 1991). Studies have found social influence to be a significant predictor of intention to use technology service chatbots (Kim et al., 2019; Kuberkar & Singhal, 2020; Melian-Gonzalez et al., 2021) and coaching chatbots (Terblanche & Kidd, 2022). According to dual-code theory there is no reason to believe that images will affect people’s social standing. In addition, none of the images used in the ImageBot represent social situations or authority figures. Therefore, we expect that:

H2c: ImageBot and TextBot will be rated similarly in terms of social influence.

Facilitating conditions capture the infrastructure needed to use the technology (Venkatesh et al., 2003). This construct has been shown to be an important consideration for service chatbots (Kuberkar & Singhal, 2020) and coaching chatbots (Terblanche & Kidd, 2022). Adding images to a chatbot does not place any additional constraints on the technology infrastructure requirements in the present study. Therefore, we expect that:

H2d: ImageBot and TextBot will be rated similarly in terms of infrastructure required to use them (facilitating conditions)

Hedonic motivation refers to the level of pleasure or fun derived from using a technology. It has been shown to be a significant factor in various contexts, including predicting consumers’ intention to use mobile shopping services (Brown & Venkatesh, 2005). Visual information is typically considered more enjoyable to interact with relative to a “words-only” communication modality (Grabe, 2020). Therefore, we expect that:

H2e: ImageBot will be perceived to be more fun to use (hedonic motivation) compared to the TextBot

Users’ attitudes towards a technology have been widely shown to be a predictor of their intention to use the technology in the future (Ajzen, 1991; Melián-González et al., 2021). Attitude captures the user’s perception of how engaging, likable and interesting a technology is. Visuals are often experienced as engaging and stimulating (Clark & Paivio, 1991). Therefore, we expect that:

H2f: Users will have a more positive attitude towards the ImageBot than the TextBot

Intention to use a technology captures users’ behavioral intent to engage with the technology on an ongoing basis (Venkatesh et al., 2003). Given that we expect people to find the ImageBot more enjoyable and fun to use, we also expect that users will want to use the ImageBot in the future more so than the TextBot. Therefore, we hypothesize that:

H2g: The intention of users to use the ImageBot on an ongoing basis will be higher than the TextBot.

Preferences for Imagery versus Verbal modes of communication, and technology adoption

As noted earlier, dual-code theory (Paivio, 1971; Sadoski & Paivio, 2012) posits that people have different preferences, habits and skills when it comes to processing imagery versus verbal stimuli. These preferences can be measured by an individual differences questionnaire (IDQ) (Paivio & Harshman, 1983), which examines five specific aspects related to visual or verbal processing. The first aspect is “verbal fluency”, which examines the extent to which a user has no difficulty expressing themselves verbally or with the written word. The second is “habitual use of imagery”, which examines the extent to which a user’s thinking consists of mental pictures or images. The third is “correct word usage”, which captures the extent to which users are aware of, and try to adhere to, proper grammar. The fourth is “imagination”, which describes the extent to which people’s daydreams or night dreams are vivid. Finally, “reading difficulties” is the extent to which people feel they are hampered in their ability to read.

In a study that used the IDQ, it was found that multimedia presentation aids are more effective for learners who prefer visual information (Butler & Mautz, 1996). By definition, people with higher levels of visual preference prefer an interaction that contains images. As noted earlier, visual information is also typically perceived as easier, faster, and more enjoyable than written information (Grabe, 2020). As such, it is important to examine potential individual differences in processing imagery versus verbal information.

In the present study, the TextBot contained no images. The information and interactions in this condition were purely text-based. The ImageBot, on the other hand, included 12 images that complemented the written word. This inclusion of images while maintaining textual consistency across the two conditions should make it possible to detect a potential interaction between a user’s individual preference for a particular communication mode and technology adoption. Specifically, people who prefer visual modes of communication should favor the ImageBot and vice versa. Therefore, we hypothesize that:

H3: Individual differences in visual and verbal processing will influence technology adoption (UTAUT) such that people with a stronger preference for visual processing will favor the ImageBot over the TextBot.

Method

This study followed a between-group experimental design with two groups of participants (n=242) interacting with two equivalent coaching chatbots, one using only text (TextBot, n=126) and the other with images added (ImageBot, n=116). Participants completed a demographic survey as well as measures on technology adoption and goal attainment after an initial engagement with the chatbots. Technology adoption and goal attainment were measured again one week later. Individual participant differences were also measured in terms of imagery and verbal habits and skills.

The chatbot scripts

Two coaching chatbots were created with identical conversation scripts based on the well-known GROW coaching model (Grant, 2011). For the second chatbot, images were added in certain parts of the conversation to illustrate the text. Figure 1 displays side-by-side screenshots to show the reader how images were embedded in the coaching conversation. The chatbot with images is referred to as “ImageBot”, while the chatbot without images is referred to as “TextBot”.

Both chatbots take the user through the same structured, scripted conversation flow. The interaction begins with the chatbot introducing itself, explaining its capabilities and aim (to help solve a workplace challenge). It then proceeds to ask users questions following the GROW model:

What is your “Goal”?
What is your “Reality” surrounding this goal?
What “Options” have you considered to achieve this goal? Which option is the most viable to focus on?
What is your “Will” to achieve this goal and what actions are you going to take?

Again, the users’ interaction with the two chatbots was identical except for the added images in the ImageBot.
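To make the equivalence of the two conditions concrete, the following minimal sketch models a scripted GROW conversation in which the same prompts are used for both bots and images are an optional attachment. It is an illustration only, not the authors' implementation: the Step structure, the render function and the image file names are hypothetical, and the prompts are paraphrased from the four questions listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str                    # GROW stage
    prompt: str                  # text shown by both bots
    image: Optional[str] = None  # hypothetical image file, used by ImageBot only


# Prompts follow the four GROW questions; image names are made up for illustration.
SCRIPT = [
    Step("Goal", "What is your goal?", "goal.jpg"),
    Step("Reality", "What is your reality surrounding this goal?", "reality.jpg"),
    Step("Options", "What options have you considered to achieve this goal? "
                    "Which option is the most viable to focus on?", "options.jpg"),
    Step("Will", "What is your will to achieve this goal and what actions "
                 "are you going to take?", "will.jpg"),
]


def render(script: List[Step], with_images: bool) -> List[str]:
    """Return the messages a bot would send, in order."""
    messages = []
    for step in script:
        if with_images and step.image is not None:
            messages.append(f"[image: {step.image}]")
        messages.append(step.prompt)
    return messages


text_bot = render(SCRIPT, with_images=False)
image_bot = render(SCRIPT, with_images=True)

# The text content is identical across conditions; only the images differ.
assert [m for m in image_bot if not m.startswith("[image:")] == text_bot
```

Keeping a single script and toggling only the image attachment is one simple way to ensure that any difference between conditions can be attributed to the images rather than to wording.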

Figure 1: TextBot and ImageBot screen shots

Image Selection Procedure

While research has shown that visually illustrating a particular message can enhance the impact of a message (e.g., Zillmann, 2006), currently there are no overarching criteria or methods to guide the use of images in research. Some studies use pre-existing images based on the researcher’s own subjective choice (e.g., Konečni et al., 2007; Samburskiy, 2020). Others rely on participants to generate images (e.g., Clark-Ibanez, 2004), or on researchers to generate images (e.g., Loewenthal, 2020). Yet others strive to combine researcher and participant input. For example, Hur et al. (2020) asked participants to bring in photographs related to two keywords (i.e., “fearful” or “happy” photographs), and then filtered the photographs based on the aims of their study. In a somewhat similar fashion, the present study aimed to leverage both researcher input and lay perspectives.

The following procedure was used to select this study’s images for the ImageBot. First, it was determined that the images utilized should be complementary to the text in the chatbot script, as opposed to having a contradictory or independent relationship to the text (Bateman, 2014). In other words, the goal of the images chosen was to support and potentially amplify the message conveyed in words by the chatbot, as opposed to undercutting or deviating from the text message.

Second, the script of the chatbot utilized in this study was examined for “keywords”. Keywords are words or phrases that are central to deciphering the meaning of a sentence. For example, in the sentence “I am a chatbot coach”, the word “chatbot” is a keyword. Twelve such keywords were identified. Each keyword was entered into the search feature of a database of freely usable images called Unsplash.com. The website filters relevant results from the approximately 3 million high-resolution images stored in its database. The researchers then selected four images per keyword from the presented results using popularity (i.e., the most combined views and downloads) and the notion of compatibility to the chatbot text as criteria.

Next, a crowdsourcing platform (Prolific) was used to ask 100 participants to rank the four images associated with each keyword. Participants were asked to share which image in their view best matched the keyword. Participants ranked the four images with 1 being their favorite (best match) and 4 their least favorite (worst match). The highest-ranking image per keyword was used in the ImageBot. In total, 12 images were inserted in the chatbot conversation. Figure 2 displays six user-selected images with associated text.
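The ranking step can be sketched as follows. The text does not state how individual rankings were aggregated, so the lowest mean rank is used here as an assumed rule; the function name best_image and the three example rankings are hypothetical and do not come from the study data.

```python
from statistics import mean


def best_image(rankings):
    """Pick the image with the lowest mean rank for one keyword.

    rankings: list of dicts mapping image id -> rank (1 = best match, 4 = worst).
    Lowest mean rank is an assumed aggregation rule, not one stated in the paper.
    """
    images = rankings[0].keys()
    mean_rank = {img: mean(r[img] for r in rankings) for img in images}
    return min(mean_rank, key=mean_rank.get), mean_rank


# Hypothetical rankings from three raters for a single keyword
votes = [
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 2, "b": 1, "c": 4, "d": 3},
    {"a": 1, "b": 3, "c": 2, "d": 4},
]
winner, scores = best_image(votes)
print(winner)  # a
```

Other rules, such as counting first-place votes, could select a different image when raters disagree strongly, which is one reason the aggregation rule is worth reporting.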

Figure 2: Examples of images and corresponding conversation text of the ImageBot

Sample

Participants were sourced from a crowdsourcing platform called Prolific. The use of crowdsourcing platforms has become a popular way to collect data (Buhrmester et al., 2018) and Prolific is considered one of the most reliable platforms currently available (Douglas et al., 2023).

This study utilized a between-group design where different groups of participants used the TextBot and ImageBot. In total, 249 participants signed up for the study on a first-come basis. The gender split was 124 female, 124 male with one participant selecting “Neither”. In the end, 126 participants completed the TextBot conversation and 116 the ImageBot conversation. TextBot participants were recruited first and excluded from participation in the ImageBot experiment. The mean age was 40 (SD=10.29) with most respondents falling into the 25 to 44 categories (67%). Participants were paid $6 to participate in this study. Informed consent was obtained from all participants before data collection started. The study was approved by the first author’s research institution.

Experimental procedure

Participants completed an initial survey capturing demographics and individual preferences in terms of imagery and verbal habits and skills. After having one conversation with the chatbot, they completed a survey measuring technology adoption and goal-attainment (T1). These constructs were measured again one week later (T2). Participants did not interact with the chatbots between T1 and T2.

Measures

Demographic questions covered age, gender, industry employed in and current job level. To measure individual preferences in terms of imagery and verbal habits and skills, a questionnaire based on Paivio’s Individual Differences Questionnaire (IDQ) (Paivio & Harshman, 1983) compiled by Kardash et al. (1986) was used. This IDQ survey consisted of 33 questions that measured five constructs: Verbal fluency (14 questions), Habitual use of imagery (four questions), Correct word usage (eight questions), Imagination (three questions) and Reading difficulties (four questions).

To measure technology adoption we used a 33-item adapted version of the UTAUT2 survey (Venkatesh et al., 2003) that included measures for performance expectancy - PE (five questions), effort expectancy - EE (six questions), social influence - SI (four questions), facilitating conditions - FC (six questions), hedonic motivation - HE (three questions), attitude towards the chatbot - AT (five questions) and intention to use the chatbot in future - BI (four questions). Finally, goal attainment was measured using the perceived competence measure (four items) adapted from Williams and Deci (1996).

All measurements used a five-point Likert scale with 1 being Strongly Disagree and 5 Strongly Agree.
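As an illustration of how such multi-item measures are commonly turned into construct scores, the sketch below averages the five-point Likert responses within each construct. The item counts are those reported above; everything else is an assumption made for the example, including the score function, the construct-by-construct item ordering, the use of the mean as the construct score, and the absence of reverse-coded items.

```python
from statistics import mean

# Item counts per construct as reported in the Measures section.
UTAUT_ITEMS = {"PE": 5, "EE": 6, "SI": 4, "FC": 6, "HE": 3, "AT": 5, "BI": 4}
IDQ_ITEMS = {
    "Verbal fluency": 14,
    "Habitual use of imagery": 4,
    "Correct word usage": 8,
    "Imagination": 3,
    "Reading difficulties": 4,
}


def score(responses, layout):
    """Average 1-5 Likert responses into one score per construct.

    Assumes items are ordered construct by construct (hypothetical layout)
    and that no item needs reverse coding.
    """
    if len(responses) != sum(layout.values()):
        raise ValueError("unexpected number of responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be on the 1-5 scale")
    scores, start = {}, 0
    for construct, n_items in layout.items():
        scores[construct] = mean(responses[start:start + n_items])
        start += n_items
    return scores


# Both instruments contain 33 items in total.
assert sum(UTAUT_ITEMS.values()) == 33
assert sum(IDQ_ITEMS.values()) == 33

# Hypothetical answers from one participant to the 33 UTAUT items
example = [3] * 5 + [4] * 6 + [2] * 4 + [4] * 6 + [3] * 3 + [3] * 5 + [2] * 4
utaut_scores = score(example, UTAUT_ITEMS)
```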

Data analysis

A two-way analysis of variance (ANOVA) was conducted to address this study’s hypotheses. The chatbot condition (TextBot or ImageBot) and the five constructs of the image/verbal IDQ were entered as independent variables. The dependent variables were the seven constructs of technology adoption (UTAUT) and goal attainment. To compare the TextBot versus the ImageBot conditions, the main effect of chatbot was reported. The effects of the five IDQ constructs were investigated by focusing on the interaction effect between the chatbot and each IDQ construct. A significant interaction implied that the particular IDQ construct influenced the users’ experience differently in the two chatbot conditions.
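The idea behind the interaction term can be illustrated with a simple difference-of-differences on cell means. This sketch is not the ANOVA itself: it computes no F statistic or p-value, it splits the IDQ construct into "low" and "high" groups purely for illustration (the text does not say how levels were formed), and the effort-expectancy scores are hypothetical.

```python
from statistics import mean


def cell_means(rows):
    """rows: (bot, idq_level, score) tuples -> mean score per (bot, level) cell."""
    cells = {}
    for bot, level, score in rows:
        cells.setdefault((bot, level), []).append(score)
    return {cell: mean(values) for cell, values in cells.items()}


def interaction_contrast(means):
    """Gap between bots at low IDQ minus the gap between bots at high IDQ."""
    low_gap = means[("TextBot", "low")] - means[("ImageBot", "low")]
    high_gap = means[("TextBot", "high")] - means[("ImageBot", "high")]
    return low_gap - high_gap


# Hypothetical effort-expectancy scores, two participants per cell
data = [
    ("TextBot", "low", 4.0), ("TextBot", "low", 3.5),
    ("ImageBot", "low", 3.0), ("ImageBot", "low", 3.0),
    ("TextBot", "high", 3.5), ("TextBot", "high", 3.5),
    ("ImageBot", "high", 3.5), ("ImageBot", "high", 3.5),
]
means = cell_means(data)
contrast = interaction_contrast(means)
print(contrast)  # 0.75
```

A contrast near zero means the two bots differ by about the same amount at both IDQ levels (no interaction); a contrast far from zero, as in this made-up example, means the difference between bots depends on the IDQ level, which is what the interaction test in the ANOVA evaluates formally.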

Effect sizes were reported as Cohen’s d. Normality assumptions were assessed by inspecting normal probability plots. These were all found to be acceptable.

Results

This study’s first research question asked, “Does adding images to a coaching chatbot increase goal attainment and technology adoption?”. Results for the first hypothesis (that goal attainment for TextBot and ImageBot users will be higher at T2 compared to T1) indicate that for both chatbots, a significant increase was detected in users’ perceived goal attainment one week after using the chatbots (p=0.02). Figure 3 illustrates this result. Therefore, we accept H1a.

Figure 3: Goal attainment levels for both chatbots one week apart

Table 1 below compares the results of users’ perceptions of interacting with the TextBot versus the ImageBot. It summarizes the means, p-value and effect size related to seven technology adoption factors and goal attainment. Overall, no statistically significant difference was detected between the two bots on any of the dependent variables. Therefore, we reject H1b (goal attainment), H2a (performance expectancy), H2b (effort expectancy), H2e (hedonic motivation), H2f (attitude) and H2g (behavioral intent). Regarding H2c (social influence) and H2d (facilitating conditions) we expected no difference between the bots and no difference was detected. Therefore, we accept these two hypotheses.

Table 1: Summary of means and standard deviations for comparing TextBot and ImageBot user experiences

Construct	TextBot mean (SD)	ImageBot mean (SD)	p-value	Effect size (Cohen’s d)
PE	2.88 (1.15)	2.27 (1.12)	0.45	0.02 (negligible)
EE	3.17 (0.83)	3.61 (0.69)	0.16	0.18 (small)
SI	2.32 (0.90)	2.27 (0.81)	0.73	0.03 (negligible)
FC	3.88 (0.69)	3.76 (0.65)	0.20	0.17 (small)
HE	3.07 (1.19)	3.09 (1.10)	0.76	0.03 (negligible)
AT	3.17 (1.17)	3.00 (1.03)	0.20	0.15 (small)
BI	2.52 (1.17)	2.23 (1.15)	0.26	0.14 (small)
Goal Attainment	3.64 (0.95)	3.48 (0.94)	0.12	0.17 (small)
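
As a rough check on how an effect size in Table 1 relates to the reported means and standard deviations, the sketch below computes Cohen’s d with a pooled standard deviation for the goal attainment row. The pooled-SD formula and the use of the full group sizes (n=126 and n=116) are assumptions; the text does not state how d was derived, so this simple calculation need not reproduce every row of the table.

```python
from math import sqrt


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# Goal attainment row of Table 1: TextBot 3.64 (0.95), ImageBot 3.48 (0.94).
# Group sizes are taken from the sample description (assumed to apply to this row).
d_goal = cohens_d(3.64, 0.95, 126, 3.48, 0.94, 116)
print(round(d_goal, 2))  # 0.17
```

By the conventional thresholds (0.2 small, 0.5 medium, 0.8 large), a value of about 0.17 sits just below "small", consistent with the label in the table.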

The second research question asked, “Do users’ imagination and verbal ability affect adoption and goal attainment perception when comparing a text-only with a text+image coaching chatbot?”. A number of hypotheses examined individual differences between users in terms of their imagination and verbal abilities, and a potential corresponding preference for a particular chatbot. Results indicate that most of the IDQ constructs had no significant effect on UTAUT measures. That being said, evidence was detected suggesting that for two of the five text and image-based individual predispositions (“Imagination” and “Correct word usage”), certain UTAUT constructs were in fact affected.

For the IDQ construct of “Imagination”, users with lower “Imagination” found that the TextBot required less effort (effort expectancy) than the ImageBot (p=0.05) (Figure 4). Similarly, users with lower “Imagination” were more likely to use the TextBot compared to the ImageBot (behavioral intent, p=0.06) (Figure 5). However, these effects disappear with higher levels of “Imagination”. This points to the possibility that adding images to the chatbot conversation is a hindrance to people with lower levels of imagination.

Figure 4: Effort expectancy differences detected between TextBot and ImageBot users at different levels of Imagination (significant at 0.05 level)

Figure 5: Behavioral intent differences detected between TextBot and ImageBot users at different levels of Imagination (approaching significance at 0.05 level)

For the IDQ construct of “Correct word usage”, results show that if “Correct word usage” is important, there is a tendency for users to experience the TextBot as less “fun” (Hedonic motivation) to use (p=0.07) (Figure 6). These three results suggest limited support for H3, as only three UTAUT constructs were affected by users’ image and verbal preferences.

Figure 6: Hedonic motivation differences detected between TextBot and ImageBot users for different levels of Correct word usage (approaching significance at 0.05 level)

A summary of results for all this study’s hypotheses is displayed in Table 2.

Table 2: Summary of hypotheses and results

Hypothesis	Supported?
H1a: Goal attainment for TextBot and ImageBot users will be higher at T2 compared to T1	Supported
H1b: ImageBot users will have higher goal attainment compared to TextBot users	Not supported
H2a: The ImageBot will be perceived to perform better (performance expectancy) than the TextBot	Not supported
H2b: The TextBot will be perceived to be easier to use (effort expectancy) compared to the ImageBot	Not supported
H2c: The ImageBot and TextBot will be rated similarly in terms of social influence	Supported
H2d: The ImageBot and TextBot will be rated similarly in terms of infrastructure required to use them (facilitating conditions)	Supported
H2e: The ImageBot will be perceived to be more fun to use (hedonic motivation) compared to the TextBot	Not supported
H2f: Users will have a more positive attitude towards the ImageBot than the TextBot	Not supported
H2g: The intention of users to use the ImageBot on an ongoing basis will be higher than the TextBot.	Not supported
H3: Individual differences in verbal and visual processing influence technology adoption (UTAUT) such that people with a stronger preference for visual processing will favor the ImageBot over the TextBot	Limited support

Discussion

This study asked two research questions: 1) Does adding images to a coaching chatbot increase goal attainment and technology adoption? 2) Do users’ imagination and verbal ability affect goal attainment and adoption when comparing a text-only with a text+image coaching chatbot?

For research question one, the overall results suggest no significant difference exists between the TextBot and ImageBot in terms of their impact on user goal attainment and technology adoption. This is somewhat surprising given dual-code theory and the literature on visuals cited above. To state the obvious, we had anticipated that ImageBot users would perceive both higher goal attainment and technology adoption relative to TextBot users. One possible explanation for these results is that the images used during the coaching conversation were not chosen by the chatbot users. They were in fact chosen by a separate sample of participants, as detailed in the methods section. Furthermore, the images utilized illustrated a generic goal setting process, rather than the actual goals users had in mind. If the ImageBot had asked participants to generate an image or set of images related to their goal, would this have had a significant impact on their user experience? It seems possible that if users had input into the visual dimension of their experience, it may have enhanced their engagement, and ultimately their degree of goal attainment and technology adoption.

Regarding technology adoption, we expected five of the seven UTAUT constructs to be higher for the ImageBot given that studies in other fields such as healthcare education found that the use of images increased desired behavior (Houts et al., 2006). However, this was not the case in this study. One possible explanation is that the images used merely illustrated the conversation text (see Figure 2) and were not used or needed to explain or simplify complex concepts. The concepts covered in the coaching conversation were straightforward and easy to understand without the use of images, potentially negating the explanatory impact that visuals can have. As noted above, another factor that may have contributed to the lack of superior efficacy of the ImageBot is that the images were pre-selected. Users had no control over customizing the images in their chatbot experience. In the field of media studies, it is well established that people can interpret and relate to the same image differently. As such, even though the images in this study were selected using an objective, democratic process, it is possible that the images ultimately did not appeal to everyone, or were, in fact, disliked by some users.

One clear result from this study is that irrespective of chatbot type (text or images+text), both chatbots led to an improved perception of goal attainment one week after using the chatbot. This result adds to a growing number of studies that show the remarkable efficacy of chatbot coaches to assist with goal attainment (see for example Terblanche et al., 2022). Goal attainment is considered by many as the essence of organizational coaching (e.g., Grant, 2006) and this result adds to the evidence that even relatively simple coaching chatbots can be effective in promoting individual goal attainment.

Research question two looked at the data from an individual difference perspective, namely that differences exist in users’ preferences for imaginal and verbal modes of communication. Certain differences were observed in this study between TextBot and ImageBot users. Users who perceive themselves as having low “Imagination” found the ImageBot more difficult to use (effort expectancy) and their intention to use the ImageBot was lower than that of TextBot users. It seems that adding images may have distracted these users and added a cognitive load to their processing of the coaching conversation text. For users with higher levels of “Imagination” this effect disappeared, suggesting that users with higher imagination may in fact find the ImageBot easier to use compared to the TextBot. For TextBot users, the fun aspect of using the chatbot (hedonic motivation) decreased as the importance they attached to “Correct word usage” increased. On the other hand, for ImageBot users the importance of “Correct word usage” did not play a role in their experience of “fun” in interacting with the ImageBot. This result suggests that adding images to a chatbot coaching conversation may, to a certain extent, neutralize the irritation experienced by users with a high need for “Correct word usage”.

An important concept that emerged from this study is that different coaching chatbot modalities may appeal to different types of individual users. In the present study this effect is evident in users with varying levels of “Imagination” and need for “Correct word usage”. This finding adds to initial evidence from other studies on the positive effect on chatbot adoption and efficacy when chatbots are customized to meet individual user needs, such as extraversion (Shumanov & Johnson, 2021; Terblanche et al., 2023) and preferences relating to clicking on predefined options versus typing instructions into a chatbot (Mai et al., 2022). This emerging phenomenon suggests that coaching chatbots could be tailored to suit individual preferences. Customization of coaching chatbots could help address the current challenge in AI coaching uptake and perceived efficacy.

Limitations and future research directions

This study sought to examine the impact that “illustrating” text messages conveyed by a coaching chatbot may have on a user’s experience. Visual scholars (e.g., Bateman, 2014; Nikolajeva & Scott, 2001) note that an image can amplify or reduce the message conveyed by text. However, no overarching criteria or guidelines exist to guide researchers in the selection and potential integration of visuals with text. This study utilized a method where an initial batch of images was selected by researchers who entered “key words” into a free image-bank. This batch was narrowed down using a pool of 100 lay participants, who ultimately selected the final images presented to ImageBot users. This method led to mixed results. As such, future research may consider having end-users generate images for themselves. Such processes are more “client-centric”, and have yielded anecdotal success in coaching contexts. For example, Prywes and Mah (2019) encourage clients to choose inspirational images to help them implement their custom coaching action plan.

Future studies may also explore chatbots that give “equal weight” to both image and text, as is done in comic books and illustrated books. A more “visual centric” experience, rather than a “word centric” experience, may result in higher user engagement in certain populations.

An additional limitation of this study is that some of the keywords utilized in the chatbot script may have been more imageable than others. According to dual-code theory, concrete and abstract words are represented in the verbal system, whereas the imaginal system only links to concrete words (i.e., words that are readily imaginable). If the keywords did differ in this respect, this may have had an impact on the images chosen, and ultimately participant engagement. Future research should seek to investigate this effect by accounting for the imageability of keywords.

Finally, it is worth noting how imagery is commonly used today in the realm of personal fitness apps. For example, the standard Health app on recent iPhones utilizes a dynamic visual of a “circle” to encourage people to complete their “step loop” (i.e., achieve their daily target of steps). Therefore, future research may also seek to investigate the potential role of dynamic images and visual aids to motivate progress.

Conclusion

This study set out to investigate the effect of adding images to a coaching chatbot on users’ goal attainment and their willingness to engage with a chatbot. The results indicate that while the addition of images did not make a significant difference overall, some differences did materialize in the experiences of ImageBot and TextBot users at the level of individual preferences. These differences suggest that coaching chatbots can be customized to appeal to different types of users, thereby potentially improving the adoption rate and efficacy of AI chatbot coaching. This study thus contributes to the important, emerging field of AI chatbot coaching by introducing the concept of AI coach customization, as well as confirming the goal attainment abilities of AI coaching. As AI coaching adoption increases, human coaches should take note of the findings of this study. There is compelling evidence that AI chatbot coaching improves goal attainment and can therefore enhance human coaching. Human coaches should also be aware that not all chatbot coaches are alike and they should take care to match the most appropriate chatbot with their specific clients’ preferences.

References

Almahri, F. A. J., Bell, D., & Merhi, M. (2020). Understanding student acceptance and use of chatbots in the United Kingdom universities: a structural equation modelling approach. In 2020 6th International Conference on Information Management (ICIM), 284-288. IEEE.
Austin, J. T., & Vancouver, J. B. (1996). Goal constructs in psychology: Structure, process, and content. Psychological Bulletin, 120(3), 338–375. DOI: 10.1037/0033-2909.120.3.338.
Bachkirova, T., Cox, E., & Clutterbuck, D. (2014). Introduction. In E. Cox, T. Bachkirova, & D. Clutterbuck (Eds.), The complete handbook of coaching (2nd ed.), 1–20. SAGE.
Bateman, J. (2014). Text and image: A critical introduction to the visual/verbal divide. Routledge.
Brown, S. A., & Venkatesh, V. (2005). Model of adoption of technology in households: A baseline model test and extension incorporating household life cycle. MIS quarterly, 399-426.
Butler, J. B., & Mautz, R. D. (1996). Multimedia presentations and learning: a laboratory experiment. Issues in Accounting Education, 11(2).
Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon’s Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149-154.
Clark, J. M., & Paivio, A. (1991). Dual coding theory and education. Educational Psychology Review, 3(3), 149-210.
Davis, F. D. (1989). Technology acceptance model: TAM. Al-Suqri, MN, Al-Aufi, AS: Information Seeking Behavior and Technology Adoption, 205-219.
Deci, E. L., & Ryan, R. M. (2000). The "what" and "why" of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227-268.
Douglas, B. D., Ewell, P. J., & Brauer, M. (2023). Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. Plos one, 18(3), e0279720.
Grabe, M. E. (2020). Visual cognition. In Handbook of Visual Communication, 2nd ed., 51-70. Routledge.
Grant, A. M. (2006). An Integrative Goal-Focused Approach to Executive Coaching. In D. R. Stober & A. M. Grant (Eds.), Evidence based coaching handbook: Putting best practices to work for your clients, 153–192. Hoboken, NJ: John Wiley & Sons Inc.
Grant, A. M. (2011). Is it time to REGROW the GROW model? Issues related to teaching coaching session structures. The Coaching Psychologist, 7(2), 118-126.
Harkin, B., Webb, T. L., Chang, B. P. I., Prestwich, A., Conner, M., & Kellar, I. (2016). Does monitoring goal progress promote goal attainment? A meta-analysis of the experimental evidence. Psychological Bulletin, 142(2), 198–229. DOI: 10.1037/bul0000025.
Harper, D. (2002). Talking about pictures: A case for photo elicitation. Visual Studies, 17(1), 13-26.
Houts, P. S., Doak, C. C., Doak, L. G., & Loscalzo, M. J. (2006). The role of pictures in improving health communication: a review of research on attention, comprehension, recall, and adherence. Patient Education and Counseling, 61(2), 173-190.
Hur, Y. J., Gerger, G., Leder, H., & McManus, I. C. (2020). Facing the sublime: Physiological correlates of the relationship between fear and the sublime. Psychology of Aesthetics, Creativity, and the Arts, 14(3), 253.
Isaacson, S., Kong, S., Leech, D., & Tee, D. (2024). Unlocking potential: AI coaching in the NHS. In 2024 EMCC Global Research Conference.
Jawed, S., Amin, H. U., Malik, A. S., & Faye, I. (2018). Differentiating between visual and non-visual learners using EEG power spectrum entropy. In 2018 International Conference on Intelligent and Advanced System (ICIAS), 1-4. IEEE.
Johnson, K. (2018). Facebook messenger passes 300,000 bots | VentureBeat. Retrieved January 2025. Available at: https://venturebeat.com/2018/05/01/facebook-messenger-passes-300000-bots.
Kardash, C. A., Amlund, J. T., & Stock, W. A. (1986). Structural analysis of Paivio’s Individual Differences Questionnaire. The Journal of Experimental Education, 55(1), 33-38.
Kasilingam, D. L. (2020). Understanding the attitude and intention to use smartphone chatbot for shopping. Technology in Society, 62, 101280.
Kim, J. W., Jo, H. I., & Lee, B. G. (2019). The study on the factors influencing on the behavioral intention of Chatbot Service for the Financial Sector: focusing on the UTAUT model. Journal of Digital Contents Society, 20, 41–50.
Koć‐Januchta, M. M., Höffler, T. N., Eckhardt, M., & Leutner, D. (2019). Does modality play a role? Visual‐verbal cognitive style and multimedia learning. Journal of Computer Assisted Learning, 35(6), 747-757.
Konečni, V. J., Wanic, R. A., & Brown, A. (2007). Emotional and aesthetic antecedents and consequences of music-induced thrills. The American Journal of Psychology, 120(4), 619-643.
La Guardia, J. G., Ryan, R. M., Couchman, C. E., & Deci, E. L. (2000). Within-person variation in security of attachment: A self-determination theory perspective on attachment, need fulfillment, and well-being. Journal of Personality and Social Psychology, 79, 367-384.
Levie, W. H., & Lentz, R. (1982). Effects of text illustrations: A review of research. Educational Technology Research and Development, 30(4), 195-232.
Locke, E. A., & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist, 57(9), 705–717. DOI: 10.1037/0003-066x.57.9.705.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting & task performance. Upper Saddle River, NJ: Prentice-Hall.
Mai, V., Neef, C., & Richert, A. (2022). Clicking vs. writing: The impact of a chatbot’s interaction method on the working alliance in AI-based coaching. Coaching Theories & Praxis, 8(1), 15-31.
Mayer, R. E. (2017). How can brain research inform academic learning and instruction? Educational Psychology Review, 29(4), 835-846.
Melián-González, S., Gutiérrez-Taño, D., & Bulchand-Gidumal, J. (2021). Predicting the intentions to use chatbots for travel and tourism. Current Issues in Tourism, 24(2), 192-210.
McGuire, W. J. (2012). McGuire’s classic input–output framework for constructing persuasive messages. Public communication campaigns, 133-145.
Moore, G. C., & Benbasat, I. (1991). Development of an instrument to measure the perceptions of adopting an information technology innovation. Information systems research, 2(3), 192-222.
Nikolajeva, M., & Scott, C. (2001). Images of the mind: The depiction of consciousness in picturebooks. CREArTA, 2(1), 12-36.
Paivio, A. (1971). Imagery and language. In Imagery, 7-32. Academic Press.
Paivio, A. (1990). Mental representations: A dual coding approach. Oxford University Press.
Paivio, A. (2014). Mind and its evolution: A dual coding theoretical approach. Psychology Press.
Paivio, A., & Harshman, R. (1983). Factor analysis of a questionnaire on imagery and verbal habits and skills. Canadian Journal of Psychology/Revue canadienne de psychologie, 37(4), 461.
Prywes, Y., & Mah, E. (2019). Seeing Polaris: A Call to Integrate Visual Images into Coaching Action Plans. Philosophy of Coaching: An International Journal, 4(1), 34-56. DOI: 10.22316/poc/04.1.04.
Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications.
Sadoski, M., & Paivio, A. (2012). Imagery and Text: A Dual Coding Theory of Reading and Writing (2nd ed.). Routledge. DOI: 10.4324/9780203801932.
Samburskiy, D. (2020). The Effect of a Dual Coding Technique on Idiom Interpretation in ESL/EFL Learners. International Journal of Instruction, 13(3), 187-206.
Shumanov, M., & Johnson, L. (2021). Making conversations with chatbots more personalized. Computers in Human Behavior, 117, 106627.
Strohmann, T., Siemon, D., Khosrawi-Rad, B., & Robra-Bissantz, S. (2023). Toward a design theory for virtual companionship. Human–Computer Interaction, 38(3-4), 194-234.
Teer Stal, S., Kramer, L. L., Tabak, M., Akker, H., & Hermens, H. (2020). Design features of embodied conversational agents in eHealth: a literature review. International Journal of Human-Computer Studies, 138, 102409. DOI: 10.1016/j.ijhcs.2020.102409.
Terblanche, N. (2020). A design framework to create Artificial Intelligence Coaches. International Journal of Evidence Based Coaching & Mentoring, 18(2).
Terblanche, N., & Kidd, M. (2022). Adoption factors and moderating effects of age and gender that influence the intention to use a non-directive reflective coaching chatbot. SAGE Open, 12(2), 21582440221096136.
Terblanche, N., Molyn, J., de Haan, E., & Nilsson, V. O. (2022). Comparing artificial intelligence and human coaching goal attainment efficacy. Plos one, 17(6), e0270255.
Terblanche, N. H. D., Wallis, G. P., & Kidd, M. (2023). Talk or Text? The Role of Communication Modalities in the Adoption of a Non-directive, Goal-Attainment Coaching Chatbot. Interacting with Computers, iwad039.
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: Toward a unified view. MIS quarterly, 425-478.
Venkatesh, V., Thong, J. Y., & Xu, X. (2012). Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS quarterly, 157-178.
Venkatesh, V., Thong, J. Y., & Xu, X. (2016). Unified theory of acceptance and use of technology: A synthesis and the road ahead. Journal of the association for Information Systems, 17(5), 328-376.
Williams, G. C., & Deci, E. L. (1996). Internalization of biopsychosocial values by medical students: a test of self-determination theory. Journal of personality and social psychology, 70(4), 767.
Yang, K. (2010). Determinants of US consumer mobile shopping services adoption: implications for designing mobile shopping services. Journal of Consumer Marketing, 27(3), 262-270. DOI: 10.1108/07363761011038338.
Zillmann, D. (2006). Exemplification Effects in the Promotion of Safety and Health. Journal of Communication, 56(1), S221–S237.

About the authors

Prof. Nicky Terblanche is an academic, researcher, leadership coach, and entrepreneur. He has a PhD in Leadership Coaching and a master’s degree in electronic and software engineering. He is an Associate Professor of Leadership Coaching and Research Methodology at Stellenbosch Business School, South Africa. His research interests include leadership coaching with a focus on Artificial Intelligence Coaching. He also runs an executive and leadership coaching practice.

Dr. Yaron Prywes is a coaching psychologist with 15+ years of experience, specializing in visual coaching. He is writing the Visual Coaching Handbook to capture creative coaching techniques aligned with 12 schools of psychology.
