Nicky Terblanche ✉ (Stellenbosch Business School, Stellenbosch University, South Africa) Yaron Prywes (Cglobal Consulting)
Artificial intelligence chatbots could scale coaching; however, user adoption remains a challenge. We investigate the effect of images on chatbot adoption and coaching efficacy by comparing a text-only coachbot (TextBot, n=126) with a text+images bot (ImageBot, n=116). We measured goal attainment and technology adoption one week apart, as well as users’ preferences for imagery-based and verbal modes of communication. Perceived goal attainment increased at T2 for both bots. Users for whom “Correct word usage” was important found the TextBot less “fun”. Users with lower “Imagination” also found the TextBot easier to use and were more likely to use it. We introduce the concept of “AI coach customization”.
artificial intelligence chatbots, chatbots, coaching chatbots, AI coaching, technology adoption, UTAUT, visual verbal preference, goal attainment, visual coaching
Accepted for publication: 03 January 2025 Published online: 03 February 2025
© The Author(s). Published by Oxford Brookes University.
The advent of large language models (LLMs) and their use in chatbots such as ChatGPT have propelled forward an already vibrant chatbot scene. Some estimate that there are hundreds of thousands currently active chatbots (Johnson, 2018). Although slow to start, chatbots are becoming more popular in the domain of organizational coaching. Organizational coaching is "a human development process that involves structured, focused interaction and the use of appropriate strategies, tools, and techniques to promote desirable and sustainable change for the benefit of the coachee and potentially for other stakeholders" (Bachkirova, Cox, & Clutterbuck, 2014, p. 1).
Non-directive coaching chatbots differ from traditional service chatbots. Their purpose is not to provide answers to users’ questions or requests, but rather to use questioning and reflective frameworks to facilitate users’ own sense-making (Terblanche & Kidd, 2022). For this type of chatbot to be effective, regular engagement is necessary to continuously promote users’ reflective practice, since reflection is a crucial ingredient in raising self-awareness and personal growth. One mechanism to encourage chatbot usage is to design the chatbot to appeal to users’ perceptions of ease of use and fun (Venkatesh et al., 2016). The way a chatbot interacts with users plays a significant role in the adoption of this technology, and one important aspect of chatbot adoption is its modality of interaction. Interaction modalities refer to the combination of message characteristics and the channel through which they are conveyed (McGuire, 2012).
Despite the existence of general guidelines for designing AI chatbot coaches (for example Strohmann et al., 2023; TerStal et al., 2020; Terblanche, 2020), research on technology adoption factors and coaching chatbot efficacy relating to interaction modalities is still limited. Examples of recent studies that investigated the role of chatbot communication modalities include comparing voice chatbots to text-only chatbots (Terblanche, Wallace & Kidd, 2023) and typing commands versus selecting from a list of options presented to the user (Mai et al., 2022). Both these studies found differences in users’ preferences for chatbot communication modalities.
Visual images have been shown empirically to increase people’s engagement relative to a “words only” approach in a number of coaching-adjacent contexts. In education settings, studies suggest people prefer visuals over a “words alone” approach. For example, audio-visual presentations are rated as more enjoyable and interesting if accompanied by pictures (Levie & Lentz, 1982). In ethnography, photo elicitation – the use of photographs in interviews – has been found to stimulate longer and more comprehensive interviews. Collier, who coined the phrase photo elicitation, noted that using photos while interviewing had a “compelling effect upon the informant … to stimulate and release emotional statements about the informant’s life” (Harper, 2002). Finally, in health education settings, people’s emotional response to pictures has been shown to affect whether they increase or decrease target behaviors. Patients who had a positive emotional response to illustrations were more likely to increase target behaviors (Houts et al., 2006).
To summarize, there are reasons to believe that visual images may positively influence user adoption and engagement with a coaching chatbot. Similar to a presenter giving a talk, an ethnographer conducting an interview, or a doctor educating a patient, the use of a visual image in a coaching chatbot may amplify the impact of the message being sent.
Given this background, the questions that this study asks are: 1) Does adding images to a coaching chatbot increase goal attainment and technology adoption? 2) Do users’ imagination and verbal preference affect goal attainment and adoption when comparing a text-only with a text+image coaching chatbot? Finding answers to these questions may assist with the increased adoption and efficacy of AI chatbot coaching in organizations.
We apply three theoretical lenses to study this phenomenon: dual-code theory, goal theory and technology adoption theory.
Dual-code theory (DCT) (Paivio, 1971; Sadoski & Paivio, 2012) postulates that human cognition involves two distinct systems for processing information, one specializing in verbal information and the other in non-verbal information, particularly imagery. Combining written and visual information has been shown to enhance comprehension, problem-solving, and learning outcomes across different educational domains (Clark & Paivio, 1991; Mayer, 2017). Visual information is also typically perceived as easier, faster, and more enjoyable than written information (Grabe, 2020).
An important nuance to note is that dual-code theory also suggests that not all users will respond to visual information in the same way. Paivio (1990), for example, noted that people have preferred “modes of thinking”, where some individuals prefer an imaginal mode while others prefer a verbal mode. Individuals with a preference for an imaginal mode of thinking often use mental pictures to solve problems. Individuals with a preference for a verbal mode of thinking attest that most of their thinking is verbal, as if they are talking to themselves. Neurophysiological differences between people with different cognitive styles have been detected (e.g., Jawed, Amin, Malik & Faye, 2018; Paivio, 2014), and eye-tracking studies have found different eye-movement patterns between people with higher or lower scores on the visual-verbal dimension (Koc-Januchta et al., 2019).
Goal theory, with its robust foundation in empirical research and practical application, posits that goal setting is inherently motivating, linking the challenge of a goal with performance and the effort required (Locke & Latham, 2002). This theory is underpinned by five key principles for setting goals: specificity and clarity; an appropriate level of difficulty; initial and ongoing commitment; consistent feedback on progress; and manageable complexity (Locke & Latham, 1990). Goals represent the personal desire for specific outcomes (Austin et al., 1996). To improve the likelihood of achieving goals, several strategies are advocated, such as writing down goals, assigning clear metrics and timelines, and committing to these goals in a public manner (Locke & Latham, 2002).
In coaching, goal theory is employed to enhance self-regulation (Grant, 2006). The coaching process involves setting goals, creating and implementing action plans, ongoing monitoring of progress, and adjusting goals or plans as necessary based on feedback and achievements. Previous research on chatbot coaching has shown that a chatbot coach is comparable to a human coach in terms of helping clients with goal attainment (Terblanche et al., 2022). In the Terblanche et al. (2022) study, a goal attainment coaching chatbot helped users achieve their goals over a period of 10 months at a rate similar to people who received human coaching. They also found that goal attainment increases as time passes. A recent pilot study by Isaacson et al. (2024) found that even one coaching session with a goal attainment AI chatbot coach improved goal progress by 10%. Therefore, we expect that, irrespective of whether text or text+imagery is used as the communication modality:
H1a: The goal attainment for TextBot and ImageBot users will both be higher at T2 compared to T1
Using images has been shown to improve comprehension, problem-solving, and learning outcomes (Clark & Paivio, 1991). Therefore, we expect that when adding images to a goal-setting chatbot conversation:
H1b: The ImageBot users will have higher goal attainment compared to TextBot users
Technology adoption is a mature research field that studies the factors that influence the uptake of a technology. Derived from the original Technology Adoption Model (TAM) (Davis et al., 1989), the Unified Theory of Acceptance and Use of Technology (UTAUT) has emerged as the primary framework to study technology adoption (Venkatesh et al., 2003). The UTAUT version 2 considers the role that performance expectancy, effort expectancy, social influence, facilitating conditions, hedonic motivation, price value and habit have on the user’s attitude towards the technology and ultimate motivation to use it (Venkatesh et al., 2012). For the purpose of the present study, we omit price value and habit due to the novelty of AI chatbot coaches.
Performance expectancy is the degree to which an individual believes that using the system will help them perform better (Venkatesh et al., 2003). Performance expectancy has been shown to positively affect technology adoption in service-oriented chatbots (Almahri et al., 2020; Kasilingam, 2020; Kim et al., 2019; Kuberkar & Singhal, 2020; Melian-Gonzalez et al., 2021) as well as coaching chatbots (Terblanche & Kidd, 2022). Adding images to a chatbot should explain the purpose and functionality of the chatbot more clearly by placing emphasis on certain keywords and concepts like “privacy” (Bateman, 2014). We therefore expect that:
H2a: The ImageBot will be perceived to perform better (performance expectancy) than the TextBot
Effort expectancy is the level of ease associated with using a technology (Venkatesh et al., 2003, p. 450). Davis et al. (1989) found that people are more likely to use an application if they perceive it as easy to use. In previous studies, a positive relationship between effort expectancy and the intention to use the technology was found with service-oriented chatbots (Almahri et al., 2020; Kuberkar & Singhal, 2020). It is possible that adding imagery to a chatbot may enhance comprehension of the dialog, allowing users to spend less energy on deciphering instructions. However, in this case, both chatbots contain the same amount of text and the instructions are relatively straightforward. As such, the amount of effort required to interact with the ImageBot may actually be higher, as users need to scroll past numerous embedded images. Therefore, we expect that:
H2b: The TextBot will be perceived to be easier to use (effort expectancy) compared to the ImageBot
Social influence describes the extent to which people think other people believe they should use a technology (Venkatesh et al., 2003) and captures people’s perception of their status within a group (Moore & Benbasat, 1991). Studies have found social influence to be a significant predictor of intention to use technology service chatbots (Kim et al., 2019; Kuberkar & Singhal, 2020; Melian-Gonzalez et al., 2021) and coaching chatbots (Terblanche & Kidd, 2022). According to dual-code theory, there is no reason to believe that images will affect people’s social standing. In addition, none of the images used in the ImageBot represent social situations or authority figures. Therefore, we expect that the:
H2c: ImageBot and TextBot will be rated similarly in terms of social influence.
Facilitating conditions capture the infrastructure needed to use the technology (Venkatesh et al., 2003). This construct has been shown to be an important consideration for service chatbots (Kuberkar & Singhal, 2020) and coaching chatbots (Terblanche & Kidd, 2022). Adding images to a chatbot does not place any additional constraints on the technology infrastructure requirements in the present study. Therefore, we expect that the:
H2d: ImageBot and TextBot will be rated similarly in terms of infrastructure required to use them (facilitating conditions)
Hedonic motivation refers to the level of pleasure or fun derived from using a technology. It has been shown to be a significant factor in various contexts, including predicting consumers’ intention to use mobile shopping services (Brown & Venkatesh, 2005). Visual information is typically considered more enjoyable to interact with relative to a “words-only” communication modality (Grabe, 2020). Therefore, we expect that the:
H2e: ImageBot will be perceived to be more fun to use (hedonic motivation) compared to the TextBot
Users’ attitudes towards a technology have been widely shown to be a predictor of their intention to use the technology in the future (Ajzen, 1991; Melián-González et al, 2021). Attitude captures the user's perception of how engaging, likable and interesting a technology is. Visuals are often experienced as engaging and stimulating (Clark & Paivio, 1991). Therefore, we expect that:
H2f: Users will have a more positive attitude towards the ImageBot than the TextBot
Intention to use a technology captures users’ actual behavioral intent for engaging with the technology on an ongoing basis (Venkatesh et al., 2003). Given that we expect people to find the ImageBot more enjoyable and fun to use, we also expect that users will want to use the ImageBot in the future more so than the TextBot. Therefore, we hypothesize that:
H2g: The intention of users to use the ImageBot on an ongoing basis will be higher than the TextBot.
As noted earlier, dual-code theory (Paivio, 1971; Sadoski & Paivio, 2012) posits that people have different preferences, habits and skills when it comes to processing imagery versus verbal stimuli. These preferences can be measured by the individual differences questionnaire (IDQ) (Paivio & Harshman, 1983), which examines five specific aspects related to visual or verbal processing. The first aspect is “verbal fluency”, which examines the extent to which a user has no difficulty expressing themselves verbally or with the written word. The second is “habitual use of imagery”, which examines the extent to which a user’s thinking consists of mental pictures or images. The third is “correct word usage”, which captures the extent to which users are aware of, and try to adhere to, proper grammar. The fourth is “imagination”, which describes the extent to which people’s daydreams or night dreams are vivid. Finally, “reading difficulties” is the extent to which people feel they are hampered in their ability to read.
In a study that used the IDQ, it was found that multimedia presentation aids are more effective for learners who prefer visual information (Butler & Mautz, 1996). By definition, people with higher levels of visual preference prefer an interaction that contains images. As noted earlier, visual information is also typically perceived as easier, faster, and more enjoyable than written information (Grabe, 2020). As such, it is important to examine potential individual differences in processing imagery versus verbal information.
In the present study, the TextBot contained no images. The information and interactions in this condition were purely text-based. The ImageBot, on the other hand, included 12 images that complemented the written word. This inclusion of images while maintaining textual consistency across the two conditions should make it possible to detect a potential interaction between a user’s individual preference for a particular communication mode and technology adoption. Specifically, people who prefer visual modes of communication should favor the ImageBot and vice versa. Therefore, we hypothesize that:
H3: Individual differences in visual and verbal processing will influence technology adoption (UTAUT) such that people with a stronger preference for visual processing will favor the ImageBot over the TextBot.
This study followed a between-group experimental design with two groups of participants (n=242) interacting with two equivalent coaching chatbots, one using only text (TextBot, n=126) and the other with images added (ImageBot, n=116). Participants completed a demographic survey as well as measures on technology adoption and goal attainment after an initial engagement with the chatbots. Technology adoption and goal attainment were measured again one week later. Individual participant differences were also measured in terms of imagery and verbal habits and skills.
Two coaching chatbots were created with identical conversation scripts based on the well-known GROW coaching model (Grant, 2011). For the second chatbot, images were added in certain parts of the conversation to illustrate the text. Figure 1 displays side-by-side screenshots to show the reader how images were embedded in the coaching conversation. The chatbot with images is referred to as “ImageBot”, while the chatbot without images is referred to as “TextBot”.
Both chatbots take the user through the same structured, scripted conversation flow. The interaction begins with the chatbot introducing itself, explaining its capabilities and aim (to help solve a workplace challenge). It then proceeds to ask users questions following the GROW model:
Again, the users’ interaction with the two chatbots was identical except for the added images in the ImageBot.
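The structured, scripted flow described above can be sketched as a simple stage-by-stage loop. This is only an illustrative sketch: the question wording and stage labels below are hypothetical and are not the study’s actual GROW script.

```python
# Illustrative sketch of a scripted GROW conversation flow.
# The questions are hypothetical, not the chatbots' actual script.
GROW_SCRIPT = [
    ("Goal", "What workplace challenge would you like to work on?"),
    ("Reality", "What is currently happening with this challenge?"),
    ("Options", "What options could you try?"),
    ("Way forward", "What will you do, and by when?"),
]

def run_session(answer_fn):
    """Walk the user through each GROW stage in order, collecting
    (stage, question, answer) tuples into a transcript."""
    transcript = []
    for stage, question in GROW_SCRIPT:
        transcript.append((stage, question, answer_fn(question)))
    return transcript
```

Because both bots share one script, the ImageBot condition only differs in that selected questions are preceded by an embedded image.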
While research has shown that visually illustrating a particular message can enhance the impact of that message (e.g., Zillmann, 2006), currently there are no overarching criteria or methods to guide the use of images in research. Some studies use pre-existing images based on the researcher’s own subjective choice (e.g., Konečni et al., 2007; Samburskiy, 2020). Others rely on participants to generate images (e.g., Clark-Ibanez, 2004), or on researchers to generate images (e.g., Loewenthal, 2020). Yet others strive to combine researcher and participant input. For example, Hur et al. (2020) asked participants to bring in photographs related to two keywords (i.e., “fearful” or “happy” photographs), and then filtered the photographs based on the aims of their study. In a somewhat similar fashion, the present study aimed to leverage both researcher input and lay perspectives.
The following procedure was used to select this study’s images for the ImageBot. First, it was determined that the images utilized should be complementary to the text in the chatbot script, as opposed to having a contradictory or independent relationship to the text (Bateman, 2014). In other words, the goal of the images chosen was to support and potentially amplify the message conveyed in words by the chatbot, as opposed to undercutting or deviating from the text message.
Second, the script of the chatbot utilized in this study was examined for “keywords”. Keywords are words or phrases that are central to deciphering the meaning of a sentence. For example, in the sentence “I am a chatbot coach”, the word “chatbot” is a keyword. Twelve such keywords were identified. Each keyword was entered into the search feature of a database of freely usable images called Unsplash.com. The website filters relevant results from the approximately 3 million high-resolution images stored in its database. The researchers then selected four images per keyword from the presented results using popularity (i.e., the most combined views and downloads) and the notion of compatibility to the chatbot text as criteria.
Next, a crowdsourcing platform (Prolific) was used to ask 100 participants to rank the four images associated with each keyword. Participants were asked to share which image in their view best matched the keyword. Participants ranked the four images, with 1 being their favorite (best match) and 4 their least favorite (worst match). The highest-ranking image per keyword was used in the ImageBot. In total, 12 images were inserted in the chatbot conversation. Figure 2 displays six user-selected images with associated text.
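The rank-aggregation step above amounts to picking, per keyword, the image with the lowest mean rank across participants. A minimal sketch (the rank data below is illustrative, not the study’s actual responses):

```python
from statistics import mean

# Illustrative data: for one hypothetical keyword, each participant
# ranks four candidate images from 1 (best match) to 4 (worst match).
rankings = {
    "chatbot": [
        [1, 3, 2, 4],  # participant 1's ranks for images A-D
        [2, 1, 3, 4],  # participant 2
        [1, 2, 4, 3],  # participant 3
    ],
}

def best_image(ranks_per_participant):
    """Return the index of the image with the lowest mean rank."""
    n_images = len(ranks_per_participant[0])
    mean_ranks = [
        mean(p[i] for p in ranks_per_participant) for i in range(n_images)
    ]
    return mean_ranks.index(min(mean_ranks))

for keyword, ranks in rankings.items():
    print(keyword, "-> image index", best_image(ranks))  # chatbot -> image index 0
```

With 12 keywords, this yields the 12 winning images embedded in the ImageBot conversation.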
Participants were sourced from a crowdsourcing platform called Prolific. The use of crowdsourcing platforms has become a popular way to collect data (Buhrmester et al., 2018), and Prolific is considered one of the most reliable platforms currently available (Douglas et al., 2023).
This study utilized a between-group design where different groups of participants used the TextBot and ImageBot. In total, 249 participants signed up for the study on a first-come, first-served basis. The gender split was 124 female and 124 male, with one participant selecting “Neither”. In the end, 126 participants completed the TextBot conversation and 116 the ImageBot conversation. TextBot participants were recruited first and excluded from participation in the ImageBot experiment. The mean age was 40 (sd=10.29), with most respondents falling into the 25 to 44 age categories (67%). Participants were paid $6 to participate in this study. Informed consent was obtained from all participants before data collection started. The study was approved by the first author’s research institution.
Participants completed an initial survey capturing demographics and individual preferences in terms of imagery and verbal habits and skills. After having one conversation with the chatbot, they completed a survey measuring technology adoption and goal-attainment (T1). These constructs were measured again one week later (T2). Participants did not interact with the chatbots between T1 and T2.
Demographic questions assessed age, gender, industry employed in, and current job level. To measure individual preferences in terms of imagery and verbal habits and skills, a questionnaire based on Paivio’s Individual Differences Questionnaire (IDQ) (Paivio & Harshman, 1983), as compiled by Kardash et al. (1986), was used. This IDQ survey consisted of 33 questions that measured five constructs: Verbal fluency (14 questions), Habitual use of imagery (four questions), Correct word usage (eight questions), Imagination (three questions) and Reading difficulties (four questions).
To measure technology adoption, we used a 33-item adapted version of the UTAUT2 survey (Venkatesh et al., 2012) that included measures for performance expectancy - PE (five questions), effort expectancy - EE (six questions), social influence - SI (four questions), facilitating conditions - FC (six questions), hedonic motivation - HE (three questions), attitude towards the chatbot - AT (five questions) and intention to use the chatbot in future - BI (four questions). Finally, goal attainment was measured using the perceived competence measure (four items) adapted from Williams and Deci (1996).
All measurements used a five-point Likert scale with 1 being Strongly Disagree and 5 Strongly Agree.
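Construct scores for such Likert-based surveys are typically computed as the mean of each construct’s items; a minimal sketch (the responses shown are illustrative, not actual data):

```python
def construct_score(item_responses):
    """Score a construct as the mean of its five-point Likert items
    (1 = Strongly Disagree, 5 = Strongly Agree)."""
    return sum(item_responses) / len(item_responses)

# e.g. a hypothetical response to the four perceived-competence
# (goal attainment) items
print(construct_score([4, 5, 3, 4]))  # -> 4.0
```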
A two-way analysis of variance (ANOVA) was conducted to address this study’s hypotheses. The two chatbots and the five constructs of the imagery/verbal IDQ were entered as independent variables. The dependent variables were the seven constructs of technology adoption (UTAUT) and goal attainment. To compare the TextBot versus the ImageBot conditions, the main effect was reported. The effects of the five IDQ constructs were investigated by focusing on the interaction effect between chatbot type and each IDQ construct. When the interaction was significant, it implied that the particular IDQ construct influenced users’ experience, and that a difference existed between the two chatbot conditions.
Effect sizes were reported using Cohen’s d. Normality assumptions were assessed by inspecting normal probability plots, and these were all found to be acceptable.
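For two independent groups (here, TextBot versus ImageBot construct scores), Cohen’s d can be computed with a pooled standard deviation. A minimal stdlib sketch with illustrative data (the exact pooling used by the study’s statistics package may differ):

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled
    sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Illustrative construct scores for two groups
print(cohens_d([1, 2, 3], [2, 3, 4]))  # -> -1.0
```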
This study’s first research question asked, “Does adding images to a coaching chatbot increase goal attainment and technology adoption?”. Results for the first hypothesis (that goal attainment for TextBot and ImageBot users will be higher at T2 compared to T1) indicate that for both chatbots, a significant increase was detected in users’ perceived goal attainment one week after using the chatbots (p=0.02). Figure 3 illustrates this result. Therefore, we accept H1a.
Table 1 below compares the results of users’ perceptions of interacting with the TextBot versus the ImageBot. It summarizes the means, p-value and effect size related to seven technology adoption factors and goal attainment. Overall, no statistically significant difference was detected between the two bots on any of the dependent variables. Therefore, we reject H1b (goal attainment), H2a (performance expectancy), H2b (effort expectancy), H2e (hedonic motivation), H2f (attitude) and H2g (behavioral intent). Regarding H2c (social influence) and H2d (facilitating conditions) we expected no difference between the bots and no difference was detected. Therefore, we accept these two hypotheses.
The second research question asked, “Do users’ imagination and verbal preference affect goal attainment and adoption when comparing a text-only with a text+image coaching chatbot?”. A number of hypotheses examined individual differences between users in terms of their imagination and verbal abilities, and a potential corresponding preference for a particular chatbot. Results indicate that most of the IDQ constructs had no significant effect on UTAUT measures. That being said, evidence was detected suggesting that for two of the five text and image-based individual predispositions (“Imagination” and “Correct word usage”), certain UTAUT constructs were in fact affected.
For the IDQ construct of “Imagination”, users with lower “Imagination” found that the TextBot required less effort (effort expectancy) than the ImageBot (p=0.05) (Figure 4). Similarly, users with lower “Imagination” were more likely to use the TextBot compared to the ImageBot (behavioral intent, p=0.06) (Figure 5). However, these effects disappear with higher levels of “Imagination”. This points to the possibility that adding images to the chatbot conversation is a hindrance to people with lower levels of imagination.
For the IDQ construct of “Correct word usage”, results show that if “Correct word usage” is important, there is a tendency for users to experience the TextBot as less “fun” (Hedonic motivation) to use (p=0.07) (Figure 6). These three results suggest limited support for H3, as only three UTAUT constructs were affected by users’ image and verbal preferences.
A summary of results for all this study’s hypotheses are displayed in Table 2.
This study asked two research questions: 1) Does adding images to a coaching chatbot increase goal attainment and technology adoption? 2) Do users’ imagination and verbal preference affect goal attainment and adoption when comparing a text-only with a text+image coaching chatbot?
For research question one, the overall results suggest no significant difference exists between the TextBot and ImageBot in terms of their impact on user goal attainment and technology adoption. This is somewhat surprising given dual-code theory and the literature on visuals cited above. To state the obvious, we had anticipated that ImageBot users would perceive both higher goal attainment and technology adoption relative to TextBot users. One possible explanation for these results is that the images used during the coaching conversation were not chosen by the chatbot users. They were in fact chosen by a separate sample of participants, as detailed in the methods section. Furthermore, the images utilized illustrated a generic goal setting process, rather than the actual goals users had in mind. If the ImageBot had asked participants to generate an image or set of images related to their goal, would this have had a significant impact on their user experience? It seems possible that if users had input into the visual dimension of their experience, it may have enhanced their engagement, and ultimately their degree of goal attainment and technology adoption.
Regarding technology adoption, we expected five of the seven UTAUT constructs to be higher for the ImageBot, given that studies in other fields such as healthcare education found that the use of images increased desired behavior (Houts et al., 2006). However, this was not the case in this study. One possible explanation is that the images used merely illustrated the conversation text (see Figure 2) and were not used or needed to explain or simplify complex concepts. The concepts covered in the coaching conversation were straightforward and easy to understand without the use of images, potentially negating the explanatory impact that visuals can have. As noted above, another factor that may have contributed to the lack of superior efficacy of the ImageBot is that the images were pre-selected. Users had no control over customizing the images in their chatbot experience. In the field of media studies, it is well established that people can interpret and relate to the same image differently. As such, even though the images in this study were selected using an objective, democratic process, it is possible that the images ultimately did not appeal to everyone, or were in fact disliked by some users.
One clear result from this study is that irrespective of chatbot type (text or images+text), both chatbots led to an improved perception of goal attainment one week after using the chatbot. This result adds to a growing number of studies that show the remarkable efficacy of chatbot coaches to assist with goal attainment (see for example Terblanche et al., 2022). Goal attainment is considered by many as the essence of organizational coaching (e.g., Grant, 2006) and this result adds to the evidence that even relatively simple coaching chatbots can be effective in promoting individual goal attainment.
Research question two looked at the data from an individual differences perspective, namely that differences exist in users’ preferences for imaginal and verbal modes of communication. Certain differences were observed in this study between TextBot and ImageBot users. Users who perceive themselves as having low “Imagination” found the ImageBot more difficult to use (effort expectancy), and their intention to use the ImageBot was lower than that of TextBot users. It seems that adding images may have distracted these users and added cognitive load to their processing of the coaching conversation text. For users with higher levels of “Imagination” this effect disappeared, suggesting that users with higher imagination may in fact find the ImageBot easier to use compared to the TextBot. For users who were particular about “Correct word usage”, the fun aspect of using the TextBot (hedonic motivation) decreased with an increase in “Correct word usage”. On the other hand, for ImageBot users the level of correct word usage did not play a role in their experience of “fun” in interacting with the ImageBot. This result suggests that adding images to a chatbot coaching conversation may, to a certain extent, neutralize the irritation experienced by users with a high need for “correct word usage”.
An important concept that emerged from this study is that different coaching chatbot modalities may appeal to different types of individual users. In the present study this effect is evident in users with varying levels of “Imagination” and need for “Correct word usage”. This finding adds to initial evidence from other studies on the positive effect on chatbot adoption and efficacy when chatbots are customized to meet individual user needs, such as extraversion (Shumanov & Johnson, 2021; Terblanche et al, 2023) and preferences relating to clicking on predefined options versus typing instructions into a chatbot (Mai & Richert, 2022). This emerging phenomenon suggests that coaching chatbots could be tailored to suit individual preferences. Customization of coaching chatbots could help address the current challenge in AI coaching uptake and perceived efficacy.
This study sought to examine the impact that “illustrating” text messages conveyed by a coaching chatbot may have on users’ experience. Visual scholars (e.g., Bateman, 2014; Nikolajeva & Scott, 2001) note that an image can amplify or reduce the message conveyed by text. However, no overarching criteria or guidelines exist to guide researchers in the selection and potential integration of visuals with text. This study utilized a method whereby an initial batch of images was selected by researchers who entered keywords into a free image bank. This batch was narrowed down using a pool of 100 lay participants, who ultimately selected the final images presented to ImageBot users. This method led to mixed results. As such, future research may consider having end-users generate images for themselves. Such processes are more “client-centric” and have yielded anecdotal success in coaching contexts. For example, Prywes and Mah (2019) encourage clients to choose inspirational images to help them implement their custom coaching action plan.
Future studies may also explore chatbots that give “equal weight” to both image and text, as is done in comic books and illustrated books. A more visual-centric experience, rather than a word-centric experience, may result in higher user engagement in certain populations.
An additional limitation of this study is that some of the keywords utilized in the chatbot script may have been more imageable than others. According to dual-coding theory, both concrete and abstract words are represented in the verbal system, whereas the imaginal system links only to concrete words (i.e., words that are readily imageable). If some keywords in the chatbot script were indeed more imageable than others, this may have affected the images chosen and, ultimately, participant engagement. Future research should investigate this effect by accounting for the imageability of keywords.
Finally, it is worth noting how imagery is commonly used today in the realm of personal fitness apps. For example, the standard Health app on recent iPhones utilizes a dynamic visual of a “circle” to encourage people to complete their “step loop” (i.e., achieve their daily target of steps). Future research may therefore also investigate the potential role of dynamic images and visual aids in motivating progress.
This study set out to investigate the effect of adding images to a coaching chatbot on users’ goal attainment and their willingness to engage with the chatbot. The results indicate that while the addition of images did not make a significant difference at the individual user level, some differences did materialize between the experiences of ImageBot and TextBot users. These differences suggest that coaching chatbots can be customized to appeal to different types of users, thereby potentially improving the adoption rate and efficacy of AI chatbot coaching. This study thus contributes to the important, emerging field of AI chatbot coaching by introducing the concept of AI coach customization, as well as by confirming the goal attainment abilities of AI coaching. As AI coaching adoption increases, human coaches should take note of these findings. There is compelling evidence that AI chatbot coaching improves goal attainment and can therefore complement and enhance human coaching. Human coaches should also be aware that not all chatbot coaches are alike, and they should take care to match the most appropriate chatbot with their specific clients’ preferences.
Prof. Nicky Terblanche is an academic, researcher, leadership coach, and entrepreneur. He has a PhD in Leadership Coaching and a master’s degree in electronic and software engineering. He is an Associate Professor of Leadership Coaching and Research Methodology at Stellenbosch Business School, South Africa. His research interests include leadership coaching with a focus on Artificial Intelligence Coaching. He also runs an executive and leadership coaching practice.
Dr. Yaron Prywes is a coaching psychologist with 15+ years of experience, specializing in visual coaching. He is writing the Visual Coaching Handbook to capture creative coaching techniques aligned with 12 schools of psychology.