Nicky Terblanche (University of Stellenbosch Business School), Joanna Molyn (Oxford Brookes University), Erik De Haan (VU University, Amsterdam), Victor O. Nilsson (Hult International Business School)
There is limited empirical efficacy evidence on the confluence of artificial intelligence (AI) and organisational and life coaching. Coaching “works” but is often unavailable or unaffordable. AI could scale coaching to reach a wider audience; however, we do not yet know how well AI coaching “works”. This replication randomised controlled trial longitudinal study tested the efficacy of a chatbot AI coach called Vici. An experimental group (n=75) used Vici for six months. Eight measurements on goal attainment, resilience, psychological wellbeing, and perceived stress were collected from the experimental and control group (n=94). Data were collected at baseline, after each of the six chatbot usage months, and three months later. The experimental group showed a statistically significant increase in goal attainment, while all other measures yielded non-significant results. Using AI, goal and control theories, we interpret these results to indicate that AI coaching is effective in a narrow application, suggesting that AI could democratise coaching in a cost-effective, scalable manner.
artificial intelligence coaching, AI coaching, chatbot coach, goal attainment, psychological wellbeing, resilience, perceived stress, positive psychology
Accepted for publication: 14 July 2022 Published online: 01 August 2022
© the Author(s) Published by Oxford Brookes University
Internationally, coaching is a fast-growing multi-billion dollar industry (International Coaching Federation [ICF], 2020a) with wide-ranging, proven benefits to individuals (Athanasopoulou & Dopson, 2018; Blackman, Moscardo and Gray, 2016; Grover & Furnham, 2016). However, coaching is not widely available (Shoukry & Cox, 2018) or affordable (Terblanche, Passmore and Myburgh, 2021) in all societies. Artificial intelligence (AI), able to replace certain human expertise at a vastly reduced cost, could be a potential game changer for coaching (Acemoglu & Restrepo, 2018).
AI as a concept originated in the 1950s and has seen a number of surges and declines (Haenlein & Kaplan, 2019). However, in recent years there has been renewed interest in the application of AI in numerous contexts including helping professions such as psychology, healthcare (Kamphorst, 2017) and, more recently, coaching.
Coaching is defined as a one-on-one structured conversation between a coach and client with the aim of facilitating sustainable change for the individual and potentially other stakeholders (Bachkirova, Cox and Clutterbuck, 2014). Research in coaching has grown substantially (De Haan, 2021; De Haan & Nilsson, in press). Although it is recognized that proving coaching efficacy is difficult and expensive research, as is the case for therapy, this type of research is imperative to verify the claims of practitioners and to understand the factors, processes and mechanisms underlying the interventions (Grover & Furnham, 2016). Examples of recent, well designed efficacy studies include Jones, Woods and Zhou (2021) who found that individual characteristics influence coaching efficacy, and Fontes and Dello Russo (2021) who found that coaching was associated with increases in psychological capital, job attitudes and aspects of job performance. In addition, there are a number of meta-studies that provide compelling evidence that coaching has positive outcomes for the individuals and their organisations (Athanasopoulou & Dopson, 2018; Blackman et al., 2016; De Haan, 2021; Grover & Furnham, 2016; Jones, Woods and Guillaume, 2016; Theeboom, Beersma and van Vianen, 2014).
There appears to be broad agreement on coaching efficacy. Evidence of AI efficacy in the coaching domain however is sparse. There are studies on virtual and e-coaching (see for example Geissler, Hasenbein, Kanatouri and Wegener, 2014), but these studies involve human coaches and cannot be considered AI coaching (Terblanche, 2020). In the coaching domain AI research seems to be limited to conceptual papers (Graßmann & Schermuly, 2020; Terblanche, 2020), technology adoption studies (Terblanche & Cilliers, 2020), or the use of AI entities that employ pure psychological approaches (Ellis-Brush, 2021). Studies where human coaches are replaced entirely by AI are more common in the related fields of psychology and healthcare and will be discussed later in this paper (see Gaffney, Mansell & Tai 2019; Lattie, Adkins, Winquist, Stiles-Shields, Wafford and Graham, 2019).
Although coaching has been shown to be effective in developing people, the use of coaching especially in organisational settings is still relatively limited as observed by Shoukry and Cox (2018). Cost is partly to blame as Terblanche et al. (2021), for example, found that the average fee for executive coaching in the USA is 300 USD per hour and even in lower income countries like South Africa, the average rate is around 100 USD per hour. Employing AI coaching could significantly reduce the cost of coaching and democratise this service, allowing people, who would not normally be able to afford it, to benefit. This is especially true in the developing world. Clearly AI coaching holds potential to expand the benefits of coaching beyond the current limited reach; however, the question remains whether AI coaching “works”?
To address this gap in knowledge and to take a first step towards assessing AI coaching efficacy, we used a goal-attainment coaching chatbot called Vici to replicate a previous human-coach randomised controlled trial (RCT) longitudinal study currently undergoing peer review. The original study, situated in positive psychology, cognitive behavioural and goal attainment theories found that human coaches were able to improve goal attainment, psychological wellbeing, resilience and reduce stress. Our study aimed to investigate to what extent an AI coach could also improve these aspects in order to address the important issue of the accessibility of coaching.
The literature review that follows has three objectives. Firstly, we justify why we chose to measure goal attainment, psychological wellbeing, resilience, and stress. Secondly, we explore the theoretical foundations of coaching efficacy in order to understand why AI coaching may be effective. Thirdly, we present empirical evidence from recent research on the current abilities of AI to facilitate desired coaching outcomes. This overview forms the basis for interpreting the results of the present study.
The continued growth of the coaching industry precipitates the effort by researchers to study and understand the coaching phenomenon. The increase in coaching research is evident from several recent meta-studies that have classified and examined coaching outcomes. As far back as 2008, the overview by Kombarakaran, Yang, Baker and Fernandes (2008) found that coaching helps to improve people management, relationships with managers, goal setting and prioritisation, engagement and productivity, and dialogue and communication. Theeboom et al.’s (2014) meta-study found that coaching has a significantly positive effect on five individual-level categories: performance and skills; wellbeing; coping; work attitudes; and goal-directed self-regulation. In their meta-study, Blackman et al. (2016, p. 469) found a host of positive outcomes including: improved work/life balance; psychological and social competencies; self-awareness and assertiveness, increasing confidence; developing relationships, networks and interpersonal skills; adapting to change more effectively; helping to set and achieve goals; role clarity; and changing behaviors.
From these meta-studies, it is apparent that while coaching clearly has positive outcomes, there is little consensus in the literature regarding the most appropriate criteria for classifying and evaluating coaching outcomes (De Haan & Nilsson, in press; Grant, Passmore, Cavanagh and Parker, 2010; MacKie, 2007; Smither, 2011). To address this issue, Jones et al. (2016) proposed a three-component classification framework for coaching outcomes comprising cognitive, skills-based and affective categories. They stated that cognitive outcomes are typically guided by goal-setting and that many of the coaching outcomes are affective in nature, including development of self-efficacy and confidence, reduced stress, and increased satisfaction and motivation. In line with this framework, the meta-study by Athanasopoulou and Dopson (2018) found that positive outcomes for the coachee include personal development specifically related to reduced stress/anxiety, increased work/life balance (wellness), improved resilience and improved goal-setting.
The choice of measurement constructs for the present study (goal attainment, psychological wellbeing, resilience, perceived stress) is therefore validated by the coaching outcomes framework of Jones et al. (2016) and the findings of Athanasopoulou and Dopson (2018). The constructs are also supported by other studies that found coaching to increase goal attainment (Diller, Muehlberger, Braumandl and Jonas, 2020; Grant, Curtayne and Burton, 2009; Spence & Grant, 2007), wellbeing (Duijts, Kant, van den Brandt and Swaen, 2007; Govindji & Linley, 2007; Grant et al., 2009; Spence & Grant, 2007; Theeboom et al., 2014), resilience (Grant et al., 2009; Green, Grant and Rynsaardt, 2020), and to reduce stress (Grant et al., 2009; Junker, Pömmer and Traut-Mattausch, 2020). The measurement constructs for the present study were also selected in order to compare the results directly with a similar study involving human coaches currently undergoing peer review.
As an emerging, multi-disciplinary field of research and practice, coaching has numerous theoretical underpinnings that explain its efficacy (Grant, 2014; Shoukry & Cox, 2018). The main theories informing the present study are goal and control theory. Goal attainment is considered the hallmark of coaching that sets it apart from other helping professions (Grant, 2012; 2014). Goal theory is an overall approach to motivation that emphasises the need to establish goals as intrinsic motivation. A relationship exists between goal difficulty, level of performance, and effort involved (Locke & Latham, 2002). Locke and Latham (1990) state five principles of goal setting: clarity (specific and clear), challenge (sufficiently difficult), commitment (buy-in from onset), feedback (regular stock-taking on progress), and task complexity. In essence, goals are ‘internal representations of desired states or outcomes’ (Austin & Vancouver, 1996, p. 388). Goal theory is relevant to coaching as a mechanism to facilitate self-regulation (Grant, 2006). Grant (2006) holds that coaching is at its core a process of goal-focused self-regulation where individuals set goals, develop and execute action plans, monitor progress and change either goals or action plans based on feedback and progress (Carver & Scheier, 1998). There are various types of goals, for example proximal (short-term) and distal (long-term) goals (Grant, 2006). Control theory complements goal theory and suggests that the process of monitoring influences goal progress. Monitoring helps to translate goals into actions, and actions lead to progress (Harkin, Webb, Chang, Prestwich, Conner, Kellar, Benn and Sheeran, 2016).
Artificial intelligence (AI) has gained renewed prominence in recent years, likely as a result of the announcement of the arrival of the Fourth Industrial Revolution in 2016 (Schwab, 2017). AI is considered by some as one of the most significant developments in human history with the potential to disrupt all aspects of human life (Acemoglu & Restrepo, 2018). AI is defined as “the broad collection of technologies, such as computer vision, language processing, robotics, robotic process automation and virtual agents that are able to mimic cognitive human functions” (Bughin & Hazan, 2017, p. 4).
A distinction can be made between artificial narrow intelligence (ANI), which refers to systems that can perform only a very specific task in a narrow context; artificial general intelligence (AGI), which refers to systems that are at least as intelligent as humans and can apply their learning in different contexts; and artificial super intelligence (ASI), which refers to systems that can outperform humans in most dimensions (Bostrom, 2014; Shanahan, 2015; Siau & Yang, 2017). AGI and especially ASI are currently still in their infancy with no clear indication of reaching maturity in the foreseeable future. ANI, however, is showing steady signs of progress with encouraging results in specific applications, such as speech recognition and self-driving cars (Panetta, 2018). Expert systems, relevant to the present study are considered a form of ANI and are defined as complex software programmes based on specialised knowledge, able to provide acceptable solutions to individual problems in a narrow topic area (Chen, Hsu, Liu and Yang, 2012; Telang, Kalia, Vukovic, Pandita and Singh, 2018).
Given the current limitations on the general intelligence of AI, ANI expert systems should be employed when creating AI coaches (Terblanche, 2020). At present, chatbots (an instance of ANI) in particular are a promising technological platform for creating AI coaches (Terblanche, 2020). Chatbots are defined as computer programmes that interact with users via natural language either through text, voice, or both (Chung & Park, 2019). Chatbots can be embodied (have a physical presence) or disembodied (no physical presence) (Araujo, 2018). Considerations of the level of human-likeness (anthropomorphism) of an AI entity are important as it has been shown to influence aspects such as resilience (De Visser, Monfort, McKendrick, Smith, McKnight, Krueger and Parasuraman, 2016) and even gambling frequency (Riva, Saachi and Brambilla, 2015).
Research into the efficacy of AI in the coaching domain as defined in the present study is sparse. To gain insight into the potential efficacy of AI coaching, we turn to the related fields of psychology and healthcare where AI has been studied for longer. We start by reviewing two specific examples of AI application, one in psychology and one in healthcare before reviewing two relevant meta-studies.
A study by Fulmer, Joerin, Gentile, Lakerink and Rauws (2018) investigated the efficacy of a cognitive behavioral therapy (CBT) based AI agent to reduce self-identified symptoms of depression and anxiety in college students. They found that the AI reduced symptoms of depression (after two weeks) and anxiety (after four weeks) in the user group relative to a control group. They conclude that AI can serve as a cost-effective and accessible therapeutic agent. In an RCT study from the healthcare domain, Greer, Ramo, Chang, Fu, Moskowitz and Haritatos (2019) investigated the feasibility of improving key psychosocial well-being outcomes in young adults treated for cancer by delivering positive psychology skills via a chatbot. They found that after four weeks the experimental group had reduced anxiety compared to the control group; however, there were no significant changes in depression or in positive or negative emotions.
Two meta-studies provide a general overview of AI efficacy in psychology. The first is a meta-study by Lattie et al. (2019) on digital mental health interventions for depression, anxiety and psychological wellbeing among college students. They found that the majority of the 89 programs evaluated were either effective or partially effective in producing beneficial psychological changes; however, they conclude that the research quality was generally problematic. The second meta-study, by Gaffney, Mansell and Tai (2019), looked at the ability of conversational agents to treat mental health problems. They found that all 13 studies reviewed (eight of which were RCTs) reported some reductions in psychological distress. Five of the RCT studies demonstrated significant reductions in psychological distress compared with control groups, while three RCT studies failed to demonstrate superior effects. They conclude that efficacy and acceptability of conversational agent interventions for mental health problems are promising, but that more robust experimental designs are required to demonstrate efficacy and efficiency. It appears therefore that AI can be effective in treating psychological and health-related issues, suggesting that AI could potentially also be effective in the coaching domain.
Given the preceding overview and discussion on coaching efficacy, general AI capabilities and current AI efficacy in fields related to coaching, we now summarise our position and state hypotheses for the present study.
We aimed to investigate the ability of an AI coach, Vici, to improve the goal attainment, psychological wellbeing and resilience of users and to reduce their perceived stress. The review has shown that AI agents are able to improve psychological and health-related outcomes relative to a control group. We therefore expect that our AI coach, Vici, may have some efficacy. A limitation of Vici is that it was designed based on expert system and artificial narrow intelligence principles whereby its functionality is limited to a specific objective, that of assisting with goal attainment. It is therefore reasonable to expect that:
H1: Using the AI Coach will provide a statistically significant improvement in goal-attainment for the experimental group relative to the control group.
A number of studies from the psychological and healthcare domains have demonstrated the ability of AI to improve psychological wellbeing and resilience and to reduce stress. The AI entities used in these studies were specifically designed to incorporate CBT practice, known to positively affect a range of psychological outcomes. Although Vici was not designed to improve psychological state, there is a positive link between goal attainment and wellbeing. Research has shown that the feedback provided to people through achieving their goals could help them to overcome obstacles and ultimately could lead to enhanced wellbeing (Koestner, Lekes, Powers and Chicoine, 2002; Niemiec, Ryan and Deci, 2009; Sonnentag, 2002). Therefore, even though Vici only assists with goal attainment, we still expect that:
H2: Using the AI Coach will provide a statistically significant improvement in psychological wellbeing for the experimental group relative to the control group.
In terms of the final two constructs, resilience and perceived stress, we do not expect Vici to be of assistance because it was not designed to improve these facets and there is no clear link between goal attainment (Vici’s primary focus) and either resilience or perceived stress. We included these measures since our research is a replication study of a human coaching efficacy study that measured these aspects. We therefore state our final two hypotheses as follows:
H3: Using the AI Coach will not provide a statistically significant improvement in resilience for the experimental group relative to the control group.
H4: Using the AI Coach will not provide a statistically significant decrease in perceived stress for the experimental group relative to the control group.
An email invitation to participate in this research was sent to undergraduate students studying at a business school in the United Kingdom. Their fields of study included business management, economics, marketing, tourism, events management and logistics. They were offered employability-scheme credits to encourage participation. In total, 268 students responded to the first baseline questionnaire and were then randomly allocated to two groups: an experimental group (n=134) receiving AI coaching and a non-intervention control group (n=134). After the initial baseline survey, 97 participants who had been allocated to the experimental group engaged with the chatbot coach and responded to the second survey, while 121 participants in the control group responded to the second survey. A further 22 participants in the experimental group and 27 in the control group dropped out during the period of data collection, leaving a final sample of 169 participants (56% females) with 75 in the experimental group and 94 in the control group. Their average age was 22 years (sd = 4.96) and the two groups did not significantly differ in either gender distribution or age distribution.
In order to capture a placebo effect a study design requires subjects to not be aware whether they receive treatment or not (Wampold & Imel, 2015). In that sense we were unable to investigate the placebo effect in our study as coached students were given access to Vici almost immediately after the start of the experiment. The students in the control group were also aware of not having access to the AI coach. To address outcomes expectancy (Colagiuiri & Smith, 2012) the control group received a fact sheet that provided information about goal attainment, psychological wellbeing, resilience and perceived stress. They were also asked to think of and specify goals they wanted to achieve over the next 10 months.
In terms of sample size, we performed a power analysis based on a Mixed Factorial MANOVA over eight time points between two groups using G*Power 3.1.9.7 (Faul, Erdfelder, Lang & Buchner, 2007). A meta-analysis indicated that coaching has a lower effect on variables such as psychological well-being compared to goal attainment (de Haan & Nilsson, 2021, unpublished). Thus, to be able to identify an effect on all measured scales, the power analysis was based on results from a previous psychological well-being study rather than previous goal-attainment effects. Previous research on the impact of coaching on psychological wellbeing showed an effect of ηp² = .033 (de Haan, Gray & Bonnywell, 2019), which was used as a baseline for the analysis. The analysis indicated that a total sample size of 126 participants is required to identify the given effect size with a power of .80 and an alpha level of .05. The actual sample size of this project (N = 169) provided a 90% probability of detecting the effect size, which suggests that the sample size in this study is appropriate for testing the hypotheses.
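Power tools such as G*Power generally take Cohen's f rather than partial eta squared as the effect-size input. As a minimal sketch of that conversion only (the authors' exact G*Power settings are not stated, and the function name is hypothetical), the reported ηp² = .033 can be translated as follows:

```python
import math


def eta_squared_to_cohens_f(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f via f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Effect size taken from the power analysis described in the text.
cohens_f = eta_squared_to_cohens_f(0.033)
print(f"Cohen's f = {cohens_f:.3f}")
```

By Cohen's conventional benchmarks, this value falls between a small (f = 0.10) and a medium (f = 0.25) effect.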
The data collection took place over eight time points from October 2019 until September 2020. T1 captured pre-study scores of demographics, independent and dependent variables. T2 to T7 were one month apart and required participants in both groups to complete the same survey measuring the independent and dependent variables at the end of the month. The experimental group had to use the AI coach at least once before completing the monthly survey. There was a three-month lapse between T7 and the final data collection point, T8. During this period, use of the AI coach was optional for experimental group participants. Both control and experimental groups completed the same surveys, and these surveys were sent to both groups at the same time. The participants were also asked to write down two goals that they were aiming to work on over the duration of the study. Each student had a unique log-in for their survey which allowed them to see their original goals every time they logged in and responded to the survey. One of the researchers sent out reminders to the students to complete the survey if they had not engaged with the data collection.
Vici, the AI coach used in this study, is a custom-developed, text-based chatbot designed according to the DAIC (Designing AI Coaches) framework (Terblanche, 2020) and deployed on the Telegram instant messaging platform. The DAIC framework merges human coaching efficacy aspects of a strong coach-coachee relationship with chatbot design best practices. It recommends that, due to the existing limitations of ANI, the AI coach should be designed to fulfil one specific coaching task only, in line with expert system design principles (Terblanche, 2020). This recommendation informed the design of Vici as a chatbot coach with the sole purpose of supporting goal attainment. The choice of goal attainment as focus for Vici was based on the fact that, while coaching has numerous positive outcomes, goal attainment is arguably what distinguishes coaching from other similar helping professions (Grant, 2012; 2014). A further recommendation of the DAIC framework is that the AI coach must be designed based on proven coaching theories. As such, Vici uses goal-attainment theory (Grant, 2014; Latham & Locke, 2007) and the GROW coaching model (Grant, 2011).
Vici’s sole coaching objective is assisting people with goal achievement. Vici helps users to identify goals, specify actions to reach the goals, monitor the progress of goals and actions, and to adjust any if necessary. Vici also helps users to distinguish between proximal and distal goals (Latham & Locke, 2007) and keeps track of both. Vici operationalises goal-attainment theory through two types of coaching conversations: initial goal-setting and progress-tracking. The initial goal-setting conversation follows the GROW coaching model (Grant, 2011). Developed by Graham Alexander in 1984 (West & Milan, 2001) and popularised by Whitmore (2003) the GROW Coaching model consists of goal-identification, reality checking, identification of options available to reach the goal, and the will and commitment to reach the goal. The progress-tracking conversation records, monitors and assesses the progress made towards proximal and distal goals and optionally allows goals to be changed. Vici was available 24/7 to the experimental group.
In this study, we report all measures, manipulations and exclusions.
The authors adapted the goal attainment measure from the scale of Grant et al. (2009) to suit the purpose of this article. The participants in this study were asked to write down two goals they would work on over the course of the study, and they were instructed that the goals should be something challenging that is either new or something that has been difficult in the past. The participants then rated the two self-assigned goals separately in terms of both success and difficulty. Success was measured from 0% (no achievement at all) to 100% (total success), while difficulty was measured on a 7-point Likert scale from ‘very easy’ to ‘very difficult’. Goal attainment scores were then computed by multiplying the success score by the difficulty score for each goal and then averaging the two products to create a composite score.
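The scoring rule described above can be sketched as follows. This is an illustration only: it assumes the 7-point difficulty scale is coded numerically from 1 (‘very easy’) to 7 (‘very difficult’), which the text does not state, and the function name and example ratings are hypothetical.

```python
from statistics import mean


def goal_attainment_score(goals):
    """Return the composite score for a list of (success_percent, difficulty) pairs.

    Each goal contributes success * difficulty; the composite is the mean of
    those products. Difficulty is ASSUMED to be coded 1..7.
    """
    products = []
    for success, difficulty in goals:
        if not 0 <= success <= 100:
            raise ValueError("success must be between 0 and 100 percent")
        if not 1 <= difficulty <= 7:
            raise ValueError("difficulty must be between 1 and 7")
        products.append(success * difficulty)
    return mean(products)


# Hypothetical participant: goal 1 is 50% achieved and rated 6 for difficulty,
# goal 2 is 80% achieved and rated 3 for difficulty.
score = goal_attainment_score([(50, 6), (80, 3)])
print(score)  # (50 * 6 + 80 * 3) / 2 = 270
```

Under this coding, weighting success by difficulty means that partial progress on a hard goal can count for more than near-complete success on an easy one.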
The authors used the long version of the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS) (Stewart-Brown, Tennant, Tennant, Platt, Parkinson and Weich, 2009) to measure the psychological wellbeing of the participants. This 14-item scale measures their perceived psychological wellbeing using a 5-point Likert scale, from ‘None of the time’ to ‘All of the time’. An example item is: “I’ve been feeling relaxed”. The internal consistency analysis of the WEMWBS showed a Cronbach’s Alpha of 0.93 which indicates a high internal consistency.
Brief Resilience Scale (BRS) is a 6-item scale, measured by a 7-point Likert Scale from ‘Strongly disagree’ to ‘Strongly agree’ (Smith, Dalen, Wiggins, Tooley, Christopher and Bernard, 2008). The BRS includes items such as “I tend to bounce back quickly after hard times”. The internal consistency analysis of the BRS showed a Cronbach’s Alpha of 0.77 which indicates an acceptable internal consistency.
This project measured the participants’ stress level by using the Perceived Stress Scale (PSS) (Cohen, Kamarck and Mermelstein, 1983). This is a ten-item scale that measures how often stressful events occurred in the last month using a 5-point Likert scale from ‘Never’ to ‘Often’. An example item is: “How often have you been able to control irritations in your life?” The internal consistency analysis of the PSS showed a Cronbach’s Alpha of 0.85 which indicates a high internal consistency.
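The Cronbach's alpha values reported for the WEMWBS, BRS and PSS follow the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below illustrates that formula on made-up responses; the data and function name are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: list of rows, one per respondent, each a list of item scores
    of equal length (at least two items, at least two respondents).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("all respondents must answer the same number of items")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to a three-item scale.
example = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is why the reported coefficients are read as evidence of internal consistency.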
Note that Shapiro-Wilk tests were used on all scales to assess the normal distribution of the scales due to the different nature of the measures. The tests indicated that none of the scales, for either of the groups deviated from a normal distribution.
The data from the surveys was first screened to determine any potential relationships between the dependent variables. Pearson correlation on the whole sample indicated that Psychological Wellbeing, Resilience and Stress showed medium-sized correlations with each other, ranging in magnitude from r = .50 to r = -.64 (correlations with stress being negative). However, Goal Attainment was not related to any of the other dependent variables. See Table 1 for correlation matrices for the different dependent variables. Goal Attainment was therefore analysed separately, while Psychological Wellbeing, Resilience and Stress were analysed together in one MANOVA to avoid the inflated familywise error rate that can occur with multiple testing of related variables.
Note: Table shows Pearson’s correlations. *** = p < .001. α = Cronbach’s alpha of the measurement.
A Mixed Factorial ANOVA was conducted on the Goal Attainment Scale between the two groups in order to test Hypothesis 1. The analysis indicated that there was a statistically significant interaction of time and group, F(7, 973) = 4.44, p < 0.001, ηp² = 0.031. This significant interaction allowed follow-up analysis over time, per group, and comparisons between the groups at each time point. Both groups showed an initial increase in Goal Attainment for the first three months and this increase was statistically significant for each group at three months (experimental, p = 0.005; control, p = 0.01) according to Bonferroni-corrected pairwise comparisons. Moreover, the groups did not significantly differ from each other at this time point, p = 0.82, d = 0.03. However, the AI coaching chatbot had an effect after three months of use: goal attainment in the experimental group continued to increase significantly, while that of the control group tapered off. At the end of the project, three months after the AI coaching had finished, the difference between the groups was statistically significant after Bonferroni adjustments for multiple comparisons: p = 0.005, d = 0.60, as illustrated in Figure 1. See Table 2 for all means and standard deviations over time.
Note: The table shows means for each time point and group (standard deviations within brackets)
We analysed the usage of the AI application in terms of how many times the AI coaching chatbot was used to identify any potential within-group differences. We were able to identify a significant difference in development of Goal Attainment (and not of any of the other dependent variables) when splitting the frequency of usage into two equal groups based on their median usage (6 AI coaching sessions), t (73) = -2.24, p = 0.028, d = 0.52. The lower usage group had an average increase on Goal Attainment of 17.62 (sd = 32.50) compared to 37.62 (sd = 34.16) in the higher usage group.
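For an independent-samples t-test, Cohen's d can be recovered from the t statistic and the group sizes as d = |t| × √(1/n₁ + 1/n₂). The sketch below applies this standard conversion to the reported t(73) = -2.24. The split of the 75 experimental participants into subgroups of 37 and 38 is an assumption made for illustration; the text gives only the degrees of freedom.

```python
import math


def cohens_d_from_t(t_value: float, n1: int, n2: int) -> float:
    """Absolute Cohen's d implied by an independent-samples t statistic."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    return abs(t_value) * math.sqrt(1.0 / n1 + 1.0 / n2)


# ASSUMED subgroup sizes (37 + 38 = 75, giving df = 73 as reported).
d = cohens_d_from_t(-2.24, 37, 38)
print(f"d = {d:.2f}")
```

Under these assumed subgroup sizes, the result agrees with the reported d = 0.52 to two decimal places.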
These findings indicate that the AI coaching chatbot did not seem to have an immediate effect on goal attainment. However, using the chatbot over three months resulted in a significant effect, which was sustained three months after the end of the research project. The experimental group showed an increase of 55% in their goal attainment compared to 24% in the control group. The findings further indicate that using the coaching application more frequently led to a higher increase in goal attainment from T1 to T8. Thus, the findings from these analyses support Hypothesis H1.
The dependent variables Psychological Wellbeing, Resilience and Perceived Stress were all analysed in a Mixed Factorial MANOVA since they showed moderate correlations with each other. We conducted the Mixed Factorial MANOVA to identify whether the two groups developed differently over time on the three different but correlated variables. However, the analysis could not identify a significant interaction of time and group, Λ = 0.91, F(21, 119) = 0.56, p = .94, ηp² = 0.09. Furthermore, univariate tests of each dependent variable did not identify any unique interaction of group and time on any of the three variables (resilience, p = 0.80; psychological wellbeing, p = 0.89; and stress, p = 0.91).
The non-significant results of the Mixed Factorial MANOVA suggest that the AI coaching chatbot in the study did not have an impact on Psychological Wellbeing, Resilience or Perceived Stress. Therefore, Hypothesis H2 is rejected and Hypotheses H3 and H4 are supported.
Given the positive effect the AI coaching chatbot had on goal attainment, we conducted further qualitative analysis on the nature of the goals in both groups. Two of the authors independently analysed the first goal to assess the theme of the goal, the type of outcome (concrete or vague goal) and whether the goal was proximal (< 6 months) or distal (> 6 months). The inter-rater reliability of the analysis indicated very high agreement between the reviewers on all three categories, with Cohen’s kappa of κ = .96, p < .001 for the theme of the goal, κ = .90 for the outcome and κ = .87, p < .001 for the timeline of the goal.
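For readers unfamiliar with the statistic, Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. The following is a minimal sketch with made-up ratings of ten goals as "concrete" or "vague"; the data and function name are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL ratings, for illustration only (not the study's data).
rater_a = ["concrete", "concrete", "vague", "concrete", "vague",
           "vague", "concrete", "concrete", "vague", "concrete"]
rater_b = ["concrete", "concrete", "vague", "concrete", "vague",
           "concrete", "concrete", "concrete", "vague", "concrete"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

In this toy example the raters agree on nine of ten goals, yet kappa is noticeably lower than 0.90 because more than half of that agreement would be expected by chance.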
The themes of the goals that were identified relate to the participants’ studies (42%), self-development (21%), career (18%), health and well-being (17%), other (2%), finances (1%) and family (0.5%). Most of the goals were concrete and measurable (65%), for example, “To gain overall mark of 75% in study year 1”, and 55% of the goals had a long-term focus (> 6 months).
The proportions of goal themes, outcome types and timelines were compared between the groups to identify whether there was any association between group and the nature of the goals. However, chi-square tests did not identify any significant associations, which indicates that the goals were similarly distributed across both groups.
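A chi-square test of independence of this kind can be sketched as follows for a 2 × 2 table (group by outcome type). The cell counts are hypothetical, chosen only so that the row totals match the group sizes of 75 and 94; the paper does not report the actual counts. No continuity correction is applied, and the p-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts: rows = experimental, control; columns = concrete, vague.
goal_counts = [[50, 25],
               [60, 34]]

statistic, p_value = chi_square_2x2(goal_counts)
print(round(statistic, 3), round(p_value, 2))
```

With these illustrative counts the statistic is far below the 5% critical value of 3.84, which is the pattern of no association between group and goal type described above.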
This RCT longitudinal study investigated the ability of Vici, a goal-attainment AI chatbot coach, to improve people’s goal attainment, psychological wellbeing and resilience, and to reduce their perceived stress. Coaching as a helping profession is growing in stature with convincing evidence of its efficacy in various domains, including the four constructs investigated in this study. Understanding to what extent an AI coach could improve these aspects could help to democratise coaching by making its benefits available to a much wider audience.
There are three main findings of the present study that require in-depth discussion: (1) Vici was able to assist participants with increased goal attainment relative to the control group; (2) Vici users only outperformed the control group in goal attainment after three months (T4); (3) Vici was not able to help participants with improved resilience, psychological wellbeing or reduced perceived stress.
To explain our first finding we turn to control and goal theory. Control theory provides several clues to Vici’s performance. Firstly, control theory suggests that the process of monitoring influences goal progress. Monitoring helps to translate goals into actions and actions lead to progress (Harkin, Webb, Chang, Prestwich, Conner, Kellar, Benn and Sheeran, 2016). Monitoring is one of the core features of Vici. After the initial goal-setting conversation, all follow-up interactions with Vici are dedicated to continued monitoring of goals and actions progress, hence providing one explanation of Vici’s goal efficacy.
Secondly, the frequency of progress monitoring has a positive effect on goal attainment (see for example Acharya, Elci, Sereika, Styn and Burke, 2011; Coughlin, Gullion, Brantley, Stevens, Bauck, Champagne and Appel, 2013; Sherwood, Crain, Martinson, Anderson, Hayes, Anderson, Senso and Jeffery, 2013). The results from the present study indicate that the issue of frequency is also true for AI-based monitoring. Participants who interacted with Vici more often achieved higher goal attainment.
Thirdly, Harkin et al.’s (2016) meta-study found that the physical recording of goals increases goal attainment. One of Vici’s core purposes is to help people capture their goals by physically typing them into the chatbot system. It seems therefore that Vici’s ability to support goal setting and attainment through monitoring can be explained by the dynamics of control theory.
This result can also be interpreted in terms of goal theory. Goal attainment is moderated by ability, goal commitment, feedback on goal progress, goal complexity, and external factors such as resources required (Latham & Locke, 2007). In the goal-setting and progress conversations, Vici asks explicit questions covering these aspects, including helping users to assess how realistic their goals are, how committed they are, and what resources they need and have available. Vici also helps users break down their goals into long-term (distal) and short-term (proximal) goals. Goal-setting theory holds that feedback from a proximal goal helps shape a person’s idea of what is realistic and keeps the goal attainment process congruent with the distal goal (Latham & Seijts, 1999). The setting of proximal goals also leads to more regular feedback, which aids overall goal achievement as discussed earlier and hence helps to explain Vici’s ability to improve goal achievement.
There appears to be compelling evidence from control theory and goal theory that explains why Vici was able to improve goal attainment. This insight empirically supports the conceptual recommendation of the Designing AI Coaches (DAIC) framework (used to create Vici) that AI coaches must be designed using sound theoretical models (Terblanche, 2020).
The second important finding is that Vici users only outperformed the control group in goal attainment after three months (T4). In fact, goal attainment at T2, when users had their first interaction with Vici, was lower for the Vici group than for the control group. This phenomenon could possibly be explained in terms of control and goal theories. Participants in the control group did not receive any active feedback. After setting their initial goals, they also experienced initial improvements in goal attainment for the first three months through self-sustained effort. However, due to the lack of feedback and support, their progress eventually decreased, in line with control and goal theory predictions (Acharya et al., 2011; Coughlin et al., 2013; Sherwood et al., 2013). Vici users, on the other hand, sustained their goal progression after three months due to the monitoring and feedback provided by the chatbot. Vici’s questioning about the relevance and appropriateness of actions aimed at achieving goals may also have helped users to optimise their action plans. The initial decrease in goal attainment at T2 for Vici users could be because Vici helped users to gain a more realistic sense of their goals compared to the control group. Knowing practically what is required to achieve a goal may cause an initial sense of slower progress. In the long term, however, clear and realistic goals lead to a higher level of goal attainment, as the results indicate (Latham & Locke, 2007). This finding on the role of the number of AI coaching sessions in coaching outcomes is also significant, since in human coaching there is limited evidence of the impact of the number of coaching sessions (Theeboom et al., 2014). In AI coaching it appears that more sessions are more beneficial; however, this claim needs to be investigated further.
The third important finding of our study is that Vici was not able to assist people with psychological wellbeing, resilience or perceived stress. We expected Vici to be successful in improving psychological wellbeing since goal theory suggests that goal setting and achievement have positive influences on a person’s wellbeing (Koestner et al., 2002; Niemiec et al., 2009; Sonnentag, 2002). In human coaching, even though the focus may be on goal attainment, the very presence of a human as coach has implications for the scope, focus and outcomes of the intervention. Good human coaches are aware of the importance of maintaining a strong coach-coachee relationship (De Haan et al., 2016). This supportive relational dimension could positively influence aspects such as wellbeing. The fact that Vici was not able to engage on a human relationship level may therefore explain why no spill-over effects of the goal attainment focus were observed. Vici’s sole focus on goal attainment and its lack of human-like empathy and contextual awareness may also explain why there was no improvement in resilience and no reduction in perceived stress. There is evidence from the psychology domain that AI entities can in fact improve wellbeing and resilience and reduce stress (Gaffney, Mansell & Tai, 2019; Lattie et al., 2019). These AI entities were based on theoretical models from psychology such as CBT, similar to how Vici was created based on goal theory, with the associated outcomes.
This study contributes on a theoretical level to our understanding of the link between coaching and goal theory as well as AI theory. Goal theory states that goal attainment is enhanced by setting clear and challenging (but not too difficult) goals, commitment to the process and obtaining regular feedback (Locke & Latham, 1990). The relevance of goal theory in human coaching has been established (Grant, 2006). The present study extends our understanding of goal theory in coaching by showing that a machine algorithm that implements goal theory accurately delivers positive coaching outcomes, thus extending the application and reach of goal theory in coaching.
AI theory holds that the narrow application of AI in the form of expert systems produces applications that can perform one function well (Bostrom, 2014; Shanahan, 2015; Siau & Yang, 2017). The AI coach in the present study was designed based on these principles of narrow AI and expert systems as captured in the DAIC framework, which suggests creating AI coaches with a focused approach (Terblanche, 2020). The positive efficacy of Coach Vici therefore supports the theoretical understanding of the limitations of current AI technology and confirms our theoretical understanding of the application of AI in coaching.
Vici’s ability to facilitate goal attainment is encouraging in terms of the prospect of democratising coaching. Coaching has proven benefits for people, yet many strata of society do not have access to this service due to the limited availability of skilled coaches and the associated high costs. As of August 2020, the statistics from the largest professional body for coaching in the world, the International Coaching Federation (ICF), show that of their 42 786 members, fewer than 2 000 are practising in Africa (ICF, 2020b). The average cost of organisational coaching in Africa is approximately 100 USD per hour (Terblanche et al., 2021). This combination of low availability and high cost places human coaching well beyond the reach of most people in Africa. The once-off cost of creating Vici, its scalability, 24/7 availability and now demonstrated efficacy make a compelling case for employing AI coaching to democratise this helping service for people who are currently unable to access it.
Even in organisations that pay for coaching services, coaching is often reserved for more senior staff due to cost. AI coaching could therefore also democratise coaching within organisations by providing a basic coaching service to more junior employees. Employees who show commitment to the AI coaching process could then be offered a more expensive human coach at a later stage since they are then more likely to maximise the investment. This mechanism could help improve return on investment in coaching by organisations.
This study has two notable limitations. Firstly, participants were undergraduate students, which suggests that the results may not generalise to other populations. While students may well benefit from using an AI coach, there are other domains such as the workplace where AI coaching could be used. It is therefore imperative that AI coaching be investigated beyond the current limited context of higher education. Secondly, we only measured self-scores by participants, which opens the possibility of self-report bias. Self-report measures are common in coaching research and are often criticised. Objective measures would enhance the validity of the present study. These limitations are offset to some degree by the research design (longitudinal, RCT) and the relatively large sample size.
In terms of future research, two suggestions are put forward. Firstly, the efficacy of the hybrid between human and AI coaches must be studied. Is the combination of a human and AI coach more efficacious than a human coach alone and what are the contributing factors? Combinations such as blended asynchronous and synchronous models should be researched where a coach is available for remaining issues if the AI coach was inadequate. Secondly, AI coaches who focus on various outcomes other than goal attainment, such as wellbeing, must be created and their efficacy studied. It is suggested that all these studies employ longitudinal RCT designs.
Coaching offers numerous proven benefits to society; however, not everyone can afford a coach. Our study shows that Vici, an AI chatbot coach, was able to successfully assist users with a particular aspect of coaching, increased goal attainment, in line with its design. This evidence of the efficacy of AI coaching is a major step towards democratising coaching and offering the benefits of this service to people who could not normally afford it.