A collection of academic publications and studies I have led or contributed to.
2024
To What Extent Are Large Language Models Capable of Generating Substantial Reflections for Motivational Interviewing Counseling Chatbots? A Human Evaluation
Erkan Basar, Iris Hendrickx, Emiel Krahmer, Gert-Jan de Bruijn, and Tibor Bosse
In Proceedings of the 1st Human-Centered Large Language Modeling Workshop, 2024
Motivational Interviewing is a counselling style that requires skillful usage of reflective listening and engaging in conversations about sensitive and personal subjects. In this paper, we investigate to what extent we can use generative large language models in motivational interviewing chatbots to generate precise and variable reflections on user responses. We conduct a two-step human evaluation where we first independently assess the generated reflections based on four criteria essential to health counseling: appropriateness, specificity, naturalness, and engagement. In the second step, we compare the overall quality of generated and human-authored reflections via a ranking evaluation. We use GPT-4, BLOOM, and FLAN-T5 models to generate motivational interviewing reflections, based on real conversational data collected via chatbots designed to provide support for smoking cessation and sexual health. We discover that GPT-4 can produce reflections of a quality comparable to human-authored reflections. Finally, we conclude that large language models have the potential to enhance and expand reflections in predetermined health counseling chatbots, but a comprehensive manual review is advised.
Effectiveness and user experience of a smoking cessation chatbot: A mixed-methods study comparing motivational interviewing and confrontational counseling
Background: Cigarette smoking poses a major public health risk. Chatbots may serve as a useful tool to promote cessation due to their high accessibility and potential for facilitating long-term personalized interactions. To increase effectiveness and acceptability, there remains a need to identify and evaluate counseling strategies for these chatbots, an aspect that has not been comprehensively addressed in previous research.
Objective: This study aims to identify effective counseling strategies for such chatbots to support smoking cessation. In addition, we sought to gain insights into smokers’ expectations of and experiences with the chatbot.
Methods: This mixed methods study incorporated a web-based experiment and semistructured interviews. Smokers (N=229) interacted with either a motivational interviewing (MI)–style (n=112, 48.9%) or a confrontational counseling–style (n=117, 51.1%) chatbot. Both cessation-related (ie, intention to quit and self-efficacy) and user experience–related outcomes (ie, engagement, therapeutic alliance, perceived empathy, and interaction satisfaction) were assessed. Semistructured interviews were conducted with 16 participants, 8 (50%) from each condition, and data were analyzed using thematic analysis.
Results: Results from a multivariate ANOVA showed that participants had a significantly higher overall rating for the MI (vs confrontational counseling) chatbot. Follow-up discriminant analysis revealed that the better perception of the MI chatbot was mostly explained by the user experience–related outcomes, with cessation-related outcomes playing a lesser role. Exploratory analyses indicated that smokers in both conditions reported increased intention to quit and self-efficacy after the chatbot interaction. Interview findings illustrated several constructs (eg, affective attitude and engagement) explaining people’s previous expectations and timely and retrospective experience with the chatbot.
Conclusions: The results confirmed that chatbots are a promising tool in motivating smoking cessation and the use of MI can improve user experience. We did not find extra support for MI to motivate cessation and have discussed possible reasons. Smokers expressed both relational and instrumental needs in the quitting process. Implications for future research and practice are discussed.
Exploring user engagement through an interaction lens: what textual cues can tell us about human-chatbot interactions
Monitoring and maintaining user engagement in human-chatbot interactions is challenging. Researchers often use cues observed in the interactions as indicators to infer engagement. However, evaluation of these cues is lacking. In this study, we collected an inventory of potential textual engagement cues from the literature, including linguistic features, utterance features, and interaction features. These cues were subsequently used to annotate a dataset of 291 user-chatbot interactions, and we examined which of these cues predicted self-reported user engagement. Our results show that engagement can indeed be recognized at the level of individual utterances. Notably, words indicating cognitive thinking processes and motivational utterances were strong indicators of engagement. An overall negative tone could also predict engagement, highlighting the importance of nuanced interpretation and contextual awareness of user utterances. Our findings demonstrate the initial feasibility of recognizing utterance-level cues and using them to infer user engagement, although further validation is needed across different content domains.
2023
HyLECA: A Framework for Developing Hybrid Long-Term Engaging Controlled Conversational Agents
Erkan Basar, Divyaa Balaji, Linwei He, Iris Hendrickx, Emiel Krahmer, Gert-Jan de Bruijn, and Tibor Bosse
In Proceedings of the 5th ACM Conference on Conversational User Interfaces (CUI), 2023
We present HyLECA, an open-source framework designed for the development of long-term engaging controlled conversational agents. HyLECA’s dialogue manager employs a hybrid architecture, combining rule-based methods for controlled dialogue flows with retrieval-based and generation-based approaches to enhance the utterance variability and flexibility. The motivation behind HyLECA lies in enhancing user engagement and enjoyment in task-oriented chatbots by leveraging the natural language generation capabilities of open-domain large language models within the confines of predetermined dialogue flows. Moreover, we discuss the technical capabilities, potential applications, relevance, and adaptability of the system. Lastly, we report preliminary findings from integrating state-of-the-art large language models in simulating a conversation centred on smoking cessation.
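The hybrid idea described in this abstract can be illustrated with a minimal sketch. It is not HyLECA's actual code or API: the names `DialogueState`, `choose_utterance`, and `step`, the word-overlap retrieval, and the fallback order are all assumptions made for illustration. The dialogue flow stays rule-based, while the wording of each turn may come from a generator, a retrieval pool, or the scripted default.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DialogueState:
    """One step in a predetermined (rule-based) dialogue flow. Hypothetical type."""
    name: str
    scripted_utterance: str                       # always-available scripted text
    next_state: Optional[str] = None              # fixed transition: the flow stays controlled
    alternatives: List[str] = field(default_factory=list)  # pool for retrieval


def choose_utterance(
    state: DialogueState,
    user_text: str,
    generator: Optional[Callable[[str, str], Optional[str]]] = None,
    is_acceptable: Callable[[str], bool] = lambda text: bool(text.strip()),
) -> str:
    """Pick the wording for the current state without changing the flow."""
    # 1. Generation-based: use a generated candidate only if it passes the check.
    if generator is not None:
        candidate = generator(state.name, user_text)
        if candidate and is_acceptable(candidate):
            return candidate
    # 2. Retrieval-based: pick the stored alternative sharing most words with the user.
    if state.alternatives:
        user_words = set(user_text.lower().split())
        return max(
            state.alternatives,
            key=lambda alt: len(user_words & set(alt.lower().split())),
        )
    # 3. Rule-based default: the scripted utterance.
    return state.scripted_utterance


def step(
    flow: Dict[str, DialogueState],
    current: str,
    user_text: str,
    generator: Optional[Callable[[str, str], Optional[str]]] = None,
) -> Tuple[str, Optional[str]]:
    """Return the reply for the current state and the name of the next state."""
    state = flow[current]
    return choose_utterance(state, user_text, generator), state.next_state


# Toy flow, invented for illustration.
flow = {
    "ask_reason": DialogueState(
        "ask_reason",
        "What makes you want to quit?",
        next_state="reflect",
        alternatives=["What are your reasons to quit smoking?", "Why now?"],
    ),
    "reflect": DialogueState("reflect", "Thank you for sharing."),
}
```

In this sketch the generator only ever affects the wording of a turn; the transition to the next state is read from the scripted flow, which is one simple way to keep a generative component inside predetermined dialogue flows.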
2022
Can chatbots help to motivate smoking cessation? A study on the effectiveness of motivational interviewing on engagement and therapeutic alliance
Linwei He, Erkan Basar, Reinout W Wiers, Marjolijn L Antheunis, and Emiel Krahmer
Background: Cigarette smoking poses a major threat to public health. While cessation support provided by healthcare professionals is effective, its use remains low. Chatbots have the potential to serve as a useful addition. The objective of this study is to explore the possibility of using a motivational interviewing style chatbot to enhance engagement, therapeutic alliance, and perceived empathy in the context of smoking cessation.
Methods: A preregistered web-based experiment was conducted in which smokers (n = 153) were randomly assigned to either the motivational interviewing (MI)-style chatbot condition (n = 78) or the neutral chatbot condition (n = 75) and interacted with the chatbot in two sessions. In the assessment session, typical intake questions in smoking cessation interventions were administered by the chatbot, such as smoking history, nicotine dependence level, and intention to quit. In the feedback session, the chatbot provided personalized normative feedback and discussed with participants potential reasons to quit. Engagement with the chatbot, therapeutic alliance, and perceived empathy were the primary outcomes and were assessed after both sessions. Secondary outcomes were motivation to quit and perceived communication competence and were assessed after the two sessions.
Results: No significant effects of the experimental manipulation (MI-style or neutral chatbot) were found on engagement, therapeutic alliance, or perceived empathy. A significant increase in therapeutic alliance over two sessions emerged in both conditions, with participants reporting significantly increased motivation to quit. The chatbot was perceived as highly competent, and communication competence was positively associated with engagement, therapeutic alliance, and perceived empathy.
Conclusion: The results of this preregistered study suggest that talking with a chatbot about smoking cessation can help to motivate smokers to quit and that the effect of conversation has the potential to build up over time. We did not find support for an extra motivating effect of the MI-style chatbot, for which we discuss possible reasons. These findings highlight the promise of using chatbots to motivate smoking cessation. Implications for future research are discussed.
Hints of Independence in a Pre-scripted World: On Controlled Usage of Open-domain Language Models for Chatbots in Highly Sensitive Domains
Erkan Basar, Iris Hendrickx, Emiel Krahmer, Gert-Jan de Bruijn, and Tibor Bosse
In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART), 2022
Open-domain large language models have progressed to generating natural-sounding and coherent text. Even though the generated texts appear human-like, the main stumbling block is that their output is never fully predictable, which runs the risk of resulting in harmful content such as false statements or inflammatory language. This makes it difficult to apply these models in highly sensitive domains including personal health counselling. Hence, most of the chatbots for highly sensitive domains are developed using pre-scripted approaches. Although pre-scripted approaches are highly controlled, they suffer from repetitiveness and scalability issues. In this paper, we explore the possibility of combining the best of both worlds. We propose and describe in detail a new, flexible expert-driven hybrid architecture for harnessing the benefits of large language models in a controlled manner for highly sensitive domains and discuss the expectations and challenges.
Enriching impact data by mining digital media
Marc van den Homberg, Jacopo Margutti, Erkan Basar, and Jurjen Wagemaker
UN Global Assessment Report on Disaster Risk Reduction, 2022
This contributing paper explores the opportunities offered by digital media mining to complement impact databases. Impact data on past disasters caused by natural hazards (in short, impact data) are of paramount importance for several applications. These include advocacy for investments in disaster risk reduction (DRR) and providing an evidence base for new policies. It is challenging, however, to create, sustain, and increase the adoption of an impact database with sufficient quality for different applications in humanitarian response and DRR. Online newspapers, both national and local ones, tend to cover small disasters more than some institutional databases that focus only on disasters above a particular threshold impact. Mining data from these sources offers a means to complement existing databases. This paper indicates that leading openly available databases have different strengths and weaknesses. Mining digital newspapers helps shed light on data discrepancies between different databases. In the case studies, the enriched impact database was used to validate a hydrological model, particularly in defining triggers and improving monitoring. The study focused on sudden-onset disasters and further research will be needed to understand how mining can be used for slow-onset disasters such as droughts. Big data and modern information processing systems can also further improve operational excellence in humanitarian applications.
2021
Towards a New Generation of Personalized Intelligent Conversational Agents
Iris Hendrickx, Federica Cena, Erkan Basar, Luigi Di Caro, Florian Kunneman, Elena Musi, Cataldo Musto, Amon Rapp, and Jelte Waterschoot
In Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, 2021
The Personalized Intelligent Conversational Agents workshop focuses on both long-term engaging spoken dialogue systems and text-based chatbots, as well as conversational recommender systems. The goal of the workshop is to stimulate discussion around problems, challenges, possible solutions and research directions regarding the exploitation of natural language processing and machine learning techniques to learn user features and to use them to personalize the dialogue in the next generation of intelligent conversational agents.
2020
Improving the classification of flood tweets with contextual hydrological information in a multimodal neural network
Jens de Bruijn, Hans de Moel, Albrecht Weerts, Marleen de Ruiter, Erkan Basar, Dirk Eilander, and Jeroen Aerts
While text classification can classify tweets, assessing whether a tweet is related to an ongoing flood event or not, based on its text, remains difficult. Inclusion of contextual hydrological information could improve the performance of such algorithms. Here, a multilingual multimodal neural network is designed that can effectively use both textual and hydrological information. The classification data was obtained from Twitter using flood-related keywords in English, French, Spanish and Indonesian. Subsequently, hydrological information was extracted from a global precipitation dataset based on the tweet’s timestamp and locations mentioned in its text. Three experiments were performed analyzing precision, recall and F1-scores while comparing a neural network that uses hydrological information against a neural network that does not. Results showed that F1-scores improved significantly across all experiments. Most notably, when optimizing for precision the neural network with hydrological information could achieve a precision of 0.91 while the neural network without hydrological information failed to effectively optimize. Moreover, this study shows that including hydrological information can assist in the translation of the classification algorithm to unseen languages.
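For readers unfamiliar with the metrics named in this abstract, the following sketch shows how precision, recall, and F1 are computed for a binary "flood-related or not" classifier. The function name and the toy labels are invented for illustration and are not data or code from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for binary labels (1 = flood-related)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Optimizing for precision, as mentioned in the abstract, means accepting fewer false positives, usually at the cost of missing more true flood tweets (lower recall).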
The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis
Nelleke Oostdijk, Hans Halteren, Erkan Basar, and Martha Larson
In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), 2020
We report on a case study of text and images that reveals the inadequacy of simplistic assumptions about their connection and interplay. The context of our work is a larger effort to create automatic systems that can extract event information from online news articles about flooding disasters. We carry out a manual analysis of 1000 articles containing a keyword related to flooding. The analysis reveals that the articles in our data set cluster into seven categories related to different topical aspects of flooding, and that the images accompanying the articles cluster into five categories related to the content they depict. The results demonstrate that flood-related news articles do not consistently report on a single, currently unfolding flooding event and we should also not assume that a flood-related image will directly relate to a flooding-event described in the corresponding article. In particular, spatiotemporal distance is important. We validate the manual analysis with an automatic classifier demonstrating the technical feasibility of multimedia analysis approaches that admit more realistic relationships between text and images. In sum, our case study confirms that closer attention to the connection between text and images has the potential to improve the collection of multimodal information from news articles.
2019
The Multimedia Satellite Task at MediaEval 2019
Benjamin Bischke, Patrick Helber, Simon Brugman, Erkan Basar, Zhengyu Zhao, Martha A Larson, and Konstantin Pogorelov
In Proceedings of the MediaEval 2019 Workshop, 2019
This paper provides a description of the Multimedia Satellite Task at MediaEval 2019. The main objective of the task is to extract complementary information associated with events which are present in satellite imagery and news articles. Due to their high socioeconomic impact, we focus on flooding events and build upon the last two years of the Multimedia Satellite Task. Our task focuses this year on flood severity estimation and consists of three subtasks: (1) Image-based News Topic Disambiguation, (2) Multimodal Flood Level Estimation from news, (3) Classification of city-centered satellite sequences. The task moves forward the state of the art in flood impact assessment by concentrating on aspects that are important but are not generally studied by multimedia researchers.
A Comparative Study on Generalizability of Information Extraction Models on Protest News
Erkan Basar, Simge Ekiz, and Antal van den Bosch
In Working Notes of the Conference and Labs of the Evaluation Forum, 2019
Information Extraction applications can help social scientists to obtain necessary information to understand the reasons behind certain social dynamics. Many recent state-of-the-art information extraction approaches are based on supervised machine learning, which can recognize information that has patterns similar to previously shown ones. Recognizing relevant information with never-shown patterns, however, is still a challenging task. In this study, we design a Recurrent Neural Network (RNN) architecture employing ELMo embeddings and Residual Bidirectional Long Short-Term Memory layers to overcome this challenge in the context of the CLEF 2019 ProtestNews shared task. Furthermore, we train a classical Conditional Random Fields (CRF) model as our strong baseline to display a contrast between a state-of-the-art classical machine learning approach and a recent neural network method both in performance and in generalizability. We show that the RNN model outperforms the classical CRF model and shows better promise on generalizability.
2017
Supporting Experts to Handle Tweet Collections About Significant Events
Ali Hurriyetoglu, Nelleke Oostdijk, Erkan Basar, and Antal van den Bosch
In Natural Language Processing and Information Systems (NLDB), 2017
We introduce Relevancer, a tool that processes a tweet set and enables the generation of an automatic classifier from it. Relevancer satisfies the information needs of experts during significant events. Enabling experts to combine automatic procedures with expertise is the main contribution of our approach and the added value of the tool. Even a small amount of feedback enables the tool to distinguish between relevant and irrelevant information effectively. Thus, Relevancer facilitates the quick understanding of and proper reaction to events presented on Twitter.