# Focus Agent: LLM-Powered Virtual Focus Group
**Authors**: Taiyu Zhang, Xuesong Zhang, Robbe Cools, Adalberto L. Simeone
> KU Leuven, Naamsestraat 22, 3001 Leuven, Belgium
(2024)
## Abstract
In the domain of Human-Computer Interaction, focus groups represent a widely utilised yet resource-intensive methodology, often demanding the expertise of skilled moderators and meticulous preparatory efforts. This study introduces the “Focus Agent,” a Large Language Model (LLM)-powered framework that both simulates a focus group for data collection and acts as a moderator in focus group sessions with human participants. To assess the quality of data derived from the Focus Agent, we ran five focus group sessions with a total of 23 human participants and also deployed the Focus Agent to simulate these discussions with AI participants. Quantitative analysis indicates that the Focus Agent can generate opinions similar to those of human participants. Furthermore, the research identifies areas for improvement when LLMs act as moderators in focus group discussions with human participants.
**Keywords:** Human-Computer Interaction, Intelligent Virtual Agent, Virtual Focus Group, Multi-Agent Simulation

**Conference:** ACM International Conference on Intelligent Virtual Agents (IVA ’24), September 16–19, 2024, Glasgow, United Kingdom · **DOI:** 10.1145/3652988.3673918 · **ISBN:** 979-8-4007-0625-7/24/09 · Copyright 2024, ACM licensed

**CCS Concepts:** Computing methodologies → Multi-agent planning; Human-centered computing → User studies; Human-centered computing → Virtual reality
## 1. Introduction
In the domain of qualitative research, focus groups have emerged as a widely adopted methodology, extensively employed in both industrial and academic contexts (Kitzinger, 1994, 1995; Mazza, 2006), thanks to their structured group discussions aimed at gaining in-depth insights into specific issues. Within Human-Computer Interaction (HCI), researchers routinely employ focus groups as a vital tool in project planning, evaluation, and data collection (Mazza, 2006; Troshani et al., 2021; Selter et al., 2023; Stalmeijer et al., 2014). Particularly noteworthy is the growing prominence of virtual focus groups, especially in the post-COVID-19 era (Keen et al., 2022). This transition can be attributed to their blending of a methodologically sound approach with the potential to engage geographically dispersed and otherwise hard-to-reach populations (Turney and Pocknee, 2005).
Organising a focus group presents two primary challenges. First, gathering several people at the same time is not easy, especially when researchers are interested in exploring the lived experiences of diverse or hard-to-reach groups (Brüggen and Willems, 2009; Gratton and O’Donnell, 2011; Wirtz et al., 2019). Second, the success of a focus group relies on an experienced moderator with domain-specific expertise; a moderator lacking experience can disrupt the discussion flow or gather unproductive data (Nagle and Williams, 2013). These issues have sometimes hindered the adoption of focus groups in certain HCI research efforts (Rosenbaum et al., 2002).
*(Figure 1 depicts a cyclical workflow: Planning → Moderator → Questions → Discussion → Reflection → back to Planning.)*
Figure 1. The AI moderator generates questions according to the discussion content and plan, while AI Participants discuss the prompt from the moderator.
The advent of Large Language Models (LLMs), such as ChatGPT, offers a potential solution. These models can communicate fluently in text, generate diverse content from various perspectives based on large-scale text from the internet (Reynolds and McDonell, 2021; Brown et al., 2020), and demonstrate expertise across several fields, including social sciences, healthcare, and education (Koubaa, 2023; Sallam, 2023). Their capabilities extend to assisting with paper writing (Katar et al., 2023; Ciaccio, 2023), providing legal advice (Katz et al., 2024; Nay et al., 2024), and supporting medical inquiries (Haupt and Marks, 2023). Given these advancements, focus groups, a classic qualitative data collection method, stand to benefit from LLMs. Despite their potential, these models are prone to certain limitations, such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect (hallucinated) information (Wang et al., 2023b). Additional framework design is still necessary for multi-agent tasks, such as societal simulations (Park et al., 2023) or role-playing game simulations (Xu et al., 2023).
This work introduces the “Focus Agent”, an LLM-based moderator for focus groups with two functions: 1) simulating discussions without human participants and collecting AI-generated opinions, and 2) guiding focus groups with human participants as a moderator, as shown in Figure 1. To address prevalent issues in multi-agent simulations, including repetitive opinions and the generation of irrelevant content, the Focus Agent employs a scheduled discussion format that divides the focus group into distinct stages, each corresponding to a specific topic. This method mirrors the strategies employed by experienced human moderators. Additionally, the framework incorporates reflection periods during the discussion to counteract memory loss during the simulation, ensuring a coherent and productive discussion flow. When moderating focus groups with human participants, a multi-person Speech-to-Text (S2T) and Text-to-Speech (T2S) integration enables the Focus Agent to interact with multiple users simultaneously.
Our work primarily explores the application of LLMs in simulating focus group discussions. The two main Research Questions (RQs) are as follows:
RQ 1: To what extent do the opinions generated by an LLM align with those of human participants in a focus group?
RQ 2: To what extent is an LLM effective in performing the duties of a moderator in focus group discussions?
To answer these RQs, we conducted a user study with 23 participants across five discussion groups. Participants engaged in a one-hour AI-moderated focus group discussion on the topic of “digital well-being”, followed by a 30-minute session led by a researcher in which they shared their experiences, evaluated the AI moderator’s performance, and provided feedback; we refer to this session as a meta focus group. Meanwhile, the Focus Agent simulated focus group discussions on the same topic with AI participants. Qualitative analysis of the transcriptions, including thematic analysis and content analysis, reveals that the AI simulation reproduces the majority of opinions expressed by human participants. Additionally, we assessed the performance of the Focus Agent functioning as a moderator, both in the focus group simulation with AI participants and in focus groups involving human participants. Based on our findings, the Focus Agent meets the essential criteria required of a focus group moderator, including progressively guiding discussions from general to more specific topics and maintaining an actively engaged atmosphere, drawing on the fundamental literacy expected of a focus group moderator (Stewart and Shamdasani, 2014). However, when tasked with moderating discussions involving human participants, the agent’s ability to interact with humans seems constrained, and it has not demonstrated sufficient understanding of human conversation. We identified several limitations of current LLMs in managing multi-person discussions and offer suggestions for integrating AI agents into focus groups more effectively. To promote further research, the code has been open-sourced at https://github.com/AriaXR/FocusAgent.
## 2. Related Work
This section discusses previous research directly related to our study. We divided it into three subsections: Focus Group Development, Multi-Agent Simulation, and Multi-speaker Speech Recognition for Voice-based Conversational Agents.
### 2.1. Focus Group Development
The utilisation of focus groups, or group depth interviews, is a cornerstone method within the realms of advertising, marketing, and HCI research due to its effectiveness in gathering qualitative insights (Stewart and Shamdasani, 2014). The earliest focus groups were conducted through face-to-face conversations, which made organising them complex and time-consuming, even before accounting for participant reimbursement fees (Rosenbaum et al., 2002). Online focus groups have since gained popularity, offering advantages such as the convenience of participation from any location at any time and anonymity, which reduces participants’ apprehension of judgement (Daniels et al., 2019; Wilkerson et al., 2014; Stewart and Shamdasani, 2017). Still, researchers inviting many people to participate in online meetings at the same time often encounter difficulties such as conflicting schedules, time-zone differences, and poor communication caused by network delays. To further facilitate participation in focus groups, some social media platforms provide asynchronous text-based focus groups (Gordon et al., 2021; Biedermann, 2018; Richard et al., 2021; Wenzek et al., 2019). However, as participants do not contribute simultaneously, this reduced ‘spontaneity’ brings difficulties (Brüggen and Willems, 2009; Nicholas et al., 2010), including shorter answers with fewer words (Chen and Neo, 2019), uneven interaction flow due to lag (Veloso, 2020), and more unfocused exchanges that do not always address the relevant research question (Brüggen and Willems, 2009).
The recent advancements in LLMs, which are trained on extensive internet text data, offer novel opportunities for conducting focus groups. As an innovative retrieval model, LLMs have the potential to streamline the data collection process (Zhu et al., 2023). Utilising LLMs to simulate focus groups presents a simpler and potentially more efficient alternative to engaging human participants, thereby opening new avenues for qualitative research.
### 2.2. Multi-Agent Simulation
Despite the capability of LLMs to handle one-on-one question-answer formats, their deployment in long-term dialogues and opinion generation, such as focus group discussions, reveals some limitations. These challenges include difficulties in understanding complex instructions, agent hallucination, a limited token memory leading to loss of continuity, repetitive dialogue, and the generation of meaningless conversation in long-term interactions (OpenAI, 2023; Xu et al., 2024).
To help solve these issues, recent research has proposed new ways to organise how these AI agents think and respond, tailored to specific kinds of tasks (Talebirad and Nadiri, 2023; Park et al., 2023). The Chain-of-Thought (CoT) principle is pivotal, serving as the foundational idea behind them (Wang et al., 2023a): by decomposing complex problems into simpler elements, it enables multiple agents to collaboratively tackle each component, leading to a comprehensive solution. Additionally, the reflection mechanism plays a crucial role in addressing memory limitations and enhancing the authenticity of the generated content (Yan et al., 2024). This process involves storing detailed historical data as structured information, which can be referenced for more informed decision-making in future interactions. Moreover, to improve the consistency of agent performance across various contexts, some works have investigated diverse prompting techniques tailored to specific roles (Shanahan et al., 2023).
In our work, we have built upon insights from previous research to address potential challenges that could arise during focus group discussions. Furthermore, we have developed a novel framework for conducting focus groups, primarily guided by an AI moderator. The AI moderator facilitates simulated focus group discussions and aids in coordinating focus groups that include human participants. To bridge the interaction gap with human participants, we incorporate a voice-based conversational agent into the moderator.
### 2.3. Multi-speaker speech recognition for Voice-based Conversational Agents
Unlike text-based chatbots, Voice-based Conversational Agents (VCAs) necessitate an extra technological layer for operation: they use a speech-to-text (S2T) process to interpret spoken inputs and a text-to-speech (T2S) system for generating spoken responses (Jokinen and McTear, 2022; Rough and Cowan, 2020). This integration allows VCAs to facilitate interactions in a more natural, conversational manner, bridging the gap between human users and digital assistants.
However, current S2T technologies, such as Google’s API or OpenAI’s Whisper, encounter difficulties in long-term group discussions such as focus groups (Radford et al., 2022). One challenge with using S2T technologies like Whisper for multi-participant discussions is the duration limit on voice recording inputs, which is considerably shorter than the typical length of such conversations. A potential solution involves segmenting longer discussions into shorter fragments using Voice Activity Detection (VAD), which helps manage recordings more effectively (Bain et al., 2023). Another limitation is the lack of speaker differentiation, a critical feature for understanding who is speaking in group discussions. Some research has attempted to identify individual speakers by analysing the unique timbre of their voices (Medennikov et al., 2020; Horiguchi et al., 2021, 2020). However, these methods often fall short in accuracy due to the absence of prior information about the speakers. A more effective approach involves using a pre-recorded sample from each speaker, enabling a retrieval-based method that significantly improves performance by accurately distinguishing between speakers (Desplanques et al., 2020).
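As a toy illustration of such a retrieval-based approach, an audio segment's embedding can be compared against pre-enrolled reference embeddings and assigned to the closest speaker. The vectors and names below are placeholders; a real system would obtain embeddings from a speaker-verification model such as ECAPA-TDNN (Desplanques et al., 2020).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_speaker(segment_embedding, enrolled):
    """Return the enrolled speaker whose reference embedding is most similar."""
    return max(enrolled,
               key=lambda name: cosine_similarity(segment_embedding, enrolled[name]))

# Placeholder reference embeddings from each participant's pre-recorded sample.
enrolled = {
    "P1": [0.9, 0.1, 0.2],
    "P2": [0.1, 0.8, 0.3],
}
print(identify_speaker([0.85, 0.15, 0.25], enrolled))  # → P1
```

Because each participant contributes a reference sample up front, the problem reduces to nearest-neighbour retrieval rather than unsupervised diarisation, which is what makes this approach more accurate in practice.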
In our work, we improved Whisper, an open-source S2T model, with a retrieval-based technique, optimising it for multi-participant discussions such as focus groups.
## 3. Focus Agent Implementation
Our Focus Agent was designed to simulate focus group discussions and facilitate running sessions involving human participants. For the focus group simulation, we devised a multi-agent framework, complemented by a moderator to oversee the entire focus group process. This ensures that the contributions from AI participants are both relevant and valuable. Regarding interactions with actual human participants, we incorporated S2T and T2S systems into the AI moderator, enabling voice-based communication.
### 3.1. Focus Group Simulation
*(Figure 2 shows the AI moderator “Able” with five AI participants (Noah, Amelia, William, David, and Caleb) discussing “Digital Wellbeing: Balancing Screen Time and Mental Health”; the 60-minute scheduled session was simulated in 11 minutes.)*
Figure 2. A web demo of the Focus Group simulation system.
In accordance with the benchmark study conducted by OpenCompass (Contributors, 2023), the two most advanced Large Language Models (LLMs) available at the time of writing were ChatGPT and GPT-4. Pilot testing revealed that ChatGPT generated opinions similar to those of GPT-4, so we decided against using the superior GPT-4 given its roughly 20-fold higher cost. Compared to direct prompting, our algorithmic framework improves the realism and comprehensiveness of the AI simulation, as illustrated in Figure 2.
Initially, we attempted to employ a single prompt to simulate focus group discussions. However, the generated outcomes deviated significantly from our expectations in both content and length. In response to these challenges, we introduce the framework of our Focus Agent, featuring an AI moderator to guide the discussion process. As shown in Figure 1, this AI moderator generates a plan that divides the whole discussion into multiple stages, aligned with the distinct topic and aims of the focus group. Based on these guidelines, the AI moderator then facilitates a simulated focus group discussion with other AI entities as participants. Throughout the conversation, the moderator actively engages in reflection, responding to the participants' dialogue by introducing pertinent questions at appropriate times to foster further discussion. We explain this process in detail in the online appendix.
Within the simulated focus group, each participant represents an artificial intelligence entity. Experimenters are responsible for defining key parameters such as the topic, goals, overall duration, and specific characteristics of the participants, which include names, ages, occupations, nationalities, and personalities. In this setting, LLMs are tasked with understanding the context through assigned roles, typically categorised as system, user, and assistant. The system role involves attributing virtual personas to the LLMs, while the user and assistant roles are designed to aid in interpreting the context either from the viewpoint of the designated character or from that of others. To achieve this, we have developed a sequence of prompt designs, the details of which are provided in the online appendix.
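As a hypothetical sketch (not the paper's actual prompts), the role-based framing described above might be assembled as follows: the assigned persona goes in the system message, the participant's own past turns become assistant messages, and everyone else's turns become user messages. The persona fields and helper name are illustrative.

```python
def build_messages(persona, transcript):
    """Frame a shared transcript from one AI participant's viewpoint."""
    messages = [{
        "role": "system",
        "content": (
            f"You are {persona['name']}, a {persona['age']}-year-old "
            f"{persona['occupation']} from {persona['nationality']}. "
            f"Personality: {persona['personality']}. "
            "You are taking part in a focus group discussion."
        ),
    }]
    for speaker, text in transcript:
        # Own past turns read as "assistant"; all other speakers as "user".
        role = "assistant" if speaker == persona["name"] else "user"
        messages.append({"role": role, "content": f"{speaker}: {text}"})
    return messages

persona = {"name": "Noah", "age": 29, "occupation": "designer",
           "nationality": "Malaysia", "personality": "free-spirited"}
transcript = [("Able", "How much time do you spend on screens?"),
              ("Noah", "About 4-5 hours a day.")]
msgs = build_messages(persona, transcript)
print([m["role"] for m in msgs])  # → ['system', 'user', 'assistant']
```

Re-framing the same transcript per participant lets a single LLM play every role while keeping each persona's perspective distinct.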
To simulate the focus group discussion as realistically as possible, we designed algorithms for both the moderator and the participants. The role of the moderator within the focus group simulation encompasses the critical responsibilities of guiding and orchestrating the discussion, including managing time allocation and steering the discourse topics. These responsibilities are reflected in the moderator’s thought chain, elucidated in Algorithm 1. We added a reflection mechanism at the end of every stage that compresses the context of the preceding discussion to avoid memory loss. Time allocation is managed based on text length, with a convention of one hundred words equating to approximately one minute within the simulation.
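The word-count convention can be sketched as a one-line estimator (the function name is illustrative, not taken from the released code):

```python
def estimate_minutes(text: str) -> float:
    """Approximate simulated speaking time: ~100 words per minute."""
    return len(text.split()) / 100.0

stage_transcript = "word " * 250  # 250 words spoken so far in this stage
print(estimate_minutes(stage_transcript))  # → 2.5
```

The moderator accumulates these estimates per stage and moves on once the stage's allotted time is exhausted.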
Algorithm 1 Moderator

    Input:  Stages, TimeArrangements
    Output: Response
    for all (stage, time_stage) in (Stages, TimeArrangements) do
        Response ← LLM(NewStagePrompt)
        time_cur ← Estimate(Response)
        while time_cur < time_stage do
            if a participant responds then
                Response ← ParticipantResponse
            else if any participant is inactive then
                Response ← LLM(InactiveParticipantPrompt)
            else
                Response ← LLM(InsightsPrompt)
            end if
            update time_cur according to Estimate(Response)
        end while
    end for
Algorithm 2 outlines the systematic approach adopted by each AI participant throughout the discussion, with their level of engagement assessed by the LLM. The LLM dynamically evaluates the ongoing conversation and the contributions of other AI participants to gauge engagement levels. AI participants are given the latitude to contribute to the discussion uninterrupted unless they surpass the stipulated time allocation. In instances where participants opt to disengage or cease offering novel ideas, signalling a lull in the discourse, the moderator intervenes by posing new questions drawn from the preceding discussion. In parallel, the moderator actively encourages less active participants to partake in the discourse. Participant activity is monitored by counting speaking turns within the ongoing stage. Participants who exhibit negligible speaking activity, or who speak less than one third as often as the most active participant, are categorised as inactive.
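The inactivity rule above can be sketched as follows; the counts, names, and the exact threshold interpretation are illustrative:

```python
def inactive_participants(speak_counts: dict) -> list:
    """Flag participants who barely spoke, or spoke less than a third
    as often as the most talkative participant in the current stage."""
    max_count = max(speak_counts.values())
    return [name for name, count in speak_counts.items()
            if count == 0 or count * 3 < max_count]

counts = {"Noah": 9, "Amelia": 2, "William": 4, "David": 0, "Caleb": 3}
print(inactive_participants(counts))  # → ['Amelia', 'David']
```

Flagged participants are then addressed directly by the moderator (the `InactiveParticipantPrompt` branch in Algorithm 1) to draw them back into the discussion.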
Algorithm 2 Participants

```
Require: Participants : List
Ensure: Response : Str
repeat
    engagements ← []
    for all participant in Participants do
        engagements.add(LLM(EngagementPrompt))
    end for
    if Max(engagements) ≥ Threshold then
        speaker ← Participants[Index(Max(engagements))]
        Response ← LLM(PartResponsePrompt(speaker))
    end if
until Finished
```
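The speaker-selection step of Algorithm 2 amounts to an argmax over engagement scores with a threshold gate. A minimal sketch, in which the LLM's engagement scoring is abstracted into an injected `score_engagement` function (the threshold value and scoring scale are illustrative assumptions):

```python
def pick_speaker(participants, score_engagement, threshold=5.0):
    """Score every participant's engagement and let the most engaged one
    speak, but only if their score clears the threshold; a None return
    signals a lull that the moderator should fill with a new question."""
    scores = [score_engagement(p) for p in participants]
    best = max(scores)
    if best >= threshold:
        return participants[scores.index(best)]
    return None
```

Returning `None` corresponds to the case where `Max(engagements) < Threshold`, i.e. no participant speaks and the moderator intervenes.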
### 3.2. Voice-based Focus Agent with human participants
To ensure the AI moderator can communicate with human participants efficiently, Speech-to-Text (S2T) and Text-to-Speech (T2S) components are necessary. APIs provided by various companies are suitable for many scenarios; however, they fall short of our specific needs for facilitating multi-participant focus group discussions, owing to limitations on the length of input recordings and the absence of speaker differentiation. To address these challenges, we developed our own S2T system, depicted in Figure 3. It processes long discussion audio by segmenting it into shorter, sentence-length pieces using Voice Activity Detection (VAD). It then identifies the most similar participant from a database of participant voiceprints and transcribes the audio using Whisper, the open-source S2T model by OpenAI (Radford et al., 2022). To ensure participants have ample opportunity to express their views without undue interruption, the AI moderator is programmed to intervene only after 5 seconds of silence, differing from approaches that might actively disrupt the conversational flow.
<details>
<summary>extracted/5830562/S2T.png Details</summary>

### Visual Description
Flow diagram of the S2T pipeline. Input audio passes through Voice Activity Detection; non-silent segments are sent to speech recognition, while 5 seconds of silence triggers the LLM moderator. Recognised sentence segments are matched against the participants' voiceprints stored in a voiceprint database (Speaker Retrieval), producing speaker-attributed transcripts (e.g. "Participant 1 Sentence 1", "Participant 2 Sentence 2") that are fed to the LLM.
</details>
Figure 3. Speech to Text system. We divided long audio recording into short pieces with voice activity detection. Then we transcribed the short audio pieces and recognised the speaker according to the voiceprints collected in advance from the participants.
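The speaker-retrieval step matches each segment's voiceprint against the enrolled participants. The paper does not specify the embedding model or similarity measure, so the sketch below assumes cosine similarity over pre-computed voiceprint vectors; `identify_speaker` and the enrolment dictionary are illustrative names, and the speaker-encoder that would produce the embeddings is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker(segment_vec, voiceprints):
    """Return the enrolled participant whose stored voiceprint is most
    similar to the segment's embedding."""
    return max(voiceprints, key=lambda name: cosine(segment_vec, voiceprints[name]))
```

Each transcribed sentence can then be labelled with the returned participant name before being assembled into the transcript.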
To incorporate T2S functionality into our system, we leveraged the Google TTS API (https://console.cloud.google.com/speech/text-to-speech). To accommodate participants interested in discussing within an immersive environment, we established the focus group setting in Mozilla Hubs (https://hubs.mozilla.com), a Virtual Reality (VR) platform.
## 4. Pilot Study
To enhance user experience, we conducted a pilot study with four volunteers before the main user study to assess the system’s stability and the AI moderator’s effectiveness. The pilot included a 50-minute focus group discussion and a 30-minute feedback session on the AI agent’s performance.
Feedback from the pilot highlighted areas for improvement, which were addressed to optimise the user study:
1. Handling Non-Responses: Human participants may not always have insights for every query, unlike AI participants, who consistently generate new content. We observed that the AI moderator might repeat questions when no responses came, leading to stagnation. We adjusted the AI moderator's protocol to move on if no further responses were forthcoming.
1. Anonymity in Summaries: Volunteers were uncomfortable with being mentioned by name in summaries. We revised the process to ensure participant anonymity, enhancing comfort levels.
1. Conciseness of Questions: The long responses generated by LLMs are not ideal for verbal interaction. We refined the prompts to yield shorter responses.
Additionally, we assessed the S2T system’s accuracy to ensure comprehensive transcription and understanding by the agent. The Word Error Rate (WER), a metric for gauging speech-to-text conversion accuracy, served as the evaluation metric; it is calculated as $WER=(S+D+I)/N$, where $S$ denotes substitutions, $D$ deletions, $I$ insertions, and $N$ the total number of words in the reference text. Professional human transcribers typically achieve a WER of 11.3% in open conversational settings (Xiong et al., 2016). Using this as a benchmark, we found our S2T system achieved a WER of 4.6%, demonstrating commendable accuracy. For speaker identification, the system achieved a micro F1 score of 0.81 on the EN2001 audio segment from the AMI Corpus (Carletta, 2006), highlighting its capability in recognising speakers. The pilot study indicated the agent exhibited no significant misunderstandings of the conversations.
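The WER formula above can be computed via a word-level Levenshtein alignment between the reference transcript and the S2T hypothesis; the combined edit count equals $S+D+I$. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,    # substitution / match
                           dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1)          # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` yields 1/6: one deleted word out of six reference words.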
## 5. User Study
To investigate our research questions, we designed a user study that involved human participants engaging in focus group discussions on the theme of “digital well-being,” alongside simulations of focus groups centred around the same topic. The objective of these sessions was to study individual practices in managing screen time and their perceptions of its impact on mental health. The choice of “digital well-being” as the focal topic was strategic, given its universal relevance, which facilitated participant recruitment. Participants had the option to join the focus groups either via a VR headset or through their personal computers, aiming for device consistency within groups to streamline the discussion dynamics, as shown in Figure 4.
Demographics. Our recruitment efforts yielded 23 participants: 11 joined with a VR headset and 12 with their own personal computers. The participant pool had an average age of 30 years ( $min=18$ , $max=60$ , $SD=10$ ), distributed across five groups: three with VR headsets and two with desktops. Each group comprised 3 to 6 individuals, ensuring a diverse range of perspectives and experiences. The total number of groups follows previous work (Guest et al., 2017), which demonstrated that five groups are optimal for focus group studies.
Procedure. The user study included three distinct components: a primary focus group involving human participants (hereafter “focus group”), a meta focus group where human participants convened to reflect on their experiences within the focus group (hereafter “meta focus group”), and a simulated focus group with AI entities as participants (hereafter “focus group simulation”).
First, participants submitted a one-minute self-introduction audio recording before the focus group. This recording collected demographic information (age, prior focus group experience, and daily screen usage) and provided a unique voiceprint for each participant. This data initialised the AI participants in the simulation. We assessed English proficiency based on the accuracy of the S2T results from their recordings. Participants then accessed the designated meeting rooms in Mozilla Hubs. For VR groups, our team provided VR headsets (Quest series or Vive Pro), while the desktop groups used their own PCs. Once all participants were ready, the researcher started the system, and the AI moderator began moderating the focus group. An author observed and recorded essential information throughout the sessions. The sessions were scheduled for 60 minutes, with an actual average duration of 51 minutes ( $SD=13$ minutes).
Following the conclusion of each focus group discussion, a meta focus group was conducted. This session spanned approximately 20 minutes and was facilitated by one of the authors. The meta focus group mainly focused on two points: the experience of the focus group discussion and attitudes towards the AI moderator.
At the end, each participant received a 10€ gift card as compensation. This study was reviewed and approved by the university’s ethics review board.
<details>
<summary>extracted/5830562/FocusAgent.png Details</summary>

### Visual Description
Screenshot of the virtual meeting room in Mozilla Hubs, viewed from a seated participant's perspective: a large conference table with office chairs, a robot-like avatar labelled "Moderator" standing near the doorway, and a participant's cartoon tiger avatar in the room. Sunlit windows and an "EXIT" sign complete the stylised 3D meeting-room scene.
</details>
Figure 4. Users participating in a focus group with the Focus Agent in the VR environment.
## 6. Result Analysis
Following the methodological framework proposed by Gerling et al. (Gerling et al., 2020), we employed both thematic and content analyses to scrutinise the transcripts derived from the focus group and focus group simulation sessions. Additionally, thematic analysis was specifically applied to the meta focus group discussions to collect participant feedback. For the transcription of data from the user study, we used the outputs of our S2T system, which two researchers subsequently refined against the recorded audio. The final evaluation of our S2T system showed a WER of 2.5% and an F1 score of 0.9, a level of performance sufficiently reliable for the purposes of our study. Due to recording issues, the data from the third focus group session was incomplete; its transcription was reconstructed from the recollections and notes of an observer and was consequently excluded from the accuracy assessment of the S2T system. The initial analysis was conducted by the lead author, with the findings subsequently reviewed and validated by the co-authors.
### 6.1. Focus Group
In the thematic analysis conducted on the transcriptions from both the human focus group and the focus group simulations, we elicited distinct themes related to our study topic. From the focus group transcriptions, we identified four central themes. In contrast, the focus group simulations revealed five themes, including an additional theme on the challenges of controlling screen time. This discrepancy mainly stemmed from differences in moderation performance between the two settings: in the simulations, the AI moderator tended to guide AI participants to engage more deeply with the topics. While human participants in the focus group did broach additional topics, these were less related to the central theme of discussion, highlighting a contrast in how thematic expansion was handled across the two settings.
<details>
<summary>extracted/5830562/Content_Analysis.png Details</summary>

### Visual Description
Four Venn diagrams comparing codes raised by AI participants (green), human participants (pink), and both (beige overlap):

* **Daily Screen Usage** — Human only: Navigate, Take Photos, Shop, Social Media; AI only: Volunteer, Work; Both: Study, Read E-books, Watch Videos, Keep Relationships, Play Games, Update News.
* **Impact of Long Screen Time** — Human only: Addicted, Stressed, Comparison, FOMO, anxiety or depression from negative social media content, Increased Sense of Belonging, Insomnia; AI only: Insecure, Impaired Focus, Reduced Face-to-Face Interaction, Limited Creativity; Both: Overwhelmed, Self-Doubt, Relaxed, Guilty.
* **Strategies to Balance** — Human only: Go for a Walk, Setting Boundaries, Digital Declutter, Offline Hobbies, Screen-Time Reminders, Screen Blocking, Screen-Limiting Apps; AI only: "Screen-Free" Days, Mindfulness, Digital Minimalism, Technology-Free Zones, Support System with Friends; Both: Regular Exercise, Goals for Online Activities, Schedule a Fulfilling Life, Self-Reflection.
* **Effect After Balance** — Human only: Stressful; AI only: Presence, Fulfilled, Relaxed; Both: Anxiety, Isolation.
</details>
Figure 5. Content analysis by theme for both the focus groups and the focus group simulations; font size indicates the frequency of the codes.
In our content analysis, we derived 39 unique codes from the focus group transcriptions and 47 from the focus group simulations, each reflecting different facets of the discussion topic. To compare the perspectives of AI and human participants, we illustrated the overlap and divergence of these codes with a Venn diagram, shown in Figure 5. The analysis revealed that the majority of opinions expressed by human participants were also covered by AI participants. Interestingly, AI participants introduced several viewpoints not raised by their human counterparts, such as volunteering online during daily screen usage, adding further dimensions to the discussion. Another observation is that AI participants tended to repeat similar opinions across different sessions more often than human participants did. The data in Figure 6 reveal a discrepancy in code generation between the simulations and the focus groups: each additional focus group contributed a similar number of new codes, whereas the simulations tended to repeat identical codes at a higher rate. After several iterations, the aggregate of unique codes converges, suggesting that the most common opinions have been collected. At this point, AI participants could not generate new codes, whereas human participants continued to demonstrate potential for such creativity. This observation underscores the tendency of AI to produce more common opinions, while human participants display greater variance and individuality in their perspectives.
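The code-overlap comparison underlying the Venn diagram can be expressed as plain set arithmetic. A minimal sketch; the code labels below are hypothetical placeholders, not entries from the study's actual codebook:

```python
# Compare qualitative codes from human focus groups and AI simulations by
# computing shared and exclusive sets (the three regions of a two-set Venn
# diagram). Labels are illustrative placeholders only.
human_codes = {"family time", "eye strain", "doomscrolling", "outdoor hobbies"}
ai_codes = {"family time", "eye strain", "online volunteering", "outdoor hobbies"}

shared = human_codes & ai_codes        # opinions expressed by both groups
human_only = human_codes - ai_codes    # uniquely human viewpoints
ai_only = ai_codes - human_codes       # viewpoints only the AI raised

# Fraction of human opinions also covered by AI participants.
coverage = len(shared) / len(human_codes)
print(sorted(shared), sorted(human_only), sorted(ai_only), coverage)
```

With these placeholder sets, three of the four human codes are covered by the AI, giving a coverage of 0.75.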
<details>
<summary>extracted/5830562/CodeNumberChangeWithGroup.png Details</summary>

Bar chart titled "Code Number Increase with Groups": cumulative unique codes (y-axis, 0 to 50) against number of focus groups (x-axis, 1 to 5), with blue bars for Human and orange bars for AI. Approximate values per group: Human 12, 23, 35, 37, 39; AI 38, 45, 46, 47, 47. The AI counts plateau by the third group, while the human counts continue to rise.
</details>
Figure 6. Growth in the number of unique codes over successive rounds of the focus groups and the focus group simulations.
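The saturation trend shown in Figure 6 amounts to tracking the cumulative count of distinct codes as sessions are added. A minimal sketch, with made-up per-session code lists in which later sessions contribute fewer new codes:

```python
def cumulative_unique(sessions):
    """Cumulative count of distinct codes after each successive session."""
    seen, counts = set(), []
    for codes in sessions:
        seen.update(codes)
        counts.append(len(seen))
    return counts

# Hypothetical per-session codes: the curve flattens once new sessions
# stop contributing codes not seen before (saturation).
sessions = [["a", "b", "c"], ["b", "c", "d"], ["c", "d", "e"], ["d", "e"], ["e"]]
print(cumulative_unique(sessions))  # [3, 4, 5, 5, 5]
```

A flat tail in this curve is the signal that the most common opinions have been collected, which is the convergence behaviour reported for the simulations.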
### 6.2. Meta Focus Group
From the transcriptions of the meta focus group, we coded 51 data points and identified three main themes: 1) User Experiences of the Virtual Focus Group; 2) User Attitudes towards the Focus Agent, further divided into two sub-themes, a) Positive Attitudes and b) Negative Attitudes; and 3) Feedback on the Virtual Focus Group System.
#### Theme 1: User Experiences of the Virtual Focus Group.
A majority of participants conveyed satisfaction with the focus group discussions, highlighting several reasons. For many, the topics discussed were directly relevant to their daily lives, adding value to their participation. As one participant explained, “I think it’s great to discuss these topics because that’s what we deal with every day.” (G5, P4). Furthermore, participants appreciated the diversity of perspectives present, valuing the opportunity to exchange experiences. As one put it, “I think you bring up so many great points. It’s very enriching to hear different perspectives.” (G5, P3). At the end of the discussions, the moderator asked whether participants had any additional opinions on the topic that they had not had the opportunity to express during the session. All participants confirmed that they had no further insights to share, indicating that the discussions had comprehensively covered the topic from their perspectives.
#### Theme 2: Attitudes towards the Focus Agent.
The second theme encapsulates the users’ feedback on and experiences with the Focus Agent. It is divided into positive and negative categories to provide a clearer understanding of the users’ attitudes towards the Focus Agent.
SubTheme 1: Positive attitude. A prevalent sentiment among participants was appreciation for the guidance offered by the Focus Agent, acknowledging its efficacy in steering the discussions. As one participant remarked, “The moderator kind of did a good job by posing questions that allowed us to express our thoughts and encouraged other participants to share their sentiments on the topic.” (G4, P1). Furthermore, three participants specifically commended the Focus Agent’s clear articulation in English, while another admired the agent’s friendly demeanour.
SubTheme 2: Negative attitude. The prevailing sentiment among participants leaned towards dissatisfaction with the Focus Agent’s performance. A recurring concern was the repetition of questions, as one participant articulated, “I found it somewhat confusing at times since the moderator repeated the questions several times, which we had already discussed” (G1, P2). Another noteworthy issue was the perceived lack of intellectual acumen exhibited by the Focus Agent during discussions. For example, one participant expressed, “I don’t believe it possesses true intelligence, nor does it seem capable of comprehending all the information we’ve conveyed, let alone guiding us into more profound and coherent discussions” (G5, P2). Finally, some biases were identified in the discussion, notably in steering participants towards articulating the adverse effects associated with prolonged screen use: “When discussing the impact of long screen using time, I felt that the AI moderator tried to demonise the technology.” (G1, P1).
#### Theme 3: Feedback on the Virtual Focus Group System.
The third theme encapsulates system issues encountered while using the Focus Agent. A concern raised by some participants was the insufficient time allocated for responding to questions, resulting in interruptions by the agent. As one participant put it, “There were instances where we were attempting to respond to a question or had just commenced our response when the moderator interrupted us and swiftly moved on to the next question” (G3, P3). Furthermore, two participants recommended incorporating subtitles to aid their understanding of the questions posed.
## 7. Discussion
In this discussion, we address the RQs through our findings and expand on the underlying reasons informed by our analysis.
### 7.1. RQ1: To what extent do the opinions generated by an LLM align with those of human participants in focus groups?
The content analysis of the focus group discussions revealed that AI-generated opinions tend to encompass a wide array of human perspectives on the designated topic. Nevertheless, these opinions often reflected the more common viewpoints, lacking the uniqueness frequently found in human responses. A possible explanation is that, unlike human participants, who dynamically build upon previous contributions and enrich discussions with personal experiences, AI participants largely offered plausible experiences that might happen to people rather than accounts of lived experience.
This observation suggests that LLMs could serve as a tool for researchers aiming to streamline the focus group process with human participants. By deploying a Focus Agent, researchers could first gather a broad spectrum of common opinions on a specific topic, establishing a baseline understanding of the responses to expect. This could in turn help refine the focus group’s questions and topics, making the discussion more targeted and efficient. Fewer human focus group sessions might then be required to confirm the AI-generated content and identify novel insights from participants, optimising the research process while still uncovering the unique, creative perspectives that only human participants can provide. Human participants nevertheless remain necessary to ensure that the collected data are reliable.
### 7.2. RQ2: To what extent is an LLM effective in performing the duties of a moderator in focus group discussions?
During the focus group simulations, LLMs demonstrated sufficient knowledge to facilitate the group and engage with AI participants effectively. Feedback from the meta focus group indicated that human participants acknowledged the AI moderator’s capability to support the discussion, while perceiving it as a tool rather than a sentient interlocutor. This perception was attributed to the AI moderator’s lack of apparent intelligence in interactions, such as overlooking participant requests, posing repetitive questions, or failing to grasp the hints behind conversations.
The primary challenges are rooted in the inherent limitations of LLMs in navigating multi-person dialogues. To respond appropriately, an LLM must comprehend inputs from human participants, reason through the conversation’s context, and formulate accurate responses. While their reasoning capacity seemed adequate during simulations, issues predominantly arose in the understanding and response-generation phases. Existing research has begun to address LLMs’ comprehension issues when assisting humans, yet their effectiveness in multi-participant discussions remains constrained (Dong et al., 2023). From an understanding standpoint, discussions among human participants often involve colloquial language and incomplete sentences, differing markedly from the more structured exchanges with AI, which made it difficult for the AI moderator to recognise whether questions had already been answered. Consequently, the AI moderator might repetitively address the same points rather than progressing the discussion. Additionally, challenges in generating aligned and unbiased content persist within LLM outputs (Wang et al., 2023b; Taubenfeld et al., 2024). Although not the primary focus of our study, participants noted issues such as deviation from guidelines or biases in the discussion (see subsection 6.2), underscoring the LLMs’ limitations in mimicking human conversational norms accurately.
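Even a crude lexical check before re-asking a question would catch some of the repetition participants complained about. A minimal sketch using word-level Jaccard overlap; this is a stand-in illustration, not the comprehension step any real moderator system would use:

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two utterances."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def is_repeat(candidate: str, asked: list[str], threshold: float = 0.6) -> bool:
    """Flag a candidate question as a near-duplicate of one already asked."""
    return any(jaccard(candidate, q) >= threshold for q in asked)

asked = ["how does long screen time affect your sleep"]
print(is_repeat("how does long screen time affect your sleep quality", asked))  # True
print(is_repeat("what strategies help you balance screen use", asked))          # False
```

Surface overlap alone would still miss paraphrases, which is precisely where an LLM's semantic understanding should help but, as observed above, currently falls short in colloquial multi-party talk.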
Given these observations, we advise against deploying the Focus Agent as the sole moderator in focus group discussions due to the current inadequacies in human-AI communication. Instead, the AI-generated summaries and questions could be utilised by human moderators to streamline the discussion flow and address specific topics. For more in-depth discussions, the presence of a human moderator is essential to ensure a positive user experience and foster the generation of innovative insights, highlighting the complementary roles of AI and human moderators in enhancing the efficacy of focus groups.
### 7.3. Improvement of Focus Agent
Based on the process of the user study and the insights gathered during the meta focus group, several areas for enhancing the structure and functionality of the Focus Agent have been identified:
1. Design of the thought chain: Although the thought chain in our work proved capable of facilitating focus group discussions, steering the discussion towards deeper topics would require a more complex design, such as the tree of thoughts (Yao et al., 2023).
2. Subtitles for the Focus Agent’s speech: Participants suggested that the Focus Agent’s speech could be too long for them to comprehend a question in its entirety; subtitles would help in this case.
3. Time schedule of the Focus Agent: time allocation followed a pre-determined duration for each question. Instead, time should be allocated according to the participants’ engagement in the current discussion, with the Focus Agent making dynamic time allocations based on the flow of the discussion.
## 8. Limitation and Future Work
Our investigation underscores several limitations that pave the way for future research directions.
First, the current iteration of the Focus Agent is limited to text-based interactions, differing significantly from the multi-modal nature of human moderation. Human moderators use non-verbal cues and physical context to tailor their approach, which text-only agents cannot replicate. This limitation is particularly challenging in settings involving tactile or visual elements. However, advancements in sophisticated LLMs like GPT-4, which understand multi-modal data (OpenAI, 2023), could evolve the Focus Agent into a more versatile, multi-modal platform that closely simulates human discussions.
Second, our study centres on LLM application within focus groups, overlooking broader quantitative and qualitative research methodologies. Prior studies have used LLMs to generate reviews or comments (Liang et al., 2023; Chuang et al., 2023), noting that LLM-generated opinions may lack human creativity (Bender et al., 2021). Ensuring the validity of these insights requires extensive empirical validation.
Lastly, our analysis highlights the difficulties LLMs face in multi-participant discussions. While there is some research on one-on-one dialogues and all-AI discussions (Abbasiantaeb et al., 2024), studies on mixed human-AI communication in group settings are scarce. The ability of LLMs to engage in multi-human conversations is crucial for advancing human-AI interaction. Future research should explore how human participants adjust their communication strategies in the presence of AI, aiming to optimise these interactions for better collaborative outcomes.
## 9. Conclusion
Our research introduced the Focus Agent, a novel AI simulation system developed to simulate focus group discussions through dialogue among AI agents. The system aims to gather insights akin to those derived from traditional focus groups, leveraging AI participants to generate discussions on designated topics. To assess the degree of alignment between the viewpoints expressed by AI and human participants, we ran a user study that employed an AI moderator to facilitate discussions among human participants. Our analysis showed that the Focus Agent produces opinions similar to those of human participants. Additionally, we studied human participants’ perceptions of the AI moderator and found that while the AI could fulfil the functional role of a moderator, differences remained in the interaction experience compared to engagement with human moderators. We examined the underlying reasons and identified specific areas within the large language model’s capabilities that require further enhancement.
## References
- Abbasiantaeb et al. (2024) Zahra Abbasiantaeb, Yifei Yuan, Evangelos Kanoulas, and Mohammad Aliannejadi. 2024. Let the llms talk: Simulating human-to-human conversational qa via zero-shot llm-to-llm interactions. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining. 8–17.
- Bain et al. (2023) Max Bain, Jaesung Huh, Tengda Han, and Andrew Zisserman. 2023. WhisperX: Time-Accurate Speech Transcription of Long-Form Audio. INTERSPEECH 2023 (2023).
- Bender et al. (2021) Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big?. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 610–623.
- Biedermann (2018) Narelle Biedermann. 2018. The use of Facebook for virtual asynchronous focus groups in qualitative research. Contemporary nurse 54, 1 (2018), 26–34.
- Brown et al. (2020) Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
- Brüggen and Willems (2009) Elisabeth Brüggen and Pieter Willems. 2009. A critical comparison of offline focus groups, online focus groups and e-Delphi. International Journal of Market Research 51, 3 (2009), 1–15.
- Carletta (2006) Jean Carletta. 2006. Announcing the AMI meeting corpus. The ELRA Newsletter 11, 1 (2006), 3–5.
- Chen and Neo (2019) Julienne Chen and Pearlyn Neo. 2019. Texting the waters: An assessment of focus groups conducted via the WhatsApp smartphone messaging application. Methodological Innovations 12, 3 (2019), 2059799119884276.
- Chuang et al. (2023) Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, and Timothy T Rogers. 2023. Simulating Opinion Dynamics with Networks of LLM-based Agents. arXiv preprint arXiv:2311.09618 (2023).
- Ciaccio (2023) Edward J Ciaccio. 2023. Use of artificial intelligence in scientific paper writing. Article 101253.
- Contributors (2023) OpenCompass Contributors. 2023. OpenCompass: A Universal Evaluation Platform for Foundation Models. https://github.com/InternLM/OpenCompass.
- Daniels et al. (2019) Nicola Daniels, Patricia Gillen, Karen Casson, and Iseult Wilson. 2019. STEER: Factors to consider when designing online focus groups using audiovisual technology in health research. International Journal of Qualitative Methods 18 (2019), 1609406919885786.
- Desplanques et al. (2020) Brecht Desplanques, Jenthe Thienpondt, and Kris Demuynck. 2020. Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. arXiv preprint arXiv:2005.07143 (2020).
- Dong et al. (2023) Xin Luna Dong, Seungwhan Moon, Yifan Ethan Xu, Kshitiz Malik, and Zhou Yu. 2023. Towards next-generation intelligent assistants leveraging llm techniques. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5792–5793.
- Gerling et al. (2020) Kathrin Gerling, Patrick Dickinson, Kieran Hicks, Liam Mason, Adalberto L Simeone, and Katta Spiel. 2020. Virtual reality games for people using wheelchairs. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–11.
- Gordon et al. (2021) Allegra R Gordon, Jerel P Calzo, Rose Eiduson, Kendall Sharp, Scout Silverstein, Ethan Lopez, Katharine Thomson, and Sari L Reisner. 2021. Asynchronous online focus groups for health research: case study and lessons learned. International journal of qualitative methods 20 (2021), 1609406921990489.
- Gratton and O’Donnell (2011) Marie-France Gratton and Susan O’Donnell. 2011. Communication technologies for focus groups with remote communities: a case study of research with First Nations in Canada. Qualitative Research 11, 2 (2011), 159–175.
- Guest et al. (2017) Greg Guest, Emily Namey, and Kevin McKenna. 2017. How many focus groups are enough? Building an evidence base for nonprobability sample sizes. Field methods 29, 1 (2017), 3–22.
- Haupt and Marks (2023) Claudia E Haupt and Mason Marks. 2023. AI-generated medical advice—GPT and beyond. Jama 329, 16 (2023), 1349–1350.
- Horiguchi et al. (2020) Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Kenji Nagamatsu. 2020. End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors. arXiv preprint arXiv:2005.09921 (2020).
- Horiguchi et al. (2021) Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu. 2021. End-to-end speaker diarization as post-processing. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 7188–7192.
- Jokinen and McTear (2022) Kristina Jokinen and Michael McTear. 2022. Spoken dialogue systems. Springer Nature.
- Katar et al. (2023) Oğuzhan Katar, Dilek ÖZKAN, Özal YILDIRIM, U Rajendra Acharya, et al. 2023. Evaluation of GPT-3 AI language model in research paper writing. Turkish Journal of Science and Technology 18, 2 (2023), 311–318.
- Katz et al. (2024) Daniel Martin Katz, Michael James Bommarito, Shang Gao, and Pablo Arredondo. 2024. Gpt-4 passes the bar exam. Philosophical Transactions of the Royal Society A 382, 2270 (2024), 20230254.
- Keen et al. (2022) Sam Keen, Martha Lomeli-Rodriguez, and Helene Joffe. 2022. From challenge to opportunity: virtual qualitative research during COVID-19 and beyond. International Journal of Qualitative Methods 21 (2022), 16094069221105075.
- Kitzinger (1994) Jenny Kitzinger. 1994. The methodology of focus groups: the importance of interaction between research participants. Sociology of health & illness 16, 1 (1994), 103–121.
- Kitzinger (1995) Jenny Kitzinger. 1995. Qualitative research: introducing focus groups. Bmj 311, 7000 (1995), 299–302.
- Koubaa (2023) Anis Koubaa. 2023. GPT-4 vs. GPT-3.5: A concise showdown. (2023).
- Liang et al. (2023) Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, et al. 2023. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. arXiv preprint arXiv:2310.01783 (2023).
- Mazza (2006) Riccardo Mazza. 2006. Evaluating information visualization applications with focus groups: the CourseVis experience. In Proceedings of the 2006 AVI workshop on BEyond time and errors: novel evaluation methods for information visualization. 1–6.
- Medennikov et al. (2020) Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, et al. 2020. Target-speaker voice activity detection: a novel approach for multi-speaker diarization in a dinner party scenario. arXiv preprint arXiv:2005.07272 (2020).
- Nagle and Williams (2013) Barry Nagle and Nichelle Williams. 2013. Methodology brief: Introduction to focus groups. Center for Assessment, Planning and Accountability 1-12 (2013).
- Nay et al. (2024) John J Nay, David Karamardian, Sarah B Lawsky, Wenting Tao, Meghana Bhat, Raghav Jain, Aaron Travis Lee, Jonathan H Choi, and Jungo Kasai. 2024. Large language models as tax attorneys: a case study in legal capabilities emergence. Philosophical Transactions of the Royal Society A 382, 2270 (2024), 20230159.
- Nicholas et al. (2010) David B Nicholas, Lucy Lach, Gillian King, Marjorie Scott, Katherine Boydell, Bonita J Sawatzky, Joe Reisman, Erika Schippel, and Nancy L Young. 2010. Contrasting internet and face-to-face focus groups for children with chronic health conditions: Outcomes and participant experiences. International Journal of Qualitative Methods 9, 1 (2010), 105–121.
- OpenAI (2023) OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- Park et al. (2023) Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442 (2023).
- Radford et al. (2022) Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv:2212.04356 [eess.AS]
- Reynolds and McDonell (2021) Laria Reynolds and Kyle McDonell. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. 1–7.
- Richard et al. (2021) Brendan Richard, Stephen A Sivo, Robert C Ford, Jamie Murphy, David N Boote, Eleanor Witta, and Marissa Orlowski. 2021. A guide to conducting online focus groups via Reddit. International journal of qualitative methods 20 (2021), 16094069211012217.
- Rosenbaum et al. (2002) Stephanie Rosenbaum, Gilbert Cockton, Kara Coyne, Michael Muller, and Thyra Rauch. 2002. Focus groups in HCI: wealth of information or waste of resources?. In CHI’02 extended abstracts on human factors in computing systems. 702–703.
- Rough and Cowan (2020) Daniel Rough and Benjamin Cowan. 2020. Don’t Believe The Hype! White Lies of Conversational User Interface Creation Tools. In Proceedings of the 2nd Conference on Conversational User Interfaces. 1–3.
- Sallam (2023) Malik Sallam. 2023. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. In Healthcare, Vol. 11. MDPI, 887.
- Selter et al. (2023) Felicitas Selter, Kirsten Persson, Peter Kunzmann, and Gerald Neitzke. 2023. End-of-life decisions: A focus group study with German health professionals from human and veterinary medicine. Frontiers in Veterinary Science 10 (2023), 1044561.
- Shanahan et al. (2023) Murray Shanahan, Kyle McDonell, and Laria Reynolds. 2023. Role play with large language models. Nature 623, 7987 (2023), 493–498.
- Stalmeijer et al. (2014) Renée E Stalmeijer, Nancy McNaughton, and Walther NKA Van Mook. 2014. Using focus groups in medical education research: AMEE Guide No. 91. Medical teacher 36, 11 (2014), 923–939.
- Stewart and Shamdasani (2017) David W Stewart and Prem Shamdasani. 2017. Online focus groups. Journal of Advertising 46, 1 (2017), 48–60.
- Stewart and Shamdasani (2014) David W Stewart and Prem N Shamdasani. 2014. Focus groups: Theory and practice. Vol. 20. Sage publications.
- Talebirad and Nadiri (2023) Yashar Talebirad and Amirhossein Nadiri. 2023. Multi-agent collaboration: Harnessing the power of intelligent llm agents. arXiv preprint arXiv:2306.03314 (2023).
- Taubenfeld et al. (2024) Amir Taubenfeld, Yaniv Dover, Roi Reichart, and Ariel Goldstein. 2024. Systematic biases in LLM simulations of debates. arXiv preprint arXiv:2402.04049 (2024).
- Troshani et al. (2021) Indrit Troshani, Sally Rao Hill, Claire Sherman, and Damien Arthur. 2021. Do we trust in AI? Role of anthropomorphism and intelligence. Journal of Computer Information Systems 61, 5 (2021), 481–491.
- Turney and Pocknee (2005) Lyn Turney and Catherine Pocknee. 2005. Virtual focus groups: New frontiers in research. International Journal of Qualitative Methods 4, 2 (2005), 32–43.
- Veloso (2020) Braian Veloso. 2020. WhatsApp como ferramenta para a organização de grupos focais online na pesquisa da educação: Um relato de experiência [WhatsApp as a tool for organising online focus groups in education research: An experience report]. In Anais do CIET: EnPED: 2020 (Congresso Internacional de Educação e Tecnologias, Encontro de Pesquisadores em Educação a Distância).
- Wang et al. (2023a) Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2023a. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432 (2023).
- Wang et al. (2023b) Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, and Qun Liu. 2023b. Aligning large language models with human: A survey. arXiv preprint arXiv:2307.12966 (2023).
- Wenzek et al. (2019) Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, and Edouard Grave. 2019. CCNet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359 (2019).
- Wilkerson et al. (2014) J Michael Wilkerson, Alex Iantaffi, Jeremy A Grey, Walter O Bockting, and BR Simon Rosser. 2014. Recommendations for internet-based qualitative health research with hard-to-reach populations. Qualitative health research 24, 4 (2014), 561–574.
- Wirtz et al. (2019) Andrea L Wirtz, Erin E Cooney, Aeysha Chaudhry, Sari L Reisner, and American Cohort To Study HIV Acquisition Among Transgender Women. 2019. Computer-mediated communication to facilitate synchronous online focus group discussions: feasibility study for qualitative HIV research among transgender women across the United States. Journal of medical Internet research 21, 3 (2019), e12569.
- Xiong et al. (2016) Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. 2016. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256 (2016).
- Xu et al. (2023) Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, and Yang Liu. 2023. Exploring large language models for communication games: An empirical study on werewolf. arXiv preprint arXiv:2309.04658 (2023).
- Xu et al. (2024) Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli. 2024. Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817 (2024).
- Yan et al. (2024) Hanqi Yan, Qinglin Zhu, Xinyu Wang, Lin Gui, and Yulan He. 2024. Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning. arXiv preprint arXiv:2402.14963 (2024).
- Yao et al. (2023) Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601 (2023).
- Zhu et al. (2023) Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, and Ji-Rong Wen. 2023. Large language models for information retrieval: A survey. arXiv preprint arXiv:2308.07107 (2023).