DIGITAL DOPPELGANGER

Studies in Adversarial AI
Prof. Dr. Emily Cross • Dr. Manuel Hendry • Dr. Andrea Orlandi
ETH Zurich • Zurich University of the Arts
Funded by the BIAL Foundation • May 21, 2025

1. Summary

Our daily lives are becoming increasingly infused with, and transformed by, artificial intelligence (AI). While experts predict few aspects of human life in industrialised societies will remain untouched by AI in the coming years, one area of AI development that is causing great excitement, while also raising serious concerns, is AI-imbued social agents (1). Some of these agents are already quite familiar – think of asking Alexa to check the weather or Siri to set a timer while cooking.

With the advent of large language models like ChatGPT, such systems have increased both in power and in scope. To maintain their servant-like character, the underlying neural networks are aligned to be "Helpful, Honest, and Harmless" (2). However, this alignment can have negative consequences, such as an inclination to produce responses that confirm a user's misconstrued beliefs (3). Furthermore, the resulting interactions are often perceived as dull and lifeless, lowering the propensity for engagement (4). Domains such as education, coaching, psychodrama, art and even investment consulting have been demonstrated to profit from a style of interaction enriched with adversarial engagement (4). This includes, but is not limited to, providing virtual agents with capabilities to express emotion via both the voice and the face.

We aim to unravel the behavioral, physiological, and neural responses to such emotionally charged human—AI encounters to more fully characterise the scope and limits of imbuing AI agents with human-like traits. Our goal is to advance development of more relatable AI characters for responsible use to the benefit of their users. Our methodology combines behavioral measures, peripheral physiological recordings (heart rate and eye tracking), and brain scanning (functional magnetic resonance imaging; fMRI) while participants interact with (or observe) an embodied AI performance art device named "Stanley". This apparatus, developed by one of the current proposal's investigators, consists of a physical face animated with 3D-projection. It can track its interlocutor in the room and analyse their affective state to adjust its own replies in terms of voice and facial expressions (5).

Across a series of three complementary studies, we first assess participants' reactions during an artistic performance featuring Stanley, where small groups interact with our AI responding according to a predetermined emotional script (Study 1). Eye behaviors (fixations, gaze direction and pupil dilation) and heart rate of participants are recorded and analyzed individually (e.g., heart rate modulation) and at the group level (e.g., heart rate synchrony) to understand how the audience perceives salient moments of the interaction. These moments include elements of surprise, such as unexpected emotional replies from Stanley or the appearance of a deepfake of a participant. We expect increased synchronization between physiological signals of participants attending simultaneously, indicating heightened arousal and attention driven by emotionally charged moments. Questionnaires will help to further quantify engagement with Stanley by assessing participants' perceptions of its socialness, likability, and intelligence.

In Study 2, we increase experimental control by reducing Stanley's degrees of freedom using a 2x2 factorial design in a pseudo-laboratory setup (this study will be run in a blackbox studio theatre). Here we use an automatic imitation task (24) to obtain indirect measures of social processing and perception of Stanley, alongside facial and vocal analysis to gauge emotional reactions. By presenting emotional and unemotional versions of Stanley, we can assess the role of emotional tone in shaping human—AI interactions. Participants engage with Stanley in two forms: embodied (projection onto a 3D mask) and as a projection on a large screen to compare 2D vs. 3D interactions.

Study 3 uses fMRI to explore brain networks involved in person recognition (occipitotemporal areas) and emotion processing (frontotemporal and limbic areas) during Human and AI encounters. We assess whether similar neurocognitive correlates underlie human and AI face perception and seek to disentangle the importance of visual and auditory channels in emotion perception. Participants watch videos of Stanley and a human actor making supportive or combative statements, with emotional reactions conveyed through facial and/or vocal expressions.

Overall, the project aims to deliver evidence of the (neuro)physiological impacts and behavioral consequences of AI's human-like emotional displays. As emotionally sophisticated AI agents increasingly permeate our lives, this project is well positioned to explore the mechanisms and consequences of our social engagement with these agents.

2. Technical Description

2.1 Literature Review

The increasing integration of artificial agents, including robots and virtual agents, into our social world has led to growing interest in their ability to display human-like emotional expressions and the consequences of such emotional displays for human interactants (6). Research in this area falls under the domain of affective computing, a field that combines insights from psychology, computing, cognitive science, and engineering to create systems that recognize, interpret, and simulate human emotions. The goal of this field is to enhance machines with emotional intelligence, although achieving this remains a distant challenge.

Faces play a crucial role in emotional communication (7), prompting the incorporation of dynamic facial expressions in artificial agents (6). Early studies demonstrated that humans can recognize basic emotions, such as happiness, sadness, and anger, displayed by robots (6), while emotions like fear and disgust remain harder to identify (8). Research has further shown that even simple emotional displays by artificial agents can influence human behavior. For example, in online gaming, virtual agents' expressions of joy after cooperation or guilt after selfish actions can encourage cooperative behavior in humans (9).

Exploring how humans perceive emotion in artificial agents is further informed by examining brain responses and comparing neural signatures associated with perceiving emotions expressed by humans versus artificial agents. Neuroimaging studies are characterizing the extent to which brain regions involved in emotional processing for human expressions, such as the amygdalae, fusiform gyri, superior temporal gyri, and medial prefrontal cortices, are also engaged when encountering emotional expressions displayed by artificial agents (10, 11). Results are mixed. One study showed decreased amygdala activity when viewing robotic compared to real human facial expressions (10), while another showed no significant differences in amygdala engagement when observing emotional displays by virtual agents compared to humans (12). The fusiform face area has been shown to respond more robustly to emotional human faces compared to virtual faces (12), while the opposite pattern has been reported for human versus robotic faces (10, 11).

The physical presence of robotic agents also plays a significant role in emotion perception (6). Participants express greater empathy towards physically embodied robots compared to disembodied ones (13), and a rich literature documents the superiority of embodied compared to screen-based artificial agents across a range of communication and collaboration tasks (14). While brain-based insights into the benefits of embodiment are more difficult to orchestrate with fMRI (at least from a first-person, in situ perspective), evidence from our own team suggests that experience with embodied artificial agents in the real world leads to engagement of brain regions associated with social perception when these same agents are encountered during fMRI (e.g., 15).

The variability in recognition accuracy and neural responsiveness when engaging with emotional displays by artificial agents likely depends on the specifics of the emotion (valence, intensity, etc.), and further research is needed to understand the functional implications of these neurocognitive patterns. Moreover, the imaging studies reviewed here reveal how complex and dynamic the human brain's response is to artificial displays of emotion. These findings further underscore the challenges in generalization, as results vary depending on the features of the agent (1, 15). Additionally, the public's familiarity with, and expectations of, artificial agents are rapidly changing and becoming more sophisticated (along with the agents themselves). This means research findings from 15 years ago are valuable as a point of departure, but perceptions and expectations about these agents will continue to evolve (1). Together, these factors highlight the need for new research in this domain to provide fresh insights into the neural underpinnings of encounters with social AI agents.

Turning our attention to insights yielded from computer science research, this domain has long sought to optimise development of believable and engaging humanoid agents. To this end, a considerable literature explores autonomous facial animation, robotic movement, and theater-based behavior (16). Machines have been used for scripted and improvisational theater, and many expressive humanoid robots have been developed. It has been suggested that creating believable virtual agents should draw from artistic fields such as literature, theater, film, radio drama, illustration, and animation (17). In these fields, believability has long been studied to enable the audience's suspension of disbelief. Accordingly, the interactive avatar "Stanley" (5), used in the current proposal, was developed using knowledge and techniques from realist acting, providing a framework of beliefs and desires as an inner monologue according to Konstantin Stanislavski's acting system. Stanley's application programming interface (API) allows granular control over its behavior in facial expression and spoken word, making it an ideal platform for neurophysiological research under controlled conditions.

The current pace of technological innovation often obscures the sociological and philosophical questions about our interactions with virtual humans. Early work on chatbots helps frame these questions: Turkle's observations (18) on anthropomorphization mechanisms remain valid today, guiding better implementation and countering overly ambitious claims about sentience and singularity (19). The current proposal suggests that by merging the particular strengths of neuroscientific and artistic methods of investigation, a more holistic vision of creating and augmenting human—computer interactions can emerge.

2.2 Research Plan and Methods

The project examines psychophysiological aspects of Human—AI interactions, especially contrarian exchanges, using an embodied AI agent ("Stanley"). We integrate cognitive neuroscience, AI, and art to explore how emotionally charged AI interactions affect brain, behavior, and physiology. Studies progress from a live theater context to a controlled blackbox theater, to a neuroimaging lab, allowing us to transition from complex, ecologically valid scenarios to more controlled settings, while deepening insights from behavior to peripheral physiology and advanced neuroscientific methods.

Study 1: Friendly Fire at the Shrink

Research Questions: Here we examine audience engagement with the antagonistic AI agent Stanley during an artistic performance staged publicly in Zurich in March 2025. We investigate the physical traces of salient interaction moments, including elements of surprise (e.g., a highly charged emotional reply from Stanley) and the appearance of a deepfake of the person interacting with Stanley, at individual and collective levels.

Key questions include:

  • Does heart rate synchronize between the hotseat participant and observers (RQ1.1)?
  • How does anticipation of being next in the hotseat influence physiological arousal and synchrony (RQ1.2) or affect gaze behaviors (RQ1.3)?

Participants: 60 participants (15 in the "hotseat" who speak with Stanley and 45 observers; 3 observers per hotseat participant) to create 45 dyads, aligned with existing literature on heart rate synchrony (20). Sample size is further based on space, time and equipment constraints.

Procedure: Participants attend a 15-minute performance in sub-groups of four, monitored for heart rate via ECG and eye movements via eye-tracking goggles. One participant takes the "hotseat," interacting with Stanley, who plays the role of a therapist asking increasingly combative questions. At the performance's end, Stanley accuses the hotseat participant of saying something outrageous (e.g., wanting to kill their boss), which the participant denies. However, a deepfake video on supporting screens shows the participant appearing to say the exact preposterous statement Stanley suggested (see supplementary document for details).

After the performance, participants complete questionnaires assessing engagement, perceptions of Stanley's socialness (e.g., Carpinella's Robotic Social Attributes Scale), likability, animacy, anthropomorphism, and intelligence (e.g., Bartneck's Godspeed questionnaire).

Statistical Analyses: Analysis of physiological measures (heart rate (HR), heart rate variability (HRV), heart rate synchrony, and eye movements) is performed with established toolboxes and custom Matlab scripts. All statistical analyses use standard functions and custom scripts in the RStudio environment and include ANOVA for questionnaire responses and individual physiological measures (eye-gaze patterns and fixations, variations in heart rate), and cross-correlation analysis for physiological synchrony measures.

Hypothesized Outcomes: The more salient/emotionally arousing moments of the interactions are expected to be associated with increased heart rate synchrony and similar gaze behaviors (e.g., increased pupil dilation, longer fixations) between the participant in the hotseat and each observer (20-22). We expect the overall heart rate, as well as heart rate synchrony and pupil dilation, to peak upon appearance of the deepfake of the person in the hotseat. Higher heart rate and gaze pattern synchrony values are expected to be linked to similar evaluations of perceived engagement and socialness measured via questionnaires.

Study 2: Mechanisms and Consequences of Emotional Engagement with Emotional AI

Following on from the live performance situation, we next seek to parse contributions of distinct aspects of Stanley's presentation to participants' emotional engagement and social perceptions of Stanley in a more controlled environment.

Research Questions: This study explores how embodiment (3D-printed mask vs. screen projection) and emotional tone (emotional vs. neutral) influence audience perceptions of Stanley, as measured by peripheral physiological indices (heart rate, gaze direction, pupil dilation), performance on a behavioral automatic imitation task, and questionnaire responses.

Key questions include:

  • To what extent does Stanley's physical embodiment (rendered on a 3D-printed mask), versus a 2D rendering of Stanley on a screen, make the audience perceive it as more human-like, likable, social, and engaging (RQ2.1)?
  • To what extent does a strong contrarian emotional tone (vs. a neutral tone) shape people's perceptions of the AI agent (RQ2.2)?

Participants: Based on a power analysis from an eye-tracking study (23) on human—robot interaction with a 2x2 factorial design (N=25, power=0.7, effect size d=0.5), 44 participants are needed to achieve 0.9 power. We will test 50 participants to account for data loss, and budget for an additional 20 participants for piloting/measure refinement.
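As a back-of-the-envelope check on that figure, the required N for a two-sided within-subjects (paired) t-test at d = 0.5 and power 0.9 can be approximated with the normal approximation plus Guenther's small-sample correction (an illustrative sketch; the actual power analysis was presumably run in dedicated software such as G*Power):

```python
import math
from statistics import NormalDist

def paired_t_sample_size(d, power=0.9, alpha=0.05):
    """Approximate N for a two-sided paired t-test: the normal
    approximation ((z_{1-a/2} + z_{power}) / d)^2 plus Guenther's
    small-sample correction of z_{1-a/2}^2 / 2."""
    z = NormalDist().inv_cdf
    n_normal = ((z(1 - alpha / 2) + z(power)) / d) ** 2
    return math.ceil(n_normal + z(1 - alpha / 2) ** 2 / 2)

print(paired_t_sample_size(0.5, power=0.9))  # 44, matching the target sample
```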

Procedure: Using a 2x2 within-subjects factorial design (embodiment: 3D vs. 2D × emotion: combative vs. neutral), participants interact with four versions of Stanley, varying in presence and emotional tone. In each run, Stanley appears as a physical mask (3D) or a 2D video projection and engages in either emotionally charged or neutral interactions. Stanley's appearance (e.g., eye and skin color, facial hair, voice) and name will vary across runs to help participants perceive each version as distinct; these features are counterbalanced between conditions. As in Study 1, heart rate and eye movements are recorded. After each interaction, participants complete questionnaires and engage in a computer-based automatic imitation task to measure social perception (24). Video and audio recordings of each interaction are analyzed for facial and vocal expressions to assess arousal, motivation, and emotional responses (25).

Statistical Analyses: As in Study 1, analysis focuses on physiological and questionnaire data, with Matlab-based tools used for video and acoustic analysis.

Hypothesized Outcomes: More salient and emotionally arousing interactions are expected to associate with increased HR, pupil dilation, and longer fixations on Stanley's face, as well as higher scores on several questionnaire subscales. In addition, pro-social, positive interactions should result in increased automatic imitation effects (e.g., longer response latency; (24)). We hypothesize that such effects will be more pronounced in interactions with the embodied (vs projection) version of Stanley (RQ2.1), and in response to a stronger (vs neutral) emotional tone (RQ2.2).

Study 3: Neurocognitive Correlates of Emotional Engagement with Contrarian AI

Research Questions: The final study investigates neural responses to Stanley's emotional interactions and compares these with human equivalents. It further explores the impact of sensory channels (visual vs. auditory) and agent type (AI vs. human) on emotion perception.

Key questions include:

  • To what extent do neural responses to emotional displays differ between Stanley and a human actor (RQ3.1)?
  • To what extent do visual (facial) and auditory (vocal) channels contribute to the perception of emotion expressed by human and AI agents (RQ3.2)?

Participants: 35 participants, each tested for 1.5h (2h participation budgeted for set up). Sample size is determined by time and resource constraints.

Procedure: Participants view videos of a human actor or Stanley reciting poetry, as each agent speaks directly to the observer with varying emotional displays across sensory channels and agent types (2x2x3 within-subjects factorial design with factors (1) agent (Stanley vs. human), (2) emotion (combative vs. neutral), and (3) channel (face vs. voice vs. face and voice)). Post-scan questionnaires assess perceived engagement and social attributes of Stanley and the human actor.
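For illustration, the 12 cells of this design can be enumerated and shuffled into per-participant run orders as sketched below (the number of repetitions per cell is a placeholder assumption, not a value from the protocol):

```python
from itertools import product
import random

AGENTS = ("Stanley", "human")
EMOTIONS = ("combative", "neutral")
CHANNELS = ("face", "voice", "face+voice")

def trial_list(participant_seed, repeats_per_cell=2):
    """Enumerate all 2x2x3 = 12 condition cells, repeat each cell,
    and shuffle deterministically per participant."""
    cells = list(product(AGENTS, EMOTIONS, CHANNELS)) * repeats_per_cell
    random.Random(participant_seed).shuffle(cells)
    return cells
```

Seeding the shuffle per participant keeps each run order reproducible for later event-related modelling of the fMRI data.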

Statistical Analyses: Following on from our previous work, fMRI data will be analyzed using SPM 12, with GLM, ROI, and RSA approaches examining emotional parameters derived from video analysis (15).

Hypothesized Outcomes: Greater activation of occipitotemporal regions is anticipated for visual (facial) emotional information compared to auditory cues (26). Observations of the human actor, especially in the combative condition, are expected to engage brain areas associated with emotional processing more than combative or neutral versions of Stanley (27).

Psychophysiological Measures

For Studies 1 and 2, cardiovascular measures are collected with wearable optical heart rate sensors (Polar Verity Sense). These measures include HR and HRV, which provide insights into autonomic nervous system activity and are influenced by arousal and emotion perception (28). HRV during resting state has been linked to emotion regulation and cognitive functions (29). Thus, a 5-minute resting-state HR recording will be obtained for each participant before each study, with HRV used as a regressor in statistical analyses. HR synchrony between audience members will also be computed, reflecting interpersonal physiological synchrony and attention to relevant events (20, 21). Studies 1 and 2 use Tobii Pro Wireless Eyetracking glasses to monitor eye-gaze patterns, fixations, and pupil dilation, providing insights into emotional and cognitive processes (22) and addressing recent calls for more research into eye behaviors and social gaze during human—robot interaction (e.g., 30).
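At their core, the HR and HRV indices described here reduce to simple computations on the inter-beat (RR) interval series exported by the sensor; a minimal sketch (illustrative only; the project computes these with dedicated Matlab toolboxes):

```python
import numpy as np

def mean_hr(rr_ms):
    """Mean heart rate (bpm) from RR intervals in milliseconds."""
    return 60000.0 / float(np.mean(rr_ms))

def rmssd(rr_ms):
    """RMSSD, a standard time-domain HRV index: the root mean square
    of successive differences between adjacent RR intervals (ms)."""
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))
```

Resting-state RMSSD computed this way per participant would then enter the group models as the HRV regressor.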

Study 3 uses the Philips Ingenia 3T fMRI scanner to assess neural correlates of emotional perception during human—AI interactions.

Neurophysiological Measures

  • Cardiovascular measures (HR / HRV)
  • Eye movements
  • Pupil diameter
  • Functional magnetic resonance imaging (fMRI)

Contingency Plan

Potential challenges include variability in physiological data due to individual differences. Mitigating strategies involve increasing sample sizes if resources allow and refining analysis techniques to enhance signal clarity. If technical issues arise with wearable or imaging devices, alternative or additional trials may be conducted to ensure data robustness.

The project follows Open Science principles, preregistering studies, sharing materials and data, and providing open access to all manuscripts.

3. References

1. E. S. Cross, R. Ramsey, Mind Meets Machine: Towards a Cognitive Science of Human-Machine Interactions. Trends Cogn Sci 25, 200-212 (2021).
2. Y. Bai et al., Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. (2022).
3. L. Ranaldi, G. Pucci, When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour. 10.48550/arXiv.2311.09410 (2024).
4. A. Cai, I. Arawjo, E. L. Glassman, Antagonistic AI. 10.48550/arXiv.2402.07350 (2024).
5. M. F. Hendry et al. (2023) Are you talking to me? A case study in emotional human-machine interaction. in Proceedings of the Nineteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AAAI Press, Salt Lake City), p Article 43.
6. R. Hortensius, E. S. Cross, From automata to animate beings: the scope and limits of attributing socialness to artificial agents. Ann N Y Acad Sci 10.1111/nyas.13727 (2018).
7. C. Darwin, The expression of emotion in man and animals (Penguin Books Limited, London, UK, 1872/2009).
8. D. Bazo, R. Vaidyanathan, A. Lentz, C. Melhuish (2010) Design and testing of a hybrid expressive face for a humanoid robot. in IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) (Taipei, Taiwan), pp 5317-5322.
9. C. M. de Melo, J. Gratch, P. J. Carnevale, Humans versus Computers: Impact of Emotion Expressions on People's Decision Making. IEEE Trans on Affective Computing 6, 127-136 (2014).
10. M. I. Gobbini et al., Distinct neural systems involved in agency and animacy detection. J Cogn Neurosci 23, 1911-1920 (2011).
11. T. Chaminade et al., Brain response to a humanoid robot in areas implicated in the perception of human emotional gestures. PLoS ONE 5, e11577 (2010).
12. E. Moser et al., Amygdala activation at 3T in response to human and avatar facial expressions of emotions. J Neurosci Methods 161, 126-133 (2007).
13. S. S. Kwak, Y. Kim, E. Kim, C. Shin, K. Cho (2013) What makes people empathize with an emotional robot?: The impact of agency and physical embodiment on human empathy for a robot. in 2013 IEEE RO-MAN, pp 180-185.
14. J. Li, The benefit of being physically present: A survey of experimental works comparing copresent robots, telepresent robots and virtual agents. Int J Hum-Comput St 77, 23-37 (2015).
15. L. E. Jastrzab, B. Chaudhury, S. A. Ashley, K. Koldewyn, E. S. Cross, Beyond human-likeness: Socialness is more influential when attributing mental states to robots. iScience 27, 110070 (2024).
16. M. Sagar, M. Seymour, A. Henderson, Creating connection with autonomous facial animation. Commun ACM 59, 82-91 (2016).
17. J. Bates, The role of emotion in believable agents. Commun ACM 37, 122-125 (1994).
18. S. Turkle, Computer as Rorschach. Science, Technology, & Human Values 5, 74 (1980).
19. T. Walsh, The Singularity May Never Be Near. (2016).
20. I. V. Stuldreher, N. Thammasan, J. B. F. van Erp, A. M. Brouwer, Physiological synchrony in EEG, electrodermal activity and heart rate reflects shared selective auditory attention. J Neural Eng 17, 046028 (2020).
21. A. Marzoratti, T. M. Evans, Measurement of interpersonal physiological synchrony in dyads: A review of timing parameters used in the literature. Cogn Affect Behav Neurosci 22, 1215-1230 (2022).
22. V. Skaramagkas et al., Review of Eye Tracking Metrics Involved in Emotional and Cognitive Processes. IEEE Rev Biomed Eng 16, 260-277 (2023).
23. C. Willemse, A. Wykowska, In natural interaction with embodied robots, we prefer it when they follow our gaze: a gaze-contingent mobile eyetracking study. Philos Trans R Soc Lond B Biol Sci 374, 20180036 (2019).
24. J. Leighton, G. Bird, C. Orsini, C. Heyes, Social attitudes modulate automatic imitation. Journal of Experimental Social Psychology 46, 905-910 (2010).
25. M. Egger, L. Matthias, H. Sten, Emotion Recognition from Physiological Signal Analysis: A Review. Electronic Notes in Theoretical Computer Science 343, 33-55 (2019).
26. H. Zhang et al., Facial Expression Enhances Emotion Perception Compared to Vocal Prosody: Behavioral and fMRI Studies. Neurosci Bull 34, 801-815 (2018).
27. L. C. Kegel et al., Dynamic human and avatar facial expressions elicit differential brain responses. Soc Cogn Affect Neurosci 15, 303-317 (2020).
28. R. D. Lane et al., Neural correlates of heart rate variability during emotion. Neuroimage 44, 213-222 (2009).
29. G. Forte, F. Favieri, M. Casagrande, Heart Rate Variability and Cognitive Function: A Systematic Review. Front Neurosci 13, 710 (2019).
30. H. Admoni, B. Scassellati, Social Eye Gaze in Human-Robot Interaction: A Review. J Hum-Robot Interact 6, 25-63 (2017).

Previous Own Publications

31. Hendry, Manuel, Norbert Kottmann, Martin Fröhlich, Florian Bruggisser, Marco Quandt, Stella Speziali, Valentin Huber, and Chris Salter. "Are You Talking to Me? A Case Study in Emotional Human-Machine Interaction." In Proceedings of the 19th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, edited by Stephen Ware and Markus Eger. Palo Alto, California: AAAI Press, 2023. DOI: 10.1609/aiide.v19i1.27538
32. Jastrzab LE, Chaudhury B, Ashley SA, Koldewyn K, Cross ES. Beyond human-likeness: Socialness is more influential when attributing mental states to robots. iScience. 2024 May 22;27(6):110070. doi: 10.1016/j.isci.2024.110070.
33. Hsieh TY, Cross ES. People's dispositional cooperative tendencies towards robots are unaffected by robots' negative emotional displays in prisoner's dilemma games. Cogn Emot. 2022 Aug;36(5):995-1019. doi: 10.1080/02699931.2022.2054781.
34. de Jong D, Hortensius R, Hsieh TY, Cross ES. Empathy and Schadenfreude in Human-Robot Teams. J Cogn. 2021 Aug 5;4(1):35. doi: 10.5334/joc.177.
35. Cross ES, Ramsey R. Mind Meets Machine: Towards a Cognitive Science of Human-Machine Interactions. Trends Cogn Sci. 2021 Mar;25(3):200-212. doi: 10.1016/j.tics.2020.11.009.