Websci Companion '25: Companion Publication of the 17th ACM Web Science Conference 2025

Full Citation in the ACM Digital Library

SESSION: Workshop on Global Elections and Information Security (GEIS)

Emotional Appeals in the 2020 #SaveTheChildren Campaign: A Multi-Platform Study of Online Engagement in QAnon Messaging

This study examines how emotional appeals in content related to the #SaveTheChildren campaign are associated with user engagement across Parler, Gab, Twitter, and Facebook. Because emotional appeals are a key dimension of QAnon messaging, and of #SaveTheChildren in particular, we analyze #SaveTheChildren-related content posted to these four platforms. Matching content from Parler to posts on the other platforms, we first conducted exploratory analyses to understand the emotional landscape and engagement patterns across the platforms. We then tested the consistency of the relationships between emotional cues and the engagement these matched #SaveTheChildren posts received across the four platforms. The study found distinct platform-specific patterns: surprise drove engagement on Twitter and Parler, joy was positively associated with engagement only on Facebook, disgust was a mobilizing emotion on Gab, and fear did not mobilize engagement on fringe platforms. Our findings show that the role of emotions in political processes is highly context-dependent, varying significantly across social media ecosystems, and underscore the need for cross-platform designs in future research on digital political engagement.

Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

Amid the tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few studies have explored multimodal feature combinations, such as integrating text and images, to build classification models for detecting misinformation. This study investigates the effectiveness of different multimodal feature combinations, incorporating text, images, and social features into the classification model via an early fusion approach. We analyzed 1,529 tweets containing both text and images, collected from Twitter (now X) during the COVID-19 pandemic and election periods. A data enrichment process was applied to extract additional social features, as well as visual features, through techniques such as object detection and optical character recognition (OCR). The results show that combining unsupervised and supervised machine learning models improves classification performance by 15% over unimodal models and by 5% over bimodal models. Additionally, the study analyzes the propagation patterns of misinformation based on the characteristics of misinformation tweets and of the users who disseminate them.
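The core idea of early fusion is that feature vectors from each modality are concatenated into a single vector before any classifier sees them, in contrast to late fusion, which combines per-modality predictions. A minimal sketch follows; the feature names, values, and dimensions are hypothetical illustrations, not the paper's actual feature set.

```python
# Early fusion sketch: concatenate per-modality feature vectors into one
# input vector, which would then be fed to a single classifier.
# All features below are hypothetical examples.

def early_fuse(text_feats, visual_feats, social_feats):
    """Concatenate modality-specific feature vectors into one fused vector."""
    return text_feats + visual_feats + social_feats

# Hypothetical features for one tweet:
text_feats = [0.12, 0.87, 0.05]    # e.g., embedding or TF-IDF slice
visual_feats = [3.0, 1.0]          # e.g., object-detection count, OCR-text flag
social_feats = [254.0, 12.0, 1.0]  # e.g., followers, retweets, verified flag

fused = early_fuse(text_feats, visual_feats, social_feats)
print(len(fused))  # one 8-dimensional vector for the classifier
```

Because fusion happens at the feature level, a single model can learn cross-modal interactions (e.g., an image containing OCR text plus a low-credibility account), which is what the unimodal and bimodal baselines cannot capture.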

SESSION: Workshop on Human-GenAI Interactions: Shaping the Future of Web Science

Dreams in Hypertext: Berners-Lee, Agentic AI, and the Next Web Frontier

In 2001, Tim Berners-Lee shared his extended vision for the Web—the Semantic Web—describing how future software agents would autonomously navigate the Web, performing tasks on humans’ behalf. This dream was never fully realized, despite efforts that continued for more than a decade. Meanwhile, in 2023, OpenAI popularized the term “Agentic AI”, outlining a vision of software systems that exhibit varying degrees of agenticness while pursuing specific goals in the online environment. What do these two visions share, and how do they differ? Could Agentic AI fulfill the Semantic Web’s promise? How can researchers navigate a landscape where research goals become buzzwords that obscure technological capabilities? This extended abstract, synthesizing a keynote presented at the WebSci ’25 workshop Human-GenAI Interactions: Shaping the Future of Web Science, critically engages with these questions and considers the future of web science research.

Evaluating Machine Expertise: How Graduate Students Develop Frameworks for Assessing GenAI Content

This paper examines how graduate students develop frameworks for evaluating machine-generated expertise in web-based interactions with large language models (LLMs). Through a qualitative study combining surveys, LLM interaction transcripts, and in-depth interviews with 14 graduate students, we identify patterns in how these emerging professionals assess and engage with AI-generated content. Our findings reveal that students construct evaluation frameworks shaped by three main factors: professional identity, verification capabilities, and system navigation experience. Rather than uniformly accepting or rejecting LLM outputs, students protect domains central to their professional identities while delegating others—with managers preserving conceptual work, designers safeguarding creative processes, and programmers maintaining control over core technical expertise. These evaluation frameworks are further influenced by students’ ability to verify different types of content and their experience navigating complex systems. This research contributes to web science by highlighting emerging human-genAI interaction patterns and suggesting how platforms might better support users in developing effective frameworks for evaluating machine-generated expertise signals in AI-mediated web environments.

GenAI & The Dual-Deception Process in Web Science

The rapid adoption of GenAI has sparked renewed interest in matters related not just to machine intelligence but to human intelligence as well. Unfortunately, GenAI has created what can be considered a dual-deception process. On one hand, because Large Language Models (LLMs) output hallucinations and inaccurate information, they deceive end users with the content they produce in response to prompts, queries, etc. The specific form of deception at work here is pandering; and the specific form of pandering in question derives from the fact that LLMs “want” the perceived agendas of those in their audience to be advanced, independent of truth. On the other hand, because the public has largely accepted the idea that this latest wave of Machine Learning (ML) will lead us to Artificial General Intelligence (AGI), i.e., to machine intelligence that matches human intelligence, humans are being deceived as to what genuine intelligence in our case is. Many are now convinced that these LLMs think as humans do, despite the inability of LLMs to reason abstractly over novel information, unchained from prior patterns enshrined in data about what has been produced in the past. This dual-deception process, enabled by the Web, should be analyzed and, perhaps, in some way regulated.

Navigating Privacy and Engagement in the Digital Age

Companies are increasingly adopting digital platforms such as AI chatbots and social media to enhance customer experience. However, AI-driven assistants, often with unclear privacy safeguards, collect and process large volumes of personal data. Understanding how human versus AI-based online platforms shape users’ privacy concerns and cognitive absorption is crucial to explaining differences in information-sharing behavior. This study addresses this gap through a quantitative analysis of responses from a between-subjects experiment, administered as a Prolific survey, with users of online web applications. Drawing on privacy and flow theories, we analyze both the numerical and the textual survey responses using machine learning methods. Findings reveal that users share more, and more diverse, types of information more freely with AI than with other humans. Even when individuals have privacy concerns, they tend to share more information in a human-AI setting than in a human-human setting.

Human or GenAI? Characterizing the Linguistic Differences between Human-Written and LLM-Generated Text

The main goal of this paper is to investigate whether we can identify and characterize how text generated by Generative AI (genAI) systems differs from text written by humans. To achieve this goal, our study uses a publicly available dataset curated from a popular subreddit, r/eli5 (“Explain Like I’m Five”), where the goal is to respond to questions posed to the community with layperson-friendly explanations. We collect the top-voted answers from the forum as the human-written responses and prompt OpenAI’s ChatGPT to generate responses to the same set of questions to investigate their similarities and differences. With the help of expert coders and Natural Language Processing approaches, we evaluate how the texts are similar and different. Our results highlight that human responses are typically shorter, more informal, rely heavily on analogies for explanations, and tend to offer conclusive answers. GenAI responses are longer, more formal, cite existing laws or policies as examples, and are less likely to reach conclusions. Additionally, through an experiment we found that humans are able to differentiate and identify whether a text was written by a human or generated by genAI. These preliminary results suggest that automated approaches can be developed and deployed to detect whether a particular text was written by a human or generated by an LLM, and they underscore the importance of the prompt in generating appropriate responses.

SESSION: Ethical Web Science Workshop

Ethical Web Science Workshop Overview

The Ethical Web Science Workshop at WebSci’25 addresses the ethical challenges that arise from the rapidly evolving relationship between the Web and AI. With increasing reliance on Web-sourced data, issues of fairness, transparency, and consent have become central to technological innovation. To this end, we aim to bring together researchers, practitioners, ethicists, and policymakers to create tools and guidelines for ethical compliance in Web-based research. In doing so, we hope to foster discussion of standards for ethical data sourcing and usage that can lead to best-practice guidelines.

Bursting the Filter Bubble with Knowledge Graph Inversion

As recommender systems increasingly mediate our online experiences, ethical concerns arise as they often reinforce filter bubbles: narrow content loops that isolate users from diverse perspectives. In this paper, we propose a novel approach to mitigating filter bubbles by combining personalized knowledge graph (KG) completion with federated learning (FL) and KG edge inversion techniques. User-specific KGs are constructed from private interaction data and remain entirely on-device, preserving privacy while enabling the system to learn fine-grained preferences. A central recommendation model is trained collaboratively via FL, allowing KG completion without exposing sensitive user data. To disrupt filter bubbles, we introduce KG edge inversion, a method that strategically inverts selected relations in the user’s KG to simulate alternative viewpoints. The model then produces plausible yet diverse recommendations, effectively guiding users beyond their typical content landscape. Our approach offers a transparent and controllable framework for promoting opinion diversity and combating the isolating effects of algorithmic personalization.
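On triple-structured KG data, edge inversion amounts to replacing the relation on selected edges with an opposing relation before the recommender consumes the graph. The sketch below illustrates the idea only; the relation names, the inversion map, and the choice of target edges are assumptions for illustration, not the authors' actual implementation.

```python
# Illustrative sketch of KG edge inversion: flip the relation on selected
# (head, tail) pairs to simulate an alternative viewpoint. The inversion
# map and relation vocabulary here are hypothetical.

INVERSE = {"likes": "dislikes", "follows": "ignores"}

def invert_edges(kg, targets):
    """Return a copy of the KG with relations on target (head, tail) pairs inverted."""
    inverted = []
    for head, rel, tail in kg:
        if (head, tail) in targets and rel in INVERSE:
            inverted.append((head, INVERSE[rel], tail))
        else:
            inverted.append((head, rel, tail))
    return inverted

user_kg = [("user", "likes", "topic_a"), ("user", "follows", "outlet_b")]
print(invert_edges(user_kg, {("user", "topic_a")}))
# [('user', 'dislikes', 'topic_a'), ('user', 'follows', 'outlet_b')]
```

Feeding the inverted graph to the recommendation model yields candidates consistent with the simulated counter-preferences, which is how the approach surfaces content outside the user's usual loop.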

Terminators: Terms of Service Parsing and Auditing Agents

Terms of Service (ToS) documents are often lengthy and written in complex legal language, making them difficult for users to read and understand. To address this challenge, we propose Terminators, a modular agentic framework that leverages large language models (LLMs) to parse and audit ToS documents. Rather than treating ToS understanding as a black-box summarization problem, Terminators breaks the task down into three interpretable steps: term extraction, verification, and accountability planning. We demonstrate the effectiveness of our method on the OpenAI ToS using GPT-4o, highlighting strategies to minimize hallucinations and maximize auditability. Our results suggest that structured, agent-based LLM workflows can enhance both the usability and enforceability of complex legal documents. By translating opaque terms into actionable, verifiable components, Terminators promotes the ethical use of web content by enabling greater transparency, empowering users to understand their digital rights, and supporting automated policy audits for regulatory or civic oversight.
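The three-step decomposition named in the abstract (term extraction, verification, accountability planning) can be sketched as a simple pipeline of stages whose intermediate outputs are inspectable. The stub logic and data shapes below are assumptions for illustration; in the actual framework each stage would be driven by an LLM such as GPT-4o, with verification grounding extracted terms back in the source text to curb hallucinations.

```python
# Hedged sketch of a three-stage ToS-auditing pipeline. The heuristics
# stand in for LLM calls; only the stage names come from the abstract.

def extract_terms(tos_text):
    """Stage 1: pull candidate terms (here: sentences stating obligations)."""
    return [s.strip() for s in tos_text.split(".") if "must" in s]

def verify_terms(terms, tos_text):
    """Stage 2: keep only terms traceable to the source document."""
    return [t for t in terms if t in tos_text]

def plan_accountability(terms):
    """Stage 3: attach an auditable action item to each verified term."""
    return [{"term": t, "action": "flag for user review"} for t in terms]

tos = "Users must not resell the service. The service can change at any time."
report = plan_accountability(verify_terms(extract_terms(tos), tos))
print(report)
```

Keeping the stages separate is what makes the workflow auditable: a hallucinated term is dropped at verification rather than silently propagating into the final report.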

Introduce an Auditing Layer to Web Science

Scientific discoveries increasingly depend on data and data processing, and Web Science is no exception. As an established practice, data-intensive research typically uses scientific workflows and provenance to facilitate data and method sharing while automatically preserving processing history. Prior research has demonstrated the possibility of ex-post, policy-based compliance checking from provenance data. Building on these works, in this paper we present the conceptual design of a framework for data-harvesting Web Science practices, centered on the introduction of a common auditing layer. We discuss the framework’s practical, scientific, and ethical advantages, including its applicability amid the explosion of large language models (LLMs), autonomous agents, and artificial intelligence (AI). We hope this framework design can incubate a new norm of transparent, ethical, and lightweight research practice.