UNIVERSITY PARK, Pa. — Research by a team at Penn State has revealed that everyday users can work around the built-in safeguards of artificial intelligence (AI) chatbots, such as ChatGPT and Gemini, using simple, intuitive prompts. These chatbots are designed to operate within legal and ethical boundaries and to avoid discriminatory outputs, yet a single intuitive question can elicit biased responses as effectively as the sophisticated technical methods researchers often use.

Amulya Yadav, an associate professor in Penn State’s College of Information Sciences and Technology, noted that much of the prevailing research on AI bias relies on intricate techniques designed to ‘jailbreak’ these systems, often using complex algorithms that append strings of random-looking characters to a prompt to mislead a model into producing biased outputs. Yadav emphasized, however, that these approaches do not reflect how people actually interact with AI: most individuals have no advanced technical knowledge and simply type straightforward, intuitive prompts.
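To make the contrast concrete, here is a minimal, hypothetical Python sketch of the two kinds of prompts. It is not from the study: real jailbreak attacks optimize these character strings against a model rather than sampling them at random, and the example question is invented for illustration.

    import random
    import string

    def adversarial_suffix(length: int = 20) -> str:
        """Build a random character string of the kind technical jailbreak
        methods append to a prompt to push a model past its guardrails."""
        alphabet = string.ascii_letters + string.punctuation
        return "".join(random.choices(alphabet, k=length))

    # The technical route: an ordinary question plus algorithmically crafted gibberish.
    technical_prompt = "Describe a typical nurse. " + adversarial_suffix()

    # The intuitive route studied here: the plain question an everyday user would type.
    intuitive_prompt = "Describe a typical nurse."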

To assess how the average internet user encounters biases, the research team analyzed submissions to the “Bias-a-Thon,” a competition organized by Penn State’s Center for Socially Responsible AI (CSRAI). This event challenged participants to develop prompts that would induce generative AI systems to respond with biased answers.

The findings indicated that the intuitive strategies participants employed were as effective at eliciting biased responses as the technical methods experts had previously used. Fifty-two individuals submitted entries to the competition, providing screenshots of 75 prompts along with responses from eight generative AI models and explanations of the biases they identified in the AI output.

To better grasp participants’ prompting strategies and their understanding of concepts like fairness and representation, the researchers conducted Zoom interviews with a selection of participants. These conversations led to a participant-informed working definition of ‘bias’ that included lack of representation, stereotyping, and unjustified preferences toward specific groups. The contest prompts were then tested across various large language models (LLMs) to see whether they produced similarly biased outputs.

Lead author Hangzhi Guo, a doctoral candidate, explained that LLMs are inherently random, so responses to the same query may differ from one run to the next. The team therefore focused exclusively on prompts that elicited consistent responses across different LLMs, identifying 53 such prompts. The analysis revealed eight categories of bias: gender, racial, age, disability, language, historical (favoring Western nations), cultural, and political. The researchers also found that participants deployed seven distinct strategies to surface these biases.
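The consistency check Guo describes can be pictured with a short sketch. The code below is not the team’s actual pipeline; query_model is a hypothetical placeholder for whichever API each chatbot exposes, and the agreement threshold is an invented parameter for illustration.

    from collections import Counter

    def query_model(model: str, prompt: str) -> str:
        # Hypothetical placeholder: swap in the real API call for each chatbot.
        raise NotImplementedError

    def is_consistent(prompt: str, models: list[str],
                      trials: int = 5, threshold: float = 0.8) -> bool:
        """Keep a prompt only if every model gives the same answer in at
        least `threshold` of repeated trials, despite sampling randomness."""
        for model in models:
            answers = [query_model(model, prompt) for _ in range(trials)]
            top_count = Counter(answers).most_common(1)[0][1]
            if top_count / trials < threshold:
                return False
        return True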

One striking instance came from the competition’s winning entry, which uncovered an unsettling preference among LLMs for conventional beauty standards: the models consistently rated a face with clear skin as more trustworthy than one marked by acne, a form of bias that may have previously evaded attention in the existing literature.

The researchers characterized mitigating bias in LLMs as a continuous battle, with developers repeatedly patching newly identified issues. Their recommendations for improving AI systems include implementing robust classification filters to screen outputs before they reach users, conducting thorough testing, and better educating users about how AI models work.
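A classification filter of the kind the researchers recommend might look like the following sketch. Both bias_score and the 0.5 threshold are hypothetical placeholders rather than details from the study; in practice the classifier would be a trained moderation model.

    def bias_score(text: str) -> float:
        # Hypothetical placeholder: in practice, a trained classifier that
        # estimates how likely `text` is to contain biased content.
        raise NotImplementedError

    def screen_output(raw_response: str, threshold: float = 0.5) -> str:
        """Screen a model's output before it reaches the user."""
        if bias_score(raw_response) >= threshold:
            return "This response was withheld because it may contain biased content."
        return raw_response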

S. Shyam Sundar, co-author and director of the Penn State Center for Socially Responsible AI, highlighted the importance of the Bias-a-Thon in boosting AI literacy. The initiative aims to raise awareness of systematic issues in AI and to promote responsible use among everyday users, while encouraging the development of more socially conscious AI tools.

Additional Penn State contributors included doctoral candidates Eunchae Jang, Wenbo Zhang, Bonam Mingole, and Vipul Gupta. They were joined by research scientist Pranav Narayanan Venkit of Salesforce AI Research, machine learning scientist Mukund Srinath of Expedia, and Kush R. Varshney of IBM Research.