
Recent research from experts at Cardiff University in the UK and Ca’ Foscari University of Venice reveals that large language models (LLMs) still fall short in understanding puns and humor. The study offers new insight into AI’s difficulty grasping language that hinges on clever wordplay.
The study set out to evaluate how well LLMs understand puns, and it revealed significant shortcomings. For instance, when presented with the pun “I used to be a comedian, but my life became a joke,” the models could detect the structure of the pun but struggled to grasp why it was funny. When the phrase was altered to remove the double meaning, for example to “I used to be a comedian, but my life became chaotic,” LLMs still classified it as a pun.
Another example used was: “Long fairy tales have a tendency to dragon.” Here, replacing “dragon” with a synonym or a random word still led the models to perceive a pun. The research highlights the models’ reliance on memorized patterns and structures rather than genuine comprehension of humor. Professor Jose Camacho Collados elaborated: “In general, LLMs tend to memorize what they have learned in their training.” The models may recognize existing puns, but that does not equate to true understanding.
The study’s findings are particularly relevant for developers looking to use LLMs in applications that require a nuanced understanding of humor or empathy. “We found their understanding of puns is an illusion,” Prof. Camacho Collados noted, emphasizing how fragile LLMs are when processing humor. Tellingly, when researchers took the phrase “Old LLMs never die, they just lose their attention” and swapped “attention” for “ukulele,” the AI still insisted the sentence was a pun, justifying its answer with a supposed phonetic similarity.
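Developers who want to probe this failure mode themselves can mirror the substitution tests described above: ask a model whether a sentence is a pun, then repeat the question with the pun word swapped out. The sketch below is an illustration, not the researchers’ code; the prompt wording, the model name, and the use of the OpenAI client are assumptions made for the example.

```python
# Illustrative sketch of the substitution-style probe described in the
# article. Not the study's code: the prompt wording, model choice, and
# variant sentences here are assumptions made for the example.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ORIGINAL = "Long fairy tales have a tendency to dragon."
VARIANTS = [
    "Long fairy tales have a tendency to drag on.",   # wordplay spelled out
    "Long fairy tales have a tendency to continue.",  # synonym swap
    "Long fairy tales have a tendency to ukulele.",   # random-word swap
]

def is_pun(sentence: str) -> bool:
    """Ask the model for a yes/no pun judgment on a single sentence."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; the paper evaluates several models
        messages=[{
            "role": "user",
            "content": f'Is this sentence a pun? Answer yes or no: "{sentence}"',
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

if __name__ == "__main__":
    # A model that truly gets the joke should answer "yes" only for the
    # original; the study found models often say "yes" to variants too.
    for sentence in [ORIGINAL, *VARIANTS]:
        print(f"{sentence!r} -> {'pun' if is_pun(sentence) else 'not a pun'}")
```

If the model flags the synonym and random-word variants as puns, that is the pattern-matching behavior the researchers describe.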
The team’s tests were rigorous: when LLMs encountered novel wordplay, their accuracy in distinguishing puns from non-puns dropped dramatically, to as low as 20%. The researchers cautioned that, given these deficiencies, relying on LLMs for tasks that call for humanlike understanding could lead to confusion and misunderstandings. The findings were presented at the 2025 Conference on Empirical Methods in Natural Language Processing in Suzhou, China, in a paper titled “Pun unintended: LLMs and the illusion of humor understanding.”