In a recent study published in Scientific Reports, researchers analyzed the creativity of human participants and artificial intelligence (AI) chatbots using the alternate uses task (AUT). The findings offer insight into the evolving landscape of AI-generated creativity and its implications.
AI and human creativity under scrutiny
The emergence of generative AI tools, including Chat Generative Pre-trained Transformer (ChatGPT) and Midjourney, has ignited widespread discussion about their impact on employment, education, and the legal status of AI-generated content. Traditionally, creativity has been viewed as a distinctly human trait characterized by originality and practicality. However, this study challenges these long-standing notions.
Data collection and examination
For this study, researchers collected AUT data from native English-speaking human participants recruited through the online platform Prolific. Of an initial pool of 310 participants, 256 (mean age 30.4 years) were retained for analysis. This group consisted mostly of full-time employees or students from the USA, UK, Canada, and Ireland.
In 2023, three AI chatbots (ChatGPT3.5, referred to here as ChatGPT3; ChatGPT4; and Copy.Ai) each completed 11 separate sessions with the four object prompts, yielding a set of AI responses that could be compared against the much larger human dataset.
The AUT procedure
The AUT presented participants with four objects: rope, box, pencil, and candle, with instructions emphasizing the originality of responses rather than their quantity. Each human participant completed the task once, whereas the AI chatbots completed multiple sessions with slightly adjusted instructions to keep the comparison fair.
Before analysis, all responses underwent spell-checking, and any ambiguous short answers were excluded. The study assessed divergent thinking by measuring the semantic distance between objects and their AUT responses using the SemDis platform. Potential biases in responses, especially AI’s use of jargon like “Do It Yourself” (DIY), were addressed to ensure consistency.
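The study relied on the SemDis platform for this scoring. As an illustration of the general idea only, not the SemDis pipeline itself, the sketch below computes a distance as one minus the cosine similarity between embeddings of an object and a proposed use, using the sentence-transformers library as an assumed stand-in; more unusual uses should yield larger distances.

```python
# Illustrative sketch of semantic-distance scoring (not the SemDis pipeline itself).
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available as a stand-in

# Any general-purpose embedding model would do for this sketch.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_distance(obj: str, response: str) -> float:
    """Return 1 - cosine similarity between the object and the proposed use."""
    vec_obj, vec_resp = model.encode([obj, response])
    cos_sim = np.dot(vec_obj, vec_resp) / (np.linalg.norm(vec_obj) * np.linalg.norm(vec_resp))
    return 1.0 - float(cos_sim)

# More unusual uses of "rope" should come out with larger distances.
for use in ["tie things together", "weave a hammock", "burn as a fuse for a model rocket"]:
    print(f"rope -> {use}: {semantic_distance('rope', use):.3f}")
```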
Rating and statistical analysis
Six human raters, blind to whether the responses came from humans or AI, scored the originality of each answer on a scale of 1 to 5 using a common set of guidelines. These collective ratings exhibited high inter-rater reliability.
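The article does not specify which reliability statistic was used; as one common check of agreement across raters, the sketch below computes Cronbach's alpha on simulated 1-to-5 ratings from six raters.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: a (responses x raters) matrix of 1-5 scores."""
    k = ratings.shape[1]
    rater_variances = ratings.var(axis=0, ddof=1).sum()
    total_variance = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - rater_variances / total_variance)

# Simulated data: 200 responses scored by 6 raters who mostly agree.
rng = np.random.default_rng(0)
true_quality = rng.integers(1, 6, size=200)      # latent 1-5 quality per response
noise = rng.integers(-1, 2, size=(200, 6))       # small per-rater disagreement
ratings = np.clip(true_quality[:, None] + noise, 1, 5)
print(f"Cronbach's alpha across raters: {cronbach_alpha(ratings):.2f}")
```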
The data were then subjected to statistical analyses incorporating group and object as fixed effects, with fluency among the potential covariates.
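A minimal sketch of this kind of model is shown below, assuming a long-format table with hypothetical column names; the paper's exact specification may differ.

```python
# Hypothetical long-format data: one row per session and object, with columns
# 'semantic_distance', 'group' (human vs. chatbot), 'object', and 'fluency'.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("aut_scores.csv")  # hypothetical file name and columns

# Fixed effects for group and object, with fluency as a covariate.
model = smf.ols("semantic_distance ~ C(group) + C(object) + fluency", data=df).fit()
print(model.summary())
```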
The analysis offered insight into the creative divergent thinking of humans and AI chatbots. Notably, semantic distance scores and human subjective ratings were moderately correlated, indicating that the two scoring methods captured related, though not identical, qualities.
Across the comparison between humans and AI, a consistent trend emerged: the AI chatbots generally outperformed humans, achieving higher mean and maximum semantic distance scores. Including fluency as a covariate lowered mean scores but raised maximum scores.
This pattern extended to the human subjective ratings of creativity, where AI again achieved higher mean and maximum scores than humans. The AI chatbots consistently provided unconventional yet plausible uses for the objects, and their responses never scored below a certain threshold.
Comparison of responses to specific objects
The study also compared the responses of humans and each AI chatbot for the specific objects: rope, box, pencil, and candle. ChatGPT3 and ChatGPT4 outperformed humans on mean semantic distance scores, but for maximum scores there was no statistically significant difference between human participants and the chatbots. Responses to the rope tended to receive lower semantic distance scores than those for the other objects.
Human subjective ratings of creativity showed that ChatGPT4 consistently received higher ratings than humans and the other chatbots, although this advantage was not observed for the object "pencil." Responses to the object "candle" generally received lower ratings than those for the other objects. Notably, two AI sessions, one from ChatGPT3 and one from ChatGPT4, recorded maximum scores for the object "box" that exceeded those of any human participant.