Staring in the Mirror of Vanity
What large language model praise for my new podcast suggests about using AI for market research
Less than a week ago, I released the first episode of my new podcast, Stories of Emotional Granularity. The podcast is about the diversity of human emotions.
To give listeners an idea of how broad that diversity is, I chose friluftsliv, the feeling of rightness that comes from being outdoors in nature, as the subject of the first episode, and compersion, the pleasurable feeling of knowing that one’s lover has another lover, as the subject of the second.
That second episode will be released tomorrow. I only announced the first episode a few days ago.
Yet, according to Bard, Google’s large language model, Stories of Emotional Granularity has already received glowing coverage from the New York Times, the Guardian, and Psychology Today. The New York Times seems to have been the first to cover my new podcast, in an article four years ago.
What’s more, Bard states that Stories of Emotional Granularity “has been featured in several academic journals, including the Journal of Personality and Social Psychology and the Journal of Positive Psychology.” According to Bard, the Journal of Positive Psychology “praised the podcast for its ‘honest and insightful interviews’ and its ‘ability to help people understand their own emotions better.’ The journal also noted that the podcast is ‘a valuable resource for anyone interested in learning more about emotional granularity.’ The review concluded by saying that the podcast is ‘a must-listen for anyone interested in positive psychology.’”
The apparent adulation of my podcast doesn’t end there. Even though Stories of Emotional Granularity hasn’t even been around for a week, Google’s artificial intelligence chatbot told me that the podcast has been the subject of discussion groups at five different business conferences, while the Council on Foreign Relations has published a review of the podcast in its magazine, Foreign Affairs.
The GPT-4 large language model that fuels Microsoft’s search engine Bing reported that Anita Rani of BBC Radio 4’s Woman’s Hour has praised my days-old podcast, saying that my interview with Brené Brown was “fascinating”. As if that wasn’t enough, GPT-4 went on to explain that Stories of Emotional Granularity had both Brené Brown and psychologist Susan David as guests in the same episode. GPT-4 then told me that my podcast has already released five episodes, and listed their topics.
None of it is true.
The important question is why these large language models created such an abundant load of BS about my new podcast.
Contrary to what many bedazzled tech journalists have said on the matter, large language models like GPT-4 and Bard don’t think about the questions they’ve been asked and provide reasoned answers. They simply create sentences related to our prompts in patterns they have identified through a statistical analysis of massive amounts of text, most of it from online sources, blended with information from online search engines.
Now consider the bias in the source material for large language models. Much of the text available online is hype. Most of what’s written about podcasts is hype. Most of what’s written about any product or brand is hype.
Large language models spew BS because they’ve been trained on BS.
There’s a huge bias in the text that systems like GPT-4 and Bard have been trained on. Almost all of it represents ideas that people want to project out into the world. People’s private thoughts, secret desires, hidden doubts, and shameful fears leak out into the online world every now and then, but most of the time, what we type is a performance of ourselves.
This positive bias in individual representation is mirrored in the presentation of brands online. Most of the available text about consumer brands is positive because companies pay to produce that text. Advertising, public relations, paid reviews, and other promotional fluff drown out critical voices.
It’s no wonder Bard and GPT-4 readily assert that my brand-new podcast has already been profiled on the radio, in print, in academia, and at professional conferences. That’s the sort of thing that podcast promoters brag about. The large language models simply reproduce the most predictable language, which, in the case of podcasting, just happens to be bragging.
The problem is that predictable speech isn’t necessarily accurate speech.
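To make that mechanism concrete, here is a minimal sketch in Python. It is a toy bigram model of my own invention, nothing like the actual architecture of Bard or GPT-4, and its tiny “training corpus” is made-up promotional copy. The point is simply that a model trained only on hype can generate nothing but hype, regardless of the facts.

```python
import random
from collections import defaultdict

# A toy "training corpus" made entirely of promotional hype,
# standing in for the kind of text that dominates the web.
corpus = """
this podcast has been featured in the new york times
this podcast has been praised by leading psychologists
this podcast has been called a must-listen by critics
this podcast has received glowing coverage in major outlets
""".split()

# Count which words follow which other words (a bigram model).
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def generate(start_word, length=10):
    """Generate text by repeatedly sampling a statistically likely next word."""
    word, output = start_word, [start_word]
    for _ in range(length):
        candidates = transitions.get(word)
        if not candidates:
            break
        word = random.choice(candidates)
        output.append(word)
    return " ".join(output)

# The model happily "asserts" glowing coverage of any podcast, because
# hype is the only pattern it has ever seen. Truth never enters the process.
print(generate("this"))
```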
User Research Without The Users?
Because large language models appear to be skilled at imitating human communication, a number of businesses are running with the idea that large language models can replace human communication. Market research startups are now rushing to offer quick and cheap insights to their clients with their own artificial intelligence chatbot interfaces.
Market Logic Software is selling DeepSights, “the world’s first AI Assistant trained to answer business questions about market research”. Ask Viable says it can “harness the power of our AI and GPT-4 to automate your qualitative data analysis without compromising quality.” SyntheticUsers is promising “user research without the users”.
Promotional materials for these ventures are filled with reassuring language. DeepSights pledges, for instance, that its system “provides full sentence answers that include links to citations from verified sources.”
Citations from verified sources sound great, until you realize that it’s not really possible to verify the sources used by large language models. The mathematics is too complicated, and the databases of source material are too immense to parse. Ultimately, the “verified” sources presented by large language models are just part of the performance, an imitation of intellectual rigor.
GPT-4’s completely incorrect information about Stories of Emotional Granularity came with footnotes. Those footnotes were also completely incorrect, but they made the information appear more reliable. GPT-4 cited my own website as a source for its claim that guests on my podcast felt “validated” and “inspired”, but I’ve never shared any such information.
Large language models don’t do research. They don’t provide insights. They simply provide plausible imitations of research and insights.
With large language models, there’s no way to separate consumer responses from information that comes from other sources. Google and OpenAI aren’t sharing detailed information about where the data they’ve fed into their large language models comes from. Doing market research using tools like GPT-4 is like interviewing people at random and simply hoping that they have some experience that’s relevant to the study.
Of course, there has always been a tendency in market research for clients to influence the research process in order to obtain results that will make their bosses happy. Such manipulation backfires in the long term, but in the short term it’s difficult to resist. Tools like GPT-4 and Bard aren’t correcting for these problems, though. They’re making them worse.
I’d love to hear that the New York Times and the Guardian are writing articles about my podcast just five days after it became available on Spotify. To believe in such fantasies, however, would lead me down a path of self-destruction.
To gaze at our own image in desire is the essence of narcissism. To use the power of artificial intelligence to amplify such reflections is foolhardy.
Any research model that consistently reflects our own vanity is not worthy of our trust.