I've been chatting in circles with ChatGPT.
After two years of use, advancing from tentative exploration to borderline addiction, it occurred to me to wonder what I was doing. Initially, applying the rules of smart searching developed over 30 years of googling (since Yahoo days), I tried to use it like a search engine, which wasn't very successful because it wouldn't give me the primary sources I was seeking. But everyone was raving about it and I began to wonder what I was missing.
So like a lemming I ran straight to the cliff.
It didn't take me long to reach the edge; we get the same type of dopamine reward from using ChatGPT as we do from social media. Whereas the latter triggers a release of dopamine in response to social validation from other people (yea! I've found my tribe!), the former triggers dopamine release in response to self-validation (yea! I know something!). Although we all tick differently, as a general rule we get a lot of dopamine from cat videos, only a little from googling recipes, and a moderate amount from prompting ChatGPT.
Enough for an addiction.
Rather than reverting to loveless Google Scholar searches and the discipline of weighing primary sources (which now feels so onerous), I buckled down to prove that ChatGPT was good for something...by using it even more. And, in a sense, I succeeded. On several occasions I have collated information (that has since been validated) in a fraction of the time it would have taken had I used 'old-fashioned' googling. For example, I built tables comparing the primary receptor sites activated by different medications (which caught a contraindication overlooked by my doctor).
Wow, ChatGPT really can increase productivity--no wonder MO loves it!
Along the way, however, I noticed a few discomfiting things. The most insidious is the form of the output: whereas from a search we get a list of potential sources (putting the onus on us to sift the wheat from the chaff, test it for rancidity, grind it until palatable, and do much of the baking), ChatGPT gives an apparently palatable product--even when it's only half-baked. A failed internet search produces negative results (e.g., no or only fringe websites), so we know we aren't getting the full story. In contrast, a failed ChatGPT conversation almost always produces positive output (an answer, often phrased as truth rather than conjecture), lulling us into assuming we've got the whole story.
Even when it's fiction.
For example, one red flag I encountered was the tendency for ChatGPT to contradict itself. It would not be entirely unexpected to get different perspectives as a conversation evolved or different interpretations of fact under different circumstances, but as I drilled down I kept encountering contradictions in the "facts" themselves. Whenever I pointed this out, ChatGPT would reply "Oops!"...but what if I hadn't drilled down??
Okay, so if ChatGPT doesn't employ logic, what does it do?
That's when I finally asked ChatGPT to explain what it actually is. Here's a summary:
AI (system that performs tasks normally requiring human intelligence)
└── Generative AI (AI that generates something, like text)
└── LLMs (Large Language Models = statistical models based on a lot of text data)
└── Conversational AI (AI that engages in dialogue)
└── ChatGPT (LLM-based conversational AI tool using super-fast neural network architecture = a fancy Generative Pre-trained Transformer chatbot)
[FYI, all chatbots are conversational AI, and most conversational AI now (since 2018) use LLMs, but many companies still use their older (non-LLM) script-based chatbots for customer service interactions (e.g., Ivan the Terrible phone systems or the AI Assistants that trap you in an infinite loop when you dare to ask a question outside their script).]
What actually happens when you ask ChatGPT a question is roughly this:
1. Your question is tokenized (split into numeric units) to facilitate determining what kind of task you want performed.
2. This task is placed into the context of your conversation history and the sidebars of your request.
3. The LLM plugs this into a probability model developed from training datasets to predict one token (e.g., word of a sentence) at a time.
4. The predicted tokens are organized according to the guidance provided by your question (e.g., format) and filtered to improve clarity (e.g., grammar) and "safety" (e.g., qualifiers are added to keep them from getting sued).
5. The output is iteratively refined as each token is generated.
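To make the first three steps concrete, here is a toy sketch in Python. It is an illustration only, not how ChatGPT is actually built: the training text, the word-level tokenizer, and the names tokenize, build_vocab, train_bigram, and generate are all made up for this example, and a simple count of which word follows which stands in for the neural network's probability model.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training text"; a real LLM trains on vastly more data.
TRAINING_TEXT = (
    "the cat sat on the mat . the cat ate the fish . "
    "the dog sat on the rug ."
)


def tokenize(text):
    """Split text into lowercase word tokens (real tokenizers use sub-word pieces)."""
    return text.lower().split()


def build_vocab(tokens):
    """Give each distinct token a numeric id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def train_bigram(tokens):
    """Count which token follows which: a crude stand-in for the probability model."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows, prompt, max_new_tokens=4):
    """Extend the prompt one token at a time, always taking the most frequent follower."""
    out = tokenize(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last token was never seen with a follower
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


tokens = tokenize(TRAINING_TEXT)
vocab = build_vocab(tokens)
model = train_bigram(tokens)

print([vocab[t] for t in tokenize("the cat sat")])  # numeric ids for each word
print(generate(model, "the dog"))
```

Run on the prompt "the dog", this toy model produces "the dog sat on the cat", a perfectly fluent sentence that appears nowhere in its training text. Each pair of words was common enough; nothing ever checked the sentence as a whole.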
These steps explain a lot about ChatGPT output.
For example, because the specific words of your ask are used as tokens (Step 1) for the analysis, the quality of that ask (e.g., the precision of your word choice) significantly affects the quality (approach, reasoning, format, context, tone) of the output. The worst that happens when you pose willy-nilly questions when googling is a failed search or landing on some sketchy sites, but ChatGPT produces a grammatically correct sentence that seems to make sense--even if it is completely factually inaccurate.
As Bruce used to say, "Garbage in, Garbage out".
Your first protection against GIGO is to mindfully design and structure your asks (called prompting) to guide ChatGPT (via good quality tokens) to produce the output you want. Since we rarely think of all the relevant parameters at the get-go, most asks must involve a series of ever more specific prompts as we drill down toward our target. This process creates the conversation history ChatGPT uses to put your ask-task into context (Step 2). If you log in to your MS account, it will use all of your past conversations as context.
Potentially contributing to an ever-tighter echo chamber.
This is not because of recommendation algorithms like your TikTok feed, but rather is inherent to how LLMs work. After you submit your prompt, your ask-task-tokens enter one side of the LLM and output-tokens emerge from the other side (Step 3). It's okay for it to be a bit of a black box, but to interpret the output of a model, you have to at least know what it is (and what it is not) trying to do.
LLMs are not trying to answer your question accurately.
In fact, they are not even trying to answer your question! The goal of an LLM is to come up with a series of statistically associated words by predicting token sequences using weighted probability distributions drawn from very large training datasets. In other words, to provide an output (answer) to your requested task (question) by fashioning a sentence (sequence of word tokens) that reflects the most commonly encountered (peak of the bell curve) sequences of those tokens observed in the training texts (which is then tidied up a bit in Step 4).
So, the most commonly rehashed slang (as regression to the mean converges on the average...aka C-grade 'knowledge').
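A small Python sketch of what "most commonly encountered" means in practice. The counts below are invented for illustration (an imaginary corpus, not real training data), and the function names are hypothetical; the point is only that the arithmetic rewards frequency, not correctness.

```python
import random
from collections import Counter

# Made-up counts (an assumption for illustration): how often each word
# followed "the capital of Australia is" in an imaginary training corpus.
continuation_counts = Counter({"sydney": 6, "canberra": 3, "melbourne": 1})


def to_probabilities(counts):
    """Turn raw counts into a probability distribution that sums to 1."""
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def most_likely(probs):
    """Greedy choice: the single most probable next token."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Weighted random choice: more probable tokens are picked more often."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


probs = to_probabilities(continuation_counts)
print(probs)
print(most_likely(probs))  # the most repeated answer, not the correct one
print(sample(probs, random.Random()))  # may differ from run to run
```

In this imaginary corpus the greedy pick is "sydney", because it was repeated most often, even though the capital is Canberra. Chatbots also typically sample from the distribution rather than always taking the top token, which is one plausible reason the same question can come back with different answers.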
This means that the sources/training texts are pretty darn critical. We have a tendency to think that 'all of human knowledge' is now floating around the internet, and therefore would be available as training texts for LLMs, but not so fast. Not only are videos (all the wisdom of YouTube!) not available to most LLMs (except via transcripts, captions, reviews, etc.), but only ~15% of books (globally) have even been digitized and fewer than ~20% of English-language books are currently in the public domain (copyright lasts 70 years after the death of every author). Almost all textbooks published in the past 20 years remain behind paywalls, along with half of scholarly articles. Further, ChatGPT claims (at least) that it doesn't continuously scrape sites or share information across users.
Leaving most--especially recent--human knowledge out of reach.
OpenAI naturally considers its training texts to be proprietary and does not disclose them, but ChatGPT reports they are mostly publicly accessible internet data and a "non-trivial" but quite small volume of licensed material (like news archives). [Note: current ChatGPT training texts predate 9/21, with new training cycles & release updates occurring every 1-2 yrs.] Although this includes formerly reliable material (Wikipedia, government, company, and university websites--all of which are now disappearing or restricting access), it also includes personal blogs, press releases, and public social media posts. In other words, not only do the LLMs not validate output text, but much (and increasingly more) of the training text derives from non-validated sources. The output, therefore, can just as well reflect the most widespread social messaging buzzwords as any form of 'information'.
Such as viral misinformation-bubble reposts.
Incidentally, ChatGPT is the first to admit that the subscription version does not use better quality training data than the free version. In both versions, however, we may have already surpassed the peak of prediction accuracy, because since 2023 the training datasets for ChatGPT have also increasingly included--wait for it--AI-generated content. In our Brave New World of circular reasoning, the blind truly are leading the blind.
Welcome to 1984--no wonder NP calls it a scourge!
Now, all that said, ChatGPT isn't going anywhere and I have no intention of dropping it. It still has tremendous potential, especially in fields (e.g., professional) for which a large body of material has traditionally been freely available. In some fields, the amount of quality training data is even likely to increase in the years to come. Judicious use of this tool can greatly expand the reach of any one individual. But, as ChatGPT put it, it's best to use it as a
Brainstorm Buddy, not a Truth Teller.
Brainstorm buddy is a good way to put it - it often helps me get my ideas flowing, if only because I hate all the ones ChatGPT offers.