It seems that GPT-4 Turbo – the most recent incarnation of the large language model (LLM) from OpenAI – winds down for the winter, much as many people do as December rolls on.
We all get those end-of-year holiday season chill vibes (probably), and indeed that may be why GPT-4 Turbo – which Microsoft’s Copilot AI will soon be upgraded to – is acting in this manner.
As Wccftech highlighted, the interesting observation on the AI’s behavior was made by an LLM enthusiast, Rob Lynch, on X (formerly Twitter).
@ChatGPTapp @OpenAI @tszzl @emollick @voooooogel Wild result. gpt-4-turbo over the API produces (statistically significant) shorter completions when it “thinks” its December vs. when it thinks its May (as determined by the date in the system prompt). I took the same exact prompt… pic.twitter.com/mA7sqZUA0r (December 11, 2023)
The claim is that GPT-4 Turbo produces shorter responses – to a statistically significant extent – when the AI believes that it’s December, as opposed to May (with the testing done by changing the date in the system prompt).
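To make that setup concrete, here is a minimal Python sketch of how such a test could be wired up. It is not Lynch's actual code: the system prompt wording, the function names, and the `complete` callable (a stand-in for whatever API client you use to query the model) are all assumptions for illustration, and length is measured in characters simply because that is the easiest thing to count.

```python
from datetime import date
from typing import Callable, List


def system_prompt_for(day: date) -> str:
    # Hypothetical wording: the exact system prompt used in the experiment
    # is not given in this article.
    return f"You are a helpful assistant. The current date is {day.isoformat()}."


def completion_lengths(
    complete: Callable[[str, str], str],
    day: date,
    user_prompt: str,
    n_samples: int,
) -> List[int]:
    """Ask the same question n_samples times and record each reply's length.

    `complete` is an assumed callable taking (system_prompt, user_prompt)
    and returning the model's reply as a string.
    """
    system_prompt = system_prompt_for(day)
    return [len(complete(system_prompt, user_prompt)) for _ in range(n_samples)]
```

Running `completion_lengths` once with a May date and once with a December date, using the same user prompt, would give two sets of lengths to compare.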
So, the tentative conclusion is that GPT-4 Turbo learns this behavior from us, an idea advanced by Ethan Mollick (an Associate Professor at the Wharton School of the University of Pennsylvania who specializes in AI).
OMG, the AI Winter Break Hypothesis may actually be true? There was some idle speculation that GPT-4 might perform worse in December because it “learned” to do less work over the holidays. Here is a statistically significant test showing that this may be true. LLMs are weird. 🎅 https://t.co/mtCY3lmLFF (December 11, 2023)
Apparently, GPT-4 Turbo is about 5% less productive, judging by the length of its responses, if the AI thinks it’s the holiday season.
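For the curious, a figure like that and a claim of statistical significance can be checked with a few lines of Python. The sketch below is an assumption on our part rather than the method used in the original test, which this article does not specify: it computes the relative drop in mean response length and estimates a p-value with a simple two-sided permutation test, using only the standard library.

```python
import random
from statistics import mean
from typing import Sequence


def relative_drop(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """Fraction by which the treatment mean falls below the baseline mean."""
    return (mean(baseline) - mean(treatment)) / mean(baseline)


def permutation_p_value(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the difference in means.

    Repeatedly shuffles the pooled samples and counts how often a random
    split yields a gap at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Here `relative_drop(may_lengths, december_lengths)` would return roughly 0.05 for a 5% shortfall, and a small value from `permutation_p_value` would suggest the gap is unlikely to be down to chance alone.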
Analysis: Winter break hypothesis
This is known as the ‘AI winter break hypothesis’ and it’s an area that is worth exploring further.
What it goes to show is how an AI can pick up unintended influences that we wouldn’t dream of considering – although some researchers obviously did notice and consider this one, and then test it. But still, you get what we mean – and there’s a whole lot of worry around these kinds of unexpected developments.
As AI progresses, its influences, and the direction that the tech takes itself in, need careful watching, hence all the talk of safeguards for AI being vital.
We’re rushing ahead with developing AI – or rather, the likes of OpenAI (GPT), Microsoft (Copilot), and Google (Bard) certainly are – caught up in a tech arms race, with most of the focus on driving progress as hard as possible and safeguards being more of an afterthought. And there’s an obvious danger therein which one word sums up nicely: Skynet.
At any rate, this specific experiment is just one piece of evidence supporting the winter break theory for GPT-4 Turbo, and Lynch has urged others to get in touch if they can reproduce the results – and we do have one report of a successful reproduction so far. Still, that’s not enough for a concrete conclusion yet – watch this space, we guess.
As mentioned above, Microsoft is currently upgrading its Copilot AI from GPT-4 to GPT-4 Turbo, which has been improved to be more accurate and to offer higher-quality responses in general. Google, meanwhile, is far from standing still with its rival Bard AI, which is powered by its new LLM, Gemini.