I have been following the development of machine learning for about 6 years. When I first started, I trained some models and played around a bit, but for the last 5 years I've just been following the AI news. I've decided to write this post because it seems to me that something significant has happened in the AI space this past month, something that might have wide-ranging consequences for all of us, and sooner than we might think.
Let me start by listing the significant new things that have happened recently.
- GPT-4 has been released.
In 2017 Google introduced the Transformer architecture for natural language processing, which is the invention propelling most of the recent advances in AI. This culminated in OpenAI's launch of ChatGPT, based on GPT-3.5, which has now been improved to GPT-4. The two major improvements are the ability to accept image input and a much larger token memory.
- OpenAI begins alpha testing ChatGPT plugins
OpenAI has successfully added tool use to ChatGPT. This probably needs to be seen: https://www.youtube.com/watch?v=ZSfwgKDcGKY
Basically, they demonstrate the model using tools: it recognizes when it should reach for a tool, invokes it, and processes the result.
- Nvidia announces AI Foundations
Nvidia is launching AI as a cloud service, where you can upload your own data and train a custom GPT-style foundation model on it.
These three advances excite and worry me, because there do not seem to be any significant pieces missing for creating something autonomous that is sufficiently intelligent to complete almost any task.
As a thought experiment, consider this:
If one wanted to make a drone to kill terrorists, how could this, in principle, be programmed using something like a GPT-4-powered robot?
We could start by creating "plugins" to control the drone and teach the bot to use them. Then we could provide an input prompt: a detailed directive describing what the bot needs to do and how it needs to do it. We could direct it to patrol a certain city and look for specific terrorists. We could give a detailed description of what we mean by "terrorist", what these terrorists look like, and so on. The current prompt limit is about 25 thousand words.
Once we have a detailed description of the task, we could start capturing video/pictures from the drone cameras and feed them into the model continuously. The bot could then process these images, generating a text description of what it sees, which it could feed back into itself. If it finds that the textual representation of the camera images satisfies some criteria for action, it could then use the appropriate plugin to, for instance, fire missiles.
After a while, the token limit will be reached. The drone would then have to restart its pattern from the initial directive.
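To make the loop concrete, here is a minimal sketch of what such an agent loop could look like. Everything in it is hypothetical: the model call, the image description step, and the plugin registry are stand-in functions that don't exist in any real API, and I've used a harmless "report sighting" plugin as the example action. The point is only the structure: directive, observe, describe, decide, act, and restart from the directive when the token budget runs out.

```python
# Hypothetical sketch of the agent loop described above.
# None of these functions map to a real API; they are placeholders.

DIRECTIVE = ("Patrol the designated area. Describe what you see. "
             "If the criteria in the directive are met, call a plugin.")

TOKEN_LIMIT = 25_000  # rough budget before the context must be reset


def call_model(context: str) -> str:
    """Stand-in for a GPT-4-style model deciding what to do next."""
    raise NotImplementedError


def capture_frame() -> bytes:
    """Stand-in for grabbing an image from the drone camera."""
    raise NotImplementedError


def describe_image(frame: bytes) -> str:
    """Stand-in for the model turning an image into a text description."""
    raise NotImplementedError


# The "plugins" the model is allowed to invoke (harmless placeholder here).
PLUGINS = {
    "report_sighting": lambda details: print("ALERT:", details),
}


def run_agent() -> None:
    context = DIRECTIVE
    while True:
        observation = describe_image(capture_frame())
        context += "\nObservation: " + observation

        # Model decides: e.g. "CONTINUE" or "USE_PLUGIN|report_sighting|..."
        decision = call_model(context)
        context += "\nDecision: " + decision

        if decision.startswith("USE_PLUGIN"):
            _, name, details = decision.split("|", 2)
            PLUGINS[name](details)  # the tool-use step

        if len(context.split()) > TOKEN_LIMIT:  # context window exhausted
            context = DIRECTIVE  # restart from the initial directive
```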
Right now, what is publicly available cannot do this, but the framework is there; now it is mostly a matter of computing power. We can be fairly sure the capability is coming or is here already. With Nvidia opening this kind of technology up to the masses, it will probably happen sooner.
This is just a basic example, but with tool use, the sky is the limit. As Bill Gates said:
“Natural language is now the primary interface that we’re going to use to describe things - even to computers. It’s a huge advance.”
I see this happening. Natural language will eventually be the only programming language we need.
Let's have a discussion about AI and where it is going.
Are you worried? Excited?
You want to tempt the wrath of the whatever from high atop the thing?
I don't think what we have now can even be called AI. I would call it SI: Simulated Intelligence. These are just advanced language models; there isn't any actual thought behind them. Call me when a model appears that can actually change its own mind.
What is scary, however, is how these models can be abused. You gave an example; I'm not sure it's feasible, but sure, something like that could probably be implemented "soon". And that is scary, because with no actual thought behind it, that terrorist-killing robot would probably be just as good at its job as Teslas are at driving. Which means it will probably only kill innocent people before it stops for no reason at all.
I agree that the most pressing threat from these systems right now is the gap between how bad they are and how much trust people place in them, both regarding autonomous vehicles and information in general.
As for whether this is AI at all, I'm not sure I agree. These large language models seem to be doing "something" that other networks are not. Bill Gates told the OpenAI team to wake him up when their model could pass a certain biology exam (one that requires creative application of biology knowledge), without being trained on the specific questions beforehand. A couple of months later, GPT-4 passed the exam.
You might say that there is no thought behind it, but we have no good definition of what a thought is. In fact, we have no idea how our own brains arrive at "thoughts" or consciousness. Therefore I think it is difficult to say that this definitely is NOT intelligence.
Her view is that understanding requires a model of what you want to understand. This is precisely what neural networks have, and it is how we believe our brains also store information. I would rather turn it around and say that it could be the case that intelligence is an emergent property of natural language.
Anyhow, interesting perspective. Thanks for responding.
You want to tempt the wrath of the whatever from high atop the thing?
I'm not sure natural language is a prerequisite for intelligence at all. Animals are certainly intelligent; do they have a natural language model in their brains? Maybe they do, or maybe we can define what they do have as such, but I don't think it's readily apparent that that is how they think.
I agree that some kind of intelligence will eventually emerge from these models, and may already have, depending on our definition. But our second problem is how we define intelligence, and, maybe more interesting, consciousness. By narrow definitions, we have already created intelligence, but I don't think those definitions are useful. What we really mean by intelligence, in layman's terms or in general (we might call it general AI), is intelligence that can be creative and infer thoughts and conclusions that seem to reach beyond its input. This is incredibly hard to define and measure, and may be impossible. As you imply, this may be how our brains work; we may be "simply" advanced models, and our thoughts, creativity, and consciousness may simply be emergent properties of these. In fact, I find it hard to argue that we are anything else, or to conceive how we could be anything else. Hence why we don't have free will (callback to our other discussions).
There may be something in the biology of our brains that makes them vastly more advanced than, or different from, neural networks. Sabine briefly mentioned this in the video you linked but didn't go into it. Or maybe neural networks just aren't advanced enough yet, or maybe there's something else we haven't discovered. I think it is one of the first two, in which case it is just a question of time, but that leads us to the problem I mentioned earlier: how do we measure intelligence, and how do we detect consciousness?
I would add a third problem: how useful is AGI if it is just like ours? We make mistakes, and AGI will too, as it can be no better than its input. I don't mean it can't aggregate and infer thoughts and conclusions greater than its input; it can and will. But it will still be bound by faulty input, or by lack of input where our knowledge falls short. How will we deal with that? How can we possibly know?
I think what I was trying to say was that perhaps our notion of what intelligence is comes from how our brains process language. It seems that people, even AI engineers and researchers, were surprised by the capabilities of GPT-4. These transformer models are basically trying to guess the next word in a sequence. Nobody expected that simple token guessing would produce the kind of complex behavior exhibited by GPT-4. It passes most of the tests we have for sentience, the Turing test, etc. It displays strong Theory of Mind (the ability to know what others are thinking). With minimal instruction it can learn to use tools, something we have traditionally regarded as a clear sign of intelligence that only humans and a few animals possess. All these abilities have somehow arisen from the processing of huge amounts of written language. This is what makes me suspect that humans use their language processing ability to evaluate intelligence, if that makes any sense. That's the only explanation I can think of for why language models have gotten so much further than other networks.
Problems #1 and #2 are, as you say, about how we define intelligence and consciousness. We have no good definitions for these things, and therefore we are also incredibly bad at testing for them. Are our brains vastly more advanced than these models? A couple of months ago I would have said absolutely, but now I am not so sure. I did not expect GPT-4 to get as far as it has. Most likely there are some key pieces missing, but it's not clear.
I would add a third problem: how useful is AGI if it is just like ours? We make mistakes, and AGI will too, as it can be no better than its input. I don't mean it can't aggregate and infer thoughts and conclusions greater than its input; it can and will. But it will still be bound by faulty input, or by lack of input where our knowledge falls short. How will we deal with that? How can we possibly know?
Humans have a fixed number of neurons. A computer can have an unlimited number of nodes in its network. So even if AGI works the same way a human does, it promises to be superior. As for input, that is indeed becoming a bottleneck for training the next generation of models, as GPT-4 has already allegedly been trained on "a significant portion of the internet". So where will new input come from? The answer seems to lie in multimodality. GPT-4 can already process images, and models that handle video and sound are coming, so there is a lot of input in all the video and audio ever recorded. Eventually you would have to gather data from the real world.
I think the question you are asking is: can these models produce new knowledge themselves? I think they can, but it would eventually require some sort of real world agency. You can already today ask GPT-4 to come up with a good scientific experiment that would produce new knowledge. It will outline it for you, but cannot conduct the experiment and cannot feed the result back into its training data. But if it could somehow do that, then yes, I believe that would count as new knowledge.
To be clear, these models are far from perfect and have stunning weaknesses. They will continue to make mistakes, as we do. But there do seem to be glimpses of AGI within these models. How useful future generations of models will be... I dare not even venture a guess.
You want to tempt the wrath of the whatever from high atop the thing?
Some researchers got access to an early, unrestricted version of GPT-4 while OpenAI was developing it. Their paper, "Sparks of Artificial General Intelligence: Early experiments with GPT-4", is available online.
One of the authors, Sébastien Bubeck, explains their findings in a lecture.
The above post speaks of "sparks of AGI" and claims that GPT-4 has Theory of Mind.
Here an AI hype critic, Gary Marcus, argues this is not true. He is basically taking Vuzman's position: "GPT-5 and irrational exuberance"
Essentially none of the fancy cognitive things GPT-4 is rumored to do stand up. Just today I discovered a new paper about Theory of Mind; in the essay by Davis and myself above, we criticized some dubious claims that ChatGPT passed theory of mind. Maarten Sap from Yejin Choi’s lab dove deeper, with a more thorough investigation, and confirmed what we suspected: the claim that large language models have mastered theory of mind is a myth: https://twitter.com/MaartenSap/status/1643236012863401984?s=20
You want to tempt the wrath of the whatever from high atop the thing?
OK, so some smart people seem to disagree about whether Theory of Mind is an emergent property of these transformer networks. The "Sparks of AGI" authors prefaced their lecture by saying that their findings were based on an unrestricted version of OpenAI's GPT-4, and that the results could not be reproduced on the public model. This might account for why Gary Marcus cannot confirm them, or it might not. The consensus seems to be in favor of ToM, but it is definitely too early to conclude anything.
If we put aside for a bit the notion of AGI, intelligence and consciousness, there is the very real threat of what these models can do today and will be able to do tomorrow.
If you have an hour to spare, I encourage you to watch "The A.I. Dilemma" (from the guys behind the Netflix documentary The Social Dilemma). They manage to sum up everything that is happening right now quite nicely. If it doesn't scare you, I don't know what will. https://www.youtube.com/watch?v=xoVJKj8lcNQ
You want to tempt the wrath of the whatever from high atop the thing?
Greg Brockman, the President and co-founder of OpenAI, made a statement on Twitter yesterday addressing some of the AI-related topics of late.
The underlying spirit in many debates about the pace of AI progress—that we need to take safety very seriously and proceed with caution—is key to our mission. We spent more than 6 months testing GPT-4 and making it even safer, and built it on years of alignment research that we pursued in anticipation of models like GPT-4.
We expect to continue to ramp our safety precautions more proactively than many of our users would like. Our general goal is for each model we ship to be our most aligned one yet, and it’s been true so far from GPT-3 (initially deployed without any special alignment), GPT-3.5 (aligned enough to be deployed in ChatGPT), and now GPT-4 (performs much better on all of our safety metrics than GPT-3.5).
We believe (and have been saying in policy discussions with governments) that powerful training runs should be reported to governments, be accompanied by increasingly-sophisticated predictions of their capability and impact, and require best practices such as dangerous capability testing. We think governance of large-scale compute usage, safety standards, and regulation of/lesson-sharing from deployment are good ideas, but the details really matter and should adapt over time as the technology evolves. It’s also important to address the whole spectrum of risks from present-day issues (e.g. preventing misuse or self-harm, mitigating bias) to longer-term existential ones.
Perhaps the most common theme from the long history of AI has been incorrect confident predictions from experts. One way to avoid unspotted prediction errors is for the technology in its current state to have early and frequent contact with reality as it is iteratively developed, tested, deployed, and all the while improved. And there are creative ideas people don’t often discuss which can improve the safety landscape in surprising ways — for example, it’s easy to create a continuum of incrementally-better AIs (such as by deploying subsequent checkpoints of a given training run), which presents a safety opportunity very unlike our historical approach of infrequent major model upgrades.
The upcoming transformative technological change of AI is something that is simultaneously cause for optimism and concern — the whole range of emotions is justified and is shared by people within OpenAI, too. It’s a special opportunity and obligation for us all to be alive at this time, to have a chance to design the future together.
Biggest takeaways for me are:
1. GPT-5 is probably more than a year out, because everyone is waiting on new hardware to arrive.
2. A paper is already out that describes an autonomous GPT-4 agent that designs and runs scientific experiments. In the paper they conclude:
In this paper, we presented an Intelligent Agent system capable of autonomously designing, planning, and executing complex scientific experiments. Our system demonstrates exceptional reasoning and experimental design capabilities, effectively addressing complex problems and generating high-quality code.
3. Pretty much the entire AI research community is worried about the rate of progress and where research is heading.
You want to tempt the wrath of the whatever from high atop the thing?
Yeah, I loved it. The thing I keep coming back to is that they actually ran these tests on GPT-4 last fall, before the safety layer was added, and the unicorn drawing was better then than later, when OpenAI added safety restrictions to GPT-4 for public use.
Also, that whole discussion we had last year about the guy at Google who quit, saying the AI had become sentient, and how people scoffed at him. Now it seems quite plausible.
The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith
Grizlas has suggested that I log back onto the site and join in the discussion on AI.
I haven't read through the whole discussion (perhaps I will find the time to do so later), but I would love to add a few experiences I have had with AI.
I have used it a fair bit in my work. I typically feed the "free" ChatGPT version a short story, speech, or article that I want to work on in my next English class, and then I set it to generate work questions, multiple-choice questions, difficult words + one-sentence definitions that students can pair up, and so on...
Sometimes it makes my work quite a bit easier. A few times it has really surprised me by giving me an angle for approaching a text that I hadn't seen before (one example: for a chapter in Of Mice and Men it suggested some work questions about the narrator. None of the guides and previous material I had mentioned this about the chapter, but once I saw the AI suggest it, the chapter seemed perfect for training students in writing about the narrator).
At other times it doesn't give me many useful suggestions, and I revert to preparing the material for the next class as I usually do.
I am thinking about "going pro" and testing whether GPT-4 can help me more.
I was talking to Grizlas about how I might be asking the AI the wrong questions; I should be asking it how to advance in my career, or how to ask my supervisor for a better position. But I keep coming back to the fact that it is a language model, and might be best suited for working with texts, language generation, and writing.
(btw, it is far better at working with non-fiction texts than fiction texts, in my experience)
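If I do go pro, I imagine the paid API could even be scripted to do my usual routine in one go. Something roughly like this; it's just a sketch, and the model name, the prompt wording, and the file name are placeholders I made up:

```python
# Rough sketch of scripting my worksheet routine against the API.
# Assumes the openai Python package (the 0.x ChatCompletion interface) and a
# valid API key; model name, prompt, and file name are placeholders.
import openai

openai.api_key = "sk-..."  # your own key goes here


def make_worksheet(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You help an English teacher prepare class material."},
            {"role": "user",
             "content": ("For the following text, generate five work questions, "
                         "five multiple-choice questions, and ten difficult words "
                         "with one-sentence definitions.\n\n" + text)},
        ],
    )
    return response["choices"][0]["message"]["content"]


print(make_worksheet(open("next_class_text.txt").read()))
```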
Your prompt engineering is probably far superior to mine. Funny to think that in a few years, perhaps all programmers will be doing what you are doing.
Regarding prompt engineering, I keep thinking about the turtle scene from Blade Runner.
Tangentially related: Scott Shafer and Coffeezilla have spent years helping people avoid scams, particularly by critically investigating the myriad online personalities that sell courses for this, that, and whatever. These scammers used to be marketing experts, then crypto experts, then finfluencers. Now it seems they are all pivoting to being AI experts, selling expensive courses on how you too can become an AI expert.
The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith
I found another way that ChatGPT (still 3.5) surprised me. I asked it to be the examiner at a high-school-level oral exam on a topic in Psychology (or a text in an English exam). Then I asked it to ask me questions and rate my answers. It was often quite instructive and correct.
Considering that most examiners do about a dozen of these exams in a row, typically two days straight, the level of questions and feedback far exceeds anything a student might encounter irl.
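If I ever script this one too, I imagine it would be a small back-and-forth loop where the model keeps the examiner role across turns. Again just a sketch under the same assumptions as my worksheet example above (openai 0.x package, placeholder model name and prompts):

```python
# Sketch of an interactive "examiner" session; same assumptions as before.
import openai

openai.api_key = "sk-..."  # your own key goes here

messages = [
    {"role": "system",
     "content": ("You are the examiner at a high-school oral exam on a psychology topic. "
                 "Ask one question at a time, then rate my answer from 1 to 10 with feedback.")},
]

for _ in range(5):  # five questions; the number is arbitrary
    reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    examiner_turn = reply["choices"][0]["message"]["content"]
    print(examiner_turn)
    messages.append({"role": "assistant", "content": examiner_turn})
    messages.append({"role": "user", "content": input("Your answer: ")})
```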