Gongumenn


RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 13-09-2024 08:15

OpenAI just dropped their new model, o1 (Strawberry), which is supposed to offer what is referred to as System 2 thinking.

This is exactly what the critics (makers of Blocksworld) say that LLMs lack, so it will be interesting to see the Blocksworld benchmarks for this model.

I've done some experiments with it, and I can't say that I am that impressed yet.

The most impressive thing is the ability to see the model's chain of thought. It definitely looks like some sort of thought process, albeit a bit alien.

You want to tempt the wrath of the whatever from high atop the thing?

Edited by Grizlas on 13-09-2024 08:21

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 27-09-2024 19:32

A new PlanBench paper is out: "LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench"

And here are the PlanBench results:

o1 almost aces regular Blocksworld with 97.8% accuracy, compared to 62.6% for the previous best model. On Mystery Blocksworld it does substantially better than previous models: 52.8%, compared to 4.3% for the best of them.

The goalposts are then promptly moved by increasing the number of problem steps and further randomizing the strings. The results on these new benchmarks show that o1 still can't plan all that well. Here's their conclusion:

You want to tempt the wrath of the whatever from high atop the thing?

Edited by Grizlas on 27-09-2024 19:33

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 09-10-2024 21:35

John Hopfield and Geoffrey Hinton win the 2024 Nobel Prize in Physics for their pioneering work on neural networks. Also, the 2024 Nobel Prize in Chemistry goes to David Baker for computational protein design, and to Demis Hassabis and John Jumper (Google DeepMind) for AlphaFold.

Geoffrey Hinton, a complete savage, almost immediately uses this opportunity to take a swing at Sam Altman:

https://fortune.com/2024/10/09/openai-sam-altman-geoffrey-hinton-nobel-prize-physics-ilya-sutskever-toronto/

You want to tempt the wrath of the whatever from high atop the thing?

Edited by Grizlas on 09-10-2024 21:36

RE: AI discussion

Field Marshal

Group: Administrator, Klikan, Regulars, Outsiders

Location: Copenhagen

Joined: 09.06.06

Posted on 10-10-2024 06:48

The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith

RE: AI discussion

Field Marshal

Group: Administrator, Klikan, Regulars, Outsiders

Location: Copenhagen

Joined: 09.06.06

Posted on 12-10-2024 09:11

The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 22-03-2025 10:24

Vibe Coding is a new thing. What do people think about it?

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 29-03-2025 09:01

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 29-03-2025 09:33

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 29-03-2025 09:37

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Klikan

Location: Argir

Joined: 12.06.06

Posted on 30-03-2025 19:15

Hahaha that's just brilliant

Nailed Jesus by the way

no pun intended

Why would I want to end every post the same way?

Edited by OKJones on 30-03-2025 19:17

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 05-07-2025 12:57

A long, nerdy talk by John Carmack (creator of Doom/Quake). At 4:22 he says this, which I completely agree with and find very interesting:

All the cool kids are doing an LLM startup and raising huge amounts of money to do things, and I am, you know, amazed and excited about all the things that are happening there.

I’m a daily user of many of these technologies, but I do still look at this: fundamentally, LLMs can’t be the whole answer, you know? These uh, transformer-based models — it’s not the way a human brain works, and what they do, as magical as it is, is not handling so many of the fundamental things that, you know, cats and dogs do, let alone small children.

Where a lot of people don’t, you know, they imagine that these LLMs are reading all of these books, reading the whole Internet, and they don’t understand that, no, it’s taking all of human knowledge, putting it in a giant blender, and then training from there, and it’s magical: it works really well. But when you’re put into a situation where you have to learn something new, there are fundamental things that so many of the researchers here are working on that are just not understood yet. We do not have even a line of sight to an answer for these things.

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 10-07-2025 16:30

And while the hype continues with the release of Grok 4:

Grok 4 is post-grad level in everything.

Another benchmark emerges: LiveCodeBench Pro, made by a research team from eight universities led by competitive programming experts and international olympiad medalists.

All models fail miserably:

Grok has not been tested on this benchmark, but I doubt it would fare any better.
On the flip side, we now have a benchmark made by some extremely smart people, i.e. the smartest shit we humans can come up with. If it is ever beaten without some cheating being involved, it will be difficult to move the goalposts.

You want to tempt the wrath of the whatever from high atop the thing?

Edited by Grizlas on 10-07-2025 16:41

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 19-07-2025 11:40

And already we have a new development. OpenAI achieves GOLD at IMO 2025, with 35/42 points:

https://x.com/alexwei_/status/1946477742855532918

Google DeepMind came very close to gold last year with their AlphaProof model, which was some sort of hybrid RL/LLM and scored 28/42. Now OpenAI has achieved IMO gold using what they call a "reasoning LLM":

So, how come all the "reasoning models" get 0% on the LiveCodeBench Pro benchmark, which consists of unpublished IMO problems? The only explanation I can think of is that these experimental reasoning models have not been published yet. Given this development, I doubt LiveCodeBench Pro will stay at 0% for long.

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 21-07-2025 22:04

It seems Google DeepMind also managed to get IMO gold this year:

This time they did not use the AlphaProof approach; they used an advanced version of their Gemini LLM and scored exactly the same as OpenAI: 35/42, solving 5 of the 6 problems.

This is substantially sooner than even the most dystopian AI "experts" predicted.
In 2022, Eliezer Yudkowsky predicted a 16% chance of an AI getting IMO gold in 2025.

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

General

Group: Klikan

Location: Argir

Joined: 12.06.06

Posted on 12-10-2025 01:06

Will this all come crashing down at some point? Does OpenAI have a business model?

Why would I want to end every post the same way?

RE: AI discussion

General

Group: Administrator, Klikan, Regulars, Outsiders

Location: Denmark

Joined: 08.06.06

Posted on 12-10-2025 11:52

Great video. I'm expecting a huge crash at some point. There is just too wide a gap between the promised land (self-driving cars, private robots, AGI, etc.) and where we are today. I don't see this bubble going on long enough to reach it.

You want to tempt the wrath of the whatever from high atop the thing?

RE: AI discussion

Field Marshal

Group: Administrator, Klikan, Regulars, Outsiders

Location: Copenhagen

Joined: 09.06.06

Posted on 13-11-2025 06:47

Rick discusses the current AI "number one" song... He hates it, of course (it's a shit song), and it turns out the company targeted a specific chart in a specific genre, one so small that it only cost them $3000 to get all this buzz.

The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith

RE: AI discussion

Admiral

Group: Klikan, Outsiders, Administrator, Regulars

Location: Copenhagen, DK

Joined: 10.06.06

Posted on 13-03-2026 14:13

When I kill her, I'll have her
Die white girls, die white girls

http://flickr.com/photos/heini/

RE: AI discussion

Field Marshal

Group: Administrator, Klikan, Regulars, Outsiders

Location: Copenhagen

Joined: 09.06.06

Posted on 29-03-2026 05:34

The conventional view serves to protect us from the painful job of thinking.
- John Kenneth Galbraith
