The OFFICIAL programming thread

Hog

Thought this from the front page of Hacker News today was interesting:

It’s a website that, every minute, asks 9 AI models (including GPT5, GPT4) to generate minimal code to show an analog clock with the correct, current time. Since it changes every sixty seconds, I can’t tell what it will look like when you click the link but the couple of times I tried it, only one of the nine clocks was accurate and some were barely recognizable as clocks. Interestingly, the AI model that succeeded in the task one minute failed the next minute and a different, previously failing, AI model succeeded so it wasn’t even possible to say, oh XYZ model is the best.

I asked Gemini if the task limits and overall prompt were fair for testing different AI models’ code generation abilities and Gemini snitched on its friends and said the task was demanding but fair.

Hog

@Hog said in The OFFICIAL programming thread:

it wasn’t even possible to say, oh XYZ model is the best.

Looking at the Hacker News comments and also viewing the clocks myself over several minutes, it seems like the Kimi K2 model is the most often correct. It still fails sometimes but not as often as the others. GPT5 or DeepSeek v3.1 might be second.

Hog

It fascinates me how there is a seemingly infinite number of ways the Qwen 2.5 model can fuck up the task. These are just 7 examples from maybe 12 attempts / minutes:

Qwen 2.5 is not a state-of-the-art model by any means but, still. You’d think it would at least fail at the same general step each time and not produce such random results. It’s not like it’s iterating and deliberately trying new approaches. Each time, it has no awareness that it has ever done the task before.

Gustaf

@Hog said in The OFFICIAL programming thread:

It fascinates me how there is a seemingly infinite number of ways the Qwen 2.5 model can fuck up the task.

So AI truly has reached human levels of competence! We are truly living in the future!

Gators1

Kinda agree that Google should throw out an AI solution or something for this.

why everyone is mad at google (this time)

Hog

Yeah I dunno. I mean given they own YouTube they should certainly be contributing dollars or resources to the ffmpeg project regardless but the world is a better place with their fuzz tool that finds these vulnerabilities and I’m not sure an entity finding and reporting a vulnerability should be obligated to fix it. They’d probably just shut the tool down if they had to. Other projects actually pay people to find vulnerabilities.

As an aside, the person behind that ffmpeg tweet has touched off a huge controversy and a lot of bad blood in the ffmpeg project. There was a vote to have their posting rights removed for poorly representing or misrepresenting the project and then people who voted in favor of that started getting TOS’d for bullying. Only mention it because I happened to watch the below for some reason and it went into detail about the drama:

Getting Ragebaited By FFMPEG Again!!

Gators1

I think the tool is awesome, but the time limit thing is sort of arbitrary. Nobody is thinking about whether this is a significant vulnerability, but the effect where they just report and auto disclose in 90 days is where there is a problem. Like in this case maybe there’s one download a year of the thing that was broken from the 1990s, but the Google disclosure makes it look like FFmpeg is broke and they have to deal with that fallout if they don’t fix it immediately. And also in this case if FFmpeg is integral to Youtube, they should give back and help with these things.

As far as the Twitter fight, I thought “talk is cheap, send patches” was pretty funny.

Gators1

Weird, in Java: with boxed Integers, 1 == 1 is true and 1000 == 1000 is false. Seems like a communist language or something Tigger would like.

Hog

Gemini gave me a long boring explanation of why that might happen with autoboxing and said you should do the comparison this way:

Integer.valueOf(1000).equals(Integer.valueOf(1000))

I think I must have spent all of 24 hours learning Java decades ago before thinking, nah this is bullshit (too much boilerplate) and never touching it again. Stuff like the above seems to confirm it.

Gators1

This works too:

Integer a = 1000;
Integer b = 1000;
System.out.println(a.equals(b)); // true

And yeah, I would have quit as well had I even tried to learn it. That’s stupid.
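For anyone wondering why this happens, here is a minimal sketch. The class name BoxedCompare and its two helper methods are made up for illustration. The Java language spec guarantees that boxing values from -128 to 127 reuses cached objects, so == on two boxed 1s sees the same object. 1000 is outside that guaranteed range, so on a default JVM == usually ends up comparing two different objects, while equals() compares the numbers themselves.

```java
public class BoxedCompare {
    // Compares by value: Integer.equals checks the wrapped int values.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    // Compares by reference: true only if both refer to the same object.
    static boolean sameObject(Integer x, Integer y) {
        return x == y;
    }

    public static void main(String[] args) {
        Integer a = 1, b = 1;        // autoboxed; -128..127 is always cached
        Integer c = 1000, d = 1000;  // outside the guaranteed cache range

        System.out.println(sameObject(a, b)); // same cached object
        System.out.println(sameObject(c, d)); // usually two distinct objects on a default JVM
        System.out.println(sameValue(c, d));  // value comparison
        System.out.println(c.intValue() == d.intValue()); // primitive comparison
    }
}
```

Using plain int instead of Integer sidesteps the whole thing, since == on primitives always compares values.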

Kilemall

Code snobs!

Gators1

And Fortran boy shows up, the cave drawings of coding.

oyaji

@Gators1 said in The OFFICIAL programming thread:

And Fortran boy shows up, the cave drawings of coding.

Bah. We should have stopped at FORTRAN and Pascal. They did everything an engineer needed.

Kilemall

@oyaji said in The OFFICIAL programming thread:

@Gators1 said in The OFFICIAL programming thread:

And Fortran boy shows up, the cave drawings of coding.

Bah. We should have stopped at FORTRAN and Pascal. They did everything an engineer needed.

Dad’s sims were Fortran and M204.

Hog

I like Rust but it can pretty quickly get very unergonomic depending on the complexity of what you are trying to do (and async in Rust is a clusterfuck IMO) so I generally prefer to code in something else. That said, these claims are pretty dramatic:

We adopted Rust for its security and are seeing a 1000x reduction in memory safety vulnerability density compared to Android’s C and C++ code. But the biggest surprise was Rust’s impact on software delivery. With Rust changes having a 4x lower rollback rate and spending 25% less time in code review, the safer path is now also the faster one.

I drilled Gemini on the 1000x reduction claim and Gemini calculated it to be closer to 5000x.

Rust in Android: move fast and fix things

Posted by Jeff Vander Stoep, Android Last year, we wrote about why a memory safety strategy that focuses on vulnerability prevention in ...