AI300

Omar Khattab / MIT CSAIL Asst professor
1891.
for all i can tell, what makes reasoning models work so well is that, at sufficient pretrain/RL scale, every relevant next-reasoning step can actively be “softly contaminated” and composed up
LLM score 92 · 5 months ago
alphaXiv
1892.
Discrete Diffusion just got a huge upgrade! with "Categorical Flow Maps", it is now much faster + capable of test time inference.
LLM score 92 · 5 months ago
Lucas Beyer / Meta Researcher
1893.
Back in the day when we figured out that large-scale pretraining seems to be a great thing, I then very quickly decided to spend a significant amount of time and effort towards removing all our downstream transfer/eval task's images from our pretraining data.
LLM score 92 · 5 months ago
Andrej Karpathy / AI researcher
1894.
I think it must be a very interesting time to be in programming languages and formal methods because LLMs change the whole constraints landscape of software completely.
LLM score 92 · 5 months ago
Sander Dieleman / DeepMind Research Scientist
1895.
Training continuous diffusion models for discrete data with cross-entropy is neat (😎), but training flow maps with cross-entropy is even neater! Nice find! https://t.co/ZV1TDTykzY
LLM score 92 · 5 months ago
Omar Khattab / MIT CSAIL Asst professor
1896.
Though bash is a completely valid REPL, the amount of time coding agents lose during experimentation because they iterate on scripts instead of a Jupyter-like in-memory REPL is basically dumb.
LLM score 92 · 5 months ago
Omar Khattab / MIT CSAIL Asst professor
1897.
Not saying this is a hard one, but: - ChatGPT 5.2 Thinking indeed struggled for me
LLM score 82 · 5 months ago
Omar Khattab / MIT CSAIL Asst professor
1898.
ok looks like we’re past peak monthly downloads for this ancient 2021 ColBERTv2 model checkpoint
LLM score 92 · 5 months ago
Sergey Levine / Physical Intelligence Cofounder
1900.
If we train VLAs to respond to diverse multimodal prompts, then we can steer them better: [grasp the carrot]/[move to x,y,z]/[put the carrot on the plate].
LLM score 92 · 5 months ago
Ben Burtenshaw / Hugging Face Researcher
1901.
so 'harness' has increased a lot in the ai vocab over the last few months.
LLM score 85 · 5 months ago
Cameron Wolfe / Researcher at Netflix
1902.
Writeup on Rubric-Based RL is out now: https://t.co/io6zEeeEAZ
LLM score 92 · 5 months ago
Thomas Wolf / Hugging Face Cofounder
1903.
Shifting structures in a software world dominated by AI.
LLM score 85 · 5 months ago
Ben Burtenshaw / Hugging Face Researcher
1904.
your new Qwen 3.5 workhorse is here.
LLM score 12 · 5 months ago
Merve Noyan / Hugging Face ML Engineer
1905.
Qwen3.5 @Alibaba_Qwen is out! > largest model (A17B/397B) in series, context window of 262k tokens
LLM score 25 · 5 months ago
Lucas Beyer / Meta Researcher
1906.
After initially being hyped about the speed, I have to say that 5.3-codex-spark, even on xhigh, is actually quite a bit dumber than 5.3-codex, to the point that I'm back to using the latter most of the time.
LLM score 92 · 5 months ago
Lewis Tunstall / Hugging Face Researcher
1907.
A few thoughts after reading the @OpenAI paper on scattering amplitudes over the weekend:
LLM score 85 · 5 months ago
Merve Noyan / Hugging Face ML Engineer
1908.
upcoming months we (me + @ariG23498) will focus on following
LLM score 92 · 5 months ago
Igor Babuschkin / Cofounder of xAI
1909.
What’s the best open alternative to OpenClaw right now? Doesn’t make sense to put all your data into it if it’s owned by OpenAI.
LLM score 75 · 5 months ago
Omar Khattab / MIT CSAIL Asst professor
1910.
not sure what's going on but OAI folks are going all in on developing new scaffolds/harnesses/programs in the last 24 hours
LLM score 82 · 5 months ago
alphaXiv
1911.
The first Transformer -> SSM hybrid distillation that proves you only need ~2% of attention heads to keep in-context retrieval!
LLM score 92 · 5 months ago
Damek Davis / Assoc. Professor Wharton Stats
1912.
This is a really cool project. My first thought was obviously to ask microagent to make picoagent.
LLM score 92 · 5 months ago
Cameron Wolfe / Researcher at Netflix
1913.
I’m publishing a long-form overview of using rubrics for RL tomorrow.
LLM score 92 · 5 months ago
Damek Davis / Assoc. Professor Wharton Stats
1914.
The second class is a crash course on stochastic optimization in machine learning.
LLM score 85 · 5 months ago
Thang Luong / DeepMind Principal Scientist
1915.
Yes, we provided 3 things for AI-assisted math: * Human-AI interaction (HAI) card (photo), inspired by model cards
LLM score 92 · 5 months ago
Ben Burtenshaw / Hugging Face Researcher
1916.
it's good to know Dario's upper bound for 2026: - <$1tn in compute
LLM score 82 · 5 months ago
Lewis Tunstall / Hugging Face Researcher
1917.
We trained a tiny 4B model to reason for millions of tokens through IMO-level problems.
LLM score 92 · 5 months ago
Lucas Beyer / Meta Researcher
1918.
I've seen some people compliment this article being well/clearly written.
LLM score 82 · 5 months ago
Christian Szegedy / ex xAI Cofounder
1919.
There must be huge low-hanging fruit in figuring out how to train metacognition incrementally.
LLM score 85 · 6 months ago
Christian Szegedy / ex xAI Cofounder
1920.
In addition to adversarial attacks, the increasing use of LLMs for personal communication and AI-based image/video editing tools makes it even harder to detect whether a communication is authentic or not.
LLM score 92 · 6 months ago