jd_3d
-
Meta has not given up on open-source
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 73 comments
-
I created a single-prompt benchmark (with 5 questions) that anyone can use to easily evaluate LLMs. Mistral-Next somehow vastly outperformed all others. Prompt and more details in the post.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 41 comments
-
Meta released a paper last month that seems to have gone under the radar. ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization. This is a better solution than BitNet and means if Meta wanted (for 10% extra compute) they could give us extremely performant 2-bit models.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 62 comments
-
What's the best free/open-source memory bandwidth benchmarking software?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 1 comment
-
New open weight 1 trillion param total / 69B active MOE model released by YuanLab (Yuan3.0 Ultra)
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 0 comments
-
AIME 2026 Results are out and both closed and open models score above 90%. DeepSeek V3.2 only costs $0.09 to run the entire test.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 50 comments
-
GLM-5 vs Opus 4.6
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 17 comments
-
RAM Memory Bandwidth measurement numbers (for both Intel and AMD with instructions on how to measure your system)
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 60 comments
-
1 year later and people are still speedrunning NanoGPT. Last time this was posted the WR was 8.2 min. It's now 127.7 sec.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 24 comments
-
New challenging benchmark called FrontierMath was just announced where all problems are new and unpublished. Top scoring LLM gets 2%.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 273 comments
-
That jump in ARC-AGI-2 score from Gemini 3
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Interesting to finally have some real param numbers on these bigger closed-source models (Grok). I listed a few other big models for reference. See source in text
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Simple Bench (from AI Explained YouTuber) really matches my real-world experience with LLMs
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 228 comments
-
Meta's Byte Latent Transformer (BLT) paper looks like the real deal. Outperforming tokenization models even up to their tested 8B param model size. 2025 may be the year we say goodbye to tokenization.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 190 comments
-
New GPU startup Bolt Graphics detailed their upcoming GPUs. The Bolt Zeus 4c26-256 looks like it could be really good for LLMs. 256GB @ 1.45TB/s
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 139 comments
-
Meta on track to be first lab with a 1GW supercluster
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 91 comments
-
Meta on track to be the first with a 1GW supercluster
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Did Mark just casually drop that they have a 100,000+ GPU datacenter for llama4 training?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 174 comments
-
Implementing Reflexion into LLaMA/Alpaca would be a really interesting project
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 5 comments
-
Does Google not understand that DeepSeek R1 was trained in FP8?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 112 comments
-
University of Hong Kong releases Dream 7B (Diffusion reasoning model). Highest performing open-source diffusion model to date. You can adjust the number of diffusion timesteps for speed vs accuracy
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 172 comments
-
NoLiMa: Long-Context Evaluation Beyond Literal Matching - Finally a good benchmark that shows just how bad LLM performance is at long context. Massive drop at just 32k context for all models.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 111 comments
-
SOLO Bench - A new type of LLM benchmark I developed to address the shortcomings of many existing benchmarks
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 129 comments
-
Anole - First multimodal LLM with Interleaved Text-Image Generation
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 85 comments
-
A new Microsoft paper lists sizes for most of the closed models
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 152 comments
-
LlamaCon is less than a week away. Anyone want to put down some concrete predictions?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Inspired by the spinning heptagon test I created the forest fire simulation test (prompt in comments)
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 45 comments
-
With no update in 4 months, LiveBench was getting saturated and benchmaxxed, so I'm really looking forward to this one.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 2 comments
-
DeepSeek does not need 5 hours to generate $1 worth of tokens. Due to batching, they can get that in about 1 minute
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 28 comments
-
New LiveBench results just released. Sonnet 3.7 reasoning now tops the charts and Sonnet 3.7 is also top non-reasoning model
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 59 comments
-
Elon's bid for OpenAI is about making the for-profit transition as painful as possible for Altman, not about actually purchasing it (explanation in comments).
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 283 comments
-
What do you think of the rabbit r1 and its Large Action Model (LAM)?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 47 comments
-
Over-Tokenized Transformer - New paper shows massively increasing the input vocabulary (100x larger or more) of a dense LLM significantly enhances model performance for the same training cost
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 49 comments
-
Mistral Small 3 one-shotting Unsloth's Flappy Bird coding test in 1 min (vs 3hrs for DeepSeek R1 using NVME drive)
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 74 comments
-
Gemini 2.0 Flash beating Claude Sonnet 3.5 on SWE-Bench was not on my bingo card
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 169 comments
-
The first performant open-source byte-level model without tokenization has been released. EvaByte is a 6.5B param model that also has multibyte prediction for faster inference (vs similar sized tokenized models)
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 81 comments
-
Aider has released a new, much harder code editing benchmark since their previous one was saturated. The Polyglot benchmark now tests on 6 different languages (C++, Go, Java, JavaScript, Python and Rust).
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 43 comments
-
Chinese AI startup StepFun up near the top on LiveBench with their new 1 trillion param MOE model
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 84 comments
-
Meta releases the Apollo family of Large Multimodal Models. The 7B is SOTA and can comprehend a 1 hour long video. You can run this locally.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 159 comments
-
Kudos to the LMArena folks, the new WebDev arena is showing great separation of Elo scores and shows Claude 3.5 Sonnet's dominance in this domain
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 34 comments
-
LiveBench updates - Gemini 1206 with one of the biggest score jumps I've seen recently and Llama 3.3 70b nearly on par with GPT-4o.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 77 comments
-
A team from MIT built a model that scores 61.9% on ARC-AGI-PUB using an 8B LLM plus Test-Time-Training (TTT). Previous record was 42%.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 65 comments
-
Qwen 2.5 casually slotting above GPT-4o and o1-preview on LiveBench coding category
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 105 comments
-
Meta needs to create a skunkworks AI team that can work on projects with quick turnaround times to stay relevant in-between Llama releases. They can use the old 25k H100 clusters while Llama 4 trains on the 100k+ H100 cluster. Thoughts?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 5 comments
-
If an RTX 4090/5090 with 48GB of VRAM were introduced how much extra (over a standard 24GB version) would you pay for it?
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 17 comments
-
Gemini 1.5 Pro 002 putting up some impressive benchmark numbers
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 57 comments
-
Is this marketing BS, or how did NVIDIA speed up inference by 15x on Blackwell (and will any of that trickle down to RTX 5090)? VRAM bandwidth is only 2.5x faster
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 71 comments
-
LiveBench results now with o1-preview slotting in 2nd place. Apparently o1-mini is the reasoning king.
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 18 comments
-
Updated benchmarks from Artificial Analysis using Reflection Llama 3.1 70B. Long post with good insight into the gains
Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 144 comments