alexp702
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 138 comments
alexp702@reddit
How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?
Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 129 comments
alexp702@reddit
NVIDIA GB300 Grace Blackwell Ultra pricetags
Posted by X-N2O@reddit | LocalLLaMA | View on Reddit | 126 comments
alexp702@reddit
I have 2x PCs. One with a 5090 and one with a 4080. Is there an easy way to use both together networked?
Posted by F0UR_TWENTY@reddit | LocalLLaMA | View on Reddit | 34 comments
alexp702@reddit
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA
Posted by Uiqueblhats@reddit | LocalLLaMA | View on Reddit | 16 comments
alexp702@reddit
For users who have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc)
Posted by panchovix@reddit | LocalLLaMA | View on Reddit | 32 comments
alexp702@reddit
Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA
Posted by Uiqueblhats@reddit | LocalLLaMA | View on Reddit | 16 comments
alexp702@reddit
M5 vs DGX Spark vs Strix Halo vs RTX 6000
Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 261 comments
alexp702@reddit
The RTX 5000 PRO (48GB) arrived and it is better than I expected.
Posted by Valuable-Run2129@reddit | LocalLLaMA | View on Reddit | 212 comments
alexp702@reddit
Bad news: Apple drops high-memory Mac Studio configs
Posted by jzn21@reddit | LocalLLaMA | View on Reddit | 137 comments
alexp702@reddit
Sell my 3090FE for a 5060ti 16gb? Does it make sense for energy consumption?
Posted by ThrowRA_194_M@reddit | buildapc | View on Reddit | 18 comments
alexp702@reddit
Forgive my ignorance but how is a 27B model better than 397B?
Posted by No_Conversation9561@reddit | LocalLLaMA | View on Reddit | 286 comments
alexp702@reddit
Given how good Qwen has become, is it time to grab a 128gb m5 max?
Posted by Rabus@reddit | LocalLLaMA | View on Reddit | 151 comments
alexp702@reddit
What starts to become possible with two 3090s that wasn't with just one?
Posted by GotHereLateNameTaken@reddit | LocalLLaMA | View on Reddit | 83 comments
alexp702@reddit
What’s a low memory way to run a Python http endpoint?
Posted by alexp702@reddit | Python | View on Reddit | 96 comments
alexp702@reddit (OP)
RDMA Mac Studio cluster - performance questions beyond generation throughput
Posted by quietsubstrate@reddit | LocalLLaMA | View on Reddit | 3 comments
alexp702@reddit
M5 Max 128G Performance tests. I just got my new toy, and here's what it can do.
Posted by affenhoden@reddit | LocalLLaMA | View on Reddit | 90 comments
alexp702@reddit
Qwen3.5 MLX vs GGUF Performance on Mac Studio M3 Ultra 512GB
Posted by BitXorBit@reddit | LocalLLaMA | View on Reddit | 57 comments
alexp702@reddit
What's up with MLX?
Posted by gyzerok@reddit | LocalLLaMA | View on Reddit | 54 comments
alexp702@reddit
A few early (and somewhat vague) LLM benchmark comparisons between the M5 Max Macbook Pro and other laptops - Hardware Canucks
Posted by themixtergames@reddit | LocalLLaMA | View on Reddit | 60 comments
alexp702@reddit
Qwen 3.5 VS Qwen 3
Posted by SlowFail2433@reddit | LocalLLaMA | View on Reddit | 18 comments
alexp702@reddit
Which one are you waiting for more: 9B or 35B?
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 220 comments
alexp702@reddit
Post your hardware/software/model quant and measured performance of Kimi K2.5
Posted by fairydreaming@reddit | LocalLLaMA | View on Reddit | 47 comments
alexp702@reddit
Has anyone got GLM 4.7 flash to not be shit?
Posted by synth_mania@reddit | LocalLLaMA | View on Reddit | 130 comments
alexp702@reddit
Qwen3-Coder-480B on Mac Studio M3 Ultra 512gb
Posted by BitXorBit@reddit | LocalLLaMA | View on Reddit | 34 comments
alexp702@reddit
Mac Studio as an inference machine with low power draw?
Posted by aghanims-scepter@reddit | LocalLLaMA | View on Reddit | 41 comments
alexp702@reddit
🧠 Inference seems to be splitting: cloud-scale vs local-first
Posted by Code-Forge-Temple@reddit | LocalLLaMA | View on Reddit | 9 comments
alexp702@reddit
Start of 2026 what’s the best open coding model?
Posted by alexp702@reddit | LocalLLaMA | View on Reddit | 57 comments