PrimeIntellect is actually awesome
Posted by Icy_Gas8807@reddit | LocalLLaMA | View on Reddit | 19 comments
I tested INTELLECT-3:
- Q4_K_L
- 71.82GB
- Uses Q8_0 for embed and output weights. Good quality, recommended.
The model seems intelligent enough for most of my daily tasks; I'll be using it alongside gpt-oss-120B. This gives me hope: if this trend continues, we'll keep getting great models like this below 160B @ fp4, with inference possible on Strix Halo chips.
Also, now I want to connect it to web search. I know it has been discussed before: (https://github.com/mrkrsl/web-search-mcp) seems to be the best option without the hassle of adding an API key. Are there any better alternatives?
BrilliantEmotion4461@reddit
Have you read the book?
The Metamorphosis of Prime Intellect?
Icy_Gas8807@reddit (OP)
Is it good? Is it related to the Prime Intellect project?
Klutzy-Snow8016@reddit
Apparently, the benchmark numbers were inflated and it actually scores basically the same as GLM-4.5 Air: https://x.com/rawsh0/status/1994781830227075159
Icy_Gas8807@reddit (OP)
Seems fine to me; I'll try the same prompts with 4.5 Air and update you with the results.
Miserable-Dare5090@reddit
I'm having chat template problems. Anyone else?
Koalababies@reddit
I manually dropped in the template file with the updates from this PR:
https://huggingface.co/PrimeIntellect/INTELLECT-3/discussions/2/files#d2h-526183
It's working well after this update.
chillahc@reddit
Here's the link from the `refs/pr/2` branch in an easier-to-read version: https://huggingface.co/PrimeIntellect/INTELLECT-3/blob/refs%2Fpr%2F2/chat_template.jinja - Works for me, too.
Agreeable-Rest9162@reddit
Looks good. The benchmarks they published inspire confidence. If you need web search, try this MCP using searxng:
The instance is maintained by me (the dev behind Noema), and it allows JSON requests, which is what lets a search MCP like this work. I couldn't find another instance on Searx Space that allows this, so I made my own; it also powers the web search functionality in my Noema app. If you want to learn more about privacy with this instance, see:
https://noemaai.com/noema-search
and
https://noemaai.com/privacy
arousedsquirel@reddit
Can you explain to the community how you built this instance? Is there a GitHub repo available so people can look at it and build their own private search MCP?
Agreeable-Rest9162@reddit
It's run on an Oracle server through Docker. You just build SearXNG from the Docker image and it's available at the public URL. The MCP code being used is from here: https://github.com/ihor-sokoliuk/mcp-searxng.git
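For anyone curious what "allows JSON requests" means in practice: a SearXNG instance only answers `format=json` queries if that format is enabled in its settings, which is why a self-hosted instance was needed. A minimal sketch of the URL such an MCP would hit (the instance hostname below is a placeholder, not the real one):

```python
from urllib.parse import urlencode

def build_searxng_query(base_url: str, query: str, categories: str = "general") -> str:
    """Build a SearXNG search URL requesting JSON output.

    The instance must have the json format enabled server-side,
    otherwise this endpoint returns 403.
    """
    params = urlencode({"q": query, "format": "json", "categories": categories})
    return f"{base_url.rstrip('/')}/search?{params}"

# Placeholder instance URL for illustration only:
url = build_searxng_query("https://searx.example.org", "local llama")
print(url)
```

The MCP server then fetches this URL and relays the JSON results list to the model as tool output.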
Icy_Gas8807@reddit (OP)
Great job, pretty detailed readme file as well.
So the LLM is connected to web search via MCP, the searched pages are converted to markdown, and the markdown is sent back -> perfect.
I'm just wondering: is the markdown parsed without HTML overhead? He claims LLMs are actually not bad at reading parsed HTML -> (https://x.com/rasbt/status/1943415645007495257?s=20). Your thoughts on this?
Agreeable-Rest9162@reddit
Well, I want to make it clear that it isn't my repo; the part I'm contributing is just the instance being hosted on my server. However, yes, with the web_url_read tool, you are basically getting the page after stripping most HTML overhead. The server parses the HTML, extracts the main content, and returns clean markdown (headings, paragraphs, lists).
Rasbt is right that LLMs can handle reasonably clean HTML. Still, for a web search, markdown is usually better because it has far fewer wasted tokens, unless you specifically need to reason about tags, attributes, or forms.
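To make the token argument concrete: the actual mcp-searxng server has its own extraction pipeline, but the strip-to-markdown idea can be sketched with nothing but the stdlib. This toy version keeps headings and list items, drops scripts/styles/nav chrome, and discards all tag attributes (a real extractor would also handle links, tables, and boilerplate detection):

```python
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    """Toy HTML -> markdown-ish extractor: keeps headings, paragraphs,
    and list items; drops script/style/nav/footer content and attributes."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.out = []          # collected markdown lines
        self.skip_depth = 0    # >0 while inside a skipped element
        self.prefix = ""       # markdown prefix for the next text chunk

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.out.append(self.prefix + text)
            self.prefix = ""

def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    return "\n".join(parser.out)
```

Running it on `"<h2>News</h2><script>x()</script><ul><li>item</li></ul>"` yields just `## News` and `- item`, illustrating how much tag and script overhead never reaches the model's context.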
Icy_Gas8807@reddit (OP)
Will definitely check it out, thanks!!
Wise-Comb8596@reddit
Is your plan to attach a 5060ti to the strix? I thought about that
Icy_Gas8807@reddit (OP)
I have a workstation as well; the Strix is bad at image gen and basically any diffusion-based task. I'm planning to connect it to dual 5060 Tis via Ethernet cable: language tasks -> Strix, anything else -> workstation.
doradus_novae@reddit
Working on getting this bad boy on my GPUs right now, you likey?
RiskyBizz216@reddit
yikes, 11.3 tok/s... so fast
is there a "slow" mode?
/s
Front_Eagle739@reddit
Yeah, I actually like it. GLM 4.5 Air just wasn't up to scratch for my use cases, so I was using GLM 4.6 or 4.5 iq2_m. This one seems quite functional, if not quite as smart as 4.6/4.5, but when I need bigger contexts with a model that follows prompts, this is the one I'll reach for.
UndecidedLee@reddit
The screenshotted reply reminds me of old ChatGPT.