PrimeIntellect is actually awesome
Posted by Icy_Gas8807@reddit | LocalLLaMA | View on Reddit | 19 comments
I tested INTELLECT-3:
- Q4_K_L
- 71.82GB
- Uses Q8_0 for embed and output weights. Good quality, recommended.
The model seems intelligent enough for most of my daily tasks; I'll be using it alongside gpt-oss-120B. This gives me hope: if this trend continues, we'll keep getting great models like this below 160B @ fp4, with inference possible on Strix Halo chips.
Also, now I want to connect it to web search. I know it has been discussed before: (https://github.com/mrkrsl/web-search-mcp) seems to be the best option without the hassle of adding an API key. Are there any better alternatives?
BrilliantEmotion4461@reddit
Have you read the book?
The Metamorphosis of Prime Intellect?
Icy_Gas8807@reddit (OP)
Is it good? Is it related to the Prime Intellect project?
Klutzy-Snow8016@reddit
Apparently, the benchmark numbers were inflated and it actually scores basically the same as GLM-4.5 Air: https://x.com/rawsh0/status/1994781830227075159
Icy_Gas8807@reddit (OP)
Seems fine to me; I'll try the same prompts with 4.5 Air and update you with the results.
Miserable-Dare5090@reddit
I'm having chat template problems. Anyone else?
Koalababies@reddit
I manually dropped in the template file with the updates from this PR:
https://huggingface.co/PrimeIntellect/INTELLECT-3/discussions/2/files#d2h-526183
It's working well after this update.
chillahc@reddit
Here's the link from the `refs/pr/2` branch in an easier-to-read version: https://huggingface.co/PrimeIntellect/INTELLECT-3/blob/refs%2Fpr%2F2/chat_template.jinja - Works for me, too.
Agreeable-Rest9162@reddit
Looks good. The benchmarks they published inspire confidence. If you need web search, try this MCP using searxng:
The instance is maintained by me (the dev behind Noema), and it allows JSON requests, which is what lets a search MCP like this work. I couldn't find another instance on Searx Space that allows this, so I made my own; it also powers the web search functionality in my Noema app. If you want to learn more about privacy with this instance, see:
https://noemaai.com/noema-search
and
https://noemaai.com/privacy
arousedsquirel@reddit
Can you explain to the community how you built this instance? Is there a GitHub repo available so people can look at it and build their own private search MCP?
Agreeable-Rest9162@reddit
It's run on an Oracle server through Docker. You just build SearXNG from the Docker image and it's available at the public URL. The MCP code being used is from here: https://github.com/ihor-sokoliuk/mcp-searxng.git
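For anyone curious what "allows JSON requests" means in practice: a SearXNG instance only answers `format=json` queries if that format is enabled in its settings, which is why a self-hosted instance was needed. A minimal sketch of the URL such an MCP would hit (the instance hostname below is a placeholder, not the real one):

```python
from urllib.parse import urlencode

def build_searxng_query(base_url: str, query: str, categories: str = "general") -> str:
    """Build a SearXNG search URL requesting JSON output.

    The instance must have the json format enabled server-side,
    otherwise this endpoint returns 403.
    """
    params = urlencode({"q": query, "format": "json", "categories": categories})
    return f"{base_url.rstrip('/')}/search?{params}"

# Placeholder instance URL for illustration only:
url = build_searxng_query("https://searx.example.org", "local llama")
print(url)
```

The MCP server then fetches this URL and relays the JSON results list to the model as tool output.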
Icy_Gas8807@reddit (OP)
Great job, pretty detailed readme file as well.
So the LLM is connected to web search via MCP, the searched pages are converted to markdown, and the markdown is sent back -> perfect.
I'm just wondering: is the markdown parsed without HTML overhead? He claims LLMs are actually not bad at reading parsed HTML -> (https://x.com/rasbt/status/1943415645007495257?s=20). Your thoughts on this?
Agreeable-Rest9162@reddit
Well, I want to make it clear that it isn't my repo; the part I'm contributing is just the instance being hosted on my server. However, yes, with the web_url_read tool, you are basically getting the page after stripping most HTML overhead. The server parses the HTML, extracts the main content, and returns clean markdown (headings, paragraphs, lists).
Rasbt is right that LLMs can handle reasonably clean HTML. Still, for a web search, markdown is usually better because it has far fewer wasted tokens, unless you specifically need to reason about tags, attributes, or forms.
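To make the token argument concrete: the actual mcp-searxng server has its own extraction pipeline, but the strip-to-markdown idea can be sketched with nothing but the stdlib. This toy version keeps headings and list items, drops scripts/styles/nav chrome, and discards all tag attributes (a real extractor would also handle links, tables, and boilerplate detection):

```python
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    """Toy HTML -> markdown-ish extractor: keeps headings, paragraphs,
    and list items; drops script/style/nav/footer content and attributes."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.out = []          # collected markdown lines
        self.skip_depth = 0    # >0 while inside a skipped element
        self.prefix = ""       # markdown prefix for the next text chunk

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.out.append(self.prefix + text)
            self.prefix = ""

def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    return "\n".join(parser.out)
```

Running it on `"<h2>News</h2><script>x()</script><ul><li>item</li></ul>"` yields just `## News` and `- item`, illustrating how much tag and script overhead never reaches the model's context.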
Icy_Gas8807@reddit (OP)
Will definitely check it out, thanks!!
Wise-Comb8596@reddit
Is your plan to attach a 5060ti to the strix? I thought about that
Icy_Gas8807@reddit (OP)
I have a workstation as well; the Strix is bad at image gen and basically any diffusion-based task. I'm planning to connect it to dual 5060 Tis via Ethernet cable: language tasks -> Strix, anything else -> workstation.
doradus_novae@reddit
Working on getting this bad boy on my GPUs right now, you likey?
RiskyBizz216@reddit
yikes, 11.3 tok/s... so fast
is there a "slow" mode?
/s
Front_Eagle739@reddit
Yeah, I actually like it. GLM 4.5 Air just wasn't up to scratch for my use cases, so I was using GLM 4.6 or 4.5 iq2_m. This one seems quite functional, if not quite as smart as 4.6/4.5, but when I need bigger contexts with a model that follows prompts, this is the one I'll reach for.
UndecidedLee@reddit
The screenshotted reply reminds me of old ChatGPT.