Jan-nano, a 4B model that can outperform 671B on MCP
Posted by Kooky-Somewhere-2883@reddit | LocalLLaMA | View on Reddit | 490 comments
Hi everyone, it's me from Menlo Research again.
Today, I’d like to introduce our latest model: Jan-nano - a model fine-tuned with DAPO on Qwen3-4B. Jan-nano comes with some unique capabilities:
- It can perform deep research (with the right prompting)
- It picks up relevant information effectively from search results
- It uses tools efficiently
Our original goal was to build a super small model that excels at using search tools to extract high-quality information. To evaluate this, we chose SimpleQA - a relatively straightforward benchmark to test whether the model can find and extract the right answers.
To be clear, Jan-nano only outperforms DeepSeek-671B on this metric, using an agentic, tool-usage-based approach. We are fully aware that a 4B model has its limitations, but it's always interesting to see how far you can push one. Jan-nano can serve as your self-hosted Perplexity alternative on a budget. (We're aiming to improve its performance to 85%, or even close to 90%.)
We will be releasing the technical report very soon, stay tuned!
You can find the model at:
https://huggingface.co/Menlo/Jan-nano
We also have gguf at:
https://huggingface.co/Menlo/Jan-nano-gguf
I saw some users have had technical challenges with the prompt template of the GGUF model; please raise them in the issues and we will fix them one by one. At the moment the model runs well in the Jan app and llama-server.
Benchmark
The evaluation was done using an agentic setup, which lets the model freely choose which tools to use and generate the answer, instead of the hand-held approach of the workflow-based deep-research repos you come across online. So basically it's just: input the question, then the model calls tools and generates the answer, like using MCP in a chat app.
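To make this concrete, here is a rough sketch of what a single agentic turn looks like against an OpenAI-compatible endpoint such as llama-server (the question and the google_search tool definition below are illustrative placeholders, not our exact harness):
```
# One agentic turn: the model sees the question plus the available tools
# and decides on its own whether to emit a tool call before answering.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-nano",
    "messages": [{"role": "user", "content": "Who won the 1998 FIFA World Cup?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "google_search",
        "description": "Search the web and return result snippets",
        "parameters": {
          "type": "object",
          "properties": {"q": {"type": "string"}},
          "required": ["q"]
        }
      }
    }]
  }'
```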
Result:
SimpleQA:
- OpenAI o1: 42.6
- Grok 3: 44.6
- 03: 49.4
- Claude-3.7-Sonnet: 50.0
- Gemini-2.5 pro: 52.9
- baseline-with-MCP: 59.2
- ChatGPT-4.5: 62.5
- deepseek-671B-with-MCP: 78.2 (we benchmarked using OpenRouter)
- jan-nano-v0.4-with-MCP: 80.7
jeffutter@reddit
Wow, I'm real excited for this model. I've been testing the unsloth quant `Jan-nano-128k-GGUF:Q8_4K_XL` with `ollama`. It seems to get stuck in circular thinking pretty badly, especially when multiple (even very simple) tool calls are involved.
It'll call one tool ok (getting the current date) and then output reams and reams of thinking second guessing how it should use that tool call and what it should do next.
Any suggestions on what might cause this or how to remedy it?
Kooky-Somewhere-2883@reddit (OP)
Ollama has been reported to have some YaRN scaling issues.
My recommendation is to follow the llama-server and vLLM configs we give in the Hugging Face README for now.
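For reference, a minimal llama-server launch looks something like this (a sketch only; the README has our exact flags, and the file name is assumed):
```
# Serve the Q8 GGUF through llama.cpp's OpenAI-compatible server.
# --jinja enables the chat template so tool calls are formatted correctly.
./llama-server -m jan-nano-4b-Q8_0.gguf --host 0.0.0.0 --port 1234 --jinja
```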
jeffutter@reddit
Oh nice! Thanks for the tip. I got it set up with `vllm` - now I just need a frontend that supports MCPs. I've been using oterm, but that's Ollama-only (doesn't work with the lightllm proxy).
Kooky-Somewhere-2883@reddit (OP)
you can use the jan.ai frontend or Cherry Studio, both should be fine
jeffutter@reddit
I'm currently unable to use the jan.ai frontend due to this issue: https://github.com/menloresearch/jan/issues/5537
I did manage to get it set up with `vllm` and `open-webui`. The tool calling is still not great. I'm not sure if this is a `jan` problem or an `mcpo`/`open-webui` problem, though.
I'm asking a small query that should use two tools `todays_date` to get the current date and `get_tasks` that takes `completed=Bool` and `due_before=Date`.
It keeps trying to call `get_tasks(completed=Date)` and it calls it over and over and over again in one response, never trying anything different 🤷
Usual-Instruction445@reddit
This looks really cool! will there be a version compatible with M series chips in the future?
Kooky-Somewhere-2883@reddit (OP)
Hm... I'm a bit confused, do you mean MLX?
we already have a GGUF which can run on M chips right now: https://huggingface.co/Menlo/Jan-nano-gguf
Usual-Instruction445@reddit
I tried the LM Studio download on the website and it won't work. I assumed it was my hardware.
Kooky-Somewhere-2883@reddit (OP)
NOTICE
We recommend using Q8 GGUF.
If you are "really tight" on VRAM we recommend the IQ4_XS GGUF.
These are tested! Other quants like Q4_0 and Q4_K_M have significantly degraded performance.
-p-e-w-@reddit
Q4_K_M has “significantly” degraded performance compared to IQ4_XS?
Are you sure you tested this correctly? What is the criterion you used for the test? KL divergence compared to FP?
Kooky-Somewhere-2883@reddit (OP)
not extensively; for some reason it almost didn't work well at all, we've just tested qualitatively for now.
it gave up on researching and gave bad-quality responses, like literally you can tell the difference.
safest bet is Q8 for quality
EmployeeLogical5051@reddit
what about Q6?
Kooky-Somewhere-2883@reddit (OP)
don't know, haven't tested it
EmployeeLogical5051@reddit
ohh alright, I will test it myself to see the response quality.
animax00@reddit
If you have tested, how was the quality…?
EmployeeLogical5051@reddit
Unfortunately I didn't test Q6, just went with Q8.
mk8933@reddit
Does Jan Ai have attachment support?
Psychological_Cry920@reddit
Hey, Jan attachment support is still WIP.
mk8933@reddit
Thanks
Papabear3339@reddit
Quants can degrade tiny models a lot harder than big ones.
There are special fine-tuning methods to try to get around that (like AWQ), but it sounds like that wasn't used here.
-p-e-w-@reddit
I know that, but IQ4_XS is smaller than Q4_K_M, by about 15%, and is generally considered a worse-quality quant.
lazarus102@reddit
'Really tight' is relative to the individual and their finances. It would be more helpful if you made a short list of which cards are the bottom requirement for each quant. Though context length and other settings may need to be factored in, for people to set realistic expectations.
Kooky-Somewhere-2883@reddit (OP)
Some of the MCPs like browser-use can easily eat up to 32k tokens in just a few steps if they come across a big site, but anyways I will share my own home setup.
This takes up to 9-10GB of VRAM.
lazarus102@reddit
"ngl" is a good setting. cuz you always want an AI that's not gonna lie.
Kooky-Somewhere-2883@reddit (OP)
ngl
Ok_Bug1610@reddit
Any thoughts on using Unsloth Dynamic 2.0 GGUF quantization to make UD-Q2_K_XL, UD-Q3_K_XL, and UD-Q4_K_XL variants? They offer 2-4x performance gains in my testing with very little accuracy loss, not to mention 5-10x using vLLM due to hardware underutilization (you can squeeze a lot more parallel processes out of a quantized model).
Is anyone here aware of an Android app that can run custom quantized models? I would love to test this out on pure mobile and also an RPi 4. Though the model isn't particularly capable, I can output near ~624 tok/sec on a single RTX 3070 8GB using Qwen 0.6B UD-Q2_K_XL. I'm going to buy a newer graphics card, but I don't have access to particularly top-end hardware, and I tested an Intel Arc A770 using Vulkan drivers and IPEX-LLM but I can't even reach a third of the 3070 results.
Kooky-Somewhere-2883@reddit (OP)
i think they have quant https://huggingface.co/unsloth/Jan-nano-GGUF
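If you want to pull one of those quants from the CLI, something like this should work (the exact filename is an assumption; check the repo's file list):
```
# Download a single Unsloth dynamic quant into ./models
huggingface-cli download unsloth/Jan-nano-GGUF Jan-nano-UD-Q4_K_XL.gguf --local-dir ./models
```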
Ok_Bug1610@reddit
Sorry, the UD (Unsloth Dynamic) models are not the same thing. I'd suggest reading Unsloth's paper above and their "Run DeepSeek-R1 Dynamic 1.58-bit" post. Basically, though the models are roughly the size of their "dense" quantized counterparts, the bit size is actually an average. Their dynamic quantization method quantizes the "critical" parts of the model at higher precision, while giving more aggressive compression to the less critical ones (and you can apply this more or less aggressively, depending on the target: speed, size, accuracy, etc.). This means they use 1-8 bit quantization (dynamically) with an "average" of X bits (~2.5 bits: UD-Q2_K_XL, ~3.5 bits: UD-Q3_K_XL, and ~4.5 bits: UD-Q4_K_XL). This yields better performance and accuracy over standard "dense" quantization, albeit slightly larger. It's worth a read and playing around with!
yoracale@reddit
The Jan GGUFs are Unsloth Dynamic 2.0! So we do use dynamic quantization for the quants.
Ok_Bug1610@reddit
Sorry, that wasn't immediately clear but awesome to hear, thanks. I will be testing them out shortly. It's nice to hear the method is becoming a standard. Awesome work, and thank you so much!!
az226@reddit
What is DAPO? Can you share the code?
Kooky-Somewhere-2883@reddit (OP)
https://arxiv.org/abs/2503.14476
fullouterjoin@reddit
Decoupled clip and Dynamic sAmpling Policy Optimization (DAPO) [sic] algorithm, and fully open-source a state-of-the-art large-scale RL system.
DAPO takes the cake as the most twisted brand-initialism I have seen in ML. 🎂
az226@reddit
Thanks!
General_Cornelius@reddit
Been 5 days, how are the vibes, anyone had a chance to use it?
Kooky-Somewhere-2883@reddit (OP)
I would say it's a mixed bag, but mostly our fault.
We have recently fixed the prompt template inside the GGUF so that it has better behavior (aka uses tools more) that matches the expected UX much better; you should try it.
General_Cornelius@reddit
Any good uses you have noticed? Looking to enrich information with search
Kooky-Somewhere-2883@reddit (OP)
of course it's good to use, please go ahead and run it with MCP.
the model is good, I think it's just setup challenges
Karim_acing_it@reddit
Amazing! I downloaded the Jan app (v0.5.17) for Windows, downloaded the Q8 by inserting the HF GGUF link (surprisingly, the Jan app doesn't advertise Jan-nano at all; you can't even find it when searching for it by name) and tried to replicate your prompt, albeit with a slightly different topic.
Thinking started, but no "deep research". Was my prompt just wrong or do I need to download/do anything else? Couldn't find anything in the Jan settings either. Sorry for the noob question... and congrats to your achievement!
Kooky-Somewhere-2883@reddit (OP)
To make the model go deeper you should ask it to give a scientific report, etc. Try pushing it a bit harder.
Karim_acing_it@reddit
Thanks, I just ask because others may read this as well: I even replicated your "Breaking news today about finance, highlight shocking ones" prompt and again, I just get it to start thinking.
I left everything at default settings as described above. How are we able to replicate your results? I am sure it's something wrong on my side, but I can't be the only one.
Kooky-Somewhere-2883@reddit (OP)
hi, just got time to get back to you.
we realized some of you guys may have issues due to flexible and different settings, so I decided to bake an enhanced system prompt into the model to make sure it tool-calls a bit more. You can re-download the GGUF now and you will see the difference.
Karim_acing_it@reddit
Amazing to hear, but I didn't have MCP enabled and hadn't set up your entire installation guide. Instead of relying on Docker etc., it would be absolutely amazing if all the MCP hosting could happen within Jan. I think this in itself would be the biggest gain over all other GUIs out there and make Jan-nano accessible to a muuuch larger audience. Thanks!
ley_haluwa@reddit
Off topic: What app/frontend is in the video? Looks minimalistic and polished
Psychological_Cry920@reddit
Hi u/ley_haluwa, this is Louis, a contributor to the Jan App featured in the video.
The app is still in its Beta phase, but we’d really appreciate it if you could give it a try and share your feedback.
Here are the links to our Beta build, we’d love for you to take it for a spin!
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImage
Let me know!
dionisioalcaraz@reddit
I downloaded the AppImage, but when I load the model (Jan-nano-UD-Q4_K_XL) from the GUI it takes a long time saying 'loading model...'; after some minutes the name of the model appears in the prompt just like in the video, but when I ask a question it starts 'loading model...' again without responding. The model works OK with llama.cpp. Any clue?
Psychological_Cry920@reddit
Hey, can you help me grab the log files in the app data folder? I will take a look then.
dionisioalcaraz@reddit
Thanks for your answer! I just needed to upgrade, now it loads the model fine. But now I have other issues, here is the log https://matcamp.neocities.org/app.log.txt
Psychological_Cry920@reddit
Ah, you will need Node.js installed for npx MCPs like browser mcp.
dionisioalcaraz@reddit
Thanks! Sorry to bother you again. I installed Node.js from the Debian repository, but I still can't activate browser mcp; now I have this message in the log:
+node: /tmp/.mount_Jan-be9MfriZ/usr/lib/libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /usr/lib/x86_64-linux-gnu/libnode.so.115)
I have openssl 3.5.0 installed, is that the problem?
Deleting all files created by Jan and running the AppImage again shows this in the log at launch:
[2025-06-19][18:57:03][app_lib::core::setup][ERROR] Failed to run mcp commands: Failed to read config file: No such file or directory (os error 2)
Psychological_Cry920@reddit
Hey, can you help me check in the app data folder to see if the mcp_config file is there?
dionisioalcaraz@reddit
Yes, here it is:
{ "mcpServers": { "browsermcp": { "command": "npx", "args": [ "@browsermcp/mcp" ], "env": {}, "active": false }, "fetch": { "command": "uvx", "args": [ "mcp-server-fetch" ], "env": {}, "active": false }, "serper": { "command": "npx", "args": [ "-y", "serper-search-scrape-mcp-server" ], "env": { "SERPER_API_KEY": "7683477c30d5fed0e0279cb33e0475c1b4f4a9ab" } }, "filesystem": { "command": "npx", "args": [ "-y", "@modelcontextprotocol/server-filesystem", "/path/to/other/allowed/dir" ], "env": {}, "active": false }, "sequential-thinking": { "command": "npx", "args": [ "-y", "@modelcontextprotocol/server-sequential-thinking" ], "env": {}, "active": false } }}
When I try to toggle the switch a pop-up window says: "Failed to start MCP server fetch: connection closed: initialize response. Please check the parameters according to the tutorial"
and in the app.log : "node: /tmp/.mount_Jan-bePdJ9Wo/usr/lib/libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /lib/x86_64-linux-gnu/libnode.so.115)"
dionisioalcaraz@reddit
Thanks for your answer! I just needed to upgrade, now it loads the model fine. But it can't connect to the internet; it shows "It seems there is an issue connecting to the browser extension. Please ensure that the Browser MCP extension is installed and connected properly....". Using the toggle switches in the MCP server settings I allowed MCP permissions and browsermcp, but it still doesn't work.
Kooky-Somewhere-2883@reddit (OP)
oh hi colleague, I was just about to reply to them. Yes, we recorded the demo using the amazing Jan; it's in beta but the look and feel of Jan is amazing.
MeYaj1111@reddit
I'm kind of a dumbass with this stuff. I installed the Jan beta linked above and I see the model "Jan-nano-Gguf" in the list available to download, so I installed that, but even using the identical prompt from your video it will not do any deep research stuff, it just spits out an answer immediately with no searching or anything. Can you point me in the right direction?
Kooky-Somewhere-2883@reddit (OP)
you need to install the Serper MCP
oxygen_addiction@reddit
How?
Kooky-Somewhere-2883@reddit (OP)
https://menloresearch.github.io/deep-research/
oxygen_addiction@reddit
Thanks.
Psychological_Cry920@reddit
Thank you u/Kooky-Somewhere-2883, I’m really glad you featured it!!!
Kooky-Somewhere-2883@reddit (OP)
Thank you for the awesome app !!!
No-Source-9920@reddit
i absolutely love the beta, so much more than the stable version; I've been using it since you made this post and it is amazing
Psychological_Cry920@reddit
Glad you like it!!
Hawk_7979@reddit
I tried this version of the app, and it's absolutely amazing. However, I discovered a security concern related to plaintext secrets stored in the MCP environment variables.
It would be better to store encrypted values once they've been saved, just like n8n does.
epycguy@reddit
in what way? either you have a password on startup (which can be keylogged) or you have the encryption key on disk which will just get stolen alongside your settings file
Psychological_Cry920@reddit
What a great point. We will work on this!
nerdyvaroo@reddit
That's a pretty big app :O almost a GB, damn, what all is in there?
Psychological_Cry920@reddit
Yeah, it's sad. We are shipping CUDA dependencies in recent versions. Thinking about downloading them in-app when needed.
nerdyvaroo@reddit
It's not a good idea to ship the CUDA dependencies in the app, in my opinion.
Better to ship without them: if CUDA is not available, just use the CPU for llama.cpp (it still didn't work for me on an NVIDIA GPU on Void Linux, so there is that; better not to have it). If I use my AMD Radeon 7700 XT, then those CUDA dependencies are just wasted storage.
Psychological_Cry920@reddit
A note here: we switched the app over to Tauri instead of an Electron build now, and that literally cut the app size down by more than half (only Linux is left, still around a GB - WIP)
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
nerdyvaroo@reddit
lol I use Linux so lemme know once that's working :D
Also HOLY SHIT that's a big improvement
Psychological_Cry920@reddit
Sure!!!
Psychological_Cry920@reddit
Yeah I do agree
Psychological_Cry920@reddit
This is definitely what should be fixed by the release!!!
undisputedx@reddit
Hi Louis,
I am trying this rc4 beta on Win 10, it's stuck on the loading-model phase.
Psychological_Cry920@reddit
Hey, thanks for reporting. We noticed a regression issue when running on Windows. Please help us update the app from the settings page or download the new beta build here:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
undisputedx@reddit
A bunch of errors during the rc6 update install, and it's still stuck on the loading-model phase.
Psychological_Cry920@reddit
Btw, it looks like you got a dangling process while updating. Can you help me find and kill the cortex-server process? Then continue the install process.
undisputedx@reddit
uninstalling and reinstalling rc5 has worked. Thanks.
Psychological_Cry920@reddit
yayyyyy
undisputedx@reddit
is there any tutorial to make mcp work?
Psychological_Cry920@reddit
Unfortunately, we haven't done the docs for this yet. For now, go to Settings > MCP Servers and you will see the default servers there. Start with fetch and toggle it on (a good one to start with); back in the chat you will see tools enabled. To try better search and scraping, you can go to the Serper website to get an API key, input it into the Serper MCP server env in the app, and then you will see the google_search and scrape tools enabled in chat.
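For reference, the resulting Serper entry in mcp_config.json looks roughly like this (format taken from a config shared further down this thread; replace the placeholder with your own key):
```
{
  "mcpServers": {
    "serper": {
      "command": "npx",
      "args": ["-y", "serper-search-scrape-mcp-server"],
      "env": { "SERPER_API_KEY": "<your-serper-api-key>" }
    }
  }
}
```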
Psychological_Cry920@reddit
Oops. Please help me share the cortex.log file, I will take a look at this. You can find the log file in Settings > Data Folder > Logs
__eita__@reddit
Hi there! I've been following Jan's progress and all I have to say is that you guys are making great progress!
Couldn't help but notice that you guys are also making a Tauri build. As someone who's interested in the concept of developing for cross-platform, could you describe a little bit your decision? Are you going full Tauri in the future? Also, are you planning to use Tauri for Android/iOS apps?
PS: great thing that you guys decided to go Apache License 2.0
Psychological_Cry920@reddit
Yessss! That is the reason why we decided to go full Tauri to scale to mobile. That is our goal for the next couple of sprints.
Psychological_Cry920@reddit
We did notice some limitations of Electron, particularly Node.js in terms of scalability. Tauri is an excellent choice, as we can optimize more from the Rust side for performance. Additionally, we can utilize native APIs (mobile plugins) to work with certain frameworks down to the native layer.
__eita__@reddit
Thank you for your response!
It would be awesome to hear about this experience on a blog post or something.
Also, it seems that you guys are going to provide llama.cpp integration?
I think that from all open source options for local LLM desktop apps, you are the group taking the best decisions in the last months.
Gonna be watching :)
Psychological_Cry920@reddit
Thank you for your kind words. We build in public, so please feel free to join us in some of the discussions. Yes, we'll work on some blog posts on this shift 🚀🚀
liquidnitrogen@reddit
What is the app for when we have gguf?
Psychological_Cry920@reddit
Hi, the Jan app is one of our in-house products, focused on a clean GUI for "normies". Everyone can use the GGUF file with other LLM apps that support tool use. In this version, we introduce MCP support to work with the model better, and it enables us to implement upcoming model updates tailored to different use cases that require additional work from the application layer to further boost the model's capabilities.
ab2377@reddit
this question was actually very on-topic, thanks!
Confident-Artist-692@reddit
Hi, just tried to load this into LM Studio but it threw up an error and wouldn't work.
Kooky-Somewhere-2883@reddit (OP)
Hi, you can check this issue, it might fix yours:
Kooky-Somewhere-2883@reddit (OP)
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
kadir_nar@reddit
Open Source 👑
Kooky-Somewhere-2883@reddit (OP)
Thank you <3
Healthy-Ad-8558@reddit
May I ask why you went with Qwen3-4b instead of Phi4 or maybe even Gemma 3? Follow up question, do you plan on using larger models in the future. One more thing, it seems like IBM's Granite 4.0 will be wickedly efficient, if it turns out to be as good as they're claiming it to be, would you consider using it?
Kooky-Somewhere-2883@reddit (OP)
we did try Gemma 3; we couldn't get the model to generate any correct samples, like at all, none, so it wasn't even improving.
could be our model or a training issue, or it could be that at the 4B size there aren't many great choices.
disillusioned_okapi@reddit
pretty cool stuff. btw, one reputable way to assert your claim here would be to get included in the Berkeley Function-Calling Leaderboard.
If the model is as good for general tool calling as it is in your MCP benchmarks, it might end up in the top 10, and might even replace xLAM-2-8b as the best small model for tool calling.
The process for testing and submitting new models is fairly well-documented. I hope you consider this 🙏
PrizeNew8709@reddit
Is there an API to consume this deep search mode?
Whiplashorus@reddit
Hello, how can I use the deep research mode?
I installed the latest 0.6 version and the Jan-nano Q8, but I can't find any button or even docs about it.
Psychological_Cry920@reddit
Hi u/Whiplashorus! Unfortunately, this feature is still in preview and we are working on 0.6.1, which will have it enabled. For now, you can download the preview build here to try it out: https://jan.ai/docs/desktop/beta
Kooky-Somewhere-2883@reddit (OP)
THANK YOU
r/LocalLLaMA
Since Sunday when we released the model:
- We have 15 downloads and increasing now
- We have thousands of upvotes from you guys
- We are trending on huggingface
As you may know, Menlo Research is a small research team that is trying our best. Community is everything to us. We will remember and take all the feedback to improve the models and the app.
We will keep you guys updated and release the technical report soon!
hi87@reddit
Thanks for sharing! Can you share which UI that is in your video?
yoracale@reddit
Jan AI - Apache 2.0 licensed. They were also the ones who trained the model: https://github.com/menloresearch/jan
Psychological_Cry920@reddit
Thanks u/yoracale for the share.
We'd really appreciate it if you guys could try it out and share some feedback.
Here are beta links:
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImage
hi87@reddit
Thank you for these links. I really love the simplicity and the snappy UI. Reminds me of Cherry Studio without the clutter (which is great).
Some feedback:
I can't seem to use any Gemini models from the providers. Even the older ones that show up don't all work. Tool calling doesn't work at all with the Gemini models.
After adding a new MCP server I had to quit and restart the application for the chat to register them.
Ability to configure agents with their own MCP servers/tools.
Thanks for sharing again
Psychological_Cry920@reddit
Thank you for great feedback
oxygen_addiction@reddit
Loving the app. Having LM Studio as a default provider would be good, as it is more popular than Ollama nowadays (at least for normies).
MmmmMorphine@reddit
My only real wish is for them to integrate some agent management into the UI. I can't seem to find any decent minimalistic ones that let you easily specify a set of models as agents and how they should interact, using a couple of drop-down menus.
WriedGuy@reddit
Open source gonna rule
Kooky-Somewhere-2883@reddit (OP)
LET GOOOO
WriedGuy@reddit
Hey, how did you add internet search, scraping, and the other features? I want to add my own features, how can I add them?
Kooky-Somewhere-2883@reddit (OP)
just install more MCPs
just don't use too many at the same time, the model is smol, it will get confused
Cluzda@reddit
how many MCPs is too many? what number is considered a sane amount?
On the other hand, you can often cluster some domains in sub-agents, right?
Kooky-Somewhere-2883@reddit (OP)
as in number of tools
Kooky-Somewhere-2883@reddit (OP)
anything beyond 10
hutoreddit@reddit
Please, can anyone teach me how to set up a search engine API in Jan? I didn't see any place to set up web search for Jan. What search and scraper API did you use for the video?
Please, anybody?
Kooky-Somewhere-2883@reddit (OP)
https://menloresearch.github.io/deep-research/
here bro
Psychological_Cry920@reddit
Hi u/hutoreddit, please download the Jan beta version here; search MCP is not available in the stable version yet:
https://www.jan.ai/docs/desktop/beta
hutoreddit@reddit
Thank you !
Plus-Childhood-7139@reddit
Crazy… curious why this doesn’t get the hype Deepseek received.
Kooky-Somewhere-2883@reddit (OP)
it's only beating them on one single aspect, which is doing tool calls and using a search engine, so obviously very niche.
still cool to use tho! very efficient, I use it to quickly summarize linked websites and do some web crawling and search!
Plus-Childhood-7139@reddit
Who cares. In the end I just need something that calls the right tools to do things. Pretty much an assistant I can call and she gets the job done.
lompocus@reddit
How many tokens did you use for the training? How expensive was the training?
Kooky-Somewhere-2883@reddit (OP)
i think if you convert the numbers to RunPod pricing, you can probably do it relatively cheaply, probably below $100 if you rent an H200 (because it's faster and doesn't take that much time)
lompocus@reddit
So cheap! When I look at your video I wish some queries got chopped-up (like, "search for x" and then subsequently "extract the highlights"). If I wanted to fine-tune like you did, what would be a good way to generate training data? Did you create a bunch of examples referencing a list of mcp adapters at mcprepository.com for example?
Kooky-Somewhere-2883@reddit (OP)
We will include these details in the upcoming technical report!!!!
RobotRobotWhatDoUSee@reddit
Ah, excellent, I am very interested in this technical report when you all have it. Thanks!
Kooky-Somewhere-2883@reddit (OP)
My demo isn't the best-performing version cuz I used my home computer with a q8 KV cache and the q8 weights.
In theory at this size I should host BF16 to get the best quality, but I only got 12GB of VRAM at home lol.
Kooky-Somewhere-2883@reddit (OP)
we use our homebrew A6000s.
8x A6000 for the training code.
4x A6000 for a vLLM server for inference and generating answers.
That_Neighborhood345@reddit
how many hours? So we can figure out the cost?
Kooky-Somewhere-2883@reddit (OP)
on an H200 it's around 2-3 hours
shing3232@reddit
It would be interesting to have a QAT variant of IQ4
RobotRobotWhatDoUSee@reddit
This looks great. Do you have a paper on the training, etc.?
Sussymannnn@reddit
Can you bring it on android?
Kooky-Somewhere-2883@reddit (OP)
we don't see any Android chat app that supports MCP yet
unum_omnes@reddit
Did you use a specific system prompt for this? I'm trying to replicate the behavior shown in the video, where the agent scrapes multiple web pages. Even with a system prompt that instructs the LLM to scrape multiple webpages, it usually only scrapes one.
ajmusic15@reddit
For some reason it is impossible for me to load models using Jan.ai on my 5080. No matter how different the llama.cpp settings are and how much of the weights I load on the GPU (depending on the settings), it always uses the CPU, while the tool's own task manager says it is using the GPU. I don't understand that.
AbaloneStriking9397@reddit
which chat/mcp client are you using buddy?
Psychological_Cry920@reddit
Hi u/AbaloneStriking9397, it's Jan, currently in beta. Please give it a try and share your feedback.
https://www.jan.ai/docs/desktop/beta
anshulsingh8326@reddit
what is this software? Doesn't look like a web ui
Kooky-Somewhere-2883@reddit (OP)
it's the Jan beta: https://www.jan.ai/docs/desktop/beta
MaruluVR@reddit
Since this is based on Qwen3, is there any chance of getting a 30B-A3B finetune with the same training data?
Kooky-Somewhere-2883@reddit (OP)
yes, but u see VRAM 😭
MaruluVR@reddit
If you do the training entirely locally I feel you.
But if you think using cloud GPUs for training is fine, then you can get a Nvidia H200 SXM with 141 GB for only $0.80 per hour.
Kooky-Somewhere-2883@reddit (OP)
wait where to get h200 only for 0.8
MaruluVR@reddit
https://hpc-ai.com/ had them on sale a few days ago, not sure if the sale is still ongoing though
Kooky-Somewhere-2883@reddit (OP)
thank you will take a look
These-Dog6141@reddit
can you pin a FAQ on how to make the model behave like in the video clip in the OP? I loaded the model (Q8) in Jan and it is not tool calling, just hallucinating
Kooky-Somewhere-2883@reddit (OP)
do you have MCP set up?
These-Dog6141@reddit
no, I don't think so, I just loaded the downloaded model in Jan and started chatting
Psychological_Cry920@reddit
Hey, here is a quick guide; we are still working on the docs, sorry for that. https://menloresearch.github.io/deep-research/
SilentLennie@reddit
Very interesting, there is a lot to be gained from fine tuning and working with good tool use and MCP.
Kooky-Somewhere-2883@reddit (OP)
yes!
SilentLennie@reddit
I already noticed that with tool use and MCP you could get better results; I hadn't tried fine-tuning yet.
Something I noticed: the GGUF file doesn't have tool support? Even though I see tools mentioned in the original Hugging Face repo.
Kooky-Somewhere-2883@reddit (OP)
oh really? can you point out where? sorry, we are really, really noob at GGUF - I'm better at training models
SilentLennie@reddit
So I was trying to use it with Ollama. I knew how to take a gguf file and use Modelfile to make a model in Ollama. Ollama says: no tools support.
I was trying to convert your file; I just compiled llama.cpp to see how it's done. First time for me too. Gemini is trying to help me.
Kooky-Somewhere-2883@reddit (OP)
oh, for Ollama you can use the Qwen3 template or something directly from Ollama
i have never used Ollama myself so I'm not very sure how to
can you try llama-server?
SilentLennie@reddit
I can use this:
```
./bin/llama-server -m models/jan-nano-4b-Q8_0.gguf --host :: --jinja
```
But I'm still trying to see if it actually works well for MCP in our case.
CC u/qnixsynapse
SilentLennie@reddit
`./bin/llama-server -m models/jan-nano-4b-Q8_0.gguf --host :: --jinja` does work.
qnixsynapse@reddit
Jan nano GGUF has proper tool support. I think Ollama uses custom chat templates. Please open an issue on their repo with this.
Kooky-Somewhere-2883@reddit (OP)
What will happen if you let the model run with a 128k context window and almost full precision??
I recorded the model running at almost full power, with the ability to do extremely long follow-ups on tool calling, making it give out a very good DeepResearch report.
Enjoy, here is the recording:
https://youtu.be/hnTnu-7q-WE
Kooky-Somewhere-2883@reddit (OP)
you can do the same by following the tutorial for setting up YaRN in the base model repo: https://huggingface.co/Qwen/Qwen3-4B#processing-long-texts
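With llama.cpp, that boils down to rope-scaling flags along these lines (a sketch assuming the usual Qwen3 YaRN setup of 4x scaling from the 32k native context; check the linked page for the exact values):
```
# Extend Jan-nano's context to 128k via YaRN rope scaling.
./llama-server -m jan-nano-4b-Q8_0.gguf --jinja \
  --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
  -c 131072
```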
Commercial-Celery769@reddit
How do I enable web search? I downloaded Jan and the Jan-nano Q8 but I don't see an option for it. Is it a custom tool that I'm missing or am I overlooking something? I looked at the docs on your site but for whatever reason many of the pages are showing up as broken HTML for me.
Mediocre_Leg_754@reddit
A beginner's question: for all these datasets that you have, how do you ensure that the underlying LLMs don't have any exposure to them?
GodIsAWomaniser@reddit
wow looks awesome!
Could I ask what MCP servers you hooked it up to and what interface you are using in the demo?
Interested to replicate.
Kooky-Somewhere-2883@reddit (OP)
this one
https://github.com/marcopesani/mcp-server-serper
GodIsAWomaniser@reddit
Sorry, as a follow-up question, could you please share with me what interface you're using? I have searched for similar ones and I couldn't find it.
Psychological_Cry920@reddit
Hi, it's Jan! Currently in beta. Please give it a try and share your feedback with us.
https://jan.ai/docs/desktop/beta
GodIsAWomaniser@reddit
ah, I downloaded the current release and went "wtf this isn't right"
I'll grab the beta and have a try.
Thanks!
Psychological_Cry920@reddit
Haha, thanks!
GodIsAWomaniser@reddit
TYSM!
Effective_Stage7405@reddit
Just to let you know: I've been trying this model on my 2022 Xiaomi Pro, 12GB RAM / 512GB storage, and it gives about 8 T/s, which is amazing!
I used SmolChat v0.9. Head here to download the APK:
https://github.com/shubham0204/SmolChat-Android
I used the 8-bit quant. I tell you, this is a game changer for offline mobile AI. Congrats guys!
Kooky-Somewhere-2883@reddit (OP)
thank you i will try the app
306d316b72306e@reddit
Like with most expert posts on here... I look forward to the benchmark that defends the claims; in this case, right in the post title.
No-Refrigerator-1672@reddit
Seems like it's a Qwen3 4B finetune, which raises a question: do you have data on performance degradation in summarization and multilingual tasks? I'm actually running a separate vanilla Qwen3 4B as an auxiliary model for non-mission-critical uses, and if your performance degradation is minimal, it would be tempting to replace it with your model and then use it for MCP too.
Kooky-Somewhere-2883@reddit (OP)
Our team uses a flip benchmark; it's a new way to check for degradation, and the results show 1-2% degradation. We will include this in our technical report.
But are you confident in your multilingual requirements for a 4B model? I can somewhat confidently say that if you are okay with the language ability of Qwen3-4B, our model will have relatively similar performance.
But again, not everyone is happy with the base performance of Qwen3-4B.
No-Refrigerator-1672@reddit
I need multilingual capabilities for tag generation in OpenWebUI. In my experience, the 4B variety can't say anything useful in languages other than English, but surprisingly it grasps the main idea of texts pretty well, so if you ask it to generate tags/a short summary in English for a chat involving another language, it accomplishes the task well enough. Particularly, I've tested it with Latvian, which is a European language with roughly 2M-3M speakers.
beryugyo619@reddit
My favorite test for these small Chinese LLMs is to ask them for recipes in Japanese. They start speaking like the cake core from Portal 1.
jgwinner@reddit
GLaDoS got out?
Let's hope Alibaba doesn't have any neurotoxin in stock
beryugyo619@reddit
Some of the hilarious ingredients came from a dozen or so Qwen3-4B and 0.6B responses to variations of
カレーの作り方 ("how to make curry") "/no_think"
yep she's out
jgwinner@reddit
haha Right, even without the green Neurotoxin she's trying to kill us.
Or dispose of us, I mean wtf AI: "human flesh miso"??
Then again, there was a guy that ate a 747, so countertop, maybe. I draw the line at the miso.
Kooky-Somewhere-2883@reddit (OP)
This is a crazy idea but you can also use a Google Translate or DeepL MCP.
Let bro translate it and read 😂.
istinetz_@reddit
to be fair, google translate is crazy expensive at scale
like, literally 10x more expensive than even using gpt-4o for translation
Kooky-Somewhere-2883@reddit (OP)
i never used the api, interesting to know
sub_RedditTor@reddit
Exactly. No need for a multilingual model, especially if it's for coding.
tookmyplates@reddit
lol, when I first started fine-tuning Qwen3 models I kept breaking them to the point where they reverted to speaking Chinese. Some of my models still use Chinese, but only when they feel like they're saying something that might ride the safety line in English.
VoidAlchemy@reddit
Is this "flip benchmark" you mention basically the Accuracy is Not All You Need paper type benchmarking? I'd be interested in more details of your benchmark implementation for more general use as well. Thanks and great job!
Kooky-Somewhere-2883@reddit (OP)
noted, we will include it; today is Sunday so we don't have the full team here to work on it, but surely very soon
coding_workflow@reddit
What dataset did you use for training? Mind sharing?
This sounds very interesting for tool use. Will give it a test.
How did you run the evaluation? Based on what?
Kooky-Somewhere-2883@reddit (OP)
Thank you everyone ❤️
This is the first time we have a model trending in the top 6 on Hugging Face, literally the first time.
Without y'all's support this wouldn't be possible.
DuxLunae@reddit
And that's a proper demo, not zooming in and out like it’s a Michael Bay movie
Kooky-Somewhere-2883@reddit (OP)
i loled 🤣🤣
Tim541@reddit
What is the app you're running it on?
Psychological_Cry920@reddit
Heyyy, it's Jan, currently in beta. Please give it a few shots, you can download it here:
https://jan.ai/docs/desktop/beta
In this beta build, we introduce MCP support and enhance the UI. Please give us feedback if you try it sometime.
CptKrupnik@reddit
what MCPs used? how do you facilitate search? is everything here local?
Kooky-Somewhere-2883@reddit (OP)
it's Google search, basically
CptKrupnik@reddit
with api-key? or something like playwright?
Kooky-Somewhere-2883@reddit (OP)
api key
serper.dev is very cheap
Kooky-Somewhere-2883@reddit (OP)
we use serper mcp
Kooky-Somewhere-2883@reddit (OP)
we did train 8B and 14B but haven't released them cuz they're not as finished, in terms of method, as this 4B.
I would say the big models learn much faster, but have overthinking issues.
pmttyji@reddit
Awesome. I'm gonna load this tonight. I bookmarked this HF page.
I have 2 suggestions for you on Jan AI.
These could push users to create tutorials on JanAI.
Thanks for JanAI(Using it since year start) & now for this model. I love Jan.
Psychological_Cry920@reddit
Thanks for all the great suggestions!
pmttyji@reddit
I have feedback (this is for the 0.5.17 version). Please feel free to ....
Thanks. Only today did I notice that you have released some minor versions. I'll check those.
Psychological_Cry920@reddit
Oh woaaa, I will note down all of this here for the next release.
pmttyji@reddit
thanks again
das_rdsm@reddit
I am confused here, the main branch and the dev branch seem to be really different, is the app going through big changes?
Psychological_Cry920@reddit
Ah yes, we reworked the UI and application backend to scale to mobile in the upcoming sprints.
- Switched from Electron to Tauri; this will help cut the app size in half and also enables us to build mobile targets.
- No more Next.js as the frontend; it's now React only. We are also moving off Cortex and integrating the llama.cpp server directly.
Ok_Appeal8653@reddit
You have to be careful, author. I tried it around a bit, and it was normal to ask a simple question and get no answer; its thinking gets stuck until the end of the max answer length with stuff like:
[...]
Final Answer:
The mechanical power input to an induction generator is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in such generators.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This relationship holds under ideal conditions where there are no losses in the system.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in an induction generator.
Final Answer:
The mechanical power input to an induction generator is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in such generators.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This relationship holds under ideal conditions where there are no losses in the system.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in an induction generator.
[...]
Kooky-Somewhere-2883@reddit (OP)
I know of many issues with the model; this one probably isn't the model but some sampling issue.
Psychological_Cry920@reddit
Hi u/Ok_Appeal8653, I think you have the context_shift setting ON by default from a previous version; please go to Settings > Provider > Llama.cpp and disable context shift in the settings.
You probably also want to increase the context size of the model; RC6 will prompt you to increase it when the issue arises.
Ok_Appeal8653@reddit
Ok, I think that this could be the problem, as I am not using the beta version of the app right now and I don't see this option. I will download the beta version and test it later, thanks.
Psychological_Cry920@reddit
Yayy!
Psychological_Cry920@reddit
Also, please don't turn on too many MCP servers or MCP tools, to keep the input prompt small, as this can quickly lead to an out-of-context-size issue.
Ok_Appeal8653@reddit
In theory, as I don't have the beta version, the model doesn't have any tools activated.
Psychological_Cry920@reddit
Ah copy!
DMTJones@reddit
What is the tool you're using to run the novel in that video?
Psychological_Cry920@reddit
Hi u/DMTJones, it's the Jan beta, with the Serper MCP tool enabled (you can see it in the beta app's settings page).
https://jan.ai/docs/desktop/beta
haikusbot@reddit
What is the tool you're
Using to run the novel
In that video?
- DMTJones
^(I detect haikus. And sometimes, successfully.) ^Learn more about me.
^(Opt out of replies: "haikusbot opt out" | Delete my comment: "haikusbot delete")
Severe-Video3763@reddit
What MCP tools do you typically give it access to?
Psychological_Cry920@reddit
Hi u/Severe-Video3763 I currently use Serper because other tools might produce a very large output that requires a very high context length setting.
SelectionCalm70@reddit
Which datasets did you use to post-train the model?
Severe-Video3763@reddit
Congrats on the launch. Any idea how Perplexity Sonar does on SimpleQA for comparison?
MagoViejo@reddit
Any way to make this work with Open WebUI + Ollama? I tried the beta Windows build, but it seems to only support llama.cpp directly.
Kooky-Somewhere-2883@reddit (OP)
can you try Open WebUI with llama-server?
i have not tested Ollama, but I heard from friends that some of them have templating issues.
you can refer to my fix here; the same applies to Ollama or LM Studio:
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1
HilLiedTroopsDied@reddit
Use the llama.cpp server Docker container, with a mount point to your model.gguf.
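Something along these lines should do it (the image tag and paths are assumptions; check the llama.cpp Docker docs for the current ones):
```
# Run llama-server in Docker, mounting the folder that holds the GGUF.
docker run -v /path/to/models:/models -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/jan-nano-4b-Q8_0.gguf --host 0.0.0.0 --port 8080 --jinja
```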
MagoViejo@reddit
thanks, that may actually work.
HilLiedTroopsDied@reddit
it does work, I do it on my home server < unraid
MagoViejo@reddit
I have put that prompt into the system prompt, and I see this kind of odd behaviour:
\boxed{Naná (2016)}
when asking about film adaptations of Émile Zola's books. I think I'm missing something else there, as just a model and a template will not make an MCP server; I think I need to configure the tools that will be executed somewhere, right?
Kooky-Somewhere-2883@reddit (OP)
you need to set up MCP
MaximaProU@reddit
A bit off topic but is Jan faster than Ollama?
Kooky-Somewhere-2883@reddit (OP)
Jan uses llama-server; you need to ask if llama.cpp is faster than Ollama
MaximaProU@reddit
Ollama also uses llama.cpp. Jan uses cortex.cpp, which is based on llama.cpp.
So is cortex.cpp faster than Ollama?
Kooky-Somewhere-2883@reddit (OP)
Currently Jan only uses Cortex as a proxy, and soon we will be llama-server native.
llama.cpp IS AMAZING!
Kooky-Somewhere-2883@reddit (OP)
and frankly I don't know 🤣, I think llama-server is clean and good and llama.cpp is amazing !!!!!
Yogeshwar_maya@reddit
Why is it not running in LM Studio?
Kooky-Somewhere-2883@reddit (OP)
u can try https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
Yogeshwar_maya@reddit
Thanks!
ZiggityZaggityZoopoo@reddit
Yeah, obviously. Qwen has officially embraced MCP while DeepSeek couldn’t care less
kironlau@reddit
I can successfully load the model in Jan, but how do I load it on other llama.cpp platforms?
I cannot load it and chat in LM Studio; it seems to use a different chat template.
Kooky-Somewhere-2883@reddit (OP)
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
kironlau@reddit
thanks
Caffdy@reddit
maybe a dumb question, but what software is that in the video? what frontend?
Psychological_Cry920@reddit
Hey, it's Jan; this is our new beta version that supports MCP and works better with the Jan-nano in the post.
https://jan.ai/docs/desktop/beta
Caffdy@reddit
took a quick glance at the project and it looks very interesting. Just one quick question: what is your privacy policy? Does Jan collect or log chats or send anything to an external server/service?
Psychological_Cry920@reddit
Hey, we don't upload any logs!!! Everything stays on your machine.
myfavcheesecake@reddit
Hi!
I'm using:
Jan-beta_0.5.18-rc4-beta_x64-setup.exe
I'm on Windows 11 and I'm unable to load any models.
The stable version itself works, though.
Psychological_Cry920@reddit
Hi u/myfavcheesecake, could you please share the log file with me? I will take a look at this.
myfavcheesecake@reddit
Hey looks like the new beta solved this! Thanks!
Psychological_Cry920@reddit
Nicee! Thanks
Psychological_Cry920@reddit
Oops! I can reproduce it here as well, on it!!!
Psychological_Cry920@reddit
Hi u/myfavcheesecake, we just pushed a new version to fix the issue. Could you please update the app again (App > Settings > Check for updates) or download it from here: https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
myfavcheesecake@reddit
Thanks! Works great now!
Psychological_Cry920@reddit
Yayyy!
ShinobuYuuki@reddit
Yeah, we switched the app over to Tauri instead of an Electron build now, and that literally cut the app size down by more than half; it is also snappier imo
Optimal-Builder-2816@reddit
Are there any specific instructions on how to ask this model to use provided tools? Wondering if there's prompting guidance.
Kooky-Somewhere-2883@reddit (OP)
we baked a default prompt into the chat template, you don't need to do much
Optimal-Builder-2816@reddit
Oh ok, cool. Does Qwen normally need any additional prompting for tools?
Kooky-Somewhere-2883@reddit (OP)
you can use Jan-nano without prompting, just the default MCP prompt (MCPs come with prompts); for Qwen I need to use a very, very long system prompt with multiple examples and it's still very hit and miss
Optimal-Builder-2816@reddit
Excited to check it out!
eggs-benedryl@reddit
Okay, so this is all over my head but it did make me look into MCP stuff.
I'm curious if I have this right. Before this, agent use and MCP tools would be handled similarly to ComfyUI or what have you: a series of automated handoffs, query to result to query, etc.
When I test my confirmed-working MCP Wikipedia tool, all prior models appear to complete the very first step and no further. This Jan model appears to do it all, one query to the next.
Is that why this is so revolutionary? Because what I'm getting from this Jan model wasn't nearly as simple and easy before. Bravo, this is dope.
Kooky-Somewhere-2883@reddit (OP)
thank you.
we just trained the model to visit pages when needed, using RL.
Phantomx_77@reddit
In particular, how much did the training cost?
Kooky-Somewhere-2883@reddit (OP)
we used our internal cluster, which is just A6000s
if you rent on RunPod, I believe anything below $100
Phantomx_77@reddit
Nice! I have an RTX 4050 in my laptop, what can I expect with it?
Kooky-Somewhere-2883@reddit (OP)
it should work
Unhappy-Branch3205@reddit
Awesome! Great job!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
Tricky_Reflection_75@reddit
What MCP is showcased in the video?
Kooky-Somewhere-2883@reddit (OP)
serper mcp
TheREXincoming@reddit
Damn the model is so smol and so gud
Kooky-Somewhere-2883@reddit (OP)
thank you 😍
Admirable-Bedroom-65@reddit
Are you planning to add reasoning to this anytime soon?
Kooky-Somewhere-2883@reddit (OP)
probably, but it needs a lot of changes
Edzomatic@reddit
I just tried it (Q8) with the Jan UI, and web search with Serper worked really, really well; however, with the other tools like browsermcp its performance was quite poor, but I guess that's to be expected from a 4B model.
Also, since the model is fine-tuned from Qwen3, is it supposed to have thinking? Because it didn't think during my testing.
Kooky-Somewhere-2883@reddit (OP)
we trained the model in non-think mode, so it might not perform well with thinking
Asleep-Ratio7535@reddit
Is this Jan? It looks so different from before.
Psychological_Cry920@reddit
Yes, we completely revamped Jan from the UI to the application backend. It's now quite customizable in terms of appearance, so Alan changed the settings as shown in the video. For now, it's quite lightweight in terms of size compared to previous versions (it's still heavy on Linux, and we're working on it).
SaratogaCx@reddit
I just tried out the beta and please let me go full width in the chat interface. Every AI chat seems to have decided that I can't pick my own line length and it sucks. The released version lets me set the width to fill the window but I couldn't find a setting in rc5 that would let me (I could have missed it though).
Asleep-Ratio7535@reddit
Great job, I am enjoying it right now. It looks great! I found a bug (?) in the default models from remote providers: they're hardcoded, and deleting one model just puts it at the end of the queue. But the MCP looks very decent; I think this is a killer feature now. I will use Jan for the API now.
Psychological_Cry920@reddit
Woho! Thanks for the bug report. Noted it down here 🚀🚀🚀
epSos-DE@reddit
Nice interface. It needs a canvas mode and a folder with canvas files for the canvas mode!
Kooky-Somewhere-2883@reddit (OP)
hope we will have that soon
MichaelBui2812@reddit
This is really interesting given that the model is small enough to run on almost any consumer PC. I have some questions:
Thanks a lot!
Kooky-Somewhere-2883@reddit (OP)
yes, it can use the browser, just use the browser-use MCP.
you just need a hard question; the model will decide it wants to read the details, purely agentically.
SupeaTheDev@reddit
Very interesting. Stuff like this is what might change the world the most. We need LLM costs closer to zero to have agents everywhere in the world (which I believe will be the end result).
Dr4x_@reddit
Hi, I'm not an expert so I'm trying to understand what is the breakthrough here. If I recall correctly qwen3 models are already good at MCP tool calling, so from what I understand the improvement is about better information extraction from a response provided by a research tool. But I feel I'm missing something
quan734@reddit
i think they ran ReCall/ReSearch RL on top of Qwen3-4B, so it's better at multi-hop search, not just MCP/tool calling
nguyenm@reddit
80% of the reason why I pay for a GPT Plus subscription, and am thinking of switching to Gemini Pro, is the Deep Research function. This is pretty solid for a developmental build, so keep up the good work. Although I would take a wild guess that context length would be a plausible bottleneck if the search query action produces less-than-stellar results.
With o3 Deep Research, sometimes it shows thinking tokens as if it can watch videos and listen to audio, but I am skeptical of that. For context, I was asking it if Thanos had snapped to the Addams Family song, the original one, how many of us fleshbags would be left. The thinking tokens mentioned it tried to watch the 1960s video of the theme song, then produced a result of 33 snaps, after which there'd be 8 humans left on earth.
Tangentially, I almost always default to Xanh SM now because of predictability in bike model and slightly-more professional drivers (due to the strictness of Vin Group management lol).
IrisColt@reddit
🤣 Sudden comedy gold.
sonik13@reddit
I've been using AI incorrectly this whole time.
mynameismypassport@reddit
Incorporating the 'Addams reasoning benchmark' into future tests for sure.
IrisColt@reddit
By the way, tricky question (the snapper should also be considered in the population). Added to my battery of excellent questions. Thanks!!!
Kooky-Somewhere-2883@reddit (OP)
Yes, you are on point! We will need to evaluate the context length issue more, do more deep-research-related development, and also build a bigger model.
Stay tuned!!!
Latter_Virus7510@reddit
🔥🥰❤️
Kooky-Somewhere-2883@reddit (OP)
🥰🥰🥰
SandboChang@reddit
maybe a silly question as I am not familiar with MCP: Do you need to pair it with a specific search engine like Google for it to work? Or does it crawl the internet on its own? My understanding is that you need to have some index for a search to be effective, so I wonder what is used in this aspect.
Or does the search here look over a local database?
Kooky-Somewhere-2883@reddit (OP)
you can also use a local RAG database if you have one, as long as it has an MCP
Kooky-Somewhere-2883@reddit (OP)
it needs a search engine MCP, like the Serper MCP
Swimming_Power_2960@reddit
I am a bit confused. What exactly is it better at compared to all the other models mentioned in the post? Because the model itself can't be better, right? It's super small compared to the huge models listed in the "SimpleQA" part of the post. I am genuinely confused about why and how this model is so special.
Kooky-Somewhere-2883@reddit (OP)
you can try setting up Qwen3-4B with some MCPs and then this model; you will see the difference immediately
Swimming_Power_2960@reddit
https://imgur.com/a/pFyoUuB maybe I am using it incorrectly, but so far I can't say I am impressed.
Swimming_Power_2960@reddit
So it's just extremely good at tool calling for its size? Is that what makes it special?
qnixsynapse@reddit
Awesome. This tiny 2.3GB model calls tools like a pro, man!
Kooky-Somewhere-2883@reddit (OP)
Born This Way
ForceItDeeper@reddit
that would make this a pretty solid choice for something like Home Assistant, right?
I haven't paid much attention lately and I've got some reading up to do. I'm just learning about MCP, and between that and the capabilities of a 4B model, it makes me think LLMs have gotten to the point where they can do mostly everything I'd want them to.
Thanks for the time and work put into this. I love the idea of AI without sacrificing every bit of privacy, and I appreciate all you peeps smarter than me working to make that happen.
Kooky-Somewhere-2883@reddit (OP)
This is pretty amazing; I think it will make something like Amazon Alexa, but like much, much smarter?
You_Wen_AzzHu@reddit
How to enable deep research on the Linux UI?
Kooky-Somewhere-2883@reddit (OP)
because the model is very under-prompted, you can use some prompting + MCP (a search API MCP) and it will carry out the research for you; just ask it to research deeply, etc.
You_Wen_AzzHu@reddit
i sense some config is missing
Kooky-Somewhere-2883@reddit (OP)
i have a sample vllm serve command for u:
```
vllm serve Menlo/Jan-nano --host 0.0.0.0 --port 1234 --enable-auto-tool-choice --tool-call-parser hermes --chat-template ./qwen3_nonthinking.jinja
```
You_Wen_AzzHu@reddit
Fixed tool use on the model and added the Serper API key, but the research stops after the Google search.
Kooky-Somewhere-2883@reddit (OP)
oh, in this vLLM version we didn't include the default deep-research prompt; can you try including this:
```
In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Tool Use Rules
Here are the rules you should always follow to solve your task:
Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
If no tool call is needed, just answer the question directly.
Never re-do a tool call that you previously did with the exact same parameters.
For tool use, MAKE SURE to use the XML tag format as shown in the examples above. Do not use any other format.
```
You_Wen_AzzHu@reddit
Added it to the instructions section of the assistant. Same issue. Changed "one tool per message" to all tools, same issue.
Kooky-Somewhere-2883@reddit (OP)
oh I see, can you reduce the number of tools? Just keep web search and scrape, then retry; too many tools can be confusing for the model atm
You_Wen_AzzHu@reddit
Added it to the Jinja template. Same issue. Serper is being queried, since I see the request count increase on the dashboard, but the Serper search results are not shown in the UI. The Jan-beta log shows "found tool google_search in server" and nothing else. Will try to limit the tools. Thanks.
Kooky-Somewhere-2883@reddit (OP)
this is very strange; I recommend you download the llama.server version inside Jan to test and compare, maybe something is off
Kooky-Somewhere-2883@reddit (OP)
use q8
You_Wen_AzzHu@reddit
My mistake. The google_search tool does work indeed, but since the result is very minimal I mistakenly thought it would make further calls to the other tools.
Kooky-Somewhere-2883@reddit (OP)
damn, have you considered that the answer was actually sufficient? Because if it finds the information is enough, it won't try further tools
Kooky-Somewhere-2883@reddit (OP)
click on this, and enable tool use
Aromatic_Fun_6118@reddit
How did you set it up like that? Is there any guide on how to do that for beginners? :) Thank you for your work!
Kooky-Somewhere-2883@reddit (OP)
you can download the beta jan app that is being posted around this comment section
Psychological_Cry920@reddit
Thank you for your kind words. We build in public, so please feel free to join us in some of the discussions. Yes, we'll work on some blog posts about this shift 🚀🚀
Kooky-Somewhere-2883@reddit (OP)
It's quite funny seeing Jan-nano talking about .... itself
nerdyvaroo@reddit
Kooky-Somewhere-2883@reddit (OP)
Candace?
nerdyvaroo@reddit
candace nuts fit in yo mouthh?
vinigrae@reddit
Goteem
Kooky-Somewhere-2883@reddit (OP)
ew
nerdyvaroo@reddit
;-; its the joke for when someone says candace... Apologies.
spiffco7@reddit
Thank you team Menlo
Kooky-Somewhere-2883@reddit (OP)
omg thank you for trying out ❤️
Clueless_Nooblet@reddit
I tried with Jan-nano, endless "loading model" loop. what am I doing wrong?
Psychological_Cry920@reddit
Hi u/Clueless_Nooblet, which beta version are you using there? We just patched a model loading issue in the new beta version. Please update to this version: https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Clueless_Nooblet@reddit
The google search tool seems to have problems, though. Even with rc6
Psychological_Cry920@reddit
Ah, you will need a Serper API key set in the MCP server env. Otherwise it's just a placeholder key 😂
Clueless_Nooblet@reddit
Oh my, I'll have to look all that stuff up. Thanks a lot for your help :)
Psychological_Cry920@reddit
Haha, sorry for the inconvenience caused. We will think about a better UX (e.g. a built-in free endpoint setup)
Clueless_Nooblet@reddit
It's fine! I know your pain, I created Writingway (also on github). AI is still early days, we're suffering through all kinds of problems still ;)
Thanks a lot!
Psychological_Cry920@reddit
Sweeeet!!
Clueless_Nooblet@reddit
Hi, I just updated to rc6, and it seems to work now!
Psychological_Cry920@reddit
Yayyy!
Kooky-Somewhere-2883@reddit (OP)
oh wait, I realized this is an app issue
Kooky-Somewhere-2883@reddit (OP)
please use the recommended sampling params of qwen3
ImprovementMedium716@reddit
Add help guides for how to integrate mcp servers please
Psychological_Cry920@reddit
Heyy, please go to Settings > MCP Servers. You will see all the default servers there. Start with fetch (good to start with), then later Serper, once you grab an API key from their website and fill it in there
WhoKnows_Maybe_ImYou@reddit
Can I use this model with Ollama and Open-webui?
Kooky-Somewhere-2883@reddit (OP)
yes but i believe ollama has some template issue now
kyznikov@reddit
how do I use the Google search and scraping feature? When I ask the model, it says it has limited access and can't access the internet. I'm using the Jan app with your model jan-nano-4b-Q8_0
Psychological_Cry920@reddit
Hi u/kyznikov, so sorry for that, we patched the issue just now. Can you update the app or download the new beta version from this link?
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
kyznikov@reddit
it works now but the model can't access the internet, what's wrong?
Kooky-Somewhere-2883@reddit (OP)
oh there is something wrong with mcp, very strange you see the error returned for some reason it cant search
kyznikov@reddit
I think it works now. I didn't know I had to open Chrome, install the Browser MCP extension, and keep it open for it to work
Kooky-Somewhere-2883@reddit (OP)
amazing, but browser use can eat a lot of VRAM; using Serper and scrape is the way to go (if you have the API key)
Psychological_Cry920@reddit
Hmm, weird. Do you have a valid Serper API key there? Please check it again to see if there is any typo
kyznikov@reddit
it works now, but it seems it can't access the internet. what's wrong?
DesperateAdvantage76@reddit
How do you keep Google's AI response at the top of the search from skewing your results?
Kooky-Somewhere-2883@reddit (OP)
we don't have this objective; it's optimized for the correct response
Psychological_Cry920@reddit
Hi everyone 👋, we just patched the model run issue on Windows, please help us update the beta build or download from here. So sorry for that!
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Kooky-Somewhere-2883@reddit (OP)
so fast
Psychological_Cry920@reddit
Thanks! Mate
Kooky-Somewhere-2883@reddit (OP)
amazing
Jack_Fryy@reddit
What chat app is that in the video?
Psychological_Cry920@reddit
Hi u/Jack_Fryy, it's Jan - beta version, please help us try it out here.
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Jack_Fryy@reddit
Cool will try check it out
Psychological_Cry920@reddit
Yayy!
Psychological_Cry920@reddit
Hello everyone! This is Louis from the Menlo Research team, currently contributing to Jan App. I’m thrilled to see that Jan-nano has received such positive feedback from everyone. I’d love to ask for your help giving our beta app a few shots, the more feedback we receive, the better our upcoming release will be. 🙏 Here is the beta build link:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Thank you everyone.
hungry_hipaa@reddit
Is there a plan to support Apple Silicon? It shows it's not detecting a GPU. Also being a little lazy here and will look into it but can I run Serper and the other tools? This is on a MBP M2 Max 96GB. Wish you all the best on this project and will be following closely even if it has to be on a non Mac :)
Psychological_Cry920@reddit
Hi, it supports Apple Silicon by default. You got a monster there!!!!! Woho
Kooky-Somewhere-2883@reddit (OP)
Hello Louis this is Alan
Psychological_Cry920@reddit
Hi mate!!!!
MikeBirdTech@reddit
Love this 🥰
Menlo keeps adding value to open source AI
Psychological_Cry920@reddit
Thank you!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
whisgc@reddit
This is awesome work. Thanks and kudos to the entire team. All the best
Psychological_Cry920@reddit
Thank you!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
xxPoLyGLoTxx@reddit
Looks interesting - thanks for creating.
The processing seems to be referencing URLs. Is the model entirely local or does it access the web?
Kooky-Somewhere-2883@reddit (OP)
it accesses the web through a Serper MCP (a tool to scrape the web and do Google searches)
xxPoLyGLoTxx@reddit
OK thanks for answering! Might be a dumb question, but is there any way to ease woes about privacy of the prompts?
This looks really cool as a perplexity alternative and I'm excited to try it. But I do value privacy.
Kooky-Somewhere-2883@reddit (OP)
there are many web-search API providers; maybe try the Brave API instead of Google Serper?
Loighic@reddit
How can I set up tool calling like you have here?
Thin_Protection9395@reddit
How are people using MCP with local models these days? I use the openai python api mostly, but I feel like there might be a more up-to-date way?
Kooky-Somewhere-2883@reddit (OP)
I use FastMCP, it's very cool, check it out!
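if it helps, a minimal FastMCP server is only a few lines; this is a sketch assuming the `fastmcp` Python package, with a made-up `add` tool:
```
# Minimal FastMCP server sketch; the add tool is just a made-up example.
from fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, ready for an MCP client to attach
```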
Exarch_Maxwell@reddit
What's the UI called?
Psychological_Cry920@reddit
Hi it's Jan, please help us give it a few shots. Here is the beta build link:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Asleep-Ratio7535@reddit
https://github.com/menloresearch/jan
AdditionalWeb107@reddit
I am curious what is the baseline performance without MCP. Why does MCP help improve performance?
Kooky-Somewhere-2883@reddit (OP)
it Google-searches better, so it finds more information to answer you better.
Just like a human coder is better with Stack Overflow.
AdditionalWeb107@reddit
Oh I see. So in essence the tool calling capabilities are enhanced and hence the quality of input to the model?
Kooky-Somewhere-2883@reddit (OP)
sorta, but even so the base model doesn't Google-search as well, for example. So this one makes better decisions about when and how to use tools
Psychological_Cry920@reddit
Everyone, there's a regression issue with the Jan beta Windows build. We're working on a new beta build to patch it (estimated time: 1-2 hours). So sorry for this.
AppearanceHeavy6724@reddit
Very strange to see SimpleQA used with RAG.
Kooky-Somewhere-2883@reddit (OP)
well, if you consider Google search to be RAG
AppearanceHeavy6724@reddit
Of course it is. Why? Everything that pulls something into context from some external storage and summarizes it is a form of RAG.
Kooky-Somewhere-2883@reddit (OP)
Well then yes, because Perplexity is also using this benchmark, if you view it that way. We just aim to have some relevant counterparts to benchmark against.
kiruz_@reddit
I tried to use the Jan app together with your model, but whenever I download the Q8 model and try to run it, it never loads. I checked my resources and it looks like it doesn't try to load anything, as GPU and RAM usage stay the same. I'm on a 9950X3D + 5090. LM Studio works fine for me. Any idea how to debug your app? I tried the beta version.
Psychological_Cry920@reddit
Hi u/kiruz_, we noticed a regression issue with the Windows build. We will push a new build in 1-2 hours. Please update and try again then. Will let you know.
kiruz_@reddit
Lovely, thanks for the quick reply! Will check for sure after the new build
RedditDiedLongAgo@reddit
Why do the corpos always use throwaway accounts?
Kooky-Somewhere-2883@reddit (OP)
I am a real, live person :) I'm the first author of most papers from Menlo
jeez
Kooky-Somewhere-2883@reddit (OP)
and we are a small startup as well, not corpo
Outside-Ordinary3603@reddit
wait a minute you are stating this right?
**THIS LOCAL 4B MODEL CAN:**
UNDERSTAND ANY SHORT FACT-SEEKING QUESTION IN ANY DOMAIN (e.g. who was the author of the painting Guernica?)
SEARCH ONLINE FOR THE ANSWER
GIVE THE CORRECT ANSWER 85% OF THE TIME OUTPERFORMING THE BEST LLMS AVAILABLE
correct?
Kooky-Somewhere-2883@reddit (OP)
current number is 80.7% we aim to get 85 in next version
RedditDiedLongAgo@reddit
So... no?
Kooky-Somewhere-2883@reddit (OP)
i just want to correct the 85% in the comment
Kooky-Somewhere-2883@reddit (OP)
and it's on SimpleQA, not literally everything
arleth94@reddit
Congratulations! 🎉
Psychological_Cry920@reddit
Thank you!!
Kooky-Somewhere-2883@reddit (OP)
Thank you 😍
bwjxjelsbd@reddit
Does this support MLX?
Kooky-Somewhere-2883@reddit (OP)
We have not converted it to MLX since Jan does not support MLX right now.
Feel free to convert the base weights to MLX yourself.
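if anyone wants to try, something like this should do it, assuming mlx-lm's standard convert entry point (double-check the flags against the mlx-lm docs):
```
pip install mlx-lm
python -m mlx_lm.convert --hf-path Menlo/Jan-nano -q
```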
NoPresentation7366@reddit
Wow, I'm super impressed. I've run some inferences, and for such a small model it's very impressive. Thank you for your work, mates! 💗😎
liquidnitrogen@reddit
just tried this on LM Studio on Mac and got `Error rendering prompt with jinja template: "Error: Parser Error: Expected closing statement token. OpenSquareBracket !== CloseStatement.`
Kooky-Somewhere-2883@reddit (OP)
hi please check this solution
https://www.reddit.com/r/LocalLLaMA/s/VzIp5wqihF
liquidnitrogen@reddit
Thank you, it works great now !!!
RichardPinewood@reddit
What is that model equivalent to OpenAI / Claude ?
lyhiving@reddit
Nice find
RIPT1D3_Z@reddit
Great job!
Psychological_Cry920@reddit
Thanks!!! The team worked very hard on this. It would be great if you can try our app beta version as well. https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Due-Memory-6957@reddit
What the hell was this music lol
Kooky-Somewhere-2883@reddit (OP)
WHATEVER IT TAKES!
CritStarrHD@reddit
whoa! im curious is this the best open source deep research model available? what other options do we have?
Kooky-Somewhere-2883@reddit (OP)
there are many ways to do deep research; this model excels specifically in the agentic way.
smolagents, for example, is pretty good in a plan-based or workflow-based setup.
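for that workflow style, a rough smolagents sketch could look like this (the local endpoint, port, and model id are assumptions, and DuckDuckGo stands in for whatever search tool you prefer):
```
# Rough smolagents sketch for plan/workflow-based deep research.
# The api_base, port, and model_id are assumptions for a local OpenAI-compatible server.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

model = LiteLLMModel(
    model_id="openai/Menlo/Jan-nano",
    api_base="http://localhost:1234/v1",
    api_key="none",
)
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
print(agent.run("Who was the author of the painting Guernica?"))
```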
CritStarrHD@reddit
nice! I'm tryna run the model on Jan beta and it seems to be stuck on loading the model, any way to fix this?
notwhobutwhat@reddit
Amazing effort, this is the sort of thing that makes me wonder if smaller language models that can leverage tools effectively are the future of personal AI use cases.
Burning massive amounts of cash on hardware and power to host 671B (or 1T+ in the case of GPT) parameters, when something like this can collate plenty of relevant context from tools and then sort and understand it effectively, seems completely wasteful.
(PS loving Jan to date, but stacking it with this model out of the box as a turnkey AI solution for personal use? Game changer.)
stoppableDissolution@reddit
Bunch of specialized small models >>> one huge generalist. Bet this is the way the field is going to advance.
Psychological_Cry920@reddit
Yessss! This is Louis from the Jan team. We’re working on packaging our flagship models with Jan to help handle some of the common, repetitive, everyday tasks the team deals with.
Kooky-Somewhere-2883@reddit (OP)
Probably Jan team will do this!!
Love the new UI (I'm from research at Menlo)
Muted-Celebration-47@reddit
How to use mcp in jan?
Psychological_Cry920@reddit
Hi u/Muted-Celebration-47, please help download the beta version from the link below, then go to settings -> MCP Servers. You will see some default servers there, and you can add your favorite servers from there as well.
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Kooky-Somewhere-2883@reddit (OP)
yes mcp only in beta Jan now
ExplanationEqual2539@reddit
Crazy bruh
Kooky-Somewhere-2883@reddit (OP)
BRUH
osamaromoh@reddit
BRUHH
Kooky-Somewhere-2883@reddit (OP)
bruhhh
osamaromoh@reddit
I’m gonna test your model in a bit. How is it doing with structured outputs?
Kooky-Somewhere-2883@reddit (OP)
it should do very well; the input and output of tool use are in XML and JSON
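if it helps, the tool-call output is roughly the Hermes-style format that the vLLM command above parses; something like this (the tool name is just an example):
```
<tool_call>
{"name": "google_search", "arguments": {"query": "author of Guernica"}}
</tool_call>
```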
osamaromoh@reddit
I’m gonna integrate it into my PydanticAI workflow and give it a test.
Kooky-Somewhere-2883@reddit (OP)
I don't think it will do well 🤣 but hopefully it does; please let us know the result
ozzie123@reddit
What’s this UI ur using?
Psychological_Cry920@reddit
Hi u/ozzie123 this is Jan app (still in beta), we also trained the model in the post.
Psychological_Cry920@reddit
Can you help us give it a try? Beta build below, thanksss!
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
JLeonsarmiento@reddit
Excellent.
Kooky-Somewhere-2883@reddit (OP)
You're welcome <3
jasonhon2013@reddit
Wowwwww that’s insane
Kooky-Somewhere-2883@reddit (OP)
🤯
jasonhon2013@reddit
But why is the 4B that slow? 🤔🤔 Mistral 7B seems faster
Kooky-Somewhere-2883@reddit (OP)
it's pretty fast on my RTX A2000, I have not really tried Mistral these days
jasonhon2013@reddit
Ahhh I see, I see. Mind if I ask how many tokens per second?
maifee@reddit
What UI is this??
Psychological_Cry920@reddit
Hi u/maifee this is Jan - (Beta - Apache 2.0). We trained the model in the post: https://github.com/menloresearch/jan
Psychological_Cry920@reddit
The app is still in its Beta phase, please help us try it out here:
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImag
maifee@reddit
Is this app open sourced??
Commercial_Key_9023@reddit
yeah definitely!
check it out here: https://github.com/menloresearch/jan
gowisah@reddit
Thank you. I will try this out. Really interesting.
Kooky-Somewhere-2883@reddit (OP)
thank you we worked hard on it
Beb_Nan0vor@reddit
Can you share the MCP tools you are using for this? And also, does Jan work with the RTX 50 series GPUs? I am trying the latest Jan beta and its been stuck on loading the model (or any model for that matter).
Psychological_Cry920@reddit
Hi u/Beb_Nan0vor, this is Serper MCP Server. It should be. Do you have any advanced settings for the model? Or better yet, could you share some logs with me so I can take a look?
Beb_Nan0vor@reddit
Thanks for the reply. I'll send it in a direct message.
Kooky-Somewhere-2883@reddit (OP)
I use Serper: https://github.com/marcopesani/mcp-server-serper
yes, Jan supports the latest CUDA!
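if you're wiring it up by hand, the MCP server entry usually looks something like this; the package name is an assumption based on that repo, so double-check its README, and the key is a placeholder:
```
{
  "mcpServers": {
    "serper": {
      "command": "npx",
      "args": ["-y", "serper-search-scrape-mcp-server"],
      "env": { "SERPER_API_KEY": "<your-serper-key>" }
    }
  }
}
```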
Beb_Nan0vor@reddit
Thank you! :)
aknight2015@reddit
I've got an older laptop with 8 gigs of RAM. Can I run this locally?
Kooky-Somewhere-2883@reddit (OP)
yes, you can! Use the model I recommended, but don't keep your hopes too high for Q4, even the IQ-XS variants
aknight2015@reddit
I'm pretty flexible with my expectations. For tough stuff I use other AIs.
yoracale@reddit
Congrats guys this is amazing work!
Kooky-Somewhere-2883@reddit (OP)
Thanks!! we worked hard on it
sunshinecheung@reddit
wow, when can we use deep-research in jan?
Kooky-Somewhere-2883@reddit (OP)
you can use it now, but the build is beta; you can find it somewhere in this thread
sunshinecheung@reddit
can you give me the link of beta thx
Psychological_Cry920@reddit
Hi u/sunshinecheung,
Here are the links to our Beta build, we’d love for you to try it out!
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImag
SillyLilBear@reddit
Have you tried how it performs with 8B and 14B?
Kooky-Somewhere-2883@reddit (OP)
Yes we did!
I will include this part in technical report.
Funny enough, at first the 4B outperformed the 8B and 14B because the 8B and 14B overthought their tool parameters; we made some changes and now it scales logically, so the 8B and 14B will perform better.
There is some learning in this part too, details coming very, very soon!!
Lankonk@reddit
That's really impressive!
PowerBottomBear92@reddit
Is there a guide for how to download this?
Perfect-Category-470@reddit
This is sickkkkkk!!!
Kooky-Somewhere-2883@reddit (OP)
YES IT IS!!!!
Honest_Ad_7497@reddit
wow this is wild.
Kooky-Somewhere-2883@reddit (OP)
It's the latest version of DeepSeek-V3, guys, sorry, I think I forgot to type v3 into the post
Only_Situation_4713@reddit
I'm getting a jinja error in LM studio :(
Kooky-Somewhere-2883@reddit (OP)
hi, right now you can temporarily fix it as follows (reposted from a comment of mine on Hugging Face):
You can use the Qwen3 template from another LM Studio-compatible model, but remember to disable "thinking" and add this system prompt when using it:
```
In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Tool Use Rules
Here are the rules you should always follow to solve your task:
Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
If no tool call is needed, just answer the question directly.
Never re-do a tool call that you previously did with the exact same parameters.
For tool use, MAKE SURE to use the XML tag format as shown in the examples above. Do not use any other format.
```
In the meantime we will try to see if we can fix the gguf
Enjoy
Only_Situation_4713@reddit
sweet it worked, thanks
Kooky-Somewhere-2883@reddit (OP)
amazing!! Remember we trained the model with a non-thinking objective, so please do not enable thinking.
I suggest using a sequential-thinking MCP or something like that; it will give you the desired effect if you want the model to think more between tool calls! (sample config below)
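for example, an MCP config entry like this (assuming the reference sequential-thinking server from the modelcontextprotocol servers repo):
```
{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}
```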