Jan-nano, a 4B model that can outperform 671B on MCP
Posted by Kooky-Somewhere-2883@reddit | LocalLLaMA | View on Reddit | 490 comments
Hi everyone, it's me from Menlo Research again.
Today, I’d like to introduce our latest model: Jan-nano - a model fine-tuned with DAPO on Qwen3-4B. Jan-nano comes with some unique capabilities:
- It can perform deep research (with the right prompting)
- It picks up relevant information effectively from search results
- It uses tools efficiently
Our original goal was to build a super small model that excels at using search tools to extract high-quality information. To evaluate this, we chose SimpleQA - a relatively straightforward benchmark to test whether the model can find and extract the right answers.
To be clear, Jan-nano only outperforms DeepSeek-671B on this metric, using an agentic, tool-usage-based approach. We are fully aware that a 4B model has its limitations, but it's always interesting to see how far you can push one. Jan-nano can serve as your self-hosted Perplexity alternative on a budget. (We're aiming to improve its performance to 85%, or even close to 90%.)
We will be releasing the technical report very soon, stay tuned!
You can find the model at:
https://huggingface.co/Menlo/Jan-nano
We also have gguf at:
https://huggingface.co/Menlo/Jan-nano-gguf
I saw some users have had technical challenges with the prompt template of the GGUF model; please raise them in the issues and we will fix them one by one. At the moment the model runs well in the Jan app and llama-server.
Benchmark
The evaluation was done using an agentic setup, which lets the model freely choose which tools to use and generate the answer, instead of the hand-held approach of the workflow-based deep-research repos you come across online. So basically it's just: input the question, then the model calls tools and generates the answer, like using MCP in a chat app.
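To make this concrete, here is a rough sketch of what a single agentic turn looks like against an OpenAI-compatible endpoint such as llama-server (the question and the google_search tool definition below are illustrative placeholders, not our exact harness):
```
# One agentic turn: the model sees the question plus the available tools
# and decides on its own whether to emit a tool call before answering.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-nano",
    "messages": [{"role": "user", "content": "Who won the 1998 FIFA World Cup?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "google_search",
        "description": "Search the web and return result snippets",
        "parameters": {
          "type": "object",
          "properties": {"q": {"type": "string"}},
          "required": ["q"]
        }
      }
    }]
  }'
```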
Result:
SimpleQA:
- OpenAI o1: 42.6
- Grok 3: 44.6
- 03: 49.4
- Claude-3.7-Sonnet: 50.0
- Gemini-2.5 pro: 52.9
- baseline-with-MCP: 59.2
- ChatGPT-4.5: 62.5
- deepseek-671B-with-MCP: 78.2 (we benchmarked using OpenRouter)
- jan-nano-v0.4-with-MCP: 80.7
jeffutter@reddit
Wow, I'm real excited for this model. I've been testing the unsloth quant `Jan-nano-128k-GGUF:Q8_4K_XL` with `ollama`. It seems to get stuck in circular thinking pretty badly, especially when multiple (even very simple) tool calls are involved.
It'll call one tool ok (getting the current date) and then output reams and reams of thinking second guessing how it should use that tool call and what it should do next.
Any suggestions on what might cause this or how to remedy it?
Kooky-Somewhere-2883@reddit (OP)
Ollama has been reported to have some YaRN scaling issues.
My recommendation is to follow the llama-server and vLLM configs we give in the Hugging Face README for now.
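For reference, a minimal llama-server launch looks something like this (a sketch only; the README has our exact flags, and the file name is assumed):
```
# Serve the Q8 GGUF through llama.cpp's OpenAI-compatible server.
# --jinja enables the chat template so tool calls are formatted correctly.
./llama-server -m jan-nano-4b-Q8_0.gguf --host 0.0.0.0 --port 1234 --jinja
```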
jeffutter@reddit
Oh nice! Thanks for the tip. I got it set up with `vllm` - now I just need a frontend that supports MCPs. I've been using oterm, but that's Ollama-only (doesn't work with the lightllm proxy).
Kooky-Somewhere-2883@reddit (OP)
you can use the jan.ai frontend or Cherry Studio, both should be fine
jeffutter@reddit
I'm currently unable to use the jan.ai frontend due to this issue: https://github.com/menloresearch/jan/issues/5537
I did manage to get it set up with `vllm` and `open-webui`. The tool calling is still not great. I'm not sure if this is a `jan` problem or an `mcpo`/`open-webui` problem, though.
I'm asking a small query that should use two tools `todays_date` to get the current date and `get_tasks` that takes `completed=Bool` and `due_before=Date`.
It keeps trying to call `get_tasks(completed=Date)` and it calls it over and over and over again in one response, never trying anything different 🤷
Usual-Instruction445@reddit
This looks really cool! will there be a version compatible with M series chips in the future?
Kooky-Somewhere-2883@reddit (OP)
Hm... I'm a bit confused, do you mean MLX?
we already have a GGUF which can run on M chips right now: https://huggingface.co/Menlo/Jan-nano-gguf
Usual-Instruction445@reddit
I tried the LM Studio download on the website and it won't work. I assumed it was my hardware.
Kooky-Somewhere-2883@reddit (OP)
NOTICE
We recommend using Q8 GGUF.
If you are "really tight" on VRAM we recommend the IQ4_XS GGUF.
These are tested! Other quants like Q4_0 and Q4_K_M have significantly degraded performance.
-p-e-w-@reddit
Q4_K_M has “significantly” degraded performance compared to IQ4_XS?
Are you sure you tested this correctly? What is the criterion you used for the test? KL divergence compared to FP?
Kooky-Somewhere-2883@reddit (OP)
not extensively; for some reason it almost didn't work well at all, we've just tested qualitatively for now.
it gave up on researching and gave bad-quality responses, like literally you can tell the difference.
safest bet is Q8 for quality
EmployeeLogical5051@reddit
what about Q6?
Kooky-Somewhere-2883@reddit (OP)
don't know, haven't tested it
EmployeeLogical5051@reddit
ohh alright, I will test it myself to see the response quality.
animax00@reddit
If you have tested, how was the quality…?
EmployeeLogical5051@reddit
Unfortunately I didn't test Q6, just went with Q8.
mk8933@reddit
Does Jan Ai have attachment support?
Psychological_Cry920@reddit
Hey, Jan attachment support is still WIP.
mk8933@reddit
Thanks
Papabear3339@reddit
Quants can degrade tiny models a lot harder than big ones.
There are special fine-tuning methods to try to get around that (like AWQ), but it sounds like that wasn't used here.
-p-e-w-@reddit
I know that, but IQ4_XS is smaller than Q4_K_M, by about 15%, and is generally considered a worse-quality quant.
lazarus102@reddit
'Really tight' is relative to the individual and their finances. It would be more helpful if you made a short list of which cards are the bottom requirement for each quant. Though context length and other settings may need to be factored in, for people to set realistic expectations.
Kooky-Somewhere-2883@reddit (OP)
Some of the MCPs like browser-use can easily eat up to 32k tokens in just a few steps if they come across a big site, but anyways I will share my own home setup.
This takes up to 9-10GB of VRAM.
lazarus102@reddit
"ngl" is a good setting. cuz you always want an AI that's not gonna lie.
Kooky-Somewhere-2883@reddit (OP)
ngl
Ok_Bug1610@reddit
Any thoughts on using Unsloth Dynamic 2.0 GGUF quantization to make UD-Q2_K_XL, UD-Q3_K_XL, and UD-Q4_K_XL variants? They offer 2-4x performance gains in my testing with very little accuracy loss, not to mention 5-10x using vLLM due to hardware underutilization (you can squeeze a lot more parallel processes out of a quantized model).
Is anyone here aware of an Android app that can run custom quantized models? I would love to test this out on pure mobile and also an RPi 4. Though the model isn't particularly capable, I can output near ~624 tok/sec on a single RTX 3070 8GB using Qwen 0.6B UD-Q2_K_XL. I'm going to buy a newer graphics card, but I don't have access to particularly top-end hardware, and I tested an Intel Arc A770 using Vulkan drivers and IPEX-LLM but I can't even reach a third of the 3070 results.
Kooky-Somewhere-2883@reddit (OP)
i think they have quant https://huggingface.co/unsloth/Jan-nano-GGUF
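If you want to pull one of those quants from the CLI, something like this should work (the exact filename is an assumption; check the repo's file list):
```
# Download a single Unsloth dynamic quant into ./models
huggingface-cli download unsloth/Jan-nano-GGUF Jan-nano-UD-Q4_K_XL.gguf --local-dir ./models
```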
Ok_Bug1610@reddit
Sorry, the UD (Unsloth Dynamic) models are not the same thing. I'd suggest reading Unsloth's paper above and their "Run DeepSeek-R1 Dynamic 1.58-bit" post. Basically, though the models are roughly the size of their "dense" quantized counterparts, the bit size is actually an average. Their dynamic quantization method quantizes the "critical" parts of the model at higher precision, while giving more aggressive compression to the less critical ones (and you can apply this more or less aggressively, depending on the target: speed, size, accuracy, etc.). This means they use 1-8 bit quantization (dynamically) with an "average" of X bits (~2.5 bits: UD-Q2_K_XL, ~3.5 bits: UD-Q3_K_XL, and ~4.5 bits: UD-Q4_K_XL). This yields better performance and accuracy over standard "dense" quantization, albeit slightly larger. It's worth a read and playing around with!
yoracale@reddit
The Jan GGUFs are Unsloth Dynamic 2.0! So we do use dynamic quantization for the quants.
Ok_Bug1610@reddit
Sorry, that wasn't immediately clear but awesome to hear, thanks. I will be testing them out shortly. It's nice to hear the method is becoming a standard. Awesome work, and thank you so much!!
az226@reddit
What is DAPO? Can you share the code?
Kooky-Somewhere-2883@reddit (OP)
https://arxiv.org/abs/2503.14476
fullouterjoin@reddit
Decoupled clip and Dynamic sAmpling Policy Optimization (DAPO) [sic] algorithm, and fully open-source a state-of-the-art large-scale RL system.
DAPO takes the cake as the most twisted brand-initialism I have seen in ML. 🎂
az226@reddit
Thanks!
General_Cornelius@reddit
Been 5 days, how are the vibes, anyone had a chance to use it?
Kooky-Somewhere-2883@reddit (OP)
I would say it's a mixed bag, but mostly our fault.
We have recently fixed the prompt template inside the GGUF so that it has better behavior (aka uses tools more) that matches the expected UX much better; you should try it.
General_Cornelius@reddit
Any good uses you have noticed? Looking to enrich information with search
Kooky-Somewhere-2883@reddit (OP)
of course it's good to use, please go ahead and run it with MCP.
the model is good, I think it's just setup challenges
Karim_acing_it@reddit
Amazing! I downloaded the Jan app (v0.5.17) for Windows, downloaded the Q8 by inserting the HF GGUF link (surprisingly, the Jan app doesn't advertise Jan-nano at all; you can't even find it when searching for it by name) and tried to replicate your prompt, albeit with a slightly different topic.
Thinking started, but no "deep research". Was my prompt just wrong or do I need to download/do anything else? Couldn't find anything in the Jan settings either. Sorry for the noob question... and congrats to your achievement!
Kooky-Somewhere-2883@reddit (OP)
To make the model go deeper you should ask it to give a scientific report, etc. Try pushing it a bit harder.
Karim_acing_it@reddit
Thanks, I just ask because others may read this as well: I even replicated your "Breaking news today about finance, highlight shocking ones" prompt and again, I just get it to start thinking.
I left everything at default settings as described above. How are we able to replicate your results? I am sure it's something wrong on my side, but I can't be the only one.
Kooky-Somewhere-2883@reddit (OP)
hi, just got time to get back to you.
we realized some of you guys may have issues due to flexible and different settings, so I decided to bake an enhanced system prompt into the model to make sure it tool-calls a bit more. You can re-download the GGUF now and you will see the difference.
Karim_acing_it@reddit
Amazing to hear, but I didn't have MCP enabled and hadn't set up your entire installation guide. Instead of relying on Docker etc., it would be absolutely amazing if all the MCP hosting could happen within Jan. I think this in itself would be the biggest gain over all other GUIs out there and make Jan-nano accessible to a muuuch larger audience. Thanks!
ley_haluwa@reddit
Off topic: What app/frontend is in the video? Looks minimalistic and polished
Psychological_Cry920@reddit
Hi u/ley_haluwa, this is Louis, a contributor to the Jan App featured in the video.
The app is still in its Beta phase, but we’d really appreciate it if you could give it a try and share your feedback.
Here are the links to our Beta build, we’d love for you to take it for a spin!
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImage
Let me know!
dionisioalcaraz@reddit
I downloaded the AppImage, but when I load the model (Jan-nano-UD-Q4_K_XL) from the GUI it takes a long time saying 'loading model...'; after some minutes the name of the model appears in the prompt just like in the video, but when I ask a question it starts 'loading model...' again without responding. The model works OK with llama.cpp. Any clue?
Psychological_Cry920@reddit
Hey, can you help me grab the log files in the app data folder? I will take a look then.
dionisioalcaraz@reddit
Thanks for your answer! I just needed to upgrade, now it loads the model fine. But now I have other issues, here is the log https://matcamp.neocities.org/app.log.txt
Psychological_Cry920@reddit
Ah, you will need Node.js installed for npx MCPs like browser mcp.
dionisioalcaraz@reddit
Thanks! Sorry to bother you again. I installed Node.js from the Debian repository, but I still can't activate browser mcp; now I have this message in the log:
+node: /tmp/.mount_Jan-be9MfriZ/usr/lib/libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /usr/lib/x86_64-linux-gnu/libnode.so.115)
I have openssl 3.5.0 installed, is that the problem?
Deleting all files created by Jan and running the AppImage again shows this in the log at launch:
[2025-06-19][18:57:03][app_lib::core::setup][ERROR] Failed to run mcp commands: Failed to read config file: No such file or directory (os error 2)
Psychological_Cry920@reddit
Hey, can you help me check in the app data folder to see if the mcp_config file is there?
dionisioalcaraz@reddit
Yes, here it is:
{ "mcpServers": { "browsermcp": { "command": "npx", "args": [ "@browsermcp/mcp" ], "env": {}, "active": false }, "fetch": { "command": "uvx", "args": [ "mcp-server-fetch" ], "env": {}, "active": false }, "serper": { "command": "npx", "args": [ "-y", "serper-search-scrape-mcp-server" ], "env": { "SERPER_API_KEY": "7683477c30d5fed0e0279cb33e0475c1b4f4a9ab" } }, "filesystem": { "command": "npx", "args": [ "-y", "@modelcontextprotocol/server-filesystem", "/path/to/other/allowed/dir" ], "env": {}, "active": false }, "sequential-thinking": { "command": "npx", "args": [ "-y", "@modelcontextprotocol/server-sequential-thinking" ], "env": {}, "active": false } }}
When I try to toggle the switch a pop-up window says: "Failed to start MCP server fetch: connection closed: initialize response. Please check the parameters according to the tutorial"
and in the app.log : "node: /tmp/.mount_Jan-bePdJ9Wo/usr/lib/libcrypto.so.3: version `OPENSSL_3.4.0' not found (required by /lib/x86_64-linux-gnu/libnode.so.115)"
dionisioalcaraz@reddit
Thanks for your answer! I just needed to upgrade, now it loads the model fine. But it can't connect to the internet; it shows "It seems there is an issue connecting to the browser extension. Please ensure that the Browser MCP extension is installed and connected properly....". Using the toggle switches in the MCP server settings I allowed MCP permissions and browsermcp, but it still doesn't work.
Kooky-Somewhere-2883@reddit (OP)
oh hi colleague, I was just about to reply to them. Yes, we recorded the demo using the amazing Jan; it's in beta but the look and feel of Jan is amazing.
MeYaj1111@reddit
I'm kind of a dumbass with this stuff. I installed the Jan beta linked above and I see the model "Jan-nano-Gguf" in the list available to download, so I installed that, but even using the identical prompt from your video it will not do any deep research stuff, it just spits out an answer immediately with no searching or anything. Can you point me in the right direction?
Kooky-Somewhere-2883@reddit (OP)
you need to install the Serper MCP
oxygen_addiction@reddit
How?
Kooky-Somewhere-2883@reddit (OP)
https://menloresearch.github.io/deep-research/
oxygen_addiction@reddit
Thanks.
Psychological_Cry920@reddit
Thank you u/Kooky-Somewhere-2883, I’m really glad you featured it!!!
Kooky-Somewhere-2883@reddit (OP)
Thank you for the awesome app !!!
No-Source-9920@reddit
i absolutely love the beta, so much more than the stable version; I've been using it since you made this post and it is amazing
Psychological_Cry920@reddit
Glad you like it!!
Hawk_7979@reddit
I tried this version of the app, and it's absolutely amazing. However, I discovered a security concern related to plaintext secrets stored in the MCP environment variables.
It would be better to store encrypted values once they've been saved, just like n8n does.
epycguy@reddit
in what way? either you have a password on startup (which can be keylogged) or you have the encryption key on disk which will just get stolen alongside your settings file
Psychological_Cry920@reddit
What a great point. We will work on this!
nerdyvaroo@reddit
That's a pretty big app :O almost a GB, damn, what all is in there?
Psychological_Cry920@reddit
Yeah, it's sad. We are shipping CUDA dependencies in recent versions. Thinking about downloading them in-app when needed.
nerdyvaroo@reddit
It's not a good idea to ship the CUDA dependencies in the app, in my opinion.
Better to ship without them: if CUDA is not available, just use the CPU for llama.cpp (it still didn't work for me on an NVIDIA GPU on Void Linux, so there is that; better not to have it). If I use my AMD Radeon 7700 XT, then those CUDA dependencies are just wasted storage.
Psychological_Cry920@reddit
A note here: we switched the app over to Tauri instead of an Electron build now, and that literally cut the app size down by more than half (only Linux is left, still around a GB - WIP)
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
nerdyvaroo@reddit
lol I use Linux so lemme know once that's working :D
Also HOLY SHIT that's a big improvement
Psychological_Cry920@reddit
Sure!!!
Psychological_Cry920@reddit
Yeah I do agree
Psychological_Cry920@reddit
This is definitely what should be fixed by the release!!!
undisputedx@reddit
Hi Louis,
I am trying this rc4 beta on Win 10, it's stuck on the loading-model phase.
Psychological_Cry920@reddit
Hey, thanks for reporting. We noticed a regression issue when running on Windows. Please help us update the app from the settings page or download the new beta build here:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
undisputedx@reddit
A bunch of errors during the rc6 update install, and it's still stuck on the loading-model phase.
Psychological_Cry920@reddit
Btw, it looks like you got a dangling process while updating. Can you help me find and kill the cortex-server process? Then continue the install process.
undisputedx@reddit
uninstalling and reinstalling rc5 has worked. Thanks.
Psychological_Cry920@reddit
yayyyyy
undisputedx@reddit
is there any tutorial to make mcp work?
Psychological_Cry920@reddit
Unfortunately, we haven't done the docs for this yet. For now, go to Settings > MCP Servers and you will see the default servers there. Start with fetch and toggle it on (a good one to start with); back in the chat you will see tools enabled. To try better search and scraping, you can go to the Serper website to get an API key, input it into the Serper MCP server env in the app, and then you will see the google_search and scrape tools enabled in chat.
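For reference, the resulting Serper entry in mcp_config.json looks roughly like this (format taken from a config shared further down this thread; replace the placeholder with your own key):
```
{
  "mcpServers": {
    "serper": {
      "command": "npx",
      "args": ["-y", "serper-search-scrape-mcp-server"],
      "env": { "SERPER_API_KEY": "<your-serper-api-key>" }
    }
  }
}
```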
Psychological_Cry920@reddit
Oops. Please help me share the cortex.log file, I will take a look at this. You can find the log file in Settings > Data Folder > Logs
__eita__@reddit
Hi there! I've been following Jan's progress and all I have to say is that you guys are making great progress!
Couldn't help but notice that you guys are also making a Tauri build. As someone who's interested in the concept of developing for cross-platform, could you describe a little bit your decision? Are you going full Tauri in the future? Also, are you planning to use Tauri for Android/iOS apps?
PS: great thing that you guys decided to go Apache License 2.0
Psychological_Cry920@reddit
Yessss! That is the reason why we decided to go full Tauri to scale to mobile. That is our goal for the next couple of sprints.
Psychological_Cry920@reddit
We did notice some limitations of Electron, particularly Node.js in terms of scalability. Tauri is an excellent choice, as we can optimize more from the Rust side for performance. Additionally, we can utilize native APIs (mobile plugins) to work with certain frameworks down to the native layer.
__eita__@reddit
Thank you for your response!
It would be awesome to hear about this experience on a blog post or something.
Also, it seems that you guys are going to provide llama.cpp integration?
I think that from all open source options for local LLM desktop apps, you are the group taking the best decisions in the last months.
Gonna be watching :)
Psychological_Cry920@reddit
Thank you for your kind words. We build in public, so please feel free to join us in some of the discussions. Yes, we'll work on some blog posts on this shift 🚀🚀
liquidnitrogen@reddit
What is the app for when we have gguf?
Psychological_Cry920@reddit
Hi, the Jan app is one of our in-house products, focused on a clean GUI for "normies". Everyone can use the GGUF file with other LLM apps that support tool use. In this version, we introduce MCP support to work with the model better, and it enables us to implement upcoming model updates tailored to different use cases that require additional work from the application layer to further boost the model's capabilities.
ab2377@reddit
this question was actually very on-topic, thanks!
Confident-Artist-692@reddit
Hi, just tried to load this into LM Studio but it threw up an error and wouldn't work.
Kooky-Somewhere-2883@reddit (OP)
Hi, you can check this issue, it might fix yours:
Kooky-Somewhere-2883@reddit (OP)
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
kadir_nar@reddit
Open Source 👑
Kooky-Somewhere-2883@reddit (OP)
Thank you <3
Healthy-Ad-8558@reddit
May I ask why you went with Qwen3-4b instead of Phi4 or maybe even Gemma 3? Follow up question, do you plan on using larger models in the future. One more thing, it seems like IBM's Granite 4.0 will be wickedly efficient, if it turns out to be as good as they're claiming it to be, would you consider using it?
Kooky-Somewhere-2883@reddit (OP)
we did try Gemma 3; we couldn't get the model to generate any correct samples, like at all, none, so it wasn't even improving.
could be our model or a training issue, or it could be that at the 4B size there aren't many great choices.
disillusioned_okapi@reddit
pretty cool stuff. btw, one reputable way to assert your claim here would be to get included in the Berkeley Function-Calling Leaderboard.
If the model is as good for general tool calling as it is in your MCP benchmarks, it might end up in the top 10, and might even replace xLAM-2-8b as the best small model for tool calling.
The process for testing and submitting new models is fairly well-documented. I hope you consider this 🙏
PrizeNew8709@reddit
Is there an API to consume this deep search mode?
Whiplashorus@reddit
Hello, how can I use the deep research mode?
I installed the latest 0.6 version and the Jan-nano Q8, but I can't find any button or even docs about it.
Psychological_Cry920@reddit
Hi u/Whiplashorus! Unfortunately, this feature is still in preview and we are working on 0.6.1, which will have it enabled. For now, you can download the preview build here to try it out: https://jan.ai/docs/desktop/beta
Kooky-Somewhere-2883@reddit (OP)
THANK YOU
r/LocalLLaMA
Since Sunday when we released the model:
- We have 15 downloads and increasing now
- We have thousands of upvotes from you guys
- We are trending on huggingface
As you may know, Menlo Research is a small research team that is trying our best. Community is everything to us. We will remember and take all the feedback to improve the models and the app.
We will keep you guys updated and release the technical report soon!
hi87@reddit
Thanks for sharing! Can you share which UI that is in your video?
yoracale@reddit
Jan AI - Apache 2.0 licensed. They were also the ones who trained the model: https://github.com/menloresearch/jan
Psychological_Cry920@reddit
Thanks u/yoracale for the share.
We'd really appreciate it if you guys could try it out and share some feedback.
Here are beta links:
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImage
hi87@reddit
Thank you for these links. I really love the simplicity and the snappy UI. Reminds me of Cherry Studio without the clutter (which is great).
Some feedback:
I can't seem to use any Gemini models from the providers. Even the older ones that show up don't all work. Tool calling doesn't work at all with the Gemini models.
After adding a new MCP server I had to quit and restart the application for the chat to register them.
Ability to configure agents with their own MCP servers/tools.
Thanks for sharing again
Psychological_Cry920@reddit
Thank you for great feedback
oxygen_addiction@reddit
Loving the app. Having LM Studio as a default provider would be good, as it is more popular than Ollama nowadays (at least for normies).
MmmmMorphine@reddit
My only real wish is for them to integrate some agent management into the UI. I can't seem to find any decent minimalistic ones that let you easily specify a set of models as agents and how they should interact, using a couple of drop-down menus.
WriedGuy@reddit
Open source gonna rule
Kooky-Somewhere-2883@reddit (OP)
LET GOOOO
WriedGuy@reddit
Hey, how did you add internet search, scraping, and the other features? I want to add my own features, how can I add them?
Kooky-Somewhere-2883@reddit (OP)
just install more MCPs
just don't use too many at the same time, the model is smol, it will get confused
Cluzda@reddit
how many MCPs is too many? what number is considered a sane amount?
On the other hand, you can often cluster some domains in sub-agents, right?
Kooky-Somewhere-2883@reddit (OP)
as in number of tools
Kooky-Somewhere-2883@reddit (OP)
anything beyond 10
hutoreddit@reddit
Please, can anyone teach me how to set up a search engine API in Jan? I didn't see any place to set up web search for Jan. What search and scraper API did you use for the video?
Please, anybody?
Kooky-Somewhere-2883@reddit (OP)
https://menloresearch.github.io/deep-research/
here bro
Psychological_Cry920@reddit
Hi u/hutoreddit, please download the Jan beta version here; search MCP is not available in the stable version yet:
https://www.jan.ai/docs/desktop/beta
hutoreddit@reddit
Thank you !
Plus-Childhood-7139@reddit
Crazy… curious why this doesn’t get the hype Deepseek received.
Kooky-Somewhere-2883@reddit (OP)
it's only beating them on one single aspect, which is doing tool calls and using a search engine, so obviously very niche.
still cool to use tho! very efficient, I use it to quickly summarize linked websites and do some web crawling and search!
Plus-Childhood-7139@reddit
Who cares. In the end I just need something that calls the right tools to do things. Pretty much an assistant I can call and she gets the job done.
lompocus@reddit
How many tokens did you use for the training? How expensive was the training?
Kooky-Somewhere-2883@reddit (OP)
i think if you convert the numbers to RunPod pricing, you can probably do it relatively cheaply, probably below $100 if you rent an H200 (because it's faster and doesn't take that much time)
lompocus@reddit
So cheap! When I look at your video I wish some queries got chopped-up (like, "search for x" and then subsequently "extract the highlights"). If I wanted to fine-tune like you did, what would be a good way to generate training data? Did you create a bunch of examples referencing a list of mcp adapters at mcprepository.com for example?
Kooky-Somewhere-2883@reddit (OP)
We will include these details in the upcoming technical report!!!!
RobotRobotWhatDoUSee@reddit
Ah, excellent, I am very interested in this technical report when you all have it. Thanks!
Kooky-Somewhere-2883@reddit (OP)
My demo isn't the best-performing version cuz I used my home computer with a q8 KV cache and the q8 weights.
In theory at this size I should host BF16 to get the best quality, but I only got 12GB of VRAM at home lol.
Kooky-Somewhere-2883@reddit (OP)
we use our homebrew A6000s.
8x A6000 for the training code.
4x A6000 for a vLLM server for inference and generating answers.
That_Neighborhood345@reddit
how many hours? So we can figure out the cost?
Kooky-Somewhere-2883@reddit (OP)
on an H200 it's around 2-3 hours
shing3232@reddit
It would be interesting to have a QAT variant of IQ4
RobotRobotWhatDoUSee@reddit
This looks great. Do you have a paper on the training, etc.?
Sussymannnn@reddit
Can you bring it on android?
Kooky-Somewhere-2883@reddit (OP)
we don't see any Android chat app that supports MCP yet
unum_omnes@reddit
Did you use a specific system prompt for this? I'm trying to replicate the behavior shown in the video, where the agent scrapes multiple web pages. Even with a system prompt that instructs the LLM to scrape multiple webpages, it usually only scrapes one.
ajmusic15@reddit
For some reason it is impossible for me to load models using Jan.ai on my 5080. No matter how different the llama.cpp settings are and how much of the weights I load on the GPU (depending on the settings), it always uses the CPU, while the tool's own task manager says it is using the GPU. I don't understand that.
AbaloneStriking9397@reddit
which chat/mcp client are you using buddy?
Psychological_Cry920@reddit
Hi u/AbaloneStriking9397, it's Jan, currently in beta. Please give it a try and share your feedback.
https://www.jan.ai/docs/desktop/beta
anshulsingh8326@reddit
what is this software? Doesn't look like a web ui
Kooky-Somewhere-2883@reddit (OP)
it's the Jan beta: https://www.jan.ai/docs/desktop/beta
MaruluVR@reddit
Since this is based on Qwen3, is there any chance of getting a 30B-A3B finetune with the same training data?
Kooky-Somewhere-2883@reddit (OP)
yes, but u see VRAM 😭
MaruluVR@reddit
If you do the training entirely locally I feel you.
But if you think using cloud GPUs for training is fine, then you can get a Nvidia H200 SXM with 141 GB for only $0.80 per hour.
Kooky-Somewhere-2883@reddit (OP)
wait where to get h200 only for 0.8
MaruluVR@reddit
https://hpc-ai.com/ had them on sale a few days ago, not sure if the sale is still ongoing though
Kooky-Somewhere-2883@reddit (OP)
thank you will take a look
These-Dog6141@reddit
can you pin a FAQ on how to make the model behave like in the video clip in the OP? I loaded the model (Q8) in Jan and it is not tool calling, just hallucinating
Kooky-Somewhere-2883@reddit (OP)
do you have MCP set up?
These-Dog6141@reddit
no, I don't think so, I just loaded the downloaded model in Jan and started chatting
Psychological_Cry920@reddit
Hey, here is a quick guide; we are still working on the docs, sorry for that. https://menloresearch.github.io/deep-research/
SilentLennie@reddit
Very interesting, there is a lot to be gained from fine tuning and working with good tool use and MCP.
Kooky-Somewhere-2883@reddit (OP)
yes!
SilentLennie@reddit
I already noticed that with tool use and MCP you could get better results; I hadn't tried fine-tuning yet.
Something I noticed: the GGUF file doesn't have tool support? Even though I see tools mentioned in the original Hugging Face repo.
Kooky-Somewhere-2883@reddit (OP)
oh really? can you point out where? sorry, we are really, really noob at GGUF - I'm better at training models
SilentLennie@reddit
So I was trying to use it with Ollama. I knew how to take a gguf file and use Modelfile to make a model in Ollama. Ollama says: no tools support.
I was trying to convert your file; I just compiled llama.cpp to see how it's done. First time for me too. Gemini is trying to help me.
Kooky-Somewhere-2883@reddit (OP)
oh, for Ollama you can use the Qwen3 template or something directly from Ollama
i have never used Ollama myself so I'm not very sure how to
can you try llama-server?
SilentLennie@reddit
I can use this:
```
./bin/llama-server -m models/jan-nano-4b-Q8_0.gguf --host :: --jinja
```
But I'm still trying to see if it actually works well for MCP in our case.
CC u/qnixsynapse
SilentLennie@reddit
`./bin/llama-server -m models/jan-nano-4b-Q8_0.gguf --host :: --jinja` does work.
qnixsynapse@reddit
Jan nano GGUF has proper tool support. I think Ollama uses custom chat templates. Please open an issue on their repo with this.
Kooky-Somewhere-2883@reddit (OP)
What will happen if you let the model run with a 128k context window and almost full precision??
I recorded the model running at almost full power, with the ability to do extremely long follow-ups on tool calling, making it give out a very good DeepResearch report.
Enjoy, here is the recording:
https://youtu.be/hnTnu-7q-WE
Kooky-Somewhere-2883@reddit (OP)
you can do the same by following the tutorial for setting up YaRN in the base model repo: https://huggingface.co/Qwen/Qwen3-4B#processing-long-texts
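With llama.cpp, that boils down to rope-scaling flags along these lines (a sketch assuming the usual Qwen3 YaRN setup of 4x scaling from the 32k native context; check the linked page for the exact values):
```
# Extend Jan-nano's context to 128k via YaRN rope scaling.
./llama-server -m jan-nano-4b-Q8_0.gguf --jinja \
  --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
  -c 131072
```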
Commercial-Celery769@reddit
How do I enable web search? I downloaded Jan and the Jan-nano Q8 but I don't see an option for it. Is it a custom tool that I'm missing or am I overlooking something? I looked at the docs on your site but for whatever reason many of the pages are showing up as broken HTML for me.
Mediocre_Leg_754@reddit
A beginner's question: for all these datasets that you have, how do you ensure that the underlying LLMs don't have any exposure to them?
GodIsAWomaniser@reddit
wow looks awesome!
Could I ask what MCP servers you hooked it up to and what interface you are using in the demo?
Interested to replicate.
Kooky-Somewhere-2883@reddit (OP)
this one
https://github.com/marcopesani/mcp-server-serper
GodIsAWomaniser@reddit
Sorry, as a follow-up question, could you please share with me what interface you're using? I have searched for similar ones and I couldn't find it.
Psychological_Cry920@reddit
Hi, it's Jan! Currently in beta. Please give it a try and share your feedback with us.
https://jan.ai/docs/desktop/beta
GodIsAWomaniser@reddit
ah, I downloaded the current release and went "wtf this isn't right"
I'll grab the beta and have a try.
Thanks!
Psychological_Cry920@reddit
Haha, thanks!
GodIsAWomaniser@reddit
TYSM!
Effective_Stage7405@reddit
Just to let you know: I've been trying this model on my 2022 Xiaomi Pro, 12GB RAM / 512GB storage, and it gives about 8 T/s, which is amazing!
I used SmolChat v0.9. Head here to download the APK:
https://github.com/shubham0204/SmolChat-Android
I used the 8-bit quant. I tell you, this is a game changer for offline mobile AI. Congrats guys!
Kooky-Somewhere-2883@reddit (OP)
thank you i will try the app
306d316b72306e@reddit
Like with most expert posts on here... I look forward to the benchmark that defends the claims; in this case, right in the post title.
No-Refrigerator-1672@reddit
Seems like it's a Qwen3 4B finetune, which raises a question: do you have data on performance degradation in summarization and multilingual tasks? I'm actually running a separate vanilla Qwen3 4B as an auxiliary model for non-mission-critical uses, and if your performance degradation is minimal, it would be tempting to replace it with your model and then use it for MCP too.
Kooky-Somewhere-2883@reddit (OP)
Our team uses a flip benchmark; it's a new way to check for degradation, and the results show 1-2% degradation. We will include this in our technical report.
But are you confident in your multilingual requirements for a 4B model? I can somewhat confidently say that if you are okay with the language ability of Qwen3-4B, our model will have relatively similar performance.
But again, not everyone is happy with the base performance of Qwen3-4B.
No-Refrigerator-1672@reddit
I need multilingual capabilities for tag generation in OpenWebUI. In my experience, the 4B variety can't say anything useful in languages other than English, but surprisingly it grasps the main idea of texts pretty well, so if you ask it to generate tags/a short summary in English for a chat involving another language, it accomplishes the task well enough. Particularly, I've tested it with Latvian, which is a European language with roughly 2M-3M speakers.
beryugyo619@reddit
My favorite test for these small Chinese LLMs is to ask them for recipes in Japanese. They start speaking like the cake core from Portal 1.
jgwinner@reddit
GLaDoS got out?
Let's hope Alibaba doesn't have any neurotoxin in stock
beryugyo619@reddit
Some of the hilarious ingredients came from a dozen or so Qwen3-4B and 0.6B responses to variations of
カレーの作り方 ("how to make curry") "/no_think"
yep she's out
jgwinner@reddit
haha Right, even without the green Neurotoxin she's trying to kill us.
Or dispose of us, I mean wtf AI: "human flesh miso"??
Then again, there was a guy that ate a 747, so countertop, maybe. I draw the line at the miso.
Kooky-Somewhere-2883@reddit (OP)
This is a crazy idea but you can also use a Google Translate or DeepL MCP.
Let bro translate it and read 😂.
istinetz_@reddit
to be fair, google translate is crazy expensive at scale
like, literally 10x more expensive than even using gpt-4o for translation
Kooky-Somewhere-2883@reddit (OP)
i never used the api, interesting to know
sub_RedditTor@reddit
Exactly. No need for a multilingual model, especially if it's for coding.
tookmyplates@reddit
lol, when I first started fine-tuning Qwen3 models I kept breaking them to the point where they reverted to speaking Chinese. Some of my models still use Chinese, but only when they feel like they're saying something that might ride the safety line in English.
VoidAlchemy@reddit
Is this "flip benchmark" you mention basically the Accuracy is Not All You Need paper type benchmarking? I'd be interested in more details of your benchmark implementation for more general use as well. Thanks and great job!
Kooky-Somewhere-2883@reddit (OP)
noted, we will include it; today is Sunday so we don't have the full team here to work on it, but surely very soon
coding_workflow@reddit
What dataset did you use for training? Mind sharing?
This sounds very interesting for tool use. Will give it a test.
How did you run the evaluation? Based on what?
Kooky-Somewhere-2883@reddit (OP)
Thank you everyone ❤️
This is the first time we have a model trending in the top 6 on Hugging Face, literally the first time.
Without y'all's support this wouldn't be possible.
DuxLunae@reddit
And that's a proper demo, not zooming in and out like it’s a Michael Bay movie
Kooky-Somewhere-2883@reddit (OP)
i loled 🤣🤣
Tim541@reddit
What is the app you're running it on?
Psychological_Cry920@reddit
Heyyy, it's Jan, currently in beta. Please give it a few shots, you can download it here:
https://jan.ai/docs/desktop/beta
In this beta build, we introduce MCP support and enhance the UI. Please give us feedback if you try it sometime.
CptKrupnik@reddit
what MCPs used? how do you facilitate search? is everything here local?
Kooky-Somewhere-2883@reddit (OP)
it's Google search, basically
CptKrupnik@reddit
with api-key? or something like playwright?
Kooky-Somewhere-2883@reddit (OP)
api key
serper.dev is very cheap
Kooky-Somewhere-2883@reddit (OP)
we use serper mcp
Kooky-Somewhere-2883@reddit (OP)
we did train 8B and 14B but haven't released them cuz they're not as finished, in terms of method, as this 4B.
I would say the big models learn much faster, but have overthinking issues.
pmttyji@reddit
Awesome. I'm gonna load this tonight. I bookmarked this HF page.
I have 2 suggestions for you on Jan AI.
These could push users to create tutorials on JanAI.
Thanks for JanAI(Using it since year start) & now for this model. I love Jan.
Psychological_Cry920@reddit
Thanks for all the great suggestions!
pmttyji@reddit
I have feedback (this is for the 0.5.17 version). Please feel free to ....
Thanks. Only today did I notice that you have released some minor versions. I'll check those.
Psychological_Cry920@reddit
Oh woaaa, I will note down all of this here for the next release.
pmttyji@reddit
thanks again
das_rdsm@reddit
I am confused here, the main branch and the dev branch seem to be really different, is the app going through big changes?
Psychological_Cry920@reddit
Ah yes, we reworked the UI and application backend to scale to mobile in the upcoming sprints.
- Switched from Electron to Tauri; this will help cut the app size in half and also enables us to build mobile targets.
- No more Next.js as the frontend; it's now React only. We are also moving off Cortex and integrating the llama.cpp server directly.
Ok_Appeal8653@reddit
You have to be careful, author. I tried it around a bit, and it was normal to ask a simple question and get no answer; its thinking gets stuck until the end of the max answer length with stuff like:
[...]
Final Answer:
The mechanical power input to an induction generator is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in such generators.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This relationship holds under ideal conditions where there are no losses in the system.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in an induction generator.
Final Answer:
The mechanical power input to an induction generator is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in such generators.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This relationship holds under ideal conditions where there are no losses in the system.
Final Answer:
The mechanical power input is equal to the electrical power output. Therefore,
P_electrical = P_mechanical
This equality represents the fundamental principle of energy conversion in an induction generator.
[...]
Kooky-Somewhere-2883@reddit (OP)
I know of many issues with the model; this one probably isn't the model but some sampling issue.
Psychological_Cry920@reddit
Hi u/Ok_Appeal8653, I think you have the context_shift setting ON by default from a previous version; please go to Settings > Provider > Llama.cpp and disable context shift in the settings.
You probably also want to increase the context size of the model; RC6 will prompt you to increase it when the issue arises.
Ok_Appeal8653@reddit
Ok, I think that this could be the problem, as I am not using the beta version of the app right now and I don't see this option. I will download the beta version and test it later, thanks.
Psychological_Cry920@reddit
Yayy!
Psychological_Cry920@reddit
Also, please don't turn on too many MCP servers or MCP tools, to keep the input prompt small, as this can quickly lead to an out-of-context-size issue.
Ok_Appeal8653@reddit
In theory, as I don't have the beta version, the model doesn't have any tools activated.
Psychological_Cry920@reddit
Ah copy!
DMTJones@reddit
What is the tool you're using to run the novel in that video?
Psychological_Cry920@reddit
Hi u/DMTJones, it's the Jan beta, with the Serper MCP tool enabled (you can see it in the beta app's settings page).
https://jan.ai/docs/desktop/beta
haikusbot@reddit
What is the tool you're
Using to run the novel
In that video?
- DMTJones
^(I detect haikus. And sometimes, successfully.) ^Learn more about me.
^(Opt out of replies: "haikusbot opt out" | Delete my comment: "haikusbot delete")
Severe-Video3763@reddit
What MCP tools do you typically give it access to?
Psychological_Cry920@reddit
Hi u/Severe-Video3763 I currently use Serper because other tools might produce a very large output that requires a very high context length setting.
SelectionCalm70@reddit
Which datasets did you use to post-train the model?
Severe-Video3763@reddit
Congrats on the launch. Any idea how Perplexity Sonar does on SimpleQA for comparison?
MagoViejo@reddit
Any way to make this work with Open WebUI + Ollama? I tried the beta Windows build, but it seems to only support llama.cpp directly.
Kooky-Somewhere-2883@reddit (OP)
can you try Open WebUI with llama-server?
i have not tested Ollama, but I heard from friends that some of them have templating issues.
you can refer to my fix here; the same applies to Ollama or LM Studio:
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1
HilLiedTroopsDied@reddit
Use the llama.cpp server Docker container, with a mount point to your model.gguf.
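Something along these lines should do it (the image tag and paths are assumptions; check the llama.cpp Docker docs for the current ones):
```
# Run llama-server in Docker, mounting the folder that holds the GGUF.
docker run -v /path/to/models:/models -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/jan-nano-4b-Q8_0.gguf --host 0.0.0.0 --port 8080 --jinja
```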
MagoViejo@reddit
thanks, that may actually work.
HilLiedTroopsDied@reddit
it does work, I do it on my home server < unraid
MagoViejo@reddit
I have put that prompt into the system prompt, and I see this kind of odd behaviour:
\boxed{Naná (2016)}
when asking about film adaptations of Émile Zola's books. I think I'm missing something else there, as just a model and a template will not make an MCP server; I think I need to configure the tools that will be executed somewhere, right?
Kooky-Somewhere-2883@reddit (OP)
you need to set up MCP
MaximaProU@reddit
A bit off topic but is Jan faster than Ollama?
Kooky-Somewhere-2883@reddit (OP)
Jan uses llama-server; you need to ask if llama.cpp is faster than Ollama
MaximaProU@reddit
Ollama also uses llama.cpp. Jan uses cortex.cpp, which is based on llama.cpp.
So is cortex.cpp faster than Ollama?
Kooky-Somewhere-2883@reddit (OP)
Currently Jan only uses Cortex as a proxy, and soon we will be llama-server native.
llama.cpp IS AMAZING!
Kooky-Somewhere-2883@reddit (OP)
and frankly I don't know 🤣, I think llama-server is clean and good and llama.cpp is amazing !!!!!
Yogeshwar_maya@reddit
Why is it not running in LM Studio?
Kooky-Somewhere-2883@reddit (OP)
u can try https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
Yogeshwar_maya@reddit
Thanks!
ZiggityZaggityZoopoo@reddit
Yeah, obviously. Qwen has officially embraced MCP while DeepSeek couldn’t care less
kironlau@reddit
I can successfully load the model in Jan, but how do I load it on other llama.cpp platforms?
I cannot load it and chat in LM Studio; it seems to use a different chat template.
Kooky-Somewhere-2883@reddit (OP)
https://huggingface.co/Menlo/Jan-nano-gguf/discussions/1#684e3b09078845bb1355901c
kironlau@reddit
thanks
Caffdy@reddit
maybe a dumb question, but what software is that in the video? what frontend?
Psychological_Cry920@reddit
Hey, it's Jan; this is our new beta version that supports MCP and works better with the Jan-nano in the post.
https://jan.ai/docs/desktop/beta
Caffdy@reddit
took a quick glance at the project and it looks very interesting. Just one quick question: what is your privacy policy? Does Jan collect or log chats or send anything to an external server/service?
Psychological_Cry920@reddit
Hey, we don't upload any logs!!! Everything stays on your machine.
myfavcheesecake@reddit
Hi!
I'm using:
Jan-beta_0.5.18-rc4-beta_x64-setup.exe
I'm on Windows 11 and I'm unable to load any models.
The stable version itself works, though.
Psychological_Cry920@reddit
Hi u/myfavcheesecake, could you please share the log file with me? I will take a look at this.
myfavcheesecake@reddit
Hey looks like the new beta solved this! Thanks!
Psychological_Cry920@reddit
Nicee! Thanks
Psychological_Cry920@reddit
Oops! I can reproduce it here as well, on it!!!
Psychological_Cry920@reddit
Hi u/myfavcheesecake, we just pushed a new version to fix the issue. Could you please update the app again (App > Settings > Check for updates) or download it from here: https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
myfavcheesecake@reddit
Thanks! Works great now!
Psychological_Cry920@reddit
Yayyy!
ShinobuYuuki@reddit
Yeah, we switched the app over to Tauri instead of an Electron build now, and that literally cut the app size down by more than half; it is also snappier imo
Optimal-Builder-2816@reddit
Are there any specific instructions on how to ask this model to use provided tools? Wondering if there's prompting guidance.
Kooky-Somewhere-2883@reddit (OP)
we baked a default prompt into the chat template, you don't need to do much
Optimal-Builder-2816@reddit
Oh ok, cool. Does Qwen normally need any additional prompting for tools?
Kooky-Somewhere-2883@reddit (OP)
you can use Jan-nano without prompting, just the default MCP prompt (MCPs come with prompts); for Qwen I need to use a very, very long system prompt with multiple examples and it's still very hit and miss
Optimal-Builder-2816@reddit
Excited to check it out!
eggs-benedryl@reddit
Okay, so this is all over my head but it did make me look into MCP stuff.
I'm curious if I have this right. Before this, agent use and MCP tools would be handled similarly to ComfyUI or what have you: a series of automated handoffs, query to result to query, etc.
When I test my confirmed-working MCP Wikipedia tool, all prior models appear to complete the very first step and no further. This Jan model appears to do it all, one query to the next.
Is that why this is so revolutionary? Because what I'm getting from this Jan model wasn't nearly as simple and easy before. Bravo, this is dope.
Kooky-Somewhere-2883@reddit (OP)
thank you.
we just trained the model to visit pages when needed, using RL.
Phantomx_77@reddit
In particular, how much did the training cost?
Kooky-Somewhere-2883@reddit (OP)
we used our internal cluster, which is just A6000s
if you rent on RunPod, I believe anything below $100
Phantomx_77@reddit
Nice! I have an RTX 4050 in my laptop, what can I expect with it?
Kooky-Somewhere-2883@reddit (OP)
it should work
Unhappy-Branch3205@reddit
Awesome! Great job!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
Tricky_Reflection_75@reddit
What MCP is showcased in the video?
Kooky-Somewhere-2883@reddit (OP)
serper mcp
TheREXincoming@reddit
Damn the model is so smol and so gud
Kooky-Somewhere-2883@reddit (OP)
thank you 😍
Admirable-Bedroom-65@reddit
Are you planning to add reasoning to this anytime soon?
Kooky-Somewhere-2883@reddit (OP)
probably, but it needs a lot of changes
Edzomatic@reddit
I just tried it (Q8) with the Jan UI, and web search with Serper worked really, really well; however, with the other tools like browsermcp its performance was quite poor, but I guess that's to be expected from a 4B model.
Also, since the model is fine-tuned from Qwen3, is it supposed to have thinking? Because it didn't think during my testing.
Kooky-Somewhere-2883@reddit (OP)
we trained the model in non-think mode, so it might not perform well with thinking
Asleep-Ratio7535@reddit
Is this Jan? It looks so different from before.
Psychological_Cry920@reddit
Yes, we completely revamped Jan from the UI to the application backend. It's now quite customizable in terms of appearance, so Alan changed the settings as shown in the video. For now, it's quite lightweight in terms of size compared to previous versions (it's still heavy on Linux, and we're working on it).
SaratogaCx@reddit
I just tried out the beta and please let me go full width in the chat interface. Every AI chat seems to have decided that I can't pick my own line length and it sucks. The released version lets me set the width to fill the window but I couldn't find a setting in rc5 that would let me (I could have missed it though).
Asleep-Ratio7535@reddit
Great job, I am enjoying it right now. It looks great! I found a bug (?) in the default models from remote providers: they're hardcoded, and deleting one model just puts it at the end of the queue. But the MCP looks very decent; I think this is a killer feature now. I will use Jan for the API now.
Psychological_Cry920@reddit
Woho! Thanks for the bug report. Noted it down here 🚀🚀🚀
epSos-DE@reddit
Nice interface. It needs a canvas mode and a folder with canvas files for the canvas mode!
Kooky-Somewhere-2883@reddit (OP)
hope we will have that soon
MichaelBui2812@reddit
This is really interesting given that the model is small enough to run on almost any consumer PC. I have some questions:
Thanks a lot!
Kooky-Somewhere-2883@reddit (OP)
yes, it can use the browser, just use the browser-use MCP.
you just need a hard question; the model will decide it wants to read the details, purely agentically.
SupeaTheDev@reddit
Very interesting. Stuff like this is what might change the world the most. We need LLM costs closer to zero to have agents everywhere in the world (which I believe will be the end result).
Dr4x_@reddit
Hi, I'm not an expert so I'm trying to understand what is the breakthrough here. If I recall correctly qwen3 models are already good at MCP tool calling, so from what I understand the improvement is about better information extraction from a response provided by a research tool. But I feel I'm missing something
quan734@reddit
i think they ran ReCall/ReSearch RL on top of Qwen3-4B, so it's better at multi-hop search, not just MCP/tool calling
nguyenm@reddit
80% of the reason why I pay for a GPT Plus subscription, and am thinking of switching to Gemini Pro, is the Deep Research function. This is pretty solid for a developmental build, so keep up the good work. Although I would take a wild guess that context length would be a plausible bottleneck if the search query action produces less-than-stellar results.
With o3 Deep Research, sometimes it shows thinking tokens as if it can watch videos and listen to audio, but I am skeptical of that. For context, I was asking it if Thanos had snapped to the Addams Family song, the original one, how many of us fleshbags would be left. The thinking tokens mentioned it tried to watch the 1960s video of the theme song, then produced a result of 33 snaps, after which there'd be 8 humans left on earth.
Tangentially, I almost always default to Xanh SM now because of predictability in bike model and slightly-more professional drivers (due to the strictness of Vin Group management lol).
IrisColt@reddit
🤣 Sudden comedy gold.
sonik13@reddit
I've been using AI incorrectly this whole time.
mynameismypassport@reddit
Incorporating the 'Addams reasoning benchmark' into future tests for sure.
IrisColt@reddit
By the way, tricky question (the snapper should also be considered in the population). Added to my battery of excellent questions. Thanks!!!
Kooky-Somewhere-2883@reddit (OP)
Yes, you are on point! We will need to evaluate the context length issue more, do more deep-research-related development, and also build a bigger model.
Stay tuned!!!
Latter_Virus7510@reddit
🔥🥰❤️
Kooky-Somewhere-2883@reddit (OP)
🥰🥰🥰
SandboChang@reddit
maybe a silly question as I am not familiar with MCP: Do you need to pair it with a specific search engine like Google for it to work? Or does it crawl the internet on its own? My understanding is that you need to have some index for a search to be effective, so I wonder what is used in this aspect.
Or does the search here look over a local database?
Kooky-Somewhere-2883@reddit (OP)
you can also use a local RAG database if you have one, as long as it has an MCP
Kooky-Somewhere-2883@reddit (OP)
it needs a search engine MCP, like the Serper MCP
Swimming_Power_2960@reddit
I am a bit confused. What exactly is it better at compared to all the other models mentioned in the post? Because the model itself can't be better, right? It's super small compared to the huge models listed in the "SimpleQA" part of the post. I am genuinely confused about why and how this model is so special.
Kooky-Somewhere-2883@reddit (OP)
you can try setting up Qwen3-4B with some MCPs and then this model; you will see the difference immediately
Swimming_Power_2960@reddit
https://imgur.com/a/pFyoUuB maybe I am using it incorrectly, but so far I can't say I am impressed.
Swimming_Power_2960@reddit
So it's just extremely good at tool calling for its size? Is that what makes it special?
qnixsynapse@reddit
Awesome. This tiny 2.3GB model calls tools like a pro, man!
Kooky-Somewhere-2883@reddit (OP)
Born This Way
ForceItDeeper@reddit
that would make this a pretty solid choice for something like Home Assistant, right?
I haven't paid much attention lately and I've got some reading up to do. I'm just learning about MCP, and between that and the capabilities of a 4B model, it makes me think LLMs have gotten to the point where they can do mostly everything I'd want them to.
Thanks for the time and work put into this. I love the idea of AI without sacrificing every bit of privacy, and I appreciate all you peeps smarter than me working to make that happen.
Kooky-Somewhere-2883@reddit (OP)
This is pretty amazing; I think it will make something like Amazon Alexa, but like much, much smarter?
You_Wen_AzzHu@reddit
How to enable deep research on the Linux UI?
Kooky-Somewhere-2883@reddit (OP)
because the model is very under-prompted, you can use some prompting + MCP (a search API MCP) and it will carry out the research for you; just ask it to research deeply, etc.
You_Wen_AzzHu@reddit
i sense some config is missing
Kooky-Somewhere-2883@reddit (OP)
i have a sample vllm serve command for u:
```
vllm serve Menlo/Jan-nano --host 0.0.0.0 --port 1234 --enable-auto-tool-choice --tool-call-parser hermes --chat-template ./qwen3_nonthinking.jinja
```
You_Wen_AzzHu@reddit
Fixed tool use on the model and added the Serper API key, but the research stops after the Google search.
Kooky-Somewhere-2883@reddit (OP)
oh, in this vLLM version we didn't include the default deep-research prompt; can you try including this:
```
In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Tool Use Rules
Here are the rules you should always follow to solve your task:
Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
If no tool call is needed, just answer the question directly.
Never re-do a tool call that you previously did with the exact same parameters.
For tool use, MAKE SURE to use the XML tag format as shown in the examples above. Do not use any other format.
```
You_Wen_AzzHu@reddit
Added it to the instructions section of the assistant. Same issue. Changed "one tool per message" to all tools, same issue.
Kooky-Somewhere-2883@reddit (OP)
oh I see, can you reduce the number of tools? Just keep web search and scrape, then retry; too many tools can be confusing for the model atm
You_Wen_AzzHu@reddit
Added it to the Jinja template. Same issue. Serper is being queried, since I see the request count increase on the dashboard, but the Serper search results are not shown in the UI. The Jan-beta log shows "found tool google_search in server" and nothing else. Will try to limit the tools. Thanks.
Kooky-Somewhere-2883@reddit (OP)
this is very strange; I recommend you download the llama.server version inside Jan to test and compare, maybe something is off
Kooky-Somewhere-2883@reddit (OP)
use q8
You_Wen_AzzHu@reddit
My mistake. The google_search tool does work indeed, but since the result is very minimal I mistakenly thought it would make further calls to the other tools.
Kooky-Somewhere-2883@reddit (OP)
damn, have you considered that the answer was actually sufficient? Because if it finds the information is enough, it won't try further tools
Kooky-Somewhere-2883@reddit (OP)
click on this, and enable tool use
Aromatic_Fun_6118@reddit
How did you set it up like that? Is there any guide on how to do that for beginners? :) Thank you for your work!
Kooky-Somewhere-2883@reddit (OP)
you can download the beta jan app that is being posted around this comment section
Psychological_Cry920@reddit
Thank you for your kind words. We build in public, so please feel free to join us in some of the discussions. Yes, we'll work on some blog posts about this shift 🚀🚀
Kooky-Somewhere-2883@reddit (OP)
It's quite funny seeing Jan-nano talking about .... itself
nerdyvaroo@reddit
Kooky-Somewhere-2883@reddit (OP)
Candace?
nerdyvaroo@reddit
candace nuts fit in yo mouthh?
vinigrae@reddit
Goteem
Kooky-Somewhere-2883@reddit (OP)
ew
nerdyvaroo@reddit
;-; its the joke for when someone says candace... Apologies.
spiffco7@reddit
Thank you team Menlo
Kooky-Somewhere-2883@reddit (OP)
omg thank you for trying out ❤️
Clueless_Nooblet@reddit
I tried with Jan-nano, endless "loading model" loop. what am I doing wrong?
Psychological_Cry920@reddit
Hi u/Clueless_Nooblet, which beta version are you using there? We just patched a model loading issue in the new beta version. Please update to this version: https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Clueless_Nooblet@reddit
The google search tool seems to have problems, though. Even with rc6
Psychological_Cry920@reddit
Ah, you will need a Serper API key set in the MCP server env. Otherwise it's just a placeholder key 😂
Clueless_Nooblet@reddit
Oh my, I'll have to look all that stuff up. Thanks a lot for your help :)
Psychological_Cry920@reddit
Haha, sorry for the inconvenience caused. We will think about a better UX (e.g. a built-in free endpoint setup)
Clueless_Nooblet@reddit
It's fine! I know your pain, I created Writingway (also on github). AI is still early days, we're suffering through all kinds of problems still ;)
Thanks a lot!
Psychological_Cry920@reddit
Sweeeet!!
Clueless_Nooblet@reddit
Hi, I just updated to rc6, and it seems to work now!
Psychological_Cry920@reddit
Yayyy!
Kooky-Somewhere-2883@reddit (OP)
oh wait, I realized this is an app issue
Kooky-Somewhere-2883@reddit (OP)
please use the recommended sampling params of qwen3
ImprovementMedium716@reddit
Add help guides for how to integrate mcp servers please
Psychological_Cry920@reddit
Heyy, please go to Settings > MCP Servers. You will see all the default servers there. Start with fetch (good to start with), then later Serper, once you grab an API key from their website and fill it in there
WhoKnows_Maybe_ImYou@reddit
Can I use this model with Ollama and Open-webui?
Kooky-Somewhere-2883@reddit (OP)
yes but i believe ollama has some template issue now
kyznikov@reddit
how do I use the Google search and scraping feature? When I ask the model, it says it has limited access and can't access the internet. I'm using the Jan app with your model jan-nano-4b-Q8_0
Psychological_Cry920@reddit
Hi u/kyznikov, so sorry for that, we patched the issue just now. Can you update the app or download the new beta version from this link?
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
kyznikov@reddit
it works now but the model can't access the internet, what's wrong?
Kooky-Somewhere-2883@reddit (OP)
oh there is something wrong with mcp, very strange you see the error returned for some reason it cant search
kyznikov@reddit
I think it works now. I didn't know I had to open Chrome, install the Browser MCP extension, and keep it open for it to work
Kooky-Somewhere-2883@reddit (OP)
amazing, but browser use can eat a lot of VRAM; using Serper and scrape is the way to go (if you have the API key)
Psychological_Cry920@reddit
Hmm, weird. Do you have a valid Serper API key there? Please check it again to see if there is any typo
kyznikov@reddit
it works now, but it seems it can't access the internet. what's wrong?
DesperateAdvantage76@reddit
How do you keep Google's AI response at the top of the search from skewing your results?
Kooky-Somewhere-2883@reddit (OP)
we don't have this objective; it's optimized for the correct response
Psychological_Cry920@reddit
Hi everyone 👋, we just patched the model run issue on Windows, please help us update the beta build or download from here. So sorry for that!
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Kooky-Somewhere-2883@reddit (OP)
so fast
Psychological_Cry920@reddit
Thanks! Mate
Kooky-Somewhere-2883@reddit (OP)
amazing
Jack_Fryy@reddit
What chat app is that in the video?
Psychological_Cry920@reddit
Hi u/Jack_Fryy, it's Jan - beta version, please help us try it out here.
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Jack_Fryy@reddit
Cool will try check it out
Psychological_Cry920@reddit
Yayy!
Psychological_Cry920@reddit
Hello everyone! This is Louis from the Menlo Research team, currently contributing to Jan App. I’m thrilled to see that Jan-nano has received such positive feedback from everyone. I’d love to ask for your help giving our beta app a few shots, the more feedback we receive, the better our upcoming release will be. 🙏 Here is the beta build link:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Thank you everyone.
hungry_hipaa@reddit
Is there a plan to support Apple Silicon? It shows it's not detecting a GPU. Also being a little lazy here and will look into it but can I run Serper and the other tools? This is on a MBP M2 Max 96GB. Wish you all the best on this project and will be following closely even if it has to be on a non Mac :)
Psychological_Cry920@reddit
Hi, it supports Apple Silicon by default. You got a monster there!!!!! Woho
Kooky-Somewhere-2883@reddit (OP)
Hello Louis this is Alan
Psychological_Cry920@reddit
Hi mate!!!!
MikeBirdTech@reddit
Love this 🥰
Menlo keeps adding value to open source AI
Psychological_Cry920@reddit
Thank you!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
whisgc@reddit
This is awesome work. Thanks and kudos to the entire team. All the best
Psychological_Cry920@reddit
Thank you!
Kooky-Somewhere-2883@reddit (OP)
thank you ❤️
xxPoLyGLoTxx@reddit
Looks interesting - thanks for creating.
The processing seems to be referencing URLs. Is the model entirely local or does it access the web?
Kooky-Somewhere-2883@reddit (OP)
it accesses the web through a Serper MCP (a tool to scrape the web and do Google searches)
xxPoLyGLoTxx@reddit
OK thanks for answering! Might be a dumb question, but is there any way to ease woes about privacy of the prompts?
This looks really cool as a perplexity alternative and I'm excited to try it. But I do value privacy.
Kooky-Somewhere-2883@reddit (OP)
there are many web-search API providers; maybe try the Brave API instead of Google Serper?
Loighic@reddit
How can I set up tool calling like you have here?
Thin_Protection9395@reddit
How are people using MCP with local models these days? I use the openai python api mostly, but I feel like there might be a more up-to-date way?
Kooky-Somewhere-2883@reddit (OP)
I use FastMCP, it's very cool, check it out!
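if it helps, a minimal FastMCP server is only a few lines; this is a sketch assuming the `fastmcp` Python package, with a made-up `add` tool:
```
# Minimal FastMCP server sketch; the add tool is just a made-up example.
from fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, ready for an MCP client to attach
```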
Exarch_Maxwell@reddit
What's the UI called?
Psychological_Cry920@reddit
Hi it's Jan, please help us give it a few shots. Here is the beta build link:
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc5-beta
Asleep-Ratio7535@reddit
https://github.com/menloresearch/jan
AdditionalWeb107@reddit
I am curious what is the baseline performance without MCP. Why does MCP help improve performance?
Kooky-Somewhere-2883@reddit (OP)
it Google-searches better, so it finds more information to answer you better.
Just like a human coder is better with Stack Overflow.
AdditionalWeb107@reddit
Oh I see. So in essence the tool calling capabilities are enhanced and hence the quality of input to the model?
Kooky-Somewhere-2883@reddit (OP)
sorta, but even so the base model doesn't Google-search as well, for example. So this one makes better decisions about when and how to use tools
Psychological_Cry920@reddit
Everyone, there's a regression issue with the Jan beta Windows build. We're working on a new beta build to patch it (estimated time: 1-2 hours). So sorry for this.
AppearanceHeavy6724@reddit
Very strange to see SimpleQA used with RAG.
Kooky-Somewhere-2883@reddit (OP)
well, if you consider Google search to be RAG
AppearanceHeavy6724@reddit
Of course it is. Why? Everything that pulls something into context from some external storage and summarizes it is a form of RAG.
Kooky-Somewhere-2883@reddit (OP)
Well then yes, because Perplexity is also using this benchmark, if you view it that way. We just aim to have some relevant counterparts to benchmark against.
kiruz_@reddit
I tried to use the Jan app together with your model, but whenever I download the Q8 model and try to run it, it never loads. I checked my resources and it looks like it doesn't try to load anything, as GPU and RAM usage stay the same. I'm on a 9950X3D + 5090. LM Studio works fine for me. Any idea how to debug your app? I tried the beta version.
Psychological_Cry920@reddit
Hi u/kiruz_, we noticed a regression issue with the Windows build. We will push a new build in 1-2 hours. Please update and try again then. Will let you know.
kiruz_@reddit
Lovely, thanks for the quick reply! Will check for sure after the new build
RedditDiedLongAgo@reddit
Why do the corpos always use throwaway accounts?
Kooky-Somewhere-2883@reddit (OP)
I am a real, live person :) I'm the first author of most papers from Menlo
jeez
Kooky-Somewhere-2883@reddit (OP)
and we are a small startup as well, not corpo
Outside-Ordinary3603@reddit
wait a minute you are stating this right?
**THIS LOCAL 4B MODEL CAN:**
UNDERSTAND ANY SHORT FACT-SEEKING QUESTION IN ANY DOMAIN (e.g. who was the author of the painting Guernica?)
SEARCH ONLINE FOR THE ANSWER
GIVE THE CORRECT ANSWER 85% OF THE TIME OUTPERFORMING THE BEST LLMS AVAILABLE
correct?
Kooky-Somewhere-2883@reddit (OP)
current number is 80.7% we aim to get 85 in next version
RedditDiedLongAgo@reddit
So... no?
Kooky-Somewhere-2883@reddit (OP)
i just want to correct the 85% in the comment
Kooky-Somewhere-2883@reddit (OP)
and it's on SimpleQA, not literally everything
arleth94@reddit
Congratulations! 🎉
Psychological_Cry920@reddit
Thank you!!
Kooky-Somewhere-2883@reddit (OP)
Thank you 😍
bwjxjelsbd@reddit
Does this support MLX?
Kooky-Somewhere-2883@reddit (OP)
We have not converted it to MLX since Jan does not support MLX right now.
Feel free to convert the base weights to MLX yourself.
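if anyone wants to try, something like this should do it, assuming mlx-lm's standard convert entry point (double-check the flags against the mlx-lm docs):
```
pip install mlx-lm
python -m mlx_lm.convert --hf-path Menlo/Jan-nano -q
```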
NoPresentation7366@reddit
Wow, I'm super impressed. I've run some inferences, and for such a small model it's very impressive. Thank you for your work, mates! 💗😎
liquidnitrogen@reddit
just tried this on LM Studio on Mac and got `Error rendering prompt with jinja template: "Error: Parser Error: Expected closing statement token. OpenSquareBracket !== CloseStatement.`
Kooky-Somewhere-2883@reddit (OP)
hi please check this solution
https://www.reddit.com/r/LocalLLaMA/s/VzIp5wqihF
liquidnitrogen@reddit
Thank you, it works great now !!!
RichardPinewood@reddit
What is that model equivalent to OpenAI / Claude ?
lyhiving@reddit
Nice find
RIPT1D3_Z@reddit
Great job!
Psychological_Cry920@reddit
Thanks!!! The team worked very hard on this. It would be great if you can try our app beta version as well. https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Due-Memory-6957@reddit
What the hell was this music lol
Kooky-Somewhere-2883@reddit (OP)
WHATEVER IT TAKES!
CritStarrHD@reddit
whoa! im curious is this the best open source deep research model available? what other options do we have?
Kooky-Somewhere-2883@reddit (OP)
there are many ways to do deep research; this model excels specifically in the agentic way.
smolagents, for example, is pretty good in a plan-based or workflow-based setup.
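for that workflow style, a rough smolagents sketch could look like this (the local endpoint, port, and model id are assumptions, and DuckDuckGo stands in for whatever search tool you prefer):
```
# Rough smolagents sketch for plan/workflow-based deep research.
# The api_base, port, and model_id are assumptions for a local OpenAI-compatible server.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

model = LiteLLMModel(
    model_id="openai/Menlo/Jan-nano",
    api_base="http://localhost:1234/v1",
    api_key="none",
)
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
print(agent.run("Who was the author of the painting Guernica?"))
```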
CritStarrHD@reddit
nice! I'm tryna run the model on Jan beta and it seems to be stuck on loading the model, any way to fix this?
notwhobutwhat@reddit
Amazing effort, this is the sort of thing that makes me wonder if smaller language models that can leverage tools effectively are the future of personal AI use cases.
Burning massive amounts of cash on hardware and power to host 671B (or 1T+ in the case of GPT) parameters, when something like this can collate plenty of relevant context from tools and then sort and understand it effectively, seems completely wasteful.
(PS loving Jan to date, but stacking it with this model out of the box as a turnkey AI solution for personal use? Game changer.)
stoppableDissolution@reddit
Bunch of specialized small models >>> one huge generalist. Bet this is the way the field is going to advance.
Psychological_Cry920@reddit
Yessss! This is Louis from the Jan team. We’re working on packaging our flagship models with Jan to help handle some of the common, repetitive, everyday tasks the team deals with.
Kooky-Somewhere-2883@reddit (OP)
Probably Jan team will do this!!
Love the new UI (I'm from research at Menlo)
Muted-Celebration-47@reddit
How to use mcp in jan?
Psychological_Cry920@reddit
Hi u/Muted-Celebration-47, please help download the beta version from the link below, then go to settings -> MCP Servers. You will see some default servers there, and you can add your favorite servers from there as well.
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
Kooky-Somewhere-2883@reddit (OP)
yes mcp only in beta Jan now
ExplanationEqual2539@reddit
Crazy bruh
Kooky-Somewhere-2883@reddit (OP)
BRUH
osamaromoh@reddit
BRUHH
Kooky-Somewhere-2883@reddit (OP)
bruhhh
osamaromoh@reddit
I’m gonna test your model in a bit. How is it doing with structured outputs?
Kooky-Somewhere-2883@reddit (OP)
it should do very well; the input and output of tool use are in XML and JSON
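if it helps, the tool-call output is roughly the Hermes-style format that the vLLM command above parses; something like this (the tool name is just an example):
```
<tool_call>
{"name": "google_search", "arguments": {"query": "author of Guernica"}}
</tool_call>
```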
osamaromoh@reddit
I’m gonna integrate it into my PydanticAI workflow and give it a test.
Kooky-Somewhere-2883@reddit (OP)
I don't think it will do well 🤣 but hopefully it does; please let us know the result
ozzie123@reddit
What’s this UI ur using?
Psychological_Cry920@reddit
Hi u/ozzie123 this is Jan app (still in beta), we also trained the model in the post.
Psychological_Cry920@reddit
Can you help us give it a try? Beta build below, thanksss!
https://github.com/menloresearch/jan/releases/tag/v0.5.18-rc4-beta
JLeonsarmiento@reddit
Excellent.
Kooky-Somewhere-2883@reddit (OP)
You're welcome <3
jasonhon2013@reddit
Wowwwww that’s insane
Kooky-Somewhere-2883@reddit (OP)
🤯
jasonhon2013@reddit
But why is the 4B that slow? 🤔🤔 Mistral 7B seems faster
Kooky-Somewhere-2883@reddit (OP)
it's pretty fast on my RTX A2000, I have not really tried Mistral these days
jasonhon2013@reddit
Ahhh I see, I see. Mind if I ask how many tokens per second?
maifee@reddit
What UI is this??
Psychological_Cry920@reddit
Hi u/maifee this is Jan - (Beta - Apache 2.0). We trained the model in the post: https://github.com/menloresearch/jan
Psychological_Cry920@reddit
The app is still in its Beta phase, please help us try it out here:
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImag
maifee@reddit
Is this app open sourced??
Commercial_Key_9023@reddit
yeah definitely!
check it out here: https://github.com/menloresearch/jan
gowisah@reddit
Thank you. I will try this out. Really interesting.
Kooky-Somewhere-2883@reddit (OP)
thank you we worked hard on it
Beb_Nan0vor@reddit
Can you share the MCP tools you are using for this? And also, does Jan work with the RTX 50 series GPUs? I am trying the latest Jan beta and its been stuck on loading the model (or any model for that matter).
Psychological_Cry920@reddit
Hi u/Beb_Nan0vor, this is Serper MCP Server. It should be. Do you have any advanced settings for the model? Or better yet, could you share some logs with me so I can take a look?
Beb_Nan0vor@reddit
Thanks for the reply. I'll send it in a direct message.
Kooky-Somewhere-2883@reddit (OP)
I use Serper: https://github.com/marcopesani/mcp-server-serper
yes, Jan supports the latest CUDA!
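if you're wiring it up by hand, the MCP server entry usually looks something like this; the package name is an assumption based on that repo, so double-check its README, and the key is a placeholder:
```
{
  "mcpServers": {
    "serper": {
      "command": "npx",
      "args": ["-y", "serper-search-scrape-mcp-server"],
      "env": { "SERPER_API_KEY": "<your-serper-key>" }
    }
  }
}
```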
Beb_Nan0vor@reddit
Thank you! :)
aknight2015@reddit
I've got an older laptop with 8 gigs of RAM. Can I run this locally?
Kooky-Somewhere-2883@reddit (OP)
yes, you can! Use the model I recommended, but don't keep your hopes too high for Q4, even the IQ-XS variants
aknight2015@reddit
I'm pretty flexible with my expectations. For tough stuff I use other AIs.
yoracale@reddit
Congrats guys this is amazing work!
Kooky-Somewhere-2883@reddit (OP)
Thanks!! we worked hard on it
sunshinecheung@reddit
wow, when can we use deep-research in jan?
Kooky-Somewhere-2883@reddit (OP)
you can use it now, but the build is beta; you can find it somewhere in this thread
sunshinecheung@reddit
can you give me the link of beta thx
Psychological_Cry920@reddit
Hi u/sunshinecheung,
Here are the links to our Beta build, we’d love for you to try it out!
Windows: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_x64-setup.exe
macOS Universal: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_universal.dmg
Linux Deb: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.deb
Linux AppImage: https://delta.jan.ai/beta/Jan-beta_0.5.18-rc4-beta_amd64.AppImag
SillyLilBear@reddit
Have you tried how it performs with 8B and 14B?
Kooky-Somewhere-2883@reddit (OP)
Yes we did!
I will include this part in technical report.
Funny enough, at first the 4B outperformed the 8B and 14B because the 8B and 14B overthought their tool parameters; we made some changes and now it scales logically, so the 8B and 14B will perform better.
There is some learning in this part too, details coming very, very soon!!
Lankonk@reddit
That's really impressive!
PowerBottomBear92@reddit
Is there a guide for how to download this?
Perfect-Category-470@reddit
This is sickkkkkk!!!
Kooky-Somewhere-2883@reddit (OP)
YES IT IS!!!!
Honest_Ad_7497@reddit
wow this is wild.
Kooky-Somewhere-2883@reddit (OP)
It's the latest version of DeepSeek-V3, guys, sorry, I think I forgot to type v3 into the post
Only_Situation_4713@reddit
I'm getting a jinja error in LM studio :(
Kooky-Somewhere-2883@reddit (OP)
hi, right now you can temporarily fix it as follows (reposted from a comment of mine on Hugging Face):
You can use the Qwen3 template from another LM Studio-compatible model, but remember to disable "thinking" and add this system prompt when using it:
```
In this environment you have access to a set of tools you can use to answer the user's question. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.
Tool Use Rules
Here are the rules you should always follow to solve your task:
Always use the right arguments for the tools. Never use variable names as the action arguments, use the value instead.
Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.
If no tool call is needed, just answer the question directly.
Never re-do a tool call that you previously did with the exact same parameters.
For tool use, MAKE SURE to use the XML tag format as shown in the examples above. Do not use any other format.
```
In the meantime we will try to see if we can fix the gguf
Enjoy
Only_Situation_4713@reddit
sweet it worked, thanks
Kooky-Somewhere-2883@reddit (OP)
amazing!! Remember we trained the model with a non-thinking objective, so please do not enable thinking.
I suggest using a sequential-thinking MCP or something like that; it will give you the desired effect if you want the model to think more between tool calls! (sample config below)
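for example, an MCP config entry like this (assuming the reference sequential-thinking server from the modelcontextprotocol servers repo):
```
{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}
```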