iamMess

This is the wrong way to add knowledge to a model. Use RAG. If you really want to go down this path, then [runpod.io](http://runpod.io) and [modal.com](http://modal.com) are your best bets. You can even do it serverless if your users are OK with a little cold-start time.
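For anyone unsure what "use RAG" means in practice, here is a minimal sketch of the retrieval step. It assumes a toy bag-of-words cosine similarity in place of a real embedding model, and the function names (`retrieve`, `build_prompt`) are hypothetical. The point is that new knowledge goes into the prompt at query time instead of being trained into the weights.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real setup would use an embedding model instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Put the retrieved passages in the prompt rather than in the model weights.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping `tokenize`/`cosine` for a proper embedding model and vector store is the usual next step; the prompt-assembly shape stays the same.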

Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Supporting Most European Languages

Posted by jacek2023@reddit | LocalLLaMA | 42 comments

iamMess@reddit

8k context. DoA.

Phantom-fragment

Posted by Ok_Horror_8567@reddit | LocalLLaMA | 13 comments

iamMess@reddit

Bro, 80% of your README is about how it's faster than Docker, which means nothing. I don't know what it does, other than that you vibe coded the fuck out of it.

iamMess@reddit

Why does this look like it was vibe coded from hell?

iamMess@reddit

What?

iamMess@reddit

What?

Local Meeting Notes with Whisper Transcription + Ollama Summaries (Gemma3n, LLaMA, Mistral) - Meetily

Posted by Sorry_Transition_599@reddit | LocalLLaMA | 9 comments

iamMess@reddit

They are not. Could easily be if someone had good data for it though.

axolotl vs unsloth [performance and everything]

Posted by Shivacious@reddit | LocalLLaMA | 26 comments

iamMess@reddit

Axolotl is currently the fastest framework. It takes a little more to set up, but it’s still really easy to use.

🚀 OpenAI released their open-weight models!!!

Posted by ResearchCrafty1804@reddit | LocalLLaMA | 571 comments

iamMess@reddit

pretty sick stuff

The "Leaked" 120B OpenAI Model Is Trained In FP4

Posted by Few_Painter_5588@reddit | LocalLLaMA | 132 comments

iamMess@reddit

Yes

100x faster and 100x cheaper transcription with open models vs proprietary

Posted by crookedstairs@reddit | LocalLLaMA | 23 comments

iamMess@reddit

How about adding canary-qwen to the post?

Drummer's Mixtral 4x3B v1 - A finetuned clown MoE experiment with Voxtral 3B!

Posted by TheLocalDrummer@reddit | LocalLLaMA | 15 comments

iamMess@reddit

Thanks. Seems like no one had luck with that part yet, and Mistral is notorious for not providing help 😂

iamMess@reddit

Have you had any luck fine-tuning Voxtral for actual transcription?

Voxtral WebGPU: State-of-the-art audio transcription directly in your browser!

Posted by xenovatech@reddit | LocalLLaMA | 13 comments

iamMess@reddit

No. It's not trained for it. It would be rather easy to make, though, if someone figures out how to fine-tune it.

I made a 1000 hour NSFW TTS dataset

Posted by hotroaches4liferz@reddit | LocalLLaMA | 152 comments

iamMess@reddit

It's from the Google TTS model.

mistralai/Voxtral-Mini-3B-2507 · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | 94 comments

iamMess@reddit

How do you fine-tune this?

Well, if anyone was waiting for Llama 4 Behemoth, it's gone

Posted by Ok-Elevator5091@reddit | LocalLLaMA | 154 comments

iamMess@reddit

Those are some retarded compliance requirements.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | 43 comments

iamMess@reddit (OP)

😂

iamMess@reddit (OP)

Then we would go over a dollar for compute 😀

iamMess@reddit (OP)

Qwen3 is also a great model. As mentioned previously, this is less about the performance and more about the method. If we had gone for full performance, we would have chosen other models and probably also spent a lot more time improving the dataset.

iamMess@reddit (OP)

That is true. A more nuanced baseline might have been asking it to reason step by step (CoT) and then provide an answer. To be honest, I don't think it would improve much. The original emotion dataset is very hard even for humans.
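A "CoT then answer" baseline mostly comes down to the prompt and to parsing the final label out of the reasoning. The sketch below is a hypothetical illustration of that idea, not the setup used in the post: the label list is an assumed emotion label set, and the `Answer: <label>` convention is an arbitrary choice made here.

```python
import re
from typing import Optional

# Assumed label set for illustration; replace with the real dataset's labels.
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def cot_prompt(text: str) -> str:
    # Ask the model to reason first, then commit to one label on a final line.
    return (
        "Classify the emotion of the text below.\n"
        f"Possible labels: {', '.join(LABELS)}.\n"
        "Think step by step, then end with a line of the form 'Answer: <label>'.\n\n"
        f"Text: {text}"
    )


def parse_answer(completion: str) -> Optional[str]:
    # Use the last 'Answer: <label>' line so the free-form reasoning is ignored.
    matches = re.findall(
        r"^Answer:\s*(\w+)\s*$", completion, flags=re.MULTILINE | re.IGNORECASE
    )
    if not matches:
        return None
    label = matches[-1].lower()
    # Reject labels outside the allowed set instead of guessing.
    return label if label in LABELS else None
```

Returning `None` for missing or out-of-set answers keeps those cases visible in the evaluation instead of silently counting them as some default class.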

iamMess@reddit (OP)

Will do :)

iamMess@reddit (OP)

We tried your method, but it doesn't really work. Instead, it reasons about the instruction you gave it, which we don't want. Yes, the model is small and the reasoning is complex, but we still see a decent improvement. We also mention in the paper that using a larger model would probably yield better results.

iamMess@reddit (OP)

That's also a possibility, and it might even perform better. It doesn't provide the explainability, though. Our reasoning-generation model can also be used to augment other datasets with reasoning. For example, there is a big need for multi-turn reasoning datasets, which (to my knowledge) don't currently exist.

iamMess@reddit (OP)

Yeah. We're also working on a better TTS and STT model using Llama 3 as the base model. We've considered using Qwen, but those models are not as multilingual as the Llama models.

iamMess@reddit (OP)

We used LLaMA because the models are well supported and easy to train. I'm certain that using SOTA models would improve performance, but training a 600B model would cost us a lot more than training a 1B model. Also, this is more about the method than the actual performance. It can easily be scaled by swapping in a better model :)
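As a rough back-of-the-envelope check on the cost gap, assume the widely used approximation that training compute is about $C \approx 6ND$ FLOPs for $N$ parameters and $D$ training tokens, with full fine-tuning on the same data for both models. This is an assumption for illustration; LoRA, mixture-of-experts sparsity (where $N$ would be the active parameters), and hardware utilization all shift the real numbers.

$$
\frac{C_{\text{600B}}}{C_{\text{1B}}} \approx \frac{6 \cdot (600 \times 10^{9}) \cdot D}{6 \cdot (1 \times 10^{9}) \cdot D} = 600
$$

So at equal data the larger model needs roughly 600 times the compute, before counting the extra GPU memory needed just to hold it.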