iamMess

I fine-tuned Cohere Transcribe to support diarization and timestamps

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 25 comments

iamMess@reddit (OP)

It's hard to find real training data that is well labelled for that many speakers. You can generate it synthetically, but it's not the same quality.

Deepseek v4 pricing is genuinely silly, did the math and now i am questioning my entire stack

Posted by Skid_gates_99@reddit | LocalLLaMA | View on Reddit | 77 comments

Fine-tuned Qwen3 SLMs (0.6-8B) beat frontier LLMs on narrow tasks

Posted by Jolly-Gazelle-6060@reddit | LocalLLaMA | View on Reddit | 82 comments

Unsloth announces support for finetuning embedding models

Posted by -Cubie-@reddit | LocalLLaMA | View on Reddit | 18 comments

Someone from NVIDIA made a big mistake and uploaded the parent folder of their upcoming model on Hugging Face

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 165 comments

ELI5: why does nvidia always sell their consumer gpus below market price?

Posted by GreenTreeAndBlueSky@reddit | LocalLLaMA | View on Reddit | 18 comments

Speculative Decoding is AWESOME with Llama.cpp!

Posted by simracerman@reddit | LocalLLaMA | View on Reddit | 61 comments

How to post-train LLM with tokenizer replacement?

Posted by Objective-Good310@reddit | LocalLLaMA | View on Reddit | 2 comments

Deepinfra sudden 2.5x price hike for llama 3.3 70b instruction turbo. How are others coping with this?

Posted by parmarss@reddit | LocalLLaMA | View on Reddit | 25 comments

[URGENT] Which is a reliable and affordable GPU cluster for hosting custom LLMs for business

Posted by Competitive-Wing1585@reddit | LocalLLaMA | View on Reddit | 36 comments

iamMess@reddit

This is the wrong use case for adding knowledge to a model. Use RAG. If you really want to go down this path, then [runpod.io](http://runpod.io) and [modal.com](http://modal.com) are your best bets. You can even do it serverless if your users are OK with a little cold-start time.

Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Support Most European Languages

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 42 comments

Phantom-fragment

Posted by Ok_Horror_8567@reddit | LocalLLaMA | View on Reddit | 13 comments

iamMess@reddit

Bro, 80% of your README is about how it’s faster than Docker, which means nothing. I don’t know what it does, other than that you vibe coded the fuck out of this.

Local Meeting Notes with Whisper Transcription + Ollama Summaries (Gemma3n, LLaMA, Mistral) - Meetily

Posted by Sorry_Transition_599@reddit | LocalLLaMA | View on Reddit | 9 comments

axolotl vs unsloth [performance and everything]

Posted by Shivacious@reddit | LocalLLaMA | View on Reddit | 26 comments

🚀 OpenAI released their open-weight models!!!

Posted by ResearchCrafty1804@reddit | LocalLLaMA | View on Reddit | 571 comments

The "Leaked" 120B OpenAI Model Is Trained In FP4

Posted by Few_Painter_5588@reddit | LocalLLaMA | View on Reddit | 132 comments

100x faster and 100x cheaper transcription with open models vs proprietary

Posted by crookedstairs@reddit | LocalLLaMA | View on Reddit | 23 comments

Drummer's Mixtral 4x3B v1 - A finetuned clown MoE experiment with Voxtral 3B!

Posted by TheLocalDrummer@reddit | LocalLLaMA | View on Reddit | 15 comments

Voxtral WebGPU: State-of-the-art audio transcription directly in your browser!

Posted by xenovatech@reddit | LocalLLaMA | View on Reddit | 13 comments

I made a 1000 hour NSFW TTS dataset

Posted by hotroaches4liferz@reddit | LocalLLaMA | View on Reddit | 152 comments

mistralai/Voxtral-Mini-3B-2507 · Hugging Face

Posted by Dark_Fire_12@reddit | LocalLLaMA | View on Reddit | 94 comments

Well, if anyone was waiting for Llama 4 Behemoth, it's gone

Posted by Ok-Elevator5091@reddit | LocalLLaMA | View on Reddit | 154 comments

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

Qwen3 is also a great model. As mentioned previously, this is less about the performance and more about the method. If we went for full performance we would have chosen other models and probably also spent a lot more time improving the dataset.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

That is true. A more nuanced baseline might have been asking it to do CoT and then provide an answer. To be honest, I don't think it would improve much. The original emotion dataset is very hard even for humans.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

We tried your method, but it doesn’t really work. Instead, the model reasons about the instruction you gave it, which we do not want. Yes, the model is small and the reasoning is complex, but we still see a decent improvement. We also mention in the paper that using a larger model would probably yield better results.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

Also a possibility, and possibly better performance. It doesn’t provide the explainability though. Our reasoning generation model can also be used to augment other datasets with reasoning. For example, there is a big need for multi-turn reasoning datasets, which currently (to my knowledge) do not exist.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

Yeah. We’re also working on better TTS and STT models using Llama 3 as a base model. We’ve considered using Qwen, but the Qwen models are not as multilingual as the Llama models.

Here is how we beat ChatGPT at classification with 1 dollar in cloud compute

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 43 comments

iamMess@reddit (OP)

We used Llama models because they are well supported and easy to train. I'm certain that using SOTA models would improve performance, but it would cost us a lot more to train a 600B model than a 1B model. Also, this is more about the method than the actual performance. It can easily be scaled by swapping in a better model :)

What finetuning library have you seen success with?

Posted by Responsible-Crew1801@reddit | LocalLLaMA | View on Reddit | 17 comments

Is there appetite for hosting 3b/8b size models at an affordable rate?

Posted by No-Fig-8614@reddit | LocalLLaMA | View on Reddit | 25 comments

iamMess@reddit

I think most people here can run inference on a 3B or 8B model themselves. For 60 USD you can get A LOT of serverless inference at RunPod, or about 120 hours on an RTX 3090. I doubt many people actually use these models that much per month.
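
A quick back-of-the-envelope check of those figures, as a rough sketch. The 60 USD budget and 120 GPU-hours come from the comment above; the 30-day month and the function names are assumptions added for illustration, not actual provider pricing.

```python
def implied_hourly_rate(budget_usd: float, gpu_hours: float) -> float:
    """Hourly price implied by a budget and the GPU-hours it buys."""
    return budget_usd / gpu_hours


def hours_per_day(gpu_hours: float, days_in_month: int = 30) -> float:
    """Average daily usage needed to consume the GPU-hours in one month.

    days_in_month=30 is an assumption, not a figure from the comment.
    """
    return gpu_hours / days_in_month


# Figures quoted in the comment: 60 USD for roughly 120 hours of RTX 3090.
rate = implied_hourly_rate(60, 120)
daily = hours_per_day(120)

print(f"Implied rate: {rate:.2f} USD per GPU-hour")
print(f"Usage needed to spend the budget: {daily:.1f} hours per day")
```

Under these assumptions, a user would need to keep a GPU busy for about four hours every day to use up the 60 USD, which is the point of the comment: few people run small models that heavily.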

Is this the largest "No synthetic data" open weight LLM? (142B)

Posted by AaronFeng47@reddit | LocalLLaMA | View on Reddit | 48 comments

ResembleAI provides safetensors for Chatterbox TTS

Posted by WackyConundrum@reddit | LocalLLaMA | View on Reddit | 16 comments

DeepSeek R1 05/28 performance on five independent benchmarks

Posted by zero0_one1@reddit | LocalLLaMA | View on Reddit | 8 comments

How is Kokoro TTS so good with so few parameters?

Posted by JealousAmoeba@reddit | LocalLLaMA | View on Reddit | 80 comments

B200 vs H100 Training Benchmark: Up to 57% Faster Throughput

Posted by igorsusmelj@reddit | LocalLLaMA | View on Reddit | 18 comments

When you prompt a non-thinking model to think, does it actually improve output?

Posted by Kep0a@reddit | LocalLLaMA | View on Reddit | 42 comments

Orpheus-FastAPI: Local TTS with 8 Voices & Emotion Tags (OpenAI Endpoint Compatible)

Posted by townofsalemfangay@reddit | LocalLLaMA | View on Reddit | 111 comments

How I used entropy and varentropy to detect and remediate hallucinations in LLMs

Posted by AdditionalWeb107@reddit | LocalLLaMA | View on Reddit | 12 comments