hellninja55's Comments

DeepSeek R1 (Qwen 32B Distill) is now available for free on HuggingChat!

Posted by SensitiveCranberry@reddit | LocalLLaMA | View on Reddit | 124 comments

hellninja55@reddit

u/SensitiveCranberry Can you guys put the 70b model there as well?

Local TTS models that can match ElevenLabs in terms of quality and consistency

Posted by _megazz@reddit | LocalLLaMA | View on Reddit | 46 comments

The newest Fish Speech model supports Portuguese, but keep in mind you need at least one minute of reference audio for it to work well. Here is a sample: [https://vocaroo.com/1n69hlXD60Uu](https://vocaroo.com/1n69hlXD60Uu)

Local TTS models that can match ElevenLabs in terms of quality and consistency

Posted by _megazz@reddit | LocalLLaMA | View on Reddit | 46 comments

hellninja55@reddit

Fish Speech 1.5 supports Portuguese. Why don't you try that? [https://huggingface.co/spaces/fishaudio/fish-speech-1](https://huggingface.co/spaces/fishaudio/fish-speech-1)

What are we expecting from Llama 4?

Posted by Own-Potential-2308@reddit | LocalLLaMA | View on Reddit | 89 comments

hellninja55@reddit

3 and 4 are never gonna happen. So far, Meta has avoided open-sourcing its image-related models and any audio models that could be used to clone other people's voices.

Open models wishlist

Posted by hackerllama@reddit | LocalLLaMA | View on Reddit | 238 comments

hellninja55@reddit

Train an LLM on musical ABC notation and music theory, and make it actually good. Basically what the ChatMusician guys did: [https://huggingface.co/m-a-p/ChatMusician](https://huggingface.co/m-a-p/ChatMusician)

But trained on actually good stuff and different genres, not just ancient folk songs. My recommendation would be to run Omnizart on free public domain songs from different genres (check FMA, for example) to generate MIDIs from the separate vocal and instrumental tracks, convert them to ABC notation, and build a huge, carefully curated dataset from that.

Bonus if you guys can train a TTS model that sings, like some devs in China did with DiffSinger: a model that takes lyrics, notes, phonemes, and a duration for each phoneme.
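For anyone wondering what the MIDI-to-ABC step could look like, here is a minimal sketch of the pitch conversion only. It assumes you already have MIDI note numbers extracted from a transcription, and the function name `midi_to_abc` is made up for illustration. It spells every black key as a sharp and ignores key signatures, durations, and the ABC rule that an accidental carries to the end of the bar, so a real converter would need more than this.

```python
# Sharps-only spelling of the 12 pitch classes; "^" is the ABC sharp prefix.
PITCH_CLASSES = ["C", "^C", "D", "^D", "E", "F", "^F", "G", "^G", "A", "^A", "B"]


def midi_to_abc(pitch: int) -> str:
    """Convert a MIDI note number (0-127) to an ABC pitch token.

    Hypothetical helper for illustration: no durations, no key signature,
    and no handling of accidentals carrying over within a bar.
    """
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch out of range: {pitch}")
    octave = pitch // 12 - 1  # MIDI 60 (middle C) falls in octave 4
    name = PITCH_CLASSES[pitch % 12]
    if octave >= 5:
        # Octave 5 is lowercase; each octave above adds an apostrophe.
        return name.lower() + "'" * (octave - 5)
    # Octave 4 is uppercase; each octave below adds a comma.
    return name + "," * (4 - octave)


# C major arpeggio starting at middle C, joined into one ABC body line.
abc_line = " ".join(midi_to_abc(p) for p in [60, 64, 67, 72])
```

Durations would be appended to each token as multiples of the tune's default note length (the `L:` header field), which is where the curation effort mentioned above would really start.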

Is there any RAG specialized UI that does not suck and treats local models (ollama, tabby etc) as a first-class user?

Posted by hellninja55@reddit | LocalLLaMA | View on Reddit | 20 comments

hellninja55@reddit (OP)

Which settings are you using for RAG? I am not getting accurate results.

11 days until llama 400 release. July 23.

Posted by danielcar@reddit | LocalLLaMA | View on Reddit | 196 comments

hellninja55@reddit

Since you seem to, ahem, have knowledge specifically about that, can you tell us whether the API prices will be competitive against GPT4 and Claude Sonnet?

LLama3 8B Vision Model that is on par with GPT4V & GPT4o

Posted by SnooTigers1510@reddit | LocalLLaMA | View on Reddit | 27 comments

hellninja55@reddit

> SOTA open-source VLLM

That's a huge claim. Post benchmark numbers vs InternVL 1.5 or MiniCPM.

What is the SOTA vision llm (multimodal llm that describes images), and where can I keep track of it?

Posted by hellninja55@reddit | LocalLLaMA | View on Reddit | 4 comments

hellninja55@reddit (OP)

Yeah... I would like to know how that fares against InternVL-Chat-V1.5