Looking for a solution for cleaning up/summarizing long voice transcriptions on M1 Max 32GB.
Posted by Bulky_Jellyfish_2616@reddit | LocalLLaMA | View on Reddit | 8 comments
I like to record voice memos as a diary kind of thing, so I want to take the text transcription that the Apple Voice Memos app generates and process it with AI to "clean it up" or summarize it, while retaining its meaning and style.
I think I'm struggling with context size limits. I'm using Ollama on a MacBook Pro M1 Max with 32GB of RAM. I've tried Gemma2:27b and Llama3.1:8b with the context adjusted up to 8k and 16k, but they don't follow instructions once I insert my transcript — they just respond with what is essentially "wow cool story" lol.
What would be the best tool for this job? Can I get a large enough context window locally to process these transcripts containing ~5000 words?
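One concrete way to rule out a silent context truncation is to call Ollama's REST API directly and set `num_ctx` per request instead of relying on the model's default. A minimal sketch, assuming Ollama is running on its default port 11434 (the model name and prompt wording here are just placeholders):

```python
import json
import urllib.request

def build_request(model: str, transcript: str, num_ctx: int = 16384) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": (
            "Restructure this transcript into organized paragraphs, "
            "keeping the original voice and meaning:\n\n" + transcript
        ),
        "stream": False,
        # num_ctx overrides the model's default context length (often only
        # 2048 in older Ollama setups), which otherwise silently truncates
        # long prompts — a likely cause of "wow cool story" replies.
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload).encode("utf-8")

def summarize(model: str, transcript: str) -> str:
    """Send the transcript to a local Ollama server and return its reply."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=build_request(model, transcript),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A ~5000-word transcript is roughly 6500–7000 tokens, so 16k of context leaves comfortable room for the prompt plus a full response.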
Pookieswer@reddit
You could definitely do this locally with a bigger-context model, but honestly for your workflow (long voice memos → clean summary), a lightweight online tool might save you a lot of hassle. I've had a decent experience using Vizard for this kind of thing — you can drop in the audio or the transcript and it'll generate a clean summary, key points, or a polished version of the text. No need to mess with context windows, and it handles long recordings without choking. It's more of an "upload → get usable text fast" tool than a modeling setup, so it might fit what you're doing — though it does require uploading your recordings to their servers, which may matter for a private diary.
Luckylars@reddit
What did you end up using ?
cristianadam@reddit
I have a MacBook Pro M1 Max with 32 GB of RAM and I've used it to summarize a 20-minute talk. I used Mistral Nemo. Examples of local LLM usage (qt.io) has step-by-step instructions.
You shouldn't be afraid of using a terminal and hacking on a bash script, though.
SomeOddCodeGuy@reddit
Gemma-2 27b has been finicky for me, and it would definitely be finicky with the context kicked up to 16k. I would set that one aside for now.
Llama 3.1 8b handles context up to 128k (I think?) but an 8b may not be as great at picking up the context.
Personally, in your shoes, I'd try Command-R 08-2024. I'd start at q5_K_M and see if that fits nicely, and jump down to q4_K_M if not. This model is fantastic for this kind of request, and should really get the job done. The prompt template can be tricky, so be sure to use either a front or back end that will handle it for you.
If that doesn't do it for you, Mistral Small (22b) just came out the other day, and I've never been disappointed in a Mistral model. q8 would fit on your machine, I'm certain, so I'd give it a try.
ekaj@reddit
You shouldn't need a tool for that. What prompt are you using to generate the summaries?
Bulky_Jellyfish_2616@reddit (OP)
I have a voice transcription that needs improvement. The transcription is a bit unorganized, rambly, and contains some grammar and spelling mistakes. I want you to do the following:
ekaj@reddit
Here are the prompts I use for summaries:
https://github.com/rmusser01/tldw/blob/main/Docs/Prompts/sprompt.txt
For yours: 1. I recommend adding an example. 2. Forget about grammar/spelling. 3. 'Restructure this text to group and condense the expressed ideas. Group related thoughts into their own paragraphs. Use the existing writing style exemplified in the written text. Do not insert or add any perspectives other than the one the text is written in. Do not insert or add any additional or new content or commentary, unless that content is a distillation of the existing writing. Here is an example of this request:'
That's just an off-the-cuff attempt at rewriting it to align with what I've found to be effective in prompting.
Bulky_Jellyfish_2616@reddit (OP)
I'll add too that I'm doing this through Open WebUI, and I expanded the context by using these model files:
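The general shape is a Modelfile that raises `num_ctx` for a derived model (a sketch of the standard form, not my exact file — the base model and context value will vary):

```
FROM llama3.1:8b
# Raise the context window so long transcripts aren't silently truncated
PARAMETER num_ctx 16384
```

Then `ollama create llama3.1-16k -f Modelfile` registers it, and it shows up as a selectable model in Open WebUI.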