Looking for a solution for cleaning up/summarizing long voice transcriptions on M1 Max 32GB.
Posted by Bulky_Jellyfish_2616@reddit | LocalLLaMA | View on Reddit | 8 comments
I like to record voice memos as a diary kind of thing, so I want to take the text transcription that the Apple Voice Memos app generates and process it with AI to "clean it up" or summarize it, while retaining its meaning and style.
I think I'm struggling with context size limits. I'm using Ollama on a MacBook Pro M1 Max with 32GB of RAM. I've tried Gemma2:27b and Llama3.1:8b with the context adjusted up to 8k and 16k, but they don't follow instructions once I insert my transcript — they just respond with what is essentially "wow cool story" lol.
What would be the best tool for this job? Can I get a large enough context window locally to process these transcripts containing ~5000 words?
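One concrete way to rule out a silent context truncation is to call Ollama's REST API directly and set `num_ctx` per request instead of relying on the model's default. A minimal sketch, assuming Ollama is running on its default port 11434 (the model name and prompt wording here are just placeholders):

```python
import json
import urllib.request

def build_request(model: str, transcript: str, num_ctx: int = 16384) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": (
            "Restructure this transcript into organized paragraphs, "
            "keeping the original voice and meaning:\n\n" + transcript
        ),
        "stream": False,
        # num_ctx overrides the model's default context length (often only
        # 2048 in older Ollama setups), which otherwise silently truncates
        # long prompts — a likely cause of "wow cool story" replies.
        "options": {"num_ctx": num_ctx},
    }
    return json.dumps(payload).encode("utf-8")

def summarize(model: str, transcript: str) -> str:
    """Send the transcript to a local Ollama server and return its reply."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=build_request(model, transcript),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A ~5000-word transcript is roughly 6500–7000 tokens, so 16k of context leaves comfortable room for the prompt plus a full response.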
Pookieswer@reddit
You could definitely do this locally with a bigger-context model, but honestly for your workflow (long voice memos → clean summary), a lightweight online tool might save you a lot of hassle. I've had a decent experience using Vizard for this kind of thing — you can drop in the audio or the transcript and it'll generate a clean summary, key points, or a polished version of the text. No need to mess with context windows, and it handles long recordings without choking. It's more of an "upload → get usable text fast" tool than a modeling setup, so it might fit what you're doing — though it does require uploading your recordings to their servers, which may matter for a private diary.
Luckylars@reddit
What did you end up using ?
cristianadam@reddit
I have a MacBook Pro M1 Max with 32 GB of RAM and I've used it to summarize a 20-minute talk. I used Mistral Nemo. Examples of local LLM usage (qt.io) has step-by-step instructions.
You shouldn't be afraid of using a terminal and hacking on a bash script, though.
SomeOddCodeGuy@reddit
Gemma-2 27b has been finicky for me, and it would definitely be finicky with the context kicked up to 16k. I would set that one aside for now.
Llama 3.1 8b handles context up to 128k (I think?) but an 8b may not be as great at picking up the context.
Personally, in your shoes, I'd try Command-R 08-2024. I'd start at q5_K_M and see if that fits nicely, and jump down to q4_K_M if not. This model is fantastic for this kind of request, and should really get the job done. The prompt template can be tricky, so be sure to use either a front or back end that will handle it for you.
If that doesn't do it for you, Mistral Small (22b) just came out the other day, and I've never been disappointed in a Mistral model. q8 would fit on your machine, I'm certain, so I'd give it a try.
ekaj@reddit
You shouldn't need a tool for that. What prompt are you using to generate the summaries?
Bulky_Jellyfish_2616@reddit (OP)
I have a voice transcription that needs improvement. The transcription is a bit unorganized, rambly, and contains some grammar and spelling mistakes. I want you to do the following:
ekaj@reddit
Here are the prompts I use for summaries:
https://github.com/rmusser01/tldw/blob/main/Docs/Prompts/sprompt.txt
For yours: 1. I recommend adding an example. 2. Forget about grammar/spelling. 3. 'Restructure this text to group and condense the expressed ideas. Group related thoughts into their own paragraphs. Use the existing writing style exemplified in the written text. Do not insert or add any perspectives other than the one the text is written in. Do not insert or add any additional or new content or commentary, unless that content is a distillation of the existing writing. Here is an example of this request:'
That's just an off-the-cuff attempt at rewriting it to align with what I've found to be effective in prompting.
Bulky_Jellyfish_2616@reddit (OP)
I'll add too that I'm doing this through Open WebUI, and I expanded the context by using these model files:
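The general shape is a Modelfile that raises `num_ctx` for a derived model (a sketch of the standard form, not my exact file — the base model and context value will vary):

```
FROM llama3.1:8b
# Raise the context window so long transcripts aren't silently truncated
PARAMETER num_ctx 16384
```

Then `ollama create llama3.1-16k -f Modelfile` registers it, and it shows up as a selectable model in Open WebUI.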