Best AI setup for intelligent srt subtitles translation
Posted by CaterpillarOne6711@reddit | LocalLLaMA | 8 comments
Okay so basically I'm trying to translate tons of SRT files (caption subtitles) from one language to another, and I want to do it intelligently, sentence by sentence rather than line by line.
My hardware:
CPU: 5900X
RAM: 64 GB (expandable up to 80 GB)
GPU: 4070, 12 GB VRAM
I've tried various versions of DeepSeek (7B, 8B, 14B) and gpt-oss 20B on both Ollama and LM Studio, and I noticed that 20B is the only one intelligent enough to do the job. The problem is that 20B is hella slow on Ollama and LM Studio, so I tried running it on llama.cpp and it turned out to be 10-20x faster. But 20B refuses to translate large files: even when I specifically tell it not to reason about the length of the text and to keep translating without stopping, it reasons that the file is too large and chunks it every time, so I have to keep reminding it to continue.
Is there any workaround?
Remarkable-Run3693@reddit
You can use Subtitle Edit v5.0.0 and LM Studio together. Just find a good AI model, because for my language even Qwen 36B doesn't work :D
byJim@reddit
I managed to translate huge SRT files (like the SRT for the movie Avatar) using local models of roughly 100 MB, with a tenth of the resources you have. Try it and monitor your PC's resources: https://afrodita-subtitle-translator.vercel.app/
It uses Chrome's Translator API to translate text with the AI models the browser provides. You might think this is client-side only, but it works really well server-side too: you can run Chrome in headless mode, and I recommend using Puppeteer.
Successful_Dot_3094@reddit
I have to congratulate you, because your site has made it much easier for me to translate subtitles 200 lines at a time through DeepSeek's web interface. It's super fast, and I've been able to compare and see that your site's translations are very close to DeepSeek's, which is the AI I was using.
byJim@reddit
How cool! :D
Thanks for taking the time to comment. Doing it manually is torture. Hey, I'm planning the next improvements to the app; I'd love to hear your ideas!
karthikgokul@reddit
If you want sentence-level translation (not line-by-line) at scale, fighting an LLM’s context limits is the wrong battle. The workaround is to preprocess the SRT into sentence “blocks” (while preserving timestamps), translate blocks in batches, then reflow back into subtitle lines with your max chars/line rules.
A practical setup that avoids the “file too large” refusal:
If you'd rather not engineer all that, a subtitle-native workflow is easier: tools like Vitra's Translate.video handle subtitles as subtitles (sentence-aware translation + consistent terminology), so you're not manually babysitting chunking and retries like you are with llama.cpp.
The key idea: don’t ask the model to “translate a whole file”—make the file translation a deterministic pipeline, and use the model only for the translation step.
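The pipeline described above can be sketched in a few functions. This is a minimal illustration, not a production SRT library: the parsing, sentence-grouping, and reflow heuristics here are my own assumptions, and the actual translation call is left out (that's where the LLM plugs in, one block or batch at a time).

```python
import re

def parse_srt(text):
    """Parse SRT text into a list of cues: {index, time, text}."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3:
            cues.append({"index": lines[0].strip(),
                         "time": lines[1].strip(),
                         "text": " ".join(l.strip() for l in lines[2:])})
    return cues

def group_sentences(cues):
    """Merge consecutive cues into sentence blocks (a crude heuristic:
    a cue ending in sentence punctuation closes the block), remembering
    which cues, and therefore which timestamps, each block came from."""
    blocks, current = [], []
    for cue in cues:
        current.append(cue)
        if re.search(r"[.!?…]['\"]?\s*$", cue["text"]):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def reflow(block, translated):
    """Split a translated sentence back across the block's original cues,
    roughly proportional to each cue's original length, keeping timestamps."""
    words = translated.split()
    total = sum(len(c["text"]) for c in block)
    out, i = [], 0
    for n, cue in enumerate(block):
        if n == len(block) - 1:
            part = words[i:]                      # last cue takes the rest
        else:
            share = max(1, round(len(words) * len(cue["text"]) / total))
            part = words[i:i + share]
            i += share
        out.append({**cue, "text": " ".join(part)})
    return out
```

The point is that the model only ever sees one sentence block (or a batch of them) per request, so it never has a "whole file" to refuse; the deterministic code owns chunking, retries, and timestamp bookkeeping.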
Free_Government_4003@reddit
Have you tried setting a higher context window and using system prompts that explicitly forbid any commentary? Something like "You are a translation machine. Translate the following subtitle file completely without any explanations, reasoning, or stops. Do not mention file length or suggest chunking"
You might also want to mess with the temperature settings - a lower temp could make it more obedient to your instructions
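Both suggestions can be combined in one request. A minimal sketch of what that looks like against llama.cpp's `llama-server`, which exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the host/port and the exact prompt wording are assumptions for illustration:

```python
import json
import urllib.request

# Strict system prompt forbidding any commentary, per the suggestion above.
SYSTEM_PROMPT = (
    "You are a translation machine. Translate the following subtitle text "
    "completely, without any explanations, reasoning, or stops. "
    "Do not mention file length or suggest chunking."
)

def build_payload(chunk, temperature=0.2):
    """Chat-completions payload: strict system prompt + low temperature."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": chunk},
        ],
        "temperature": temperature,
    }

def translate_chunk(chunk, url="http://localhost:8080/v1/chat/completions"):
    """Send one batch of subtitle text to a local llama-server instance
    (assumed to be running on localhost:8080) and return the translation."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(chunk)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Even with this, feeding the model modest chunks rather than the whole file tends to be more reliable than prompt instructions alone.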
CaterpillarOne6711@reddit (OP)
the temp is default I think, 0.8
CaterpillarOne6711@reddit (OP)
yeah I've already set the context size to 131k, but when I ask it why it can't translate more, it says it has limited output, which isn't true, because when it responds I can literally see output: x/infinite