How to make LLM generate realistic company name variations? (LLaMA 3.2)
Posted by Neural_Nodes@reddit | LocalLLaMA | View on Reddit | 4 comments
Hey all,
I’m building a blacklist company detection system where the LLM (LLaMA 3.2 via Ollama) is used to generate company name variations (misspellings, abbreviations, formatting).
Problem:
The LLM generates unrealistic or unrelated variations instead of true real-world ones. I need high-quality, meaningful variations only.
Example:
Input: “Infosys Limited”
Expected: “Infosys Ltd”, “Infosys”, “Infosys Pvt Ltd”
But LLM sometimes generates irrelevant names.
Looking for:
* How to constrain LLM to generate only valid real-world variations?
* Better prompt strategies or structured output formats?
* Should I combine LLM with rule-based constraints?
Goal is to improve precision in name matching.
Any suggestions would help 🙌
suprjami@reddit
for suffix in Ltd Pvt\ Ltd Pty\ Ltd LLC; \ do echo "${1%% *} $suffix"; done
I don't know why so many people think they need an LLM for simple text processing which has existed since the 1970s.
DinoAmino@reddit
Are you doing few-shots? Giving the model a half dozen or so examples in the system prompt?
StableLlama@reddit
Your request sounds fishy.
Creating a black list with name variations to test against sound like a bad approach, as there will always be a variation you didn't consider. E.g. having a doubled space is already failing in a string compare.
Why aren't you using the strengths of the LLMs and invert the task? Give it the name to check and ask is whether that can be the same company.
froggybrdr@reddit
My first thought is a sanity check. So the original name and variations get labeled and output, then a second model/pass with a second prompt verifies those names are realistic. Something like gemma4 e2b might meet your needs and take up a similar hardware footprint.