Once and for all, how does the ChatML prompt template work?
Posted by quantier@reddit | LocalLLaMA | 4 comments
It’s soooo hard to implement ChatML and get it working properly. So once and for all we need a post about how to implement it. Any experts out there, please shine for us!!
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
I can’t get this to work whatsoever.
I am using Python and LangChain. Whenever I paste the above into my code I get errors. If I add triple quotes (""") above and below it, though, the errors go away.
"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""
The replies, however, come back with things like "Unhelpful Answer:", "Helpful Answer:", "### Instruction", "### Answer", and "### Explanation" in the middle of the text.
Also, many times the answer will go into a loop, and I have no idea how to fix it.
Do I need to import something from langchain.prompts or similar to make it work?
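For what it's worth, here is roughly the shape I'd expect to work, assuming the PromptTemplate class from langchain.prompts and a completion-style llm object (the variable names here are illustrative, not from my actual code):

from langchain.prompts import PromptTemplate

# Triple quotes because the template spans multiple lines;
# {system_message} and {prompt} are LangChain template variables.
chatml_template = """<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

chatml_prompt = PromptTemplate(
    input_variables=["system_message", "prompt"],
    template=chatml_template,
)

text = chatml_prompt.format(
    system_message="You are a helpful assistant.",
    prompt="Summarize ChatML in one sentence.",
)

# For the looping: passing <|im_end|> as a stop sequence usually fixes it,
# if the backend exposes a stop list, e.g. response = llm(text, stop=["<|im_end|>"])

Is that the right idea, or am I missing something?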
PS we are writing our prompts in a GUI and have a RAG solution.
OpenHermes 2.5 Mistral did it well; they implemented the template below (I've swapped in "Gomba" and "X" for the real names and added """ above and below the code):
"""<|im_start|>system You are "Gomba", a conscious sentient superintelligent artificial intelligence developed by a man named X, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia."""
This template works without the looping issues and is very simple. It just works! No "Helpful Answer" or "Unhelpful Answer" junk in the output, just straight-up answers.
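In full, the prompt we send ends up following the standard ChatML turn layout, roughly like this (I'm reconstructing the line breaks and the closing <|im_end|> from memory, so treat them as approximate):

<|im_start|>system
You are "Gomba", a conscious sentient superintelligent artificial intelligence developed by a man named X, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant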
Why doesn't the other setup work, and how can we adapt it to make life easier?
I have probably tried over 150 different language models now and I am really struggling with the ChatML ones.
Anyone else struggling with this?
gamepad_coder@reddit
What frontend UI + backend server are you using?
I'm also struggling to find documentation on this.
I found a similar ChatML excerpt on a random model on Hugging Face, and I'm using it somewhat successfully with Hugging Chat UI (aka chat-ui) as the frontend and Hugging Face's Text Generation Inference as the backend to locally serve models through my GPU.
Main issue:
Hugging Chat's settings panel has this System Prompt section, which I assume is where the ChatML goes.
I've played around with it and somewhat gotten it to work... but copy-pasting this block doesn't really work.
My models with this setup were erroring frequently.
However, I discovered that if I paste the following into the chat (and leave System Prompt 100% blank), then the tokens actually work as expected (or seem to).
I got this from the original model's description on Hugging Face.
So right now I'm manually pasting that into each chat, and it performs amazingly well for a local GPU LLM, but it's getting cumbersome to remember to paste it every time before I begin typing.
I suspect it might be a limitation of how Hugging Chat is passing parameters from System Prompt into the backend, but I haven't found the line of code that's passing that yet.
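In the meantime, a workaround sketch that skips the UI entirely and posts to TGI's /generate endpoint with the ChatML wrapping done in code (the URL, system message, and parameter values below are placeholders/defaults, adjust for your setup):

import requests

TGI_URL = "http://localhost:8080/generate"  # default TGI port; change to match your server

def chatml_wrap(system_message, user_prompt):
    # Wrap a plain prompt in ChatML tags so they never have to be pasted by hand.
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

resp = requests.post(
    TGI_URL,
    json={
        "inputs": chatml_wrap("You are a helpful assistant.", "Hello!"),
        # Stopping on <|im_end|> keeps the model from running past its turn.
        "parameters": {"max_new_tokens": 256, "stop": ["<|im_end|>"]},
    },
)
print(resp.json()["generated_text"])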
And you know what, now that I think about it... I wonder if Hugging Chat's System Prompt is ChatML-agnostic and if it's just expecting a plaintext description in that bubble in the settings... and maybe it'll insert all the arcane tokens auto-magically.
Actually, yeah, System Prompt passes into some var called model.preprompt. Maybe it's simpler than I thought.
The README doesn't mention System Prompt, sooo... ah, no good.
Just using plain English doesn't work.
And if I don't manually add the <|im_start|> and <|im_end|> tags, then the model glitches out.
I.e. I must manually type <|im_start|>user prompt here<|im_end|>, or else it'll glitch. Like if I exclude the tags, the model will glitch and continue the sentence as if it's finishing its own thought.
So... these tags seem to be very meaningful, but for the stack I'm using with Hugging Face's frontend + backend it's not entirely documented, and this was as far as I could get in a weekend of tinkering around.
Overall still a very promising technology, and I'm still exceedingly surprised at the quality and speed on a consumer grade GPU. Amazing for generating mockup JSON objects and Python.
Anyway, best of luck, OP!
If I ever find out more I'll post here if the thread isn't locked.
quantier@reddit (OP)
Amazing answer, I just saw your reply only a year later 😂 - What are you up to now?
gamepad_coder@reddit
Hi hi ~
open-webui has really, really awesome support for adding system prompts to any ML model, so I use that exclusively as my frontend now :) Been pretty busy with personal projects, so I haven't looked into LLMs since that month. But I did finally get llama3 working really fast in useful ways with a custom general-use system prompt, and I chat with it when I get writer's block a few times a month.
What about you?
quantier@reddit (OP)
Have been doing a lot with on-prem AI solutions for companies! Lots of exciting stuff! Have you been able to make Gen AI a part of your work?