Ollama retaining history?

Posted by DimensionEnergy@reddit | LocalLLaMA | View on Reddit | 11 comments

so ive hosted ollama locally on my system on http://localhost:11434/api/generate and was testing it out a bit and it seems that between separate fetch calls, ollama seems to be retaining some memory.

i don't understand why this would happen because as much as i have seen modern llms, they don't change their weights during inference.

Scenario:

makes a query to ollama for topic 1 with a very specific keyword that i have created
makes another query to ollama for a topic that is similar to topic 1 but has a new keyword.

Turns out that the first keyword shows up in the second response aswell. Not always, but this shouldn't happen at all as much as i know

Is there something that i am missing?
I checked the ollama/history file and it only contained prompts that i have made from the terminal using ollama run

[-]

chibop1@reddit

It doesn't. Tell your name ask what your name is in the same context. It should tell your name. Then in another session with fresh context, ask your name. It shouldn't be able to tell your name. If it does, client is managing some weird history context.

[-]

DimensionEnergy@reddit (OP)

thats why i wanted to ask.

understand that the keywords I'm referring to here are acronyms that are deeply specific. And even though that makes it even more probable that they show up together, there are at least 20-30 that are correlated in this way. why did the only one show up that i passed in just before in the previous prompt?

on top of this in the second prompt i only mentioned the general topic. yet it mentioned and generated a gibberish response with words from the first prompt

[-]

Red_Redditor_Reddit@reddit

It's kind of a weird experience. It thinks way more alike between sessions than you'd think. I would have it write stories and it would use the same names over and over and it would get a bit unsettling.

[-]

DimensionEnergy@reddit (OP)

exactly what im saying!!! i experienced this a lot before, but this time was concrete evidence that it had some memory.
I really dont understand why its doing that tho.

thing is around a few months ago when i was messing around with the chatgpt API endpoint. i had a similar experience, but dismissed it.

so maybe its something about the general llm architecture?

[-]

Red_Redditor_Reddit@reddit

LLM's are just that repetitive. They have a strong inclination to say certain things. It's part of the reason GPT'isms are so bad.

One thing you might change is the temp setting. A setting too low is more likely to cause the LLM to produce repetitive results.

[-]

DimensionEnergy@reddit (OP)

good point, but still i dont think the words I'm working with are so common that an llm would just chuck one in for good measure just because it felt right. especially when it could have added any of the other 20-30 words

[-]

ThinkExtension2328@reddit

Don’t over think this , ollama can be buggy. If you have too many issues use LLM studio.

[-]

chibop1@reddit

How are you hitting the API? Are you using Ollama/OpenAI library?

Unless you're accidentaly including old context, it can't remember anything. I assure you, it's not possible.

[-]

DimensionEnergy@reddit (OP)

simple fetch call

OLLAMA_ENDPOINT = "http://localhost:11434/api/generate"
def query_ollama(prompt):
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "system": SYSTEM_PROMPT,
        "stream": False
    }
    response = requests.post(OLLAMA_ENDPOINT, json=payload)
    if response.status_code == 200:
        print(response.json()["response"])
        return response.json()["response"]
    else:
        raise Exception(f"Ollama query failed with status {response.status_code}: {response.text}")

[-]

DimensionEnergy@reddit (OP)

i can assure you that the client isn't storing any history

[-]

chibop1@reddit

Type /clear