Using Llama to analyze scientific texts - I am failing
Posted by RollLikeRick@reddit | LocalLLaMA | 7 comments
So I am trying to analyze 15 scientific texts at once using a local LLM.
I wrote this post two days ago: https://www.reddit.com/r/LocalLLaMA/comments/1gu77hn/i_just_tried_llama70binstructggufiq2_xs_and_am/
Where people basically told me not to use a Q2 but at least a Q4 model. I tried Q4_K_S, Q4_K_M, Q4_K_L, and Q5_K_S, but the best I got was this: the model checked two sources and gave only rudimentary info.
Is using the Llama 3.1 model doomed from the start, or should I use a different one?
Or is the task just too big to be run locally on a consumer machine?
If I really wanted to use AI, should I analyze each paper individually?
HW: 2x RTX 4090, 4x 32GB RAM, Ryzen Threadripper (24 cores @ 4.2 GHz)
ShengrenR@reddit
I think what a lot of folks are forgetting to answer, and you may not realize yourself: you are not actually analyzing scientific texts with this sort of approach. I'm not an Open WebUI user myself, but it's concerning that you describe "uploading" the texts.
This is what I assume is happening: you 'add' your documents to the UI, but they aren't given whole to your LLM. They're split into a bunch of small chunks and indexed with an embedding model. Then, when you ask your question, your input is processed the same way and compared against all those chunks, and the top N (maybe 5?) chunks get added to the prompt when the UI runs the actual LLM request. You get what you get. This is generic, vanilla RAG; it has its uses, but you need to read up on it so you know what's happening and why it's not answering your question well.
If this is what's happening, you have three problems. One, the LLM is given no context about where each chunk came from, and it may be handed chunks from different papers that assume different things. Two, for that particular request, the LLM knows zero about the rest of the content you've added. Three, your lookup (embedding model) is garbage for this: it's trained on general text and knows concepts about general things, but research papers typically use highly specific language, or use common words in different ways, so the way the model sees similarity doesn't actually map all that well into niche research-paper land.
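To make that flow concrete, here is a minimal sketch of vanilla RAG as described above. The chunk size, embedding model name, and top-N value are illustrative assumptions, not what Open WebUI actually uses internally:

```python
# Minimal sketch of the vanilla RAG pipeline described above.
# Assumptions (not Open WebUI's internals): 500-char chunks,
# all-MiniLM-L6-v2 as the embedding model, top N = 5.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text, size=500):
    return [text[i:i + size] for i in range(0, len(text), size)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. "Uploading" = split every paper into small chunks and embed them all.
papers = {"paper_a.txt": "...", "paper_b.txt": "..."}  # 15 papers in practice
chunks = [c for text in papers.values() for c in chunk(text)]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

# 2. At question time, embed the question and take the top-N nearest chunks.
question = "What do these papers conclude about X?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
top_n = np.argsort(chunk_vecs @ q_vec)[-5:]

# 3. Only these ~5 chunks reach the LLM; it never sees the rest of the papers.
prompt = "Answer using this context:\n" + "\n---\n".join(chunks[i] for i in top_n)
```

Note how the LLM only ever sees step 3: a handful of anonymous snippets, stripped of which paper they came from.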
Comparing to "ChatGPT" isn't reasonable here, because ChatGPT isn't a model, it's a platform, and it does many things behind the scenes to compensate for the bare model-plus-RAG situation. Maybe Open WebUI has similar tools, I don't know, but you'd need to learn them. Just attaching a bunch of PDFs and asking questions without understanding what's under the hood is like driving a manual in first gear and wondering why your car can't go 60.
RollLikeRick@reddit (OP)
Hey, thanks for your extensive answer, really appreciate it! I think I'll stick to NotebookLM and/or ChatGPT for now.
manobutter@reddit
I suspect this is a memory issue. If you can provide the actual length of the scientific texts, it would help narrow it down.
Llama 3.1 70B Q5 is probably filling up almost all of your VRAM on its own, and throwing 15 articles on top is definitely pushing it. In my experience, the fact that it keeps referencing only 2 papers is a sign the context length is set too small and is artificially limiting what fits in memory. For 15 separate papers the context is going to need to be very large even using RAG, and you are simply not going to be able to analyse them with 48GB and a 70B model.
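Rough back-of-envelope numbers behind that claim (a sketch with approximate constants for Q5_K quants and the Llama 3 70B architecture, not exact sizes for any particular GGUF file):

```python
# Back-of-envelope VRAM estimate; constants are approximations.
params = 70e9                 # Llama 3.1 70B
bits_per_weight = 5.5         # Q5_K quants average roughly 5-6 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9      # ~48 GB

# KV cache (fp16): 2 (K and V) * layers * kv_heads * head_dim * 2 bytes/token
layers, kv_heads, head_dim = 80, 8, 128   # Llama 3 70B uses GQA
kv_per_token = 2 * layers * kv_heads * head_dim * 2  # ~320 KB per token
kv_gb = lambda ctx: ctx * kv_per_token / 1e9

print(f"weights ~{weights_gb:.0f} GB, KV at 32k ctx ~{kv_gb(32768):.1f} GB")
# -> the weights alone roughly fill 2x 24 GB before any context is allocated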
To see if it is an issue with context length, try using Llama 3.1 8B Q4 as a test: increase the context length to the max (I think 131k) and test it on an increasing number of papers until it fails to actually cover all of them. Start with 3 and go from there. Fair warning: I've found the accuracy of models tanks once you get into really large context windows.
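A sketch of that test using llama-cpp-python; the file paths and the prompt are hypothetical, and n_ctx=131072 assumes the 8B's full window:

```python
# Sketch of the incremental context test with llama-cpp-python.
# Paths and prompt are hypothetical placeholders.
from pathlib import Path
from llama_cpp import Llama

llm = Llama(model_path="llama-3.1-8b-instruct-Q4_K_M.gguf",
            n_ctx=131072, n_gpu_layers=-1)

papers = sorted(Path("papers_txt").glob("*.txt"))  # pre-extracted plain text

for n in range(3, len(papers) + 1):  # start with 3 papers, add one each round
    ctx = "\n\n".join(f"[Paper {i+1}: {p.name}]\n{p.read_text()}"
                      for i, p in enumerate(papers[:n]))
    out = llm.create_chat_completion(messages=[
        {"role": "user",
         "content": f"{ctx}\n\nList every paper above and its main finding."}
    ], max_tokens=1024)
    print(n, out["choices"][0]["message"]["content"][:200])
    # Stop when the answer no longer covers all n papers.
```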
Is there any particular reason you need to be analysing all 15 papers at once?
If all else fails, I would recommend using NotebookLM as in my experience it has given me the best results for something like this.
ShengrenR@reddit
Do you use Open WebUI like they do? I don't, but it sounds like it's likely doing RAG, not the full-form text inclusion the papers would need. They need the full text for this to work at all; RAG really won't do it unless they customize a bunch of parts.
Apart_Boat9666@reddit
Also, I would recommend two passes to get the result. In the first pass, ask it to make a detailed summary of each research paper individually. Then in the second pass, send the processed summaries to get the final result. This might work for research papers.
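A sketch of that two-pass (map-reduce style) approach; `ask_llm` is a hypothetical stand-in for whatever local inference call you use:

```python
# Two-pass sketch: summarize each paper alone, then analyze the summaries.
# ask_llm() is a hypothetical wrapper around your local model.
def ask_llm(prompt: str) -> str:
    ...  # e.g. a llama-cpp-python or OpenAI-compatible API call

def analyze(papers: dict[str, str], question: str) -> str:
    # Pass 1: one focused summary per paper, so each fits in context alone.
    summaries = {
        name: ask_llm(f"Summarize the key methods and findings:\n\n{text}")
        for name, text in papers.items()
    }
    # Pass 2: the combined summaries are far shorter than 15 full papers.
    joined = "\n\n".join(f"[{name}]\n{s}" for name, s in summaries.items())
    return ask_llm(f"{joined}\n\nBased on these summaries: {question}")
```

The payoff is that no single request ever has to hold all 15 full papers in context at once.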
Apart_Boat9666@reddit
What about the Nemotron model or Athene V2?
sehu@reddit
I am using LM Studio v0.2.29 (Llama 3.1 Q4_K_M, n_ctx 6144, with GIST embedding v0) combined with AnythingLLM v1.6.6 with over 100 PDFs in RAG, and I'm getting decent results, sometimes with citations from 6 PDFs. 2x RTX 3090, 64GB RAM. Maybe it is not the best one can do, but it is enough for me right now.