Self-hosted LLM for scientific papers
Posted by Lost_Albatross_5673@reddit | LocalLLaMA | 6 comments
Hi everyone,
I am new to self-hosted LLMs, but so far it's been an exciting journey. My main use case is understanding and extracting the key concepts from scientific papers. So far I've worked mostly with ChatGPT's 4o model. I have a specific prompt that produces a summary of the main arguments, research design, supporting data, and data analysis. It works really well with 4o, but when I give the same prompt to a self-hosted Gemma/Llama 3.1, I end up with a very high-level set of bullet points.
Further exploratory questions are either met with high-level answers or with statements that the model cannot access the document.
I haven't trained the model, but I assumed it came pre-trained on a large dataset. Any advice on what I should do to improve the model's performance? I am running the model on my MacBook using AnythingLLM. I tried Docker and can switch easily, but I am guessing the issue is that I haven't trained the model yet?
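One quick way to check whether the paper is actually reaching the model is to bypass AnythingLLM's retrieval step and paste the full text straight into the prompt. A minimal sketch with the ollama Python client (assuming Ollama is serving the model locally; paper.txt is a hypothetical plain-text export of the paper):

```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# Hypothetical plain-text export of the paper
paper = open("paper.txt").read()

response = ollama.chat(
    model="llama3.1:8b",
    messages=[{
        "role": "user",
        "content": f"Here is a scientific paper:\n\n{paper}\n\n"
                   "Summarise the key concepts and takeaways in this article.",
    }],
    # Raise the context window so the paper is not silently truncated;
    # the default is often only 2048 tokens, far too small for a full paper.
    options={"num_ctx": 16384},
)
print(response["message"]["content"])
```

If this yields a substantive answer while AnythingLLM does not, the problem is the retrieval/chunking pipeline rather than the model itself.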
Super_Spot3712@reddit
gemma:9B and llama3.1:8B can't compete with GPT-4o, but llama3:70B might deliver good results. For that you'd need, for example, a machine with two RTX 3090 cards... Or you could first test your prompts against different models on RunPod, for instance.
Can you say more precisely what your GPT-4o prompt looks like? I assume the publications don't differ greatly in structure, and that's why the prompt generally works well?
Lost_Albatross_5673@reddit (OP)
Hey, thank you for your reply! I am running the 9B and 8B versions as you pointed out - I thought of setting up 70B, but some people said there wasn't a big difference, so I stuck with the lightest models 😅 Two RTX 3090s are a bit out of my budget at the moment 😅 I'll look into RunPod - it's the first time I've heard of it 😊
My exact prompt is this: “Could you summarise the key concepts and takeaways in this article? Can you also provide answers to the following questions: what is the core argument of the text? What are the secondary arguments of the text? What is the text arguing against? What evidence is used to support the arguments presented? How does that argument contribute to current discourse on the subject area?”
With the self-hosted models, I feed each question as a separate prompt, which seems to yield better results. Some common challenges I've had are around reasoning and how in-depth the answers are. I started looking into how people train these models, but is that really the problem in my case?
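For reference, the split-questions approach can be scripted so each question is asked against the full paper in a fresh prompt - a sketch in the same vein as above (again assuming Ollama as the backend; the model tag and file path are placeholders):

```python
import ollama

paper = open("paper.txt").read()  # hypothetical path, as before

QUESTIONS = [
    "What is the core argument of the text?",
    "What are the secondary arguments of the text?",
    "What is the text arguing against?",
    "What evidence is used to support the arguments presented?",
    "How does that argument contribute to current discourse on the subject area?",
]

for q in QUESTIONS:
    resp = ollama.chat(
        model="llama3.1:8b",  # placeholder tag
        messages=[{"role": "user",
                   "content": f"Here is a scientific paper:\n\n{paper}\n\n{q}"}],
        options={"num_ctx": 16384},
    )
    print(f"### {q}\n{resp['message']['content']}\n")
```

Asking one question per call keeps each answer focused and stops the model from compressing five answers into a single bullet list.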
hedonihilistic@reddit
Looking into how people train these models is not going to help you much unless you plan to start training models yourself, which is not what you really need to do at this point. What you need is to understand the strengths and weaknesses of different model families at different parameter sizes. In my experimentation with academic writing, anything below 70 billion parameters is not going to be intelligent enough to give you a good discussion of any reasonably complicated scientific writing.
Llama 3.1 70B, in my experience, gets quite flustered when given very long papers, and its output quality drops quite a bit. For self-hosted models, Qwen is the best I've used. It is great at understanding, as well as at writing based on the provided context, in academic work. Overall, for production-ready writing, no local model beats GPT-4 1106 or Claude Opus right now in terms of the depth and breadth of knowledge and nuanced discussion of the ideas and findings in academic papers.
Reach_the_man@reddit
which specific model and quantization was this?
hedonihilistic@reddit
It was one of the AWQ quantizations of the 72B model. Probably the official one from the Qwen team.
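For anyone wanting to try that setup, here is a minimal serving sketch. vLLM is one common way to run AWQ quants (an assumption here - the commenter didn't name a serving stack), and the checkpoint name below is a guess at the official Qwen release:

```python
# Sketch only: loading a Qwen 72B AWQ quant with vLLM.
# The checkpoint name is an assumption; substitute whichever official
# AWQ release from the Qwen team you are actually using.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-72B-Instruct-AWQ",
    quantization="awq",
    tensor_parallel_size=2,  # e.g. split across two 24 GB GPUs
)
params = SamplingParams(temperature=0.2, max_tokens=1024)
out = llm.generate(["Summarise the key arguments of the following paper: ..."], params)
print(out[0].outputs[0].text)
```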
Lost_Albatross_5673@reddit (OP)
Hey, thank you for your advice. I don't mind training them - it seems pretty interesting :) Short term, you are correct, I just need something that will work. I tried running a 70B model and it maxed out my memory and wasn't able to generate anything. My current machine simply doesn't have enough memory.
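That out-of-memory result is expected: even quantized, a 70B model's weights alone outstrip the unified memory of most MacBooks. A rough back-of-envelope in Python (the bits-per-weight figures are approximations for llama.cpp-style K-quants):

```python
# Rough memory estimate: weights plus ~20% headroom for KV cache and runtime.
def est_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_billion * bits_per_weight / 8 * overhead

print(f"70B @ q4_K_M (~4.8 bpw): {est_gb(70, 4.8):.0f} GB")  # ~50 GB
print(f"27B @ q3_K_M (~3.9 bpw): {est_gb(27, 3.9):.0f} GB")  # ~16 GB
print(f" 8B @ q4_K_M (~4.8 bpw): {est_gb(8, 4.8):.0f} GB")   # ~6 GB
```

That is roughly why a 27B quant can fit where a 70B cannot.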
Right now I am experimenting with gemma2:27b-text-q3_K_M. It's a lot better than the 8B models for my use case, but it struggles with reasoning, unfortunately. I'll give Qwen a try, thank you for your advice :)