Any advice or suggestions?

Posted by PeakTurbulent5545@reddit | LocalLLaMA | 5 comments

I’m a bioinformatician tasked with building a pipeline to automatically find, catalog, and describe UMAP plots from large sets of scientific PDFs (mostly single-cell RNA-seq papers). I’ve never used AI for this kind of task, so right now I don’t really know what I’m doing. I don’t know why my boss wants this, and I don’t think it’s a good idea, but maybe I’m wrong.

What I've tried so far:

Are there any ready-to-use models or specific Hugging Face checkpoints that are already "expert" in scientific document layout or biological figure classification?

I’m looking for something that might have been trained on datasets like PubLayNet or PMC-Reports and can handle the visual nuances of bioinformatics plots. Is there a better alternative to the Qwen/YOLO combo for this specific niche, or is fine-tuning an absolute must here?