My First Official AI Research Paper Accepted on SSRN
Posted by assemsabryy@reddit | LocalLLaMA | 18 comments

Today, my research paper “Stable Training with Adaptive Momentum (STAM)” was officially accepted on SSRN — marking my first documented and official publication as an AI Researcher.
The paper introduces a new optimization algorithm for deep learning training that outperformed several popular optimizers on selected benchmarks, addressed multiple training stability challenges, and achieved up to a 50% reduction in computational training cost in some experiments.
This is an important milestone in my research journey, and I’m excited to continue exploring optimization techniques for efficient and stable AI training.
You can read the paper here:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6699059
LegacyRemaster@reddit
Congrats well done!
stonetriangles@reddit
You tested it on an extremely small model with a single GPU. How can you be sure it scales with model size and distributed training?
rookan@reddit
We? It is only you, man
veinamond@reddit
Gj. However, I need to point out that since SSRN is not peer-reviewed, this is not a full-fledged academic publication in the sense of acceptance meaning your work was selected for publication. Not even remotely at NIPS/AAAI/IJCAI level.
westsunset@reddit
Why do you need to point it out?
veinamond@reddit
Why not? I have worked in academia for half my life, so I know exactly what I am talking about. Even peer-reviewed papers sometimes, let us say, contain unverified results that are hard to reproduce and do not support the claims made. OP is making it sound like this is an accomplishment when, by the normal standards for master's/PhD students, it clearly is not.
KickLassChewGum@reddit
That, and SSRN is... a weird place to be pre-publishing something like this. It's giving off an "I couldn't get anyone to endorse me for arXiv" energy.
And also, from what I can glean from skimming, the paper claims a "50% reduction in computational training cost" from... a training pipeline (a) that never sees a single GPU, (b) on a vocab of 64, (c) with a sequence length of 32, (d) evaluated on very specific synthetic tasks rather than vetted/generic datasets, (e) with "results" that are well within noise margins.
No_Swimming6548@reddit
I have no idea what that means but happy for you OP 🤗
Few_Painter_5588@reddit
Congrats OP!
AvidCyclist250@reddit
Congrats!! Affiliation says independent. Did you manage to publish this entirely on your own, without formal training? That would give me some hope. I have a philosophical paper attempting a moral grounding that I haven't submitted for quite a while now, because I'm afraid they'll tell me to gtfo as a „layperson“ without direct ties to academia in that field.
nuclearbananana@reddit
Congrats OP
bonobomaster@reddit
Can you ELI5 me?
You'll probably need to dumb it down to caveman level, please.
hyperdynesystems@reddit
There's a momentum parameter (β1) that is typically set to a static value, but varying it up or down as the gradient changes can have benefits in different gradient regimes (high variance or near-stationarity).
Could be totally wrong because I haven't ever done any reading on training, but that's what the abstract seems to say.
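Something like this toy sketch is how I picture it (to be clear, the deviation-based β1 rule below is my own invention for illustration, not the actual update from the paper):

```python
import torch

def adaptive_beta1_step(param, grad, m, v, t, lr=1e-3,
                        beta1_min=0.5, beta1_max=0.95,
                        beta2=0.999, eps=1e-8):
    # Noise proxy: how far the fresh gradient deviates from the running
    # momentum. Large deviation -> noisy regime -> use less inertia.
    deviation = (grad - m).norm() / (grad.norm() + eps)
    beta1 = beta1_max - (beta1_max - beta1_min) * deviation.clamp(0.0, 1.0)

    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (Adam-style)
    m_hat = m / (1 - beta1 ** t)                # crude bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (v_hat.sqrt() + eps), m, v

# Toy run: minimize x^2 under noisy gradients.
x = torch.tensor([5.0])
m, v = torch.zeros_like(x), torch.zeros_like(x)
for t in range(1, 201):
    noisy_grad = 2 * x + 0.5 * torch.randn_like(x)
    x, m, v = adaptive_beta1_step(x, noisy_grad, m, v, t)
print(x)  # ends up near 0
```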
En-tro-py@reddit
I'm probably not the best source, but I think it's along the lines of: keep moving in the accumulated direction, but reduce inertia when the gradient signal starts jerking around.
So it converges faster in some cases and saves training time/cost.
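In code, that intuition could look something like this (a made-up damping rule just to illustrate the idea, not the paper's actual method):

```python
import torch

def damped_momentum_step(param, grad, m, lr=1e-2, beta=0.9, eps=1e-8):
    # Cosine similarity between momentum and the new gradient:
    # +1 = pulling the same way (keep inertia), -1 = reversal (drop it).
    cos = torch.dot(m.flatten(), grad.flatten()) / (m.norm() * grad.norm() + eps)
    damp = 0.5 * (1 + cos)              # map [-1, 1] onto [0, 1]
    m = damp * beta * m + (1 - beta) * grad
    return param - lr * m, m

x, m = torch.tensor([3.0]), torch.zeros(1)
for _ in range(300):
    x, m = damped_momentum_step(x, 2 * x, m)  # exact gradient of x^2
print(x)  # near 0
```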
llama-impersonator@reddit
it steps on the gas when it thinks it's safe to do so
nuclearbananana@reddit
You probably want to ask OP that; I haven't read past the abstract yet. u/assemsabryy
End0rphinJunkie@reddit
Keeping the exact same memory footprint as AdamW for the lite version is a huge win for local hardware. Definitely going to give the drop-in PyPI package a spin on my next fine-tune run.
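If it really is a drop-in, presumably only the optimizer constructor line changes; heads up that the `stam` module and `STAMLite` class names below are placeholders I made up, so check the actual package docs:

```python
import torch

model = torch.nn.Linear(128, 2)
data = torch.randn(32, 128)
labels = torch.randint(0, 2, (32,))

opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
# opt = stam.STAMLite(model.parameters(), lr=3e-4, weight_decay=0.01)  # hypothetical swap

for step in range(10):
    loss = torch.nn.functional.cross_entropy(model(data), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```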
Imn1che@reddit
Holy shit we got some insanely smart people here huh