How does a model like QwQ do calculations like 4792*2 „in its head“?
Posted by andWan@reddit | LocalLLaMA | 33 comments
Here: https://huggingface.co/Qwen/QwQ-32B-Preview I asked the model „What is 4792 * 3972?“ In the chain of thought I saw it start to break the problem down into 4 simpler multiplications, which makes sense. But then it was able to calculate „4792 × 2 = 9584“ without writing out any intermediate steps in the generated text. Were calculations like this simply in the training data? Or can this be achieved via the attention mechanism in the Transformer architecture? Are there studies that have investigated the numbers inside the attention mechanism as they are being updated?
I studied „Neural Systems and Computation“ but have not worked in this field for 14 years. Most of what I know about LLMs comes from the 3Blue1Brown video series.
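To make the decomposition concrete, here is a minimal Python sketch of one way to split the product into 4 simpler multiplications. It assumes the split goes by the decimal digits of the second factor, which would explain why „4792 × 2“ shows up as a step; the post does not quote the model's exact breakdown, and the function name `partial_products` is made up for illustration.

```python
def partial_products(a: int, b: int) -> list[tuple[int, int, int]]:
    """Split a * b into one product per nonzero decimal digit of b.

    Assumes b is a non-negative integer. Returns (a, place_value, product)
    tuples, largest place value first.
    """
    parts = []
    for power, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        if digit == 0:
            continue  # a zero digit contributes nothing to the sum
        place_value = digit * 10 ** power
        parts.append((a, place_value, a * place_value))
    return parts[::-1]


parts = partial_products(4792, 3972)
for a, place_value, product in parts:
    print(f"{a} x {place_value} = {product}")

total = sum(product for _, _, product in parts)
print("sum =", total)
```

The last partial product is the single-digit step 4792 × 2 = 9584 from the question, and the four products add up to 4792 × 3972 = 19033824. This only illustrates the arithmetic; it says nothing about how the model computes the single-digit step internally.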