LongPage: 300 full novels with reasoning traces for training better writing LLMs

Posted by Senior_Evidence_3793 | LocalLLaMA

Current LLMs struggle with long-form creative writing because they lack hierarchical planning. LongPage addresses this by supplying the reasoning scaffolds that have been missing from training data.

What it is:

- 300 complete novels, each paired with the hierarchical reasoning trace used to plan it
- Planning scaffolds spanning scene, chapter, and book levels

Why it matters: This is "Chain of Thought for creative writing": explicit reasoning traces that show models how to plan character development and plot progression, and how to maintain thematic coherence across an entire book.

Training applications:

- Supervised fine-tuning on plan-then-write pairs, teaching models to emit a reasoning trace before the prose (see the sketch after this list)
- Hierarchical planning objectives that condition chapter and scene generation on book-level outlines

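As a rough illustration of the first item, here is a minimal sketch of turning one record into a plan-then-write SFT example. The field names `reasoning_trace` and `book_text` are assumptions for illustration, not the dataset's actual schema; check the HF page before using them.

```python
# Hypothetical sketch: format one LongPage record as a plan-then-write
# SFT example. Field names are assumptions; verify against the real schema.
def to_sft_example(record: dict) -> dict:
    prompt = "Write a novel based on the following premise.\n"
    # Put the hierarchical plan in a <think> block so the model learns to
    # reason about structure before drafting prose, Chain-of-Thought style.
    completion = (
        "<think>\n" + record["reasoning_trace"] + "\n</think>\n"
        + record["book_text"]
    )
    return {"prompt": prompt, "completion": completion}
```
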
Currently 300 books, scaling to 100K. All reasoning generated by Qwen3-32B with iterative agent validation across scene → chapter → book levels.
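If you want to poke at it before committing to a training run, streaming lets you inspect the schema without downloading all 300 books. Only the repo id below comes from the post; the `train` split name is an assumption.

```python
from datasets import load_dataset

# Stream the dataset so we don't pull every book to disk.
ds = load_dataset("Pageshift-Entertainment/LongPage", split="train", streaming=True)
first = next(iter(ds))
print(first.keys())  # inspect the actual column names before training
```
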

HF Link: https://huggingface.co/datasets/Pageshift-Entertainment/LongPage

Anyone working on long-form generation? Would love to hear what training approaches you're planning to try with this.