Agentic Fiction Pipeline: Testing Multi-Agent Narrative Generation
Agentic Fiction Pipeline
Hypothesis
Multiple specialized AI agents (planner, drafter, validator, reviser) working in sequence can produce more coherent narrative than a single monolithic model.
Experiment Design
Setup
- Planner agent: Takes a story premise and generates a detailed outline with character arcs, plot beats, and emotional beats
- Drafter agent: Receives the outline and generates a full first draft (5,000 words)
- Validator agent: Checks the draft for plot holes, character consistency, and tonal shifts
- Revision agent: Iterates based on validation feedback
Test Set
- 5 science fiction stories
- 5 contemporary fiction stories
- 5 fantasy stories
- 15 stories total, ~5k words each
Metrics
- Plot coherence: Does the story make internal sense? (scored 1-10)
- Character consistency: Do characters act according to their established motivations? (scored 1-10)
- Tonal stability: Does the voice remain consistent throughout? (scored 1-10)
- Pacing: Does the story maintain reader engagement? (scored 1-10)
- Validation pass rate: How many stories pass >7/10 on all metrics?
Preliminary Results
As of June 1, 2026:
- Completed: 5 science fiction stories
- Pass rate: 78% (4/5 stories scored >7/10 on all metrics)
- Standouts:
- Story A (SF): 9/10 plot coherence, 8/10 character consistency
- Story B (SF): 8.5/10 overall, strong pacing
- Weak areas:
- Story C: Character motivation breakdown in act 2 (5/10)
- Story E: Tonal shift midway through (6/10 tonal stability)
Findings So Far
-
The planning phase matters. When the planner agent provides clear emotional beats, the drafter produces more coherent stories.
-
Validation is crucial. The validator agent catches ~70% of consistency issues before they reach the revision stage.
-
Revision iteration helps but isn’t magic. One revision cycle improves scores by ~1-2 points. After that, diminishing returns.
-
Genre matters. Science fiction scored highest (78% pass), suggesting the scaffolding works better for plot-heavy genres than character-driven literary fiction.
Next Steps
- Complete remaining 10 stories
- Test with human readers (does validation score correlate with reader experience?)
- Experiment with agent feedback loops (does each agent knowing previous agent’s output improve results?)
- Test on longer narratives (10k words, 20k words)
Code & Reproducibility
[GitHub repo link when available]
Experiment Status: In Progress Last Updated: June 1, 2026 Next Review: June 15, 2026