Jinsong Zhou1,3*, Yihua Du1*, Xinli Xu1*†, Luozhou Wang1, Zijie Zhuang1, Yehang Zhang1, Shuaibo Li1, Xiaojun Hu3, Bolan Su3, Ying-Cong Chen1,2‡
1HKUST(GZ) 2HKUST 3ByteDance
*Equal Contribution †Project Lead ‡Corresponding Author
Official implementation of VideoMemory: Toward Consistent Video Generation via Memory Integration.
VideoMemory is a multi-agent video generation framework built on LangGraph that automatically transforms screenplay text into coherent video content. By constructing a Visual Memory Bank to maintain consistency of characters, scenes, and props, it enables a high-quality automated video production pipeline.
- [✅] Multi-Agent Collaboration: Three-stage pipeline architecture (Storyboard → Memory → Visualization)
- [✅] Visual Memory Bank: Automatically manages character, scene, and prop assets to ensure cross-shot visual consistency
- [✅] Structured Output: Strict output control based on Pydantic Schema
- [✅] Flexible Generation Backend: Supports Replicate (Nano-Banana) for image generation and Sora-2 for video generation
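To make the three-stage flow and the role of the Visual Memory Bank concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the class and function names (`Shot`, `MemoryBank`, `storyboard_stage`, `memory_stage`, `visualization_stage`), the `scene | characters | description` line format, and the asset id scheme are all assumptions made for illustration. Standard-library dataclasses stand in for the Pydantic schemas, and plain functions stand in for the LangGraph agents.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """One storyboard shot. Field names are illustrative, not the project's schema."""
    shot_id: int
    description: str
    characters: list[str] = field(default_factory=list)
    scene: str = ""


@dataclass
class MemoryBank:
    """Hypothetical store mapping (kind, name) to a reusable reference asset id."""
    assets: dict[tuple[str, str], str] = field(default_factory=dict)

    def get_or_create(self, kind: str, name: str) -> str:
        key = (kind, name.lower())
        if key not in self.assets:
            # A real system would generate and save a reference image here.
            self.assets[key] = f"{kind}_{len(self.assets):03d}"
        return self.assets[key]


def storyboard_stage(screenplay: str) -> list[Shot]:
    """Stand-in for the storyboard agent: one shot per non-empty line.

    Assumed line format: 'scene | characters | description'.
    """
    lines = [ln for ln in screenplay.splitlines() if ln.strip()]
    shots = []
    for i, line in enumerate(lines):
        scene, chars, desc = (part.strip() for part in line.split("|"))
        names = [c.strip() for c in chars.split(",") if c.strip()]
        shots.append(Shot(shot_id=i, description=desc, characters=names, scene=scene))
    return shots


def memory_stage(shots: list[Shot], bank: MemoryBank) -> list[dict]:
    """Resolve each shot's characters and scene to shared asset ids."""
    plans = []
    for shot in shots:
        refs = [bank.get_or_create("character", c) for c in shot.characters]
        if shot.scene:
            refs.append(bank.get_or_create("scene", shot.scene))
        plans.append({"shot_id": shot.shot_id, "prompt": shot.description, "refs": refs})
    return plans


def visualization_stage(plans: list[dict]) -> list[str]:
    """Stand-in for image/video generation: returns request descriptions only."""
    return [
        f"shot {p['shot_id']}: {p['prompt']} [refs: {', '.join(p['refs'])}]"
        for p in plans
    ]


if __name__ == "__main__":
    script = "Kitchen | Alice, Bob | Alice pours coffee\nGarden | Alice | Alice waters plants"
    bank = MemoryBank()
    for request in visualization_stage(memory_stage(storyboard_stage(script), bank)):
        print(request)
```

The point of the sketch is that a character appearing in several shots resolves to the same asset id, which is how a shared memory can keep appearance consistent across shots.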
We recommend Python >= 3.11 and the uv package manager.
# Clone the repository
git clone https://github.com/your-username/VideoMemory.git
cd VideoMemory
# Create virtual environment and install dependencies using uv
uv sync
source .venv/bin/activate
cp env.example .env
Edit the .env file with your API keys:
OPENAI_API_KEY=your_openai_api_key
# Generation API
REPLICATE_API_TOKEN=your_replicate_token
# LangSmith (Optional, for tracing)
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=VideoMemory
Place screenplay files in the scripts/ directory following standard screenplay format.
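As an optional pre-flight check before running, the keys in the .env file can be validated with a small script. The sketch below uses only the standard library. The helper names (`parse_env_file`, `missing_keys`) are hypothetical, and treating `OPENAI_API_KEY` and `REPLICATE_API_TOKEN` as required is an assumption based on the LangSmith keys being the only ones marked optional.

```python
# Assumption: these two keys are required; the LangSmith keys are optional.
REQUIRED_KEYS = ("OPENAI_API_KEY", "REPLICATE_API_TOKEN")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        missing = missing_keys(parse_env_file(fh.read()))
    if missing:
        raise SystemExit(f"Missing keys in .env: {', '.join(missing)}")
    print("All required keys are set.")
```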
source .venv/bin/activate
python main.py
If you find this project helpful in your research or applications, please cite it as follows:
@article{zhou2026videomemory,
title={VideoMemory: Toward Consistent Video Generation via Memory Integration},
author={Zhou, Jinsong and Du, Yihua and Xu, Xinli and Wang, Luozhou and Zhuang, Zijie and Zhang, Yehang and Li, Shuaibo and Hu, Xiaojun and Su, Bolan and Chen, Ying-Cong},
journal={arXiv preprint arXiv:2601.03655},
year={2026}
}
This project is licensed under the CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License).
The code is provided for academic research purposes only.
For any questions, please contact jzhou945@connect.hkust-gz.edu.cn

