Prompt Relay: Inference-Time Temporal Prompt Routing For Multi-Event Video Generation


S-Lab, Nanyang Technological University

TL;DR

Existing video generation models lack mechanisms for fine-grained temporal control in multi-event video generation. To address this, we propose Prompt Relay, an inference-time, training-free, plug-and-play method that gives granular control over the temporal placement of each text prompt.

Temporal cross-attention teaser

Method

Given a sequence of temporally constrained text prompts $\{(p_s, t_s^{\text{start}}, t_s^{\text{end}})\}_{s=1}^{N}$, our goal is to generate a video in which each prompt $p_s$ is realized within its designated temporal interval $[t_s^{\text{start}}, t_s^{\text{end}}]$. The generated video should preserve global coherence while ensuring that each prompt influences only its assigned temporal region.
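
To make the interval constraint concrete, the sketch below builds a frame-by-prompt boolean table that marks which prompt is allowed to influence which frame. It is an illustration only, not the Prompt Relay implementation: the function name build_temporal_mask, the use of inclusive integer frame indices for the interval endpoints, and the idea of a hard on/off mask are all assumptions made for this example. How such a table would enter the model's cross-attention is not specified here.

```python
from typing import List, Tuple


def build_temporal_mask(
    intervals: List[Tuple[int, int]], num_frames: int
) -> List[List[bool]]:
    """Return a num_frames x num_prompts table of booleans.

    mask[f][s] is True when frame f lies inside the closed interval
    [t_start, t_end] assigned to prompt s (hypothetical convention:
    endpoints are inclusive frame indices).
    """
    for start, end in intervals:
        if not (0 <= start <= end < num_frames):
            raise ValueError(f"interval ({start}, {end}) is out of range")
    return [
        [start <= frame <= end for (start, end) in intervals]
        for frame in range(num_frames)
    ]


# Two prompts on an 8-frame clip: the first covers frames 0-3,
# the second covers frames 4-7.
intervals = [(0, 3), (4, 7)]
mask = build_temporal_mask(intervals, num_frames=8)

# Frames visible to each prompt, read off the columns of the table.
frames_per_prompt = [
    [f for f in range(len(mask)) if mask[f][s]] for s in range(len(intervals))
]
```

With these intervals, frames_per_prompt is [[0, 1, 2, 3], [4, 5, 6, 7]], so no frame is shared between the two prompts. Overlapping intervals are also accepted by this sketch, in which case a frame would be marked True for more than one prompt.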

Video Gallery