Harmonious Group Choreography with
Trajectory-Controllable Diffusion


Yuqin Dai¹    Wanlu Zhu¹    Ronghui Li²    Zeping Ren²    Xiangzheng Zhou¹    Xiu Li²    Jun Li¹†    Jian Yang¹†
† Corresponding author
¹ PCA Lab, Nanjing University of Science and Technology, China
² Shenzhen International Graduate School, Tsinghua University, China

Abstract

Generating group choreography from music, which aims to coordinate visually cohesive yet diverse group movements, has gained attention in cultural entertainment and virtual reality. Despite this growing interest, recent works struggle to produce aesthetically appealing choreography, primarily because of two issues: multi-dancer collisions and single-dancer foot sliding. To address these issues, we propose Trajectory-Controllable Diffusion (TCDiff), a novel approach that uses non-overlapping trajectories to produce coherent dance movements. Specifically, to tackle dancer collisions, we introduce a Dance-Beat Navigator that generates trajectories for multiple dancers from the music, complemented by a Distance-Consistency loss that keeps the spacing between trajectories within a reasonable threshold. To mitigate foot sliding, we present a Footwork Adaptor that uses the trajectory displacement between adjacent frames to enable flexible footwork, coupled with a Relative Forward-Kinematic loss that adjusts the positions of each dancer's root node and joints. Extensive experiments demonstrate that our method achieves state-of-the-art results.
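The abstract does not give the formula for the Distance-Consistency loss. As an illustration of the general idea only, the sketch below assumes a simple hinge penalty that grows when two dancers' root positions come closer than a threshold. The function name `distance_penalty`, the `(x, z)` ground-plane layout, and the `min_dist` value are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def distance_penalty(trajectories, min_dist=0.5):
    """Mean hinge penalty over all dancer pairs and frames.

    trajectories: one list per dancer, each a list of (x, z) root
    positions with one entry per frame (assumed layout).
    min_dist: hypothetical minimum spacing; not a value from the paper.
    """
    total, count = 0.0, 0
    for a, b in combinations(trajectories, 2):
        for (ax, az), (bx, bz) in zip(a, b):
            dist = math.hypot(ax - bx, az - bz)
            # Only pairs closer than min_dist contribute to the penalty.
            total += max(0.0, min_dist - dist)
            count += 1
    return total / count if count else 0.0
```

A penalty of zero means no pair of dancers came closer than `min_dist` in any frame; a trained model would presumably use a differentiable tensor version of a term like this.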

Generated Results

2-Dancers

3-Dancers

4-Dancers

5-Dancers


Controllability



Position Control Result



Our model uses user-provided dancer positions to perform position swaps, producing sensible, controllable results together with the corresponding footwork, a capability that existing models lack.

Framework


Our framework consists of two main components: the Dance-Beat Navigator (DBN) and Trajectory-Controllable Diffusion (TCDiff). To address dancer ambiguity, we first use DBN to model dancer positions, because dancers' coordinates differ distinctly and are less prone to confusion. TCDiff then uses these positions as the condition for diffusion to generate the corresponding dance movements. During this process, a fusion projection enhances group information before it is fed into the multi-dance transformer, while a footwork adaptor adjusts the final footwork.
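The abstract describes the Footwork Adaptor as using the trajectory displacement between adjacent frames. The sketch below shows only how such a displacement signal could be derived from a DBN-style trajectory. The function name `frame_displacements` and the `(x, z)` position layout are assumptions for illustration; how the adaptor consumes this signal is not specified here.

```python
def frame_displacements(trajectory):
    """Return the (dx, dz) displacement between each pair of adjacent frames.

    trajectory: list of (x, z) root positions, one per frame (assumed layout).
    The result has one entry fewer than the input.
    """
    return [
        (x1 - x0, z1 - z0)
        for (x0, z0), (x1, z1) in zip(trajectory, trajectory[1:])
    ]
```

A displacement near zero suggests the dancer stays in place for that frame, while a larger one suggests the footwork should carry the dancer across the floor.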

User Study


User study results on four criteria: motion realism, music-motion correlation, formation realism, and harmony of dancers. Users favored our model, indicating stronger aesthetic appeal.