Kimodo
NVIDIA
Kimodo is a kinematic motion diffusion model from NVIDIA's Spatial Intelligence Lab that generates high-quality 3D human and humanoid-robot motion from text prompts and kinematic constraints (keyframes, joint positions, waypoints, and paths). It is trained on 700 hours of optical motion-capture data and uses a two-stage transformer denoiser that predicts root motion and body motion separately. It supports the SOMA, Unitree G1, and SMPL-X skeletons.
Modality
text->motion
License
Apache 2.0
Open source
Yes