Ablation

Single-Stage vs. Two-Stage (Ours)

Single-stage methods lose object texture and accumulate color errors over time.

Ablation on LoRA customization with multi-view images

Customization without multi-view images overfits to the frontal view.

Ablation on Extend Attention for consistent enhancement

Extend Attention enhances the consistency between keyframes.
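As a rough illustration of why sharing attention across keyframes can improve consistency, here is a minimal NumPy sketch. It assumes the common cross-frame formulation, in which each keyframe's queries attend to the keys and values of all keyframes. The page does not specify the exact mechanism, so the function name `extended_attention` and the tensor layout are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extended_attention(q, k, v):
    """Cross-frame attention sketch (assumed formulation, not the authors' code).

    q, k, v: arrays of shape (frames, tokens, dim).
    Each frame's queries attend to the keys/values of ALL frames.
    """
    f, n, d = q.shape
    # Pool keys and values from every keyframe into one shared set.
    k_all = k.reshape(1, f * n, d)
    v_all = v.reshape(1, f * n, d)
    # (F, N, D) @ (1, D, F*N) -> (F, N, F*N) via broadcasting.
    scores = q @ k_all.transpose(0, 2, 1) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # (F, N, F*N) @ (1, F*N, D) -> (F, N, D)
    return weights @ v_all
```

Because every keyframe reads from the same pooled keys and values, frames with similar queries receive similar outputs, which is one way to read the consistency gain described above.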

Ablation on 3D-guided video interpolation

Both depth control and rendered feature control are vital for 3D guidance in video generation.

Ablation on 3D reconstruction for background

Reconstructed background meshes enable meaningful interactions with background elements.

Reconstructed background meshes provide 3D guidance after camera movement.

Ablation on 3D-guided video generation vs. video refinement

Renoise (adding noise up to a certain noise level and then denoising again) fails to refine the black region. TokenFlow (a video-to-video translation method) suffers from flickering.
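The noising half of the Renoise baseline can be sketched as a standard forward-diffusion step. This is a minimal illustration that assumes a DDPM-style linear beta schedule. The schedule values and the `renoise` helper are hypothetical, and the denoising pass (which needs a pretrained diffusion model) is not shown.

```python
import numpy as np

# Assumed DDPM-style linear schedule; the page does not state the actual one.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def renoise(x0, t, alpha_bar, rng):
    """Noise a clean frame x0 up to step t:
    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps
    """
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t]
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

# Usage: noise a toy 4x4 "frame" to an intermediate step.
rng = np.random.default_rng(0)
frame = np.zeros((4, 4))  # stands in for a black region
noisy = renoise(frame, 400, alpha_bar, rng)
```

At an intermediate step the noised frame still carries a weighted copy of the input, so a subsequent denoising pass starts from content that partly preserves the original black region.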

More experiments on larger camera movement