Hyper-SD

by ByteDance
Hyper-SD is an efficient image synthesis framework that reduces the computational cost of multi-step diffusion model inference while maintaining high image quality.

What is Hyper-SD?

Hyper-SD is an efficient image synthesis framework developed by ByteDance researchers to address the high computational cost of multi-step inference in existing diffusion models. Its core technique, Trajectory Segmented Consistency Distillation (TSCD), performs consistency distillation within separate time-step segments, which preserves the original ODE (ordinary differential equation) trajectory. The framework also incorporates human feedback learning to improve performance in low-step inference and uses score distillation to further raise single-step image quality. Together, these components substantially reduce the number of inference steps required while maintaining high image quality, enabling rapid generation of high-resolution images.
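
The segmentation idea behind TSCD can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes T = 1000 discrete timesteps, uniform segments treated as half-open intervals (lower, upper], and that within a segment the student maps a point to the segment's lower edge. The progressive 8 → 4 → 2 → 1 schedule comes from the description of TSCD; the helper names segment_boundaries, segment_target, and progressive_schedule are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def segment_boundaries(T: int, k: int) -> List[int]:
    """Split the timestep range [0, T] into k uniform segments.

    Returns the k + 1 boundary timesteps. For simplicity this sketch
    requires T to be divisible by k.
    """
    if k <= 0 or T % k != 0:
        raise ValueError("this sketch assumes k > 0 and T divisible by k")
    step = T // k
    return [i * step for i in range(k + 1)]


def segment_target(t: int, T: int, k: int) -> int:
    """Return the start of the segment that contains timestep t.

    Segments are treated as half-open intervals (lower, upper], so a
    timestep lying exactly on a boundary belongs to the segment below it.
    Assumption: within a segment, the student maps a point at t to this
    lower boundary instead of all the way to t = 0.
    """
    step = T // k
    return max(0, (t - 1) // step) * step


def progressive_schedule(
    T: int, stages: Sequence[int] = (8, 4, 2, 1)
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (k, boundaries) for each stage, coarsening the segments each time."""
    for k in stages:
        yield k, segment_boundaries(T, k)


if __name__ == "__main__":
    for k, bounds in progressive_schedule(1000):
        print(f"k={k}: boundaries={bounds}")
    # With k = 4, a point at t = 600 is distilled toward t = 500;
    # with k = 1 the target is t = 0, i.e. ordinary consistency distillation.
    print(segment_target(600, 1000, 4), segment_target(600, 1000, 1))
```

With k = 1 the sketch degenerates to a single segment, which is why the final stage of the schedule matches a conventional consistency model that maps every point straight to the start of the trajectory.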

How Hyper-SD Works

  • Trajectory Segmented Consistency Distillation (TSCD): Divides the training time-step range [0, T] into k uniform segments and performs consistency distillation within each segment, with the original model as the teacher and the student model learning to reproduce the teacher's behavior. The number of segments is then progressively reduced (e.g., 8 → 4 → 2 → 1), so the student gradually approximates the teacher's global behavior across the full trajectory.

  • Human Feedback Learning (ReFL): Uses human preferences over images to optimize the model. A reward model trained on preference data scores how closely generated images match human aesthetic judgments. The student model is then fine-tuned by denoising to an intermediate step, predicting the clean image directly from it, and applying the reward model's feedback to that prediction.

  • Score Distillation: Guides single-step inference using the score functions of the real and fake distributions. The student model's single-step generation is optimized by minimizing the KL divergence between these two distributions.

  • Low-Rank Adaptation (LoRA): Trains the student model as a LoRA adapter, so it ships as a lightweight plugin that can be quickly deployed on top of a base model.

  • Training and Loss Function Optimization: Defines an objective that combines the consistency loss, the human feedback loss, and the score distillation loss, and trains the student model and its LoRA plugin with gradient-based optimizers.

  • Inference and Image Generation: After training, the student model is used for image generation, with the number of inference steps chosen per application to balance quality and speed.

  • Performance Evaluation: Evaluates generated images with quantitative metrics (e.g., CLIP score, aesthetic score) and qualitative methods (e.g., user studies), and uses the results to further tune the model's parameters.
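
The LoRA plugin idea can be illustrated with the standard low-rank update W' = W + (α/r)·B·A, where only the small matrices A and B are trained and the base weight W stays frozen. This plain-Python sketch uses the α/r scaling convention from the original LoRA paper; the rank, scaling, and choice of adapted layers in Hyper-SD are not stated here, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def lora_merge(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold a LoRA adapter into a frozen weight: W + (alpha / rank) * (B @ A).

    Shapes: W is d_out x d_in, B is d_out x rank, A is rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Number of trainable adapter parameters: rank * (d_out + d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
    b = [[1.0], [0.0]]            # d_out x r, with r = 1
    a = [[0.0, 2.0]]              # r x d_in
    print(lora_merge(w, a, b, alpha=1.0, rank=1))  # [[1.0, 2.0], [0.0, 1.0]]
    # A 1280 x 1280 layer with rank 8 trains 20,480 numbers instead of 1,638,400.
    print(lora_param_count(1280, 1280, 8))
```

The parameter count is what makes the adapter "lightweight": the plugin stores only A and B, and merging them into W at load time adds no inference cost.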
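
The score-distillation gradient can be checked on a toy problem where both score functions are known in closed form. In this sketch (an illustration in the spirit of distribution matching, not Hyper-SD's training code), the "real" distribution is N(μ, 1) and the one-step generator is G_θ(z) = θ + z with z ~ N(0, 1), so its output distribution is N(θ, 1). The gradient of KL(p_fake || p_real) with respect to θ is the expectation of (s_fake(x) − s_real(x))·∂x/∂θ; here that equals θ − μ, so gradient descent pulls θ to μ. In real training the fake score is estimated by a separately trained diffusion model rather than a formula, and all names below are hypothetical.

```python
import random


def real_score(x: float, mu: float) -> float:
    """Score d/dx log p_real(x) for p_real = N(mu, 1)."""
    return -(x - mu)


def fake_score(x: float, theta: float) -> float:
    """Score of the generator's output distribution N(theta, 1).

    Exact here; in practice it is estimated by a diffusion model that is
    trained on the generator's own samples.
    """
    return -(x - theta)


def distill(theta: float, mu: float, lr: float = 0.1, steps: int = 200,
            batch: int = 64, seed: int = 0) -> float:
    """Minimise KL(p_fake || p_real) over theta using score-difference gradients."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = theta + z        # one-step generation G_theta(z)
            dx_dtheta = 1.0      # derivative of G_theta(z) with respect to theta
            grad += (fake_score(x, theta) - real_score(x, mu)) * dx_dtheta
        theta -= lr * grad / batch
    return theta


if __name__ == "__main__":
    print(round(distill(theta=0.0, mu=3.0), 4))  # 3.0
```

Because KL(N(θ, 1) || N(μ, 1)) = (θ − μ)²/2, the score-difference estimate matches the analytic gradient exactly in this toy case, which makes it a convenient sanity check before swapping in learned score networks.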
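
The "direct prediction" step in ReFL relies on the standard DDPM relation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, which lets a noise-predicting model estimate the clean image from a partially denoised sample in one step so that a reward model can score it. The sketch below assumes that ε-prediction parameterisation, uses a made-up toy_reward in place of a real preference model, turns the reward into a loss simply by negating it, and omits the backpropagation step; Hyper-SD's actual reward models and reward-to-loss mapping may differ.

```python
import math
from typing import Callable, List

Vector = List[float]


def predict_x0(x_t: Vector, eps_pred: Vector, alpha_bar_t: float) -> Vector:
    """Invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps using the predicted noise."""
    sqrt_a = math.sqrt(alpha_bar_t)
    sqrt_1ma = math.sqrt(1.0 - alpha_bar_t)
    return [(xt - sqrt_1ma * e) / sqrt_a for xt, e in zip(x_t, eps_pred)]


def toy_reward(x0: Vector) -> float:
    """Hypothetical stand-in for a preference model: prefers values near 1.0."""
    return -sum((v - 1.0) ** 2 for v in x0)


def refl_loss(x_t: Vector, eps_pred: Vector, alpha_bar_t: float,
              reward_fn: Callable[[Vector], float] = toy_reward) -> float:
    """Reward-feedback loss on the one-step x0 estimate (here simply -reward)."""
    return -reward_fn(predict_x0(x_t, eps_pred, alpha_bar_t))


if __name__ == "__main__":
    x0 = [1.0, 1.0]
    eps = [0.5, -0.5]
    a = 0.64
    x_t = [math.sqrt(a) * v + math.sqrt(1 - a) * e for v, e in zip(x0, eps)]
    # With a perfect noise prediction the clean sample is recovered
    # (up to rounding), so the toy loss is approximately zero.
    print(predict_x0(x_t, eps, a), refl_loss(x_t, eps, a))
```

In actual fine-tuning, the gradient of this loss would flow back through eps_pred into the student's parameters; scoring a one-step x_0 estimate avoids running the full sampling chain for every reward evaluation.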
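
The CLIP score mentioned above is, in its common form, a scaled and clipped cosine similarity between an image embedding and a text embedding. This sketch assumes the embeddings were already produced by a CLIP model (the vectors below are made up) and uses a scale of 100, as in the torchmetrics CLIPScore definition; the original CLIPScore paper uses a weight of 2.5. The function name is hypothetical.

```python
import math
from typing import Sequence


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float],
               scale: float = 100.0) -> float:
    """Return scale * max(cos(image_emb, text_emb), 0)."""
    dot = sum(i * t for i, t in zip(image_emb, text_emb))
    norm = math.sqrt(sum(i * i for i in image_emb)) * math.sqrt(sum(t * t for t in text_emb))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return scale * max(dot / norm, 0.0)


if __name__ == "__main__":
    # Toy 3-d embeddings; real CLIP embeddings have hundreds of dimensions.
    print(clip_score([0.2, 0.9, 0.1], [0.3, 0.8, 0.0]))
    print(clip_score([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]))  # 0.0: negative similarity is clipped
```

Averaging this score over a set of prompt/image pairs gives a single prompt-alignment number that can be compared across step counts, alongside aesthetic scores and user studies.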

Framework Features

Supported Tasks
Image Generation, High-Resolution Image Synthesis, Low-Step Inference Optimization
Tags
Image Synthesis, Diffusion Models, Machine Learning, AI Research, Generative AI, High-Resolution Images, Computational Efficiency, Human Feedback Learning, Score Distillation, Low-Rank Adaptation

Similar Frameworks

TPO
Phantom by ByteDance
AgentSociety by Tsinghua University