xAR

by ByteDance, Johns Hopkins University
xAR is a novel autoregressive visual generation framework that enhances image generation quality and speed using Next-X Prediction and Noisy Context Learning techniques.

What is xAR?

xAR is a novel autoregressive visual generation framework developed by ByteDance and Johns Hopkins University. It enhances image generation quality and speed using innovative techniques like Next-X Prediction and Noisy Context Learning.

Main Features of xAR

  • Next-X Prediction: Extends traditional "next token prediction" so that the prediction target X can be an individual patch token, a cell (a group of neighboring patches), a subsample (a non-local group of distant patches), a scale (a coarse-to-fine resolution), or an entire image. This lets each step capture richer semantic information.
  • Noisy Context Learning: Trains the model on noisy rather than ground-truth context. This makes it robust to its own prediction errors and mitigates the accumulation of errors across autoregressive steps.
  • High-Performance Generation: Outperforms diffusion models such as DiT in both inference speed and generation quality on ImageNet.
  • Flexible Prediction Units: Supports various prediction unit designs, making it suitable for different visual generation tasks.
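The sketch below illustrates prediction units by partitioning an h × w grid of patch tokens into units at several of the granularities listed above. It is a minimal illustration, not xAR's implementation. The helper `group_units`, its `mode` names, and the exact subsample rule are all assumptions; here a subsample is the set of patches that share the same row and column offset modulo k. Scales are omitted because they involve resampling the image rather than partitioning one grid.

```python
from typing import List, Tuple

Index = Tuple[int, int]  # (row, col) position of a patch token on the grid


def group_units(h: int, w: int, mode: str, k: int = 2) -> List[List[Index]]:
    """Partition an h x w token grid into prediction units (hypothetical helper).

    mode:
      "token"     - every patch is its own unit (classic next-token prediction)
      "cell"      - k x k blocks of neighboring patches
      "subsample" - k*k non-local groups of patches sharing (row % k, col % k)
                    (assumed grouping rule, for illustration only)
      "image"     - the whole grid is a single unit
    """
    if mode == "token":
        return [[(r, c)] for r in range(h) for c in range(w)]
    if mode == "image":
        return [[(r, c) for r in range(h) for c in range(w)]]
    if h % k or w % k:
        raise ValueError("grid size must be divisible by k")
    if mode == "cell":
        # One unit per k x k block, blocks visited in row-major order.
        return [
            [(r, c) for r in range(br, br + k) for c in range(bc, bc + k)]
            for br in range(0, h, k)
            for bc in range(0, w, k)
        ]
    if mode == "subsample":
        # One unit per (row offset, col offset); members are k apart.
        return [
            [(r, c) for r in range(dr, h, k) for c in range(dc, w, k)]
            for dr in range(k)
            for dc in range(k)
        ]
    raise ValueError(f"unknown mode: {mode}")
```

On a 4 × 4 grid with k = 2, `"cell"` yields four units of neighboring patches such as (0, 0), (0, 1), (1, 0), (1, 1). By contrast, `"subsample"` yields four units of spread-out patches such as (0, 0), (0, 2), (2, 0), (2, 2). Fewer, larger units mean fewer autoregressive steps per image.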

Technical Principles of xAR

  • Flow Matching: xAR reformulates discrete token classification as continuous entity regression. Noisy inputs are built by interpolating between Gaussian noise and the target entity. At each autoregressive step, the model predicts the velocity that transports the noise distribution toward the target distribution.
  • Inference Strategy: xAR generates an image autoregressively, one unit at a time. Each new unit starts from Gaussian noise and is refined into a clean entity by following the predicted velocity, conditioned on the units generated so far. This repeats until the entire image is complete.
  • Experimental Results: xAR delivers substantial gains on the ImageNet-256 and ImageNet-512 benchmarks. On ImageNet-256, the xAR-B model (172M parameters) achieves an FID of 1.72 while running inference 20 times faster than DiT-XL (675M parameters).
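The following minimal sketch shows a flow-matching training target and an autoregressive sampling loop of the kind described above, written in plain Python over lists of floats. Several details are assumptions rather than xAR's published choices: the linear path x_t = (1 - t)·x0 + t·x1, uniformly sampled noise levels, Euler integration, and the way context units are corrupted to mimic Noisy Context Learning. `training_example`, `generate`, and the `model` callable are hypothetical names. In practice, `model` would be a neural network trained to regress the returned velocity.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
# Hypothetical model signature: (noisy unit x_t, time t, context units) -> velocity.
VelocityModel = Callable[[Vec, float, Sequence[Vec]], Vec]


def gaussian(dim: int, rng: random.Random) -> Vec:
    """Sample a standard Gaussian noise vector."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Point on the (assumed) linear path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def training_example(
    units: Sequence[Vec], n: int, rng: random.Random
) -> Tuple[List[Vec], Vec, float, Vec]:
    """Build one regression example for unit n, with units[:n] as context.

    Assumption: to mimic Noisy Context Learning, each context unit is replaced
    by a noisy interpolation at a random noise level instead of the clean
    ground truth.
    """
    context = [interpolate(gaussian(len(u), rng), u, rng.random()) for u in units[:n]]
    x1 = units[n]
    x0 = gaussian(len(x1), rng)
    t = rng.random()  # noise level in [0, 1)
    x_t = interpolate(x0, x1, t)
    v_target = [b - a for a, b in zip(x0, x1)]  # constant velocity of the linear path
    # Training would minimise the squared error between model(x_t, t, context) and v_target.
    return context, x_t, t, v_target


def generate(
    model: VelocityModel, num_units: int, dim: int, steps: int, rng: random.Random
) -> List[Vec]:
    """Generate units one by one. Each unit starts from Gaussian noise and is
    integrated from t=0 toward t=1 with Euler steps, conditioned on earlier units."""
    generated: List[Vec] = []
    dt = 1.0 / steps
    for _ in range(num_units):
        x = gaussian(dim, rng)
        for i in range(steps):
            v = model(x, i * dt, generated)
            x = [xi + dt * vi for xi, vi in zip(x, v)]
        generated.append(x)
    return generated
```

The loop can be checked without a trained network. An oracle `model` that returns (x1 - x) / (1 - t) for a known target x1 drives each unit along a straight line from its noise sample to that target in `steps` Euler steps.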

Application Scenarios of xAR

  • Art Creation: Generate creative images for inspiration or direct use in artworks.
  • Virtual Scene Generation: Quickly generate realistic virtual scenes for game development and virtual reality.
  • Old Photo Restoration: Restore damaged parts of old photos, recovering original details and colors.
  • Video Content Generation: Generate specific scenes or objects in videos for video effects production and editing.
  • Data Augmentation: Expand training datasets by generating diverse images, improving model generalization and robustness.

Pricing
free

Similar Frameworks

  • TPO
  • Phantom by ByteDance
  • AgentSociety by Tsinghua University