xAR

by ByteDance, Johns Hopkins University

xAR is a novel autoregressive visual generation framework that enhances image generation quality and speed using Next-X Prediction and Noisy Context Learning techniques.

What is xAR?

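xAR is an autoregressive visual generation framework developed by ByteDance and Johns Hopkins University. It generalizes next-token prediction to Next-X Prediction, where the predicted unit X can be an image patch, a cell, a subsample, or an entire image, and it trains with Noisy Context Learning to reduce error accumulation. Together, these techniques aim to improve both image generation quality and speed.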

Main Features of xAR

Next-X Prediction: Extends traditional "next token prediction" to support the prediction of more complex entities like image patches, cells, subsamples, and entire images, capturing richer semantic information.
Noisy Context Learning: Injects noise into the context during training so that the model becomes robust to its own prediction errors, which reduces error accumulation across autoregressive steps.
High-Performance Generation: Reported to outperform DiT and other diffusion models on ImageNet in both inference speed and generation quality.
Flexible Prediction Units: Supports various prediction unit designs, making it suitable for different visual generation tasks.
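
To make the "cell" prediction unit and the noisy context more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `split_into_cells` and `noisy_context` are hypothetical, the grid of patch tokens is assumed to have shape (H, W, C), and the noisy context is assumed to be a linear interpolation between each clean cell and Gaussian noise.

```python
import numpy as np

def split_into_cells(tokens: np.ndarray, k: int) -> np.ndarray:
    """Group an (H, W, C) grid of patch tokens into k x k cells.

    Returns an array of shape (num_cells, k * k, C), cells in raster order.
    (Hypothetical helper, not part of any xAR release.)
    """
    h, w, c = tokens.shape
    if h % k or w % k:
        raise ValueError("grid height and width must be divisible by k")
    # Split rows and columns into (block index, offset inside block),
    # then bring the two block indices to the front.
    cells = tokens.reshape(h // k, k, w // k, k, c).transpose(0, 2, 1, 3, 4)
    return cells.reshape((h // k) * (w // k), k * k, c)

def noisy_context(cells: np.ndarray, rng: np.random.Generator, t=None):
    """Blend each cell with Gaussian noise (assumed linear interpolation).

    t = 1 keeps the clean cell and t = 0 is pure noise. If t is None,
    one time value is drawn uniformly from [0, 1) per cell.
    """
    n = cells.shape[0]
    if t is None:
        t = rng.uniform(size=(n, 1, 1))
    noise = rng.standard_normal(cells.shape)
    return t * cells + (1.0 - t) * noise, t

# Usage: a 4 x 4 grid of 1-channel tokens, grouped into 2 x 2 cells.
rng = np.random.default_rng(0)
grid = np.arange(16, dtype=float).reshape(4, 4, 1)
cells = split_into_cells(grid, k=2)
context, t = noisy_context(cells, rng)
```

Training on `context` rather than on the clean `cells` exposes the model to imperfect inputs, which resemble the imperfect units it will condition on at inference time.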

Technical Principles of xAR

Flow Matching: xAR recasts discrete token classification as continuous entity regression. During training, noisy inputs are built through interpolation and noise injection, and at each autoregressive step the model predicts the velocity, i.e. the flow direction from the noise distribution toward the target distribution.
Inference Strategy: xAR generates an image autoregressively. Each unit starts from Gaussian noise and is generated conditioned on the units already produced, one unit after another, until the entire image is complete.
Experimental Results: xAR reports significant improvements on the ImageNet-256 and ImageNet-512 benchmarks. The xAR-B model is reported to be 20 times faster at inference than DiT-XL while achieving an FID of 1.72.
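
The flow matching target and the inference loop described above can be sketched as follows. This is a toy illustration under assumptions, not xAR's actual code: the linear interpolation path, the Euler integrator, and the names `flow_matching_pair`, `generate`, and `velocity_fn` are all hypothetical choices made for the example, and a closed-form "oracle" velocity stands in for a trained network.

```python
import numpy as np

def flow_matching_pair(x1: np.ndarray, rng: np.random.Generator):
    """Build one training example for a clean entity x1.

    Assumes the linear path x_t = t * x1 + (1 - t) * x0 with x0 ~ N(0, I),
    so the regression target (the velocity) is x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)  # noise sample
    t = rng.uniform()                   # time in [0, 1)
    x_t = t * x1 + (1.0 - t) * x0       # noisy input
    velocity = x1 - x0                  # direction from noise to data
    return x_t, t, velocity

def generate(velocity_fn, num_entities, entity_shape, steps, rng):
    """Autoregressive sampling: one entity at a time, each from noise.

    velocity_fn(x, t, context, n) plays the role of the model; context is
    the list of entities generated so far. Uses plain Euler integration.
    """
    generated = []
    for n in range(num_entities):
        x = rng.standard_normal(entity_shape)  # start from Gaussian noise
        for i in range(steps):
            t = i / steps
            x = x + velocity_fn(x, t, generated, n) / steps
        generated.append(x)
    return np.stack(generated)

# Usage with an oracle velocity that already knows the targets.
rng = np.random.default_rng(0)
targets = rng.standard_normal((3, 4))

def oracle(x, t, context, n):
    # On the linear path, the velocity toward targets[n] is (x1 - x_t) / (1 - t).
    return (targets[n] - x) / (1.0 - t)

samples = generate(oracle, num_entities=3, entity_shape=(4,), steps=8, rng=rng)
```

With the oracle, the Euler steps follow the straight path exactly, so `samples` matches `targets`. A real model would instead learn the velocity by regressing onto the `velocity` values returned by `flow_matching_pair`, conditioned on the previously generated entities.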

Application Scenarios of xAR

Art Creation: Generate creative images for inspiration or direct use in artworks.
Virtual Scene Generation: Quickly generate realistic virtual scenes for game development and virtual reality.
Old Photo Restoration: Restore damaged parts of old photos, recovering original details and colors.
Video Content Generation: Generate specific scenes or objects in videos for video effects production and editing.
Data Augmentation: Expand training datasets by generating diverse images, improving model generalization and robustness.

Project Links for xAR

Project Website: https://oliverrensu.github.io/project/xAR/
arXiv Technical Paper: https://arxiv.org/pdf/2502.20388

Framework Features

Supported Tasks

Image Generation, Art Creation, Virtual Scene Generation, Old Photo Restoration, Video Content Generation, Data Augmentation

Pricing

free
