EC-Diff: Fast and High-Quality Edge-Cloud Collaborative Inference for Diffusion Models
Abstract
Diffusion models have shown remarkable proficiency in image and video synthesis. Because growing model size and latency limit user experience, a hybrid edge-cloud collaborative framework was recently proposed to realize fast inference and high-quality generation: the cloud model initiates high-quality semantic planning and the edge model expedites later-stage refinement. However, excessive cloud denoising prolongs inference time, while insufficient cloud steps cause semantic ambiguity, leading to inconsistency in the edge model's output. To address these challenges, we propose EC-Diff, which accelerates cloud inference through gradient-based noise estimation while identifying the optimal point for cloud-edge handoff to maintain generation quality. Specifically, we design a K-step noise approximation strategy that reduces cloud inference frequency by using noise gradients between steps and applying cloud inference periodically to correct errors. We then design a two-stage greedy search algorithm to efficiently find the optimal parameters for noise approximation and edge model switching. Extensive experiments demonstrate that our method significantly enhances generation quality compared to edge inference while achieving up to a 2× average speedup compared to cloud inference.
Method
Overview of the proposed EC-Diff. Subfigure (a) depicts the collaborative architecture where the cloud model accelerates denoising for several steps before switching to the edge model for the remaining inference. Subfigure (b) depicts a two-stage greedy search to find the optimal parameter combination, where s is the switching point, k is the number of noise approximation steps, and α is the smoothing factor. Subfigure (c) depicts the cyclic process of accelerated inference using k-step noise approximation with error correction.
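The cyclic process in subfigure (c) and the handoff in subfigure (a) can be illustrated with a minimal sketch. This page does not give the exact update rule, so everything below is an assumption made for illustration: the function name `collaborative_denoise`, the linear extrapolation of noise from a smoothed per-step gradient, the use of α as an exponential smoothing weight, the two real cloud calls used to seed the gradient, and the toy update `x - step_size * eps` that stands in for a real scheduler step. Only the roles of `s`, `k`, and `alpha` come from the caption above.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
NoiseModel = Callable[[Vector, int], Vector]


def collaborative_denoise(
    x: Vector,
    cloud_eps: NoiseModel,
    edge_eps: NoiseModel,
    num_steps: int,
    s: int,
    k: int,
    alpha: float,
    step_size: float = 0.1,
) -> Tuple[Vector, int, int]:
    """Toy cloud-then-edge denoising loop (illustrative, not the paper's code).

    Steps t < s belong to the cloud phase, the rest to the edge model.
    In the cloud phase, up to k consecutive steps reuse an extrapolated
    noise estimate before a real cloud call corrects the accumulated error.
    Returns (final_x, cloud_calls, edge_calls).
    """
    cloud_calls = 0
    edge_calls = 0
    last_eps: Optional[Vector] = None  # noise from the last real cloud call
    last_t = 0                         # step index of that call
    grad: Optional[Vector] = None      # smoothed per-step change in noise
    approx_run = 0                     # approximated steps since last real call

    for t in range(num_steps):
        if t >= s:
            # Edge phase: every remaining step uses the edge model.
            eps = edge_eps(x, t)
            edge_calls += 1
        elif grad is None or approx_run == k:
            # Real cloud call: seeds the gradient or corrects drift.
            eps = cloud_eps(x, t)
            cloud_calls += 1
            if last_eps is not None:
                dt = t - last_t
                raw = [(e - p) / dt for e, p in zip(eps, last_eps)]
                # Assumed smoothing rule: exponential average weighted by alpha.
                grad = raw if grad is None else [
                    alpha * g + (1.0 - alpha) * r for g, r in zip(grad, raw)
                ]
            last_eps, last_t, approx_run = eps, t, 0
        else:
            # Approximated step: extrapolate from the last real noise estimate.
            dt = t - last_t
            eps = [p + g * dt for p, g in zip(last_eps, grad)]
            approx_run += 1
        # Toy update standing in for a real scheduler step.
        x = [v - step_size * e for v, e in zip(x, eps)]
    return x, cloud_calls, edge_calls


def toy_cloud(x: Vector, t: int) -> Vector:
    return [0.5 * v for v in x]


def toy_edge(x: Vector, t: int) -> Vector:
    return [0.4 * v for v in x]


final_x, n_cloud, n_edge = collaborative_denoise(
    [1.0, -2.0], toy_cloud, toy_edge, num_steps=20, s=10, k=2, alpha=0.5
)
print(n_cloud, n_edge)  # 4 cloud calls cover 10 cloud-phase steps; 10 edge calls
```

In this toy schedule, setting `k=0` falls back to one real cloud call per cloud-phase step, which makes the trade-off explicit: a larger `k` saves cloud calls but lets the extrapolated noise drift further before it is corrected.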
Results
Pretrained T2I (SD-v1.4 (Cloud) + BK-SDM-Tiny (Edge))
Cloud (1.70s) | Edge (1.10s) | EC-Diff (1.05s) |
---|---|---|
[Image gallery: generated samples shown side by side, one column per method.]
Pretrained T2I (SDXL-Base (Cloud) + SSD-vega (Edge))
Cloud (6.25s) | Edge (3.27s) | EC-Diff (2.61s) |
---|---|---|
[Image gallery: generated samples shown side by side, one column per method.]
Pretrained T2V (CogVideoX-5B (Cloud) + CogVideoX-2B (Edge))
Cloud (245s) | Edge (91s) | EC-Diff (105s) |
---|---|---|
[Video gallery: generated samples shown side by side, one column per method.]
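As a quick check on the latencies listed in the captions above, the speedup of EC-Diff over cloud-only inference can be computed per model pair. The snippet below is a small worked calculation, not part of the original evaluation: the dictionary keys and the helper name `speedup_over_cloud` are made up for illustration, and the mean is taken over these three caption values only, so it need not match the averages reported in the paper.

```python
# Latencies in seconds, copied from the gallery captions on this page.
timings = {
    "SD-v1.4 + BK-SDM-Tiny": {"cloud": 1.70, "edge": 1.10, "ec_diff": 1.05},
    "SDXL-Base + SSD-vega": {"cloud": 6.25, "edge": 3.27, "ec_diff": 2.61},
    "CogVideoX-5B + CogVideoX-2B": {"cloud": 245.0, "edge": 91.0, "ec_diff": 105.0},
}


def speedup_over_cloud(row: dict) -> float:
    """Ratio of cloud-only latency to EC-Diff latency."""
    return row["cloud"] / row["ec_diff"]


speedups = {name: speedup_over_cloud(row) for name, row in timings.items()}
mean_speedup = sum(speedups.values()) / len(speedups)

for name, value in speedups.items():
    print(f"{name}: {value:.2f}x")  # 1.62x, 2.39x, 2.33x
print(f"mean: {mean_speedup:.2f}x")  # 2.12x
```

The same captions show that in the T2V setting EC-Diff (105s) is slower than edge-only inference (91s), so the latency gain there is relative to the cloud model only.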