Deconstructing the Stable Diffusion 3.5 pipeline
I am trying to deconstruct the SD3.5 (specifically 3.5 Medium) pipeline in order to have control over the denoising steps. I can't use callbacks because I need to modify the latents according to other pipelines.

I am following the steps in this Hugging Face guide: https://huggingface.co/docs/diffusers/en/using-diffusers/write_own_pipeline#deconstruct-the-stable-diffusion-pipeline

I modified my text encoding to fit SD3.5. I also tried loading the entire pipeline and running its encode_prompt function to get the text embeddings and pooled embeddings for both the prompt and the negative prompt. When I pass its outputs to the regular pipeline in place of the prompt and negative prompt, everything works properly, so the encoding does not seem to be what's causing the problem.
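Concretely, the embedding step looks roughly like this (a sketch; the model id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

# encode_prompt returns the text embeddings and pooled embeddings
# for both the prompt and the negative prompt.
with torch.no_grad():
    (prompt_embeds, negative_prompt_embeds,
     pooled_prompt_embeds, negative_pooled_prompt_embeds) = pipe.encode_prompt(
        prompt="a placeholder prompt",
        prompt_2=None,
        prompt_3=None,
        negative_prompt="",
        device="cuda",
        do_classifier_free_guidance=True,
    )
```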

I also replaced the UNet from the article with the model's pre-trained transformer. After that, I adjusted the decoding to match the decoding in the pipeline's source code in diffusers.
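The decoding I matched from the diffusers source looks roughly like this (sketch, continuing from the pipeline object above):

```python
# The SD3 pipeline divides by the VAE scaling factor AND adds a shift factor,
# unlike the SD1.x decoding shown in the guide.
latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
with torch.no_grad():
    image = pipe.vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
```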

The output images don't look the same as they do when running the pipeline through diffusers. I'm not sure where I can find a similar deconstruction of the SD3 pipeline, or what I'm missing.



1 Answer


It looks like the issue is with the noise scheduler or the latent processing. SD3.5 uses flow matching with FlowMatchEulerDiscreteScheduler, which differs from the scheduler in the SD1.x guide in two ways that matter here: the initial latents are plain standard-normal noise (there is no init_noise_sigma scaling), and the pipeline never calls scale_model_input. Make sure your scheduler and its settings match the defaults shipped with the model, check that the transformer's prediction is combined for classifier-free guidance in the same order as the official pipeline (unconditional first, then conditional), and compare the shape and scale of your latents with the official pipeline's: SD3 latents have 16 channels, and decoding applies a shift factor on top of the scaling factor.
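Here is a minimal sketch of the deconstructed denoising loop under those assumptions. The model id, prompt, and hyperparameters are placeholders, and it reuses the pipeline's own encode_prompt as you already do; points that commonly diverge from the SD1.x guide are commented:

```python
import torch
from diffusers import StableDiffusion3Pipeline

device = "cuda"
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to(device)

# Reuse the pipeline's own text encoding, as in the question.
with torch.no_grad():
    (prompt_embeds, negative_prompt_embeds,
     pooled_prompt_embeds, negative_pooled_prompt_embeds) = pipe.encode_prompt(
        prompt="a placeholder prompt",
        prompt_2=None,
        prompt_3=None,
        negative_prompt="",
        device=device,
        do_classifier_free_guidance=True,
    )

# Classifier-free guidance: unconditional embeddings first, conditional second,
# matching the order used inside StableDiffusion3Pipeline.__call__.
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
pooled_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)

guidance_scale = 7.0       # placeholder
num_inference_steps = 28   # placeholder
height = width = 1024      # placeholder

pipe.scheduler.set_timesteps(num_inference_steps, device=device)

# SD3 latents have 16 channels; note there is NO init_noise_sigma scaling here,
# unlike the SD1.x guide.
shape = (1, pipe.transformer.config.in_channels,
         height // pipe.vae_scale_factor, width // pipe.vae_scale_factor)
latents = torch.randn(shape, device=device, dtype=prompt_embeds.dtype)

for t in pipe.scheduler.timesteps:
    # No scale_model_input call either; the SD3 pipeline feeds latents directly.
    latent_model_input = torch.cat([latents] * 2)
    timestep = t.expand(latent_model_input.shape[0])

    with torch.no_grad():
        noise_pred = pipe.transformer(
            hidden_states=latent_model_input,
            timestep=timestep,
            encoder_hidden_states=prompt_embeds,
            pooled_projections=pooled_embeds,
            return_dict=False,
        )[0]

    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

    latents = pipe.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
    # <- this is where you can modify `latents` according to your other pipelines

# SD3 decoding: divide by the scaling factor AND add the shift factor.
latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
with torch.no_grad():
    image = pipe.vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("out.png")
```

If the images still differ, seed a generator and compare your latents tensor against the official pipeline's at each step (e.g. via a callback on the official run); the first step where they diverge points at the mismatched component.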
