Deconstructing the Stable Diffusion 3.5 pipeline
I am trying to deconstruct the SD3.5 (specifically 3.5 Medium) pipeline in order to have control over the denoising steps. I can't use callbacks because I need to modify the latents according to other pipelines.

I am following the steps in this Hugging Face guide: https://huggingface.co/docs/diffusers/en/using-diffusers/write_own_pipeline#deconstruct-the-stable-diffusion-pipeline

I modified my text encoding to fit SD3.5. I also tried loading the entire pipeline and running its encode_prompt function to get the text embeddings and pooled embeddings for both the prompt and the negative prompt. When I pass its outputs to the regular pipeline in place of the prompt and negative prompt, everything works properly, so the encoding does not seem to be what's causing the problem.
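Concretely, the embedding step looks roughly like this (a sketch; the model id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

# encode_prompt returns the text embeddings and pooled embeddings
# for both the prompt and the negative prompt.
with torch.no_grad():
    (prompt_embeds, negative_prompt_embeds,
     pooled_prompt_embeds, negative_pooled_prompt_embeds) = pipe.encode_prompt(
        prompt="a placeholder prompt",
        prompt_2=None,
        prompt_3=None,
        negative_prompt="",
        device="cuda",
        do_classifier_free_guidance=True,
    )
```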

I also replaced the UNet from the article with the model's pre-trained transformer. After that, I adjusted the decoding to match the decoding in the pipeline's source code in diffusers.
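The decoding I matched from the diffusers source looks roughly like this (sketch, continuing from the pipeline object above):

```python
# The SD3 pipeline divides by the VAE scaling factor AND adds a shift factor,
# unlike the SD1.x decoding shown in the guide.
latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
with torch.no_grad():
    image = pipe.vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
```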

The output images don't look the same as they do when running the pipeline through diffusers. I'm not sure where I can find a similar deconstruction of the SD3 pipeline, or what I'm missing.



1 Answer


It looks like the issue is with the noise scheduler or the latent processing. SD3.5 uses flow matching with FlowMatchEulerDiscreteScheduler, which differs from the scheduler in the SD1.x guide in two ways that matter here: the initial latents are plain standard-normal noise (there is no init_noise_sigma scaling), and the pipeline never calls scale_model_input. Make sure your scheduler and its settings match the defaults shipped with the model, check that the transformer's prediction is combined for classifier-free guidance in the same order as the official pipeline (unconditional first, then conditional), and compare the shape and scale of your latents with the official pipeline's: SD3 latents have 16 channels, and decoding applies a shift factor on top of the scaling factor.
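Here is a minimal sketch of the deconstructed denoising loop under those assumptions. The model id, prompt, and hyperparameters are placeholders, and it reuses the pipeline's own encode_prompt as you already do; points that commonly diverge from the SD1.x guide are commented:

```python
import torch
from diffusers import StableDiffusion3Pipeline

device = "cuda"
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to(device)

# Reuse the pipeline's own text encoding, as in the question.
with torch.no_grad():
    (prompt_embeds, negative_prompt_embeds,
     pooled_prompt_embeds, negative_pooled_prompt_embeds) = pipe.encode_prompt(
        prompt="a placeholder prompt",
        prompt_2=None,
        prompt_3=None,
        negative_prompt="",
        device=device,
        do_classifier_free_guidance=True,
    )

# Classifier-free guidance: unconditional embeddings first, conditional second,
# matching the order used inside StableDiffusion3Pipeline.__call__.
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
pooled_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)

guidance_scale = 7.0       # placeholder
num_inference_steps = 28   # placeholder
height = width = 1024      # placeholder

pipe.scheduler.set_timesteps(num_inference_steps, device=device)

# SD3 latents have 16 channels; note there is NO init_noise_sigma scaling here,
# unlike the SD1.x guide.
shape = (1, pipe.transformer.config.in_channels,
         height // pipe.vae_scale_factor, width // pipe.vae_scale_factor)
latents = torch.randn(shape, device=device, dtype=prompt_embeds.dtype)

for t in pipe.scheduler.timesteps:
    # No scale_model_input call either; the SD3 pipeline feeds latents directly.
    latent_model_input = torch.cat([latents] * 2)
    timestep = t.expand(latent_model_input.shape[0])

    with torch.no_grad():
        noise_pred = pipe.transformer(
            hidden_states=latent_model_input,
            timestep=timestep,
            encoder_hidden_states=prompt_embeds,
            pooled_projections=pooled_embeds,
            return_dict=False,
        )[0]

    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

    latents = pipe.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
    # <- this is where you can modify `latents` according to your other pipelines

# SD3 decoding: divide by the scaling factor AND add the shift factor.
latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
with torch.no_grad():
    image = pipe.vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("out.png")
```

If the images still differ, seed a generator and compare your latents tensor against the official pipeline's at each step (e.g. via a callback on the official run); the first step where they diverge points at the mismatched component.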
