I'm trying to fine-tune the projection layers in the CLIP model using LoRA.
I need help identifying the exact projection layers to modify for my fine-tuning and how I can apply LoRA to them.
Model loading:
import torch
import clip
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
Model structure when printed:
CLIP(
(visual): VisionTransformer()
(transformer): Transformer()
(token_embedding): Embedding(49408, 512)
(ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
asked Mar 17 by Fadela, edited Mar 26 by cronoik
Comments:
- Welcome to SO. How are you loading the model? Via the original openai code? Keep in mind that the projection layers are just linear layers, which means you won't benefit (much) from classic LoRA. – cronoik Commented Mar 22
- @cronoik Thank you for your comment! I am indeed using clip.load("ViT-B/32", device=device) from the standard clip library, and yes, this is only an experiment for me. I'm still a bit lost on where to apply LoRA within the model. I've tried looking for layers with "proj" in their name, but I'm not sure if those are the correct projection layers for LoRA. Could you clarify which kind of layers are typically considered "projection layers" in CLIP for LoRA fine-tuning? Maybe knowing the layer type or position in the network flow would help me identify them accurately. – Fadela Commented Mar 26
1 Answer
You will not see the projection layers when you print the architecture with print(model), because the projection layers are initialized with nn.Parameter() in the openai CLIP repo (unlike the huggingface implementation, which uses linear layers). The code references can be found here:
- visual projection layer: code
- text projection layer: code
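For reference, this is roughly how those two parameters are created in the openai repo. The snippet below is a simplified excerpt (the real constructors take many more arguments), shown only to illustrate why the projections are invisible in print(model):
import torch
import torch.nn as nn

# simplified excerpt of the openai CLIP model code (not the full constructors)
class VisionTransformer(nn.Module):
    def __init__(self, width: int, output_dim: int):
        super().__init__()
        scale = width ** -0.5
        # visual projection: a plain parameter, so it is not listed as a submodule
        self.proj = nn.Parameter(scale * torch.randn(width, output_dim))

class CLIP(nn.Module):
    def __init__(self, transformer_width: int, embed_dim: int):
        super().__init__()
        # text projection: also a plain parameter
        self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))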
You can still list the parameters initialized with nn.Parameter:
for name, param in model.named_parameters():
    print(f'{name}: {param.shape}')
Output:
text_projection: torch.Size([512, 512])
visual.proj: torch.Size([768, 512])
...
The issue you face now is that nn.Parameter is not supported by peft/LoRA (explanation). You could either modify the CLIP code yourself (using nn.Linear instead of nn.Parameter; a rough sketch of that direction follows after the huggingface example below) or use the CLIP implementation of huggingface (mind the different layer names):
from transformers import CLIPModel
from peft import LoraConfig, get_peft_model
transformers_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
config = LoraConfig(
    target_modules=["visual_projection", "text_projection"],
)
peft_model = get_peft_model(transformers_model, config)
peft_model.print_trainable_parameters()
Output:
trainable params: 18,432 || all params: 151,295,745 || trainable%: 0.0122
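If you prefer to stay with the openai implementation, a minimal hand-rolled LoRA around the projection matrices is also possible. The sketch below assumes you additionally patch encode_image/encode_text to call the wrapper instead of the plain x @ self.proj / x @ self.text_projection matmuls; the class name LoRAProjection and the hyperparameters are made up for illustration:
import torch
import torch.nn as nn

class LoRAProjection(nn.Module):
    """Low-rank update on top of a frozen CLIP projection matrix (illustrative sketch)."""
    def __init__(self, proj: torch.Tensor, r: int = 8, alpha: int = 16):
        super().__init__()
        in_dim, out_dim = proj.shape
        # keep the original projection frozen
        self.proj = nn.Parameter(proj.detach().clone(), requires_grad=False)
        # LoRA factors: A is small random, B starts at zero so training starts from the base model
        self.lora_A = nn.Parameter(torch.randn(in_dim, r) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(r, out_dim))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the openai code computes x @ proj; add the low-rank correction on top
        return x @ self.proj + (x @ self.lora_A @ self.lora_B) * self.scaling

# usage sketch: wrap the two projections of the loaded openai model
# visual_lora = LoRAProjection(model.visual.proj)
# text_lora = LoRAProjection(model.text_projection)
# ...then replace the corresponding matmuls in encode_image / encode_text with these modules
# and train only the lora_A / lora_B parameters.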