
I am working on a Llama fine-tuning task. When I train on a single GPU, the program runs fine:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from transformers import AutoModelForCausalLM

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "../models/llama3_8b/"
# compute_dtype and bnb_config are defined earlier in the script (not shown here)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
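
For context, compute_dtype and bnb_config are a fairly standard 4-bit QLoRA setup, roughly along these lines (illustrative values, not necessarily my exact ones):

import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit quantization settings; the exact values may differ in my script
compute_dtype = torch.bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=compute_dtype,
)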

But when I tried to use multiple GPUs for fine-tuning, an error occurred. The modified code is as follows (the lines I changed are marked with # Modification):

from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # device_map=device,
    device_map="auto",  # Modification
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",],
)
training_arguments = TrainingArguments(
    ...
    local_rank=int(os.getenv("LOCAL_RANK", -1)),  # Modification (cast to int: getenv returns a string when the variable is set)
    ddp_find_unused_parameters=False,  # Modification
)
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    #eval_dataset=eval_data,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    packing=False,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)
trainer.train()

The error is as follows:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!

The launch command is:

CUDA_VISIBLE_DEVICES=3,4 python llama3.py
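
I wonder whether the problem is that device_map="auto" shards the model across both visible GPUs (naive model parallelism), while the DDP-style arguments above assume one process per GPU. If that is the case, would per-rank placement along these lines be the right direction? This is an untested sketch, assuming a torchrun launch (torchrun sets LOCAL_RANK for each process):

# Hypothetical alternative: one process per GPU, each holding a full copy of the
# quantized model on its own device. Launch with:
#   CUDA_VISIBLE_DEVICES=3,4 torchrun --nproc_per_node=2 llama3.py
local_rank = int(os.getenv("LOCAL_RANK", 0))
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": local_rank},  # pin the whole model to this process's GPU
    torch_dtype=compute_dtype,
    quantization_config=bnb_config,
)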

Does anyone know how to solve it?
