I'm new to ExecuTorch and was trying to convert the mDeBERTa model to run on edge devices. At first I was able to export the model, but after quantization with XNNPACKQuantizer the quantized graph fails to export with the following error:

torch._dynamo.exc.TorchRuntimeError: Failed running call_function aten.gather.default(*(FakeTensor(..., size=(12, 28, 512)), -1, FakeTensor(..., size=(12, 28, 28))), **{}): gather(): Expected dtype int64 for index, but got torch.float32
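
The gather() dtype requirement itself is easy to reproduce in isolation, which suggests that somewhere in the quantized graph an index tensor is being traced as float32. A minimal standalone sketch (shapes taken from the error message above):

import torch

x = torch.randn(12, 28, 512)
idx = torch.randint(0, 512, (12, 28, 28))   # int64 indices: gather succeeds
torch.gather(x, -1, idx)
torch.gather(x, -1, idx.float())            # float32 indices: raises the same RuntimeError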

Please find the full code snippet below:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

#from torch.export import export_for_training
#from torch._export import exported_for_training
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)

# For ExecuTorch
from torch.export import export, ExportedProgram
from executorch.exir import to_edge

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Emmanuel Macron is the President of France"
model_name = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, _fast_init=False, torchscript=True)
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")

input_shape = [251000, 768]  # (unused)
input_data = inputs["input_ids"].to(torch.int64)  # token ids as int64
print("Input Data: ", input_data, " Datatype: ", input_data.dtype)
aten_dialect: ExportedProgram = export(model, (input_data,))  # this export succeeds

print("Got Aten operation")

# PT2E quantization flow: capture -> prepare -> convert
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
#prepared_graph = prepare_pt2e(aten_dialect, quantizer)   # tried this first
exported_model = capture_pre_autograd_graph(model, (input_data,))
prepared_graph = prepare_pt2e(exported_model, quantizer)
converted_graph = convert_pt2e(prepared_graph)
print("Quantized Graph")


print("Input Data: ", input_data, " Datatype: ", input_data.type())
inpdatai64 = input_data.clone().to(torch.int64)  # force a fresh int64 tensor
print("Input Data Datatype: ", inpdatai64.type())

# ERROR: the following line fails with the gather dtype error shown above
aten_dialect1: ExportedProgram = export(converted_graph, (inpdatai64,))
print("ATen Dialect Graph")

Note that I have also tried passing dynamic shapes to export, but that attempt ended in the same error.
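
For reference, the dynamic-shapes attempt looked roughly like this (the dimension name seq_len and its bounds are just illustrative):

from torch.export import Dim

seq_len = Dim("seq_len", min=2, max=512)  # illustrative bounds for the sequence dimension
aten_dialect1 = export(converted_graph, (inpdatai64,), dynamic_shapes=({1: seq_len},))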

What I observe is that export creates FakeTensors while tracing the model. Even though we pass a real input tuple, why does it trace with FakeTensors? Is there a known issue with FakeTensor, or am I missing something in how the inputs are passed to export?
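
My current understanding (please correct me if wrong) is that export always traces with fake tensors that carry only shape and dtype metadata, and the real inputs are used just to seed that metadata. A small sketch using the internal FakeTensorMode API:

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    t = torch.empty(12, 28, 28, dtype=torch.int64)  # fake: metadata only, no real storage
    print(t)  # FakeTensor(..., size=(12, 28, 28), dtype=torch.int64)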
