I am trying to use AllenNLP models to parse a file and build a CCG dataset. As a student I can't afford the CCGBank dataset, but I still need a CCG-annotated dataset to train a model that resolves syntactic ambiguities, so parsing the sentences into CCG format is an unavoidable step. I really need a model that can be loaded like predictor = Predictor.from_path(".02.10.tar.gz"); if you know a better option, I am willing to try it. My code is below:

import pandas as pd
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging

# Read the original CSV file
input_path = "validation.csv"  # replace with your local path
df = pd.read_csv(input_path)
sentences = df["sentence"].tolist()

# Load AllenNLP's pretrained CCG supertagger model
predictor = Predictor.from_path(".02.10.tar.gz")

# Prediction function: takes a sentence, returns a "word/category" sequence
def get_ccg_tags(sentence):
    output = predictor.predict(sentence=sentence)
    tokens = output["words"]
    tags = output["ccg_tags"]
    tagged = [f"{w}/{t}" for w, t in zip(tokens, tags)]
    return " ".join(tagged)

# Process every sentence and add a ccg_tags column
df["ccg_tags"] = df["sentence"].apply(get_ccg_tags)

# Save the result to a new file
output_path = "validation_with_allennlp_ccg.csv"
df.to_csv(output_path, index=False)

print(f"AllenNLP CCG tags saved to: {output_path}")
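While the model archive is unavailable, the formatting half of the script can still be checked on its own. The sketch below is only an illustration: `format_tagged`, `tag_sentence`, and `fake_predict` are hypothetical names, and the `"words"` / `"ccg_tags"` output keys are copied from the code above, not verified against any real AllenNLP model.

```python
from typing import Callable, Dict, List


def format_tagged(tokens: List[str], tags: List[str]) -> str:
    """Join tokens and supertags into a "word/category" string."""
    # Fail loudly instead of letting zip() silently drop unmatched items.
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return " ".join(f"{w}/{t}" for w, t in zip(tokens, tags))


def tag_sentence(
    sentence: str,
    predict_fn: Callable[[str], Dict[str, List[str]]],
) -> str:
    """Run any predictor-like callable and format its output."""
    output = predict_fn(sentence)
    # Assumption: key names follow the question's code, not a verified model.
    return format_tagged(output["words"], output["ccg_tags"])


def fake_predict(sentence: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a real predictor: tags every token as N."""
    words = sentence.split()
    return {"words": words, "ccg_tags": ["N"] * len(words)}


print(tag_sentence("dogs bark", fake_predict))  # dogs/N bark/N
```

Once a working archive is found, the stand-in could presumably be swapped for `lambda s: predictor.predict(sentence=s)` without touching the formatting logic.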

Tags: nlp. Original question: "AllenNLP all models about ccgsupertagger are unavailable. How to fix or download it?" (Stack Overflow)