
I have been trying to use NLTK's named entity chunker, and have tried several approaches, but I keep getting the following error:

LookupError                               Traceback (most recent call last)
       ...
     8 pos_sentences = [nltk.pos_tag(sent) for sent in token_sentences] 
     10 # Create the named entity chunks: chunked_sentences
---> 11 chunked_sentences = nltk.ne_chunk(pos_sentences, binary=True)
     13 # Test for stems of the tree with 'NE' tags
     14 for sent in chunked_sentences:

    178 """
    179 Use NLTK's currently recommended named entity chunker to
    180 chunk the given list of tagged tokens.
   (...)
    187 
    188 """
    189 if binary:
--> 190     chunker = ne_chunker(fmt="binary")
    191 else:
    192     chunker = ne_chunker()

    170 def ne_chunker(fmt="multiclass"):
    171     """
    172     Load NLTK's currently recommended named entity chunker.
    173     """
--> 174     return Maxent_NE_Chunker(fmt)
...
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
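
The truncated part of the traceback is NLTK's usual LookupError listing of the directories it searched, which come from nltk.data.path. This is a minimal check I ran while debugging (chunker_model_installed is my own helper name, not an NLTK API):

```python
import nltk

# The LookupError lists every directory NLTK searched; that list is
# nltk.data.path. This helper just reports whether the maxent chunker
# model is present on any of those paths.
def chunker_model_installed():
    try:
        nltk.data.find("chunkers/maxent_ne_chunker")
        return True
    except LookupError:
        return False

print("search paths:", nltk.data.path)
print("chunker model found:", chunker_model_installed())
```

On my machine this prints False, matching the traceback above.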

I've attempted both my own code and the example below, which I found in a blog post:

import nltk
from nltk.tokenize import word_tokenize
from nltk.chunk import ne_chunk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

article="The taxi-hailing company Uber brings into very sharp focus the question of whether corporations can be said to have a moral character. If any human being were..."
print(article)

# Tokenize the article into sentences: sentences
sentences = nltk.sent_tokenize(article)

# Tokenize each sentence into words: token_sentences
token_sentences = [nltk.word_tokenize(sent) for sent in sentences]

# Tag each tokenized sentence into parts of speech: pos_sentences
pos_sentences = [nltk.pos_tag(sent) for sent in token_sentences] 

# Create the named entity chunks: chunked_sentences
chunked_sentences = nltk.ne_chunk(pos_sentences, binary=True)

# Test for stems of the tree with 'NE' tags
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label") and chunk.label() == "NE":
            print(chunk)

I've tried both "from nltk.chunk import ne_chunk" and "from nltk import ne_chunk", and I have also tried using ne_chunk_sents() instead of ne_chunk(). I've reproduced several other code examples as well, but I still get the same error whenever I use NLTK's ne_chunk.
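
To illustrate the two calls I have been alternating between (a minimal sketch with hypothetical tagged-sentence data; ne_chunk() takes a single POS-tagged sentence, while ne_chunk_sents() takes a list of them):

```python
import nltk

# A single POS-tagged sentence (hypothetical example data), and the
# list-of-sentences shape that pos_sentences has in my code above.
tagged_sentence = [("Uber", "NNP"), ("is", "VBZ"), ("here", "RB")]
tagged_sentences = [tagged_sentence]

def chunk_all(sentences):
    """Chunk a list of tagged sentences; return None if the model is missing."""
    try:
        # ne_chunk() would be called once per sentence; ne_chunk_sents()
        # takes the whole list and yields one tree per sentence.
        return list(nltk.ne_chunk_sents(sentences, binary=True))
    except LookupError:
        # The same failure mode as in the traceback above.
        return None

print(chunk_all(tagged_sentences))
```

Either way, the call fails with the same LookupError before it ever looks at the input.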

My question is, what could be causing this?
