Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is that the model infers a topic that is logically impossible, in the sense that it assigns a non-zero probability to a word in a time slice where the word was not used. This is a reproduction of the problem:
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel
common_texts = [
['human', 'interface', 'computer'],
['survey', 'user', 'computer', 'system', 'response', 'time'],
['eps', 'user', 'interface', 'system'],
['system', 'human', 'system', 'eps'],
['user', 'response', 'time'],
['trees'],
['graph', 'trees'],
['graph', 'minors', 'trees'],
['graph', 'minors', 'survey']
]
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]
model = LdaSeqModel(corpus=common_corpus, id2word=common_dictionary, num_topics=1, time_slice=[5, 4])
model.print_topic_times(topic=0)
time_slice=[5, 4] means that the first time slice contains the first 5 documents in the common_texts list and the second time slice contains the remaining 4. The term graph does not occur in the first time slice, but print_topic_times() assigns it a non-zero probability there. The output is:
[[('system', 0.13896054593348167),
('user', 0.10696589214152682),
('trees', 0.10664464447111177),
('graph', 0.10643809153102356),
('computer', 0.07494460648968987),
('human', 0.07494460648968987),
('interface', 0.07494460648968987),
('response', 0.07494460648968987),
('time', 0.07494460648968987),
('eps', 0.07494460648968987),
('minors', 0.07474199433434457),
('survey', 0.01658119265037265)],
[('system', 0.13882862152464212),
('graph', 0.10742799576320598),
('trees', 0.10713473662111127),
('user', 0.1064043188010877),
('minors', 0.07517325760789559),
('computer', 0.07474729274679391),
('human', 0.07474729274679391),
('interface', 0.07474729274679391),
('response', 0.07474729274679391),
('time', 0.07474729274679391),
('eps', 0.07474729274679391),
('survey', 0.01654731320129382)]]
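As a sanity check that does not depend on Gensim, the following minimal sketch splits the raw texts according to time_slice and computes the empirical word frequencies per slice. The helper names split_by_time_slice and empirical_word_frequencies are hypothetical and written only for this illustration; they are not part of Gensim. The sketch confirms which terms never occur in the first slice, so they can be compared against the probabilities reported above.

```python
from collections import Counter
from itertools import accumulate

common_texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]
time_slice = [5, 4]


def split_by_time_slice(texts, time_slice):
    """Hypothetical helper: cut texts into consecutive groups sized by time_slice."""
    ends = list(accumulate(time_slice))
    starts = [0] + ends[:-1]
    return [texts[start:end] for start, end in zip(starts, ends)]


def empirical_word_frequencies(docs):
    """Hypothetical helper: relative frequency of each word over all tokens in docs."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


slices = split_by_time_slice(common_texts, time_slice)
slice_freqs = [empirical_word_frequencies(docs) for docs in slices]

# Full vocabulary versus the words that actually occur in the first slice.
vocabulary = {word for doc in common_texts for word in doc}
missing_from_first = sorted(vocabulary - set(slice_freqs[0]))

for t, freqs in enumerate(slice_freqs):
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    print(f"slice {t}: {ranked}")
print("terms absent from slice 0:", missing_from_first)
```

In this data the terms graph, minors, and trees have an empirical frequency of zero in the first slice, yet each of them receives a probability above 0.07 in the model output for that slice.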
Do I have to set additional parameters to obtain correct results?
I have run this with Python 3.10.12 and Gensim 4.3.3.
Update January 23, 2025
I've experimented with the alphas, passes, and em_min_iter parameters, none of which has any effect on the problem.