I'm trying to import RegexTextSplitter using
from langchain.text_splitter import RegexTextSplitter ,RecursiveCharacterTextSplitter
and I get the error:
    from langchain.text_splitter import RegexTextSplitter ,RecursiveCharacterTextSplitter
ImportError: cannot import name 'RegexTextSplitter' from 'langchain.text_splitter'
Doing
from langchain_text_splitters import RegexTextSplitter, RecursiveCharacterTextSplitter
as suggested here does not work either.
    from langchain_text_splitters import RegexTextSplitter, RecursiveCharacterTextSplitter
ImportError: cannot import name 'RegexTextSplitter' from 'langchain_text_splitters'
Is RegexTextSplitter just not present in the latest version of langchain? Then why is this piece of documentation available?
from langchain.text_splitter import RegexTextSplitter
Running dir() on the module gives:
['CharacterTextSplitter', 'ElementType', 'ExperimentalMarkdownSyntaxTextSplitter', 'HTMLHeaderTextSplitter', 'HTMLSectionSplitter', 'HTMLSemanticPreservingSplitter', 'HeaderType', 'KonlpyTextSplitter', 'Language', 'LatexTextSplitter', 'LineType', 'MarkdownHeaderTextSplitter', 'MarkdownTextSplitter', 'NLTKTextSplitter', 'PythonCodeTextSplitter', 'RecursiveCharacterTextSplitter', 'RecursiveJsonSplitter', 'SentenceTransformersTokenTextSplitter', 'SpacyTextSplitter', 'TextSplitter', 'TokenTextSplitter', 'Tokenizer', 'annotations', 'builtins', 'cached', 'doc', 'file', 'loader', 'name', 'package', 'spec', 'split_text_on_tokens']
Which of these is best for splitting a document by regex? Thanks!
Asked Mar 12 at 13:13 by Dev_A
1 Answer
The RegexTextSplitter was deprecated. The RecursiveCharacterTextSplitter class supports regular expressions through its is_separator_regex parameter, which offers a more flexible and unified approach to text splitting. You can use it like this:
# Current standalone package:
from langchain_text_splitters import RecursiveCharacterTextSplitter
# or, alternatively, the older import path via the main langchain package:
# from langchain.text_splitter import RecursiveCharacterTextSplitter
# Regex separators, tried in order: blank line, line break, whitespace after . ? or !
separators = [r'\n\n', r'\n', r'(?<=[.?!])\s+']
# Initialize the RecursiveCharacterTextSplitter with regex separators
text_splitter = RecursiveCharacterTextSplitter(
separators=separators,
is_separator_regex=True,
chunk_size=1000, # Set your desired chunk size
chunk_overlap=0 # Set your desired chunk overlap
)
# Example text to split
text = """
Mr. Smith bought cheapsite for 1.5 million dollars, i.e., he paid a lot for it.
Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true...
Well, with a probability of .9 it isn't.
"""
# Split the text
chunks = text_splitter.split_text(text)
# Output the chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")
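For intuition about what the splitter above does with these separators, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: it uses only the standard re module, drops the separators instead of keeping them, ignores chunk_overlap, and hard-cuts any piece that no separator can break. The function name recursive_regex_split is hypothetical.

```python
import re
from typing import List

# Same separators as the answer: blank line, line break, sentence end.
SEPARATORS = [r"\n\n", r"\n", r"(?<=[.?!])\s+"]


def recursive_regex_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Simplified recursive splitter (a sketch, not LangChain's code).

    Tries separators in order, splits on the first one that occurs,
    greedily re-joins pieces up to chunk_size, and recurses with the
    remaining (finer) separators on any piece that is still too long.
    No overlap; separators are dropped and pieces re-joined with a space.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]

    # Pick the first separator that actually occurs in the text.
    for i, sep in enumerate(separators):
        if re.search(sep, text):
            pieces = [p.strip() for p in re.split(sep, text) if p.strip()]
            finer = separators[i + 1:]
            break
    else:
        # Nothing left to split on: hard-cut every chunk_size characters.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the buffer, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_regex_split(piece, finer, chunk_size))
        elif not current:
            current = piece
        elif len(current) + 1 + len(piece) <= chunk_size:
            current += " " + piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = """
Mr. Smith bought cheapsite for 1.5 million dollars, i.e., he paid a lot for it.
Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true...
Well, with a probability of .9 it isn't.
"""

for i, chunk in enumerate(recursive_regex_split(sample, SEPARATORS, 80)):
    print(f"Chunk {i + 1}: {chunk}")
```

With chunk_size=80 every line of the sample fits, so the \n separator alone yields three chunks, one per line. With chunk_size=50 the first line is split further by the sentence pattern and "Mr." ends up as a chunk of its own: a lookbehind on [.?!] cannot tell an abbreviation from a sentence end, which is worth keeping in mind when choosing regex separators.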
Setting is_separator_regex=True is crucial when you want to use regular expressions as separators; without it, each separator is treated as a literal string.
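To see why the flag matters, here is a minimal sketch using only Python's re module. It assumes (from my reading of the langchain_text_splitters source, not from its documentation) that when is_separator_regex is False each separator is passed through re.escape before splitting, so regex syntax such as a lookbehind is matched character for character.

```python
import re

# Sentence-boundary separator from the answer: whitespace preceded by . ? or !
SENTENCE_SEP = r"(?<=[.?!])\s+"

text = "Did he mind? Adam Jones Jr. thinks he didn't. Well, maybe."

# Flag on: the separator is used as a regular expression.
as_regex = re.split(SENTENCE_SEP, text)

# Flag off (assumed behavior): the separator is escaped first, so it only
# matches the literal characters "(?<=[.?!])\s+", which never appear here.
as_literal = re.split(re.escape(SENTENCE_SEP), text)

print(as_regex)    # ['Did he mind?', 'Adam Jones Jr.', "thinks he didn't.", 'Well, maybe.']
print(as_literal)  # ["Did he mind? Adam Jones Jr. thinks he didn't. Well, maybe."]
```

The same applies to r'\n\n' and r'\n' in the answer's list: escaped, they would look for a literal backslash followed by n rather than a newline.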
Title: python - RegexTextSplitter does not exist in langchain_text_splitters? - Stack Overflow (user-contributed content; source: http://www.betaflare.com/web/1744750247a2623132.html)