By trial and error, I've found that the special characters requiring escaping in a Vespa matches query include more than just the quote (") and backslash (\) characters noted at https://docs.vespa.ai/en/reference/query-language-reference.html. Others (e.g. *) need double escaping.
For now, I'm using the following Python regex substitution:
    import re

    def escape_yql(text: str) -> str:
        # Single-escape quote and backslash, then double-escape the asterisk
        subtext = re.sub(r'[\\"]', r"\\\g<0>", text)
        return re.sub(r'[*]', r"\\\\\g<0>", subtext)
It's used in YQL construction for an exact match regex search like this:
    yql = f'select * from sources * where testfield matches "^{escape_yql(text)}$"'
However, confusingly (to me, at least), single or double escaping of round brackets () causes failure, with a message of the form "Could not create query from YQL: query:L1:58 no viable alternative". Usually no error is raised if these characters are left unescaped, unless the number of opening and closing brackets is unequal. In that case it becomes evident that they are in fact being parsed, with a message like this:
    select * from sources * where name matches \"^TestText))$\" and bcp47_language matches \"^en$\" limit 1
    "yql: [{'code': 4, 'summary': 'Invalid query parameter', 'message': \"Could not create query from YQL: Unmatched closing ')' near index 8\\n^TestText))$\\n ^\"}]"
So I wonder which of the other special regex characters might require single- or double-escapement, something else, or none?
For various reasons I'm going to sidestep this problem by creating an additional field in my schema in which a copy of my tokenised testfield has an exact property, but it nevertheless seems important to figure this out. Any suggestions, please?
2 Answers
You shouldn't really need to escape if you create your strings in this format: 'select * from sd1 where f1 matches "^TestText$"'.
That said, the error you are seeing seems to be caused by unbalanced parentheses in the pattern.
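As a rough way to catch this kind of problem before the query is sent, the pattern can be compiled locally first. The sketch below uses Python's re module as a stand-in; compiles_as_regex is a hypothetical helper name, and since Python's regex dialect is not the one Vespa uses, a passing check is only a hint and not a guarantee.

```python
import re

def compiles_as_regex(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern.

    Assumption: Python's dialect only approximates the engine Vespa
    uses, so this catches obvious errors such as unbalanced brackets.
    """
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles_as_regex("^TestText$"))    # True
print(compiles_as_regex("^TestText))$"))  # False: unbalanced parenthesis
```

The second pattern is the one from the error message in the question, which suggests the brackets are reaching the regex engine unescaped.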
If you are constructing your query from Python, I can recommend checking out the newly released querybuilder module in Pyvespa. I also added a PR to pyvespa with a unit and integration test to demonstrate the usage.
Also note from docs on matches:
Regular expression match is supported using posix extended syntax, with the limitation that it is case-insensitive.
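Since the pattern is a regular expression carried inside a YQL string literal, one way to think about arbitrary input text is as two separate escaping layers: first make the text literal for the regex engine, then make the result safe inside the quoted YQL string. The sketch below illustrates that idea only. The helper names and the metacharacter list are my own assumptions, and it has not been verified against a running Vespa instance (the question reports that escaped round brackets were rejected), so test it before relying on it.

```python
import re

# Assumed set of regex metacharacters; adjust to the engine actually in use.
REGEX_SPECIAL = re.compile(r'[.\[\]{}()\\*+?|^$]')

def escape_regex_literal(text: str) -> str:
    """Layer 1: prefix each assumed regex metacharacter with a backslash."""
    return REGEX_SPECIAL.sub(lambda m: "\\" + m.group(0), text)

def escape_yql_string(text: str) -> str:
    """Layer 2: escape backslash and double quote for a YQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def exact_match_pattern(text: str) -> str:
    """Anchor the literal text with ^ and $, then apply the YQL layer."""
    return escape_yql_string("^" + escape_regex_literal(text) + "$")

text = "a*b"
yql = f'select * from sources * where testfield matches "{exact_match_pattern(text)}"'
print(yql)
```

With this layering, the asterisk ends up preceded by two backslashes in the YQL text, which is consistent with the "double escaping" observed in the question.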
Hope this helps.
-Thomas
You should be using where testfield contains "whatever" instead. There's no good reason to force the use of regexp matching if you don't have a regexp.
Furthermore, I don't think adding a copy of the field with the exact property will help, because it's the query that specifies the use of regexp, so you will get the same behavior there.
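For illustration, building the contains form from Python might look like the sketch below. With no regex involved, only the string-literal escaping for the quote and backslash mentioned in the question should be needed. The helper name yql_quote is hypothetical, and testfield is the field name from the question.

```python
def yql_quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslash and double quote.

    Assumption: these are the only two characters that need escaping
    inside a double-quoted YQL string literal.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

text = "TestText))"
yql = f"select * from sources * where testfield contains {yql_quote(text)}"
print(yql)
```

Here the round brackets are passed through untouched, because they are ordinary characters in a string literal.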