By trial and error, I've found that the special characters that need escaping in a Vespa matches query include more than just the double quote " and backslash \ characters listed at https://docs.vespa.ai/en/reference/query-language-reference.html. Others (e.g. *) need to be escaped twice.

For now, I'm using the following Python regex substitution:

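import re
def escape_yql(text: str) -> str:
    # Escape backslashes and double quotes once: \ -> \\ and " -> \"
    subtext = re.sub(r'[\\"]', r"\\\g<0>", text)
    # Escape asterisks twice: * -> \\*
    return re.sub(r'[*]', r"\\\\\g<0>", subtext)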

It's used to construct the YQL for an exact-match regex search like this:

yql = f'select * from sources * where testfield matches "^{escape_yql(text)}$"'

However, confusingly (to me, at least), escaping round brackets () once or twice causes a failure with a message of the form "Could not create query from YQL: query:L1:58 no viable alternative". Usually no error is raised if these characters are left unescaped, unless the numbers of opening and closing brackets differ: in that case it becomes evident that they are in fact being parsed, with a message like this:

select * from sources * where name matches \"^TestText))$\" and bcp47_language matches \"^en$\" limit 1
"yql: [{'code': 4, 'summary': 'Invalid query parameter', 'message': \"Could not create query from YQL: Unmatched closing ')' near index 8\\n^TestText))$\\n        ^\"}]"
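The error message above suggests that the brackets reach a regex compiler after the YQL string literal has been unescaped. One way to catch such patterns before sending a query is to compile the unescaped pattern locally. This is a minimal sketch that uses Python's re module as a stand-in for the server-side regex engine; that substitution is an assumption (the dialects differ from Vespa's POSIX extended syntax), and regex_error is a hypothetical helper.

```python
import re
from typing import Optional


def regex_error(pattern: str) -> Optional[str]:
    """Return why `pattern` fails to compile, or None if it compiles.

    Python's `re` stands in for the server-side regex engine here; the
    dialects differ, but like the error above it rejects an unbalanced ')'.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The pattern as the regex layer sees it, i.e. after YQL string unescaping.
print(regex_error("^TestText))$"))     # prints an error message
print(regex_error(r"^TestText\)\)$"))  # prints None
```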

So which of the other special regex characters require single escaping, double escaping, something else, or none?
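A minimal sketch of a more general escaper, assuming (this is not confirmed by the Vespa docs cited above) that the pattern passes through two independent layers: first the double-quoted YQL string literal, where only \ and " are special, and then the POSIX extended regex, whose metacharacters are . [ ] ( ) * + ? { } | ^ $ \. Under that assumption each regex metacharacter gets one backslash for the regex layer, and every backslash and quote is then escaped again for the string layer, which reproduces the double escaping observed for *. Escaping round brackets was reported above to trigger a YQL parse error, so this model may not hold for them; check the output against your Vespa version. The names escape_regex_for_yql and exact_match_yql are hypothetical.

```python
import re

# POSIX extended regex metacharacters (assumed to be exactly what the regex
# layer treats as special).
_ERE_SPECIAL = re.compile(r"[.\[\]()*+?{}|^$\\]")


def escape_regex_for_yql(text: str) -> str:
    """Escape `text` so it should match literally inside a YQL `matches` string."""
    # Layer 1 (regex): prefix each metacharacter with one backslash.
    regex_literal = _ERE_SPECIAL.sub(lambda m: "\\" + m.group(0), text)
    # Layer 2 (YQL string literal): escape backslashes first, then quotes,
    # so the backslashes added in front of quotes are not doubled again.
    return regex_literal.replace("\\", "\\\\").replace('"', '\\"')


def exact_match_yql(field: str, text: str) -> str:
    """Build an exact-match query; `field` is assumed to be a valid field name."""
    pattern = escape_regex_for_yql(text)
    return f'select * from sources * where {field} matches "^{pattern}$"'


print(exact_match_yql("testfield", "a*b"))
# select * from sources * where testfield matches "^a\\*b$"
```

For * this yields the same \\* as escape_yql; it differs for a literal backslash, which escape_yql escapes only once and this sketch escapes for both layers.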

For various reasons I'm going to sidestep this problem by adding a field to my schema that holds a copy of my tokenised testfield with the exact property, but it nevertheless seems important to figure this out. Any suggestions, please?

asked Jan 20 at 16:21 by Stephen Gadd

2 Answers


You shouldn't really need to escape anything if you create your strings in this format: 'select * from sd1 where f1 matches "^TestText$"'. The error you are seeing, though, seems to be caused by unbalanced parentheses.

If you are constructing your query from Python, I can recommend checking out the newly released querybuilder module in pyvespa.

I also opened a PR to pyvespa with a unit test and an integration test that demonstrate the usage.

Also note this from the docs on matches:

Regular expression match is supported using posix extended syntax, with the limitation that it is case-insensitive.
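To try a pattern locally before querying, a matches test can be roughly approximated with Python's re.search and re.IGNORECASE. This is only an approximation, assuming (as the ^ and $ anchors in the question suggest) that matches is unanchored, and that the patterns avoid constructs where Python's regex dialect and POSIX extended syntax differ; would_match is a hypothetical helper.

```python
import re


def would_match(pattern: str, value: str) -> bool:
    """Roughly approximate a Vespa `matches` test locally.

    Case-insensitive (per the docs quoted above) and unanchored, so put
    ^ and $ in the pattern to require a whole-value match.
    """
    return re.search(pattern, value, re.IGNORECASE) is not None


print(would_match("^testtext$", "TestText"))    # True: case is ignored
print(would_match("^TestText$", "TestText 2"))  # False: $ anchors the end
print(would_match("Text", "TestText 2"))        # True: unanchored search
```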

Hope this helps.

-Thomas

You should be using where testfield contains "whatever" instead. There's no good reason to force the use of regexp matching if you don't have a regexp.
Furthermore, I don't think adding a copy of the field with the exact property will help: it's the query that specifies the use of regexp matching, so you will get the same behavior there.
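With contains there is no regex layer, so presumably only the YQL string-literal escaping of \ and " is needed. A minimal sketch of building such a query, assuming the field name is a valid identifier inserted verbatim and that brackets are not special inside a YQL string; yql_string_literal and contains_yql are hypothetical helpers, not part of any Vespa client library.

```python
def yql_string_literal(text: str) -> str:
    """Quote `text` as a double-quoted YQL string literal."""
    # Escape backslashes before quotes so the added backslashes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def contains_yql(field: str, text: str) -> str:
    """Build a `contains` query; `field` is assumed to be a valid field name."""
    return f"select * from sources * where {field} contains {yql_string_literal(text)}"


# Brackets are passed through unescaped, since no regex is involved here.
print(contains_yql("testfield", "TestText))"))
# select * from sources * where testfield contains "TestText))"
```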

Tags: python | Escaping special characters in a Vespa YQL matches query | Stack Overflow