I'm writing code to parse a command string where the user might type a key-value pair, but not necessarily exactly the way I would prefer. Something like "key=value" would be ideal, but they might type "key = value", "key: value", "key == value", or something silly like "key ::=== = = == value". Obviously that last case is unlikely, but I'd like to be able to handle it. I also have to handle the case where the value is in double quotes, like:

message = "this message has an = in it"

My goal in all cases is to replace everything between the key and the value with a single equal sign with no spaces around it, so that (for example) the message above becomes:

message="this message has an = in it"

My current regex looks like this:

([ :=]+[:=]+[ :=]*|:)(?=(?:[^'\"]*(?:'|\")[^'\"]*(?:'|\"))*[^'\"]*$)

or without the part that handles the quotes...

([ :=]+[:=]+[ :=]*|:)

That works for most cases but fails on an equal sign followed by a space (for example "key= value"). I tried using the three bracket groups to force an equal sign or colon into the match, where the middle group can't be a space. If I just use something like ([ :=]+), it matches single spaces, and forcing a minimum number of characters would match multiple spaces with no colon or equal sign among them.

I'm looking for a regex that cleans up wonky user input and normalizes it to something I can rely on later.

Asked Feb 24 at 16:05 by Kevin; edited Feb 24 at 16:22 by Dharman.

  • Regular expressions are really poor at distinguishing "inside" and "outside". You should write a more powerful parser. – Barmar, commented Feb 24 at 16:45
  • Something like regex101/r/sQlsK3/1 will do. – Wiktor Stribiżew, commented Feb 24 at 17:14

4 Answers


Unless I'm missing something, simply using ([ :=]*[:=]+[ :=]*|:) for the separator detection (i.e., making the first part optional) seems to work.

Edit: in the case in your comment, the rest of your regex was broken. Here's the version fixed enough to work:

([ :=]*[:=]+[ :=]*|:)(?=(?:[^'\"]*(?:'|\")[^'\"]*['"])*[^'\"]*$)
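
As a rough illustration, the corrected pattern above could be applied like this in Python. The choice of Python and the name normalize_separators are assumptions made for this sketch; the question does not say which language or regex engine is in use.

```python
import re

# The corrected pattern from the answer above. The lookahead only lets a
# separator match when an even number of quote characters remains after it,
# i.e. when the separator sits outside any quoted value.
SEPARATOR = re.compile(
    r"([ :=]*[:=]+[ :=]*|:)(?=(?:[^'\"]*(?:'|\")[^'\"]*['\"])*[^'\"]*$)"
)


def normalize_separators(command: str) -> str:
    """Replace each run of separators outside quotes with a single '='.

    Hypothetical helper name, used only for this sketch.
    """
    return SEPARATOR.sub("=", command)


print(normalize_separators('message = "this message has an = in it"'))
# message="this message has an = in it"
print(normalize_separators("key= value"))
# key=value
```

Because the first group is now optional, a lone "=" followed by a space is matched too, which is the case the original pattern missed.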

Regex_Replace can put back parts of what it matched (via capturing groups), so my suggestion is to match the entire key-value pair and rebuild it the way you want. Perhaps like this?

([\w-]+)(?:\s*[:=])+\s*("[^"]+"|[^ "'=<>`:]+)

and replace with $1=$2. Isn't that simpler? You can adjust the allowed character sets for the name and for the unquoted value to suit your situation. I kept the name part simple (alphanumerics + underscore + dash) but followed the HTML specification's rules for unquoted attribute values on the value side, with the colon excluded as well.

EDIT: My original had several typos; I have corrected the regex above. I think it still works. You gave the example [34 key= val -r -v key2:"this is a thing"]. I ran the corrected regex on it in Regex101: it leaves the [34, the -r -v, and the closing square bracket alone, but cleans up the first pair by removing the space and the second pair by replacing the : with =. You can put any number of spaces, colons, or equal signs between key and value and it still matches.
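
A minimal sketch of this capture-and-rebuild approach in Python, assuming Python's re module stands in for Regex_Replace. Note that Python writes the replacement as \1=\2 rather than $1=$2, and normalize_pairs is a hypothetical name.

```python
import re

# Group 1: the key (word characters and dashes).
# Group 2: the value, either double-quoted or a run of characters that
# excludes space, quotes, =, <, >, backtick, and colon.
PAIR = re.compile(r'''([\w-]+)(?:\s*[:=])+\s*("[^"]+"|[^ "'=<>`:]+)''')


def normalize_pairs(command: str) -> str:
    """Rewrite every key/value pair found in the command as key=value."""
    return PAIR.sub(r"\1=\2", command)


print(normalize_pairs('[34 key= val -r -v key2:"this is a thing"]'))
# [34 key=val -r -v key2="this is a thing"]
```

Since the quoted value is consumed as part of the match, an = inside the quotes is never examined as a separator, so no lookahead is needed.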

Try:

^("?(?:(?!\s*[=:])[^"=:])+"?)[=:\s]+

See: regex101


Explanation

  • ^: Anchors to the start of string.

  • ( ... ): Capture LHS to group 1

    • "? ... "?: something that may be in quotes
    • (?: ... )+: while matching as often as possible
      • (?!\s*[=:]): as long as what follows is not optional whitespace and then a = or : separator
      • [^"=:]: any character except a literal ", =, or :
  • [=:\s]+: then match all separators and whitespace

and replacing with:

  • $1=: Substitute by keeping LHS (group 1) and inserting =.

Testcases added from all the other nice answers.
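
The anchored pattern above can be tried in Python roughly as follows. This is a sketch under the assumption that each command holds a single key/value pair at the start of the string; normalize_first_pair is a hypothetical name, and Python spells the replacement \1= instead of $1=.

```python
import re

# Group 1 captures the (optionally quoted) key. The negative lookahead stops
# the key before any whitespace that leads into a separator, so trailing
# spaces are consumed by the separator run rather than kept with the key.
KEY_AND_SEPARATOR = re.compile(r'^("?(?:(?!\s*[=:])[^"=:])+"?)[=:\s]+')


def normalize_first_pair(line: str) -> str:
    """Collapse the separator run after the leading key into a single '='."""
    return KEY_AND_SEPARATOR.sub(r"\1=", line, count=1)


print(normalize_first_pair('message = "this message has an = in it"'))
# message="this message has an = in it"
print(normalize_first_pair("key ::=== = = == value"))
# key=value
```

Because the pattern is anchored with ^ and stops at the first character that is not a separator or whitespace, anything inside a quoted value is left untouched.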

Latest edit: third time's the charm. The answer has been completely redone with a new regex pattern, based on feedback from @The fourth bird (thanks!). Also fixed: spaces at the end of the key are no longer captured.

Approach: Capture the characters of the key in the first capture group and insert them into the replacement string with $1. Match any number of : followed by any number of =, separated by any number of spaces, before the value. The match ends where the value begins.

UPDATED REGEX PATTERN (PCRE2 flavor; flags: gm)

pattern = /^([^=:]+[^ =:]) *(?:: *)*[=:] *(?:= *)*(?=[^ =:])/
replacement = '$1='

Regex demo: https://regex101/r/ufWHKL/3

NOTES

  • (...) Capture group. Captures and consumes the characters matched by the pattern inside.

  • (?:...) Non-capturing group. Matches and consumes, but does not capture, any characters.

  • (?=...) Positive lookahead. Checks that the pattern inside matches at the current position, but does not consume any characters.

  • [...] Character class. Matches any one character listed.

  • * Star quantifier. Matches the preceding character or group 0 or more times.

  • + Plus quantifier. Matches the preceding character or group 1 or more times.

  • [^...] Negated character class. Matches any one character that is not listed inside the class.

  • ^ Matches at the start of a line (with the m flag).

  • ([^=:]+[^ =:]) Capture group #1, referred to in the replacement string as $1. Matches anything that is not = or : one or more times (+), followed by one character that is not a space, a colon :, or an equal sign =.

  • (?:: *)* Non-capturing group. Matches one : followed by 0 or more (*) spaces. The group as a whole repeats 0 or more times.

  • [=:] Character class. Matches one character, either = or :.

  • A space followed by *. Matches 0 or more spaces.

  • (?:= *)* Non-capturing group. Matches one = followed by 0 or more (*) spaces. The group as a whole repeats 0 or more times.

  • (?=[^ =:]) Positive lookahead containing a negated character class. Checks for, but does not consume, one character that is not a space, an equal sign =, or a colon :.

  • replacement = '$1=' Replaces the match with the characters captured in the first capture group ($1) followed by =.

TEST STRING

[34 key= val -r -v key2:"this is a thing]
key=value
key=                                           : # COLON NOT ALLOWED AFTER EQUAL SIGN
$key= = value
my_key = value
key5: value
key == value
key ::=== = = == value
message = "this message has an = in it"
age = 5
age == 5
age=5
age:5
age:     5
age: : : ==========5
age::::::::5
age:: = = = = == 77
age:: = = = = +10 
age::: ===== :8      # SEMICOLON NOT ALLOWED AFTER EQUAL SIGN
age::: : : : : : : :  ===== :999       # COLON NOT ALLOWED AFTER EQUAL SIGN
age=::::333       # COLON NOT ALLOWED AFTER EQUAL SIGN
age::=== === =                333
  :

RESULT

[34 key=val -r -v key2:"this is a thing]
key=value
key=                                           : # COLON NOT ALLOWED AFTER EQUAL SIGN
$key=value
my_key=value
key5=value
key=value
key=value
message="this message has an = in it"
age=5
age=5
age=5
age=5
age=5
age=5
age=5
age=77
age=+10 
age::: ===== :8      # SEMICOLON NOT ALLOWED AFTER EQUAL SIGN
age::: : : : : : : :  ===== :999       # COLON NOT ALLOWED AFTER EQUAL SIGN
age=::::333       # COLON NOT ALLOWED AFTER EQUAL SIGN
age=333
  :
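
A few of the rows above can be spot-checked with a short Python sketch. It assumes Python's re module handles this pattern the same way as the PCRE2 flavor named above, applies the pattern one line at a time (so the m flag is not needed), and uses \1= where the answer writes $1=. The name normalize_line is hypothetical.

```python
import re

# Same pattern as in the answer: key in group 1, then colons, one '=' or ':',
# then any further '=' signs, with optional spaces throughout. The final
# lookahead requires the value to start with something other than a space,
# '=' or ':', which is why a colon after an equal sign is rejected.
PATTERN = re.compile(r"^([^=:]+[^ =:]) *(?:: *)*[=:] *(?:= *)*(?=[^ =:])")


def normalize_line(line: str) -> str:
    """Return the line with its leading key/separator run rewritten as key=."""
    return PATTERN.sub(r"\1=", line)


for sample in [
    "key ::=== = = == value",
    "age: : : ==========5",
    "age=::::333       # COLON NOT ALLOWED AFTER EQUAL SIGN",
]:
    print(normalize_line(sample))
# key=value
# age=5
# age=::::333       # COLON NOT ALLOWED AFTER EQUAL SIGN
```

Lines that do not fit the colon-then-equals ordering are returned unchanged, which matches the untouched rows in the result listing.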
