I'm using the regex below to find most of the matches I'm looking for: I want to mark any 'ml' preceded by a number of up to 9 digits, as long as that number isn't in the 1900-2100 range. The issue I'm having (and I don't even know whether this is possible in the same regex) is '19750ml. Any time there is a quote character followed by a 2-digit number, I don't want that part selected. The regex matches on '0ml' because the value 1975 precedes it, but when a single quote is followed by a 2-digit number, I would like it to mark '750ml'. In other words, can the quote followed by the 2-digit number have the highest priority? This is being used in Microsoft SQL Server 2019 as a function.
(?i)(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d{2}))(?<!\b(?:1(?=9\d{2}))|2(?=[01]\d{2}))(?!\b(?:19|2[01])\d\d)\d{0,9}ml$

Here are some examples and what the above regex is matching on:

Mary had a little lamb 1980750ml
Test 19819ml
Test 198218ml
Test 2123456ml
Test 20349876ml
Test 209912345ml
Test 1999123456ml
Test 987654321ml
Test '19750ml <--- This guy is my issue
Test 1988ml
Test 9999ml
Test 2000ml
Test 100ml
Test '2529ml <--- I would like it to mark '29ml' keeping the quote followed by 2 numeric digits

Asked Jan 22 at 19:05 by Craig; edited Jan 24 at 19:23 by WJS
  • Does (?:'\d{2}|(?:190[0-9]|19[1-9][0-9]|2[01][0-9]{2})(?=\d*ml))(*SKIP)(*FAIL)|\d{,9}ml work for you? It builds on/extends the logic of my answer to your similar question. – DuesserBaest Commented Jan 22 at 20:32
  • Just because you could do something as a regex doesn't mean you should. Unless this is an interview question, you should never do this as a single pattern because it will be hard to modify or debug. – Todd A. Jacobs Commented Jan 24 at 18:46
  • Additionally, your truncation logic seems unsound. 19750ml is not 750ml, nor is the number in the range of 1900..2100, so you're just truncating numbers for some reason. What's the actual intent of truncating the leading digits? Without knowing why, this is possibly a suboptimal X/Y solution. For example, why not just do a right-to-left string scan and extract the numbers? And if you can have nine digits, what should you do with a string like 1975019750ml or 175021001ml? – Todd A. Jacobs Commented Jan 24 at 18:54
  • This question is similar to: Regex to remove up to a 5 numeric characters followed by 'ml' as long as those 5 numeric chars don't fall between 1900-2199. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – WJS Commented Jan 24 at 18:57

2 Answers

You could extend the regex with these additional restrictions:

  • The match should not start immediately after a quote when the match starts with two digits.

  • The match should not start immediately after a quote and digit when the match starts with a digit.

For encoding these two constraints in the regex we can use (?<!'(?=\d\d)|'\d(?=\d)) as an additional look-behind to the ones already in the regex.

We could add this allowance (as exception to already existing rules):

  • The match may start when it is preceded by a quote and two digits.

We can encode this with (?<='\d\d) as an alternative to all other look-behind restrictions.

This leads to this regex:

(?i)(?:(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d\d))(?<!\b(?:1(?=9\d\d))|2(?=[01]\d\d))(?!\b(?:19|2[01])\d\d)(?<!'(?=\d\d)|'\d(?=\d))|(?<='\d\d))\d{0,9}ml$

See it on regex101
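
As a cross-check on the rules above, here is a minimal Python sketch that applies them procedurally instead of in one pattern. It is not a port of the regex and will not reproduce every quirk of it: the names `mark_ml`, `_TAIL` and `_YEAR` are hypothetical, and it assumes the intended behaviour is the one shown in the question's examples (a quote followed by two digits takes priority; otherwise a leading 1900-2199 year is skipped when the digit run starts at a word boundary; at most the last 9 remaining digits are kept).

```python
import re

# Hypothetical helpers for illustration; not part of the original answer.
_TAIL = re.compile(r"(\d*)(ml)$", re.IGNORECASE)  # trailing digit run + "ml"
_YEAR = re.compile(r"(?:19|2[01])\d\d")           # 1900-2199, as in the regex


def mark_ml(text):
    """Return the trailing '<digits>ml' that should be marked, or None."""
    m = _TAIL.search(text)
    if m is None:
        return None
    digits, unit = m.group(1), m.group(2)
    # Character just before the digit run ("" at the start of the string).
    before = text[m.start() - 1] if m.start() > 0 else ""
    if before == "'" and len(digits) >= 2:
        # Highest priority: keep the quote and its two digits out of the match.
        digits = digits[2:]
    elif not (before.isalnum() or before == "_") and _YEAR.match(digits):
        # Digit run starts at a word boundary with a year: skip the year.
        digits = digits[4:]
    # Keep at most the last 9 digits, mirroring \d{0,9}ml$.
    return digits[-9:] + unit


print(mark_ml("Test '19750ml"))   # 750ml
print(mark_ml("Test 1988ml"))     # ml
print(mark_ml("Test 987654321ml"))  # 987654321ml
```

Splitting the logic this way makes each rule visible and testable on its own, at the cost of no longer being a single expression.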

This is an extension of the thought process formulated in this answer:

Assuming you use a regex flavor that has the (*SKIP)(*FAIL) keywords, you could use:

(?:'\d{2}|(?:190[0-9]|19[1-9][0-9]|2[01][0-9]{2})(?=\d*ml))(*SKIP)(*FAIL)|\d{0,9}ml

See: regex101


Explanation (see also rexegg):

  1. (?: ... ): Match either of two conditions:

    • '\d{2}: either a ' followed by two digits

    • |: or

    • (?:190[0-9]|19[1-9][0-9]|2[01][0-9]{2}): a number in the range 1900-2199 (regex generated based on this answer)

    • (?=\d*ml): provided that number is followed by any number of digits and "ml"

  2. (*SKIP)(*FAIL): discard what was matched so far, so that the next match attempt starts right after it.

  3. |: Or

  4. \d{0,9}ml: match the desired 0 to 9 digits followed by "ml".
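
For flavors without (*SKIP)(*FAIL), the same idea can be approximated by letting the unwanted alternatives consume text and putting only the wanted part in a capture group. Below is a small sketch of that using Python's built-in `re` module, which lacks these keywords. The names `PATTERN` and `wanted_matches` are hypothetical, and the quantifier is written as `{0,9}` so it reads the same in every flavor.

```python
import re

# Alternatives 1 and 2 consume what should be discarded; only alternative 3
# captures. A match with an empty group 1 is simply ignored.
PATTERN = re.compile(
    r"'\d{2}"                                                 # discard: quote + two digits
    r"|(?:190[0-9]|19[1-9][0-9]|2[01][0-9]{2})(?=\d*ml)"      # discard: year 1900-2199 before ...ml
    r"|(\d{0,9}ml)",                                          # keep: 0 to 9 digits + "ml"
    re.IGNORECASE,
)


def wanted_matches(text):
    """Return every '<digits>ml' piece that survives the discard rules."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


print(wanted_matches("Test '19750ml"))                     # ['750ml']
print(wanted_matches("Mary had a little lamb 1980750ml"))  # ['750ml']
print(wanted_matches("Test 9999ml"))                       # ['9999ml']
```

Because the discarded alternatives come first in the alternation, they win whenever they apply at the current position, which is what gives the quote rule its priority.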
