Struggling with regex and regex AI isn't helping.

What I need to get back is:

Group 1 - All text to before the 3rd numeric
Group 2 - All text from the 3rd numeric to before the 5th numeric
Group 3 - All text from the 5th numeric to before the 7th numeric
Group 4 - All text from the 7th numeric to before the 11th numeric
Group 5 - All text from the 11th numeric to before the 13th numeric
Group 6 - All text from the 13th numeric to the end of the string  

The tricky bit is there may be HTML code anywhere between the numbers.

The below PHP is kiiiinda working:

<?php
$gsPattern = '/([^\d]+)(\d{2})(.*?)(\d{2})(.*?)(\d{2})(.*?)(\d{4})(.*?)(\d{2})(.*?)(\d{2})(.*)/i';
$gsReplacement = '${1}${2}-${3}${4}-${5}${6}-${7}${8}-${9}${10}-${11}${12}${13}';

$gsRowCode = "<td>160301<b style=\"color: red;\">1234</b>2525</td>";
echo preg_replace($gsPattern, $gsReplacement, $gsRowCode);

echo "<br>";
$gsRowCode = "<td>16030<b style=\"color: red;\">1123</b>42525</td>";
echo preg_replace($gsPattern, $gsReplacement, $gsRowCode);

The following hopefully shows the output I'm after:

Output<br>
1.<br>
16-03-01-<b style="color: red;">1234-</b>25-25 <br>
<br>
2.<br>
16030<b style="color: red;">1123</b>42525 <br>
<br>
Would like it to output as: <br>
16-03-0<b style="color: red;">1-123</b>4-25-25 <br>

asked Mar 21 at 3:27 by Anrik; edited Mar 21 at 3:41 by Tangentially Perpendicular
  • stackoverflow/q/3577641/62576 – Ken White Commented Mar 21 at 3:35
  • Trying to parse HTML with regex is fraught with difficulties (See this answer). Consider using an HTML parser like DOMDocument that will allow you to separate the text from the HTML reliably, and work with the text in isolation. – Tangentially Perpendicular Commented Mar 21 at 3:37
  • You appear to have mixed input, commentary and output into a single code block, making it impossible to know which is which. You define six groups, so your output should show six groups. You also need to define what you mean by numeric and numbers. – jhnc Commented Mar 21 at 5:20
  • The 6th number can include 1 or 2 digits, and you require 2 with your pattern. It also looks as if you do not want to add hyphens around tags in the result, which is not possible to do with a single replace operation since the engine does not "know" if it is a tag or plain text. So, you will need two operations here. But before we complicate it even further, try 3v4l./GeVNq and regex101/r/YpEnX7/1 – Wiktor Stribiżew Commented Mar 21 at 8:23
  • Let's consider context. 1. The source, or at least part of it, is an HTML <table>. 2. The data in the <table> is fundamentally flawed or is being misinterpreted. 3. If #2 is incorrect then you have an HTML <table> and you need the highlighted (e.g. <b>) part of each cell (e.g. <td>) to shift to the left by a single character. If #1 and #3 are correct there's a far better way of dealing with your problem using JavaScript and approaching the source as DOM. If #1 and #2 are correct...? – zer00ne Commented Mar 23 at 0:21

3 Answers

APPROACH:

  • To help with clarity, I captured every digit separately into a named capture group ((?P<n1>...), (?P<n2>...), (?P<n3>...), etc.).

  • I also captured every non-digit string into a named capture group: the text before the first digit ((?P<string1_beginning>...)), the text between digits ((?P<string2>...), (?P<string3>...), (?P<string4>...), etc.) and the text after the last digit ((?P<string15_end>...)).

  • The numbers in the names of the non-digit-string capture groups ((?P<stringN>...)) and the digit capture groups ((?P<nN>...)) match: stringN is the text immediately before digit nN.

  • In the replacement string, I build the desired outcome by placing the digit groups and the string groups in the required order. For example, when the named capture group ${string6} returns <b style="color: red;">, the group ${string7} is empty, and vice versa, so emitting both side by side puts the tag in the same place in either case.

  • I added a capture group between each digit. This creates flexibility and allows me to select the possible string-group numbers for the replacement string based on the digit location, as you can see in the suggested replacement string below.

  • There will be an issue if other digits appear between the digits we want to capture, for example if the color is given in rgb format, which includes numbers. That, however, was not part of the scope of the question.

REGEX PATTERN AND REPLACEMENT STRING (PCRE2 flavor):

$gsPattern = '/^(?P<string1_beginning>[^\d]*)(?P<n1>\d)(?P<string2>[^\d]*)(?P<n2>\d)(?P<string3>[^\d]*)(?P<n3>\d)(?P<string4>[^\d]*)(?P<n4>\d)(?P<string5>[^\d]*)(?P<n5>\d)(?P<string6>[^\d]*)(?P<n6>\d)(?P<string7>[^\d]*)(?P<n7>\d)(?P<string8>[^\d]*)(?P<n8>\d)(?P<string9>[^\d]*)(?P<n9>\d)(?P<string10>[^\d]*)(?P<n10>\d)(?P<string11>[^\d]*)(?P<n11>\d)(?P<string12>[^\d]*)(?P<n12>\d)(?P<string13>[^\d]*)(?P<n13>\d)(?P<string14>[^\d]*)(?P<n14>\d)(?P<string15_end>[^\d]*$)/';
$gsReplacement = '<br>${n1}${n2}-${n3}${n4}-${n5}${string6}${string7}${n6}-${n7}${n8}${n9}${string10}${string11}${n10}-${n11}${n12}-${n13}${n14} <br>';

Regex Demo: https://regex101/r/EmP12l/4

INPUT:

1: $gsRowCode = "<td>160301<b style=\"color: red;\">1234</b>2525</td>";
2: $gsRowCode = "<td>16030<b style=\"color: red;\">1123</b>42525</td>";

OUTPUT:

1: <br>16-03-0<b style="color: red;">1-123</b>4-25-25 <br>
2: <br>16-03-0<b style="color: red;">1-123</b>4-25-25 <br>

DESIRED OUTPUT:

   <br>16-03-0<b style="color: red;">1-123</b>4-25-25 <br>

REGEX NOTES:

  • ^ Match beginning of the string.
  • (?P<string1_beginning>[^\d]*) Named capture group (?P<name>...). The negated character class [^\d] matches any character that is NOT a digit, 0 or more times (*). In the replacement string, the text captured by this group is retrieved using ${string1_beginning}.
  • (?P<n1>\d) Named capture group (?P<name>...). Matches one digit \d. In the replacement string, the text captured by this group is retrieved using ${n1}.
  • The same pattern repeats to capture a total of 14 digits and 15 non-digit strings.
  • $ Matches end of string.
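
A note for anyone running this from PHP rather than regex101: preg_replace() only accepts numbered references ($1, ${1}, \1) in the replacement string, so the ${n1}-style names are not substituted there. Below is a minimal sketch of one way to keep the named groups, assuming preg_replace_callback(). The helper name formatRow() and the shortened group names s1..s15 are my own, and the <br> wrappers from the replacement string are left out.

```php
<?php
// Hypothetical helper (not part of the answer above): builds the same
// 14-digit pattern in a loop and assembles the result in a callback.
function formatRow(string $row): string
{
    // s1 .. s14 hold the non-digit text before each digit n1 .. n14.
    $pattern = '/^';
    for ($i = 1; $i <= 14; $i++) {
        $pattern .= "(?P<s{$i}>\\D*)(?P<n{$i}>\\d)";
    }
    // s15 is whatever follows the last digit.
    $pattern .= '(?P<s15>\D*)$/';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Same layout as the replacement string above: s6/s7 and s10/s11
            // are emitted side by side because only one of each pair is
            // non-empty for the two sample rows.
            return "{$m['n1']}{$m['n2']}-{$m['n3']}{$m['n4']}-{$m['n5']}"
                . "{$m['s6']}{$m['s7']}{$m['n6']}-{$m['n7']}{$m['n8']}{$m['n9']}"
                . "{$m['s10']}{$m['s11']}{$m['n10']}-{$m['n11']}{$m['n12']}-"
                . "{$m['n13']}{$m['n14']}";
        },
        $row
    );
}

// Both sample rows print: 16-03-0<b style="color: red;">1-123</b>4-25-25
echo formatRow('<td>160301<b style="color: red;">1234</b>2525</td>'), "\n";
echo formatRow('<td>16030<b style="color: red;">1123</b>42525</td>'), "\n";
```

As with the replacement string above, the surrounding <td>...</td> (groups s1 and s15) is dropped. A row that does not contain exactly 14 digits is returned unchanged.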

Basically it's just a matter of capturing digits at a finer granularity around the color tag, so the digits can be shifted over by 1 place.

This is an ECMAScript example. It could be shortened further with a PCRE-style engine.

^.*?(\d{2}).*?(\d{2}).*?(?=\d*(<b[^>]*?color[^>]*?>)\d{4}(</b>))(?:(\d)\3(\d)(\d{3})\4(\d)|(\d)(\d)\3(\d{3})(\d)\4).*?(\d{2}).*?(\d{2}).*

Replace $1-$2-$5$9$3$6$10-$7$11$4$8$12-$13-$14

https://regex101/r/mJuvUJ/1
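
Since the question uses PHP, here is a sketch of the same pattern and replacement passed to preg_replace(). It assumes PCRE treats the backreferences to groups captured inside the lookahead (\3 and \4) the same way the ECMAScript flavor does. The function name shiftAndHyphenate() and the ~ delimiter are my own choices.

```php
<?php
// Hypothetical wrapper around the pattern from this answer, unchanged.
// '~' is the delimiter because the pattern itself contains '/' (in '</b>').
function shiftAndHyphenate(string $row): string
{
    $pattern = '~^.*?(\d{2}).*?(\d{2}).*?(?=\d*(<b[^>]*?color[^>]*?>)\d{4}(</b>))'
        . '(?:(\d)\3(\d)(\d{3})\4(\d)|(\d)(\d)\3(\d{3})(\d)\4).*?(\d{2}).*?(\d{2}).*~';

    // Groups 5-8 belong to the first alternative and 9-12 to the second;
    // whichever branch did not match expands to empty strings.
    $replacement = '$1-$2-$5$9$3$6$10-$7$11$4$8$12-$13-$14';

    return preg_replace($pattern, $replacement, $row);
}

// Both sample rows print: 16-03-0<b style="color: red;">1-123</b>4-25-25
echo shiftAndHyphenate('<td>160301<b style="color: red;">1234</b>2525</td>'), "\n";
echo shiftAndHyphenate('<td>16030<b style="color: red;">1123</b>42525</td>'), "\n";
```

Because the pattern starts with ^.*? and ends with .*, the whole string is replaced, so the <td> wrapper does not survive.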

Below seems to work ok:

$gsPattern = '/^(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d.*?\d.*?\d)(.*?\d.*?\d)(.*)/i';
$gsReplacement = '${1}-${2}-${3}-${4}-${5}-${6}';
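
As a quick check, here is a sketch that runs this pattern over both sample rows from the question. The wrapper name hyphenateRow() is hypothetical. Each group lazily takes any leading non-digit text (such as a tag) along with its digits, so tags stay where they were and the hyphens go between groups.

```php
<?php
// Hypothetical wrapper around the pattern and replacement shown above.
function hyphenateRow(string $row): string
{
    // Groups of 2, 2, 2, 4 and 2 digits; each .*? absorbs any text
    // (including tags) that sits in front of the next digit.
    $gsPattern = '/^(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d.*?\d.*?\d)(.*?\d.*?\d)(.*)/i';
    $gsReplacement = '${1}-${2}-${3}-${4}-${5}-${6}';

    return preg_replace($gsPattern, $gsReplacement, $row);
}

echo hyphenateRow('<td>160301<b style="color: red;">1234</b>2525</td>'), "\n";
// prints: <td>16-03-01-<b style="color: red;">1234-</b>25-25</td>

echo hyphenateRow('<td>16030<b style="color: red;">1123</b>42525</td>'), "\n";
// prints: <td>16-03-0<b style="color: red;">1-123</b>4-25-25</td>
```

Unlike the other two answers, this keeps the <td> wrapper and does not move the tag, which is why the first row's hyphens end up next to the tag boundaries. Digits inside an attribute (for example an rgb or hex color) would be counted as well and would throw the grouping off.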
