Here is some code that retrieves information about package imports in a LaTeX file. I can't manage to capture the optional dates in square brackets. How can I do this?

import re

test_str = r"""
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}%
 [2020-01-02]

\RequirePackage{tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

\RequirePackage[raggedright]%  OK?
                {titlesec}

\RequirePackage{xcolor}%
               [2022/06/12]

\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]
    """

pattern = re.compile(
    r"\\RequirePackage(\[(.*?)\])?([^{]*?)?{(.*?)}",
    flags = re.S
)

matches = pattern.finditer(test_str)

for m in matches:
    print('---')

    for i in [0, 2, 4]:
        print(f"m.group({i}):")
        print(m.group(i))
        print()

Here is the actual output.

---
m.group(0):
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}

m.group(2):

  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded


m.group(4):
geometry

---
m.group(0):
\RequirePackage{tocbasic}

m.group(2):
None

m.group(4):
tocbasic

---
m.group(0):
\RequirePackage[svgnames]%
               {xcolor}

m.group(2):
svgnames

m.group(4):
xcolor

---
m.group(0):
\RequirePackage[raggedright]%  OK?
                {titlesec}

m.group(2):
raggedright

m.group(4):
titlesec

---
m.group(0):
\RequirePackage{xcolor}

m.group(2):
None

m.group(4):
xcolor

---
m.group(0):
\RequirePackage{hyperref}

m.group(2):
None

m.group(4):
hyperref

Asked Nov 19, 2024 at 23:17 by projetmbc; edited Nov 20, 2024 at 8:43.

2 Answers

You could update the pattern to use negated character classes and omit flags = re.S:

\\RequirePackage(\[([^][]*)\])?([^{]*){([^{}]*)}.*(?:\n\s*\[([^][]*)])?

The pattern matches:

  • \\RequirePackage Match \RequirePackage
  • (\[([^][]*)\])? Optionally capture [...]
  • ([^{]*) Capture optional chars other than {
  • {([^{}]*)} Capture what is between {...}
  • .* Match the rest of the line
  • (?: Non capture group
    • \n\s*\[([^][]*)] Match a newline, optional whitespace chars and then capture what is between [...]
  • )? Close the non capture group and make it optional

See a regex 101 demo and a Python demo.
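
As a minimal sketch of how the pattern above behaves in Python, the snippet below runs it over a shortened version of the question's test string. The sample text and the helper name find_packages are assumptions made for illustration; the pattern itself is copied unchanged from this answer.

```python
import re

# Pattern from the answer above, unchanged. No re.S flag is needed because
# the negated character classes already match across newlines.
PATTERN = re.compile(
    r"\\RequirePackage(\[([^][]*)\])?([^{]*){([^{}]*)}.*(?:\n\s*\[([^][]*)])?"
)

# Shortened, hypothetical sample covering the cases from the question.
SAMPLE = r"""
\RequirePackage[
  top = 2.5cm,
  heightrounded
]{geometry}%
 [2020-01-02]

\RequirePackage{tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]
"""


def find_packages(text):
    """Return (options, name, date) tuples; parts that are absent are None."""
    return [(m.group(2), m.group(4), m.group(5)) for m in PATTERN.finditer(text)]


for options, name, date in find_packages(SAMPLE):
    print(f"{name}: options={options!r}, date={date!r}")
```

Groups 2, 4 and 5 hold the options, the package name and the date. A package without options or without a date yields None for that group instead of failing to match.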


If you are only interested in groups 2 and 4 and the added group 5, you can drop the two capture groups you don't need and use three capture groups in total:

\\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?

See the group values in the regex101 demo and another Python demo
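
With only three capture groups, m.groups() can be unpacked directly. The following is a sketch under that assumption; the helper name parse_requirements, the dict layout and the comma-splitting of the options are illustrative choices and not part of the answer.

```python
import re

# Three-group variant from the answer above: options, package name, date.
PACKAGE_RE = re.compile(
    r"\\RequirePackage(?:\[([^][]*)\])?[^{]*{([^{}]*)}.*(?:\n\s*\[([^][]*)])?"
)


def parse_requirements(text):
    """Collect one dict per RequirePackage call (hypothetical helper)."""
    found = []
    for m in PACKAGE_RE.finditer(text):
        options, name, date = m.groups()  # unmatched optional groups are None
        found.append({
            "name": name,
            # Illustrative clean-up: split the option list on commas.
            "options": [o.strip() for o in options.split(",")] if options else [],
            "date": date,
        })
    return found


SAMPLE_TEXT = r"""
\RequirePackage[
  top = 2.5cm,
  heightrounded
]{geometry}%
 [2020-01-02]

\RequirePackage{tocbasic}
"""

for entry in parse_requirements(SAMPLE_TEXT):
    print(entry)
```

Splitting on commas is only safe for simple option lists; options whose values contain commas inside braces would need a real key-value parser.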

I've added ?\[([0-9\-\/]*?)\] to the end of your regex, so the final result is:

r"\\RequirePackage(\[(.*?)\])?([^{]*?)?{(.*?)}?\[([0-9\-\/]*?)\]"

The date ends up in group 5, and it is also part of the full match (group 0). I don't know whether you need it in group 0.

import re

test_str = r"""
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}%
 [2020-01-02]

\RequirePackage{tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

\RequirePackage[raggedright]%  OK?
                {titlesec}

\RequirePackage{xcolor}%
               [2022/06/12]

\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]
    """

pattern = re.compile(
    r"\\RequirePackage(\[(.*?)\])?([^{]*?)?{(.*?)}?\[([0-9\-\/]*?)\]", # edited this line
    flags = re.S
)

matches = pattern.finditer(test_str)

for m in matches:
    print('---')

    for i in [0, 2, 4, 5]: # Added 5
        print(f"m.group({i}):")
        print(m.group(i))
        print()

Here's the output:

---
m.group(0):
\RequirePackage[
  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded
]{geometry}%
 [2020-01-02]

m.group(2):

  top            = 2.5cm,
  bottom         = 2.5cm,
  left           = 2.5cm,
  right          = 2.5cm,
  marginparwidth = 2cm,
  marginparsep   = 2mm,
  heightrounded


m.group(4):
geometry}%
 

m.group(5):
2020-01-02

---
m.group(0):
\RequirePackage{tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               [2023/11/15]

m.group(2):
None

m.group(4):
tocbasic}

\RequirePackage[svgnames]%
               {xcolor}%
               

m.group(5):
2023/11/15

---
m.group(0):
\RequirePackage[raggedright]%  OK?
                {titlesec}

\RequirePackage{xcolor}%
               [2022/06/12]

m.group(2):
raggedright

m.group(4):
titlesec}

\RequirePackage{xcolor}%
               

m.group(5):
2022/06/12

---
m.group(0):
\RequirePackage{hyperref}% To load after titlesec!
               [2023-02-07]

m.group(2):
None

m.group(4):
hyperref}% To load after titlesec!
               

m.group(5):
2023-02-07

