Introduction: I have a Markdown file that an external mechanism updates. My concern is that this file will accumulate "empty blocks" (see below) that I would like to purge. The problem I am encountering is that my regex has to anchor on a neighbouring block, so it removes only every second empty block.

Details: The blocks are daily notes, and every day at midnight a new "block for the day" gets added. For example, today this would have been prepended to the file (it will be at the very top of the file):


# ⬆ 2025-01-10 ⬆ Friday
---

If I do not add any notes today, tomorrow I will have:


# ⬆ 2025-01-11 ⬆ Saturday
---

# ⬆ 2025-01-10 ⬆ Friday
---

This leaves an empty block for Friday, which I want to remove.

To do this, I used the following regex in Python (daily is the content of the file, as a string):

import re

# Group 1: the block above. Group 2: blanks plus the next date header and its rule.
daily: str = re.sub(
    r"(#\ ⬆\ \d\d\d\d\-\d\d\-\d\d\ ⬆.*\n\-\-\-)(\s*#\ ⬆\ \d\d\d\d\-\d\d\-\d\d\ ⬆.*\n\-\-\-)",
    r"\1",
    daily,
)

An "empty block" is a date entry with only blank characters (spaces and newlines) above it, up to the --- line of the block before. The regex matches two groups: first the block above (empty or not, we don't know), then the blanks plus the date stamp and its Markdown rule. It then keeps only the first group.

Here is a Regex101 entry that shows my problem: only every second "empty block" is removed.

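This is because my definition has to anchor on the block before, and the matches found by re.sub do not overlap: once a block has been consumed as part of one match, it cannot serve as the anchor for the block that follows, so that block is never tested for emptiness.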
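The behaviour can be reproduced without Regex101 with a small sketch like the one below. The four-day sample string is made up for illustration, and the pattern is the one from the question with the redundant escapes dropped; with three empty blocks under Monday, a single pass removes Sunday and Friday but leaves Saturday.

```python
import re

# Same pattern as in the question, written without the redundant escapes.
# Group 1 is the block above; group 2 is blanks plus an "empty" date header.
PATTERN = re.compile(
    r"(# ⬆ \d{4}-\d{2}-\d{2} ⬆.*\n---)(\s*# ⬆ \d{4}-\d{2}-\d{2} ⬆.*\n---)"
)

# Hypothetical sample: Sunday, Saturday and Friday are all empty blocks.
sample = (
    "# ⬆ 2025-01-13 ⬆ Monday\n---\n\n"
    "# ⬆ 2025-01-12 ⬆ Sunday\n---\n\n"
    "# ⬆ 2025-01-11 ⬆ Saturday\n---\n\n"
    "# ⬆ 2025-01-10 ⬆ Friday\n---\n"
)

once = PATTERN.sub(r"\1", sample)

# Dates that survive a single pass: Saturday is still there because Sunday,
# its would-be anchor, was already consumed by the first match.
remaining = re.findall(r"\d{4}-\d{2}-\d{2}", once)
print(remaining)  # ['2025-01-13', '2025-01-11']
```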

I will look at running the regex twice (there are no performance concerns), but I was wondering whether

  • I can define my "empty block" without reaching to the block before
  • or I can use a "temporary anchor" where the previous block is still in the game as the regex parses the string
  • or something else.
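One way the first two options might be approached is a lookbehind, which checks the text before the match without consuming it. This is only a sketch under assumptions: Python's re module requires lookbehinds to be fixed-width, so the anchor here is just the preceding --- line rather than the whole header above it, which means a bare --- rule inside the notes, directly above a date header, would also count as an anchor. The names HEADER, EMPTY_BLOCK, purge_empty_blocks and purge_by_repetition are hypothetical. The second function is the "run it again" idea generalised to repeat until nothing changes, since two passes are not enough for longer runs of empty blocks.

```python
import re

HEADER = r"# ⬆ \d{4}-\d{2}-\d{2} ⬆.*\n---"

# Assumption: a preceding "---" line is a good enough anchor. The lookbehind
# asserts it is there but does not consume it, so the rule of a removed block
# can still anchor the block that follows.
EMPTY_BLOCK = re.compile(r"(?<=\n---)\s*" + HEADER)

# The two-group pattern from the question, used by the fallback below.
PAIR = re.compile("(" + HEADER + r")(\s*" + HEADER + ")")


def purge_empty_blocks(daily: str) -> str:
    """Remove every date header that has only blanks between it and the rule above."""
    return EMPTY_BLOCK.sub("", daily)


def purge_by_repetition(daily: str) -> str:
    """Fallback: reapply the original substitution until no match is left."""
    while True:
        daily, count = PAIR.subn(r"\1", daily)
        if count == 0:
            return daily


empty_run = (
    "# ⬆ 2025-01-13 ⬆ Monday\n---\n\n"
    "# ⬆ 2025-01-12 ⬆ Sunday\n---\n\n"
    "# ⬆ 2025-01-11 ⬆ Saturday\n---\n\n"
    "# ⬆ 2025-01-10 ⬆ Friday\n---\n"
)
with_notes = (
    "# ⬆ 2025-01-11 ⬆ Saturday\n---\n"
    "Bought milk\n\n"
    "# ⬆ 2025-01-10 ⬆ Friday\n---\n"
)

print(purge_empty_blocks(empty_run))   # only the Monday block is left
print(purge_empty_blocks(with_notes))  # unchanged: Friday has notes above it
```

The lookbehind version makes a single pass, while the loop keeps the original pattern untouched; on the sample above both leave only the Monday block.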
