I'm trying to extract the 'abc-def' part of the URLs below. I can do it with two different regex patterns (see the examples). Is it possible to write one regex that works for all of the cases below?

(BigQuery doesn't seem to support lookbehind.)

SELECT REGEXP_EXTRACT('https://www.example/post/abc-def/', r'([^/]+)/?$')
UNION ALL
SELECT REGEXP_EXTRACT('https://www.example/post/abc-def', r'([^/]+)/?$')
UNION ALL
SELECT REGEXP_EXTRACT('https://www.example/post/abc-def?p=294', r'([^/]+)/?[$|\?]')
UNION ALL
SELECT REGEXP_EXTRACT('https://www.example/post/abc-def/?p=294', r'([^/]+)/?[$|\?]')
UNION ALL
SELECT REGEXP_EXTRACT('http://www.example/abc-def/?p=294', r'([^/]+)/?[$|\?]')

Expected output 'abc-def'


2 Answers

Note that [$|\?] matches a single $, | or ? character, because [...] defines a character class. Inside it, $ is not an end-of-string anchor and | is not alternation.
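The character-class behavior can be checked directly. This is a minimal sketch that uses Python's `re` module as a stand-in for BigQuery's RE2 engine (an assumption; the two treat this syntax the same way). The variable names are illustrative only.

```python
import re

# Inside [...], `$` and `|` lose their special meaning and become literal characters.
char_class = re.compile(r'[$|\?]')

# Each of the three literal characters matches on its own.
literal_hits = [bool(char_class.fullmatch(ch)) for ch in ('$', '|', '?')]

# The class always consumes one character, so it cannot mean "end of string".
matches_empty = bool(char_class.fullmatch(''))

# The question's second pattern therefore fails on a URL with no query string...
asker_pattern = re.compile(r'([^/]+)/?[$|\?]')
no_query = asker_pattern.search('https://www.example/post/abc-def')

# ...and succeeds only when a literal `?` follows the last path segment.
with_query = asker_pattern.search('https://www.example/post/abc-def?p=294')

print(literal_hits, matches_empty, no_query, with_query.group(1))
```

This is why the question needed two patterns: the class form never matches at the end of the string.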

Since REGEXP_EXTRACT returns only the first match in the input string, you can use the ([^/?]+)/?(?:$|\?) regex:

REGEXP_EXTRACT(col, r'([^/?]+)/?(?:$|\?)')

Details

  • ([^/?]+) - Group 1: any one or more chars other than / and ?
  • /? - an optional / symbol
  • (?:$|\?) - a non-capturing group matching either end of string or a ? char.

If you want to test the pattern at regex101, make sure you test against each input individually, not a multiline string.
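As a quick local check of this pattern against the five URLs from the question, here is a small sketch in Python. It assumes Python's `re` behaves like BigQuery's RE2 for this pattern, and the helper `extract` is a hypothetical imitation of REGEXP_EXTRACT, not a BigQuery API.

```python
import re

# `$` and `\?` sit in a non-capturing group here, not in a character class.
pattern = re.compile(r'([^/?]+)/?(?:$|\?)')

urls = [
    'https://www.example/post/abc-def/',
    'https://www.example/post/abc-def',
    'https://www.example/post/abc-def?p=294',
    'https://www.example/post/abc-def/?p=294',
    'http://www.example/abc-def/?p=294',
]

def extract(url):
    """Imitate REGEXP_EXTRACT: return group 1 of the first match, or None."""
    match = pattern.search(url)
    return match.group(1) if match else None

# Each URL is tested on its own, as the answer recommends.
results = [extract(url) for url in urls]
for url, result in zip(urls, results):
    print(url, '->', result)
```

Every URL should yield 'abc-def', because the first position where the whole pattern can match is the last path segment.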

Another solution is using the ^(?:.*/)?([^/?]+) pattern (add \n into the negated character class when testing at regex101).

  • ^ - start of string
  • (?:.*/)? - an optional group: any zero or more chars, as many as possible, followed by a / char
  • ([^/?]+) - Group 1: any one or more chars other than / and ?.
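The anchored pattern can be tried the same way. The sketch below again uses Python's `re` as an assumed stand-in for RE2. It also compares both patterns on one extra, hypothetical URL whose query string contains a slash, a case the question does not list.

```python
import re

anchored = re.compile(r'^(?:.*/)?([^/?]+)')       # second pattern from this answer
first_match = re.compile(r'([^/?]+)/?(?:$|\?)')   # first pattern from this answer

urls = [
    'https://www.example/post/abc-def/',
    'https://www.example/post/abc-def',
    'https://www.example/post/abc-def?p=294',
    'https://www.example/post/abc-def/?p=294',
    'http://www.example/abc-def/?p=294',
]

# The greedy `.*/` backtracks to the last slash that is followed by a segment char.
anchored_results = [anchored.search(url).group(1) for url in urls]

# Hypothetical edge case: a slash inside the query string.
tricky = 'https://www.example/post/abc-def?next=/home'
anchored_tricky = anchored.search(tricky).group(1)   # greedy `.*/` runs past the `?`
first_tricky = first_match.search(tricky).group(1)   # stops at the first `?`

print(anchored_results, anchored_tricky, first_tricky)
```

If query strings in the data can contain slashes, the first pattern looks like the safer choice, since the anchored one would pick up the text after the last slash instead.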

Try

REGEXP_EXTRACT(url, r'.*/([^/?]+)')
  • .*/ skips over everything up to the / before the last word
  • ([^/?]+) captures the last word, everything up to a following / or ?.
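One detail worth checking with this kind of pattern is the quantifier inside the capture group. The sketch below compares a greedy `+` with a lazy `+?` at the end of the pattern, using Python's `re` as an assumed stand-in for BigQuery's RE2. With nothing after the group to force it further, the lazy form stops after a single character.

```python
import re

urls = [
    'https://www.example/post/abc-def/',
    'https://www.example/post/abc-def',
    'https://www.example/post/abc-def?p=294',
    'https://www.example/post/abc-def/?p=294',
    'http://www.example/abc-def/?p=294',
]

greedy = re.compile(r'.*/([^/?]+)')   # group takes as many chars as it can
lazy = re.compile(r'.*/([^/?]+?)')    # group takes as few chars as it can

greedy_results = [greedy.search(url).group(1) for url in urls]
lazy_results = [lazy.search(url).group(1) for url in urls]

print('greedy:', greedy_results)
print('lazy:  ', lazy_results)
```

Under these assumptions the greedy form returns the whole last segment, while the lazy form returns only its first character.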
