javascript - regex to parse string with escaped characters - Stack Overflow

I am reading information out of a formatted string. The format looks like this:

"foo:bar:beer:123::lol"

Everything between the ":" is data I want to extract with regex. If a : is followed by another : (like "::") the data for this has to be "" (an empty string).

Currently I am parsing it with this regex:

(.*?)(:|$)
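Here is a minimal sketch of what this regex returns in JavaScript, assuming every match is collected with `String.prototype.matchAll` (ES2020). `fieldsOf` is a hypothetical helper name. Note the extra empty string at the end. It comes from a zero-length match at the end of the input.

```javascript
// Hypothetical helper: collect capture group 1 of every match of the
// question's regex. matchAll requires the g flag and moves past
// zero-length matches on its own, so this cannot loop forever.
function fieldsOf(input) {
  return Array.from(input.matchAll(/(.*?)(:|$)/g), (m) => m[1]);
}

console.log(fieldsOf("foo:bar:beer:123::lol"));
// ["foo", "bar", "beer", "123", "", "lol", ""]
```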

Now it occurred to me that ":" may appear within the data as well, so it has to be escaped. Example:

"foo:bar:beer:\::1337"

How can I change my regular expression so that it matches the "\:" as data, too?
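This sketch shows why the current regex is not enough. It applies the regex to the escaped example, written as a JavaScript string literal, so the backslash is doubled. The lazy `(.*?)` stops at the first `:` it reaches, whether or not a backslash precedes it. As a result, the escaped colon is split apart:

```javascript
// "foo:bar:beer:\\::1337" is the literal for foo:bar:beer:\::1337
const input = "foo:bar:beer:\\::1337";
const fields = Array.from(input.matchAll(/(.*?)(:|$)/g), (m) => m[1]);
console.log(fields);
// ["foo", "bar", "beer", "\\", "", "1337", ""]
```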

Edit: I am using JavaScript as the programming language. It seems to have some limitations regarding complex regular expressions. The solution should work in JavaScript as well.

Thanks, McFarlane

Asked Apr 18, 2012 at 11:47 by McFarlane, edited Apr 18, 2012 at 12:05

3 Answers

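var subject = "foo:bar:beer:\\:::1337";
var myregexp = /((?:\\.|[^\\:])*)(?::|$)/g;
var result = [];
var match = myregexp.exec(subject);
while (match != null) {
    result.push(match[1]); // match[1] is the captured field
    // A zero-length match (only possible at the end of the string) does not
    // advance lastIndex, so step forward manually to avoid an infinite loop.
    if (match.index === myregexp.lastIndex) {
        myregexp.lastIndex++;
    }
    match = myregexp.exec(subject);
}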

Input: "foo:bar:beer:\\:::1337"

Output: ["foo", "bar", "beer", "\\:", "", "1337", ""]

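You'll always get an empty string as the last match. After the final field, the pattern can still match an empty field at the end of the string. This follows from the requirement that empty strings between delimiters must match as well. It also follows from the lack of lookbehind assertions in JavaScript at the time; they were added in ES2018.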
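If the extra trailing empty string is a problem, one possible workaround (a sketch, not part of the original answer) is to capture the delimiter instead of discarding it. The loop then stops as soon as a field was ended by end-of-string rather than by `:`. The sketch assumes the sticky flag `y` (ES2015), so each match has to start exactly where the previous one ended. `splitEscaped` is a hypothetical name.

```javascript
// Hypothetical helper: split on unescaped ":" without a spurious
// trailing empty field. Escaped characters are left as-is (e.g. "\\:").
function splitEscaped(input) {
  // Same field pattern as the answer, but the delimiter is captured
  // (group 2) and the sticky flag anchors each match at lastIndex.
  const field = /((?:\\.|[^\\:])*)(:|$)/y;
  const fields = [];
  for (;;) {
    const start = field.lastIndex;
    const m = field.exec(input);
    if (m === null) {
      // e.g. a dangling backslash at the very end of the input
      throw new SyntaxError("Cannot parse field starting at index " + start);
    }
    fields.push(m[1]);
    if (m[2] === "") break; // ended by end of string: this was the last field
  }
  return fields;
}

console.log(splitEscaped("foo:bar:beer:\\:::1337"));
// ["foo", "bar", "beer", "\\:", "", "1337"]
console.log(splitEscaped("a:b:"));
// ["a", "b", ""]
```

A trailing `:` still produces a final empty field, because that field really is present in the input.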

Explanation:

(          # Match and capture:
 (?:       # Either match...
  \\.      # an escaped character
 |         # or
  [^\\:]   # any character except backslash or colon
 )*        # zero or more times
)          # End of capturing group
(?::|$)    # Match (but don't capture) a colon or end-of-string
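With `String.prototype.matchAll` (ES2020), the manual `exec` loop is not needed, because `matchAll` steps past the final zero-length match by itself. The extracted fields still contain their escape sequences, so a second `replace` can turn each `\x` into `x`. Treating every backslash as escaping the next character is an assumption that mirrors the `\\.` alternative. `parseFields` is a hypothetical name.

```javascript
// Same pattern as the answer; matchAll requires the g flag.
const FIELD = /((?:\\.|[^\\:])*)(?::|$)/g;

// Hypothetical helper: extract every field and remove the escaping
// backslashes (assumes a backslash escapes whatever character follows).
function parseFields(subject) {
  return Array.from(subject.matchAll(FIELD), (m) => m[1].replace(/\\(.)/g, "$1"));
}

console.log(parseFields("foo:bar:beer:\\:::1337"));
// ["foo", "bar", "beer", ":", "", "1337", ""]
```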

Here's a solution:

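function tokenize(str) {
  // A token is a run of escaped pairs (\ followed by any character)
  // or of characters other than \ and :
  var reg = /((\\.|[^\\:])*)/g;
  var array = [];
  var match; // declared locally instead of leaking a global
  // <= so that a trailing ":" still yields a final empty token
  while (reg.lastIndex <= str.length) {
    match = reg.exec(str);
    // Unescape \\ and \: ; any other backslash is kept as-is
    array.push(match[0].replace(/\\(\\|:)/g, "$1"));
    // Skip the ":" separator (also moves past zero-length matches)
    reg.lastIndex++;
  }
  return array;
}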

It splits a string into tokens on the : character.

- You can escape the : character with \ if you want it to be part of a token.
- You can escape the \ with \ if you want it to be part of a token.
- Any other \ is not interpreted (i.e. \a remains \a).

So you can put any data in your tokens, provided that the data is correctly escaped beforehand.
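Escaping the data beforehand goes in the opposite direction to `tokenize`. Every `\` and `:` in each value gets a backslash in front of it, and then the values are joined with `:`. Here is a minimal sketch; `escapeToken` and `serialize` are hypothetical names:

```javascript
// Hypothetical helper: prefix "\" and ":" with a backslash so they are
// read back as data rather than as escape or separator characters.
function escapeToken(value) {
  return value.replace(/[\\:]/g, "\\$&"); // "$&" inserts the matched character
}

// Hypothetical helper: escape each value and join with the separator.
function serialize(values) {
  return values.map(escapeToken).join(":");
}

console.log(serialize(["a:b", "c\\d", "e"])); // prints a\:b:c\\d:e
```

Passing the printed string to `tokenize` gives back `["a:b", "c\\d", "e"]`.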

Here is an example with the string \a:b:\n::\\:\::x, which should give these tokens: \a, b, \n, <empty string>, \, :, x.

>>> tokenize("\\a:b:\\n::\\\\:\\::x");
["\\a", "b", "\\n", "", "\\", ":", "x"]

In an attempt to be clearer: the tokenizer interprets its input string, in which two characters are special, \ and :

- \ has a special meaning only when it is followed by \ or :. It then "escapes" that character: the character loses its special meaning for the tokenizer and is treated as a normal character (and thus becomes part of a token).
- : is the marker separating two tokens.
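The same rules can also be applied without a regex. The single-pass sketch below follows them directly: `\` is special only before `\` or `:`, and every unescaped `:` ends a token. The name `scanTokens` is hypothetical.

```javascript
// Hypothetical regex-free tokenizer following the rules described above.
function scanTokens(str) {
  const tokens = [];
  let current = "";
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const next = str[i + 1];
    if (ch === "\\" && (next === "\\" || next === ":")) {
      current += next; // the escaped character becomes ordinary data
      i++;             // skip over the escaped character
    } else if (ch === ":") {
      tokens.push(current); // an unescaped colon ends the current token
      current = "";
    } else {
      current += ch; // includes a backslash that escapes nothing, e.g. \a
    }
  }
  tokens.push(current); // the last token, possibly empty
  return tokens;
}

console.log(scanTokens("\\a:b:\\n::\\\\:\\::x"));
// ["\\a", "b", "\\n", "", "\\", ":", "x"]
```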

I realize the OP didn't ask for backslash escaping, but other readers may need a complete parsing library that allows any character in the data.

Use a negative lookbehind assertion.

(.*?)((?<!\\):|$)

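This matches : only if it is not preceded by \. Note that JavaScript has supported lookbehind assertions only since ES2018, so this pattern did not work in the browsers of the time.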
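In engines that support lookbehind (ES2018 and later), the same assertion also works directly with `String.prototype.split`, which does not produce a trailing empty string. This is a minimal sketch. Like the regex in this answer, it does not handle an escaped backslash right before a delimiter (`\\:`), which the question does not require:

```javascript
// Split on every ":" that is not immediately preceded by a backslash.
const UNESCAPED_COLON = /(?<!\\):/;

console.log("foo:bar:beer:\\::1337".split(UNESCAPED_COLON));
// ["foo", "bar", "beer", "\\:", "1337"]
console.log("foo:bar:beer:123::lol".split(UNESCAPED_COLON));
// ["foo", "bar", "beer", "123", "", "lol"]
```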
