I need a regex in JavaScript. I have a string:

'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'

I want to split this string by periods such that I get an array:

[
    '*window',
    'some1',
    'some\.2',   //ignore the . because it's escaped
    '(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])',  //ignore everything inside ()
    'some\.3',
    '(this.o.p ? ".mike." [ff\.])',
    'some5'
]

What regex will do this?

asked Nov 5, 2011 at 20:03 by user1031396; edited Feb 19, 2012 at 18:40 by outis
  • What about {foo.bar}, etc... – Mark Byers Commented Nov 5, 2011 at 20:07
  • What are you trying to do with this? It sounds like you want something more powerful than a regex... – hugomg Commented Nov 5, 2011 at 20:12
  • Perhaps stackoverflow.com/questions/812144/…? – Brad Koch Commented Nov 5, 2011 at 20:14
  • A split will always return a simple string or something in parenthesis. So I will never end up with {foo.bar} – user1031396 Commented Nov 5, 2011 at 20:31
  • 3 Friend, you are in need of a full-fledged parser... – Kenan Banks Commented Nov 5, 2011 at 20:45

5 Answers

var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); // match() with /g already returns an Array (or null); this copies it and turns null into []

Fiddle: http://jsfiddle.net/66Zfh/3/

Explanation of the RegExp. Match a consecutive set of characters, satisfying:

/             Start of RegExp literal
(?:            Create a group without reference (example: say, group A)
   \(          `(` character
   (?:         Create a group without reference (example: say, group B)
      (['"])     ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
      \)         `)` character
      \1         The character as matched at group 1, either `'` or `"`
     |          OR
      [^)]+?     Any non-`)` character, at least once (see below)
   )+          End of group (B). Let this group occur at least once
  |           OR
   \\\.        `\.` (escaped backslash and dot, because they're special chars)
  |           OR
   [^.]+?      Any non-`.` character, at least once (see below)
)+            End of group (A). Let this group occur at least once
/g           "End of RegExp, global flag"
        /*Summary: Match everything which is not satisfying the split-by-dot
                 condition as specified by the OP*/

There's a difference between + and +?. A single + is greedy: it matches as many characters as possible. A +? is lazy: it matches only as many characters as are needed for the RegExp to match. Example: against 123, \d+? matches 1, while \d+ matches 123.
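
The difference is easy to check in a console. This is only a minimal sketch; the variable names are illustrative and not part of the answer above.

```javascript
// Greedy: \d+ consumes as many digits as it can.
var greedy = '123'.match(/\d+/)[0];
// Lazy: \d+? stops as soon as the pattern is satisfied.
var lazy = '123'.match(/\d+?/)[0];

console.log(greedy); // "123"
console.log(lazy);   // "1"
```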

Because of the g (global) flag, String.match performs a global match: it returns an array containing every matched substring.

When the g flag is omitted, only the first match is returned. The array then consists of the following elements:

Index 0: <Whole match>
Index 1: <Group 1>
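
A small sketch of both behaviours, using a made-up input string and pattern (neither comes from the question):

```javascript
var text = 'a1.b2';

// Without g: only the first match, followed by its capture groups.
var first = text.match(/([a-z])(\d)/);
console.log(first[0], first[1], first[2]); // a1 a 1

// With g: every whole match; capture groups are not included.
var all = text.match(/([a-z])(\d)/g);
console.log(all); // [ 'a1', 'b2' ]
```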

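The regex below can be used to get the desired results:

result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);

Group 1 holds each piece without the trailing ., which you want to omit. Since match with the g flag returns only the whole matches (each still ending in its .), read group 1 with exec in a loop instead: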

var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var parts = [];
var match = myregexp.exec(subject);
while (match !== null) {
    // match[0] is the whole match (with the trailing "."),
    // match[1] is group 1: the piece without the trailing "."
    parts.push(match[1]);
    match = myregexp.exec(subject);
}
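
To try the loop on the question's input, something like the following should do. It is a sketch under two assumptions: the backslashes are doubled in the string literal so that the string really contains \. sequences, and the names subject, segmentPattern and parts are only illustrative.

```javascript
// Backslashes are doubled so the string really contains "\." sequences.
var subject = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var segmentPattern = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;

var parts = [];
var m;
while ((m = segmentPattern.exec(subject)) !== null) {
    parts.push(m[1]); // group 1 excludes the trailing "."
}

console.log(parts.length); // 7
console.log(parts[3]);     // (a.b + ")" ? cc\.c : d.n [a.b, cc\.c])
```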

Explanation:

// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
// 
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
//    Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
//          Match the character “(” literally «\(»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match a single character NOT present in the list “'"” «[^'"]»
//          Match the character “)” literally «\)»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match any character that is NOT a “A \ character” «[^\\]»
//    Match the regular expression below «(?:\.|$)»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
//          Match the character “.” literally «\.»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
//          Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

It is notoriously difficult to use a regex for balanced-parenthesis matching, especially in JavaScript.

You would be much better off writing your own parser. Here's a clever way to do it that still uses the strengths of regexes:

  • Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
  • Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
  • Add the matching text to a buffer.
  • If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.

This solution takes a bit of work and requires knowledge of closures, and you should probably read the documentation for string.replace, but I think it is a great way to solve your problem!
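
The steps above could be sketched roughly as follows. This is not the answerer's code: the function name splitOutsideBrackets is hypothetical, and the pattern is a variant of the one listed above with an extra alternative for quotes, because the sample input contains a ) inside a quoted string. Unbalanced input is not reported as an error here.

```javascript
// Hypothetical helper: splits on "." only outside brackets and quotes.
function splitOutsideBrackets(input) {
  // Tokens of interest: escaped char, quote, opening bracket, closing bracket, dot.
  var pattern = /(\\.)|(['"])|([(\[{])|([)\]}])|(\.)/g;
  var results = [];
  var buffer = '';
  var depth = 0;    // number of currently open brackets
  var quote = null; // the open quote character, or null outside a quoted string
  var last = 0;     // index just past the previous token

  input.replace(pattern, function (token, escaped, q, open, close, dot, offset) {
    buffer += input.slice(last, offset); // plain text between tokens
    last = offset + token.length;

    if (q) {
      if (quote === null) quote = q;        // a quoted string starts
      else if (quote === q) quote = null;   // the same quote closes it
    } else if (quote === null) {
      if (open) depth++;
      else if (close && depth > 0) depth--;
    }

    if (dot && depth === 0 && quote === null) {
      results.push(buffer); // a real split point: flush the buffer
      buffer = '';
    } else {
      buffer += token;      // everything else stays in the current piece
    }
    return token; // the replaced string itself is discarded
  });

  results.push(buffer + input.slice(last)); // trailing piece after the last token
  return results;
}

var sample = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pieces = splitOutsideBrackets(sample);
console.log(pieces.length); // 7
```

string.replace is used here only as a convenient way to visit every token with its offset; a loop over pattern.exec would work just as well.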

Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code that uses a regex to split a string.
This code has the following features:

  • Uses a regex pattern to find the splits
  • Only splits if the parentheses are balanced
  • Only splits if the quotes are balanced
  • Allows escaping of parentheses, quotes, and splits using \

This code will work perfectly for your example.

You do not need an elaborate regex for this.

var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';

console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));

output:

  ["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]

I was working on this too, and now I see that @FailedDev is hardly a failure, since that answer was pretty nice. :)

Anyhow, here's my solution. I'll post just the regex.

((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)

Sadly, this won't work in your case, since I'm using negative lookbehind, which I don't think is supported by JavaScript's regex engine. It should work as intended in other engines, however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
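
JavaScript gained lookbehind assertions in ES2018, so on a runtime that supports them the pattern can be tried directly. The sketch below assumes such a runtime (an assumption about the reader's environment, not about the one the question was asked in), and the names input, lookbehindPattern and found are illustrative.

```javascript
// Backslashes are doubled so the string really contains "\." sequences.
var input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';

// Same pattern as above; (?<!") requires lookbehind support (ES2018 or later).
var lookbehindPattern = /((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)/g;

var found = input.match(lookbehindPattern);
console.log(found.length); // 7
```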
