I need to split the string "thisIs12MyString" to an array looking like [ "this", "Is", "12", "My", "String" ]

I've got so far as to "thisIs12MyString".split(/(?=[A-Z0-9])/) but it splits on each digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]

So in words: I need to split the string on upper-case letters and on digits that do not have another digit in front of them.

Asked Feb 26, 2012 at 13:20 by Jesper Palm
  • @Jesper Oh, it's actually pretty tricky, because javascript does not handle lookbehinds. – Mikulas Dite Commented Feb 26, 2012 at 13:30
  • Is it important that you do this with a single regex? Why not break it down into two parts? It will probably increase readability. – davin Commented Feb 26, 2012 at 13:32
  • I don't think you can do this with a lookahead or even one expression along (that said, I would not consider myself as a regex expert). One idea could be to insert a special character before each sequence of digits and split by this character too. – Felix Kling Commented Feb 26, 2012 at 13:33
  • @MikulasDite Thanks for trying, I got the feeling this one wasn't totally easy. Probably easier to change my naming convention. – Jesper Palm Commented Feb 26, 2012 at 13:34
  • It's also quite unclear in one regex: why should 1 and 2 group together but not 1, 2 and M? What would happen with more than one upper-case letter next to another? Perhaps two splits are the way to go. – cmbuckley Commented Feb 26, 2012 at 13:36
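Since ES2018, JavaScript regular expressions do support lookbehind assertions (`(?<=...)` and `(?<!...)`), so the missing-lookbehind problem mentioned above no longer applies in current engines. A minimal sketch, assuming a runtime with lookbehind support (current Node.js or browsers); `splitCamelAndDigits` is a hypothetical helper name:

```javascript
// Requires ES2018 lookbehind support.
// Split before every upper-case letter, and before a digit only when the
// preceding character is not also a digit, so "12" stays in one piece.
const splitCamelAndDigits = (s) => s.split(/(?=[A-Z])|(?<!\d)(?=\d)/);

console.log(splitCamelAndDigits("thisIs12MyString"));
// [ 'this', 'Is', '12', 'My', 'String' ]
```

Like the question's original lookahead, this never splits between a digit and a following lower-case letter, so `"a12b"` gives `["a", "12b"]`.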

5 Answers

Are you looking for this?

"thisIs12MyString".match(/[A-Z]?[a-z]+|[0-9]+/g)

returns

["this", "Is", "12", "My", "String"]
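This pattern does not split the string so much as pick out the pieces it recognizes, which matters for cmbuckley's question about adjacent capitals: a capital not followed by a lower-case letter matches neither branch and is silently skipped. The sketch below shows that, plus a hypothetical `withAcronyms` variant (not from the answer) that also keeps runs of capitals:

```javascript
// The answer's pattern: optional capital + lower-case letters, or a digit run.
// Characters that fit neither branch are skipped, not returned.
const original = (s) => s.match(/[A-Z]?[a-z]+|[0-9]+/g);

// Hypothetical variant: first try a run of capitals that is NOT followed by a
// lower-case letter (e.g. an acronym), then fall back to the original branches.
const withAcronyms = (s) => s.match(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+/g);

console.log(original("thisIs12MyString"));        // [ 'this', 'Is', '12', 'My', 'String' ]
console.log(original("parseHTTPResponse12"));     // [ 'parse', 'Response', '12' ]
console.log(withAcronyms("parseHTTPResponse12")); // [ 'parse', 'HTTP', 'Response', '12' ]
```

Note that `match()` with the `g` flag returns `null`, not an empty array, when nothing matches, so `original("ABC")` is `null`.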

As I said in my comment, my approach would be to insert a special character before each sequence of digits first, as a marker:

"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)

where ~ could be any other character, preferably a non-printable one (e.g. a control character), as it is unlikely to appear "naturally" in a string.
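A minimal sketch of the same approach using a control character as the marker; U+0001 is an arbitrary choice here, and `splitWithMarker` is a hypothetical helper name:

```javascript
// Control character used as the marker instead of "~".
const MARK = "\u0001";

function splitWithMarker(s) {
  // Insert the marker in front of every run of digits ("$&" is the match)...
  const marked = s.replace(/\d+/g, MARK + "$&");
  // ...then split before each capital, or on the marker itself.
  return marked.split(/(?=[A-Z])|\u0001/);
}

console.log(splitWithMarker("thisIs12MyString")); // [ 'this', 'Is', '12', 'My', 'String' ]
```

If the input begins with a digit, the marker lands at position 0 and the first element is an empty string (`splitWithMarker("12abc")` gives `["", "12abc"]`); `.filter(Boolean)` removes it if needed.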

In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:

"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')

It might or might not perform better.
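Whether it is faster depends on the engine and the input, so it is easiest to measure. A rough timing sketch (not a rigorous benchmark; `withLookahead`, `markersOnly` and `time` are hypothetical names):

```javascript
// The two variants from this answer.
const withLookahead = (s) => s.replace(/\d+/g, "~$&").split(/(?=[A-Z])|~/);
const markersOnly = (s) => s.replace(/\d+|[A-Z]/g, "~$&").split("~");

// Run fn(input) many times and report the elapsed wall-clock time.
function time(label, fn, input, iterations) {
  const start = Date.now();
  let last;
  for (let i = 0; i < iterations; i++) last = fn(input);
  console.log(`${label}: ${Date.now() - start} ms`);
  return last;
}

const input = "thisIs12MyString";
const a = time("lookahead split", withLookahead, input, 100000);
const b = time("marker-only split", markersOnly, input, 100000);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true for this input
```

The two are not fully equivalent: for input starting with a capital, such as `"ThisIs"`, the marker-only version returns `["", "This", "Is"]` while the lookahead version returns `["This", "Is"]`.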

In my rhino console,

js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);
this,Is,12,My,String
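Rhino prints the array through its own `toString`, hence `this,Is,12,My,String`. A portable sketch of the same replace-then-split idea for Node.js or a browser, using a hypothetical `splitWithSpaces` helper. The original yields a leading empty string when the input starts with a capital (`" This Is".split(/ /)` is `["", "This", "Is"]`); the `filter` below drops it. This assumes the input contains no spaces of its own.

```javascript
function splitWithSpaces(s) {
  return s
    .replace(/[A-Z]|\d+/g, (x) => " " + x) // prefix each capital / digit run with a space
    .split(" ")
    .filter((part) => part !== ""); // drop empty pieces, e.g. before a leading capital
}

console.log(splitWithSpaces("thisIs12MyString")); // [ 'this', 'Is', '12', 'My', 'String' ]
```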

Another one:

js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return  a;});
this,Is,12,My,String
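This works because when the separator regex contains a capturing group, `split()` includes the captured text in the result. Here the "separators" are the capitalized words themselves, so they come back as elements, with empty strings where two separators touch or one ends the string. The outer `(?: )` and the `g` flag in the answer do not change the result, so they are dropped in this sketch:

```javascript
// Captured separators are kept in the output array.
const parts = "thisIs12MyString".split(/([A-Z]+[a-z]+)/);
console.log(parts);
// [ 'this', 'Is', '12', 'My', '', 'String', '' ]

// A truthiness filter removes the empty strings.
console.log(parts.filter((a) => a));
// [ 'this', 'Is', '12', 'My', 'String' ]
```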

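You can work around JavaScript's lack of lookbehinds by post-processing the array your current regex produces: whenever a single-digit piece follows another digit piece, append it to that piece instead of starting a new one.
Quick code:

function isSingleDigit(word) {
    return /^\d$/.test(word);
}

var result = [];
var digitsFlag = false; // true while the previous piece was a digit
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function(word) {

    if (isSingleDigit(word)) {
        if (!digitsFlag) {
            result.push(word); // first digit of a run
        } else {
            result[result.length - 1] += word; // extend the current digit run
        }
        digitsFlag = true;
    } else {
        result.push(word);
        digitsFlag = false;
    }

});

console.log(result); // [ 'this', 'Is', '12', 'My', 'String' ]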

I can't think of any way to achieve this with a regex.

I think you will need to do it in code.
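Doing it in code can be as simple as one pass over the characters. A minimal regex-free sketch, assuming "upper case" means A-Z only (as in the question's regex); `splitTokens` is a hypothetical name:

```javascript
// Start a new token at each upper-case letter, and at each digit whose
// previous character is not a digit; otherwise extend the current token.
function splitTokens(s) {
  const isUpper = (c) => c >= "A" && c <= "Z";
  const isDigit = (c) => c >= "0" && c <= "9";
  const tokens = [];
  let current = "";
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    const prev = i > 0 ? s[i - 1] : "";
    const startsToken = isUpper(c) || (isDigit(c) && !isDigit(prev));
    if (startsToken && current !== "") {
      tokens.push(current);
      current = "";
    }
    current += c;
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

console.log(splitTokens("thisIs12MyString")); // [ 'this', 'Is', '12', 'My', 'String' ]
```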

Please check this URL; it's the same question in a different language (Ruby):

The code is at the bottom: http://code.activestate./recipes/440698-split-string-on-capitalizeduppercase-char/

Tags: javascript, Regex split on upper case and first digit, Stack Overflow