admin管理员组

文章数量:1129424

I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]

I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]
Share Improve this question edited Feb 6, 2009 at 15:25 Adam Franco asked Feb 6, 2009 at 15:07 Adam FrancoAdam Franco 85.5k4 gold badges38 silver badges39 bronze badges 2
  • 10 it's a little odd that no one recommended using replace here. var data = {}; mystring.replace(/(?:&|&)?([^=]+)=([^&]+)/g, function(a,b,c,d) { data[c] = d; }); done. "matchAll" in JavaScript is "replace" with a replacement handler function instead of a string. – Mike 'Pomax' Kamermans Commented Feb 11, 2014 at 19:21
  • Note that for those still finding this question in 2020, the answer is "don't use regex, use URLSearchParams, which does all of this for you." – Mike 'Pomax' Kamermans Commented Jan 23, 2020 at 22:19
Add a comment  | 

15 Answers 15

Reset to default 171

Hoisted from the comments

2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, are necessary anymore.

– Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams


I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
  f: "q"
  geocode: ""
  hl: "de"
  ie: "UTF8"
  iwloc: "addr"
  ll: "50.116616,8.680573"
  q: "Frankfurt am Main"
  sll: "50.106047,8.679886"
  source: "s_q"
  spn: "0.35972,0.833588"
  sspn: "0.370369,0.833588"
  z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^&#]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group

You need to use the 'g' switch for a global search

var result = mystring.match(/(&|&)?([^=]+)=([^&]+)/g)

2020 edit

Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:

const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (pair of data) console.log(pair)

yields

Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]

So there is no reason to use regex for this anymore.

Original answer

If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:

var data = {};

var getKeyValue = function(fullPattern, group1, group2, group3) {
  data[group2] = group3;
};

mystring.replace(/(?:&|&)?([^=]+)=([^&]+)/g, getKeyValue);

done.

Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are individual capture groups) we simply take the groups 2 and 3 captures, and cache that pair.

So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.

For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate it's functionality here:

<script>

// Return all pattern matches with captured groups
RegExp.prototype.execAll = function(string) {
    var match = null;
    var matches = new Array();
    while (match = this.exec(string)) {
        var matchArray = [];
        for (i in match) {
            if (parseInt(i) == i) {
                matchArray.push(match[i]);
            }
        }
        matches.push(matchArray);
    }
    return matches;
}

// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);

// Output
[["abc123", "123"],
 ["def456", "456"],
 ["ghi890", "890"]]

</script>

Set the g modifier for a global match:

/…/g

Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

Finding successive matches

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). For example, assume you have this script:

var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}

This script displays the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 912

Note: Do not place the regular expression literal (or RegExp constructor) within the while condition or it will create an infinite loop if there is a match due to the lastIndex property being reset upon each iteration. Also be sure that the global flag is set or a loop will occur here also.

Hеllo from 2020. Let me bring String.prototype.matchAll() to your attention:

let regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
let str = '1111342=Adam%20Franco&348572=Bob%20Jones';

for (let match of str.matchAll(regexp)) {
    let [full, key, value] = match;
    console.log(key + ' => ' + value);
}

Outputs:

1111342 => Adam%20Franco
348572 => Bob%20Jones

If someone (like me) needs Tomalak's method with array support (ie. multiple select), here it is:

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    if( params[decode(match[1])] ) {
        if( typeof params[decode(match[1])] != 'object' ) {
            params[decode(match[1])] = new Array( params[decode(match[1])], decode(match[2]) );
        } else {
            params[decode(match[1])].push(decode(match[2]));
        }
    }
    else
        params[decode(match[1])] = decode(match[2]);
  }
  return params;
}
var urlParams = getUrlParams(location.search);

input ?my=1&my=2&my=things

result 1,2,things (earlier returned only: things)

Just to stick with the proposed question as indicated by the title, you can actually iterate over each match in a string using String.prototype.replace(). For example the following does just that to get an array of all words based on a regular expression:

function getWords(str) {
  var arr = [];
  str.replace(/\w+/g, function(m) {
    arr.push(m);
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");
// > ["Where", "in", "the", "world", "is", "Carmen", "Sandiego"]

If I wanted to get capture groups or even the index of each match I could do that too. The following shows how each match is returned with the entire match, the 1st capture group and the index:

function getWords(str) {
  var arr = [];
  str.replace(/\w+(?=(.*))/g, function(m, remaining, index) {
    arr.push({ match: m, remainder: remaining, index: index });
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");

After running the above, words will be as follows:

[
  {
    "match": "Where",
    "remainder": " in the world is Carmen Sandiego?",
    "index": 0
  },
  {
    "match": "in",
    "remainder": " the world is Carmen Sandiego?",
    "index": 6
  },
  {
    "match": "the",
    "remainder": " world is Carmen Sandiego?",
    "index": 9
  },
  {
    "match": "world",
    "remainder": " is Carmen Sandiego?",
    "index": 13
  },
  {
    "match": "is",
    "remainder": " Carmen Sandiego?",
    "index": 19
  },
  {
    "match": "Carmen",
    "remainder": " Sandiego?",
    "index": 22
  },
  {
    "match": "Sandiego",
    "remainder": "?",
    "index": 29
  }
]

In order to match multiple occurrences similar to what is available in PHP with preg_match_all you can use this type of thinking to make your own or use something like YourJS.matchAll(). YourJS more or less defines this function as follows:

function matchAll(str, rgx) {
  var arr, extras, matches = [];
  str.replace(rgx.global ? rgx : new RegExp(rgx.source, (rgx + '').replace(/[\s\S]+\//g , 'g')), function() {
    matches.push(arr = [].slice.call(arguments));
    extras = arr.splice(-2);
    arr.index = extras[0];
    arr.input = extras[1];
  });
  return matches[0] ? matches : null;
}

If you can get away with using map this is a four-line-solution:

var mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';

var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g) || [];
result = result.map(function(i) {
  return i.match(/(&|&amp;)?([^=]+)=([^&]+)/);
});

console.log(result);

Ain't pretty, ain't efficient, but at least it is compact. ;)

Use window.URL:

> s = 'http://www.example.com/index.html?1111342=Adam%20Franco&348572=Bob%20Jones'
> u = new URL(s)
> Array.from(u.searchParams.entries())
[["1111342", "Adam Franco"], ["348572", "Bob Jones"]]

To capture several parameters using the same name, I modified the while loop in Tomalak's method like this:

  while (match = re.exec(url)) {
    var pName = decode(match[1]);
    var pValue = decode(match[2]);
    params[pName] ? params[pName].push(pValue) : params[pName] = [pValue];
  }

input: ?firstname=george&lastname=bush&firstname=bill&lastname=clinton

returns: {firstname : ["george", "bill"], lastname : ["bush", "clinton"]}

Well... I had a similar problem... I want an incremental / step search with RegExp (eg: start search... do some processing... continue search until last match)

After lots of internet search... like always (this is turning an habit now) I end up in StackOverflow and found the answer...

Whats is not referred and matters to mention is "lastIndex" I now understand why the RegExp object implements the "lastIndex" property

Splitting it looks like the best option in to me:

'1111342=Adam%20Franco&348572=Bob%20Jones'.split('&').map(x => x.match(/(?:&|&amp;)?([^=]+)=([^&]+)/))

To avoid regex hell you could find your first match, chop off a chunk then attempt to find the next one on the substring. In C# this looks something like this, sorry I've not ported it over to JavaScript for you.

        long count = 0;
        var remainder = data;
        Match match = null;
        do
        {
            match = _rgx.Match(remainder);
            if (match.Success)
            {
                count++;
                remainder = remainder.Substring(match.Index + 1, remainder.Length - (match.Index+1));
            }
        } while (match.Success);
        return count;

本文标签: