I'm trying to get the base url from a string (So no window.location).

  • It needs to remove the trailing slash
  • It needs to be regex (No New URL)
  • It needs to work with query parameters and anchor links

In other words, all of the following should return https://apple.com, or https://www.apple.com for the last one.

  • https://apple.com?query=true&slash=false
  • https://apple.com#anchor=true&slash=false
  • http://www.apple.com/#anchor=true&slash=true&whatever=foo

These are just examples; URLs can have different subdomains, e.g. https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk. It could be any URL, like https://foo.bar

The closest I got is with:

const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash

But this doesn't work with anchor links and queries that start right after the host, without a / before them.

Any idea how I can get it to work in all cases?

asked Jan 9, 2019 at 20:44 by Costantin, edited Jan 12, 2019 at 15:01 by marc_s
  • You could add # and ? to your negated character class. Try ^https?:\/\/[^#?\/]+ demo – The fourth bird Commented Jan 9, 2019 at 20:51
  • Do you need to use regex for some reason? Or do you just want the protocol and hostname? – Matt Morgan Commented Jan 9, 2019 at 20:52

4 Answers

You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.

For your example data, you could match:

^https?:\/\/[^#?\/]+

const strings = [
    "https://apple.com?query=true&slash=false",
    "https://apple.com#anchor=true&slash=false",
    "http://www.apple.com/#anchor=true&slash=true&whatever=foo",
    "https://foo.bar/?q=true"
];

// Log the scheme and host of each URL, stopping at the first "/", "?" or "#".
strings.forEach(s => {
    console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
});
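
A minimal sketch of wrapping that pattern in a reusable function, in case the input might not be a URL at all (match returns null when nothing matches, so indexing [0] would throw). The name getBaseUrl is hypothetical, and widening https? to the (\w+:)? prefix from the question is an assumption, not part of the pattern above.

```javascript
// Hypothetical helper, not from the original answer.
// Returns the scheme + host of a URL string, or null if the string does not look like one.
function getBaseUrl(url) {
    // Optional scheme ("https:"), then "//", then host characters
    // up to (but not including) the first "/", "?" or "#".
    const match = /^(?:\w+:)?\/\/[^\/?#]+/.exec(url);
    return match ? match[0] : null;
}

console.log(getBaseUrl("https://shop.apple.co.uk/?query=foo")); // https://shop.apple.co.uk
console.log(getBaseUrl("https://foo.bar#anchor=true"));         // https://foo.bar
console.log(getBaseUrl("not a url"));                           // null
```

Because the match stops before the first "/", there is no trailing slash to strip afterwards.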

You could use the Web API's built-in URL class for this. URL also gives you other parsed properties that are easy to get at, like the query string params, the protocol, etc.

Regex is a painful way to do something that the browser otherwise makes very simple.

I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just care about getting the information out and aren't committed to using regex, maybe this answer will help.

let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"

let urlOne = new URL(one)
console.log(urlOne.origin)

let urlTwo = new URL(two)
console.log(urlTwo.origin)

let urlThree = new URL(three)
console.log(urlThree.origin)
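
One thing to watch for with this approach: new URL throws a TypeError when the string can't be parsed as an absolute URL. A small sketch of guarding against that follows; the name originOrNull is hypothetical, and returning null for bad input is just one possible choice.

```javascript
// Hypothetical wrapper, not from the original answer.
// URL is the standard WHATWG URL class, available in browsers and in Node.js.
function originOrNull(input) {
    try {
        // origin is scheme + host (+ port if non-default), with no trailing slash.
        return new URL(input).origin;
    } catch (err) {
        // new URL throws a TypeError for strings it cannot parse.
        return null;
    }
}

console.log(originOrNull("https://shop.apple.co.uk/?query=foo")); // https://shop.apple.co.uk
console.log(originOrNull("https://foo.bar#anchor=true"));         // https://foo.bar
console.log(originOrNull("not a url"));                           // null
```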

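    // Lazy quantifiers stop at the first "?", "/" or "#" after the host, so no trailing slash is kept.
    const baseUrl = url.replace(/(.*?:\/\/.*?)[?\/#].*/, '$1');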

This will get you everything up to the .com part. You will have to append .com once you pull out the first part of the URL.

^http.*?(?=\.com)

Or maybe you could do:

myUrl.replace(/(#|\?|\/#).*$/, "")

To remove everything after the host name.

Tags: Get base url from string with Regex and Javascript (Stack Overflow)