javascript regex to find image urls in string - Stack Overflow

I'm using a JavaScript regex to parse a database field for image URLs and format them for output. So far, I have been using

input = input.replace(/(https?:\/\/.*?\.(?:png|jpe?g|gif)(.*))(\w|$)/ig, "<br><img style='max-width:100%;overflow:hidden;' src='$1'>");

and it's been serving me well. All png, jpe?g and gif references get replaced by IMG tags, and the images show in the output stream as intended.

However, I've been thrown for a loop.

I've noticed that some URLs (notably those from the Facebook CDN, though I suppose others could be doing this as well) have a whole pile of "stuff" appended after the image type. If that stuff is not present, the files are not available and a missing-image icon gets produced. For example, this is a valid picture URL from fbcdn:

https://scontent-lga1-1.xx.fbcdn/hphotos-xtf1/v/t1.0-9/11147160_10156300867440377_5455334309678688318_n.jpg?oh=916e68ac2c908bbe15961825c373d6bc&oe=5606B6F4

Can someone suggest a change or improvement to the regex that would pick up the extra trailing characters? Or is another method of attack necessary?

(I personally like the global regex, as I can nail all of the instances in the stream at once. Having to manually parse the stream is not something I would look forward to.)

Update: I understand there is some ambiguity in the request - hopefully this will clarify.

I need to pull out any image URL, regardless of the "stuff" after the image extension. It could be the first item in the text string, or the last, or embedded somewhere in the middle.

The processing is done in JavaScript. I am currently using this as my validity test. All images within it are valid URLs pulled from Google image search.

http://well-being.esdc.gc.ca/misme-iowb/auto/diagramme-chart/stg2/c_4_21_6_1_eng.png?20150508104424447 This is arbitrary text https://scontent-lga1-1.xx.fbcdn/hphotos-xtf1/v/t1.0-9/11147160_10156300867440377_5455334309678688318_n.jpg?oh=916e68ac2c908bbe15961825c373d6bc&oe=5606B6F4 this is arbitrary text

http://lh6.ggpht./-1Rua79J-EDo/TwuyZkHwcmI/AAAAAAAADvA/ENfg1TeayvU/type_catalog_error_thumb%25255B1%25255D.jpg?imgmax=800 this is arbitrary text http://image.slidesharecdn./top5thingstodoafteranaccident-140826163850-phpapp02/95/top-five-things-to-do-after-any-type-of-accident-causing-injury-1-638.jpg?cb=1409089267

Hopefully this sheds sufficient light on the types of variations I may encounter. (The only one I know for sure is the FBCDN; I'm basing the others on what else I've seen out there, so a generalized solution is needed, not one specific to FBCDN.)

Thank you to all that offer suggestions...

Asked Jun 2, 2015 at 4:59 by Scott Brown, edited Jun 2, 2015 at 14:47

To catch the optional question mark and the rest, you would use (\?blabla)? but typing this almost sounds too easy. Is there a problem? – Mr Lister Commented Jun 2, 2015 at 5:15
@MrLister - yes, the problem was I was staring at it too long and getting nowhere with my testing on regexpal... all the variations I tried were either too greedy, or not greedy enough. The FB URLs have some consistency, but I'm not sure I should be limiting myself to this. I've also seen a few (example not available, sorry) where there is sizing info appended, and others that seem to append a timestamp (for a cache?). Who knows what evil concoctions others have put in place. – Scott Brown Commented Jun 2, 2015 at 12:15

1 Answer

Updated after OP updated with more example input.

There are three issues with your attempt: the boundaries of your matches, the use of ".*", and the missing pattern for a legal postfix.

The dot-star notation is a bad idea in regex, as the article "Death to Dot Star!" illustrates quite well. Use negated character classes instead. Here I chose "\S*?", where "\S" means "any character that is not whitespace". If you replace that with ".*?" on regex101, you can see it fail to match properly (it includes a link that is not an image).

Since all the URLs are in the same string, boundaries must be defined for each match. The URLs are delimited by whitespace, so a word boundary "\b" at each end does the trick nicely. This also removes the need for the "(.*)" and "(\w|$)" parts.
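To make the difference concrete, here is a small sketch comparing the two quantifiers. The sample text and the example.com URLs are made up for illustration and are not from the question.

```javascript
// Hypothetical sample: one non-image link followed by one image link.
const sample =
  "see http://example.com/page.html and http://example.com/pic.png here";

// Lazy dot-star: "." also matches spaces, so the match can run across words.
const dotStar = /https?:\/\/.*?\.(?:png|jpe?g|gif)/gi;

// Non-whitespace class plus word boundaries: a match cannot cross a space.
const nonSpace = /\bhttps?:\/\/\S*?\.(?:png|jpe?g|gif)\b/gi;

const dotStarMatches = sample.match(dotStar);
const nonSpaceMatches = sample.match(nonSpace);

console.log(dotStarMatches);
// one match that starts at the page.html link and swallows the text in between
console.log(nonSpaceMatches);
// only the .png link
```

The dot-star version starts at the first "http" and keeps extending until it finds an image extension, even though that extension belongs to a different URL.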

The last thing you missed was the legal endings of the URL, and there are two solutions to this: either define what you think is plausible, so that most scenarios are covered with no false positives, or accept anything and risk getting too many results.

Wrap it all together, and you are left with these two different approaches:

Solution 1 - define what is correct

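# written for free-spacing (x) mode, as on regex101; JavaScript has no x flag,
# so strip the comments and whitespace before using it there
\b(https?:\/\/\S*?\.(?:png|jpe?g|gif)
  # allowed postfixes to the filetype
  (?:\?(?:
    # alphanumeric key/value pairs
    (?:(?:[\w_-]+=[\w_-]+)(?:&[\w_-]+=[\w_-]+)*)|
    # alphanumeric postfix
    (?:[\w_-]+)
  ))?
)\b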
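One way to use this pattern from JavaScript is to assemble it from commented string pieces. This is only a sketch: the names pair, query and imageUrl and the example.com URLs are made up for illustration, and [\w-] is used in place of [\w_-] because \w already covers the underscore.

```javascript
// Each piece is built with String.raw so the backslashes survive as written.
const pair = String.raw`[\w-]+=[\w-]+`; // one key=value pair
const query = String.raw`\?(?:${pair}(?:&${pair})*|[\w-]+)`; // ?k=v&k=v or ?token
const imageUrl = new RegExp(
  String.raw`\b(https?://\S*?\.(?:png|jpe?g|gif)(?:${query})?)\b`,
  "gi"
);

// Hypothetical input covering a timestamp suffix, key/value pairs,
// no suffix at all, and a link that is not an image.
const input =
  "first http://example.com/a/chart.png?20150508104424447 some text " +
  "https://example.com/b/photo_n.jpg?oh=916e68ac&oe=5606B6F4 more text " +
  "http://example.com/c/plain.gif and http://example.com/page.html";

// Same replacement string as in the question; $1 is the whole image URL.
const output = input.replace(
  imageUrl,
  "<br><img style='max-width:100%;overflow:hidden;' src='$1'>"
);
console.log(output);
```

The three image URLs are wrapped in IMG tags together with their query strings, while the page.html link is left as plain text.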

Solution 2 - use whitespace as the only factor

\b(https?:\/\/\S+(?:png|jpe?g|gif)\S*)\b
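A quick sketch of what "a chance of getting too many results" can look like in practice. The name looseImageUrl and the example.com URLs are made up for illustration.

```javascript
// Solution 2 as a JavaScript literal with the global and ignore-case flags.
const looseImageUrl = /\b(https?:\/\/\S+(?:png|jpe?g|gif)\S*)\b/gi;

// Hypothetical input: a real image URL with a cache-buster suffix, and a
// page URL that merely contains the letters "gif" in its path.
const text =
  "http://example.com/x/slide-638.jpg?cb=1409089267 text " +
  "http://example.com/gifts/page.html";

const found = text.match(looseImageUrl);
console.log(found);
// both URLs are reported, including the one that is not an image
```

Because the pattern only asks for "png", "jpg", "jpeg" or "gif" somewhere inside a run of non-whitespace characters, the second URL matches through the "gif" in "gifts". Whether that is acceptable depends on how clean the input is.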
