javascript - Node JS grab the first image in an html string - Stack Overflow

I'm trying to grab the first image in an HTML string like this one:

  <table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;"><tr><td width="80" align="center" valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNFfn6RXQ3v898sGY_-sFLGCJ4EV5Q&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52778551504048&amp;ei=zfK5U7D4JoLi1Ab0wIHwDw&amp;url=http://online.wsj.com/articles/obamas-letters-to-corinthian-1404684555"><img src="//t3.gstatic.com/images?q=tbn:ANd9GcQVyQsQJvKMgXHEX9riJuZKWav5U1nI-jdB-i1HwFYQ-7jGvGrbk9N_k0XEDMVH-HAbLxP1wrU" alt="" border="1" width="80" height="80" /><br /><font size="-2">Wall Street Journal</font></a></font></td><td valign="top" class="j"><font style="font-size:85%;font-family:arial,sans-serif"><br /><div style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div class="lh"><a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;ct2=us&amp;usg=AFQjCNFfn6RXQ3v898sGY_-sFLGCJ4EV5Q&amp;clid=c3a7d30bb8a4878e06b80cf16b898331&amp;cid=52778551504048&amp;ei=zfK5U7D4JoLi1Ab0wIHwDw&amp;url=http://online.wsj.com/articles/obamas-letters-to-corinthian-1404684555"><b><b>Obama&#39;s</b> Letters to Corinthian</b></a><br /><font size="-1"><b><font color="#6f6f6f">Wall Street Journal</font></b></font><br /><font size="-1">The <b>Obama</b> Administration has targeted for-profit colleges as if they are enemy combatants. And now it has succeeded in putting out of business Santa Ana-based Corinthian Colleges for a dilatory response to document requests. Does the White House plan&nbsp;...</font><br /><font size="-1" class="p"></font><br /><font class="p" size="-1"><a class="p" href="http://news.google.com/news/more?ncl=dPkBozywrsIXKoM&amp;authuser=0&amp;ned=us"><nobr><b>and more&nbsp;&raquo;</b></nobr></a></font></div></font></td></tr></table>

Here is the image tag:

<img src="//t3.gstatic.com/images?q=tbn:ANd9GcQVyQsQJvKMgXHEX9riJuZKWav5U1nI-jdB-i1HwFYQ-7jGvGrbk9N_k0XEDMVH-HAbLxP1wrU" alt="" border="1" width="80" height="80">

Every image has a URL of this form, //tx.gstatic.com, where x is a number that I think lies between 0 and 3.

This is what I tried, without success, and I don't understand why it fails:

      var re = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;
      var results = re.exec(HTMLSTRING);
      var img="";
      if(results!=null && results.length!=0) img = results[0];

asked Jul 7, 2014 at 1:14 by Usi Usi

Why what happen? Please explain what exactly is the problem. – Amadan Commented Jul 7, 2014 at 1:17
results[0] is empty; I think the regular expression is not valid. – Usi Usi Commented Jul 7, 2014 at 1:20
You're trying to match through to the closing />, but you're only allowing whitespace after the end of the src value. It would seem that you don't really need to go all the way to the />. – cookie monster Commented Jul 7, 2014 at 1:30
...and FYI, if results is not null, then its .length will not be 0, though it would seem that you'd want index [1] if you want the src value. Also, your approach is dependent on the order of attributes, the lower case of the tag name, and the use of double quotes instead of single. Just thought I'd point that out. – cookie monster Commented Jul 7, 2014 at 1:31
stackoverflow.com/questions/1732348/… – rgajrawala Commented Sep 27, 2014 at 17:58
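
The caveats raised in the comments (attribute order, tag case, quote style, and which index of the match to read) can be illustrated with a more tolerant pattern. This is only a sketch: `firstImgSrc` is a hypothetical helper name, not something from the thread, and a regular expression still cannot cope with every HTML edge case (for example, the text " src=" appearing inside another attribute's value).

```javascript
// Hypothetical helper (not from the thread): returns the src of the first
// <img> tag that has one, or null when no such tag is found.
function firstImgSrc(html) {
  // i flag: tag and attribute names match in any case.
  // No g flag: exec() keeps no lastIndex state between calls.
  // \s before "src" avoids matching attributes such as data-src.
  // The three alternatives accept double-quoted, single-quoted and
  // unquoted attribute values.
  var re = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;
  var m = re.exec(html);
  if (!m) return null;
  // m[0] is the whole match; the capture groups start at index 1.
  // Groups that did not take part in the match are undefined.
  if (m[1] !== undefined) return m[1];
  if (m[2] !== undefined) return m[2];
  return m[3];
}

console.log(firstImgSrc('<IMG alt="" SRC=\'a.png\' width=1>'));
```

When the result of exec() is not null, the whole match is always at index 0, so checking its length adds nothing. Because the pattern stops at the src value, it does not care what follows it inside the tag.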

2 Answers

The regular expression you provided is indeed not general enough to capture your <img> tag.

There are two options:

Make a better regular expression. This way lies madness. But in this case, it is sufficient to allow other attributes after src:
```
var re = /<img[^>]+src="?([^"\s]+)"?[^>]*\/>/g;
var results = re.exec(HTMLSTRING);
var img="";
if(results) img = results[1];
```
Note [^>]* replacing your \s*, and also note results[1] instead of results[0] if you want the source and not the tag itself.

Use a DOM parser to handle DOM. This is the easy path.

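    // jsdom.env() was the API of older jsdom releases; current releases
    // export the JSDOM class instead, and parsing is synchronous.
    var JSDOM = require("jsdom").JSDOM;
    var dom = new JSDOM(HTMLSTRING);
    var imgs = dom.window.document.getElementsByTagName('img');
    for (var i = 0; i < imgs.length; i++) {
      var src = imgs[i].getAttribute('src');
      // Skip <img> elements without a src, such as the 1x1 spacer in the sample
      if (src) console.log(src);
    }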
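
To see why the change in the first option matters, the two patterns can be run side by side. The sample string below is a shortened stand-in made up for illustration (the query value `tbn:abc` is not from the question), and the g flag is dropped because only the first match is wanted.

```javascript
// Shortened, hypothetical stand-in for the question's markup.
var html = '<a href="#"><img src="//t3.gstatic.com/images?q=tbn:abc" ' +
           'alt="" border="1" width="80" height="80" /></a>';

// Original pattern: after the src value only whitespace may precede "/>",
// so the alt, border, width and height attributes make the match fail.
var original = /<img[^>]+src="?([^"\s]+)"?\s*\/>/;

// Revised pattern: [^>]* lets any other attributes follow the src value.
var revised = /<img[^>]+src="?([^"\s]+)"?[^>]*\/>/;

var before = original.exec(html);
var after = revised.exec(html);

console.log(before);    // null: the original pattern finds nothing
console.log(after[0]);  // the whole <img ... /> tag
console.log(after[1]);  // the captured src value
```

Both patterns still require src to be preceded by at least one character after `<img` and the tag to end in `/>`, so a tag written as `<img src="..." alt="">` would not match either of them.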

You could use the jQuery NPM module and do this:

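    // In Node, require('jquery') needs a window to work with; this assumes
    // jQuery 3.x, with jsdom supplying the window.
    var JSDOM = require('jsdom').JSDOM;
    var jQuery = require('jquery')(new JSDOM('').window);

    try {
        // attr('src') returns the attribute of the first matched <img> as written
        var src = jQuery('YOUR_HTML_STRING').find('img').attr('src');
        console.log('Output:\nSrc: ' + src + '\nNum: ' + (src.match(/\/\/t[0-3]/)[0])[3]);
    } catch (e) {
        console.log('Could not find <img>!');
    }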
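
Reading the server number out of the src can also be done with a capture group rather than by indexing into the matched string. A minimal sketch, assuming the host really has the form tx.gstatic.com with a single digit x; `gstaticServerNumber` is a hypothetical name introduced here, not part of either answer.

```javascript
// Hypothetical helper: returns the digit x from a URL of the form
// "//tx.gstatic.com/..." (with or without an http/https scheme), or null
// when the URL does not have that shape.
function gstaticServerNumber(src) {
  var m = /^(?:https?:)?\/\/t(\d)\.gstatic\.com\//.exec(src);
  return m ? Number(m[1]) : null;
}

console.log(gstaticServerNumber('//t3.gstatic.com/images?q=tbn:abc'));
```

Returning null instead of throwing keeps the "no match" case explicit, so the caller does not need a try/catch around the lookup.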
