I want to get the HTML code of a website and then extract a certain element from that HTML.
I know there are things such as Ajax and jQuery that can fetch HTML. I am using Node and want the solution to be pure JavaScript. I also have no idea how to get a specific element out of the result.
I have done this in Python, but I need it in JavaScript. For simplicity, let's take the website https://example.com. This is the body of the website's HTML:
<body>
<div>
#Some Stuff
</div>
</body>
To make it easier, let's assume the <div> is <div class="test">.
Finally, I want to get the content of <div class="test">, like this:
<div class="test">
#Some Stuff
</div>
Thanks in advance.
asked Sep 22, 2019 at 18:01 by Weirdo914, edited Sep 22, 2019 at 18:06 by Laczkó Örs
Comment: Are you planning on attempting to fetch the HTML and parse it in-browser or on a Node.js server? – Jacob Penney, Sep 22, 2019 at 19:44
3 Answers
For Node.js there are two native fetching modules: http and https. If you're looking to scrape with a Node.js application, you should probably use https to get the page's HTML and then parse it with an HTML parser; I'd recommend cheerio. Here's an example:
// native Node.js module
const https = require('https')
// don't forget to `npm install cheerio` to get the parser!
const cheerio = require('cheerio')

// minimal promise-based GET helper for Node.js (does not follow redirects)
const fetchHtml = url => new Promise((resolve, reject) => {
  https.get(url, res => {
    const chunks = []
    // collect the raw Buffers and decode once at the end, so multi-byte
    // characters split across chunks are not corrupted
    res.on('data', chunk => chunks.push(chunk))
    res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
    res.on('error', reject)
  }).on('error', reject)
})

const scrapeHtml = url => fetchHtml(url).then(html => {
  // $ is now a loaded html parser with a similar interface to jQuery
  const $ = cheerio.load(html)
  // FOR EXAMPLE, to find a table with the id productData, you would do this:
  const productTable = $('table#productData')
  // a cheerio selection can be searched further with .find(),
  // so there is no need to load it into cheerio again:
  const productRows = productTable.find('tr')
  // the selection is array-like; cheerio's own .map() passes (index, element)
  // and .get() turns the result into a plain array of strings
  return productRows.map((i, row) => $(row).text().trim()).get()
})

scrapeHtml(/*URL TO SCRAPE HERE*/)
  .then(data => {
    // expect the data returned to be an array of text from each
    // row in the table from the html we loaded. Now we can do whatever
    // else you want with the scraped data.
    console.log('data: ', data)
  })
  .catch(err => console.log('err: ', err))
Happy scraping!
See the working example below. You can do more work inside the for loop if necessary.
In general, you can access the HTML content of most elements via .innerHTML.
var divs = document.getElementsByClassName("test");
for (var i = 0; i < divs.length; i++) {
  console.log(divs[i].innerHTML);
}
<div class="test">
#Some Stuff
</div>
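The question asks for the element including its own tags, which innerHTML leaves out; outerHTML includes them. A minimal sketch along those lines follows. The helper name outerHtmlByClass is made up here, and it takes the document as a parameter (in a browser you would pass the global document) so it can be tried with any DOM-like object.

```javascript
// Collect the full markup (tags included) of every element with the given class.
// `doc` is any object exposing getElementsByClassName, e.g. the browser's `document`.
function outerHtmlByClass(doc, className) {
  // getElementsByClassName returns an array-like HTMLCollection, so convert
  // it with Array.from, mapping each element to its outerHTML
  return Array.from(doc.getElementsByClassName(className), el => el.outerHTML)
}

// In a browser:
// outerHtmlByClass(document, 'test').forEach(markup => console.log(markup))
```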
When you want to get the source code of a site on the same domain, you can use the approach from "How do I get source code from a webpage?" and combine it with my solution above for the second part of your problem.
In browser JavaScript, you cannot read the source code of a page that is not on the same domain as your own page, unless that server explicitly allows it via CORS. Please take a look at the same-origin policy: http://en.wikipedia.org/wiki/Same_origin_policy
If the domain is the same, you can get it with an iframe:
var url = ".../xxxx"; // same domain !
var iframe = document.createElement("iframe");
iframe.onload = function () {
  // read the inner HTML of the first element with class "test"
  var result = iframe.contentWindow.document.body.querySelector('.test').innerHTML;
  alert(result);
};
iframe.src = url;
iframe.style.display = "none";
document.body.appendChild(iframe);
The result variable accesses the iframe's document, selects the element with the class "test", and reads its inner HTML. In your case, the result will be:
#Some Stuff
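As an alternative to the hidden iframe, a same-origin page can also be fetched and parsed without adding anything to the document. The sketch below assumes a browser environment with fetch and DOMParser; innerHtmlFromPage is a hypothetical helper name, and the same-origin restriction applies to it just the same.

```javascript
// Fetch a page and return the innerHTML of the first element matching
// `selector`, or null if nothing matches. Browser-only sketch.
async function innerHtmlFromPage(url, selector) {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  const html = await response.text()
  // Parse the markup into a detached document; scripts in it are not executed
  const doc = new DOMParser().parseFromString(html, 'text/html')
  const element = doc.querySelector(selector)
  return element ? element.innerHTML : null
}

// Usage (same domain!), with `url` pointing at the page to read:
// innerHtmlFromPage(url, '.test').then(result => alert(result))
```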
Source: javascript - Get html source code from a website and then get an element from the html file - Stack Overflow