I am taking an old hardcoded website of mine and trying to strip the data out of the HTML and drop it into a new JSON object.
Currently I am receiving a table of items (reduced here for simplicity) as one giant string; there are almost 1000 rows. There are no classes or attributes on any of the HTML.
let tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`
I am working towards the following array of objects:
[{
date: '01/01/1999',
name: 'Item 1',
cost: 55
},
{
date: '01/01/2000',
name: 'Item 2',
cost: 35
}]
Here is the code I currently have:
let newData = []
let stringArray = results.split('</tr>')
stringArray.map(item => {
let stripped = item.replace('/n', '')
stripped = stripped.replace('<tr>', '')
let items = stripped.split('<td>')
let newItem = {
data: items[0],
name: items[1],
cost: items[2]
}
return newData.push(newItem)
})
I am taking the giant string and splitting it at the end of every row (</tr>). This works, but the split removes the </tr> tag itself from each item and leaves me with an extra (essentially empty) string item at the end of my array.
Next, I am mapping over each string in my array and trying to strip out all line breaks as well as the <tr> tag, so that I end up with an array of table cells. Then, in theory, I can build my object (after I strip the <td> tags out).
However, as I am doing this, replace doesn't seem to be working. Is my thinking process correct on how I am moving forward, or should I look at regex patterns to target this better?
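For reference, a minimal sketch of the split-and-replace approach with its bugs fixed. Three things break the original: '/n' is a literal slash followed by the letter n (a newline is written '\n'); replace with a string pattern only replaces the first match (a regex with the g flag, or replaceAll, replaces every match); and split('<td>') leaves each '</td>' attached and produces a leading whitespace piece, so items[0] is not the date. The sketch also uses the key date instead of data and converts cost to a number to match the target object. It assumes bare tags with no attributes and no nested tables; parseRows is a hypothetical helper name.

```javascript
// Sketch: the question's split/replace approach with its bugs fixed.
// Assumes bare <tr>/<td> tags with no attributes and no nested tables.
// parseRows is a hypothetical helper name, not taken from the question.
function parseRows(html) {
  return html
    .split('</tr>')                              // one segment per row, plus whatever follows the last </tr>
    .map(row => row.replace(/<tr>/g, '').trim()) // g flag removes every <tr>; trim drops surrounding newlines
    .filter(row => row.length > 0)               // discard the blank segment after the last </tr>
    .map(row => {
      const cells = row
        .split('</td>')                          // the trimmed row ends in </td>, so the last piece is ''
        .slice(0, -1)                            // drop that trailing '' (genuinely empty cells are kept)
        .map(cell => cell.replace('<td>', '').trim());
      return { date: cells[0], name: cells[1], cost: Number(cells[2]) };
    });
}

const tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;

console.log(parseRows(tableString));
```

Dropping only the final split piece (rather than filtering out every empty string) keeps the columns aligned when a row contains an empty <td></td>.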
- You seem to have a structured HTML document; why not read it as an HTML element and parse your way through it? – Icepickle, Jan 2, 2020 at 16:11
- It would be much easier to stick the trs into a table; then you could use querySelector and querySelectorAll on the table to get the elements, rather than manually parsing the HTML. – Taplar, Jan 2, 2020 at 16:14
- You could try using two regexes. – aemonge, Jan 2, 2020 at 16:21
- It's generally ill-advised to build your own HTML parser. In this case, just iterate through your TRs and TDs and get the textContent. – Mike, Jan 3, 2020 at 0:27
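The two-regex suggestion can be sketched as follows: one global regex captures the inner HTML of each row, and a second captures the text of each cell inside that row. This assumes bare <tr> and <td> tags with no attributes and no nested tables; as the last comment warns, it is not an HTML parser, and markup such as <td class="x"> or a comment containing <td> would break it. parseWithRegex, ROW_RE and CELL_RE are hypothetical names.

```javascript
// Sketch of the "two regexes" idea: one regex finds rows, a second finds cells.
// Assumes bare <tr>/<td> tags (no attributes) and no nested tables.
const ROW_RE = /<tr>([\s\S]*?)<\/tr>/g;  // lazily capture each row's inner HTML
const CELL_RE = /<td>([\s\S]*?)<\/td>/g; // lazily capture each cell's inner HTML

function parseWithRegex(html) {
  // matchAll requires the g flag and yields one match array per row
  return [...html.matchAll(ROW_RE)].map(([, rowHtml]) => {
    const [date, name, cost] = [...rowHtml.matchAll(CELL_RE)].map(m => m[1].trim());
    return { date, name, cost: Number(cost) };
  });
}

const tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;

console.log(parseWithRegex(tableString));
```

The lazy quantifier *? stops each match at the nearest closing tag, and [\s\S] is used instead of . so that matches can span the newlines between cells.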
5 Answers
You could just stick the trs into a table and process the data out of the table element.
let tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;
const table = document.createElement('table');
table.innerHTML = tableString;
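console.log(
[...table.querySelectorAll('tr')].map(tr => {
// textContent reads the cell text without depending on layout (the table is never attached to the page)
return {
date: tr.children[0].textContent,
name: tr.children[1].textContent,
cost: Number(tr.children[2].textContent) // the target object stores cost as a number
};
})
);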
As others have suggested:
- Create a hidden table
- Populate it with the row data
- Return a mapped JSON array with fields
const tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;
console.log(tableRowsToJSON(tableString, ['date', 'name', 'cost']));
function tableRowsToJSON(tableRows, fields) {
let table = document.querySelector('.hidden-table');
populateTable(emptyTable(table), tableRows);
return Array.from(table.querySelectorAll('tbody tr')).map(tr => {
let tds = tr.querySelectorAll('td');
return fields.reduce((obj, field, index) => {
return Object.assign(obj, { [field] : tds[index].textContent });
}, {});
});
}
function populateTable(table, dataString) {
if (table.querySelector('tbody') == null) {
table.appendChild(document.createElement('tbody'));
}
table.querySelector('tbody').innerHTML = dataString;
return table;
}
function emptyTable(table) {
let tbody = table.querySelector('tbody');
if (tbody) {
while (tbody.hasChildNodes()) {
tbody.removeChild(tbody.lastChild);
}
}
return table;
}
.as-console-wrapper { top: 0; max-height: 100% !important; }
.hidden-table { display: none; }
<table class="hidden-table"></table>
As a plugin
You can call this instead:
let parser = new TableRowParser()
console.log(parser.parse(tableString, ['date', 'name', 'cost']))
const tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;
class TableRowParser {
constructor(config) {
this.options = Object.assign({}, TableRowParser.defaults, config)
if (document.querySelector('.' + this.options.selector) == null) {
let hiddenTable = document.createElement('table')
hiddenTable.classList.add(this.options.selector)
document.body.appendChild(hiddenTable)
}
this.tableRef = document.querySelector('.' + this.options.selector)
}
/* @public */
parse(dataString, fields) {
this.__emptyTable()
this.__populateTable(dataString)
return Array.from(this.tableRef.querySelectorAll('tbody tr')).map(tr => {
let tds = tr.querySelectorAll('td')
return fields.reduce((obj, field, index) => {
return Object.assign(obj, { [field] : tds[index].textContent })
}, {});
});
}
/* @private */
__populateTable(dataString) {
if (this.tableRef.querySelector('tbody') == null) {
this.tableRef.appendChild(document.createElement('tbody'))
}
this.tableRef.querySelector('tbody').innerHTML = dataString
}
/* @private */
__emptyTable() {
let tbody = this.tableRef.querySelector('tbody')
if (tbody) {
while (tbody.hasChildNodes()) {
tbody.removeChild(tbody.lastChild)
}
}
}
}
/* @static */
TableRowParser.defaults = {
selector : 'hidden-table'
}
let parser = new TableRowParser()
console.log(parser.parse(tableString, ['date', 'name', 'cost']))
.as-console-wrapper { top: 0; max-height: 100% !important; }
.hidden-table { display: none; }
Here's a while loop that uses substrings and indexOfs. It makes use of the often neglected second parameter for indexOf, which allows you to specify the minimum starting position for the search. It's probably better to just create the HTML table element and read the innerHTML of each td, but if this is easier for you, here you go:
let str = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;
var BEGIN = "<td>";
var END = "</td>";
var objs = [];
while (str.indexOf(BEGIN) > -1 && str.indexOf(END, str.indexOf(BEGIN)) > -1) {
var obj = {};
obj.date = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
// Cut the cell just read out of str; skip END.length characters so the whole "</td>" is removed
str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);
obj.name = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);
obj.cost = str.substring(str.indexOf(BEGIN) + BEGIN.length, str.indexOf(END, str.indexOf(BEGIN)));
str = str.substring(0, str.indexOf(BEGIN)) + str.substring(str.indexOf(END, str.indexOf(BEGIN)) + END.length);
objs.push(obj);
}
console.log(objs);
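The loop above rebuilds str after every cell, so each iteration copies the remaining string. A minimal sketch of the same indexOf idea that instead advances a search position with the second (fromIndex) argument, so the string is scanned once from left to right. It assumes bare <td> tags and exactly three cells per row in date, name, cost order, as in the question; parseWithIndexOf is a hypothetical helper name.

```javascript
// Sketch: one forward scan using indexOf(searchValue, fromIndex).
// Assumes bare <td>...</td> cells and exactly three cells per row (date, name, cost).
function parseWithIndexOf(html) {
  const BEGIN = '<td>';
  const END = '</td>';
  const cells = [];
  let start = html.indexOf(BEGIN);
  while (start !== -1) {
    const end = html.indexOf(END, start + BEGIN.length); // closing tag after this cell's opening tag
    if (end === -1) break;                                // unterminated cell: stop scanning
    cells.push(html.substring(start + BEGIN.length, end).trim());
    start = html.indexOf(BEGIN, end + END.length);        // next cell begins after this </td>
  }
  // Group the flat list of cell texts into objects of three fields each
  const rows = [];
  for (let i = 0; i + 2 < cells.length; i += 3) {
    rows.push({ date: cells[i], name: cells[i + 1], cost: Number(cells[i + 2]) });
  }
  return rows;
}

const tableString = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;

console.log(parseWithIndexOf(tableString));
```

Because the input string is never modified, the search position only moves forward, and a trailing incomplete row (fewer than three cells) is ignored rather than producing a partial object.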
I prefer to use the X-ray npm module for crawling data from HTML pages. For example:
const Xray = require('x-ray');
const x = Xray();
let html = `
<tr>
<td>01/01/1999</td>
<td>Item 1</td>
<td>55</td>
</tr>
<tr>
<td>01/01/2000</td>
<td>Item 2</td>
<td>35</td>
</tr>
`;
x(html, 'tr', [['td']])
.then(function(res) {
console.log(res); // prints first result
});
Which will give you:
[ [ '01/01/1999', 'Item 1', '55' ], [ '01/01/2000', 'Item 2', '35' ] ]
The next step is to iterate over this array of arrays and build the JSON you need from it, which shouldn't be a problem given the question.
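That last step can be sketched as a small mapping from each row of cell strings to an object keyed by field name. The field names and the numeric conversion of cost are assumptions taken from the target shape in the question; rowsToObjects is a hypothetical helper name, and the input rows are the X-ray output shown above.

```javascript
// Sketch: turn an array of cell-string rows into objects keyed by field name.
// Field names and the Number() conversion of cost follow the question's target shape.
function rowsToObjects(rows, fields) {
  return rows.map(cells =>
    // Pair each field name with the cell at the same index
    Object.fromEntries(fields.map((field, i) => [field, cells[i]]))
  );
}

const rows = [['01/01/1999', 'Item 1', '55'], ['01/01/2000', 'Item 2', '35']];
const items = rowsToObjects(rows, ['date', 'name', 'cost'])
  .map(item => ({ ...item, cost: Number(item.cost) })); // "55" -> 55

console.log(JSON.stringify(items, null, 2));
```

Keeping the field list as a parameter means the same helper works if the table gains or reorders columns; only the array of names changes.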
Alternatively, you could use the older table-to-json for converting table-oriented sites directly into pretty JSON.
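Read the HTML tags as XML: since this fragment is well-formed, you can wrap the rows in a single root element and parse it with DOMParser as application/xml.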
let tableString = ' <record> '+
' <tr> '+
'<td>01/01/1999</td>'+
'<td>Item 1</td>'+
'<td>55</td>'+
'</tr>'+
'<tr>'+
' <td>01/01/2000</td>'+
' <td>Item 2</td>'+
' <td>35</td>'+
'</tr>'+
' </record> ';
let source = ( new DOMParser() ).parseFromString( tableString, "application/xml" );
console.log(source);
let size = source.childNodes[0].childNodes.length;
for (let id =0; id< size;id++){
let tag = source.childNodes[0].childNodes[id];
if(tag.nodeName=='tr'){
// childNodes also includes whitespace text nodes, so select the <td> elements explicitly
let tds = tag.getElementsByTagName('td');
console.log(tds[0].textContent);
console.log(tds[1].textContent);
console.log(tds[2].textContent);
}
}
console.log(size);
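Copyright notice: The article "javascript - Convert string of HTML into JSON Object - Stack Overflow" was contributed by users, and the views expressed are solely the author's. To reproduce it, please contact the author and cite the source: http://www.betaflare.com/web/1744367619a2602859.html. This site only provides information storage space, does not own the content, and assumes no related legal liability. If any content on this site is found to involve plagiarism, infringement, or violations of law, it will be deleted immediately once verified.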