Confused by Node's filesystem parsing. Here's my code:
var fs = require('fs'),
    xml2js = require('xml2js');
var parser = new xml2js.Parser();
var stream = fs.createReadStream('xml/bigXML.xml');
stream.setEncoding('utf8');
stream.on('data', function (chunk) {
    parser.parseString(chunk, function (err, result) {
        console.dir(result);
        console.log('Done');
    });
});
stream.on('end', function () {
    // the file has been read completely, do something...
    console.log("IT'S OVER")
});
This causes... nothing to happen. There is no output from xml2js (the parser) at all. When I try to console.log(chunk), the chunks don't seem to be split in any meaningful way, other than perhaps by byte size. The output for one 'chunk' is:
<?xml version="1.0" encoding="UTF-8"?>
<merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="merchandiser.xsd">
<header><merchantId>1237</merchantId><merchantName>NORDSTROM.</merchantName><createdOn>12/13/2013 23:50:57</createdOn></header>
<product product_id="52863929">// product info</product>
<product product_id="26537849">// product info</product>
<product product_id="25535647">// product info</product>
This chunk has lots and lots of <product> entries from the XML inside of it. The chunk ends somewhere in the middle of a <product> entry, and the next chunk begins where this one left off.
The main question is: how do I get createReadStream to output chunks starting at <product and ending at </product>?
EDIT: To show the expected output, here's what the XML looks like from the beginning to the end of the first <product>:
<?xml version="1.0" encoding="UTF-8" ?>
<merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="merchandiser.xsd">
<header>
<merchantId>1237</merchantId>
<merchantName>NORDSTROM.</merchantName>
<createdOn>12/13/2013 23:50:57</createdOn>
</header>
<product product_id="52863929" name="Teva 'Psyclone' Print Sandal (Baby, Walker & Toddler) Camo/ Dark Olive 6 M" sku_number="52863929" manufacturer_name="Teva" part_number="1001701">
<category>
<primary>Toddler Unisex</primary>
<secondary>Shoes~~Sandals/Slides</secondary>
</category>
<URL>
<product>http://click.linksynergy./link?id=LUyP0GcLCGc&offerid=276223.52863929&type=15&murl=http%3A%2F%2Fshop.nordstrom.%2FS%2F3297406%3Fcm_cat%3Ddatafeed%26cm_pla%3Dshoes%3Asandals%252fslides%26cm_ite%3Dteva_%2527psyclone%2527_print_sandal_%2528baby%252c_walker_%2526_toddler%2529%3A503158_1%26cm_ven%3DLinkshare</product>
<productImage>http://content.nordstrom./imagegallery/store/product/large/0/_6880020.jpg</productImage>
<buy></buy>
</URL>
<description>
<short>Rugged construction and stylish good looks define a sporty sandal, with the added convenience and security of hook-and-loop closures across the toe and at the instep.Rugged construction and stylish good looks define a sporty sandal, with the added
convenience and security of h...</short>
<long>Rugged construction and stylish good looks define a sporty sandal, with the added convenience and security of hook-and-loop closures across the toe and at the instep.Rugged construction and stylish good looks define a sporty sandal, with the added
convenience and security of hook-and-loop closures across the toe and at the instep. Color(s): camo/ dark olive, daisy blue. Brand: Teva. Style Name: Teva 'Psyclone' Print Sandal (Baby, Walker & Toddler). Style Number: 503158_1.</long>
</description>
<discount currency="USD">
<amount></amount>
<type>amount</type>
</discount>
<price currency="USD">
<sale begin_date="" end_date="">24.95</sale>
<retail>24.95</retail>
</price>
<brand>Teva</brand>
<shipping>
<cost currency="USD">
<amount>0.00</amount>
<currency>USD</currency>
</cost>
<information></information>
<availability>Y</availability>
</shipping>
<keywords></keywords>
<upc>737872649135</upc>
<m1>503158_1.</m1>
<pixel>http://ad.linksynergy./fs-bin/show?id=LUyP0GcLCGc&bids=276223.52863929&type=15&subid=0</pixel>
<attributeClass class_id="60">
<Misc></Misc>
<Product_Type>Shoes</Product_Type>
<Size>6 M</Size>
<Material></Material>
<Color>CAMO/ DARK OLIVE</Color>
<Gender>Unisex</Gender>
<Style></Style>
<Age></Age>
</attributeClass>
</product>
Asked Dec 16, 2013 at 3:42 by JVG (edited Dec 16, 2013 at 4:34)
2 Answers
You have two possibilities to tackle your issue.
As stated by damphat, xml2js needs the full XML content before it can parse the data. But you have a file stream, which delivers the data chunk by chunk. The first solution is to collect this stream into one big Buffer, and then send it to xml2js. For this purpose, you can use the stream-to package (npm i stream-to), which converts the file stream into an array of buffers; we then concatenate those into one single buffer using Buffer.concat, like this:
var fs = require('fs')
var streamTo = require('stream-to')
var xml2js = require('xml2js')
var file = fs.createReadStream('input.xml')
streamTo.array(file, function (err, arr) {
    if (err) return console.log(err.message)
    // join all buffered chunks into a single Buffer holding the whole file
    var content = Buffer.concat(arr)
    var parser = new xml2js.Parser()
    parser.parseString(content, function (err, res) {
        if (err) return console.log(err.message)
        console.log(res.merchandiser.product)
    })
})
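If you would rather not add a dependency for the buffering step, the same idea can be sketched with core modules only. This is a minimal sketch and not a description of what stream-to does internally: readAll is a hypothetical helper name, and the small in-memory stream is an assumption that stands in for fs.createReadStream('input.xml').

```javascript
const { Readable } = require('stream')

// Hypothetical helper: collect every chunk of a readable stream and hand
// back a single Buffer once the stream has ended.
function readAll (stream, callback) {
  const chunks = []
  stream.on('data', function (chunk) {
    // Chunks arrive as strings if an encoding was set on the stream.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
  })
  stream.on('error', callback)
  stream.on('end', function () {
    callback(null, Buffer.concat(chunks))
  })
}

// Stand-in for a file stream: the document arrives in three arbitrary
// pieces, none of which is well-formed XML on its own.
const pieces = [
  '<merchandiser><product product_id="1">',
  'a</product><prod',
  'uct product_id="2">b</product></merchandiser>'
]
const input = Readable.from(pieces.map(function (p) { return Buffer.from(p) }))

readAll(input, function (err, content) {
  if (err) return console.log(err.message)
  // content holds the whole document and could now go to parser.parseString
  console.log(content.toString('utf8'))
})
```

The memory caveat is the same as with stream-to: the whole file ends up in one Buffer.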
This works quite well, but since it needs to hold the full file in memory, it won't work if your input files are really big. To handle really big files, you need to use a streaming XML parser, such as sax. However, sax doesn't create JavaScript objects; it is an EventEmitter, and is a bit harder to use since you have to handle all the relevant events to build your objects on the fly.
You can use, for instance, the SaXPath library, which supports a small subset of the XPath syntax. This library emits a match event every time it matches the XPath pattern. Here's an example:
var saxpath = require('saxpath')
var fs = require('fs')
var sax = require('sax')
var saxParser = sax.createStream(true)
var streamer = new saxpath.SaXPath(saxParser, '/merchandiser/product')
streamer.on('match', function (xml) {
    // xml is the serialized markup of one matched <product> element
    console.log(xml);
});
fs.createReadStream('input.xml').pipe(saxParser)
You then have two options:
- Since you now have the XML for only one product at a time, you can use xml2js to parse a single product at a time.
- SaXPath supports multiple recorders: the default recorder listens to sax events and re-creates the corresponding XML (which is what allowed us to use the first option), but you can write your own recorder that listens to sax events and creates JavaScript objects on the fly.
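To make the "one product at a time" idea concrete without any dependency, here is a rough sketch that uses plain string scanning instead of sax or SaXPath. It is not a real XML parser: createProductSplitter is a hypothetical name, and the sketch assumes that no comment, CDATA section or attribute value contains a > or something that looks like a product tag, and that there is no self-closing <product/>. It counts nesting depth because the sample feed also has a <product> element inside <URL>.

```javascript
const { Readable, Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Hypothetical helper: buffers incoming text and emits one string per
// complete top-level <product ...>...</product> element.
function createProductSplitter () {
  const decoder = new StringDecoder('utf8') // copes with split multi-byte characters
  const tag = /<(\/?)product(?=[\s>])[^>]*>/g // <product ...> or </product>, not <productImage>
  let buffer = ''

  return new Transform({
    readableObjectMode: true, // each push is delivered as one separate item
    transform (chunk, encoding, done) {
      buffer += decoder.write(chunk)
      let depth = 0
      let start = -1
      let keepFrom = 0
      let match
      tag.lastIndex = 0
      while ((match = tag.exec(buffer)) !== null) {
        if (match[1] === '') {
          if (depth === 0) start = match.index // a top-level element opens here
          depth++
        } else if (depth > 0) {
          depth--
          if (depth === 0) {
            this.push(buffer.slice(start, tag.lastIndex))
            keepFrom = tag.lastIndex
            start = -1
          }
        }
      }
      if (start !== -1) {
        keepFrom = start // unfinished element: keep it for the next chunk
      } else {
        // Keep a trailing partial tag such as "<prod"; drop everything else.
        const lt = buffer.lastIndexOf('<')
        const partialTag = lt >= keepFrom && buffer.indexOf('>', lt) === -1
        keepFrom = partialTag ? lt : buffer.length
      }
      buffer = buffer.slice(keepFrom)
      done()
    }
  })
}

// Stand-in for fs.createReadStream: chunk boundaries fall in the middle of tags.
const pieces = [
  '<merchandiser><header>x</header><product product_id="1"><URL><prod',
  'uct>u1</product></URL></product><product product',
  '_id="2"><brand>Teva</brand></product></merchandiser>'
]
Readable.from(pieces).pipe(createProductSplitter()).on('data', function (xml) {
  console.log(xml) // one complete <product> element per event
})
```

Each emitted string could then be handed to xml2js as in the first option. For real feeds, a proper streaming parser such as sax remains the safer choice, since string scanning breaks on the cases listed above.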
xml2js needs the XML to be fully loaded.
In your case, use sax, which is a streaming parser:
// install
npm install sax
// this code prints every product_id
var fs = require('fs');
var sax = require('sax');
// non-strict mode (the default): sax upper-cases tag and attribute names
var saxStream = sax.createStream();
saxStream.onopentag = function (node) {
    if (node.name === 'PRODUCT') {
        console.log(node.attributes.PRODUCT_ID);
    }
};
fs.createReadStream('xml/bigXML.xml').pipe(saxStream);
Output:
52863929
26537849
25535647