This is valid XPath in JavaScript:
id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]
And this is the same query rewritten as valid PHP XPath for use with DOMXPath->query():
//*[@id="priceInfo"]//div[@class="standardProdPricingGroup"]//span[1]
- Do you know of any libraries or custom components that already do this transformation?
- Do you know of any documentation that lists the differences between the two syntaxes?
My main concern is that there could be many differences, and I am having trouble identifying them.
The question could also be put another way: since JavaScript accepts several valid XPath formats, how can they be normalized to work with PHP?
One of the updates also mentions that the id() function is valid XPath only if there is a valid DTD that contains this definition. I have no control over the input DTD, so a solution that works without any specific DTD would be awesome.
Update:
I want to transform the first format into the second with an algorithm. My input is the first one, not the second one, and I can't change this.
As @Nison Maël pointed out, the second format is also valid JavaScript XPath, as presented here: http://jsbin.com/elatum/2/edit. Unfortunately, this just adds to the problem of JavaScript XPath "fragmentation".
@salathe pointed out that the valid JavaScript XPath query works fine in PHP if the input document has a valid DTD (@Dimitre Novatchev mentioned this in a comment, but overlooked its importance). Unfortunately I don't have control of the input DTD, so now I have to investigate a way to overcome this, or find a solution that works even without a valid DTD.
Asked Aug 3, 2012 at 13:06 by Pentium10; edited Aug 6, 2012 at 22:43 by Kev.
Comments:
- This is a great question! It doesn't look like there's any documentation out there (at least not through a cursory Google search). I'm excited to see the answer to this one. – Matt, commented Aug 3, 2012 at 13:14
- The first expression is a legal XPath expression. However, for the XPath function id() to work, the XML must have a DTD, and element definitions in the DTD must have attributes that have the ID keyword. – Dimitre Novatchev, commented Aug 3, 2012 at 13:25
- @DimitreNovatchev: And what about the translation of // to /? – choroba, commented Aug 3, 2012 at 13:38
- @choroba Java is not mentioned once in this whole question. Also, id() is a node set function mentioned in the very spec you linked to. – toniedzwiedz, commented Aug 5, 2012 at 14:25
- I do not think that JavaScript has a much different XPath than PHP. I mean, the XPath language should be the same, right? Can you please add which JavaScript XPath you are referring to specifically? For PHP it's clear, there is only one. But wait, there is more than one, but you already wrote that you are referring to the standard DOMDocument extension, right? – hakre, commented Aug 5, 2012 at 17:10
3 Answers
Just seeing that salathe actually answered the same, but taking your comment into account and to stress this a bit more:
You do not need to specify any DTD. As long as you use the DOMDocument::loadHTML or DOMDocument::loadHTMLFile functions, the HTML id attribute is actually registered for the XPath id() function. With the demo HTML given in http://jsbin.com/elatum/2/edit, you even get an error when you load the document:
Warning: DOMDocument::loadHTMLFile(): ID priceInfo already defined in ...
Which is already a sign that this is a true ID attribute because it moans about duplicates. A related sample code looks like:
$xpath = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';
$doc = new DOMDocument();
$doc->loadHTMLFile(__DIR__ . '/../data/file-11796340.html');
$xp = new DOMXPath($doc);
$r = $xp->query($xpath);
echo $xpath, "\n";
echo $r ? $r->length : 0, ' elements found', "\n";
if (!$r) return;
foreach ($r as $node) {
    echo " - ", $node->nodeValue, "\n";
}
The output is:
id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]
1 elements found
- hello
In case you need more control, first run an XPath query to mark all HTML id attributes as ID for XPath:
$r = $xp->query("//*[@id]");
if ($r) foreach ($r as $node) {
    $node->setIdAttribute('id', true);
}
You can then use the same XPath with the id() function, no need to change it.
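To make the effect of that marking step visible on a document that has no DTD at all, here is a minimal sketch. The XML string is a made-up stand-in for the real input, and it is loaded with loadXML so that nothing registers id automatically. It runs the unchanged id() query before and after calling setIdAttribute and compares the match counts.

```php
<?php
// Hypothetical input: well-formed XML with no DTD, so "id" is just a plain attribute.
$xml = '<root><div id="priceInfo"><div class="standardProdPricingGroup">'
     . '<span>hello</span><span>world</span></div></div></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// The JavaScript-style query, used unchanged.
$query = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';

// Count matches before any attribute has been marked as an ID.
$before = $xp->query($query)->length;

// Mark every "id" attribute as an ID, as described above.
foreach ($xp->query('//*[@id]') as $node) {
    $node->setIdAttribute('id', true);
}

// Count matches again with the same query.
$after = $xp->query($query)->length;

echo "before: $before, after: $after\n";
```

The query string itself is never rewritten; only the document is prepared, which is why this approach does not depend on the shape of the incoming expressions.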
Can't you just translate id("...") to //*[@id="..."][1] at the start of your expression?
For instance, if you can assume you won't have any parentheses in the id(...) expressions:
$queryRewritten = preg_replace('/^id\(([^\)]+)\)/','//*[@id=$1][1]',$query);
EDIT: corrected the replacement; id() must be the first thing in the expression.
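A slightly more defensive version of this rewrite could look like the sketch below. The function name rewriteLeadingId is hypothetical, and the sketch assumes the argument is a single quoted literal at the very start of the expression; variables, nested calls, and whitespace-separated ID lists are not handled. It also wraps the predicate in parentheses, a variation on the replacement above, so that [1] applies to the whole node set in document order rather than per parent element.

```php
<?php
/**
 * Hypothetical helper: rewrite a leading id('...') or id("...") call into an
 * attribute predicate. Anything that does not start with such a call is
 * returned unchanged.
 */
function rewriteLeadingId(string $query): string
{
    // Capture the quoted literal, including its quotes, as group 1.
    $pattern = '/^\s*id\(\s*("[^"]*"|\'[^\']*\')\s*\)/';

    // Parenthesized so that [1] selects the first match in document order.
    $rewritten = preg_replace($pattern, '(//*[@id=$1])[1]', $query);

    // preg_replace returns null on a regex error; fall back to the input.
    return $rewritten ?? $query;
}

$in = 'id("priceInfo")/div[@class="standardProdPricingGroup"]/span[1]';
echo rewriteLeadingId($in), "\n";
```

Expressions that already use the //*[@id="..."] form pass through untouched, so the same function can be applied to both input styles.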
This isn't a full answer, but it's too big to put as a comment and it may help you a little.
If you have control over the input XML, then instead of using a DTD to declare id attributes, you can declare them explicitly in the XML document itself by prefixing id attributes with xml:.
For example, if you had XML of
<foo id="x27"/>
and changed it to
<foo xml:id="x27"/>
then the id() function would recognise that attribute as a formal XML ID type, not just as an attribute with the name id.
I know this "trick" works on the Saxon processor, but I must admit I've not tried it with PHP.
W3C xml:id
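A quick way to check this on a given PHP install is the sketch below. The document string is made up for illustration, and the result depends on the libxml2 build behind the DOM extension honouring xml:id, which is my assumption rather than something the answer confirms.

```php
<?php
// Hypothetical document: no DTD, but the attribute is spelled xml:id.
$doc = new DOMDocument();
$doc->loadXML('<root><foo xml:id="x27">hello</foo></root>');

// Run the id() lookup with no DTD and no setIdAttribute call.
$xp = new DOMXPath($doc);
$nodes = $xp->query('id("x27")');

echo $nodes->length, " element(s) found\n";
```

If the count is zero on your setup, xml:id is not being picked up and marking the attributes explicitly remains the safer route.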