I want to download tables from metal-archives.com, exactly from http://www.metal-archives.com/artist/rip, but there is one big problem: these tables are generated by JavaScript, and I don't know what to do in this case.
Is there a possibility to parse this site with R and the XML package?
Here's all the information in JSON format:
http://www.metal-archives.com/artist/ajax-rip
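For example, a quick way to see what that endpoint returns (a minimal sketch, assuming the response follows the usual jQuery DataTables server-side layout, with the table rows in an aaData field):

library(rjson)

# Fetch the first page of the R.I.P. table and inspect the response structure.
url  <- "http://www.metal-archives.com/artist/ajax-rip?iDisplayStart=0&sEcho=1"
raw  <- paste(readLines(url, warn = FALSE), collapse = "")
resp <- fromJSON(raw)

names(resp)          # response fields, e.g. "aaData" (assumed DataTables layout)
length(resp$aaData)  # number of rows returned in this page
resp$aaData[[1]]     # first row: artist, country, band, when, why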
Thanks to user bubmu I achieved what I wanted. Below is the code that solves my problem.
# Build URLs for the paginated JSON endpoint (200 rows per request).
a <- 1:8
b <- 200 * a
x <- paste("http://www.metal-archives.com/artist/ajax-rip?iDisplayStart=", b,
           "&sEcho=", a, sep = "")
x <- c(x, "http://www.metal-archives.com/artist/ajax-rip?iDisplayStart=1700&sEcho=9")

JSONparse <- function(x) {
  library(XML)
  # Parse the response as HTML and pull out the raw JSON text.
  doc <- htmlParse(x)
  str <- xpathApply(doc, '//p', xmlValue)[[1]][1]
  # Split the JSON string into individual rows on '[' and drop the header part.
  x1 <- strsplit(str, '\\[')
  x1 <- x1[[1]][-1]
  x1 <- x1[-1]
  # Split each row into its cells on '",'.
  x2 <- strsplit(x1, '\\",')
  # Strip whitespace, quotes and closing brackets,
  # then turn each row into a one-row data frame.
  x3 <- lapply(x2, function(y) {
    y <- gsub('\\t', '', y)
    y <- gsub('\\n', '', y)
    y <- gsub('\\r', '', y)
    y <- gsub('\\\"', '', y)
    y <- gsub('\\]}', '', y)
    y <- gsub('\\],', '', y)
    y <- as.data.frame(t(y))
    y
  })
  allinall <- do.call('rbind', x3)
  colnames(allinall) <- c("Artist", "Country", "Band", "When", "Why")
  allinall
}

metallum <- lapply(x, JSONparse)
metallum <- do.call('rbind', metallum)
But it works only for this site. Of course, a real JSON parser such as the RJSONIO or rjson package would be a better choice.
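For instance, here is a minimal sketch of the same scrape with the rjson package (the helper name rip_page is just illustrative, and it assumes the endpoint keeps the DataTables aaData layout and the 200-row pages used above):

library(rjson)

# Fetch one 200-row page of the R.I.P. table and return it as a data frame.
rip_page <- function(start, echo) {
  url  <- paste("http://www.metal-archives.com/artist/ajax-rip?iDisplayStart=",
                start, "&sEcho=", echo, sep = "")
  resp <- fromJSON(paste(readLines(url, warn = FALSE), collapse = ""))
  # Each element of aaData (assumed field name) is one table row of five cells.
  rows <- lapply(resp$aaData,
                 function(row) as.data.frame(t(unlist(row)), stringsAsFactors = FALSE))
  do.call("rbind", rows)
}

# Walk through the pages (0, 200, ..., 1600 here) and bind them together.
pages    <- lapply(0:8, function(i) rip_page(i * 200, i + 1))
metallum <- do.call("rbind", pages)
colnames(metallum) <- c("Artist", "Country", "Band", "When", "Why")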