
I want to download tables from metal-archives., specifically from http://www.metal-archives./artist/rip, but there is one big problem: these tables are generated by JavaScript, and I don't know what to do in this case.

Is there a way to parse this site with R and the XML package?

asked Feb 12, 2012 at 15:23 by Maciej

2 Answers


Here is all the information in JSON format:

http://www.metal-archives./artist/ajax-rip

Thanks to user bubmu I achieved what I wanted. Below is the code that solves my problem.

library(XML)

# Build the paged ajax-rip URLs: each request returns 200 rows,
# starting at the offset given by iDisplayStart
a <- 1:8
b <- 200 * a
x <- paste("http://www.metal-archives./artist/ajax-rip?iDisplayStart=", b,
           "&sEcho=", a, sep = "")
x <- c(x, "http://www.metal-archives./artist/ajax-rip?iDisplayStart=1700&sEcho=9")

JSONparse <- function(x) {
  # Fetch the response and pull out the raw JSON text
  doc <- htmlParse(x)
  str <- xpathApply(doc, "//p", xmlValue)[[1]][1]

  # Split on "[" to isolate the rows, dropping the two header parts
  x1 <- strsplit(str, "\\[")[[1]][-(1:2)]

  # Split each row into fields and strip the remaining JSON punctuation
  x2 <- strsplit(x1, '\\",')
  x3 <- lapply(x2, function(y) {
    y <- gsub("\\t", "", y)
    y <- gsub("\\n", "", y)
    y <- gsub("\\r", "", y)
    y <- gsub('\\\"', "", y)
    y <- gsub("\\]}", "", y)
    y <- gsub("\\],", "", y)
    as.data.frame(t(y))
  })

  allinall <- do.call("rbind", x3)
  colnames(allinall) <- c("Artist", "Country", "Band", "When", "Why")
  allinall
}

# Parse every page and stack the results into one data frame
metallum <- lapply(x, JSONparse)
metallum <- do.call("rbind", metallum)

But it only works for this site. Of course, using the RJSONIO or rjson package would be better.
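As a sketch of that cleaner approach: the `iDisplayStart`/`sEcho` parameters suggest the endpoint returns DataTables-style server-side JSON, where the rows sit in an `aaData` array. Assuming that shape (the sample payload below is invented for illustration; the real text would come from reading the ajax-rip URL), `fromJSON()` replaces all of the manual string splitting:

```r
library(RJSONIO)  # rjson also provides a compatible fromJSON()

# Invented sample in the DataTables server-side format; in practice the
# string would be the body returned by the ajax-rip URL
sample_json <- '{"iTotalRecords": 2, "aaData": [
  ["Artist A", "Country A", "Band A", "1990", "Cancer"],
  ["Artist B", "Country B", "Band B", "2001", "Accident"]
]}'

parsed <- fromJSON(sample_json)

# Each element of aaData is one table row; bind them into a data frame
rip <- as.data.frame(do.call(rbind, parsed$aaData),
                     stringsAsFactors = FALSE)
colnames(rip) <- c("Artist", "Country", "Band", "When", "Why")
```

This sidesteps the fragile `gsub()` chain entirely, since the parser handles quoting, escapes, and nesting for you.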
