How to scrape multiple pages with an unchanging URL - Python 3
I recently started learning web scraping and have been practicing on various pages. Right now I am trying to scrape the following site - http://www.pizzahut..cn/StoreList

So far I've used Selenium to scrape the longitude and latitude, but my code only extracts the first page. I know the site executes JavaScript to load the other pages dynamically, but I've had a hard time finding the right solution. Is there a way to access the other 49 or so pages? When I click the next-page button the URL does not change, so I cannot simply iterate over a different URL each time.
Following is my code so far:
import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.pizzahut..cn/StoreList')
soup = BeautifulSoup(page.text, 'html.parser')

for row in soup.find_all('div', class_='re_RNew'):
    name = row.find('p', class_='re_NameNew').string
    info = row.find('input').get('value')
    location = info.split('|')
    location_data = location[0].split(',')
    latitude = location_data[0]   # first value is ~31, Shanghai's latitude
    longitude = location_data[1]  # second value is ~121, Shanghai's longitude
    print(latitude, longitude)
Thank you so much for helping out. Much appreciated
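For reference, the value-parsing step above can be exercised offline. The sample string below is a made-up stand-in for whatever the site's hidden input actually contains; it only mimics the "latitude,longitude|other fields" layout the code implies:

```python
def parse_location(value):
    """Split a 'lat,lng|...' value into (latitude, longitude) strings."""
    location = value.split('|')          # fields are pipe-separated
    location_data = location[0].split(',')  # first field is "lat,lng"
    return location_data[0], location_data[1]

sample = '31.085877,121.399176|some|other|fields'  # hypothetical value
lat, lng = parse_location(sample)
print(lat, lng)  # -> 31.085877 121.399176
```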
asked Feb 26, 2018 at 10:01 by DanLee; edited Feb 27, 2018 at 5:52

1 Answer
Steps to get the data:

Open the developer tools in your browser (for Google Chrome it's Ctrl+Shift+I). Go to the XHR tab, which is located inside the Network tab.

With that open, click on the next-page button. You'll see a new request file appear in the list.

Click on that file. In the General block, you'll see the two things that we need: the request URL and the request method (POST).

Scrolling down, in the Form Data tab, you can see the three variables that are sent: pageIndex, pageSize, and keyword. Changing the value of pageIndex is what gives all the pages required.

Now that we've got all the required data, we can send a POST request to the URL http://www.pizzahut..cn/StoreList/Index using the above data.
Code:

The following scrapes the first 2 pages; you can scrape any number of pages by changing the stop value of range().
import requests
from bs4 import BeautifulSoup

for page_no in range(1, 3):
    # the form data observed in the Form Data tab
    data = {
        'pageIndex': page_no,
        'pageSize': 10,
        'keyword': '输入餐厅地址或餐厅名称'
    }
    page = requests.post('http://www.pizzahut..cn/StoreList/Index', data=data)
    soup = BeautifulSoup(page.text, 'html.parser')

    print('PAGE', page_no)
    for row in soup.find_all('div', class_='re_RNew'):
        name = row.find('p', class_='re_NameNew').string
        info = row.find('input').get('value')
        location = info.split('|')
        location_data = location[0].split(',')
        latitude = location_data[0]
        longitude = location_data[1]
        print(latitude, longitude)
Output:
PAGE 1
31.085877 121.399176
31.271117 121.587577
31.098122 121.413396
31.331458 121.440183
31.094581 121.503654
31.270737000 121.481178000
31.138214 121.386943
30.915685 121.482079
31.279029 121.529255
31.168283 121.283322
PAGE 2
31.388674 121.35918
31.231706 121.472644
31.094857 121.219961
31.228564 121.516609
31.235717 121.478692
31.288498 121.521882
31.155139 121.428885
31.235249 121.474639
30.728829 121.341429
31.260372 121.343066
Note: You can change the number of results per page by changing the value of pageSize (currently it's 10).
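If you want to persist the results instead of printing them (the question already imports csv), something along these lines would work. The helper name extract_coordinates is invented here, and the markup is assumed to match the re_RNew / re_NameNew structure above; the demo snippet is a hypothetical stand-in for a real results page:

```python
import csv
from bs4 import BeautifulSoup

def extract_coordinates(html):
    """Parse one results page into a list of (name, latitude, longitude).

    Assumes each store sits in a div.re_RNew containing a p.re_NameNew
    and an input whose value starts with 'lat,lng|'.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for row in soup.find_all('div', class_='re_RNew'):
        name = row.find('p', class_='re_NameNew').string
        lat, lng = row.find('input').get('value').split('|')[0].split(',')
        rows.append((name, lat, lng))
    return rows

# Offline demo on a hypothetical snippet mimicking the assumed markup:
sample_html = '''
<div class="re_RNew">
  <p class="re_NameNew">Sample Store</p>
  <input type="hidden" value="31.085877,121.399176|extra|fields">
</div>
'''
print(extract_coordinates(sample_html))

# To write all 50 pages to a CSV file (requires network access):
# with open('stores.csv', 'w', newline='', encoding='utf-8') as f:
#     writer = csv.writer(f)
#     writer.writerow(['name', 'latitude', 'longitude'])
#     for page_no in range(1, 51):
#         data = {'pageIndex': page_no, 'pageSize': 10,
#                 'keyword': '输入餐厅地址或餐厅名称'}
#         page = requests.post('http://www.pizzahut..cn/StoreList/Index', data=data)
#         writer.writerows(extract_coordinates(page.text))
```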