python - Scraping RotoWire Player Props and Returning Them as a DataFrame - Stack Overflow
I am trying to scrape https://www.rotowire.com/betting/nba/player-props.php and have the results returned in a dataframe. There are about a dozen tables on the page, and ideally I could loop through all of them while appending a new column denoting which market they correspond to. I have been unable to find the direct API link in the developer tools and then tried using requests/BeautifulSoup like this unsuccessfully, but was able to get the visible part of the first table using Selenium with the code below (it returns one column but I figured I could find a way to reshape it later).
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import os
from dotenv import load_dotenv

load_dotenv()

## Some code to log in

def get_nba_props():
    driver = webdriver.Chrome()
    driver.get('https://www.rotowire.com/betting/nba/player-props.php')
    data = []
    markets = driver.find_elements(By.CLASS_NAME, "prop-table")
    for market in markets:
        bodies = market.find_elements(By.CLASS_NAME, "webix_ss_body")
        for body in bodies:
            cells = body.find_elements(By.CLASS_NAME, "webix_cell")
            for cell in cells:
                cell_text = cell.text
                data.append(cell_text)
    df = pd.DataFrame(data, columns=['Value'])
    driver.quit()
    return df
Have tried a few variations but can't seem to get anything to work (both in terms of the other tables and in returning the non-visible rows). New to web scraping/Selenium so any help is greatly appreciated. I am a site subscriber so using the 'export CSV' button is also a possibility if that's an easier route.
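As for reshaping the one-column result later, a minimal sketch (assuming three cells per row and hypothetical column names; the real count and headers have to be read off the page):

```python
import pandas as pd

# flat list of cell texts scraped in row-major order (sample data)
data = ["LeBron James", "25.5", "-110", "Luka Doncic", "27.5", "-115"]

n_cols = 3  # assumed column count; inspect the table header to confirm
rows = [data[i:i + n_cols] for i in range(0, len(data), n_cols)]
df = pd.DataFrame(rows, columns=["Player", "Line", "Odds"])  # assumed headers
```

Chunking the flat list every `n_cols` items recovers the rows as long as the scrape really was row-major and no cells were skipped.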
asked Nov 22, 2024 at 23:16 by AMJ

Comments:
- The CSV option is much better than parsing the HTML. – LMC, Nov 22, 2024 at 23:25
- Thanks - any guidance on how I could implement that to create a dataframe of all the tables without having to manually open a bunch of locally downloaded CSV files? – AMJ, Nov 29, 2024 at 19:47
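On that follow-up: combining the downloaded CSV exports doesn't require opening them by hand. A sketch, assuming filenames like `nba-props-PTS.csv` (the actual naming scheme of the exports may differ):

```python
import glob
import pandas as pd

def combine_market_csvs(pattern="downloads/nba-props-*.csv"):
    """Concatenate per-market CSV exports into one DataFrame,
    tagging each row with the market parsed from the filename
    (assumes names like nba-props-PTS.csv)."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        market = path.rsplit("-", 1)[-1].removesuffix(".csv")
        frame = pd.read_csv(path)
        frame["Market"] = market
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

`pd.concat` with `ignore_index=True` stacks the per-market frames into one table, and the added `Market` column preserves which file each row came from.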
1 Answer
For the requests method: the data is there, but you must parse some raw JavaScript. As LMC said in the comments, getting the CSV is the better solution - but the JavaScript is easy to parse here, so why not? The data sits mostly alone on one big line and only needs some trimming.
assuming you already have the soup, this should do:
import json

data = {}
for prop_table in soup.select('.prop-table'):
    # each prop-table has a script tag in it containing the data
    # the data is alone on its line; isolate it
    raw_javascript = [
        line.strip()
        for line in prop_table.script.text.splitlines()
        if line.strip().startswith('data')
    ]
    # [0]: there's only one line starting with "data" per script
    # [6:-1]: remove the "data: " prefix and the trailing comma
    json_string = raw_javascript[0][6:-1]
    # prop_table['data-prop']: PTS, REB, AST...
    data[prop_table['data-prop']] = json.loads(json_string)
print(data)
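From there, getting to the single DataFrame the question asks for is one more step. A sketch with placeholder rows (the real field names depend on what the parsed JSON contains):

```python
import pandas as pd

# turn the parsed dict {market: [row dicts]} into one DataFrame
# with a "Market" column; sample data stands in for the real parse
data = {
    "PTS": [{"player": "LeBron James", "line": 25.5}],
    "REB": [{"player": "Nikola Jokic", "line": 13.5}],
}
frames = [pd.DataFrame(rows).assign(Market=market) for market, rows in data.items()]
df = pd.concat(frames, ignore_index=True)
```

`assign(Market=market)` stamps each per-market frame before concatenation, which is exactly the "new column denoting which market" the question describes.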