I am trying to scrape swim club/team data from the USA Swimming "Find a Team" page using Selenium. The page shows each club's or team's details in modals that open when you click a pin on the map.
Steps in My Approach
1. Locate Map Pins: Each pin on the map (<div class="maplibregl-marker">) represents a location.
2. Click a Pin: Clicking a pin opens a modal (<div class="popup-content-container">) listing the swim clubs or teams associated with that location.
3. Extract Data from Modal:
   - Club Name
   - Email
   - Phone
   - Website
   - Club Size
   - Address
4. Iterate Through Pins: Repeat the process for each pin.
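The field extraction in step 3 relies on string handling such as `.replace("mailto:", "")` and `.split(": ")[1]`, which raises or returns junk when a club omits a field. A minimal sketch of more defensive parsing, assuming (as the selectors in the script suggest) that contact details are `mailto:`/`tel:` links and club size appears as a "Club Size: N" line; the helper names are hypothetical:

```python
from typing import Optional
from urllib.parse import unquote


def strip_scheme(href: Optional[str], scheme: str) -> Optional[str]:
    """Return the part of a mailto:/tel: href after the scheme, or None."""
    if not href or not href.lower().startswith(scheme):
        return None
    # Drop any query string such as "?subject=..." on mailto links
    value = href[len(scheme):].split("?", 1)[0]
    return unquote(value).strip() or None


def parse_labeled_value(text: Optional[str], label: str) -> Optional[str]:
    """Extract the value from a 'Label: value' line, or None if absent."""
    if not text:
        return None
    prefix, sep, value = text.partition(":")
    if not sep or prefix.strip().lower() != label.lower():
        return None
    return value.strip() or None
```

In the loop, these would wrap `get_attribute("href")` and the `.text` of the "Club Size" item. Looking elements up with `find_elements` (which returns an empty list instead of raising) lets a missing field become `None` instead of discarding the whole club record.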
Challenges
Modal Detection Issue:
- After clicking a pin, the modal sometimes isn't detected.
- I use WebDriverWait with presence_of_element_located, but the script often fails with:
  Error locating modal or swim club links: Message:
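An empty `Message:` is what a `TimeoutException` raised by `WebDriverWait.until` typically prints, which suggests the wait condition never became true within the timeout. `presence_of_element_located` is satisfied as soon as a node exists in the DOM, which can be before the popup is visible or populated, or can match a popup left over from an earlier click. Since `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value, a custom condition can wait for a displayed popup that already contains links. A sketch, with a hypothetical class name and the assumption that club links are `a` elements inside `ul li` in the popup (as the script's `//ul/li/a` XPath implies):

```python
class PopupWithLinks:
    """Wait condition: return a displayed popup that already contains links.

    Pass an instance to WebDriverWait.until(); it returns False (so the wait
    keeps polling) until some element matching `locator` is displayed and
    has at least one descendant matching `link_locator`.
    """

    def __init__(self, locator, link_locator=("css selector", "ul li a")):
        # ("css selector", ...) is the string form of (By.CSS_SELECTOR, ...)
        self.locator = locator
        self.link_locator = link_locator

    def __call__(self, driver):
        for popup in driver.find_elements(*self.locator):
            if popup.is_displayed() and popup.find_elements(*self.link_locator):
                return popup
        return False
```

It would be used as `WebDriverWait(driver, 20, ignored_exceptions=[StaleElementReferenceException]).until(PopupWithLinks((By.CLASS_NAME, "popup-content-container")))`; ignoring stale-element errors lets the wait simply retry if the popup is re-rendered while being checked.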
Efficient Pin Iteration:
- The map spans the entire USA, and pins are scattered across the country.
- Panning or zooming manually doesn't scale, so I need a way to iterate through all pins efficiently.
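One way to avoid manual panning is to step the viewport over a grid of centers that covers the area and collect the markers at each stop. A sketch of the grid generation, assuming a rough bounding box for the contiguous US (about 24.5 to 49.5°N and 125 to 66.9°W; Alaska, Hawaii, and territories would need their own boxes) and a step in degrees small enough that neighbouring viewports overlap at the chosen zoom; the function name and values are illustrative:

```python
from typing import Iterator, Tuple


def grid_centers(
    south: float, west: float, north: float, east: float, step: float
) -> Iterator[Tuple[float, float]]:
    """Yield (lng, lat) viewport centers covering the box, row by row.

    Centers start half a step inside each edge, so a viewport spanning
    about `step` degrees around each center covers the whole box.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    lat = south + step / 2
    while lat - step / 2 < north:
        lng = west + step / 2
        while lng - step / 2 < east:
            yield (round(lng, 6), round(lat, 6))
            lng += step
        lat += step


# e.g. contiguous US in 2-degree steps (bounds are approximate)
centers = list(grid_centers(24.5, -125.0, 49.5, -66.9, 2.0))
```

If the page exposes its MapLibre map instance to JavaScript (not confirmed; its variable name would have to be found in dev tools), each center could be applied with MapLibre's `jumpTo`, e.g. `driver.execute_script("<mapVar>.jumpTo({center: arguments[0], zoom: arguments[1]})", [lng, lat], zoom)`, where `<mapVar>` is a placeholder, before re-querying the `maplibregl-marker` elements. Overlapping viewports will return some pins more than once.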
My Current Code
Here’s the script I am using:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

# Initialize Selenium WebDriver
options = webdriver.ChromeOptions()
# Comment this line to visually debug
# options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

# URL of the Swim Club Finder page
url = "..."  # Find a Team page URL

# Open the website
driver.get(url)

# Wait for the map to load
wait = WebDriverWait(driver, 20)

# Data storage
swim_club_data = []

try:
    # Locate all pins on the map
    print("Locating pins on the map...")
    pins = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "maplibregl-marker")))
    print(f"Found {len(pins)} pins on the map.")

    # Limit for testing
    for i, pin in enumerate(pins[:5]):  # Test with the first 5 pins
        print(f"Clicking pin {i+1}...")
        driver.execute_script("arguments[0].click();", pin)  # Use JavaScript click
        time.sleep(3)  # Allow modal to load

        # Locate the modal
        modal = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "popup-content-container")))
        print("Modal located.")

        # Extract links to swim clubs in the modal
        club_links = modal.find_elements(By.XPATH, "//ul/li/a")
        print(f"Found {len(club_links)} swim club links.")

        for club_link in club_links:
            # Click on each club link
            print(f"Clicking swim club link: {club_link.text}...")
            driver.execute_script("arguments[0].click();", club_link)
            time.sleep(3)  # Allow details to load

            # Extract swim club details
            try:
                details_modal = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "popup-content-container")))
                club_name = details_modal.find_element(By.CLASS_NAME, "popupTitle").text
                email = details_modal.find_element(By.CSS_SELECTOR, "a[href^='mailto:']").get_attribute("href").replace("mailto:", "")
                phone = details_modal.find_element(By.CSS_SELECTOR, "a[href^='tel:']").get_attribute("href").replace("tel:", "")
                website = details_modal.find_element(By.CSS_SELECTOR, "a[target='_blank']").get_attribute("href")
                club_size = details_modal.find_element(By.XPATH, "//li[contains(text(), 'Club Size')]").text.split(": ")[1]
                address = details_modal.find_element(By.XPATH, "//ul[@class='popupSubTitle']/following-sibling::text()").strip()

                swim_club_data.append({
                    "Name": club_name,
                    "Email": email,
                    "Phone": phone,
                    "Website": website,
                    "Club Size": club_size,
                    "Address": address
                })
                print(f"Extracted data for: {club_name}")
            except Exception as e:
                print(f"Error extracting club details: {e}")

            # Close the club details modal
            try:
                close_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "popup-close")))
                close_button.click()
                time.sleep(2)
            except Exception as e:
                print(f"Error closing club details modal: {e}")

        # Close the pin modal
        try:
            close_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "popup-close")))
            close_button.click()
            time.sleep(2)
        except Exception as e:
            print(f"Error closing pin modal: {e}")

except Exception as e:
    print(f"Error interacting with the map: {e}")

# Quit the browser
driver.quit()

# Print the results
print("Final Extracted Data:")
for club in swim_club_data:
    print(club)

# Save to CSV
if swim_club_data:
    df = pd.DataFrame(swim_club_data)
    df.to_csv("swim_clubs.csv", index=False)
    print("Data saved to 'swim_clubs.csv'")
else:
    print("No data to save.")
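Two details of the inner loop are likely to cause intermittent failures. First, `club_links` is collected once, but clicking a link changes the popup's content, so the remaining `WebElement` references can go stale. Second, an XPath that starts with `//` is absolute even when called on an element, so `modal.find_elements(By.XPATH, "//ul/li/a")` searches the whole document; `.//ul/li/a` stays inside the modal. A common workaround for the first issue is to loop by index and re-locate the list on every pass. A sketch of that pattern written against plain callables, so it does not depend on the page; `fetch` and `handle` are hypothetical hooks the caller would implement with Selenium (for example, reopening the pin popup and re-running a `.//ul/li/a` lookup in `fetch`):

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def for_each_refetched(
    fetch: Callable[[], Sequence[T]], handle: Callable[[T], R]
) -> List[R]:
    """Call handle() on every item, re-running fetch() before each step.

    Useful when acting on one element invalidates the others (e.g. a
    re-rendered popup), so only a fresh lookup by index is safe.
    """
    results: List[R] = []
    index = 0
    while True:
        items = fetch()  # fresh references on every pass
        if index >= len(items):
            break
        results.append(handle(items[index]))
        index += 1
    return results
```

The cost is one extra lookup per item, which is usually negligible next to the `time.sleep` calls already in the loop.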
Questions
1. Why does the modal fail to load or fail to be detected by Selenium? Are there additional steps or elements I should wait for after clicking a pin?
2. How can I efficiently iterate through all pins on the map? The pins are spread across the entire USA. Is there a way to pan and zoom the map programmatically so that every pin gets detected?
3. Is there an alternative way to access the data directly? Could it be fetched through an API or from a JSON structure embedded in the page?
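However the pins are visited, the same club can show up under more than one pin or viewport. A small sketch of de-duplicating the collected rows before writing the CSV, assuming that name plus address identifies a club (an assumption; email could serve where present):

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def dedupe_clubs(rows: Iterable[Row]) -> List[Row]:
    """Keep the first row for each (Name, Address), ignoring case and spacing."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        # Normalize whitespace and case so trivial differences don't count
        key = tuple(
            " ".join((row.get(field) or "").split()).lower()
            for field in ("Name", "Address")
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```

Without the normalization, pandas' `df.drop_duplicates(subset=["Name", "Address"])` does the same job on the final DataFrame.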
What I’ve Tried
- Adjusted wait times and timeouts.
- Used non-headless mode to observe interactions.
- Verified class names and elements with browser dev tools.
- Explored the Network tab in dev tools for potential API calls.
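Since the Network tab was already explored, candidate API calls can also be captured from inside Selenium through Chrome's performance log, which records DevTools `Network.*` events. A sketch, assuming Chrome with `options.set_capability("goog:loggingPrefs", {"performance": "ALL"})` set before the driver is created and entries read with `driver.get_log("performance")`; each entry's `message` is a JSON string wrapping a DevTools event. The hypothetical filter below lists response URLs whose MIME type looks like JSON; whether any such endpoint serves the club data is not known from the post:

```python
import json
from typing import Iterable, List, Mapping


def json_response_urls(entries: Iterable[Mapping[str, str]]) -> List[str]:
    """Return unique URLs of JSON responses found in performance-log entries."""
    urls: List[str] = []
    for entry in entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, ValueError):
            continue  # skip entries that are not DevTools events
        if not isinstance(event, dict) or event.get("method") != "Network.responseReceived":
            continue
        response = event.get("params", {}).get("response", {})
        url = response.get("url")
        # Matches application/json as well as GeoJSON types such as application/geo+json
        if url and "json" in response.get("mimeType", "") and url not in urls:
            urls.append(url)
    return urls
```

A matching URL could then be requested directly if it needs no session state, or its body retrieved through `driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": ...})` using the `requestId` from the same event.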
Any guidance or suggestions would be greatly appreciated!
Original title: python - How to scrape swim club/team data from a map with modals using Selenium efficiently? - Stack Overflow