I am trying to scrape swim club/team data from the USA Swimming "Find a Team" page using Selenium. The page displays swim club/team details in modals that appear when clicking pins on the map.

Steps in My Approach

1. Locate Map Pins: Each pin on the map (<div class="maplibregl-marker">) represents a location.
2. Click a Pin: Clicking a pin opens a modal (<div class="popup-content-container">) listing the swim clubs or teams associated with that location.
3. Extract Data from Modal:
    • Club Name
    • Email
    • Phone
    • Website
    • Club Size
    • Address
4. Iterate Through Pins: Repeat the process for each pin.
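
Not every club necessarily lists an email, phone number, or website, so the extraction step may be easier to keep robust if the parsing is separated from the browser calls. The following is only a sketch under assumptions: `parse_club_details` is a hypothetical helper, and it assumes the link `href` values and the `<li>` texts have already been read out of the modal. Address is left out because the markup that holds it is not known here.

```python
from typing import Dict, Iterable, Optional


def parse_club_details(
    title: str,
    hrefs: Iterable[str],
    list_items: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Build one club record from strings already read out of the modal.

    Hypothetical helper: `hrefs` are the href values of the modal's links and
    `list_items` are the texts of its <li> elements. Missing fields stay None
    instead of raising, so one incomplete club does not abort the run.
    """
    record: Dict[str, Optional[str]] = {
        "Name": title.strip() or None,
        "Email": None,
        "Phone": None,
        "Website": None,
        "Club Size": None,
    }
    for href in hrefs:
        if href.startswith("mailto:"):
            record["Email"] = href[len("mailto:"):]
        elif href.startswith("tel:"):
            record["Phone"] = href[len("tel:"):]
        elif href.startswith(("http://", "https://")) and record["Website"] is None:
            # Keep only the first external link as the website.
            record["Website"] = href
    for text in list_items:
        label, sep, value = text.partition(":")
        if sep and label.strip() == "Club Size":
            record["Club Size"] = value.strip()
    return record
```

With this split, a club that has no `mailto:` link simply yields `"Email": None` rather than an exception from a failed element lookup.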

Challenges

Modal Detection Issue:
    After clicking a pin, the modal sometimes isn’t detected.
    I use WebDriverWait with presence_of_element_located, but the script often fails with:

    Error locating modal or swim club links: Message:

Efficient Pin Iteration:
    The map spans the entire USA, and pins are scattered across the country.
    Panning or zooming manually isn’t scalable, and I need a way to iterate through all pins efficiently.
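
    One way to make the iteration systematic is to visit a fixed grid of map views and merge the clubs found in each, dropping duplicates from overlapping views. This is only a sketch under assumptions: the bounding box is a rough one for the contiguous United States, the 5-degree step is arbitrary, and both function names are hypothetical. How to move the map to each center depends on how the page exposes its map object, which is not known here.

```python
import math
from typing import Dict, Iterator, List, Tuple

# Approximate bounding box of the contiguous United States (an assumption;
# Alaska, Hawaii, and the territories would need their own boxes).
WEST, SOUTH, EAST, NORTH = -125.0, 24.0, -66.0, 50.0


def view_centers(step_deg: float = 5.0) -> Iterator[Tuple[float, float]]:
    """Yield (longitude, latitude) centers of grid cells covering the box."""
    cols = math.ceil((EAST - WEST) / step_deg)
    rows = math.ceil((NORTH - SOUTH) / step_deg)
    for row in range(rows):
        for col in range(cols):
            yield (WEST + (col + 0.5) * step_deg, SOUTH + (row + 0.5) * step_deg)


def add_new_clubs(seen: Dict[tuple, dict], clubs: List[dict]) -> int:
    """Store clubs not seen before, keyed by (Name, Address); return the count added."""
    added = 0
    for club in clubs:
        key = (club.get("Name"), club.get("Address"))
        if key not in seen:
            seen[key] = club
            added += 1
    return added
```

    Keying on name and address is a guess at what identifies a club uniquely. A stable ID from the page, if one exists, would be a better key.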

My Current Code

Here’s the script I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

# Initialize Selenium WebDriver
options = webdriver.ChromeOptions()
# Comment this line to visually debug
# options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

# URL of the Swim Club Finder page
url = ""  # set this to the page URL

# Open the website
driver.get(url)

# Wait for the map to load
wait = WebDriverWait(driver, 20)

# Data storage
swim_club_data = []

try:
    # Locate all pins on the map
    print("Locating pins on the map...")
    pins = wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "maplibregl-marker")))
    print(f"Found {len(pins)} pins on the map.")

    # Limit for testing
    for i, pin in enumerate(pins[:5]):  # Test with the first 5 pins
        print(f"Clicking pin {i+1}...")
        driver.execute_script("arguments[0].click();", pin)  # Use JavaScript click
        time.sleep(3)  # Allow modal to load

        # Locate the modal
        modal = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "popup-content-container")))
        print("Modal located.")

        # Extract links to swim clubs in the modal
        club_links = modal.find_elements(By.XPATH, "//ul/li/a")
        print(f"Found {len(club_links)} swim club links.")

        for club_link in club_links:
            # Click on each club link
            print(f"Clicking swim club link: {club_link.text}...")
            driver.execute_script("arguments[0].click();", club_link)
            time.sleep(3)  # Allow details to load

            # Extract swim club details
            try:
                details_modal = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "popup-content-container")))
                club_name = details_modal.find_element(By.CLASS_NAME, "popupTitle").text
                email = details_modal.find_element(By.CSS_SELECTOR, "a[href^='mailto:']").get_attribute("href").replace("mailto:", "")
                phone = details_modal.find_element(By.CSS_SELECTOR, "a[href^='tel:']").get_attribute("href").replace("tel:", "")
                website = details_modal.find_element(By.CSS_SELECTOR, "a[target='_blank']").get_attribute("href")
                club_size = details_modal.find_element(By.XPATH, "//li[contains(text(), 'Club Size')]").text.split(": ")[1]
                address = details_modal.find_element(By.XPATH, "//ul[@class='popupSubTitle']/following-sibling::text()").strip()

                swim_club_data.append({
                    "Name": club_name,
                    "Email": email,
                    "Phone": phone,
                    "Website": website,
                    "Club Size": club_size,
                    "Address": address
                })
                print(f"Extracted data for: {club_name}")

            except Exception as e:
                print(f"Error extracting club details: {e}")

            # Close the club details modal
            try:
                close_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "popup-close")))
                close_button.click()
                time.sleep(2)
            except Exception as e:
                print(f"Error closing club details modal: {e}")

        # Close the pin modal
        try:
            close_button = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "popup-close")))
            close_button.click()
            time.sleep(2)
        except Exception as e:
            print(f"Error closing pin modal: {e}")

except Exception as e:
    print(f"Error interacting with the map: {e}")

# Quit the browser
driver.quit()

# Print the results
print("Final Extracted Data:")
for club in swim_club_data:
    print(club)

# Save to CSV
if swim_club_data:
    df = pd.DataFrame(swim_club_data)
    df.to_csv("swim_clubs.csv", index=False)
    print("Data saved to 'swim_clubs.csv'")
else:
    print("No data to save.")

Questions

  • Why does the modal fail to load, or why does Selenium fail to detect it? Are there additional steps or elements I should wait for after clicking a pin?

  • How can I efficiently iterate through all pins across the map? The pins are spread over the entire USA. Is there a way to programmatically pan and zoom the map to detect all pins?

  • Is there an alternative way to access the data directly? Could the data be fetched via an API or a JSON structure embedded on the page?
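
On the third question: map pages often load their points as JSON, frequently GeoJSON. If the Network tab shows a response shaped like a FeatureCollection, the records could be read without clicking any pins. Whether this page works that way is not established here. The sketch below assumes such a payload has been saved as text; `clubs_from_geojson` is a hypothetical helper, and any property names in the payload would be whatever the site actually uses.

```python
import json
from typing import Any, Dict, List


def clubs_from_geojson(payload: str) -> List[Dict[str, Any]]:
    """Flatten a GeoJSON FeatureCollection into one dict per feature.

    Each row carries the feature's properties plus Longitude/Latitude
    when the geometry is a Point.
    """
    data = json.loads(payload)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    rows: List[Dict[str, Any]] = []
    for feature in data.get("features", []):
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON orders coordinates as [longitude, latitude].
            row["Longitude"], row["Latitude"] = geometry["coordinates"][:2]
        rows.append(row)
    return rows
```

The resulting list of dicts can be passed to `pd.DataFrame` in the same way as `swim_club_data` in the script above.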

What I’ve Tried

  • Adjusted wait times and timeouts.
  • Used non-headless mode to observe interactions.
  • Verified class names and elements with browser dev tools.
  • Explored the Network tab in dev tools for potential API calls.

Any guidance or suggestions would be greatly appreciated!
