I am trying to parse https://rateyourmusic/release/album/tyler-the-creator/igor/reviews/1/
I can access the divs that have class_='review_body' if I download the HTML files locally onto my system, but when I use Selenium, the divs aren't detected.
Based on my research online, one suggestion is to wait for the JavaScript to load, but I'm not sure that is really the problem.
This is a snippet of my code:
from bs4 import BeautifulSoup

# Assumes `driver` (a Selenium WebDriver) and `albums` (a list of review-page
# URLs ending in ".../reviews/") are defined earlier in the script.
for album_url in albums:
    print(f"Processing album: {album_url}")
    has_reviews = True  # Flag to check for reviews
    for page in range(1, 101):  # Assume a maximum of 100 pages
        try:
            url = f"{album_url}{page}/"
            print(f"Scraping page: {url}")
            driver.get(url)
            # Parse page source with BeautifulSoup
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            # Extract reviews
            header = soup.find_all('div', class_='review_header')
            print(header)
            reviews = soup.find_all('div', class_='review_body')
            print(reviews)
            if not reviews:
                print(f"No reviews found on page {page}. Moving to the next album.")
                has_reviews = False
                break
        except Exception as e:
            # Report the error and move on to the next page
            print(f"Error on page {page}: {e}")
I tried implementing code that waits for the JS to load and detects the review_body divs, but that didn't work.
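An explicit wait of the kind described above polls the page until the target elements appear or a timeout elapses. The sketch below shows that idea in plain Python so it runs without a browser.

- `wait_for_class` and `FakeDriver` are hypothetical names introduced only for this illustration.
- `FakeDriver` stands in for a Selenium `driver` and only exposes `page_source`.
- The class check is a crude substring test. It assumes the markup contains exactly `class="review_body"`.

```python
import time


def wait_for_class(driver, class_name, timeout=10.0, interval=0.5):
    """Poll driver.page_source until class_name appears or timeout expires.

    Returns True if the class was seen, False on timeout. This only mirrors
    the idea of an explicit wait; it is not Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Crude text check: assumes the markup contains class="<class_name>"
        if f'class="{class_name}"' in driver.page_source:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver whose reviews 'load' late."""

    def __init__(self, polls_before_ready):
        self._polls = 0
        self._ready_after = polls_before_ready

    @property
    def page_source(self):
        self._polls += 1
        if self._polls > self._ready_after:
            return '<div class="review_body">Great album</div>'
        return "<html><body>Loading...</body></html>"


if __name__ == "__main__":
    print(wait_for_class(FakeDriver(2), "review_body", timeout=2, interval=0.01))
    print(wait_for_class(FakeDriver(10**6), "review_body", timeout=0.05, interval=0.01))
```

With real Selenium, the usual tool is `WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "review_body")))`. The imports come from `selenium.webdriver.support.ui`, `selenium.webdriver.support.expected_conditions` and `selenium.webdriver.common.by`. If the site is blocking the automated browser, no amount of waiting will make the divs appear.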
asked Nov 21, 2024 at 7:04 by Nate
- Are you running Selenium in headless mode? If so, are you in an environment where you can run it with the UI to investigate? – atb00ker, Nov 21, 2024 at 7:06
- Those divs aren't being created dynamically with JavaScript, so that's not the issue. The issue is that the site is detecting your bot and blocking you. – chitown88, Nov 21, 2024 at 7:53
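One way to test the bot-detection diagnosis is to look at what Selenium actually received in place of the reviews. The sketch below takes `driver.page_source` as a string and reports three things: the page `<title>`, how many `review_body` divs it contains, and whether the page looks like a challenge page.

It rests on two assumptions:

- A blocked request gets a challenge page instead of the reviews.
- Cloudflare's interstitial is commonly titled "Just a moment...". Treat that string as an assumption and compare it against what you see with the browser UI visible.

`PageInspector` and `diagnose` are hypothetical names used only here.

```python
from html.parser import HTMLParser


class PageInspector(HTMLParser):
    """Collects the <title> text and counts divs carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.title = ""
        self.matches = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            # The class attribute may hold several space-separated classes
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.matches += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Assumed title of a Cloudflare challenge page; verify against your own browser.
CHALLENGE_TITLES = ("Just a moment...",)


def diagnose(page_source, target_class="review_body"):
    """Summarise a fetched page: its title, target-div count, challenge guess."""
    inspector = PageInspector(target_class)
    inspector.feed(page_source)
    title = inspector.title.strip()
    return {
        "title": title,
        "target_divs": inspector.matches,
        "looks_like_challenge": title in CHALLENGE_TITLES,
    }


if __name__ == "__main__":
    blocked = "<html><head><title>Just a moment...</title></head><body></body></html>"
    ok = '<html><head><title>Reviews</title></head><body><div class="review_body">x</div></body></html>'
    print(diagnose(blocked))
    print(diagnose(ok))
```

In the question's loop, calling `print(diagnose(driver.page_source))` right after `driver.get(url)` shows whether the reviews or a block page came back.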
1 Answer
Cloudflare protects your target web app, so you need to send a browser User-Agent header together with the Cookie value from your own browser session.
How to find your cookie:
Firefox --> visit the site --> Inspect Element --> Network tab --> reload --> click one of your target app's requests and copy the Cookie value from the request headers
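The value copied in the steps above is a single Cookie header string. Passing it in the request headers, as the sample code below does, works. If you would rather keep it in a `requests.Session` so it is sent on every page request, you can split it into name/value pairs first.

`parse_cookie_header` is a hypothetical helper written for this sketch. It only splits on `;` and the first `=`, and it does not handle quoted values.

```python
def parse_cookie_header(header_value):
    """Split a copied 'Cookie' header ("a=1; b=2") into a {name: value} dict."""
    cookies = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty fragments and malformed pairs
        name, value = part.split("=", 1)  # values may themselves contain '='
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    copied = "__cf_bm=abc.def-1; sec_ts=1732513201; cf_clearance=R2HH=x"
    print(parse_cookie_header(copied))
```

With requests installed, you can then load the result into a session: `session = requests.Session()`, then `session.cookies.update(parse_cookie_header(copied))`, then set `session.headers["User-Agent"]` to the same browser's User-Agent.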
Sample code:
import requests
from bs4 import BeautifulSoup

# Headers copied from a real browser session. The Cookie value (which includes
# Cloudflare cookies such as __cf_bm and cf_clearance) is session-specific and
# expires, so paste your own value copied from the browser's Network tab.
header = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:132.0) Gecko/20100101 Firefox/132.0",
    "Cookie": "<paste the Cookie header value from your browser here>",
}

url = 'https://rateyourmusic/release/album/tyler-the-creator/igor/reviews/1/'
resp = requests.get(url, headers=header).text
soup = BeautifulSoup(resp, 'lxml')

# Each review is wrapped in <div class="review">
content = soup.find_all('div', class_="review")
for i in content:
    # Not every review has a title, so fall back to None
    title_div = i.find('div', class_='review_title')
    title = title_div.text if title_div else None
    user = i.find('a', class_='user').text
    body = i.find('span', itemprop='description').text.strip()
    print(f"---------------------------\nreview_user: {user}\nreview_title: {title}\nreview_body: {body}")
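The sample above fetches a single page. The question's loop walks through pages until one comes back without reviews. The sketch below combines that stopping rule with the answer's approach.

- `fetch` is a hypothetical callable, for example `lambda url: requests.get(url, headers=header).text`. It is injected so the loop can be exercised offline.
- `scrape_album`, `count_reviews` and `ReviewCounter` are illustrative names, not part of any library.
- The "review" class name comes from the sample code.

```python
from html.parser import HTMLParser


class ReviewCounter(HTMLParser):
    """Counts <div> elements whose class list contains exactly 'review'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div" and "review" in (dict(attrs).get("class") or "").split():
            self.count += 1


def count_reviews(html):
    counter = ReviewCounter()
    counter.feed(html)
    return counter.count


def scrape_album(album_url, fetch, max_pages=100):
    """Fetch album_url + '<page>/' for page 1, 2, ... until a page has no reviews.

    Returns a list of (url, html) pairs for the pages that contained reviews.
    """
    pages = []
    for page in range(1, max_pages + 1):
        url = f"{album_url}{page}/"
        html = fetch(url)
        if count_reviews(html) == 0:
            break  # same stopping rule as the question's loop
        pages.append((url, html))
    return pages


if __name__ == "__main__":
    # Fake site: pages 1 and 2 have reviews, page 3 is empty.
    fake = {
        "https://example.test/reviews/1/": '<div class="review">a</div>',
        "https://example.test/reviews/2/": '<div class="review">b</div>',
    }
    result = scrape_album("https://example.test/reviews/", lambda u: fake.get(u, ""))
    print([url for url, _ in result])
```

Each returned HTML string can then go through the same BeautifulSoup extraction as the sample code.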
Output:
---------------------------
review_user: pinkacemusic
review_title: If perfection is possible, IGOR encapsulates it all around.
review_body: Whether you focus on the raw, dreamy production, the hard-hitting lyrics, the small-yet-grand features or any other aspect of IGOR With the story centralising around the title character and the developing breakup story, Tyler delivers his vision in many genres throughout the album, including hip-hop ("WHAT'S GOOD"), pop ("I THINK"), soul ("EARFQUAKE"), funk ("ARE WE STILL FRIENDS"), R&B ("GONE, GONE / THANK YOU") With undeniably great features from Lil Uzi Vert, Kanye West, CeeLo Green, Playboi Carti and others, from the compositional perfection of ever Favourite track – "BOYFRIEND"ital-version skit, IGOR solidifies itself as a ten out of ten album.
---------------------------
review_user: Emi64
review_title: None
review_body: This is the album that permanently put me onto super artistic music forever. If Tyler The Creator just dropped this in-dipped it would probably be considered among the likes of any other like true music nerd level albums and would probably be in the top 10 of this site in many others. My attachment to this album it difficult to describe what makes it a masterpiece but just know. It is absolutely a masterpiece that will be remembered as more than a classic for years to come
---------------------------
review_user: Qimeunchong
review_title: Tyler, The Creator <IGOR> 2019
review_body: Tyler, The Creator's <IGOR> is the best work of his musical career. Several experimental musical attempts give newness and are goo 100/100ack- EXACTLY WHAT YOU RUN FROM YOU END UP CHASINGamazing.
---------------------------
review_user: Corro837
review_title: None
review_body: Track 1 - IGOR’S THEME: Love the synths and overall production on this song and additional guest vocals from Lil Uzi Vert and Sol Track 2 - EARFQUAKE: Love Tyler's high pitched singing on this song and Playboi Carti delivers one of his best ever feature verse. Love everyt Track 3 - I THINK: Very similarly immaculate production to Igor's theme, love the piano in the outro, also love Tyler's performance on this so Track 6 - NEW MAGIC WAND: This song has the craziest production and best rapping off the album, also the darkest song on the album if you list Track 7 - A BOY IS A GUN: Best production off the album, best written track as well in my opinion. Love the sample on this song and the way it Track 8 - PUPPET: Love the way Tyler flows on this song, and I absolutely love Kanye's feature his voice just sounds so heavenly on top of the Track 9 - WHAT’S GOOD: Grooviest beat off the album during the first half and the second half is also an insane, as some of the best rapping o Track 10 - GONE, GONE / THANK YOU: This is best song off the album, it's such a beautiful song with very melancholic lyrics along with being t Track 11 - I DON’T LOVE YOU ANYMORE: Beutiful track once again with great meaning to the orverall plot of the album. Most underated track off Track 12 - ARE WE STILL FRIENDS? Another beautiful track, it's got the most grandiose production of the album along with the best chorus. The last 30 seconds or so of the song is the best moment off the album and it geniunly might be the best outro to any i've ever heard, overall it's Igor was one of the first albums I ever listened to and to this day is still one of my favorites, I love how basically every song is equally as important to the plot of the album and how it just all wraps up so well with "ARE WE STILL FRIENDS?" 
outro making the album loop able meaning that Tyler's character on the album is in a never ending loop of falling in love, realizing that it wouldn't work because he already has somebody else, wishing his partner would die, then realizing he wasn't really the one for him, falling out of love, paying his farewells, and ending it off with him trying to fix it by becoming best friends with him again, with the cycle starting over then, and this is just a very summarized version of the plot as it is much more complex than that and there parts that I didn't say. The album is straight up perfect there is not one11.ack L I DON’T LOVE YOU ANYMOREthe rest and the album is very well written overall.
---------------------------
review_user: bjm_b_
review_title: Perfection
review_body: I think Igor is not only the best album of Tyler’s career but also one of the best albums we’ve seen from any artist in recent mem Are We Still Friends? is probably my favorite album closer ever, perfectly concluding an experience that keeps you hooked from start to finish This is one of the best-performed and produced albums I’ve ever listened to. For me, it’s a 10 out of 10.
---------------------------
review_user: SellMeAGod
review_title: None
review_body: Tyler, The Creator is good as a rapper cos he has energy. He could never write a song or a hook, for sure. So why do you all settle for this nonsense song based Odd Future? He is horrible as a singer, average as a producer, and has no vision artistically whatsoever. Why would anyone listen to this over early Odd Future? I feel like I am going crazy whenever I talk to people about Tyler, and how radical and real he once felt. What if there was a R&B song about love....? Who cares.great that we still have that early stuff, tho.
---------------------------
review_user: BibjaTV__
review_title: Beautiful
review_body: There is not a single bad song on this album, even the unreleased songs are perfect, even though songs like running out of time might not be songs I reposted to often, songs like new magic wand and Gone Gone/ thank you, in my mind, are classics
---------------------------
review_user: z4m
review_title: Easily his best
review_body: Put simply, IGOR is a concept album executed flawlessly. It tells a relatively simple story of a character dealing with the emotional torment of unrequited love. The narrative begins with infatuation, evolves into jealousy, and ends in heartbreak and regret. Innumerable albums have been devoted to these themes, but Tyler approaches them in a way only he could. His delivery conjures an immature vulnerability, which is fitting when he pleads for a second chance or delivers an impassioned thank you to this person who clearly ruined his life. His pitched up vocals, which show up in nearly every track, add to the wistful melancholy he’s trying to convey. The production is nothing short of cinematic, as Tyler tells a story with his sonic arrangements that is just as powerful as the one he tells with his words. The blaring 808s on “New Magic Wand” match the aggression and recklessness of Tyler’s envy on the track, while the dramatic metamorphosis in the middle of “Gone Gone / Thank You” mirrors the evolution from coming to terms with the end of a romance to excruciating regret. Fetting for one second how emotionally poignant they are, the compositions on IGOR are just as impressive for their lush beauty. Once again, Tyler uses his mastery of synths to enhance his instrumentals, and those dreamy melodies are complemented by swelling strings and soft electric piano progressions. On some occasions, he makes his songs feel massive with those aforementioned blasting 808s, but on others, he opts for a more intimate feel with more relaxing offerings that are soaked in a soothing static. He doesn’t rap much on this record, but when he does, he mixes up his flows with ease while committing to the concept of the album in a way that he had not previously been able to do without indulging in his impulses to be self-aggrandizing or sensationalist. 
Though some tracks don’t hit as hard as others on IGOR, each one feels essential to the narrative and are made up for many times over by the album’s best moments.
Source: html parsing - Divs not being detected with BeautifulSoup - Stack Overflow