web scraping - VBA scrape URL with nested DIV's - Stack Overflow

Updated: 2025-03-130

I've been following this tutorial to scrape data from a URL, as it matches my need pretty closely (divs nested three deep). Unfortunately, since Stack Overflow no longer supports IE, I can't test the tutorial's code to see whether it works as is. In my case I cannot use the Chrome plugin, and I have to authenticate to the site before navigating to the URL. I also tried the solutions on question 15191847, specifically gembird's solution, and it gave me the same error.

When I run the code below I get "Run-time error '91'" (Object variable or With block variable not set). I printed ie.document to a text file and verified that the div id's I'm searching on are correct and that they're being captured. The error is thrown on Set Questions = QuestionList.Children. Any thoughts on why it may be showing me the error?

Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim QuestionList As IHTMLElement, QuestionField As IHTMLElement
Dim Questions As IHTMLElementCollection, QuestionFieldLinks As IHTMLElementCollection, QuestionFields As IHTMLElementCollection
Dim Question As IHTMLElement

Dim RowNumber As Long
Dim votes As String, url As String, views As String, QuestionId As String

url = "<<my url>>"

'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate url


'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
  Application.StatusBar = "Trying to go to " & url
  DoEvents
Loop

Cells.Clear
'show text of HTML document returned
Set html = ie.Document

'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""

'put heading across the top of row 3
Range("A3").Value = "Field"
Range("B3").Value = "Values"

Set QuestionList = html.getElementByID("fieldgroup ")    
Set Questions = QuestionList.Children

RowNumber = 4

For Each Question In Questions
  If Question.className = "fieldrow _text-field" Then
    'get a list of all of the parts of this question, and loop over them
    Set QuestionFields = Question.all
    
    For Each QuestionField In QuestionFields
      'if this is the question's votes, store it (get rid of any surrounding text)
      If QuestionField.className = "fieldlabel" Then
        Cells(RowNumber, 1).Value = Trim(QuestionField.innerText)
      End If

      'likewise for views (getting rid of any text)
      If QuestionField.className = "fieldvalue" Then
        Cells(RowNumber, 2).Value = Trim(QuestionField.innerText)
      End If
    Next QuestionField
    'go on to next row of worksheet
    RowNumber = RowNumber + 1
    
  End If
Next
Set html = Nothing

The HTML output looks like this.

<div class="fieldgroup " style="" group-title="">
 
<div class="fieldrow _text-field">

<div class="fieldlabel">Reporting</div>
<div class="fieldvalue">Yes</div>
 </div>
    
<div class="fieldrow _text-field">
<div class="fieldlabel">Annotate ''Yes''</div>
<div class="fieldvalue">Yes</div>
 </div>
...

Asked Feb 3 at 16:04 by Chasester; edited Feb 3 at 21:00 by braX.

I think there is no method getElementsByID, only getElementByID (singular). The ID of an HTML element should be unique, so there shouldn't be more than one such element. Are you using Option Explicit? That should give you a compiler error. – FunThomas, Feb 3 at 16:13
It was singular. That led me to a different error (run-time error 91, "Object variable or With block variable not set"); I'll update the question. I had that error on Friday when I last tried this, and I retyped it all without realizing it was a different error number. – Chasester, Feb 3 at 16:19
Likely there is no element with ID fieldgroup, and therefore QuestionList is Nothing. Try stepping through the code with the debugger and checking the outcome of each step. There's not much we can help with, as we don't have access to the web page. – FunThomas, Feb 3 at 16:31
It's there, or rather it's "fieldgroup<space>", another typo I had. Same result though. I put the HTML output in the post as well; I should have thought to show that originally. Stepping through the code only works up to the point where it errors, with no further details available. That is why I had printed the output to a text file, to make sure it even saw the divs I was looking for. – Chasester, Feb 3 at 17:02

1 Answer

You are mixing up id and class, which are different things; see for example "What's the difference between an id and a class?".
Your elements have class attributes, not IDs. To search for elements with a specific class, use the function getElementsByClassName. Note that this is a "plural" function: it returns all elements having that class. Even if it finds only one element, it returns a data structure that can hold any number of elements, and you need an index to access one of them. As far as I remember, in VBA that's a 0-based collection.
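
To see the difference without needing Internet Explorer or the real page, here is a minimal sketch that loads a copy of the posted HTML into an in-memory document. It assumes a reference to "Microsoft HTML Object Library" is set; the procedure name DemoIdVersusClass and the variable names are made up for illustration. It passes the class name without the trailing space, on the assumption that the space in the attribute is only a separator and not part of the name.

```vba
' Sketch only: requires Tools > References > "Microsoft HTML Object Library".
' DemoIdVersusClass is a hypothetical name, not part of the original post.
Sub DemoIdVersusClass()
    Dim html As New MSHTML.HTMLDocument
    Dim byId As MSHTML.IHTMLElement
    Dim byClass As Object

    ' A cut-down copy of the HTML shown in the question
    html.body.innerHTML = "<div class=""fieldgroup "">" & _
        "<div class=""fieldrow _text-field"">" & _
        "<div class=""fieldlabel"">Reporting</div>" & _
        "<div class=""fieldvalue"">Yes</div>" & _
        "</div></div>"

    ' No element carries id="fieldgroup ", so this lookup yields Nothing,
    ' and any later member access on byId would raise run-time error 91
    Set byId = html.getElementById("fieldgroup ")
    Debug.Print "Found by id: " & CStr(Not (byId Is Nothing))

    ' The class lookup returns a collection; check its Length before indexing
    Set byClass = html.getElementsByClassName("fieldgroup")
    Debug.Print "Matches by class: " & CStr(byClass.Length)
End Sub
```

Running it with the Immediate window open shows whether each lookup found anything, which is the same check the question's code would need before touching QuestionList.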

If you are sure that at least one element is always found, either use

Set QuestionList = html.getElementsByClassName("fieldgroup ")(0)
Set Questions = QuestionList.Children

or (but in this case QuestionList needs a different declaration, for example As IHTMLElementCollection instead of As IHTMLElement):

Set QuestionList = html.getElementsByClassName("fieldgroup ")
Set Questions = QuestionList(0).Children

I omitted error checking, so if the HTML doesn't contain any elements with that class name, you will still get a runtime error. To write robust code, you should add that check.
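
One way that check could look is sketched below. FirstByClass is a hypothetical helper, not something from the question or from the MSHTML library. It assumes the same "Microsoft HTML Object Library" reference the question's code already relies on, and it returns Nothing when the class is not present so the caller can decide what to do.

```vba
' Hypothetical helper (an assumption for illustration, not a library function).
' Returns the first element with the given class, or Nothing if there is none.
Function FirstByClass(ByVal doc As MSHTML.HTMLDocument, _
                      ByVal className As String) As MSHTML.IHTMLElement
    Dim matches As Object

    Set matches = doc.getElementsByClassName(className)
    If matches Is Nothing Then Exit Function
    If matches.Length = 0 Then Exit Function   ' nothing matched: leave result as Nothing

    Set FirstByClass = matches.Item(0)          ' the collection is 0-based
End Function

' Possible use inside the question's procedure (variable names as in the question):
'
'   Set QuestionList = FirstByClass(html, "fieldgroup")
'   If QuestionList Is Nothing Then
'       MsgBox "No element with class 'fieldgroup' was found."
'       Exit Sub
'   End If
'   Set Questions = QuestionList.Children
```

Testing for Nothing before reading .Children is what avoids run-time error 91 when the page layout changes or the login step fails and a different page is returned.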
