Example HTML:

<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>

I am currently getting all the spans inside the element with the labels class, but I only want the two spans directly under it. I don't want any span tags from nested child elements.

Currently I am doing it like this:

First I get the labels element from a much bigger HTML document:

labels = html.findAll(class_="labels")

Then I extract the span tags from it:

spans = labels[0].findAll('span', {"class": None})

In my case "class": None changes nothing, because none of the span tags has a class.

So, again: how can I get just the first two span tags, without the spans inside child elements?
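
A minimal reproduction of the problem, assuming BeautifulSoup 4 (the bs4 package) with Python's built-in "html.parser"; SAMPLE is simply the example HTML above. The search is recursive by default, so the span nested inside <time> is returned as well.

```python
from bs4 import BeautifulSoup

# The example markup from the question
SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""

html = BeautifulSoup(SAMPLE, "html.parser")
labels = html.find_all(class_="labels")

# find_all searches every descendant by default, so the nested span matches too
spans = labels[0].find_all("span", {"class": None})
texts = [s.get_text() for s in spans]
print(texts)  # ['Item1', 'Item2', 'I dont want to get this span']
```

find_all is the bs4 spelling; findAll, as used in the question, is an older alias for the same method.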

Asked Nov 26, 2015 at 16:17 by Bioaim; edited Nov 26, 2015 at 22:31.
  • Couldn't you write a list comprehension that iterates over the direct children of labels[0] and grabs any spans from there? – SuperBiasedMan, Nov 26, 2015 at 16:23
  • Do you need all the span tags that come before the time tag inside the p tag? – Learner, Nov 26, 2015 at 18:30
  • Yes, exactly. And there could be more or fewer than 2. – Bioaim, Nov 26, 2015 at 18:41

3 Answers

The BeautifulSoup docs contain a short sentence about recursive=False, which makes findAll consider only the tag's direct children instead of all of its descendants.

So the answer to this problem was:

spans = labels[0].findAll('span', {"class": None}, recursive=False)
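
A self-contained check of the recursive=False approach, again assuming bs4 with "html.parser" and the question's sample HTML stored in a SAMPLE string (a name chosen for this sketch).

```python
from bs4 import BeautifulSoup

SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""

html = BeautifulSoup(SAMPLE, "html.parser")
container = html.find("p", class_="labels")

# recursive=False limits the search to direct children of <p>,
# so the <span> nested inside <time> is never considered
spans = container.find_all("span", {"class": None}, recursive=False)
texts = [s.get_text() for s in spans]
print(texts)  # ['Item1', 'Item2']
```

This returns every direct-child span, however many there are, which matches the asker's note that the count can vary.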
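
Another answer keeps the recursive search but filters out every span whose parent is not the container itself:

for container in html.findAll(class_="labels"):
    spans = container.findAll('span', {"class": None})
    # keep only spans that sit directly under this container
    spans = [span for span in spans if span.parent is container]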

Alternatively, iterate over the container's .children:

def is_plain_span(c):
    # text nodes have name None, so .get() is only reached for tags
    return c.name == 'span' and c.get('class') is None

for container in html.findAll(class_="labels"):
    spans = [child for child in container.children if is_plain_span(child)]
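
A runnable sketch of both approaches from this answer, under the same assumptions (bs4, "html.parser", the question's sample HTML in SAMPLE). The helper names direct_spans_by_parent and direct_spans_by_children are hypothetical. The parent check uses "is" rather than "==" on purpose: bs4 compares tags with "==" by name, attributes and contents, so an identical-looking tag elsewhere in the document would also compare equal.

```python
from bs4 import BeautifulSoup

SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""

html = BeautifulSoup(SAMPLE, "html.parser")


def direct_spans_by_parent(container):
    """Recursive search, then keep only spans whose parent is the container."""
    spans = container.find_all("span", {"class": None})
    return [span for span in spans if span.parent is container]


def direct_spans_by_children(container):
    """Walk only the immediate children; text nodes have name None."""
    return [
        child
        for child in container.children
        if child.name == "span" and child.get("class") is None
    ]


container = html.find(class_="labels")
by_parent = [s.get_text() for s in direct_spans_by_parent(container)]
by_children = [s.get_text() for s in direct_spans_by_children(container)]
print(by_parent, by_children)  # ['Item1', 'Item2'] ['Item1', 'Item2']
```

The .children version never descends into <time>, so it does less work on large documents; the parent check still visits every nested span before discarding it.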

A third answer takes the first two span elements by slicing the result:

>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})[0:2]]
[u'Item1', u'Item2']

If you want every span inside the labels element, including the nested one, remove the slice:

>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})]
[u'Item1', u'Item2', u'I dont want to get this span']
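
The slice only works while exactly two wanted spans come first, and the asker noted in a comment that the count can vary. One alternative not given in the original answers is a CSS selector with the child combinator ">". This sketch assumes bs4 4.7 or later, where select() is backed by the soupsieve package; ":not([class])" mirrors the {"class": None} filter.

```python
from bs4 import BeautifulSoup

SAMPLE = """
<p class="labels">
  <span>Item1</span>
  <span>Item2</span>
  <time class="time">
    <span>I dont want to get this span</span>
  </time>
</p>
"""

html = BeautifulSoup(SAMPLE, "html.parser")

# "p.labels > span" matches only <span> elements that are direct children of
# <p class="labels">; ":not([class])" additionally skips spans with a class
spans = html.select("p.labels > span:not([class])")
texts = [s.get_text() for s in spans]
print(texts)  # ['Item1', 'Item2']
```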