How to find elements by class

Problem description:

I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"] == "stylelistrow"):
        print div

I get an error on the same line "after" the script finishes.

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

How do I get rid of this error?

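Solution 1:

You can refine your search to find only the divs with a given class. This attribute-dict filter dates back to BS3 (where the method is spelled findAll) and still works in BS4: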

mydivs = soup.find_all("div", {"class": "stylelistrow"})
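As a minimal, self-contained sketch of this filter under BS4 (the sample markup in sdata is made up for illustration, and the html.parser backend is an assumption):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second div has no class attribute at all.
sdata = """
<div class="stylelistrow">first</div>
<div>no class here</div>
<div class="other">third</div>
"""

soup = BeautifulSoup(sdata, "html.parser")

# The filter runs inside find_all, so divs without a class attribute
# are simply skipped instead of raising KeyError.
mydivs = soup.find_all("div", {"class": "stylelistrow"})
texts = [div.get_text() for div in mydivs]
print(texts)
```

Because the class test happens inside the search, the loop over every div and the div["class"] lookup are no longer needed.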

Solution 2:

From the documentation:

As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_:

soup.find_all("a", class_="sister")

In this case it would be:

soup.find_all("div", class_="stylelistrow")

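It also works for an exact match on the full class attribute string (the classes must appear in the same order as in the markup):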

soup.find_all("div", class_="stylelistrowone stylelistrowtwo")
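The difference between matching one class token and matching the whole attribute string can be sketched as follows; the markup is invented for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div with two classes, one div with a single class.
html = (
    '<div class="stylelistrowone stylelistrowtwo">both</div>'
    '<div class="stylelistrowone">one</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag that carries that class.
single = soup.find_all("div", class_="stylelistrowone")

# A space-separated string is compared against the exact attribute value.
exact = soup.find_all("div", class_="stylelistrowone stylelistrowtwo")

# The same classes in a different order do not match.
reordered = soup.find_all("div", class_="stylelistrowtwo stylelistrowone")

print(len(single), len(exact), len(reordered))
```

When the order of classes in the markup is not known in advance, a CSS selector such as div.stylelistrowone.stylelistrowtwo is the more robust choice.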

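Solution 3:

Update (2016): in the latest version of BeautifulSoup, the method findAll has been renamed to find_all. The official documentation lists the method names that changed.

So the answer is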

soup.find_all("html_element", class_="your_class_name")

Solution 4:

CSS selectors

First match for a single class

soup.select_one('.stylelistrow')

List of matches

soup.select('.stylelistrow')

Compound class (i.e. AND another class)

soup.select_one('.stylelistrow.otherclassname')
soup.select('.stylelistrow.otherclassname')

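The spaces in a compound class name, e.g. class="stylelistrow otherclassname", are replaced with ".". You can keep adding classes this way.

List of classes (OR: match whichever is present)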

soup.select_one('.stylelistrow, .otherclassname')
soup.select('.stylelistrow, .otherclassname')
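The single, compound (AND), and list (OR) selectors above can be compared side by side in a short sketch; the three sample divs are made up:

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the three combinations of the two classes.
html = """
<div class="stylelistrow">a</div>
<div class="stylelistrow otherclassname">b</div>
<div class="otherclassname">c</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match (or None); select returns a list.
first = soup.select_one(".stylelistrow").get_text()

# Compound selector: the element must carry both classes.
both = [d.get_text() for d in soup.select(".stylelistrow.otherclassname")]

# Selector list: the element may carry either class.
either = [d.get_text() for d in soup.select(".stylelistrow, .otherclassname")]

print(first, both, either)
```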

Class attribute whose value contains a string, e.g. with "stylelistrow":

Starts with "style":

[class^=style]

Ends with "row":

[class$=row]

Contains "list":

[class*=list]

^, $ and * are operators. Read more here: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
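A small sketch of the three operators passed to select(); the markup is invented. Note that each operator tests the whole class attribute string, so [class^=style] would not match class="button stylelistrow".

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three different class values.
html = (
    '<div class="stylelistrow">x</div>'
    '<div class="rowheader">y</div>'
    '<p class="playlist">z</p>'
)
soup = BeautifulSoup(html, "html.parser")

# ^= : attribute value starts with the given string
starts = [t.get_text() for t in soup.select("[class^=style]")]

# $= : attribute value ends with the given string
ends = [t.get_text() for t in soup.select("[class$=row]")]

# *= : attribute value contains the given string anywhere
contains = [t.get_text() for t in soup.select("[class*=list]")]

print(starts, ends, contains)
```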

If you want to exclude this class, taking anchor tags as an example, select the anchor tags that do not have it:

a:not(.stylelistrow)

You can pass simple, compound and complex CSS selector lists inside the :not() pseudo-class. See https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:not
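For illustration, a sketch of the exclusion selector on made-up anchor tags:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one anchor with the class, one without a class, one with another class.
html = (
    '<a class="stylelistrow" href="#1">one</a>'
    '<a href="#2">two</a>'
    '<a class="other" href="#3">three</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Anchors that do NOT carry the stylelistrow class, including those with no class at all.
without = [a.get_text() for a in soup.select("a:not(.stylelistrow)")]
print(without)
```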

BS4 4.7.1 +

Specific class whose innerText contains a string

soup.select_one('.stylelistrow:contains("some string")')
soup.select('.stylelistrow:contains("some string")')

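N.B.

soupsieve 2.1.0+, from December 2020 onwards

NEW: In order to avoid conflicts with future CSS specification changes, non-standard pseudo-classes will now start with the :-soup- prefix. As a consequence, :contains() will now be known as :-soup-contains(), though for a time the deprecated form :contains() will still be allowed, with a warning that users should migrate to :-soup-contains().
NEW: Added a new non-standard pseudo-class :-soup-contains-own() that operates like :-soup-contains() except that it only looks at text nodes directly associated with the currently scoped element and not its descendants.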
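The difference between the two prefixed pseudo-classes can be sketched like this, assuming soupsieve 2.1.0 or later is installed; the markup is made up:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the text sits in a child element, in the div itself, or is absent.
html = (
    '<div class="stylelistrow"><span>some string</span></div>'
    '<div class="stylelistrow">some string directly</div>'
    '<div class="stylelistrow">unrelated</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Looks at the text of the element and all of its descendants.
any_text = soup.select('.stylelistrow:-soup-contains("some string")')

# Looks only at text nodes that belong directly to the element.
own_text = soup.select('.stylelistrow:-soup-contains-own("some string")')

print(len(any_text), len(own_text))
```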

Specific class that has a certain child element, e.g. an a tag

soup.select_one('.stylelistrow:has(a)')
soup.select('.stylelistrow:has(a)')
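A brief sketch of the :has() filter on invented markup, where only one of two rows contains a link:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first row contains an anchor, the second does not.
html = (
    '<div class="stylelistrow"><a href="#">link</a></div>'
    '<div class="stylelistrow">plain</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Keep only the rows that have an <a> element somewhere inside them.
with_link = [d.get_text() for d in soup.select(".stylelistrow:has(a)")]
print(with_link)
```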

Solution 5:

Specific to BeautifulSoup 3:

soup.findAll('div',
             {'class': lambda x: x 
                       and 'stylelistrow' in x.split()
             }
            )

This will find all of these:

<div class="stylelistrow">
<div class="stylelistrow button">
<div class="button stylelistrow">

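Solution 6:

A straightforward way would be:

soup = BeautifulSoup(sdata)
# the class name must match the one in the markup exactly
for each_div in soup.findAll('div', {'class': 'stylelistrow'}):
    print(each_div)

Make sure you get the casing right: the method is findAll, not findall.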

Solution 7:

Use class_= if you want to find elements without stating the HTML tag.

For a single element:

soup.find(class_='my-class-name')

For multiple elements:

soup.find_all(class_='my-class-name')
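A minimal sketch showing that the tag name really is optional; the two sample elements are made up:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same class on two different tag types.
html = '<p class="my-class-name">p</p><span class="my-class-name">s</span>'
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag regardless of its name (or None).
first = soup.find(class_="my-class-name")

# find_all returns every matching tag in document order.
names = [tag.name for tag in soup.find_all(class_="my-class-name")]

print(first.name, names)
```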

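Solution 8:

How to find elements by class
I'm having trouble parsing html elements with "class" attribute using Beautifulsoup.

You can easily find by one class, but finding by the intersection of two classes is a little more difficult.

From the documentation (emphasis added):

If you want to search for tags that match two or more CSS classes, you should use a CSS selector: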
css_soup.select("p.strikeout.body")
# [<p class="body strikeout"></p>]

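To be clear, this selects only the p tags that have both the strikeout and the body class.

To find tags matching any one of a set of classes (the union, not the intersection), you can pass a list to the class_ keyword argument (as of 4.1.2):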

soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list)

Also note that findAll has been renamed from camelCase to the more Pythonic find_all.
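To make the intersection/union distinction concrete, here is a small sketch on invented markup that runs both searches against the same four paragraphs:

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering every combination of the two classes.
html = (
    '<p class="body strikeout">both</p>'
    '<p class="body">body only</p>'
    '<p class="strikeout">strike only</p>'
    '<p class="plain">neither</p>'
)
css_soup = BeautifulSoup(html, "html.parser")

# Intersection: the tag must carry both classes.
intersection = [p.get_text() for p in css_soup.select("p.strikeout.body")]

# Union: the tag may carry any class from the list.
union = [p.get_text() for p in css_soup.find_all("p", class_=["strikeout", "body"])]

print(intersection, union)
```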

Solution 9:

As of BeautifulSoup 4+,

If you have a single class name, you can pass it as the second positional argument:

mydivs = soup.find_all('div', 'class_name')

Or, if you have more than one class name, pass a list of class names instead:

mydivs = soup.find_all('div', ['class1', 'class2'])
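A short sketch, on made-up markup, suggesting that the positional form behaves like the class_ keyword form:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three differently classed divs.
html = (
    '<div class="class1">a</div>'
    '<div class="class2">b</div>'
    '<div class="class3">c</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A string in the second position is treated as a CSS class to match.
by_position = [d.get_text() for d in soup.find_all("div", "class1")]
by_keyword = [d.get_text() for d in soup.find_all("div", class_="class1")]

# A list in the second position matches any of the listed classes.
any_of = [d.get_text() for d in soup.find_all("div", ["class1", "class2"])]

print(by_position, by_keyword, any_of)
```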

Solution 10:

The following worked for me:

a_tag = soup.find_all("div",class_='full tabpublist')

Solution 11:

Single

soup.find("form",{"class":"c-login__form"})

Multiple

res=soup.find_all("input")
for each in res:
    print(each)

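Solution 12:

This works for me to access the class attribute (on BeautifulSoup 4, contrary to what the documentation says). The KeyError arises because a list is being returned, not a dictionary.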

for hit in soup.findAll(name='span'):
    print hit.contents[1]['class']

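Solution 13:

The other answers did not work for me.

In the other answers, findAll is called on the soup object itself, but I needed a way to search by class name inside a specific element taken from the results of an earlier findAll.

If you are trying to search inside nested HTML elements to get objects by class name, try the following: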

# parse html
page_soup = soup(web_page.read(), "html.parser")

# filter out items matching class name
all_songs = page_soup.findAll("li", "song_item")

# traverse through all_songs
for song in all_songs:

    # get text out of span element matching class 'song_name'
    # doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
    song.find("span", "song_name").text

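Points to note:

I did not explicitly name the "class" attribute, as in findAll("li", {"class": "song_item"}), because it is the only attribute I am searching on; if you pass a plain string as the second argument, the search defaults to the class attribute.
findAll returns a bs4.element.ResultSet, which is a subclass of list. Each item in it (like the result of find) is a Tag, on which you can call find or findAll again, at any level of nesting.
My BS4 version: 4.9.1, Python version: 3.8.1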
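The nested lookup described in this answer can be sketched in a self-contained form; the song list markup below is made up, and BeautifulSoup is imported under its usual name rather than an alias:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two song items and one unrelated list item.
html = """
<ul>
  <li class="song_item"><span class="song_name">Song A</span><span class="artist">X</span></li>
  <li class="song_item"><span class="song_name">Song B</span></li>
  <li class="ad"><span class="song_name">Not a song</span></li>
</ul>
"""
page_soup = BeautifulSoup(html, "html.parser")

# Outer search on the soup, inner search on each Tag from the result set.
names = [
    song.find("span", "song_name").text
    for song in page_soup.find_all("li", "song_item")
]
print(names)
```

The inner find is scoped to each li element, so the song_name span inside the unrelated item is never visited.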

Solution 14:

Regarding @Wernight's comment on the top answer about partial matching...

You can partially match:

<div class="stylelistrow"> and
<div class="stylelistrow button">

with gazpacho:

from gazpacho import Soup

my_divs = soup.find("div", {"class": "stylelistrow"}, partial=True)

Both will be captured and returned as a list of Soup objects.

Solution 15:

Alternatively, we can use lxml, which supports XPath and is very fast!

from lxml import html, etree 

attr = html.fromstring(html_text)  # parse the raw HTML
handles = attr.xpath('//div[@class="stylelistrow"]')  # XPath expression to find that specific class

for each in handles:
    print(etree.tostring(each))  # print the element's HTML (as bytes)
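Note that @class="stylelistrow" compares the whole attribute string, so an element with several classes is not matched. A common XPath 1.0 idiom for matching a single class token is sketched below; the markup is invented and lxml is assumed to be installed:

```python
from lxml import html

# Hypothetical markup: exact class, class plus another token, and a look-alike name.
html_text = (
    '<div>'
    '<div class="stylelistrow">a</div>'
    '<div class="stylelistrow button">b</div>'
    '<div class="stylelistrowextra">c</div>'
    '</div>'
)
tree = html.fromstring(html_text)

# Exact comparison of the whole attribute value.
exact = [e.text_content() for e in tree.xpath('//div[@class="stylelistrow"]')]

# Token match: pad the normalized class string with spaces and look for " name ".
token_xpath = (
    '//div[contains(concat(" ", normalize-space(@class), " "), " stylelistrow ")]'
)
token = [e.text_content() for e in tree.xpath(token_xpath)]

print(exact, token)
```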

Solution 16:

soup = BeautifulSoup(sdata)
mydivs = soup.select('div.stylelistrow')
print(len(mydivs))

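Solution 17:

First check whether the div has a class attribute, like this (BS4):

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    # has_attr() tests the tag's attributes; `"class" in div` would test its children
    if div.has_attr("class"):
        # BS4 returns the class attribute as a list of class names
        if "stylelistrow" in div["class"]:
            print(div)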

Solution 18:

This works for me:

for div in mydivs:
    try:
        clazz = div["class"]
    except KeyError:
        clazz = ""
    if (clazz == "stylelistrow"):
        print div
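Under BS4 the class attribute comes back as a list of class names, so comparing it to a string never matches. A minimal sketch of the same default-on-missing idea using Tag.get, with made-up sample markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second div has no class attribute.
sdata = (
    '<div class="stylelistrow">first</div>'
    '<div>no class</div>'
    '<div class="button stylelistrow">third</div>'
)
soup = BeautifulSoup(sdata, "html.parser")

matched = []
for div in soup.find_all("div"):
    # get() returns the default instead of raising KeyError when the attribute is missing.
    classes = div.get("class", [])
    if "stylelistrow" in classes:
        matched.append(div.get_text())

print(matched)
```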

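Solution 19:

This should work:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    # find() searches the div's descendants, so this prints divs that contain a matching element
    if div.find(class_="stylelistrow"):
        print(div)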

Solution 20:

The following should work:

soup.find('span', attrs={'class':'totalcount'})

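Replace 'totalcount' with your class name and 'span' with the tag you are looking for. Also, if your class attribute contains multiple space-separated names, just pick one of them and use it.

P.S. This finds the first element that meets the given criteria. If you want to find all such elements, replace 'find' with 'find_all'.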

Solution 21:

The following should solve your problem; it is a concise solution:

from bs4 import BeautifulSoup

soup = BeautifulSoup(sdata, 'html.parser')
mydivs = soup.find_all('div', class_='stylelistrow')

for div in mydivs:
    print(div)