Python Selenium: waiting for several elements to load

2024-11-29 08:41:00
admin
Original

Problem description:

I have a list that is loaded dynamically via AJAX. At first, while it is loading, its markup looks like this:

<ul><li class="last"><a class="loading" href="#"><ins>&nbsp;</ins>Загрузка...</a></li></ul>

Once the list has loaded, all of the li and a elements change, and there is always more than one li. Like this:

<ul class="ltr">
<li id="t_b_68" class="closed" rel="simple">
<a id="t_a_68" href="javascript:void(0)">Category 1</a>
</li>
<li id="t_b_64" class="closed" rel="simple">
<a id="t_a_64" href="javascript:void(0)">Category 2</a>
</li>
...

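I need to check whether the list has loaded, so I check whether it contains several li elements.

What I have tried so far:

1) A custom wait condition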

from selenium.webdriver.common.by import By

class more_than_one(object):
    def __init__(self, selector):
        self.selector = selector

    def __call__(self, driver):
        # Selenium 4 removed find_elements_by_css_selector; use find_elements(By..., ...)
        elements = driver.find_elements(By.CSS_SELECTOR, self.selector)
        return len(elements) > 1

...

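from selenium.common.exceptions import TimeoutException

try:
    query = WebDriverWait(driver, 30).until(more_than_one('li'))
except TimeoutException:  # raised by until() when the 30 s run out
    print("Bad crap")
else:
    # Then load ready list
    pass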

2) A custom polling function based on find_elements

import time
from selenium.webdriver.common.by import By

def wait_for_several_elements(driver, selector, min_amount, limit=60):
    """
    Wait until more than <min_amount> elements are found by <selector>,
    giving up after <limit> seconds.
    """
    step = 1   # polling interval in seconds
    current_wait = 0
    while current_wait < limit:
        print("Waiting... " + str(current_wait))
        try:
            query = driver.find_elements(By.CSS_SELECTOR, selector)
        except Exception:
            query = []   # lookup failed; treat it as "nothing found yet"
        if len(query) > min_amount:
            print("Found!")
            return True
        time.sleep(step)
        current_wait += step

    return False

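This does not work, because the driver (the current element passed to this function) gets lost in the DOM. The UL does not change, but for some reason Selenium can no longer find it.

3) An explicit wait, that is, a fixed sleep. This is bad because some lists load instantly while others take more than 10 seconds. With this technique I would have to wait the maximum time on every occurrence, which is very bad for my case.

4) I also cannot get a wait on an XPATH to wait for the child elements properly. This one only waits for the ul to appear.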

try:
    print("Going to nested list...")
    #time.sleep(WAIT_TIME)
    query = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, './/ul')))
    nested_list = child.find_element(By.CSS_SELECTOR, 'ul')
except TimeoutException:  # until() raises this when no ul appears within 30 s
    nested_list = None

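Please tell me the correct way to make sure that several child elements have loaded for a specified element.

P.S. All of these checks and searches should be relative to the current element.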


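Solution 1:

First of all, these are AJAX elements.

Given the requirement to locate all of the desired elements and build a list from them, the simplest approach is to induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR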

elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.ltr li[id^='t_b_'] > a[id^='t_a_'][href]")))
  • Using XPATH

elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")))
  • Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

If your use case is to wait for a certain number of elements to load (for example, 10 elements), you can use a lambda function as follows:

  • Using >

myLength = 9
WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")) > int(myLength))
  • Using ==

myLength = 10
WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.XPATH, "//ul[@class='ltr']//li[starts-with(@id, 't_b_')]/a[starts-with(@id, 't_a_') and starts-with(., 'Category')]")) == int(myLength))
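The count check can also be wrapped in a small reusable callable instead of a lambda. The following is only a sketch under stated assumptions: the class name at_least_n_elements is hypothetical, it assumes the Selenium 4 find_elements(by, value) call, and it assumes minimum is at least 1. Returning the list rather than True means WebDriverWait.until hands the matched elements back to the caller.

```python
class at_least_n_elements:
    """Hypothetical wait condition: truthy once `minimum` or more elements match.

    Assumes minimum >= 1, because an empty list is falsy and would keep
    the wait polling.
    """

    def __init__(self, by, value, minimum):
        self.by = by
        self.value = value
        self.minimum = minimum

    def __call__(self, driver):
        # find_elements returns an empty list when nothing matches
        elements = driver.find_elements(self.by, self.value)
        # A non-empty list is truthy, so until() stops polling and returns it
        return elements if len(elements) >= self.minimum else False
```

It would be used as WebDriverWait(driver, 20).until(at_least_n_elements(By.CSS_SELECTOR, "ul.ltr > li", 2)), which covers the "more than one li" check from the question.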

You can find a related discussion in "How to wait for number of elements to be loaded using Selenium and Python".



Solution 2:

I created AllEc, which basically builds on the WebDriverWait.until logic.

It waits until either the timeout expires or all of the elements are found.

from typing import Callable
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException

class AllEc(object):
    def __init__(self, *args: Callable, description: str = None):
        self.ecs = args
        self.description = description

    def __call__(self, driver):
        try:
            for fn in self.ecs:
                if not fn(driver):
                    return False
            return True
        except StaleElementReferenceException:
            return False

# usage example:
wait = WebDriverWait(driver, timeout)
ec1 = EC.invisibility_of_element_located(locator1)
ec2 = EC.invisibility_of_element_located(locator2)
ec3 = EC.invisibility_of_element_located(locator3)

all_ec = AllEc(ec1, ec2, ec3, description="Required elements to show page has loaded.") 
found_elements = wait.until(all_ec, "Could not find all expected elements")

Alternatively, I created AnyEc, which looks for several elements but returns the first one found.

class AnyEc(object):
    """
    Use with WebDriverWait to combine expected_conditions in an OR.

    Example usage:

        >>> wait = WebDriverWait(driver, 30)
        >>> either = AnyEc(expectedcondition1, expectedcondition2, expectedcondition3, etc...)
        >>> found = wait.until(either, "Cannot find any of the expected conditions")
    """

    def __init__(self, *args: Callable, description: str = None):
        self.ecs = args
        self.description = description

    def __iter__(self):
        return self.ecs.__iter__()

    def __call__(self, driver):
        for fn in self.ecs:
            try:
                rt = fn(driver)
                if rt:
                    return rt
            except TypeError as exc:
                raise exc
            except Exception as exc:
                # print(exc)
                pass

    def __repr__(self):
        return " ".join(f"{e!r}," for e in self.ecs)

    def __str__(self):
        return f"{self.description!s}"

either = AnyEc(ec1, ec2, ec3)
found_element = wait.until(either, "Could not find any of the expected elements")

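Finally, if you can, you could try waiting for Ajax to complete. This is not useful in every case, for example when Ajax is always active. Where Ajax runs and then finishes, it works. Some ajax libraries also do not set the active attribute, so double-check that you can rely on it.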

def is_ajax_complete(driver):
    # jQuery.active is the number of jQuery Ajax requests still in flight
    rt = driver.execute_script("return jQuery.active")
    return rt == 0

wait.until(is_ajax_complete, "Ajax did not finish")
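Because some pages do not load jQuery at all, a guarded variant may be safer than reading jQuery.active directly. This is a minimal sketch, not part of the original answer: the names JQUERY_ACTIVE_SCRIPT and ajax_is_idle and the assume_idle_without_jquery flag are hypothetical, and it only knows about requests made through jQuery.

```python
# Yields null (None in Python) when jQuery is not loaded on the page,
# otherwise the number of in-flight jQuery Ajax requests.
JQUERY_ACTIVE_SCRIPT = (
    "return (typeof jQuery === 'undefined') ? null : jQuery.active;"
)


def ajax_is_idle(driver, assume_idle_without_jquery=False):
    """Hypothetical helper: True when jQuery reports no active requests."""
    active = driver.execute_script(JQUERY_ACTIVE_SCRIPT)
    if active is None:
        # jQuery is absent, so this check says nothing about other Ajax code;
        # the caller decides how to treat that case.
        return assume_idle_without_jquery
    return active == 0
```

It can be passed straight to a wait, for example wait.until(ajax_is_idle, "Ajax did not finish"). Leaving the flag at False makes the wait time out on pages without jQuery instead of passing silently.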

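Solution 3:

(1) You did not mention the error you are getting.

(2) Since you mention

...because the driver (the current element passed to this function)...

I assume this is actually a WebElement. In that case, instead of passing the object itself to your method, just pass the selector that finds that WebElement (in your case, the ul). If the "driver gets lost in the DOM", re-creating it inside the while current_wait < limit: loop may mitigate the problem.

(3) Yes, time.sleep() will only get you so far.

(4) Since the dynamically loaded li elements have class=closed, instead of (By.XPATH, './/ul') you could try (By.CSS_SELECTOR, 'ul > li.closed') (see a CSS selector reference for more details).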
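Points (2) and (4) can be combined into one wait condition that receives selectors rather than a WebElement. The sketch below is an illustration only: the class name children_loaded and its parameters are hypothetical, and the string "css selector" is used because that is the value of Selenium's By.CSS_SELECTOR.

```python
class children_loaded:
    """Hypothetical wait condition scoped to one parent element.

    The parent is looked up again on every poll, so a reference that went
    stale while the list was being rebuilt is never reused.
    """

    def __init__(self, parent_selector, child_selector, minimum=2):
        self.parent_selector = parent_selector
        self.child_selector = child_selector
        self.minimum = minimum

    def __call__(self, driver):
        # "css selector" is the value of selenium's By.CSS_SELECTOR
        parents = driver.find_elements("css selector", self.parent_selector)
        if not parents:
            return False
        # Search relative to the freshly located parent, not the whole page
        children = parents[0].find_elements("css selector", self.child_selector)
        return children if len(children) >= self.minimum else False
```

A possible call is WebDriverWait(driver, 30, ignored_exceptions=[StaleElementReferenceException]).until(children_loaded("ul.ltr", "li.closed")). The ignored_exceptions argument covers the case where the parent is replaced between the two lookups within a single poll.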

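Solution 4:

Taking the comments from Mr.E. and Arran into account, I did the list traversal entirely with CSS selectors. The tricky part was my own list structure and markup (changing classes and so on), plus building the required selectors dynamically and keeping them in memory during the traversal.

I handled waiting for several elements by searching for anything that is not in the loading state. You can also use the ":nth-child" selector, like this: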

#in for loop with enumerate for i    
selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order pos
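To make the selector bookkeeping easier to follow, here is a small sketch of how the selector fragments could be joined. The helper names nth_child_link_selector and not_loading_selector are hypothetical; the fragments themselves (' > li:nth-child(...)', ' > a', and the :not([class=loading]) filter) come from this answer.

```python
def nth_child_link_selector(parts, index):
    """Hypothetical helper: selector for the <a> inside the index-th <li>.

    `parts` is the list of selector fragments leading to the current <ul>;
    `index` is 0-based, as produced by enumerate().
    """
    # CSS :nth-child() counts from 1, enumerate() counts from 0
    return "".join(parts) + " > li:nth-child(%d) > a" % (index + 1)


def not_loading_selector(parts):
    """Selector for links in the current list that are not the loading placeholder."""
    return "".join(parts) + " > li > a:not([class=loading])"
```

Building the full string from a list of fragments keeps each nesting level separate, so a level can be dropped with del parts[-1] when the traversal moves back up.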

For example, here is my heavily commented solution:

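from selenium.common.exceptions import InvalidElementStateException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# WAIT_LONG_TIME, double_click, process_category_case and process_page_case
# are not shown here; they are defined elsewhere in the author's project.

def parse_crippled_shifted_list(driver, frame, selector, level=1, parent_id=0, path=None):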
    """
    Traversal of html list of special structure (you can't know if element has sub list unless you enter it).
    Supports start from remembered list element.

    Nested lists have classes "closed" and "last closed" when closed and "open" and "last open" when opened (on <li>).
    Elements themselves have classes "leaf" and "last leaf" in both cases.
    Nested lists situate in <li> element as <ul> list. Each <ul> appears after clicking <a> in each <li>.
    If you click <a> of leaf, page in another frame will load.

    driver - WebDriver; frame - frame of the list; selector - selector to current list (<ul>);
    level - level of depth, just for console output formatting, parent_id - id of parent category (in DB),
    path - remained path in categories (ORM objects) to target category to start with.
    """

    # Add current level list elements
    # This method selects all but loading. Just what is needed to exclude.
    selector.append(' > li > a:not([class=loading])')

    # Wait for child list to load
    try:
        query = WebDriverWait(driver, WAIT_LONG_TIME).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

    except TimeoutException:
        print("%s timed out" % ''.join(selector))

    else:
        # List is loaded
        del selector[-1]  # selector correction: delete last part aimed to get loaded content
        selector.append(' > li')

        children = driver.find_elements(By.CSS_SELECTOR, ''.join(selector))  # fetch list elements

        # Walk the whole list
        for i, child in enumerate(children):

            del selector[-1]  # delete non-unique li tag selector
            if selector[-1] != ' > ul' and selector[-1] != 'ul.ltr':
                del selector[-1]

            selector.append(' > li:nth-child(%i)' % (i + 1))  # identify child <li> by its order pos
            selector.append(' > a')  # add 'li > a' reference to click

            child_link = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

            # If we parse freely further (no need to start from remembered position)
            if not path:
                # Open child
                try:
                    double_click(driver, child_link)
                except InvalidElementStateException as exc:
                    print("\n\nERROR\n", exc.msg, "\n\n")
                else:
                    # Determine its type
                    del selector[-1]  # delete changed and already useless link reference
                    # If <li> is category, it would have <ul> as child now and class="open"
                    # Check by class is priority, because <li> exists for sure.
                    current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                    # Category case - BRANCH
                    if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
                        new_parent_id = process_category_case(child_link, parent_id, level)  # add category to DB
                        selector.append(' > ul')  # forward to nested list
                        # Wait for nested list to load
                        try:
                            query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

                        except TimeoutException:
                            print("    " * level, "%s timed out (%i secs). Failed to load nested list." %
                                  (''.join(selector), WAIT_LONG_TIME))
                        # Parse nested list
                        else:
                            parse_crippled_shifted_list(driver, frame, selector, level + 1, new_parent_id)

                    # Page case - LEAF
                    elif current_li.get_attribute('class') == 'leaf' or current_li.get_attribute('class') == 'last leaf':
                        process_page_case(driver, child_link, level)
                    else:
                        raise Exception('Damn! Alien class: %s' % current_li.get_attribute('class'))

            # If it's required to continue from specified category
            else:
                # Check if it's required category
                if child_link.text == path[0].name:
                    # Open required category
                    try:
                        double_click(driver, child_link)

                    except InvalidElementStateException as exc:
                        print("\n\nERROR\n", exc.msg, "\n\n")

                    else:
                        # This element of list must be always category (have nested list)
                        del selector[-1]  # delete changed and already useless link reference
                        # If <li> is category, it would have <ul> as child now and class="open"
                        # Check by class is priority, because <li> exists for sure.
                        current_li = driver.find_element(By.CSS_SELECTOR, ''.join(selector))

                        # Category case - BRANCH
                        if current_li.get_attribute('class') == 'open' or current_li.get_attribute('class') == 'last open':
                            selector.append(' > ul')  # forward to nested list
                            # Wait for nested list to load
                            try:
                                query = WebDriverWait(driver, WAIT_LONG_TIME).until(
                                    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ''.join(selector))))

                            except TimeoutException:
                                print("    " * level, "%s timed out (%i secs). Failed to load nested list." %
                                      (''.join(selector), WAIT_LONG_TIME))
                            # Process this nested list
                            else:
                                last = path.pop(0)
                                if len(path) > 0:  # If more to parse
                                    print("    " * level, "Going deeper to: %s" % ''.join(selector))
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1,
                                                                parent_id=last.id, path=path)
                                else:  # Current is required
                                    print("    " * level, "Returning target category: ", ''.join(selector))
                                    path = None
                                    parse_crippled_shifted_list(driver, frame, selector, level + 1, last.id, path=None)

                        # Page case - LEAF
                        elif current_li.get_attribute('class') == 'leaf':
                            pass
                else:
                    print("dummy")

        del selector[-2:]

Solution 5:

Here is how I solved the problem of waiting until a certain number of posts have finished loading via AJAX:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# create a new Chrome session
driver = webdriver.Chrome()

# navigate to your web app.
driver.get("http://my.local.web")

# get the "see more" button
seemore_button = driver.find_element(By.ID, "seemoreID")

# Click it to load more posts
seemore_button.click()

# Wait up to 30 sec for the AJAX call to load the content
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "post")))

# Get the list of posts
listpost = driver.find_elements(By.CLASS_NAME, "post")
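If some posts are already visible before the click, waiting for visibility alone may return immediately. One possible refinement, sketched here under assumptions, is to record the count before clicking and wait until it grows. The helper name click_and_wait_for_more is hypothetical, and wait is expected to be a WebDriverWait (anything with an until(method, message) method would do).

```python
def click_and_wait_for_more(driver, wait, button, by, value):
    """Hypothetical helper: click `button`, then wait until more elements match.

    Returns the list of matching elements once it is longer than it was
    before the click.
    """
    before = len(driver.find_elements(by, value))
    button.click()

    def more_than_before(d):
        found = d.find_elements(by, value)
        # A non-empty list is truthy, so until() returns it to the caller
        return found if len(found) > before else False

    return wait.until(more_than_before, "No new elements appeared")
```

With the names used in this answer, the call would look like listpost = click_and_wait_for_more(driver, WebDriverWait(driver, 30), seemore_button, By.CLASS_NAME, "post").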