Running headless Selenium via Docker with Python / Scrapy

Question


I'm trying to use Scrapy with Selenium on a laptop that has Kubuntu installed, but I only use the command line (I don't start an X server).

My first question: do I still need Xvfb in that case?

In any case, here is what I'm doing right now:

sudo service docker start
sudo service docker status
sudo docker run -it --rm --name chrome --shm-size=1024m -p=9222:9222 --cap-add=SYS_ADMIN  yukinying/chrome-headless-browser --enable-logging --v=10000
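Before running the spider, it can help to confirm that something is actually listening on the port published with `-p=9222:9222`, since a refused connection usually means nothing is listening on the target port. A minimal sketch using only the standard library; the helper name `port_is_open` and the `localhost`/`9222` defaults are illustrative assumptions:

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9222, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a timeout)
        # when nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("port 9222 open:", port_is_open())
```

This only checks TCP reachability; it says nothing about which protocol is served on that port.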

; Now Docker is running. In a second SSH session I run:

Xvfb :99 &
export DISPLAY=:99
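To check from Python that the Xvfb started above is reachable under the exported `DISPLAY`, one option is to look for its Unix socket. A minimal sketch, assuming the common X11 convention that a local server for display `:N` listens on `/tmp/.X11-unix/XN` (an Xvfb started with `-nolisten unix` would not create it); both function names are hypothetical:

```python
import os
from pathlib import Path
from typing import Optional


def display_number(display: str) -> int:
    """Parse the display number from a DISPLAY value such as ':99' or ':99.0'."""
    # Anything before the last ':' is a host name; after it comes 'N' or 'N.S'
    _, _, rest = display.rpartition(":")
    return int(rest.split(".")[0])


def x_socket_exists(display: Optional[str] = None) -> bool:
    """Return True if the local X socket for the given (or current) DISPLAY exists."""
    display = display or os.environ.get("DISPLAY")
    if not display:
        return False  # DISPLAY is not set in this process's environment
    # Only meaningful for local displays; a remote host:N would not have a local socket
    return Path("/tmp/.X11-unix/X{}".format(display_number(display))).exists()


if __name__ == "__main__":
    print("DISPLAY =", os.environ.get("DISPLAY"), "socket present:", x_socket_exists())
```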

; Then, still in the second SSH session:

scrapy crawl weibospider

Now I get a huge list of debug messages, parameters, etc.:

2017-07-09 18:37:23 [easyprocess] DEBUG: param: "['Xvfb', '-help']" 
2017-07-09 18:37:23 [easyprocess] DEBUG: command: ['Xvfb', '-help']
2017-07-09 18:37:23 [easyprocess] DEBUG: joined command: Xvfb -help
2017-07-09 18:37:24 [easyprocess] DEBUG: process was started (pid=5235)
2017-07-09 18:37:26 [easyprocess] DEBUG: process has ended
2017-07-09 18:37:26 [easyprocess] DEBUG: return code=0
2017-07-09 18:37:26 [easyprocess] DEBUG: stdout=
2017-07-09 18:37:26 [easyprocess] DEBUG: stderr=use: X [:<display>] [option]
-a #                   default pointer acceleration (factor)
-ac                    disable access control restrictions
...
2017-07-09 18:38:35 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
Unhandled error in Deferred:
2017-07-09 18:38:35 [twisted] CRITICAL: Unhandled error in Deferred:

2017-07-09 18:38:35 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/spidy/.local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/crawler.py", line 76, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/crawler.py", line 99, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/spidy/var/scrapy/weibo/weibo/spiders/weibobrandspider.py", line 26, in __init__
    self.browser = webdriver.Firefox()
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
    keep_alive=True)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: connection refused
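Most of the noise in this log comes from the `easyprocess` logger (which pyvirtualdisplay uses to run `Xvfb`) and from Selenium's `selenium.webdriver.remote.remote_connection` logger, both at DEBUG level. A minimal sketch with the standard `logging` module that raises just those two loggers to WARNING while leaving other loggers alone; calling it at the top of the spider module is an assumption, not something from the question:

```python
import logging

# Logger names taken from the bracketed names in the log output above
NOISY_LOGGERS = ("easyprocess", "selenium.webdriver.remote.remote_connection")


def quiet_noisy_loggers(level: int = logging.WARNING) -> None:
    """Raise the threshold of chatty third-party loggers without touching others."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


quiet_noisy_loggers()
```

This hides the debug chatter but does not change the `connection refused` error itself.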

My environment:

Python 3.5.2
/usr/local/bin/geckodriver
Docker version 17.03.1-ce, build c6d412e
Mozilla Firefox 54.0
Ubuntu 16.04.2 LTS
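The same details can be collected from inside Python, which also shows whether `geckodriver` is on the `PATH` seen by the Python process itself. A minimal sketch that sticks to Python 3.5 features; the helper names and the choice of `--version` probes are assumptions, and it only reports what it finds:

```python
import platform
import shutil
import subprocess
from typing import Dict, Optional


def tool_version(cmd: str) -> Optional[str]:
    """Return the first output line of `<cmd> --version`, or None if cmd is unavailable."""
    path = shutil.which(cmd)
    if path is None:
        return None  # not on PATH for this process
    try:
        out = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = (out.stdout or out.stderr).strip().splitlines()
    return lines[0] if lines else ""


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the versions listed in the question (hypothetical helper)."""
    return {
        "python": platform.python_version(),
        "geckodriver": shutil.which("geckodriver"),
        "firefox": tool_version("firefox"),
        "docker": tool_version("docker"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```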

And the script:

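import scrapy
from pyvirtualdisplay import Display
from selenium import webdriver


class WeiboSpider(scrapy.Spider):
    name = "weibospider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Start a virtual X display (Xvfb) for Firefox to render into
        self.display = Display(visible=0, size=(1200, 1000))
        self.display.start()

        # This is the problematic line:
        self.browser = webdriver.Firefox()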

I've run out of ideas. What am I doing wrong, or what am I missing?


asked by Chris

29.11.2022 10:15
