[{"data":1,"prerenderedAt":1106},["ShallowReactive",2],{"post-en-playwright-scraping":3},{"id":4,"title":5,"body":6,"date":1093,"description":16,"excerpt":1094,"extension":1095,"meta":1096,"navigation":64,"path":1097,"readTime":98,"seo":1098,"slug":1099,"stem":1100,"tags":1101,"__hash__":1105},"en_blog/en/blog/playwright-scraping.md","Playwright as a Business Scraping Tool: Beyond E2E Testing",{"type":7,"value":8,"toc":1083},"minimark",[9,13,17,22,25,28,32,35,242,245,249,252,358,365,369,372,530,533,537,540,597,601,604,709,716,720,957,963,967,1068,1079],[10,11,5],"h1",{"id":12},"playwright-as-a-business-scraping-tool-beyond-e2e-testing",[14,15,16],"p",{},"The overwhelming majority of Playwright articles discuss end-to-end testing. That is its most visible use case, but far from its only one. When a business portal exposes no API — or one that is incomplete, poorly documented, or restricted to select partners — Playwright becomes a first-class automation tool. Here is how to use it seriously, outside a testing context.",[18,19,21],"h2",{"id":20},"the-concrete-problem-a-portal-with-no-usable-api","The Concrete Problem: A Portal With No Usable API",[14,23,24],{},"Some business portals offer a rich web interface but a limited or absent API. Data extraction, exports, form submission — everything goes through the browser. BeautifulSoup and requests stop there: they cannot handle JavaScript, single-page applications, or complex authentication flows involving MFA or OAuth2 redirections.",[14,26,27],{},"Playwright handles all of this natively.",[18,29,31],{"id":30},"scraper-architecture","Scraper Architecture",[14,33,34],{},"The goal is a scraper that authenticates reliably, navigates and extracts structured data, can be restarted without human intervention, and runs in a containerised environment.",[36,37,42],"pre",{"className":38,"code":39,"language":40,"meta":41,"style":41},"language-python shiki shiki-themes github-dark github-light","from playwright.async_api import async_playwright, Browser, Page\nfrom dataclasses import dataclass\n\n@dataclass\nclass ScraperConfig:\n    base_url: str\n    username: str\n    password: str\n    headless: bool = True\n    timeout: int = 30_000  # ms\n\nclass BusinessScraper:\n    def __init__(self, config: ScraperConfig):\n        self.config = config\n        self._browser: Browser | None = None\n        self._page: Page | None = None\n\n    async def __aenter__(self):\n        self._playwright = await async_playwright().start()\n        self._browser = await self._playwright.chromium.launch(\n            headless=self.config.headless,\n            args=[\"--no-sandbox\", \"--disable-dev-shm-usage\"]  # Required in Docker\n        )\n        context = await self._browser.new_context(\n            viewport={\"width\": 1280, \"height\": 800},\n            locale=\"en-GB\"\n        )\n        self._page = await context.new_page()\n        return self\n\n    async def __aexit__(self, *args):\n        await self._browser.close()\n        await self._playwright.stop()\n","python","",[43,44,45,53,59,66,72,78,84,90,96,102,108,113,119,125,131,137,143,148,154,160,166,172,178,184,190,196,202,207,213,219,224,230,236],"code",{"__ignoreMap":41},[46,47,50],"span",{"class":48,"line":49},"line",1,[46,51,52],{},"from playwright.async_api import async_playwright, Browser, Page\n",[46,54,56],{"class":48,"line":55},2,[46,57,58],{},"from dataclasses import dataclass\n",[46,60,62],{"class":48,"line":61},3,[46,63,65],{"emptyLinePlaceholder":64},true,"\n",[46,67,69],{"class":48,"line":68},4,[46,70,71],{},"@dataclass\n",[46,73,75],{"class":48,"line":74},5,[46,76,77],{},"class ScraperConfig:\n",[46,79,81],{"class":48,"line":80},6,[46,82,83],{},"    base_url: str\n",[46,85,87],{"class":48,"line":86},7,[46,88,89],{},"    username: str\n",[46,91,93],{"class":48,"line":92},8,[46,94,95],{},"    password: str\n",[46,97,99],{"class":48,"line":98},9,[46,100,101],{},"    headless: bool = True\n",[46,103,105],{"class":48,"line":104},10,[46,106,107],{},"    timeout: int = 30_000  # ms\n",[46,109,111],{"class":48,"line":110},11,[46,112,65],{"emptyLinePlaceholder":64},[46,114,116],{"class":48,"line":115},12,[46,117,118],{},"class BusinessScraper:\n",[46,120,122],{"class":48,"line":121},13,[46,123,124],{},"    def __init__(self, config: ScraperConfig):\n",[46,126,128],{"class":48,"line":127},14,[46,129,130],{},"        self.config = config\n",[46,132,134],{"class":48,"line":133},15,[46,135,136],{},"        self._browser: Browser | None = None\n",[46,138,140],{"class":48,"line":139},16,[46,141,142],{},"        self._page: Page | None = None\n",[46,144,146],{"class":48,"line":145},17,[46,147,65],{"emptyLinePlaceholder":64},[46,149,151],{"class":48,"line":150},18,[46,152,153],{},"    async def __aenter__(self):\n",[46,155,157],{"class":48,"line":156},19,[46,158,159],{},"        self._playwright = await async_playwright().start()\n",[46,161,163],{"class":48,"line":162},20,[46,164,165],{},"        self._browser = await self._playwright.chromium.launch(\n",[46,167,169],{"class":48,"line":168},21,[46,170,171],{},"            headless=self.config.headless,\n",[46,173,175],{"class":48,"line":174},22,[46,176,177],{},"            args=[\"--no-sandbox\", \"--disable-dev-shm-usage\"]  # Required in Docker\n",[46,179,181],{"class":48,"line":180},23,[46,182,183],{},"        )\n",[46,185,187],{"class":48,"line":186},24,[46,188,189],{},"        context = await self._browser.new_context(\n",[46,191,193],{"class":48,"line":192},25,[46,194,195],{},"            viewport={\"width\": 1280, \"height\": 800},\n",[46,197,199],{"class":48,"line":198},26,[46,200,201],{},"            locale=\"en-GB\"\n",[46,203,205],{"class":48,"line":204},27,[46,206,183],{},[46,208,210],{"class":48,"line":209},28,[46,211,212],{},"        self._page = await context.new_page()\n",[46,214,216],{"class":48,"line":215},29,[46,217,218],{},"        return self\n",[46,220,222],{"class":48,"line":221},30,[46,223,65],{"emptyLinePlaceholder":64},[46,225,227],{"class":48,"line":226},31,[46,228,229],{},"    async def __aexit__(self, *args):\n",[46,231,233],{"class":48,"line":232},32,[46,234,235],{},"        await self._browser.close()\n",[46,237,239],{"class":48,"line":238},33,[46,240,241],{},"        await self._playwright.stop()\n",[14,243,244],{},"The context manager ensures the browser closes cleanly even in the event of an exception — essential in production.",[18,246,248],{"id":247},"robust-authentication","Robust Authentication",[14,250,251],{},"Authentication is the most fragile part of any scraper. Portals change their UI, introduce additional security steps, or add delays. A few principles for making it reliable:",[36,253,255],{"className":38,"code":254,"language":40,"meta":41,"style":41},"async def login(self) -> bool:\n    page = self._page\n    await page.goto(f\"{self.config.base_url}/login\", wait_until=\"networkidle\")\n\n    # Wait for the specific element, not just page load\n    await page.wait_for_selector(\"#username\", state=\"visible\", timeout=10_000)\n    await page.fill(\"#username\", self.config.username)\n    await page.fill(\"#password\", self.config.password)\n\n    # Intercept the login response to detect auth failures precisely\n    async with page.expect_response(\n        lambda r: \"/api/auth\" in r.url and r.status in (200, 401, 403)\n    ) as response_info:\n        await page.click('[type=\"submit\"]')\n\n    response = await response_info.value\n    if response.status != 200:\n        raise AuthenticationError(f\"Login failed: HTTP {response.status}\")\n\n    await page.wait_for_url(f\"{self.config.base_url}/dashboard\", timeout=15_000)\n    return True\n",[43,256,257,262,267,272,276,281,286,291,296,300,305,310,315,320,325,329,334,339,344,348,353],{"__ignoreMap":41},[46,258,259],{"class":48,"line":49},[46,260,261],{},"async def login(self) -> bool:\n",[46,263,264],{"class":48,"line":55},[46,265,266],{},"    page = self._page\n",[46,268,269],{"class":48,"line":61},[46,270,271],{},"    await page.goto(f\"{self.config.base_url}/login\", wait_until=\"networkidle\")\n",[46,273,274],{"class":48,"line":68},[46,275,65],{"emptyLinePlaceholder":64},[46,277,278],{"class":48,"line":74},[46,279,280],{},"    # Wait for the specific element, not just page load\n",[46,282,283],{"class":48,"line":80},[46,284,285],{},"    await page.wait_for_selector(\"#username\", state=\"visible\", timeout=10_000)\n",[46,287,288],{"class":48,"line":86},[46,289,290],{},"    await page.fill(\"#username\", self.config.username)\n",[46,292,293],{"class":48,"line":92},[46,294,295],{},"    await page.fill(\"#password\", self.config.password)\n",[46,297,298],{"class":48,"line":98},[46,299,65],{"emptyLinePlaceholder":64},[46,301,302],{"class":48,"line":104},[46,303,304],{},"    # Intercept the login response to detect auth failures precisely\n",[46,306,307],{"class":48,"line":110},[46,308,309],{},"    async with page.expect_response(\n",[46,311,312],{"class":48,"line":115},[46,313,314],{},"        lambda r: \"/api/auth\" in r.url and r.status in (200, 401, 403)\n",[46,316,317],{"class":48,"line":121},[46,318,319],{},"    ) as response_info:\n",[46,321,322],{"class":48,"line":127},[46,323,324],{},"        await page.click('[type=\"submit\"]')\n",[46,326,327],{"class":48,"line":133},[46,328,65],{"emptyLinePlaceholder":64},[46,330,331],{"class":48,"line":139},[46,332,333],{},"    response = await response_info.value\n",[46,335,336],{"class":48,"line":145},[46,337,338],{},"    if response.status != 200:\n",[46,340,341],{"class":48,"line":150},[46,342,343],{},"        raise AuthenticationError(f\"Login failed: HTTP {response.status}\")\n",[46,345,346],{"class":48,"line":156},[46,347,65],{"emptyLinePlaceholder":64},[46,349,350],{"class":48,"line":162},[46,351,352],{},"    await page.wait_for_url(f\"{self.config.base_url}/dashboard\", timeout=15_000)\n",[46,354,355],{"class":48,"line":168},[46,356,357],{},"    return True\n",[14,359,360,361,364],{},"Network response interception (",[43,362,363],{},"expect_response",") is more reliable than waiting for a CSS selector after the click — it detects authentication failures without depending on how the error message happens to be rendered.",[18,366,368],{"id":367},"extracting-structured-data","Extracting Structured Data",[14,370,371],{},"Once authenticated, extraction must be deterministic. Playwright allows combining DOM navigation and network interception, depending on which is more stable:",[36,373,375],{"className":38,"code":374,"language":40,"meta":41,"style":41},"async def extract_certificates(self, period: str) -> list[dict]:\n    page = self._page\n    await page.goto(\n        f\"{self.config.base_url}/certificates?period={period}\",\n        wait_until=\"networkidle\"\n    )\n\n    # Strategy 1: intercept the underlying API call when available\n    async with page.expect_response(\n        lambda r: \"/api/certificates\" in r.url\n    ) as api_response:\n        await page.click(\"#load-certificates\")\n\n    data = await (await api_response.value).json()\n    return data.get(\"items\", [])\n\nasync def extract_table_data(self) -> list[dict]:\n    \"\"\"Strategy 2: extract directly from the DOM.\"\"\"\n    rows = await self._page.query_selector_all(\"table.data-grid tbody tr\")\n    results = []\n\n    for row in rows:\n        cells = await row.query_selector_all(\"td\")\n        values = [await cell.inner_text() for cell in cells]\n        results.append({\n            \"id\": values[0].strip(),\n            \"date\": values[1].strip(),\n            \"volume\": float(values[2].replace(\",\", \".\")),\n            \"status\": values[3].strip(),\n        })\n\n    return results\n",[43,376,377,382,386,391,396,401,406,410,415,419,424,429,434,438,443,448,452,457,462,467,472,476,481,486,491,496,501,506,511,516,521,525],{"__ignoreMap":41},[46,378,379],{"class":48,"line":49},[46,380,381],{},"async def extract_certificates(self, period: str) -> list[dict]:\n",[46,383,384],{"class":48,"line":55},[46,385,266],{},[46,387,388],{"class":48,"line":61},[46,389,390],{},"    await page.goto(\n",[46,392,393],{"class":48,"line":68},[46,394,395],{},"        f\"{self.config.base_url}/certificates?period={period}\",\n",[46,397,398],{"class":48,"line":74},[46,399,400],{},"        wait_until=\"networkidle\"\n",[46,402,403],{"class":48,"line":80},[46,404,405],{},"    )\n",[46,407,408],{"class":48,"line":86},[46,409,65],{"emptyLinePlaceholder":64},[46,411,412],{"class":48,"line":92},[46,413,414],{},"    # Strategy 1: intercept the underlying API call when available\n",[46,416,417],{"class":48,"line":98},[46,418,309],{},[46,420,421],{"class":48,"line":104},[46,422,423],{},"        lambda r: \"/api/certificates\" in r.url\n",[46,425,426],{"class":48,"line":110},[46,427,428],{},"    ) as api_response:\n",[46,430,431],{"class":48,"line":115},[46,432,433],{},"        await page.click(\"#load-certificates\")\n",[46,435,436],{"class":48,"line":121},[46,437,65],{"emptyLinePlaceholder":64},[46,439,440],{"class":48,"line":127},[46,441,442],{},"    data = await (await api_response.value).json()\n",[46,444,445],{"class":48,"line":133},[46,446,447],{},"    return data.get(\"items\", [])\n",[46,449,450],{"class":48,"line":139},[46,451,65],{"emptyLinePlaceholder":64},[46,453,454],{"class":48,"line":145},[46,455,456],{},"async def extract_table_data(self) -> list[dict]:\n",[46,458,459],{"class":48,"line":150},[46,460,461],{},"    \"\"\"Strategy 2: extract directly from the DOM.\"\"\"\n",[46,463,464],{"class":48,"line":156},[46,465,466],{},"    rows = await self._page.query_selector_all(\"table.data-grid tbody tr\")\n",[46,468,469],{"class":48,"line":162},[46,470,471],{},"    results = []\n",[46,473,474],{"class":48,"line":168},[46,475,65],{"emptyLinePlaceholder":64},[46,477,478],{"class":48,"line":174},[46,479,480],{},"    for row in rows:\n",[46,482,483],{"class":48,"line":180},[46,484,485],{},"        cells = await row.query_selector_all(\"td\")\n",[46,487,488],{"class":48,"line":186},[46,489,490],{},"        values = [await cell.inner_text() for cell in cells]\n",[46,492,493],{"class":48,"line":192},[46,494,495],{},"        results.append({\n",[46,497,498],{"class":48,"line":198},[46,499,500],{},"            \"id\": values[0].strip(),\n",[46,502,503],{"class":48,"line":204},[46,504,505],{},"            \"date\": values[1].strip(),\n",[46,507,508],{"class":48,"line":209},[46,509,510],{},"            \"volume\": float(values[2].replace(\",\", \".\")),\n",[46,512,513],{"class":48,"line":215},[46,514,515],{},"            \"status\": values[3].strip(),\n",[46,517,518],{"class":48,"line":221},[46,519,520],{},"        })\n",[46,522,523],{"class":48,"line":226},[46,524,65],{"emptyLinePlaceholder":64},[46,526,527],{"class":48,"line":232},[46,528,529],{},"    return results\n",[14,531,532],{},"Strategy 1 (network interception) is preferable when available: raw JSON data is cleaner and less sensitive to layout changes. Strategy 2 (DOM extraction) is the universal fallback.",[18,534,536],{"id":535},"handling-file-downloads","Handling File Downloads",[14,538,539],{},"Many portals offer Excel or CSV exports via a download button. Playwright handles this natively:",[36,541,543],{"className":38,"code":542,"language":40,"meta":41,"style":41},"async def download_export(self, output_path: str) -> str:\n    async with self._page.expect_download() as download_info:\n        await self._page.click(\"#export-button\")\n\n    download = await download_info.value\n\n    if download.failure():\n        raise ExportError(f\"Download failed: {download.failure()}\")\n\n    await download.save_as(output_path)\n    return output_path\n",[43,544,545,550,555,560,564,569,573,578,583,587,592],{"__ignoreMap":41},[46,546,547],{"class":48,"line":49},[46,548,549],{},"async def download_export(self, output_path: str) -> str:\n",[46,551,552],{"class":48,"line":55},[46,553,554],{},"    async with self._page.expect_download() as download_info:\n",[46,556,557],{"class":48,"line":61},[46,558,559],{},"        await self._page.click(\"#export-button\")\n",[46,561,562],{"class":48,"line":68},[46,563,65],{"emptyLinePlaceholder":64},[46,565,566],{"class":48,"line":74},[46,567,568],{},"    download = await download_info.value\n",[46,570,571],{"class":48,"line":80},[46,572,65],{"emptyLinePlaceholder":64},[46,574,575],{"class":48,"line":86},[46,576,577],{},"    if download.failure():\n",[46,579,580],{"class":48,"line":92},[46,581,582],{},"        raise ExportError(f\"Download failed: {download.failure()}\")\n",[46,584,585],{"class":48,"line":98},[46,586,65],{"emptyLinePlaceholder":64},[46,588,589],{"class":48,"line":104},[46,590,591],{},"    await download.save_as(output_path)\n",[46,593,594],{"class":48,"line":110},[46,595,596],{},"    return output_path\n",[18,598,600],{"id":599},"running-in-docker-and-openshift","Running in Docker and OpenShift",[14,602,603],{},"Playwright in a container requires Chromium's system dependencies:",[36,605,609],{"className":606,"code":607,"language":608,"meta":41,"style":41},"language-dockerfile shiki shiki-themes github-dark github-light","FROM python:3.12-slim\n\nRUN apt-get update && apt-get install -y \\\n    libnss3 libatk1.0-0 libatk-bridge2.0-0 \\\n    libcups2 libdrm2 libxkbcommon0 libxcomposite1 \\\n    libxdamage1 libxfixes3 libxrandr2 libgbm1 \\\n    libasound2 libpango-1.0-0 libcairo2 \\\n    && rm -rf /var/lib/apt/lists/*\n\nWORKDIR /app\n\nRUN chown -R 1001:0 /app && chmod -R g=u /app\n\nCOPY --chown=1001:0 requirements.txt .\nRUN pip install --no-cache-dir -r requirements.txt && playwright install chromium\n\nCOPY --chown=1001:0 . .\n\nUSER 1001\n\nCMD [\"python\", \"scraper.py\"]\n","dockerfile",[43,610,611,616,620,625,630,635,640,645,650,654,659,663,668,672,677,682,686,691,695,700,704],{"__ignoreMap":41},[46,612,613],{"class":48,"line":49},[46,614,615],{},"FROM python:3.12-slim\n",[46,617,618],{"class":48,"line":55},[46,619,65],{"emptyLinePlaceholder":64},[46,621,622],{"class":48,"line":61},[46,623,624],{},"RUN apt-get update && apt-get install -y \\\n",[46,626,627],{"class":48,"line":68},[46,628,629],{},"    libnss3 libatk1.0-0 libatk-bridge2.0-0 \\\n",[46,631,632],{"class":48,"line":74},[46,633,634],{},"    libcups2 libdrm2 libxkbcommon0 libxcomposite1 \\\n",[46,636,637],{"class":48,"line":80},[46,638,639],{},"    libxdamage1 libxfixes3 libxrandr2 libgbm1 \\\n",[46,641,642],{"class":48,"line":86},[46,643,644],{},"    libasound2 libpango-1.0-0 libcairo2 \\\n",[46,646,647],{"class":48,"line":92},[46,648,649],{},"    && rm -rf /var/lib/apt/lists/*\n",[46,651,652],{"class":48,"line":98},[46,653,65],{"emptyLinePlaceholder":64},[46,655,656],{"class":48,"line":104},[46,657,658],{},"WORKDIR /app\n",[46,660,661],{"class":48,"line":110},[46,662,65],{"emptyLinePlaceholder":64},[46,664,665],{"class":48,"line":115},[46,666,667],{},"RUN chown -R 1001:0 /app && chmod -R g=u /app\n",[46,669,670],{"class":48,"line":121},[46,671,65],{"emptyLinePlaceholder":64},[46,673,674],{"class":48,"line":127},[46,675,676],{},"COPY --chown=1001:0 requirements.txt .\n",[46,678,679],{"class":48,"line":133},[46,680,681],{},"RUN pip install --no-cache-dir -r requirements.txt && playwright install chromium\n",[46,683,684],{"class":48,"line":139},[46,685,65],{"emptyLinePlaceholder":64},[46,687,688],{"class":48,"line":145},[46,689,690],{},"COPY --chown=1001:0 . .\n",[46,692,693],{"class":48,"line":150},[46,694,65],{"emptyLinePlaceholder":64},[46,696,697],{"class":48,"line":156},[46,698,699],{},"USER 1001\n",[46,701,702],{"class":48,"line":162},[46,703,65],{"emptyLinePlaceholder":64},[46,705,706],{"class":48,"line":168},[46,707,708],{},"CMD [\"python\", \"scraper.py\"]\n",[14,710,711,712,715],{},"On OpenShift, ",[43,713,714],{},"--no-sandbox"," is mandatory: containers do not have the privileges required by Chromium's sandbox. This is not a security concern in this context — the sandbox protects against malicious web content, which does not apply to a scraper targeting a known internal portal.",[18,717,719],{"id":718},"orchestrating-with-a-kubernetes-cronjob","Orchestrating with a Kubernetes CronJob",[36,721,725],{"className":722,"code":723,"language":724,"meta":41,"style":41},"language-yaml shiki shiki-themes github-dark github-light","apiVersion: batch/v1\nkind: CronJob\nmetadata:\n  name: business-scraper\nspec:\n  schedule: \"0 6 * * 1-5\"\n  concurrencyPolicy: Forbid\n  jobTemplate:\n    spec:\n      template:\n        spec:\n          containers:\n            - name: scraper\n              image: registry.internal/business-scraper:latest\n              env:\n                - name: SCRAPER_USERNAME\n                  valueFrom:\n                    secretKeyRef:\n                      name: scraper-credentials\n                      key: username\n                - name: SCRAPER_PASSWORD\n                  valueFrom:\n                    secretKeyRef:\n                      name: scraper-credentials\n                      key: password\n          restartPolicy: OnFailure\n","yaml",[43,726,727,741,751,759,769,776,786,796,803,810,817,824,831,844,854,861,873,880,887,897,907,918,924,930,938,947],{"__ignoreMap":41},[46,728,729,733,737],{"class":48,"line":49},[46,730,732],{"class":731},"sZkSk","apiVersion",[46,734,736],{"class":735},"sQ3_J",": ",[46,738,740],{"class":739},"sg6BJ","batch/v1\n",[46,742,743,746,748],{"class":48,"line":55},[46,744,745],{"class":731},"kind",[46,747,736],{"class":735},[46,749,750],{"class":739},"CronJob\n",[46,752,753,756],{"class":48,"line":61},[46,754,755],{"class":731},"metadata",[46,757,758],{"class":735},":\n",[46,760,761,764,766],{"class":48,"line":68},[46,762,763],{"class":731},"  name",[46,765,736],{"class":735},[46,767,768],{"class":739},"business-scraper\n",[46,770,771,774],{"class":48,"line":74},[46,772,773],{"class":731},"spec",[46,775,758],{"class":735},[46,777,778,781,783],{"class":48,"line":80},[46,779,780],{"class":731},"  schedule",[46,782,736],{"class":735},[46,784,785],{"class":739},"\"0 6 * * 1-5\"\n",[46,787,788,791,793],{"class":48,"line":86},[46,789,790],{"class":731},"  concurrencyPolicy",[46,792,736],{"class":735},[46,794,795],{"class":739},"Forbid\n",[46,797,798,801],{"class":48,"line":92},[46,799,800],{"class":731},"  jobTemplate",[46,802,758],{"class":735},[46,804,805,808],{"class":48,"line":98},[46,806,807],{"class":731},"    spec",[46,809,758],{"class":735},[46,811,812,815],{"class":48,"line":104},[46,813,814],{"class":731},"      template",[46,816,758],{"class":735},[46,818,819,822],{"class":48,"line":110},[46,820,821],{"class":731},"        spec",[46,823,758],{"class":735},[46,825,826,829],{"class":48,"line":115},[46,827,828],{"class":731},"          containers",[46,830,758],{"class":735},[46,832,833,836,839,841],{"class":48,"line":121},[46,834,835],{"class":735},"            - ",[46,837,838],{"class":731},"name",[46,840,736],{"class":735},[46,842,843],{"class":739},"scraper\n",[46,845,846,849,851],{"class":48,"line":127},[46,847,848],{"class":731},"              image",[46,850,736],{"class":735},[46,852,853],{"class":739},"registry.internal/business-scraper:latest\n",[46,855,856,859],{"class":48,"line":133},[46,857,858],{"class":731},"              env",[46,860,758],{"class":735},[46,862,863,866,868,870],{"class":48,"line":139},[46,864,865],{"class":735},"                - ",[46,867,838],{"class":731},[46,869,736],{"class":735},[46,871,872],{"class":739},"SCRAPER_USERNAME\n",[46,874,875,878],{"class":48,"line":145},[46,876,877],{"class":731},"                  valueFrom",[46,879,758],{"class":735},[46,881,882,885],{"class":48,"line":150},[46,883,884],{"class":731},"                    secretKeyRef",[46,886,758],{"class":735},[46,888,889,892,894],{"class":48,"line":156},[46,890,891],{"class":731},"                      name",[46,893,736],{"class":735},[46,895,896],{"class":739},"scraper-credentials\n",[46,898,899,902,904],{"class":48,"line":162},[46,900,901],{"class":731},"                      key",[46,903,736],{"class":735},[46,905,906],{"class":739},"username\n",[46,908,909,911,913,915],{"class":48,"line":168},[46,910,865],{"class":735},[46,912,838],{"class":731},[46,914,736],{"class":735},[46,916,917],{"class":739},"SCRAPER_PASSWORD\n",[46,919,920,922],{"class":48,"line":174},[46,921,877],{"class":731},[46,923,758],{"class":735},[46,925,926,928],{"class":48,"line":180},[46,927,884],{"class":731},[46,929,758],{"class":735},[46,931,932,934,936],{"class":48,"line":186},[46,933,891],{"class":731},[46,935,736],{"class":735},[46,937,896],{"class":739},[46,939,940,942,944],{"class":48,"line":192},[46,941,901],{"class":731},[46,943,736],{"class":735},[46,945,946],{"class":739},"password\n",[46,948,949,952,954],{"class":48,"line":198},[46,950,951],{"class":731},"          restartPolicy",[46,953,736],{"class":735},[46,955,956],{"class":739},"OnFailure\n",[14,958,959,962],{},[43,960,961],{},"concurrencyPolicy: Forbid"," is critical: if one execution takes longer than expected, you do not want two scrapers authenticating simultaneously with the same account.",[18,964,966],{"id":965},"playwright-vs-the-alternatives","Playwright vs the Alternatives",[968,969,970,989],"table",{},[971,972,973],"thead",{},[974,975,976,980,983,986],"tr",{},[977,978,979],"th",{},"Criterion",[977,981,982],{},"requests + BS4",[977,984,985],{},"Selenium",[977,987,988],{},"Playwright",[990,991,992,1006,1019,1030,1043,1057],"tbody",{},[974,993,994,998,1001,1004],{},[995,996,997],"td",{},"SPAs / JavaScript",[995,999,1000],{},"No",[995,1002,1003],{},"Yes",[995,1005,1003],{},[974,1007,1008,1011,1013,1016],{},[995,1009,1010],{},"Network interception",[995,1012,1000],{},[995,1014,1015],{},"Partial",[995,1017,1018],{},"Native",[974,1020,1021,1024,1026,1028],{},[995,1022,1023],{},"Native async",[995,1025,1000],{},[995,1027,1000],{},[995,1029,1003],{},[974,1031,1032,1035,1038,1041],{},[995,1033,1034],{},"CI/CD stability",[995,1036,1037],{},"Good",[995,1039,1040],{},"Fragile",[995,1042,1037],{},[974,1044,1045,1048,1051,1054],{},[995,1046,1047],{},"Docker support",[995,1049,1050],{},"Simple",[995,1052,1053],{},"Complex",[995,1055,1056],{},"Reasonable",[974,1058,1059,1062,1064,1066],{},[995,1060,1061],{},"Modern API",[995,1063,1000],{},[995,1065,1000],{},[995,1067,1003],{},[14,1069,1070,1071,1074,1075,1078],{},"For simple static sites, ",[43,1072,1073],{},"requests"," and ",[43,1076,1077],{},"BeautifulSoup"," remain faster to set up and lighter to operate. However, as soon as complex authentication, dynamic JavaScript, or user interactions are involved — Playwright is the most robust open-source option available today.",[1080,1081,1082],"style",{},"html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sZkSk, html code.shiki .sZkSk{--shiki-dark:#85E89D;--shiki-default:#22863A}html pre.shiki code .sQ3_J, html code.shiki .sQ3_J{--shiki-dark:#E1E4E8;--shiki-default:#24292E}html pre.shiki code .sg6BJ, html code.shiki .sg6BJ{--shiki-dark:#9ECBFF;--shiki-default:#032F62}",{"title":41,"searchDepth":55,"depth":55,"links":1084},[1085,1086,1087,1088,1089,1090,1091,1092],{"id":20,"depth":55,"text":21},{"id":30,"depth":55,"text":31},{"id":247,"depth":55,"text":248},{"id":367,"depth":55,"text":368},{"id":535,"depth":55,"text":536},{"id":599,"depth":55,"text":600},{"id":718,"depth":55,"text":719},{"id":965,"depth":55,"text":966},"2024-01-03",null,"md",{},"/en/blog/playwright-scraping",{"title":5,"description":16},"playwright-scraping","en/blog/playwright-scraping",[988,1102,1103,1104],"Python","Scraping","Automation","oQUUW112w2iMAmal0nq2gp-lej3gpyVIgTM7tHGKHWs",1774645635824]