scrape by brightdata | skilld

[Skip to main content](#main-content)

[skilld](https://skilld.dev/)

[Skills](https://skilld.dev/skills) [Collections](https://skilld.dev/collections) [People](https://skilld.dev/people)

[GitHub repository (opens in new tab)](https://github.com/harlan-zw/skilld)

[All skills](https://skilld.dev/skills)

[![brightdata avatar](https://github.com/brightdata.png?size=96)brightdata profile](https://skilld.dev/orgs/brightdata)

# scrape

[brightdata](https://skilld.dev/orgs/brightdata)

Scrape web content as clean markdown/HTML/JSON via the Bright Data CLI (`bdata scrape`). Use when the user wants to fetch a page, extract content from a list of URLs, or crawl paginated listings. Hands off to `data-feeds` for supported platforms (Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, etc.) and to `search` when URLs must be discovered first. Requires the Bright Data CLI; proactively guides install + login if missing.

Community skill from brightdata, source updated 2 days ago.

22 68 11 Updated 2 days ago First seen 3 months agoactive·No curators yetSign in to curate

## Install

skilld

skills.sh

`npx -y skilld add gh:brightdata/skills -s scrape`

Works with Claude Code · Codex · Cursor · Copilot · Gemini CLI

[GitHub](https://github.com/brightdata/skills) [skills.sh](https://skills.sh/brightdata/scrape) [Raw](https://skilld.dev/api/skills-raw/brightdata/scrape)

## Skill content

Copy as markdown

Preview

Markdown

# Bright Data — Scrape

Get clean content (markdown, HTML, JSON, screenshot) from one or more URLs via the Bright Data CLI. This skill owns the "fetch raw or lightly-structured content" job. For platform-specific structured data (Amazon, LinkedIn, TikTok, etc.), **stop and use `data-feeds` instead** — you'll get clean JSON without selector logic.

## Setup gate (run first)

Before any scrape, verify the CLI is installed and authenticated:

```bash
if ! command -v bdata >/dev/null 2>&1; then
    echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
    echo "bdata not authenticated — run: bdata login  (or: bdata login --device for SSH)"
fi
```

If either check fails, halt and route the user to `skills/bright-data-best-practices/references/cli-setup.md`. Do not attempt the legacy `curl` fallback silently — ask the user first.

## Pick your path

| Situation | Action |
| --- | --- |
| Single URL | `bdata scrape <url> -f markdown` |
| Small list (≤ ~20 URLs) | shell loop, 1 at a time (see `references/patterns.md`) |
| Larger list (dozens+) | `xargs -P 4` with parallelism cap (see `references/patterns.md`) |
| Paginated listing | scrape page 1 → extract next-page URL → append → repeat (see `references/examples.md`) |
| JS-heavy / login-gated / interaction-required | escalate to `bdata browser` (see `brightdata-cli` skill) |
| Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, … | **stop — hand off to `data-feeds`** |
| No URL yet, just a topic | **hand off to `search`** |

## Action

Core commands:

```bash
# Clean markdown (default)
bdata scrape "https://example.com/article" -f markdown -o article.md

# Raw HTML (when you need the DOM)
bdata scrape "https://example.com" -f html -o page.html

# Structured JSON (when the Unlocker returns parsed fields)
bdata scrape "https://example.com" -f json --pretty -o page.json

# Visual snapshot (saves PNG)
bdata scrape "https://example.com" -f screenshot -o page.png

# Geo-targeted (override the exit country)
bdata scrape "https://example.com" --country de -f markdown
```

Full flag reference: [`references/flags.md`](https://skilld.dev/references/flags.md).

## Verification gate (run before claiming success)

1. **Non-empty output:** `test -s "$out_path"` — or, for stdout, at least 200 bytes of content.
2. **Not a block page** — grep the output for any of these signatures (case-insensitive):
   - `Access Denied`
   - `Just a moment`
   - `Attention Required`
   - `Checking your browser`
   - `captcha`
   - `cf-browser-verification`
   - `cloudflare` _(with < 2KB total body)_
3. **Expected markers present** for the task: e.g., a product page should contain a price pattern ( `\$\d`); an article should contain at least one `<h1>` or `#` heading.
4. **On failure, escalation ladder:**
   - Retry with a different `--country` (e.g., `--country de` if the origin site is US)
   - Escalate to `bdata browser` for full JS rendering (hand off to `brightdata-cli` skill)

Do not report success until all checks above pass.

## Red flags

- Claiming success without inspecting the output.
- Silencing errors with `2>/dev/null` — you'll miss auth failures and rate-limit errors.
- Running `bdata scrape` on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs — these are supported by `data-feeds` and return structured data directly. Scraping loses the structure.
- Scraping the same URL repeatedly in the same task — cache the first result.
- Looping `bdata scrape` sequentially for large lists instead of using `xargs -P 4` (or similar) with a parallelism cap.
- Using `curl` against `api.brightdata.com` directly — legacy path; only when the CLI isn't available.

## References

- [`references/flags.md`](https://skilld.dev/references/flags.md) — every flag with when-to-use notes.
- [`references/patterns.md`](https://skilld.dev/references/patterns.md) — shell-loop batching, `xargs` parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy `curl` fallback.
- [`references/examples.md`](https://skilld.dev/references/examples.md) — (1) single page → markdown, (2) batch a list of URLs with parallelism cap, (3) paginated listing, (4) block-page recovery.

Source: [SKILL.md on GitHub](https://github.com/brightdata/skills/blob/main/scrape/SKILL.md)

## Why curators picked this

No curator note yet. [Be the first to add yours](https://skilld.dev/collections/new?skill=scrape&skillsOwner=brightdata&skillsRepo=skills) — one line on why you reach for this skill.

## Install

skilld

skills.sh

`npx -y skilld add gh:brightdata/skills -s scrape`

Works with Claude Code · Codex · Cursor · Copilot · Gemini CLI

[GitHub](https://github.com/brightdata/skills) [skills.sh](https://skills.sh/brightdata/scrape) [Raw](https://skilld.dev/api/skills-raw/brightdata/scrape)

## Receipts

Indexed from [github.com/brightdata/skills](https://github.com/brightdata/skills) on branch `main`.

<dl>

<dt>SKILL.md</dt>
<dd>[skills/scrape/SKILL.md](https://github.com/brightdata/skills/blob/main/skills/scrape/SKILL.md)</dd>

<dt>History</dt>
<dd>[View commits](https://github.com/brightdata/skills/commits/main/skills/scrape/SKILL.md)</dd>

</dl>

## Related skills

From brightdata/skills

Other by brightdata

[![brightdata avatar](https://github.com/brightdata.png?size=48) bright-data-mcp brightdata](https://skilld.dev/skills/brightdata/bright-data-mcp) [![brightdata avatar](https://github.com/brightdata.png?size=48) data-feeds brightdata](https://skilld.dev/skills/brightdata/data-feeds) [![brightdata avatar](https://github.com/brightdata.png?size=48) bright-data-best-practices brightdata](https://skilld.dev/skills/brightdata/bright-data-best-practices)

[Stats](https://skilld.dev/skills/stats) [Accessibility](https://skilld.dev/accessibility)

[GitHub repository (opens in new tab)](https://github.com/harlan-zw/skilld)

Built by [Harlan Wilton](https://harlanzw.com)