Automate dataset collection with Peacock Data
Peacock Data searches the web, crawls pages and analyzes their semantic content, and builds a specialized dataset of relevant online content for your application.
AI-Enhanced Web Crawling
Targeted Web Crawling
Crawl the web from search queries or specific URLs to collect the data your application needs.
Semantic Filtering
Smart content extraction and filtering ensure your crawl datasets only include relevant content, saving time and money.
Captcha Bypass
Automatic captcha detection and resolution help you capture the highest-quality data from your crawls.
Scheduled Execution
Run crawl subjects once, or on a recurring schedule to keep your data current.
Integrated directly with Python
Import the peacockdata package to schedule and load your crawl datasets straight from your Python environment.
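A sketch of what that might look like (the function and attribute names here are illustrative assumptions, not the package's documented API):

```python
# Illustrative sketch only: peacockdata's real API may differ, and the
# names schedule_crawl / load_dataset below are assumptions.
import peacockdata

# Create a crawl subject from a search query and run it daily.
subject = peacockdata.schedule_crawl(
    query="electric vehicle charging stations",
    recurrence="daily",  # or "once" for a one-off crawl
)

# Load the resulting dataset once the crawl has run.
dataset = peacockdata.load_dataset(subject.id)
for page in dataset:
    print(page.url, len(page.text))
```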
Crawl Data Notebooks
Build peacockdata right into your pipelines to analyze web crawl data (sketched below the list):
- Download crawl artifacts
- Replay web pages
- Run SQL queries over crawl metadata
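For instance, a notebook cell might look like this (again, the method names are illustrative assumptions rather than the package's documented interface):

```python
# Illustrative sketch only: the method names below are assumptions
# about peacockdata's API.
import peacockdata

crawl = peacockdata.load_dataset("my-crawl-subject")

# Download crawl artifacts (raw HTML, screenshots, etc.) to local disk.
crawl.download_artifacts("./artifacts")

# Replay a captured page as it appeared at crawl time.
crawl.replay("https://example.com/pricing")

# Run a SQL query over crawl metadata.
rows = crawl.sql("""
    SELECT url, status_code, fetched_at
    FROM pages
    WHERE status_code >= 400
""")
```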
Use your crawl data to build a vector database for RAG applications, fine-tune an LLM, or monitor the web for events.
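As one example, crawled text can be indexed into a vector store for retrieval-augmented generation. The sketch below pairs a hypothetical peacockdata load with Chroma; the chromadb calls are that library's standard client API, while the peacockdata names are assumptions:

```python
import chromadb
import peacockdata  # load_dataset below is an assumed name; real API may differ

crawl = peacockdata.load_dataset("my-crawl-subject")

# An in-memory Chroma collection; documents are embedded with
# Chroma's default embedding model on insert.
client = chromadb.Client()
collection = client.create_collection(name="crawl_pages")

for page in crawl:
    collection.add(ids=[page.url], documents=[page.text])

# Retrieve the most relevant crawled pages for a RAG prompt.
results = collection.query(
    query_texts=["How do I reset my device?"],
    n_results=3,
)
```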
Find a plan that's right for you
Get started with a free plan to experiment with the API and start testing out your ideas. Scale up at low cost as you grow.
Free
- 24 hours of crawling per month
- 10 GiB of crawl data storage
- Unlimited captcha bypass
- Up to 1,000 crawl subjects

Pay as you go
- Crawl at $0.003/minute
- Store data at $0.025/GiB
- Up to 1,000,000 crawl subjects
- Priority crawl scheduling

Enterprise
- Bring your own S3 storage
- Unload to Apache Iceberg
- Unload to Snowflake
- Unlimited crawl subjects
Get started with Peacock Data
It only takes a few minutes to get your first crawl dataset going.