Scrapy remains the most powerful open-source framework for structured web scraping in Python in 2026. This updated guide shows how to build a clean, modern "spider" (crawler) with Scrapy 2.14+, Python 3.11–3.13, and current best practices, including async support.
What is a Scrapy Spider?
A spider defines how to crawl a site (start URLs), how to parse pages, and what data to extract. In 2026, spiders are often written with async methods for better performance.
Step 1 – Project Setup (2026 style)
```shell
pip install scrapy
scrapy startproject classy_spider_2026
cd classy_spider_2026
```
Step 2 – Create the Spider (modern async style)
```python
# classy_spider_2026/spiders/quotes.py
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    async def start(self):
        # Modern async entry point (Scrapy 2.13+); replaces start_requests().
        # Yield the initial requests here -- overriding this with a bare
        # `pass` would mean no requests are sent at all.
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse)

    async def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }
        # Follow pagination until there is no "next" link.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
```
Step 3 – Run the Spider
```shell
scrapy crawl quotes -o quotes2026.json
# or CSV: scrapy crawl quotes -o quotes2026.csv
# note: -o appends to an existing file; use -O to overwrite it
```
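Once the export exists, downstream code can consume it directly. A minimal sketch, assuming the `quotes2026.json` filename from the command above (the `.json` feed format writes a single JSON array):

```python
import json
from pathlib import Path


def load_quotes(path: str = "quotes2026.json") -> list[dict]:
    """Read a Scrapy JSON feed export (one JSON array) into a list of dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `quotes = load_quotes()` followed by `len(quotes)` gives the number of scraped items.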
2026 Best Practices & Tips
- Use `async def parse()` and `async def start()` for better concurrency
- Set `DOWNLOAD_DELAY = 1.5` and `CONCURRENT_REQUESTS_PER_DOMAIN = 2` by default
- Add Playwright integration for JS-heavy sites: `pip install scrapy-playwright`
- Use Item Loaders or Pydantic for clean data validation
- Always respect robots.txt and add realistic User-Agent rotation
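The throttling and robots.txt recommendations above translate into a `settings.py` fragment like this (a sketch; `ROBOTSTXT_OBEY` is already enabled in freshly generated projects, shown here for emphasis):

```python
# classy_spider_2026/settings.py (excerpt)
BOT_NAME = "classy_spider_2026"

ROBOTSTXT_OBEY = True                # honor robots.txt before crawling
DOWNLOAD_DELAY = 1.5                 # seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 2   # stay polite per domain
```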
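User-Agent rotation can be done with a small downloader middleware. A hedged sketch: the class name, module path, and the sample UA strings below are illustrative, not part of Scrapy itself.

```python
import random

# Illustrative pool; in practice, use current, realistic browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


class RotateUserAgentMiddleware:
    """Downloader middleware that assigns a random User-Agent per request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # let Scrapy continue normal request processing
```

To activate it, register the class in `DOWNLOADER_MIDDLEWARES` in `settings.py`, e.g. `{"classy_spider_2026.middlewares.RotateUserAgentMiddleware": 400}` (the module path is an assumption about where you place the file).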
Last updated: March 19, 2026 – Scrapy 2.14 brings native asyncio runners and better coroutine support. This makes spiders more efficient than ever.