CSS selectors are patterns that match specific elements in an HTML or XML document so that their content can be extracted. They are a concise and flexible way to pull data out of web pages, and they are supported by Scrapy, a Python-based web scraping framework, through the css() method of its Selector class.
Here are some examples of CSS selectors and their usage in Scrapy:
Select all div elements: div
from scrapy import Selector
html = '<div><p>First paragraph.</p></div><div><p>Second paragraph.</p></div>'
selector = Selector(text=html)
# Select all div elements
divs = selector.css('div')
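In this example, we use the css() method to select all div elements in the HTML document. The method returns a SelectorList with one Selector per matching element, so divs holds two entries here.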
Select all elements with class name example: .example
from scrapy import Selector
html = '<div class="example">This is an example.</div><p>This is not an example.</p>'
selector = Selector(text=html)
# Select all elements with class name "example"
elements = selector.css('.example')
In this example, we use the css() method to select all elements with the class name example in the HTML document. Only the div matches; the p element has no class attribute and is skipped.
Select the text of the first p element inside a div element: div p:first-of-type::text
from scrapy import Selector
html = '<div><p>First paragraph.</p><p>Second paragraph.</p></div>'
selector = Selector(text=html)
# Select the text content of the first p element inside a div element
text = selector.css('div p:first-of-type::text').get()
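In this example, we use the css() method to select the text content of the first p element inside a div element in the HTML document. The ::text pseudo-element selects the text nodes of the element rather than the element itself; it is a Scrapy extension and is not part of standard CSS. The get() method returns the first match as a string, or None if nothing matches.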
CSS selectors can also be combined using combinators, attribute selectors, pseudo-classes, and pseudo-elements to create more specific selections. For more information on CSS selectors, you can refer to the W3C specification at https://www.w3.org/TR/selectors-3/.