Once you have selected elements from an HTML or XML document with a Scrapy selector's xpath() or css() method, you have a SelectorList object that contains the matches. Scrapy provides several methods for extracting data from that SelectorList as strings.
Here are some of the most commonly used methods for extracting data from a SelectorList:
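get(): This method returns the first match as a string. What the string contains depends on the query: a text query such as //title/text() yields text, an attribute query such as //img/@src yields the attribute value, and an element query such as a yields the element's full HTML or XML markup. If there are no selected elements, it returns None.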
titles = selector.xpath('//title/text()')
title = titles.get()
getall(): This method returns a list of strings, one for every match. Each string is text, an attribute value, or markup, depending on the query. If there are no selected elements, it returns an empty list.
paragraphs = selector.css('p::text')
text = paragraphs.getall()
extract(): This is the older name for getall(). Called on a SelectorList, it returns a list with the HTML or XML content of every selected element, not only the first one. If there are no selected elements, it returns an empty list.
links = selector.css('a')
all_links = links.extract()
extract_first(): This is the older name for get(). It returns the first match as a string, which in the example below is the value of the first src attribute rather than HTML markup. If there are no selected elements, it returns None.
images = selector.xpath('//img/@src')
image = images.extract_first()
Note that get() and extract_first() return a single string for the first match, while getall() and extract() return a list of strings for all matches. If you want to extract data from multiple elements in a more structured way, you can use Scrapy item loaders and items, which provide a more flexible and reusable way to parse data from web pages.