To set up a Selector in Scrapy, you first need to import the Selector class from the scrapy module. Once you have the Selector class, you can create an instance of it by passing in the HTML or XML document you want to parse.
Here is an example:
    from scrapy import Selector

    html = """
    <html>
      <head>
        <title>My Web Page</title>
      </head>
      <body>
        <div class="container">
          <h1>Welcome to my web page!</h1>
          <p class="intro">This is the introductory paragraph.</p>
          <p class="content">This is the main content of the page.</p>
        </div>
      </body>
    </html>
    """

    selector = Selector(text=html)
In this example, we have defined an HTML document as a string and assigned it to the variable html. We then create a Selector instance by passing the HTML string to the Selector constructor, with the text argument.
Once you have created the Selector instance, you can use its selector methods to extract data from the document. For example, to extract the text of the title element, you can pass the XPath expression //title/text() to the xpath() method and call get() to return the first match, like this:
    title = selector.xpath('//title/text()').get()
    print(title)
This will print the text "My Web Page", which is the value of the title element in the HTML document.
You can also use CSS selectors to extract data from the document. For example, to extract the text of the introductory paragraph, you can use the CSS selector p.intro::text (the ::text pseudo-element is a Scrapy extension to standard CSS that selects an element's text), like this:
    intro = selector.css('p.intro::text').get()
    print(intro)
This will print the text "This is the introductory paragraph."