TL;DR
Scrapy is one of the top frameworks for web scraping in Python. Often you need to select an element when you already know part of its text. This post shows you how to select elements containing a certain text with CSS and XPath selectors.
How to select elements containing a certain text in Scrapy?
Scrapy is the tool of choice for many web scraping projects in Python. Its components are also useful on their own for parsing HTML retrieved by other means, such as plain Requests calls.
You can install Scrapy with pip:
pip install scrapy
and you are good to go.
Sometimes you need to select elements when you already know part of their text. There is a simple way to do it with CSS and XPath selectors:
import requests
from scrapy.http import TextResponse
response = requests.get("your_target_link")
parsed = TextResponse(response.url, body=response.text, encoding="utf-8")
parsed.css("tag:contains('your target text')").get()
parsed.xpath('.//span[contains(text(),"your target text")]').get()
A few steps require further explanation. The first two lines import the Requests library and the TextResponse class from Scrapy:
import requests
from scrapy.http import TextResponse
The next line fetches a web page using Requests:
response = requests.get("your_target_link")
Then we convert the data we obtained into a TextResponse object, which enables the use of Scrapy selectors:
parsed = TextResponse(response.url, body=response.text, encoding="utf-8")
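Finally, we use CSS and XPath selectors to extract an element that contains a certain text. In the CSS example, tag is a placeholder for the element name (for instance span), and .get() returns the first match, or None if nothing matches: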
parsed.css("tag:contains('your target text')").get()
parsed.xpath('.//span[contains(text(),"your target text")]').get()
That’s it. You can now use the element you got, chain further selectors, or extract a particular attribute. I will publish further tutorials on web scraping.