Scrapy: Select elements containing a specific text

TL;DR

Scrapy is one of the most widely used frameworks for web scraping in Python. Often you need to select an element when you already know part of its text. This post shows how to select elements containing a certain text with CSS and XPath selectors.

How to select elements containing a certain text in Scrapy?

Scrapy is a web scraper's best friend 🙂

Scrapy is the tool of choice for many web scraping projects in Python. Its selectors are also useful on their own, for parsing HTML obtained by other means, such as plain Requests calls.

You can install Scrapy using pip:
pip install scrapy
and you are good to go. The example below also uses the Requests library, which you can install the same way.

Sometimes you need to select elements when you already know part of their text. There is a simple way to do it with CSS and XPath selectors.

import requests
from scrapy.http import TextResponse  # TextResponse lives in scrapy.http, not in the top-level package

response = requests.get("your_target_link")

parsed = TextResponse(response.url, body=response.text, encoding="utf-8")

parsed.css("tag:contains('your target text')").get()
parsed.xpath('.//span[contains(text(),"your target text")]').get()

A few of these steps deserve further explanation. The first two lines import the Requests library and Scrapy's TextResponse class:

import requests
from scrapy.http import TextResponse

The next command fetches a web page using Requests:

response = requests.get("your_target_link")

Then we convert the data we obtained into a TextResponse object, which enables the use of Scrapy selectors.

parsed = TextResponse(response.url, body=response.text, encoding="utf-8")

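Finally, we use CSS and XPath selectors to extract an element that contains a certain text. Here tag is a placeholder for the element name you are looking for (span in the XPath version), and the match is case-sensitive: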

parsed.css("tag:contains('your target text')").get()
parsed.xpath('.//span[contains(text(),"your target text")]').get()

That’s it. Now you can use the element you got, chain further selectors, or extract a particular attribute. I will publish further tutorials on web scraping; you can find them at this link (also listed below).



Related links