XPath Contains Use Cases: Examples and Tips

Without the right approach, extracting the right information from a webpage can be difficult. Whether you're automating tests, scraping web content, working with XML data, or locating elements, you have probably realized how hard it is to do manually.

Fortunately, XPath comes to the rescue by helping you locate elements by name, attribute, or text with ease. The best part is that it is supported by prominent automation tools and even by browsers through the developer tools console.

If you’re new to XPath, it is an expression language used to navigate and extract data from XML and HTML documents. XPath has a wide collection of functions for different purposes. This article focuses on one of the most useful: contains().

I will explain how XPath contains() works and when to use it, walk through real-world examples, and cover alternatives so you’re prepared for complex scenarios like dynamic web pages, inconsistent attributes, or unpredictable text content.

Why Is XPath contains() Important?

Based on our research, XPath’s contains() function is widely used for several purposes. It comes in handy with automation frameworks like Selenium, web scraping tools like Scrapy, and data extraction engines. Here is why XPath contains() is important and worth learning:

  • Locating elements with dynamic attributes that change with each page load.
  • Extracting content from partially known text values, making it useful when exact text is unavailable.
  • Handling websites with inconsistent markup, where attributes or text can vary slightly across elements.

Now that we understand how important XPath contains() is, let's start learning by understanding its syntax and how it differs from other functions of XPath.

Understanding XPath Contains

XPath functions play a crucial role in locating and extracting data from structured documents. Learning how to use them makes tasks like web scraping and automation more efficient. While they might seem complex, you can get the hang of them quickly by understanding their syntax.

The basic syntax of XPath contains() follows this structure:

//*[contains(attribute, 'substring')]

//*: Selects all elements. You can also replace * with an element name (for example, //div or //a) to narrow the search.

attribute: This refers to the HTML attribute or text content you want to search within. It could be:

  • @class: Looks inside the class name
  • @id: Searches within the ID attribute
  • @href: Finds a match within the link URL
  • text(): Checks the visible text of an element

substring: This is the partial value you’re looking for within the attribute. It doesn’t have to be an exact match.
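To see how contains() differs from an exact comparison, here is a minimal sketch using lxml. The markup, IDs, and class names are made up for illustration, and the snippet assumes lxml is installed.

```python
from lxml import html

# Hypothetical markup, invented for this illustration.
PAGE = """
<html><body>
  <a id="nav-home" class="link primary" href="/home">Home page</a>
  <a id="nav-cart" class="link" href="/cart?ref=nav">Your cart</a>
  <span id="footer-note" class="muted">Cart total shown at checkout</span>
</body></html>
"""

tree = html.fromstring(PAGE)

# Exact comparison: the whole attribute value must equal 'link'.
exact = tree.xpath("//*[@class='link']/@id")

# contains(): the substring only has to appear somewhere in the value.
by_class = tree.xpath("//*[contains(@class, 'link')]/@id")
by_id = tree.xpath("//*[contains(@id, 'nav-')]/@id")
by_href = tree.xpath("//a[contains(@href, 'cart')]/@id")
by_text = tree.xpath("//*[contains(text(), 'cart')]/@id")

print(exact)     # ['nav-cart']
print(by_class)  # ['nav-home', 'nav-cart']
print(by_id)     # ['nav-home', 'nav-cart']
print(by_href)   # ['nav-cart']
print(by_text)   # ['nav-cart'] (the span's "Cart" differs in case, so it is skipped)
```

The exact comparison misses the first link because its class attribute holds two values, while contains() picks up both.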

Use Cases of XPath Contains

XPath’s contains() function lets you find elements whose text, class names, or IDs change dynamically. This flexibility is handy when dealing with product listings, search results, or interactive elements, because you don't need to know the exact values.

The use cases of contains() go well beyond a handful of scenarios. To illustrate, I will walk through a few common ones where contains() helps locate elements efficiently, using Amazon’s product listing page as the example.

Find Elements by Class Name using contains()

The class attribute in HTML often holds multiple space-separated values. As a result, it is often hard to find elements with an exact match. The contains() function solves this by letting you search for a specific word inside the class attribute, so you can still locate the element even if the class list changes.

For example, if the elements you want always have "search" somewhere in their class names, you can find them with the pattern below, replacing class_name with search:

//*[contains(@class, 'class_name')]
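As a rough sketch of the class-name case, the snippet below runs the pattern against a small, hypothetical piece of search-page markup (the class names are invented, not taken from a real site).

```python
from lxml import html

# Hypothetical search-page markup; class names are invented for illustration.
PAGE = """
<html><body>
  <div class="s-search-results main">Results</div>
  <input class="nav-search-field" name="q">
  <div class="footer">Footer</div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Any element whose class attribute has 'search' somewhere in it.
matches = tree.xpath("//*[contains(@class, 'search')]")
tags = [(el.tag, el.get("class")) for el in matches]

print(tags)  # [('div', 's-search-results main'), ('input', 'nav-search-field')]
```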

Locate Elements by ID using contains()

Websites sometimes generate dynamic IDs that change with each page load. Instead of searching for the full ID, contains() helps locate elements based on a consistent part of the ID. It is suitable for locating buttons, form fields, and dynamic page elements.

For example, an "Add to Cart" button might have a different ID on each product. Once you have spotted the part of the ID that stays the same on one button, you can use it to find the rest:

//*[contains(@id, 'id_value')]
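Here is a small sketch of the dynamic-ID case. The button IDs and their numeric suffixes are assumptions made up for the example.

```python
from lxml import html

# Hypothetical buttons whose IDs share a stable prefix and a changing suffix.
PAGE = """
<html><body>
  <button id="add-to-cart-button-8431">Add to Cart</button>
  <button id="add-to-cart-button-2290">Add to Cart</button>
  <button id="buy-now-button-8431">Buy Now</button>
</body></html>
"""

tree = html.fromstring(PAGE)

# Match on the part of the ID that stays the same.
cart_button_ids = tree.xpath("//button[contains(@id, 'add-to-cart')]/@id")

print(cart_button_ids)  # ['add-to-cart-button-8431', 'add-to-cart-button-2290']
```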

Extract Links by URL using contains()

Many websites structure their URLs consistently. E-commerce platforms are the best example: product pages usually share a common path segment in their URLs. In this case, you can avoid the hassle of searching for the full link and use contains() to filter links based on a specific keyword that appears in their URLs.

For example, Amazon product pages typically have "/dp/" in their URLs, which precedes the product identifier. Rather than searching for a full product URL, we can extract all product links by using:

//*[contains(@href, 'url_value')]
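The sketch below filters links by the "/dp/" segment. The URLs and product codes are placeholders invented for the example, not real listings.

```python
from lxml import html

# Hypothetical links; the paths and product codes are placeholders.
PAGE = """
<html><body>
  <a href="/Example-Laptop/dp/B000000001/ref=sr_1_1">Example Laptop</a>
  <a href="/gp/help/customer/display.html">Help</a>
  <a href="/Example-Tablet/dp/B000000002">Example Tablet</a>
  <a name="top">Back to top</a>
</body></html>
"""

tree = html.fromstring(PAGE)

# Keep only the href values that include the product path segment.
product_links = tree.xpath("//a[contains(@href, '/dp/')]/@href")

print(product_links)
# ['/Example-Laptop/dp/B000000001/ref=sr_1_1', '/Example-Tablet/dp/B000000002']
```

Anchors without an href attribute are skipped, because contains() treats a missing attribute as an empty string.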

Find Elements by Partial Text Match using contains()

Webpages often change the button labels, headings, or descriptions slightly without changing the entire meaning. In this scenario, matching the exact text becomes unreliable and contains() helps grab elements even when the text varies.

Let's say you’re looking for the terms Dual SIM or Dual Camera. You can search for the shared word Dual using contains() with text():

//*[contains(text(), 'text_value')]
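A quick sketch of the partial text match follows, using invented markup. It also shows one caveat worth knowing: under XPath 1.0 (which lxml implements), contains(text(), ...) only looks at the element's first text node, so text that follows a nested tag can be missed.

```python
from lxml import html

# Hypothetical feature labels; the last one has text split by a nested <b>.
PAGE = """
<html><body>
  <span id="a">Dual SIM</span>
  <span id="b">Dual Camera</span>
  <span id="c">Single SIM</span>
  <span id="d">Phone with <b>fast charging</b> and Dual SIM</span>
</body></html>
"""

tree = html.fromstring(PAGE)

# Checks only the first text node of each <span>.
first_text_only = tree.xpath("//span[contains(text(), 'Dual')]/@id")

# Checks the full string value of the element, nested tags included.
whole_string = tree.xpath("//span[contains(., 'Dual')]/@id")

# Checks every direct text node individually.
any_text_node = tree.xpath("//span[text()[contains(., 'Dual')]]/@id")

print(first_text_only)  # ['a', 'b']
print(whole_string)     # ['a', 'b', 'd']
print(any_text_node)    # ['a', 'b', 'd']
```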

Using contains() for Nested Elements (Parent-Child Relationship)

Sometimes you need to extract entire sections of a webpage, not just a single element. It comes in handy when elements are grouped together in a structured way, such as product listings, article sections, or navigation menus.

Let's open the Computers and Tablets category on Amazon. From the available list, let's say we want to find product sections whose <h2> title mentions "Samsung". Instead of just grabbing the <h2>, we can select the entire container using:

//div[contains(.//h2, 'title_value')]
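As a minimal sketch of selecting the whole container, the snippet below uses two invented product cards and then reads the price from the card that matched. The card structure and the price class are assumptions for the example.

```python
from lxml import html

# Hypothetical product cards; structure and class names are invented.
PAGE = """
<html><body>
  <div class="item"><h2>Samsung Galaxy Tab (sample)</h2><span class="price">$199</span></div>
  <div class="item"><h2>Example Brand Laptop</h2><span class="price">$499</span></div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Select each <div> whose first descendant <h2> mentions 'Samsung'.
cards = tree.xpath("//div[contains(.//h2, 'Samsung')]")

# With the container in hand, other fields can be read relative to it.
prices = [card.xpath("string(.//span[@class='price'])") for card in cards]

print(len(cards))  # 1
print(prices)      # ['$199']
```

Note that .//h2 looks at all descendants, so on a real page an outer wrapper <div> can match too. Adding a class condition on the container is one way to narrow it down.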

Practical Examples

With an understanding of how XPath contains() helps locate and select elements, let’s see how it comes in handy in actual situations. According to analysis aggregated by Ping Proxies, XPath contains() plays a role across the workflow, from interacting with dynamic elements in testing frameworks to filtering data in web scraping.

Here is a quick look at how it is often used:

Using XPath Contains with Selenium for Testing

Finding elements is one of the biggest challenges in automated testing, especially when dealing with dynamic attributes, changing IDs, or varying text labels. This is where XPath contains() proves useful, helping you locate elements even if their values aren’t static.

Let’s say you're trying to find a Login button in Selenium, but its ID keeps changing:

<button id="btn-login-123">Login</button>
<button id="btn-login-456">Login</button>

In this case, writing multiple XPath queries isn’t the best choice. Instead, you can use the contains() function to select the login button no matter how the ID changes:

//button[contains(@id, 'btn-login')]

Here’s how this can be implemented in Selenium (Python) through the sample script below. The script finds the button no matter how the numeric part of its ID changes.

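from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")  # placeholder URL

    # Match the button by the stable part of its ID, whatever the suffix is.
    login_button = driver.find_element(By.XPATH, "//button[contains(@id, 'btn-login')]")
    login_button.click()
finally:
    driver.quit()  # always close the browser session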

XPath Contains for Web Scraping Scenarios

If you have been doing web scraping, you might have a hard time extracting elements that don’t have fixed attributes. Some elements have dynamically generated class names, inconsistent IDs, or varying text content, making it difficult to locate them using exact matches.

The XPath contains() function is a good fit in this scenario because it allows you to match elements on partial values. This makes your scraper more efficient and less prone to failure.

For example, consider an e-commerce website where product titles are wrapped in <h2> tags with dynamic class names like the ones below:

<h2 class="title-product-new">Smartphones and Wearables</h2>
<h2 class="title-product-featured">Laptops and Tablets</h2>

If you notice, the class names change slightly. Instead of writing multiple XPath queries needed for an exact match, you can use contains() to grab all product titles.

//h2[contains(@class, 'title-product')]

In Scrapy (Python), this would look like:

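import scrapy


class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com"]  # placeholder URL

    def parse(self, response):
        # Select the text of every <h2> whose class contains 'title-product'.
        titles = response.xpath("//h2[contains(@class, 'title-product')]/text()").getall()
        for title in titles:
            # Yield items so Scrapy's pipelines and exporters can process them.
            yield {"title": title.strip()}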
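Before running a full spider, it can help to check the expression offline. The sketch below applies the same XPath to the sample markup using lxml instead of Scrapy. The third <h2> is a hypothetical non-product heading added to show what gets filtered out.

```python
from lxml import html

# The two product headings from the example, plus one invented heading
# that should NOT match.
SAMPLE = """
<html><body>
  <h2 class="title-product-new">Smartphones and Wearables</h2>
  <h2 class="title-product-featured">Laptops and Tablets</h2>
  <h2 class="section-heading">Customer Reviews</h2>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Same expression the spider uses.
titles = tree.xpath("//h2[contains(@class, 'title-product')]/text()")

print(titles)  # ['Smartphones and Wearables', 'Laptops and Tablets']
```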

Examples with Different Programming Languages

The best part about XPath is that it can be used from many programming languages, including JavaScript, Java, PHP, Python, C, C++, and more. While the implementation varies, the XPath syntax itself remains largely the same.

Here are some code snippets showing how contains() works across different automation tools and programming languages.

Note: The snippets below are illustrative. They are strictly for reference and must be adapted to your needs.

Python (lxml)

The code below extracts the text of all buttons containing "Buy" from an HTML page. This locates all the buy buttons, even if the rest of the label varies.

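from lxml import html
import requests

response = requests.get("https://example.com", timeout=10)  # placeholder URL
response.raise_for_status()  # stop early on HTTP errors
tree = html.fromstring(response.content)

# Text of every <button> whose first text node contains 'Buy' (case-sensitive).
buttons = tree.xpath("//button[contains(text(), 'Buy')]/text()")
print(buttons)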

JavaScript (Puppeteer)

The below code helps select the first <a> tag that contains "login" in its URL. You can use it to identify login links without needing an exact match.

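const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com'); // placeholder URL

    // page.$x() was removed in recent Puppeteer releases (v22+);
    // the "xpath/" selector prefix is used here instead.
    const links = await page.$$("xpath/.//a[contains(@href, 'login')]");
    if (links.length > 0) {
        console.log(await (await links[0].getProperty('href')).jsonValue());
    }

    await browser.close();
})();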

Java (Selenium WebDriver)

The code below finds the first button whose ID includes the word "submit", so IDs like "submit-form" or "submit-btn" are both matched.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class XPathExample {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://example.com");

        WebElement button = driver.findElement(By.xpath("//button[contains(@id, 'submit')]"));
        System.out.println(button.getText());

        driver.quit();
    }
}

Confused about choosing between Selenium and Puppeteer for automated testing? Check out our detailed comparison of Puppeteer vs Selenium.

Best Practices in Building XPath Queries

XPath is a powerful tool, but using it effectively requires the right approach. Based on our research, a well-structured XPath query can make element selection faster and more reliable.

Here are a few tips for using XPath, along with examples to help you avoid common mistakes that can lead to errors or inefficiencies.

  • Use Relative XPath Over Absolute XPath: Avoid long, rigid paths (/html/body/div/...). Instead, use shorter, more adaptable queries (//div[contains(@class, 'product')]).
  • Use contains() for Dynamic Elements: If attributes or text change frequently, use contains() to match partial values instead of exact strings.
  • Use starts-with() for Structured Attributes: When identifiers follow a pattern with a fixed prefix, starts-with() is more precise than contains(), e.g., //input[starts-with(@id, 'search-')].
  • Combine Multiple Conditions: Use AND (and) and OR (or) operators to refine selections, e.g., //div[contains(@class, 'product') and contains(@id, 'item')].
  • Use Text-Based Selection Wisely: contains(text(), 'example') is useful, but when the text you need sits inside a child element, match on that child instead (//div[h2[contains(text(), 'Laptop')]]).
  • Optimize for Performance: Avoid //* (wildcard searches). Instead, target specific elements (//button[contains(@class, 'buy-now')]) to speed up queries.
  • Test Queries in Developer Tools: Use Chrome DevTools ($x("XPath") in the Console) or online XPath testers to validate queries before implementation.
  • Account for Case Sensitivity: XPath contains() is case-sensitive. Use translate() to lowercase the value first if you need case-insensitive matching.
  • Avoid Overly Broad contains() Matches: contains(@class, 'btn') matches "btn-primary" and "cancel-btn" as well as "btn". To match a whole class name, wrap the value in spaces: contains(concat(' ', normalize-space(@class), ' '), ' btn ').
  • Know Your XPath Version: XPath 2.0+ has stricter rules and better data handling than XPath 1.0. In 1.0, when contains() receives a node-set, only the first node’s string value is checked. In 2.0+, passing more than one node raises a type error, so test each node explicitly (//*[text()[contains(., 'target string')]]).
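Several of the tips above can be tried side by side. The sketch below is one way to do it with lxml (XPath 1.0). The buttons, IDs, and class names are hypothetical.

```python
from lxml import html

# Hypothetical buttons, invented to contrast the different matching styles.
PAGE = """
<html><body>
  <button id="search-go" class="btn btn-primary">BUY NOW</button>
  <button id="cancel" class="cancel-btn">Cancel</button>
  <button id="clear" class="btn">Clear</button>
</body></html>
"""

tree = html.fromstring(PAGE)

# starts-with(): only IDs that begin with the prefix.
prefix = tree.xpath("//button[starts-with(@id, 'search-')]/@id")

# Plain contains() on class: also catches 'cancel-btn'.
loose = tree.xpath("//button[contains(@class, 'btn')]/@id")

# Space-wrapped match: only elements that carry 'btn' as a whole class name.
token = tree.xpath(
    "//button[contains(concat(' ', normalize-space(@class), ' '), ' btn ')]/@id"
)

# Case-insensitive match: lowercase the text with translate() first.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"
case_insensitive = tree.xpath(
    f"//button[contains(translate(., '{UPPER}', '{LOWER}'), 'buy')]/@id"
)

# Combining conditions with 'and'.
combined = tree.xpath(
    "//button[contains(@class, 'btn') and starts-with(@id, 'search-')]/@id"
)

print(prefix)            # ['search-go']
print(loose)             # ['search-go', 'cancel', 'clear']
print(token)             # ['search-go', 'clear']
print(case_insensitive)  # ['search-go']
print(combined)          # ['search-go']
```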

XPath Vs CSS Selectors

If XPath doesn’t fit your needs, what is the next best choice? Our data suggests that most practitioners recommend and prefer CSS Selectors, but why is that? While both methods help locate elements on a web page, they have different capabilities. Here’s a quick comparison to help you decide when to use each.

| Factor | XPath | CSS Selectors |
| --- | --- | --- |
| Element Selection | Can locate elements based on structure, attributes, or text, which is useful when standard attributes are missing. | Directly selects elements using ID, class, name, or attributes, with fewer options for positional and structural matching. |
| Ease of Reading | Often longer and harder to read, especially with absolute paths. | Shorter and more readable, making queries cleaner. |
| Use Cases | Can select elements by text content. | Limited to selecting elements by attributes and structure; cannot match text content directly. |
| Navigation | Supports bidirectional navigation (parent-to-child and child-to-parent selection). | Mainly parent-to-child and sibling traversal; selecting a parent depends on :has(), which not every tool supports. |
| Difficulty Level | More powerful but requires deeper understanding to use effectively. | Simpler and easier to use but less flexible for complex queries. |
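To make the navigation row concrete, here is a small sketch of XPath walking from a matched child back up to its parent and across to a sibling. The cards, IDs, and links are invented for the example.

```python
from lxml import html

# Hypothetical product cards; IDs and links are placeholders.
PAGE = """
<html><body>
  <div class="card" id="card-1"><h2>Samsung Tablet (sample)</h2><a href="/p/1">Details</a></div>
  <div class="card" id="card-2"><h2>Example Laptop</h2><a href="/p/2">Details</a></div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Child to parent: find the heading by text, then step up to its <div>.
parent_ids = tree.xpath("//h2[contains(., 'Samsung')]/parent::div/@id")

# Sideways: from the same heading to the link that follows it.
sibling_links = tree.xpath("//h2[contains(., 'Samsung')]/following-sibling::a/@href")

print(parent_ids)     # ['card-1']
print(sibling_links)  # ['/p/1']
```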

Wrapping Up

XPath contains() is great for working with dynamic elements. It helps you with complex tasks like automating tests, scraping data, or extracting elements. However, it isn’t the only function with crucial capabilities.

Other XPath functions, such as starts-with(), last(), normalize-space(), position(), substring(), and translate(), are equally useful. The key to using XPath efficiently is choosing the right function for your needs.

Also, keep in mind that not every issue needs XPath. Sometimes, CSS Selectors do the job faster. And if you’re scraping data at scale, ethically sourced proxy solutions like Ping Proxies help you stay undetected and avoid restrictions. So refine your approach, test your queries, and always choose the right tools for the job.
