9951 explained code solutions for 126 technologies


python-regexHow to match HTML tags with regex in Python?


Matching HTML tags with regex in Python can be done using the re module. The re.findall() function can be used to find all occurrences of a pattern in a string. For example, the following code will find all HTML tags in a string:

import re

html_string = "<p>This is a paragraph</p><h1>This is a heading</h1>"

tags = re.findall(r"<[^>]*>", html_string)

print(tags)

Output example

['<p>', '</p>', '<h1>', '</h1>']

The code works by using the re.findall() function to search for all occurrences of a pattern in a string. The pattern used is r"<[^>]*>", which matches any HTML tag. The [^>] part of the pattern means that any character that is not a > character can be matched.

The output of the code is a list of all the HTML tags found in the string.

Code explanation

  • re.findall(): This function searches for all occurrences of a pattern in a string.
  • r"<[^>]*>": This is the pattern used to match HTML tags. The [^>] part of the pattern means that any character that is not a > character can be matched.

Helpful links

Edit this code on GitHub