How to match HTML tags with regex in Python?
HTML tags can be matched with regex in Python using the re module. The re.findall() function returns all non-overlapping matches of a pattern in a string. For example, the following code finds all HTML tags in a string:
import re
html_string = "<p>This is a paragraph</p><h1>This is a heading</h1>"
tags = re.findall(r"<[^>]*>", html_string)
print(tags)
Output example
['<p>', '</p>', '<h1>', '</h1>']
The code uses re.findall() to collect every match of the pattern r"<[^>]*>" in the string. The pattern matches a <, then zero or more characters that are not > (the [^>]* part), then a closing >. It therefore matches anything that looks like a tag, including comments, and it stops at the first > even when that character sits inside a quoted attribute value.
The output of the code is a list of all the HTML tags found in the string.
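If only the tag names are needed, or opening and closing tags must be told apart, one option is to add capturing groups to the pattern. The following is a minimal sketch, not part of the original snippet: the names TAG_RE and tag_names are hypothetical, and the pattern assumes tag names made of ASCII letters and digits.

```python
import re

html_string = "<p>This is a paragraph</p><h1>This is a heading</h1>"

# Hypothetical refinement of r"<[^>]*>":
#   group 1 captures an optional leading slash (closing tag),
#   group 2 captures the tag name (assumed to be ASCII letters and digits).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def tag_names(html):
    """Return (kind, name) pairs for each tag-like match, in document order."""
    # With two groups in the pattern, findall() returns a list of 2-tuples.
    return [
        ("close" if slash else "open", name.lower())
        for slash, name in TAG_RE.findall(html)
    ]


print(tag_names(html_string))
# [('open', 'p'), ('close', 'p'), ('open', 'h1'), ('close', 'h1')]
```

Because the pattern contains groups, re.findall() returns the captured groups rather than the whole match, which is why each result is a tuple.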
Code explanation
- re.findall(): This function searches for all occurrences of a pattern in a string.
- r"<[^>]*>": This is the pattern used to match HTML tags. The [^>] part of the pattern means that any character that is not a > character can be matched.
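Since [^>] excludes every > character, the pattern ends a match at the first > it meets. The sketch below runs the same pattern over a few illustrative inputs to show what that means in practice; the samples dictionary and the find_tags helper are made up for this example.

```python
import re

PATTERN = r"<[^>]*>"

# Illustrative inputs (not from the original snippet).
samples = {
    "attributes": '<a href="/home" class="nav">Home</a>',
    "comment": "<!-- note --><br/>",
    "gt_in_attribute": '<img alt="a > b">',
}


def find_tags(html):
    """Apply the pattern from the example above to an HTML string."""
    return re.findall(PATTERN, html)


for label, html in samples.items():
    print(label, find_tags(html))

# attributes       ['<a href="/home" class="nav">', '</a>']
# comment          ['<!-- note -->', '<br/>']
# gt_in_attribute  ['<img alt="a >']
```

Attributes are handled, comments are returned as if they were tags, and a > inside a quoted attribute value cuts the match short. For input like the last case, a real HTML parser such as the standard library's html.parser module is likely a better fit than a regex.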