Regular Expressions in Python: the "re" Module [Source: https://pymotw.com/2/re/]
What is Regular Expression? (re)
- Regular expressions are commonly abbreviated as regex or regexp.
- They are mainly used for matching text patterns.
- A large number of parsing problems are easier to solve with a regular expression than by creating a special-purpose lexer and parser.
- Expressions can include literal text matching, repetition, pattern composition, branching, and other sophisticated rules.
- Unix tools such as sed, grep, and awk use regular expressions internally to find a particular pattern.
How to use regular expressions in Python?
Step 1: Import the regular expression module: import re
Step 2: Design the regular expression for your application.
Step 3: Use the appropriate regular expression method to parse the text or string.
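The three steps above can be sketched as follows. The sample text and the digit pattern are made up for illustration and are not part of the original tutorial.

```python
import re  # Step 1: import the module

# Step 2: design the pattern (hypothetical example: one or more digits)
pattern = r"\d+"

# Step 3: apply a method from the re module to the text
text = "Order 66 shipped in 3 boxes"
numbers = re.findall(pattern, text)
print(numbers)  # ['66', '3']
```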
What are raw Python strings?
When writing regular expressions in Python, it is recommended that you use raw strings instead of regular Python strings. Raw strings begin with a special prefix (r) and signal Python not to interpret backslashes inside the string, so they are passed through directly to the regular expression engine.
This means that a pattern such as r"\n\w" reaches the engine unchanged; without the r prefix you would have to double every backslash and write "\\n\\w", which is much harder to read.
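A small sketch of the difference between regular and raw strings may help here. The price pattern and the sample text are assumptions chosen only for illustration.

```python
import re

# A regular string turns "\n" into one newline character;
# a raw string keeps the backslash and the letter n as two characters.
plain = "\n"
raw = r"\n"
print(len(plain), len(raw))  # 1 2

# Without the r prefix, each backslash meant for the regex engine must be doubled.
escaped = "\\d+\\.\\d+"
raw_pattern = r"\d+\.\d+"
print(escaped == raw_pattern)  # True

price = re.search(raw_pattern, "Total: 12.50 USD")
print(price.group())  # 12.50
```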
Regular expression Methods:
Three methods of the re module are used most often:
- re.match()
- re.search()
- re.findall()
re.match() - Matches at the Beginning
It matches the pattern only at the beginning of the string.
Usage:
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
Examples:
re.match(r'c', 'abcdef') # No match
re.match(r'cat', 'dog cat dog') # No match
re.match(r'dog', 'dog cat dog') # Match
As these examples show, re.match() looks for the pattern only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise it returns None.
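The behaviour of re.match() can be checked with a short runnable sketch. The x* pattern is an added illustration of the zero-length case mentioned in the usage notes, not an example from the original text.

```python
import re

text = "dog cat dog"

at_start = re.match(r"dog", text)      # pattern is at the beginning
not_at_start = re.match(r"cat", text)  # pattern occurs later in the string

print(at_start.group())  # dog
print(not_at_start)      # None

# A zero-length match is still a match, which is different from None.
empty = re.match(r"x*", text)
print(empty is not None, repr(empty.group()))  # True ''
```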
re.search() - Matches Anywhere
The search() method is similar to match(), but it is not restricted to finding matches at the beginning of the string, so searching for 'cat' in the example string below finds a match:
Example #1:
re.search(r'cat', 'dog cat dog') # Match
Usage:
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
Tip:
If you want to locate a match anywhere in the string, use search() instead of match().
Example #2:
re.search(r'c', 'abcdef') # Match
Regular expressions beginning with '^' can be used with search() to restrict the match to the beginning of the string:
Example #3:
re.search(r'^c', 'abcdef') # No match
re.search(r'^a', 'abcdef') # Match
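The search() examples can be run as a single sketch. It also prints the position of the match, using the start() method of the returned match object.

```python
import re

text = "abcdef"

found = re.search(r"c", text)
print(found.start(), found.group())  # 2 c

# Anchoring with ^ makes search() behave like match() on a single-line string.
anchored_c = re.search(r"^c", text)
anchored_a = re.search(r"^a", text)
print(anchored_c)          # None
print(anchored_a.group())  # a
```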
What about multi-line matches?
Even in MULTILINE mode, re.match() only matches at the beginning of the string, not at the beginning of each line.
In contrast, search() with a regular expression beginning with '^' and the re.MULTILINE flag matches at the beginning of any line.
re.match('X', 'A\nB\nX', re.MULTILINE) # No match
re.search('^X', 'A\nB\nX', re.MULTILINE) # Match
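A minimal sketch of the multi-line behaviour follows. It adds one case not shown in the text: search() with '^' but without the re.MULTILINE flag, which only matches at the very start of the string.

```python
import re

text = "A\nB\nX"

# match() ignores line starts, even with re.MULTILINE.
match_result = re.match(r"X", text, re.MULTILINE)
print(match_result)  # None

# Without the flag, ^ only anchors at the start of the whole string.
no_flag = re.search(r"^X", text)
print(no_flag)  # None

# With the flag, ^ also anchors right after each newline.
with_flag = re.search(r"^X", text, re.MULTILINE)
print(with_flag.start())  # 4
```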
re.findall() - All Matches
re.findall() returns a list of all the substrings that match the pattern.
>>> re.findall(r'dog', 'dog cat dog')
['dog', 'dog']
>>> re.findall(r'cat', 'dog cat dog')
['cat']
Note:
re.search() and re.match() return a single match object for the first match only (or None if there is no match).
re.findall(), in contrast, returns all non-overlapping substrings of the input that match the pattern.
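To make "without overlapping" concrete, here is a short sketch. The 'aaaaa' input is an assumption chosen for illustration; the dog/cat string is the one used above.

```python
import re

# Matches are found left to right and never overlap:
# after matching "aa" at positions 0-2, scanning resumes at position 2.
pairs = re.findall(r"aa", "aaaaa")
print(pairs)  # ['aa', 'aa']

# search() stops at the first match and returns a match object.
first = re.search(r"dog", "dog cat dog")
print(first.group(), first.span())  # dog (0, 3)

# findall() returns every match as a plain string.
all_dogs = re.findall(r"dog", "dog cat dog")
print(all_dogs)  # ['dog', 'dog']
```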
re.compile() - Compile the Regular Expression Pattern
Compile a regular expression pattern into a regular expression object, which can be used for matching through its match() and search() methods:
pattern = "test"
string = "testpatterntest"
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
Using re.compile() and keeping the resulting object makes reuse more efficient when the expression is used several times in a single program.
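As a rough sketch of that reuse, the compiled object below is applied to several strings. The sample strings other than "testpatterntest" are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object on several strings.
prog = re.compile(r"test")

samples = ["testpatterntest", "a test", "nothing here"]
results = []
for s in samples:
    # match() checks only the beginning; search() scans the whole string.
    results.append((bool(prog.match(s)), bool(prog.search(s))))

print(results)  # [(True, True), (False, True), (False, False)]
```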
Python Program Examples
re.search():
The basic rules for a regular expression search within a string are:
- The search proceeds through the string from start to end, stopping at the first match found.
- All of the pattern must be matched, but not all of the string.
- If match = re.search(pat, str) is successful, match is not None and, in particular, match.group() is the matching text.
Program #1
import re

patterns = ["this", "that"]
text = "Does this text match this string"

for pattern in patterns:
    match = re.search(pattern, text)
    if match is not None:
        print("Found Match :", match.group())
    else:
        print('no match:', pattern)
Output:
Found Match : this
no match: that
Program #2: In this example, re.search() returns a match object if the pattern is found.
From the match object you can get the start index, the end index, the input string, and the pattern, as the program below shows.
import re

patterns = ["this", "that"]
text = "Does this text match this string"

for pattern in patterns:
    matchObject = re.search(pattern, text)
    if matchObject is not None:
        startIndex = matchObject.start()
        endIndex = matchObject.end()
        print('Found "%s" in "%s" from %d to %d ("%s")' %
              (matchObject.re.pattern, matchObject.string, startIndex,
               endIndex, text[startIndex:endIndex]))
    else:
        print('no match:', pattern)
Output:
Found "this" in "Does this text match this string" from 5 to 9 ("this")
no match: that
Program #3
findall() is probably the single most powerful function in the re module. Above we used re.search() to find the first match for a pattern. findall() finds *all* the matches and returns them as a list of strings, with each string representing one match.
import re

testPattern = "abc"
# Named text rather than str, so the built-in str type is not shadowed
text = "abcbbbabcbbbbabc"
listStr = re.findall(testPattern, text)
print(listStr)
Output:
['abc', 'abc', 'abc']
Program #4
findall() - use a "for" loop to display the strings.
import re

testPattern = "abc"
# Named text rather than str, so the built-in str type is not shadowed
text = "abcbbbabcbbbbabc"
listStr = re.findall(testPattern, text)
for match in listStr:
    print("The Match string :", match)
Output:
The Match string : abc
The Match string : abc
The Match string : abc
Program #5
findall() - With Files
For files, you may be in the habit of writing a loop to iterate over the
lines of the file, and you could then call findall() on each line.
Instead, let findall() do the iteration for you -- much better! Just
feed the whole file text into findall() and let it return a list of all
the matches in a single step (recall that f.read() returns the whole
text of a file in a single string):
import re

testPattern = "ab"
# Open the file; the with block closes it automatically
with open('testfile.txt', 'r') as fp:
    # Feed the file text into findall();
    # it returns a list of all the found strings
    listStr = re.findall(testPattern, fp.read())
print(listStr)
Output (the result depends on the contents of testfile.txt):
['ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab', 'ab']
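Since testfile.txt is not included here, the following self-contained sketch writes a small temporary file first and then compares the line-by-line approach with the whole-file approach. The file name sample.txt and its contents are assumptions made only for this illustration.

```python
import os
import re
import tempfile

# Create a small throwaway file so the example is self-contained
# (the file name and contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as fp:
        fp.write("ab cd\nxx ab\nab ab\n")

    # Line-by-line approach: call findall() on each line.
    per_line = []
    with open(path) as fp:
        for line in fp:
            per_line.extend(re.findall(r"ab", line))

    # Whole-file approach: one findall() call on the full text.
    with open(path) as fp:
        whole_file = re.findall(r"ab", fp.read())

print(per_line == whole_file, len(whole_file))  # True 4
```

The two approaches agree here because the pattern never spans a line break; a pattern that can match across lines would only be found by the whole-file call.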
Program #6
Use finditer() rather than findall()
finditer() returns an iterator that produces Match instances
instead of the strings returned by findall().
import re

testPattern = "muni"
text = "munixxxmungggmunixxxmuniaaamuni"
for matchIter in re.finditer(testPattern, text):
    startIndex = matchIter.start()
    endIndex = matchIter.end()
    print("StartIndex:", startIndex, "EndIndex:", endIndex, matchIter.group())
Output:
StartIndex: 0 EndIndex: 4 muni
StartIndex: 13 EndIndex: 17 muni
StartIndex: 20 EndIndex: 24 muni
StartIndex: 27 EndIndex: 31 muni
Program #7 - re.compile
import re

regex_compiled_object = re.compile("this")
text = "Does this text match the pattern?"
if regex_compiled_object.search(text):
    print("found a match!")
else:
    print("no match")
Output:
found a match!