The egrep command is equivalent to grep -E: it interprets the pattern as an
extended regular expression.
The following command shows all the lines in the Inbox file that begin with
From or Subject:
cat Inbox | egrep '^(From|Subject)'
^ matches at the beginning of a line (caret symbol)
$ matches at the end of a line (dollar symbol)
Command                                    Matching lines
cat file.txt | egrep '^sorgonet.com'       sorgonet.com is nice
cat file.txt | egrep 'sorgonet.com$'       I like sorgonet.com
cat file.txt | egrep '^sorgonet.com$'      sorgonet.com
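The three anchored patterns above can be tried on a small sample. This is only a sketch: the sample lines are invented for illustration, and grep -E is used in place of egrep since the two are equivalent.

```shell
# Sample data, invented for illustration.
sample='sorgonet.com is nice
I like sorgonet.com
sorgonet.com
visit sorgonet.com today'

# The line must start with the pattern.
starts=$(printf '%s\n' "$sample" | grep -E '^sorgonet.com')
# The line must end with the pattern.
ends=$(printf '%s\n' "$sample" | grep -E 'sorgonet.com$')
# The line must contain the pattern and nothing else.
whole=$(printf '%s\n' "$sample" | grep -E '^sorgonet.com$')

printf 'starts with:\n%s\n\nends with:\n%s\n\nwhole line:\n%s\n' \
    "$starts" "$ends" "$whole"
```

Note that a line consisting only of sorgonet.com satisfies all three patterns, while the line with text on both sides satisfies none of them.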
You can express a list of characters by using [...]
The [...] is called a character class.
cat file.txt | egrep 'I have [Aa] website'
cat file.txt | egrep 'try in gr[ae]y color'
A range of characters is written with a hyphen, for example [a-z]:
cat file.txt | egrep 'the three digit code is [0-9][0-9][0-9]'
Multiple ranges are also allowed:
[a-zA-Z0-9] will match any letter or number in that position.
Be careful: a caret at the start of a character class, as in [^a-z],
indicates negation, and will match any character outside the a-z range, so:
cat file.txt | egrep '^[^a-zA-Z]'
will match lines whose first character is not a letter.
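The character class examples can be checked with a few test lines. A minimal sketch, assuming invented sample text and using grep -E in place of egrep:

```shell
# Sample lines, invented for illustration.
lines='I have a website
I have A website
try in grey color
try in gray color
the three digit code is 042
42 is not a letter
letters first'

# [Aa] accepts either an uppercase or a lowercase a in that position.
website=$(printf '%s\n' "$lines" | grep -E 'I have [Aa] website')
# [ae] accepts both spellings of grey/gray.
gray=$(printf '%s\n' "$lines" | grep -E 'try in gr[ae]y color')
# Three consecutive digits, each taken from the range 0-9.
code=$(printf '%s\n' "$lines" | grep -E 'the three digit code is [0-9][0-9][0-9]')
# The caret outside the brackets anchors to the line start;
# the caret inside the brackets negates the class.
nonletter=$(printf '%s\n' "$lines" | grep -E '^[^a-zA-Z]')

printf '%s\n' "$website" "$gray" "$code" "$nonletter"
```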
The dot . matches any single character (when it is used outside a
character class):
egrep '.a' file.txt
This matches lines like: 1234a, jjjjjjaaa, uuauuuu
But not: ammm (there is no character before its only a)
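The same four words can be fed to the pattern directly to see which ones survive. A small sketch, with grep -E standing in for egrep:

```shell
# The four example words from the text, one per line.
words='1234a
jjjjjjaaa
uuauuuu
ammm'

# '.a' needs some character immediately followed by an a.
matched=$(printf '%s\n' "$words" | grep -E '.a')

printf '%s\n' "$matched"
```

The word ammm is dropped because the dot must consume a character, and nothing precedes its only a.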
The symbol | means or
egrep '^(david|john|philip)' file.txt
Matches any line beginning with david or john or philip.
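The parentheses matter here, because they decide what the caret applies to. The following sketch (sample lines invented for illustration, grep -E used in place of egrep) compares the grouped pattern with an ungrouped variant:

```shell
# Sample lines, invented for illustration.
people='david went home
hello john
philip is here
mary stayed'

# Grouped: the caret applies to all three names.
grouped=$(printf '%s\n' "$people" | grep -E '^(david|john|philip)')
# Ungrouped: the caret applies only to david; john and philip
# may appear anywhere in the line.
ungrouped=$(printf '%s\n' "$people" | grep -E '^david|john|philip')

printf 'grouped:\n%s\n\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```

Only the ungrouped version reports the line "hello john", where john is not at the beginning.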
When you want to match a whole word, the \< and \> symbols come in handy:
they match at the start and at the end of a word by checking the boundary
characters.
egrep '\<[dD]avid\>' file.txt
Quantifiers
The metacharacters +, * and ? are called quantifiers.
+ matches the preceding item one or more times
* matches the preceding item zero or more times
? makes the preceding item optional: it is placed after the character that
may or may not be there.
egrep 'encyclopa?edia' file.txt
egrep -i ignores case. This option is not part of the regular expression
language, but it is handy to know.
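To see how the three quantifiers and the -i option differ, the same list of spellings can be run through each variant. A minimal sketch, assuming invented sample words and grep -E in place of egrep:

```shell
# Sample spellings, invented for illustration.
text='encyclopedia
encyclopaedia
ENCYCLOPEDIA
encyclopaaedia'

# a? : zero or one a
optional=$(printf '%s\n' "$text" | grep -E 'encyclopa?edia')
# a+ : at least one a
plus=$(printf '%s\n' "$text" | grep -E 'encyclopa+edia')
# a* : any number of a, including none
star=$(printf '%s\n' "$text" | grep -E 'encyclopa*edia')
# -i : same as the first pattern, but ignoring case
nocase=$(printf '%s\n' "$text" | grep -E -i 'encyclopa?edia')

printf '?:\n%s\n\n+:\n%s\n\n*:\n%s\n\n-i:\n%s\n' \
    "$optional" "$plus" "$star" "$nocase"
```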
Using Perl with regular expressions.
Perl allows much more complex regular expressions (regex) than egrep,
and there are slight differences in notation.
Sample code in Perl:
if ($answer =~ m/^[a-zA-Z]+$/) {
    print "only letters\n";
} else {
    print "not only letters\n";
}
The surrounding m/..../ means to attempt a regular expression match,
and the slashes delimit the regular expression itself.
The operator =~ links the string to be searched with the regular
expression. You can read the operator =~ as "matches".
$1, $2, $3 and so on are special Perl variables that hold the parts of the
string matched by each pair of parentheses in the regex, for example:
if ($result =~ m/([a-h][0-9])(a|p)/) {
    print $1; # prints the letter and the number (first parentheses)
    print $2; # prints the letter a or p (second parentheses)
}
The operator =~ means match.
The operator !~ means does not match.
Character classes are slightly different between egrep and Perl.
Remember that character classes are enclosed between []
[\t] matches a TAB
[\n] matches a newline
[\b] matches a backspace
\b means word boundary in a regex, but that makes no sense within a
character class, so inside a character class it represents a backspace
character instead.
Perl has a metacharacter \s that means "whitespace character"; this
includes, among others, space, tab, newline and carriage return.
Useful shorthands that Perl provides:
\w the same as [a-zA-Z0-9_] for ASCII text, a word character
\W anything not matched by \w
\d the same as [0-9] for ASCII text, a digit
\D anything not a digit, [^0-9]
Modifiers.
Modifiers are placed after the m/..../
$result =~ m/[a-z]/i;
/i tells Perl to do the match in a case-insensitive manner. It's not
part of the regex, but part of the m/.../ syntactic packaging.
Replacing text using regex.
Instead of using m/.../ we can use s/.../.../ that is:
s/stringtosearchandreplace/stringtoreplacewith/
$result =~ s/Sorgo/Sorgonet/;
That will search for the string Sorgo and replace the first occurrence
with the string Sorgonet.
Adding /g makes the match global: every occurrence in the text is
replaced instead of only the first one.
$result =~ s/Sorgo/Sorgonet/g;
A trickier example is:
$result =~ s/SoRGo/Sorgonet/ig;
That will replace all strings like sorgo, sorGO, SoRgo or SoRgO with the
string Sorgonet exactly as written. Case doesn't matter in the search
pattern, but each match is replaced by exactly the string Sorgonet, with
only its first letter S in uppercase. This example also introduces the /ig
syntax, which combines /i and /g to match case-insensitively and globally.
We can replace a string in a file with a single Perl command line:
% perl -p -i -e 's/Sorgo/Sorgonet/g' file
-e indicates that the Perl code is given on the command line, -p runs that
code on every line of the file and prints the result, and -i edits the
given file in place.
Intervals
Intervals are like a "counting quantifier" where you specify the
minimum number of matches you need and the maximum number to allow.
[a-z]{3} matches a lowercase letter exactly 3 times
[a-z]{1,5} matches a lowercase letter at least once and at most five times
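Intervals are also part of extended regular expressions, so they can be tried with grep -E as well. A small sketch with invented sample words; the anchors ^ and $ are added so that the count applies to the whole line:

```shell
# Sample words, invented for illustration.
codes='ab
abc
abcdef
x1'

# The whole line is exactly three lowercase letters.
three=$(printf '%s\n' "$codes" | grep -E '^[a-z]{3}$')
# The whole line is between one and five lowercase letters.
upto5=$(printf '%s\n' "$codes" | grep -E '^[a-z]{1,5}$')

printf 'exactly 3:\n%s\n\n1 to 5:\n%s\n' "$three" "$upto5"
```

Without the anchors, [a-z]{3} would also match abcdef, since three lowercase letters can be found inside it.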
The use of parentheses
To match a From: line in an email you can use this regex: m/^From: /
but if you want to use the sender later in your program, you had better
use: m/^From: (.*)/ because in Perl the variable $1 will then contain
the string that comes after From: (the dot means any character and
the star means 0 or more times).
print $1; # in Perl, prints who is sending the email.
Regular Expressions Examples:
To match an IP address, we need 4 numbers separated by dots, each
from 0 to 255. Numbers can have one, two or three digits.
\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5]
or we can write it shorter as:
[01]?\d\d?|2[0-4]\d|25[0-5]
This expression matches a number from 0 to 255, and you need it 4 times to
have a complete IP address, like
^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$
The caret at the beginning means that the string must start with the first
number, and the dollar at the end means that it must finish with the last
number. The dot must be escaped as \. or else it will mean any character.
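The long expression is easier to read when the repeated part is stored once and reused. The sketch below is an adaptation, not the Perl regex itself: since grep -E does not provide the \d shorthand, [0-9] is used instead, and the names octet, ip_re and is_ip are made up for illustration.

```shell
# One number from 0 to 255, with [0-9] standing in for \d.
octet='([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])'
# Four such numbers separated by escaped dots, anchored at both ends.
ip_re="^${octet}\.${octet}\.${octet}\.${octet}\$"

# is_ip STRING: succeeds if STRING looks like an IP address.
is_ip() {
    printf '%s\n' "$1" | grep -E -q "$ip_re"
}

for candidate in 192.168.0.1 255.255.255.255 256.1.1.1 1.2.3; do
    if is_ip "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: not an IP address"
    fi
done
```

The first two candidates are accepted; 256.1.1.1 is rejected because 256 is out of range, and 1.2.3 because it has only three numbers.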