Even if this is homework - I'll explain it as it may benefit the community here too.
For a simple word like this, both give the same result.
-w matches the whole word.
\b matches any word boundary, so that could be a start or an end boundary.
So surrounding any word with the \b boundaries, e.g. "\bword\b", is pretty much exactly the same as the -w (whole word) option.
In grep's regex syntax, you also have \< which matches only a start-of-word boundary and \> which matches only an end-of-word boundary. That means you can search for words that begin or end with a particular pattern.
So if you consider a file with the following lines:
Code:
endanger
endearing
endurance
lands-end
legend
Pete Townsend
the end
weekend
Here are some egrep commands, using the various word boundary options above, and the output we'll see (with the matching parts in bold):
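Using egrep with the -w (whole word) option to search for end would output just the lines lands-end and the end.
The hyphen (-) is not a word character (word characters are letters, digits and the underscore), so it counts as a word boundary. That is why the line lands-end is matched: it contains the whole word end.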
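To try this yourself, here is a minimal sketch. It assumes the word list above is saved as ./file1 (a name I've made up for illustration) and it uses grep -E, which is what egrep runs under the hood:

```shell
# Work in a scratch directory so nothing existing gets overwritten.
cd "$(mktemp -d)" || exit 1

# ./file1 is a hypothetical name for the word list shown above.
printf '%s\n' endanger endearing endurance lands-end legend \
    'Pete Townsend' 'the end' weekend > ./file1

# -w: only match "end" when it stands alone as a whole word.
grep -E -w 'end' ./file1
# lands-end
# the end
```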
Likewise, using the \b operators, i.e. "\bend\b", would yield exactly the same output as the -w option.
Now let's take a look at the start/end boundary operators.
If we use both of them together, i.e. "\<end\>", we'll get the same as the -w and \b operators.
As expected, that yields the same two lines, lands-end and the end.
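As a quick check of that equivalence, the sketch below runs both boundary forms against the same word list. Again, ./file1 is just a name I've assumed for the first file, and \b, \< and \> are used as supported by GNU grep:

```shell
cd "$(mktemp -d)" || exit 1

# ./file1 is a hypothetical name for the word list shown earlier.
printf '%s\n' endanger endearing endurance lands-end legend \
    'Pete Townsend' 'the end' weekend > ./file1

# Generic word boundary on both sides of "end".
grep -E '\bend\b' ./file1
# lands-end
# the end

# Explicit start-of-word and end-of-word boundaries.
grep -E '\<end\>' ./file1
# lands-end
# the end
```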
So what is the point of the boundary operators?
Well, if you use them carefully, you can build more powerful search patterns.
So for example, if we use just the start boundary operator, as in "\<end", we can find all lines containing words that start with "end".
This yields the lines endanger, endearing, endurance, lands-end and the end.
So now we've got any lines containing words that start with "end".
Likewise, the end boundary operator, as in "end\>", would yield the lines containing words that end with "end": lands-end, legend, Pete Townsend, the end and weekend.
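Here are both one-sided searches as a runnable sketch. The file name ./file1 is my own assumption for the word list above:

```shell
cd "$(mktemp -d)" || exit 1

# ./file1 is a hypothetical name for the word list shown earlier.
printf '%s\n' endanger endearing endurance lands-end legend \
    'Pete Townsend' 'the end' weekend > ./file1

# Words that START with "end" (start-of-word boundary only).
grep -E '\<end' ./file1
# endanger
# endearing
# endurance
# lands-end
# the end

# Words that END with "end" (end-of-word boundary only).
grep -E 'end\>' ./file1
# lands-end
# legend
# Pete Townsend
# the end
# weekend
```

Note that lands-end and the end show up in both lists, because the standalone word end both starts and ends with "end".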
OK, now let's take things a bit further.
Consider this text file:
Code:
The endurance of olympic athletes.....
He finally reached Lands-End.
He was a legend.
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!
And now let's try to find ALL lines that have "the" at the start of a word boundary and the sequence "end" at the end of a word boundary, with any number of characters in between those two conditions. And we'll do this with a case-insensitive search:
Bash:
egrep -i "\<the.*end\b" ./file2
That will output the last four lines of the file (with the matching parts in BOLD). The first three lines don't match.
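If you want to reproduce that, the sketch below recreates ./file2 in a scratch directory and runs the same search, using grep -E in place of egrep. The first line fails because "endurance" doesn't end with "end", and the second and third lines contain no word starting with "the":

```shell
cd "$(mktemp -d)" || exit 1

# Recreate the sample text shown above as ./file2.
cat > ./file2 <<'EOF'
The endurance of olympic athletes.....
He finally reached Lands-End.
He was a legend.
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!
EOF

# -i: ignore case; \<the: a word starting with "the"; end\b: a word ending in "end".
grep -E -i '\<the.*end\b' ./file2
# The frontman of The Who was Pete Townsend.
# Therein, he realised how he could end his problems.
# The end......
# ..... of the weekend!
```

Notice that "Therein" counts too: \< only demands that the word starts with "the", not that "the" is the whole word.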
In that previous egrep I could have explicitly used the end-of-word-boundary operator \> at the end, but \b was slightly less effort to type! Ha ha!
Also, if we only wanted to see the matching parts of the line instead of the whole line, we could have added the -o option.
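A small sketch of what -o does with the same search, with ./file2 recreated in a scratch directory so the example stands on its own:

```shell
cd "$(mktemp -d)" || exit 1

# Recreate the sample text shown above as ./file2.
cat > ./file2 <<'EOF'
The endurance of olympic athletes.....
He finally reached Lands-End.
He was a legend.
The frontman of The Who was Pete Townsend.
Therein, he realised how he could end his problems.
The end......
..... of the weekend!
EOF

# -o: print only the part of each line that matched the pattern.
grep -E -i -o '\<the.*end\b' ./file2
# The frontman of The Who was Pete Townsend
# Therein, he realised how he could end
# The end
# the weekend
```

Because .* is greedy, each match runs from the first word starting with "the" up to the last word-final "end" on the line.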
Like I say, all of those boundary operator variants have their uses. And when combined with other extended regex operators, you can build extremely powerful regexes.
Sometimes you might want/need to be explicit about whether a boundary is a start-of-word or an end-of-word boundary, so you'd use \< and/or \>.
Other times, you might not care and just use \b.
And if you're just looking for lines containing a simple, whole word, you'd probably just use the -w option.