Regular expressions in Linux: what do ".*" and ".+" mean, and how do they differ?

I'm practicing regex in Linux to advance my text-processing skills. What do .* and .+ mean?
What I already know:
* matches 0 or more characters.
+ matches 1 or more characters.
What is the dot doing there? What would be the difference if we didn't use the dot?
 


The dot matches any single character except a newline. The plus matches one or more instances of the preceding regular expression. However, not every symbol means the same thing in all Linux and Unix programs. The dot is pretty consistent; the plus varies.

See, for instance, Chapter 32 here: https://doc.lagout.org/operating system /linux/Unix Power Tools.pdf
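To make the dot concrete, here is a minimal sketch, assuming a POSIX shell and grep; the sample words are made up for illustration. With -x the pattern has to cover the whole line, so c.t only accepts a c, exactly one character of any kind, and then a t:

```shell
#!/bin/sh
# Hypothetical sample words, one per line.
words='cat
cot
c-t
ct
coat'

# '.' stands for exactly one character of any kind (letter, digit,
# punctuation...). -x makes the pattern match the whole line, so
# 'c.t' needs a c, then exactly one character, then a t.
one_char=$(printf '%s\n' "$words" | grep -x 'c.t')

printf '%s\n' "$one_char"   # cat, cot and c-t, one per line
```

ct (nothing in between) and coat (two characters in between) are not printed.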
 
The dot matches any single character except a newline. The plus matches one or more of the preceding characters. However, not every symbol means the same thing in all Linux and Unix programs. The dot is pretty consistent. The plus doesn't work in grep but does in egrep.

My question still seems unanswered. What does the dot match? Any single character? Can you share an example?
 
The plus matches one or more repetitions of the preceding regular expression; in .+ that is the dot, i.e. a single character. You caught my post just before I edited it to make it clearer.
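A minimal sketch of the difference the dot makes, assuming a POSIX shell and grep -E (the words are hypothetical). Without the dot, + and * repeat whatever single item comes right before them, here the letter o; with the dot they repeat "any character":

```shell
#!/bin/sh
# Hypothetical sample words, one per line.
words='l
lo
loo
lx'

# '+' and '*' repeat the item just before them, which here is 'o'.
# -E selects extended syntax so '+' acts as a quantifier; -x matches whole lines.
o_plus=$(printf '%s\n' "$words" | grep -xE 'lo+')   # l followed by one or more o's
o_star=$(printf '%s\n' "$words" | grep -xE 'lo*')   # l followed by zero or more o's

# With a dot in front, '+' repeats "any character" instead of 'o'.
any_plus=$(printf '%s\n' "$words" | grep -xE 'l.+')

printf 'lo+ matched:\n%s\n' "$o_plus"    # lo, loo
printf 'lo* matched:\n%s\n' "$o_star"    # l, lo, loo
printf 'l.+ matched:\n%s\n' "$any_plus"  # lo, loo, lx
```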

As I mentioned, not all Linux programs use the same symbols the same way. In the following, grep doesn't respond to + (in its default basic syntax the plus is just a literal character), but egrep does. In .+l the dot is any character, the plus repeats that "any character" one or more times (each repetition can be a different character), and the whole run has to be followed by the l specified in the command. So egrep only selects lines with an l that has at least one character before it, which here means the strings with "hell" in them.

That's a bit convoluted to put into words, but if you go through it one character at a time it'll gel.

Code:
[flip@flop ~]$ cat file1
hello2
goodbye
arriverderci
bonjour
hello
goodbye
arriverderci
bonjour
hello
goodbye
arriverderci
bonjour
[flip@flop ~]$ grep '.+l' file1
[flip@flop ~]$ grep '+.' file1
[flip@flop ~]$ egrep '.+l' file1
hello2
hello
hello

If you run the above in a terminal with grep/egrep set to highlight matches in colour (for example with --color=auto), the highlighted part of each line makes this clearer.
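If colours aren't available, grep -o prints just the matched part instead. A minimal sketch, assuming GNU grep (-o is not part of POSIX) and a shortened, hypothetical copy of file1:

```shell
#!/bin/sh
# Hypothetical lines in the spirit of file1 above.
lines='hello2
goodbye
bonjour
hello'

# Basic syntax (plain grep): '+' is an ordinary character, so '.+l' means
# "any character, a literal plus sign, then l". Nothing matches.
# grep exits 1 when nothing matches; '|| true' keeps a 'set -e' shell going.
bre_count=$(printf '%s\n' "$lines" | grep -c '.+l' || true)

# Extended syntax (grep -E, same as egrep): '.+l' means "one or more
# characters of any kind, then l". -o prints only the matched text.
ere_match=$(printf '%s\n' "$lines" | grep -oE '.+l')

printf '%s\n' "$bre_count"   # 0
printf '%s\n' "$ere_match"   # hell, once for each hello line
```

The match is "hell" rather than "hel" because .+ is greedy: it takes as much as it can while still leaving an l to end on.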
 
Here's an answer from Bard for your consideration:
Code:
The .* and .+ symbols in regular expressions in Linux mean the same thing:
they match any number of characters, including zero characters.
The only difference is that .* is a non-greedy quantifier, while .+ is a greedy quantifier.

A non-greedy quantifier will match the fewest number of characters possible,
while a greedy quantifier will match the most number of characters possible.

For example, the regular expression .* will match the following strings:

    "" (the empty string)
    "a"
    "ab"
    "abc"
    "abcd"

The regular expression .+ will also match all of these strings, but it will also match
the following strings:

    "aaaaa"
    "aaaaaaaaaaaaaaaa"

The difference is that the regular expression .* will stop matching characters as soon
as it finds a non-matching character, while the regular expression .+ will continue
matching characters until it reaches the end of the string.

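Note the contradiction: it says they mean the same thing, then claims there is a difference! The greedy/non-greedy part is wrong as well: in grep both .* and .+ are greedy, and the real difference is that .* can match zero characters while .+ needs at least one.
It's worth keeping the earlier comments in mind.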
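For comparison with the Bard answer, a minimal sketch of what grep -E actually does with .* and .+, assuming a POSIX shell, GNU grep (for -o) and made-up sample lines:

```shell
#!/bin/sh
# Hypothetical sample lines.
lines='xy
x-y
x-y-y'

# '.*' allows zero characters between x and y; '.+' needs at least one.
star_count=$(printf '%s\n' "$lines" | grep -cE 'x.*y')   # all three lines
plus_count=$(printf '%s\n' "$lines" | grep -cE 'x.+y')   # every line except "xy"

# Both are greedy: on "x-y-y" each takes the longest possible match,
# the whole line, not just "x-y". Extended syntax has no non-greedy form.
star_match=$(echo 'x-y-y' | grep -oE 'x.*y')
plus_match=$(echo 'x-y-y' | grep -oE 'x.+y')

printf 'x.*y matched %s lines, x.+y matched %s\n' "$star_count" "$plus_count"
printf '%s\n%s\n' "$star_match" "$plus_match"   # x-y-y twice
```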
 