I apologize for posting so many articles in a short amount of time. Christmas vacation is over and I have to go back to work Monday.
Regular Expressions Part 1: Introduction & Basics
Regular expressions (regex) are powerful pattern-matching tools used throughout Linux and programming. They allow you to search, match, and manipulate text based on patterns rather than exact strings.
What is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. Instead of searching for the exact text "error", you could search for any line containing "error" or "Error" or "ERROR", or even more complex patterns like "any word starting with 'err'".
Where Regex is Used
Regular expressions appear in many Linux tools and programming languages:
Command-line tools:
Code:
grep
sed
awk
less
vim
Programming languages:
Code:
Perl
Python
JavaScript
PHP
Ruby
Java
Understanding POSIX Flavors
POSIX (Portable Operating System Interface) defines two main regex flavors for Unix/Linux systems:
Basic Regular Expressions (BRE):
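- Default in grep and sed
- Requires backslashes to make grouping and interval characters special: \( \) and \{ \}
- More verbose syntax
- Limited feature set: POSIX BRE has no +, ? or | operators (GNU tools accept \+, \? and \| as extensions)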
Extended Regular Expressions (ERE):
- Used with grep -E (egrep is a deprecated equivalent), sed -E, and awk
- Cleaner syntax (fewer backslashes)
- More intuitive
- Additional operators
We'll cover both flavors in detail in upcoming posts. For now, understand that the same pattern might require different syntax depending on which tool you're using.
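As a small illustration of that last point, the sketch below writes the same "two a's in a row" pattern in both flavors. The sample text and variable names are made up for this example, and the interval syntax itself is only previewed here.

```shell
# Hypothetical sample input: one line with "aa", one without.
sample=$(printf 'baab\nbab\n')

# BRE (plain grep): interval braces need backslashes to be special.
bre_count=$(printf '%s\n' "$sample" | grep -c 'a\{2\}')

# ERE (grep -E): the same interval is written without backslashes.
ere_count=$(printf '%s\n' "$sample" | grep -E -c 'a{2}')

# In BRE, unescaped braces are ordinary characters, so this looks for the
# literal text "a{2}" and finds nothing (|| true hides grep's exit status 1).
literal_count=$(printf '%s\n' "$sample" | grep -c 'a{2}' || true)

echo "BRE: $bre_count  ERE: $ere_count  BRE without backslashes: $literal_count"
```

Both flavors find the same line; only the spelling of the pattern differs.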
Basic Pattern Elements
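Literal characters match themselves:
Code:
error
Matches the exact text "error".
The dot (.) matches any single character:
Code:
e.ror
Matches "error", "e5ror", "e ror", etc.
Character classes [ ] match any one character inside:
Code:
[Ee]rror
Matches "error" or "Error".
Code:
[0-9]
Matches any single digit.
Code:
[a-z]
Matches any lowercase letter.
Code:
[A-Za-z0-9]
Matches any letter or digit.
Negated character classes [^ ] match anything NOT listed:
Code:
[^0-9]
Matches any character that's not a digit.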
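To see these elements in action, here is a minimal sketch that counts matching lines with grep -c. The five sample lines and the variable names are invented for this example.

```shell
# Hypothetical sample input, one candidate per line.
sample=$(printf 'error\nError\ne5ror\neror\n12345\n')

# The dot needs exactly one character between "e" and "ror".
dot_count=$(printf '%s\n' "$sample" | grep -c 'e.ror')

# A character class accepts either capitalisation of the first letter.
class_count=$(printf '%s\n' "$sample" | grep -c '[Ee]rror')

# Lines containing at least one digit.
digit_count=$(printf '%s\n' "$sample" | grep -c '[0-9]')

# Lines containing at least one character that is not a digit.
nondigit_count=$(printf '%s\n' "$sample" | grep -c '[^0-9]')

echo "dot=$dot_count class=$class_count digit=$digit_count nondigit=$nondigit_count"
```

Note that "eror" is not matched by e.ror (there is no character for the dot to consume), and "12345" is the only line with no non-digit character.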
Anchors
Anchors don't match characters—they match positions:
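^ matches start of line:
Code:
^Error
Matches "Error" only at the beginning of a line.
$ matches end of line:
Code:
Error$
Matches "Error" only at the end of a line.
Combined:
Code:
^Error$
Matches lines containing only "Error" (nothing before or after).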
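The effect of each anchor can be checked on a few sample lines. This is only a sketch; the sample text and variable names are made up for illustration.

```shell
# Hypothetical sample input with "Error" in different positions.
sample=$(printf 'Error: disk full\nFatal Error\nError\nNo Error here\n')

# Lines that begin with "Error".
starts=$(printf '%s\n' "$sample" | grep -c '^Error')

# Lines that end with "Error".
ends=$(printf '%s\n' "$sample" | grep -c 'Error$')

# Lines that consist of "Error" and nothing else.
whole=$(printf '%s\n' "$sample" | grep -c '^Error$')

echo "starts=$starts ends=$ends whole=$whole"
```

The line "No Error here" contains the word but matches none of the anchored patterns.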
Quantifiers (How Many Times)
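* matches zero or more of the preceding character:
Code:
erro*r
Matches "errr", "error", "erroor", "errooor", etc. The * applies only to the "o" directly before it, so "eroor" does not match.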
Note: In BRE (basic grep/sed), you use * directly. Other quantifiers require special handling, which we'll cover in later parts.
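A quick way to convince yourself what the star applies to is to feed grep a few near-misses. The sample words and the variable name below are assumptions for this sketch.

```shell
# Hypothetical sample input: zero, one and two "o"s after "err",
# plus "eroor", which has only a single "r" before the "o"s.
matched=$(printf 'errr\nerror\nerroor\neroor\n' | grep 'erro*r')

# Print the lines that matched erro*r.
printf '%s\n' "$matched"
```

Only "eroor" is left out, because the pattern needs two r's before the optional run of o's.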
Escape Character
The backslash \ makes special characters literal:
Code:
\.
Matches an actual period (not "any character").
Code:
\$
Matches a dollar sign (not "end of line").
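The difference an escape makes is easy to demonstrate. In this sketch the sample lines and variable names are invented, and the patterns are single-quoted so the shell passes the backslashes through to grep unchanged.

```shell
# Hypothetical sample input for the dot.
names=$(printf '%s\n' 'file.txt' 'fileXtxt')

# Unescaped dot: any character is accepted, so both lines match.
any_char=$(printf '%s\n' "$names" | grep -c 'file.txt')

# Escaped dot: only a real period is accepted.
literal_dot=$(printf '%s\n' "$names" | grep -c 'file\.txt')

# Hypothetical sample input for the dollar sign.
prices=$(printf '%s\n' 'total: 5$' 'total: 5')

# Escaped dollar: a literal "$" must follow the 5.
dollar_literal=$(printf '%s\n' "$prices" | grep '5\$')

# Unescaped dollar at the end of the pattern: the line must end in 5.
dollar_anchor=$(printf '%s\n' "$prices" | grep '5$')

echo "any_char=$any_char literal_dot=$literal_dot"
echo "escaped: $dollar_literal | anchor: $dollar_anchor"
```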
Simple Examples
Find lines containing "error" (case-insensitive):
Code:
grep -i error logfile.txt
Find lines starting with a number:
Code:
grep '^[0-9]' file.txt
Find email-like patterns (a letter or digit, an @, one or more letters or digits, then a literal dot):
Code:
grep '[a-zA-Z0-9]@[a-zA-Z0-9][a-zA-Z0-9]*\.' file.txt
Find empty lines:
Code:
grep '^$' file.txt
Find lines containing "error" or "warning" (in plain grep an unescaped | is a literal character, so alternation needs grep -E):
Code:
grep -E 'error|warning' file.txt
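To try searches like these without touching real logs, you can build a small throwaway file first. The sketch below assumes mktemp is available (it is not part of POSIX but ships with most systems). The file contents and variable names are made up, and the alternation uses grep -E because in POSIX BRE an unescaped | is a literal character.

```shell
# Create a hypothetical five-line log in a temporary file.
logfile=$(mktemp)
printf '%s\n' \
    '2024-01-01 ERROR disk full' \
    'error: timeout' \
    'warning: low memory' \
    '' \
    'contact admin@example.com' > "$logfile"

# Case-insensitive search for "error".
ci_errors=$(grep -ci 'error' "$logfile")

# Lines starting with a digit.
numbered=$(grep -c '^[0-9]' "$logfile")

# Email-like text: letter/digit, @, one or more letters/digits, literal dot.
emails=$(grep -c '[a-zA-Z0-9]@[a-zA-Z0-9][a-zA-Z0-9]*\.' "$logfile")

# Empty lines.
empty=$(grep -c '^$' "$logfile")

# Alternation with ERE, and the same pattern in BRE where | is literal.
either=$(grep -E -c 'error|warning' "$logfile")
bre_literal=$(grep -c 'error|warning' "$logfile" || true)

echo "ci=$ci_errors numbered=$numbered emails=$emails empty=$empty"
echo "ERE alternation=$either BRE literal pipe=$bre_literal"

rm -f "$logfile"
```

The last two counts differ because the case-sensitive ERE search finds "error" and "warning", while the BRE version looks for the seven-character text "error|warning" joined by a pipe, which never occurs.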
Common Predefined Character Classes
POSIX defines named character classes as shorthand for common sets of characters:
Code:
[:alnum:]  Alphanumeric characters
[:alpha:]  Alphabetic characters
[:digit:]  Digits 0-9
[:lower:]  Lowercase letters
[:upper:]  Uppercase letters
[:space:]  Whitespace (space, tab, newline)
[:punct:]  Punctuation characters
These names only work inside a character class, which is why the brackets are doubled:
Code:
grep '[[:digit:]]' file.txt
Matches any line containing a digit.
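The named classes combine with everything covered so far. Here is a minimal sketch, with invented sample lines and variable names, that uses a few of them on their own and together with anchors and the star.

```shell
# Hypothetical sample input.
sample=$(printf 'abc\nAbc\nabc 123\n42\n')

# Lines containing a digit.
digits=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Lines containing an uppercase letter.
upper=$(printf '%s\n' "$sample" | grep -c '[[:upper:]]')

# Lines containing whitespace.
space=$(printf '%s\n' "$sample" | grep -c '[[:space:]]')

# Lines made up of letters only, from start to end.
letters_only=$(printf '%s\n' "$sample" | grep -c '^[[:alpha:]]*$')

echo "digits=$digits upper=$upper space=$space letters_only=$letters_only"
```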
Testing Your Regex
Before using regex in scripts, test it:
Interactive testing with grep:
Code:
echo "test string" | grep 'pattern'
Show matching part with color:
Code:
grep --color 'pattern' file.txt
Count matching lines (a line with several matches is counted once):
Code:
grep -c 'pattern' file.txt
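Two more habits are handy when testing: using grep's exit status as a yes/no answer, and telling matching lines apart from individual matches. The sketch below assumes a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX). The sample text and variable names are made up.

```shell
# Hypothetical sample input: three occurrences of "error" on two lines.
sample=$(printf 'error error\nno problem\nerror\n')

# grep -c counts lines that match.
lines=$(printf '%s\n' "$sample" | grep -c 'error')

# grep -o prints each match on its own line, so wc -l counts occurrences.
# tr strips the padding some wc implementations add.
occurrences=$(printf '%s\n' "$sample" | grep -o 'error' | wc -l | tr -d ' ')

# grep -q prints nothing and reports the result through its exit status.
if printf 'test string\n' | grep -q 'str'; then
    verdict='match'
else
    verdict='no match'
fi

echo "lines=$lines occurrences=$occurrences verdict=$verdict"
```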
Next in This Series
In Part 2, we'll dive deeper into POSIX Basic Regular Expressions (BRE), including:
- Why certain characters need escaping
- Grouping and backreferences
- Repetition patterns
- Real-world examples with grep and sed

