Grep everything between two strings and save that to a file?

Brief-Wishbone9091 · Jul 11, 2023

Example Scenario:

1) Count the number of lines between /start/beginTransaction to /end/endTransaction

2) Grep everything between that.

3) Save to a file.

My attempt.

grep -A{value of count of number of lines between string1 and string2} file_name_to_grep > output_file_name

Similar un-answered question:

Grep something that is between two known strings

I have a large potentially zipped log file and I can identify which line number some text I'm interested in is on using: find . -name "*" -exec zgrep -C 1 -n -i -H TextToFind {} \; But in a second

unix.stackexchange.com

JasKinasis · Jul 12, 2023

The -A option for grep will show the matching line PLUS a particular number of lines AFTER it. So I don't think that's going to help you.

e.g.

Bash:

grep -A 3 pattern /path/to/file

Any time "pattern" is found in /path/to/file, grep will output the line containing the pattern and the next three lines.

Normally, you'd just grep the entire file for the pattern you're looking for. If you're wanting to look for a pattern that occurs between two other patterns - I don't think grep is the best tool for the job.

You could perhaps use sed to output everything between a start and end pattern and then use another sed filter to find any matches for your main search pattern in that chunk. So a sed script would probably work. An awk script might also be an option.

Unfortunately, I haven't really got time to come up with a solution right now. But when I get a chance, I'll have a think and a play and will see what I can come up with.
I imagine you'll want to know where the matches are in the original document. So you'd need a mechanism to keep track of the line numbers in the original file. Hmmm, I'll come back to this when I have some free time!
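A minimal sketch of that idea, assuming awk is acceptable: a flag tracks whether we are inside the block, and awk's built-in NR gives the line number in the original file. The sample input and the search term "blahblah" are made up for illustration; only the two tags come from the question.

```shell
#!/bin/sh
# Hypothetical sample input, made up for this sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
outside of bounds
/start/beginTransaction
89 point blahblah
this forum is cool
even more blahblah
/end/endTransaction
blahblah after the end
EOF

# Print "line-number:line" for every line containing the search term,
# but only while inside the start..stop block.
# index() is a plain substring test, so the slashes in the tags need no escaping.
matches=$(awk -v start="/start/beginTransaction" \
              -v stop="/end/endTransaction" \
              -v term="blahblah" '
  index($0, start)           { inside = 1 }
  inside && index($0, term)  { print NR ":" $0 }
  index($0, stop)            { inside = 0 }
' "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```

The last line of the sample also contains the search term but sits after the end tag, so it is not reported.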

MikeRocor · Jul 15, 2023

Lacking any even slightly elegant solution, this script should do the trick, though it might be slow if the input file is large.

Code:

#!/bin/sh
# bludgeon - no doubt there's a better way to do this.     2023-07-15 03:27
#
# assumes "start/beginTransaction" and "end/endTransaction" will be unique
#   otherwise, remove the break command.
# assumes "start/beginTransaction" and "end/endTransaction" will each constitute
#   an entire line (no other text, no indentation, etc) otherwise uncomment the two
#   lines with the grep commands and comment out the line below each of those.

IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
#
: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
  # echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
  [ "${REC}" = "${TAG1}" ] && TAGFLAG="1"

  # echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
  [ "${REC}" = "${TAG2}" ] && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break

  [ -n "${TAGFLAG}" ]      &&               echo "${REC}" >> "${OUT_FILE}"
done
ls -l "${OUT_FILE}"

wendy-lebaron · Jul 15, 2023

Code:

sed -n -e '/begintransaction/,/endtransaction/p' testfile.txt > /tmp/sedtmp
grep "blahblah" /tmp/sedtmp

Try on this output:

Code:

not to be captured!
outside of bounds
begintransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
endtransaction
document is over
stop reading
this is just an example...

grep must always work with a file, which is why a temporary file needs to be created. But this doesn't exclude the two "guard" lines, which have to be as simple as possible.
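If the two "guard" lines should be left out of the result, one possible approach is an awk flag instead of the sed range. This is only a sketch on a shortened copy of the sample above; the order of the three rules is what keeps both markers out of the output.

```shell
#!/bin/sh
# Shortened, hypothetical copy of the sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
outside of bounds
begintransaction
89 point blahblah
this forum is cool
endtransaction
document is over
EOF

# Order matters: clear the flag on the end marker *before* the print rule,
# and set it on the start marker *after* it, so neither marker is printed.
inner=$(awk '
  /endtransaction/   { inside = 0 }
  inside             { print }
  /begintransaction/ { inside = 1 }
' "$sample")

printf '%s\n' "$inner"
rm -f "$sample"
```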

MikeRocor · Jul 15, 2023

See - I just knew there would be an "even slightly elegant" solution. My sed expertise is a bit limited but I'm on friendly terms with grep...

grep does want a file for input, but the file can be stdin, so why not

Code:

sed -n -e '/begintransaction/,/endtransaction/p' /tmp/testfile.txt | grep "blahblah"
89 point blahblah
even more blahblah

? Although this does not leave the full sed results available in a file as per

3) Save to a file.

And if that line count mentioned in

1) Count the number of lines between /start/beginTransaction to /end/endTransaction

is important then we've both dropped the ball. Also, I (perhaps mistakenly) read

2) Grep everything between that.

as A) "use grep to output everything between that" (between the begin and end tags (inclusive))

instead of as B) "search (grep) everything between that (for some search term)"

There might be a way to A using just grep but I'm not on quite -that- friendly a footing with grep, so I really like your sed command.

So maybe...

Code:

sed -n -e '/begintransaction/,/endtransaction/p' testfile.txt > /tmp/sedtmp ; grep "blahblah" /tmp/sedtmp ; echo "" ; wc -l /tmp/sedtmp ; echo "" ; cat /tmp/sedtmp
89 point blahblah
even more blahblah

7 /tmp/sedtmp

begintransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
endtransaction

or, w/o the grep blahblah...

Code:

sed -n -e '/begintransaction/,/endtransaction/p' testfile.txt > /tmp/sedtmp ; wc -l /tmp/sedtmp ; echo "" ; cat /tmp/sedtmp
7 /tmp/sedtmp

begintransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
endtransaction
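The save-and-count steps can also be done in a single pass with tee, which writes its input to a file and passes it along the pipe at the same time. A sketch under the assumption that a temporary directory stands in for /tmp and that the sample is shortened:

```shell
#!/bin/sh
# Hypothetical working directory and shortened sample file.
workdir=$(mktemp -d)
cat > "$workdir/testfile.txt" <<'EOF'
not to be captured!
begintransaction
89 point blahblah
even more blahblah
endtransaction
stop reading
EOF

# tee saves the extracted block to sedtmp and forwards it to wc for counting.
# tr strips the padding that some wc implementations put before the number.
count=$(sed -n -e '/begintransaction/,/endtransaction/p' "$workdir/testfile.txt" \
          | tee "$workdir/sedtmp" | wc -l | tr -d ' ')

# The saved block is still available for searching afterwards.
hits=$(grep "blahblah" "$workdir/sedtmp")

echo "lines in block: $count"
printf '%s\n' "$hits"
rm -rf "$workdir"
```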

Brief-Wishbone9091 · Jul 16, 2023

you guys are great. I'll try each of them.

Brief-Wishbone9091 · Jul 17, 2023

Hello guys. Could you explain your code word by word? I tried to find logs between 10AM and 14PM, but I could not work out how to do that using the commands you gave.

MikeRocor · Jul 18, 2023

NB: For those of you not invested in this thread, this is waaayyyyy tl;dr. @Brief-Wishbone9091 - you did ask for it. Sorry if I went overboard.

I'll go back up to the script I posted earlier because I'm -still- not really a sed guy to the point where I can explain that part well (and I'm too lazy to look it up right now).

Reposting the "bludgeon" script (but I'll probably make some changes to it and list it again further down):

Code:

#!/bin/sh
# bludgeon - no doubt there's a better way to do this.     2023-07-15 03:27
#
# assumes "start/beginTransaction" and "end/endTransaction" will be unique
#   otherwise, remove the break command.
# assumes "start/beginTransaction" and "end/endTransaction" will each constitute
#   an entire line (no other text, no indentation, etc) otherwise uncomment the two
#   lines with the grep commands and comment out the line below each of those.

IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
#
: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
  # echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
  [ "${REC}" = "${TAG1}" ] && TAGFLAG="1"

  # echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
  [ "${REC}" = "${TAG2}" ] && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break

  [ -n "${TAGFLAG}" ]      &&               echo "${REC}" >> "${OUT_FILE}"
done
ls -l "${OUT_FILE}"

The lines beginning with '#' are comments. The only one of these that matters to the code is the very first one: a specially formatted comment on the very first line of the script, beginning with "#!" and followed immediately by the path to the interpreter that will interpret (run) the script. In this case, the interpreter will be /bin/sh, which is the command shell.

If you are saving this in your home directory, note that it will not be seen as an executable program until you issue the command

Code:

chmod u+x ~/bludgeon

which says, "for the user (owner), add execute permission to the file ~/bludgeon". Then also note that your home directory is probably not listed in your command search path so, even if your present working directory is your home directory, you have to specify where to find it every time you run it. Just saying bludgeon (or whatever you called it) won't do the trick. You'll have to say ~/bludgeon
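The two ways of running the script can be sketched as follows. The temporary directory and the one-line stand-in script are assumptions for the demo; in practice the directory would be something like ~/bin and the PATH line would go in your shell startup file.

```shell
#!/bin/sh
# Hypothetical directory and stand-in script, created only for this demo.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from bludgeon"\n' > "$bindir/bludgeon"
chmod u+x "$bindir/bludgeon"     # add execute permission for the owner

# Option 1: give the location explicitly every time.
out1=$("$bindir/bludgeon")

# Option 2: add the directory to PATH for this shell session,
# after which the bare name is enough.
PATH="$bindir:$PATH"
out2=$(bludgeon)

echo "$out1"
echo "$out2"
rm -rf "$bindir"
```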

Meanwhile, back inside the script, the next four lines after the beginning comment block:

Code:

IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"

are initializing some variables.

You would want to set IN_FILE to the pathname of your source file
You would want to set OUT_FILE to whatever file you want to leave your output in
TAG1 is the marker that indicates the start of the section you are interested in
TAG2 is the marker that indicates the end of the section you are interested in
It looks like you would want to set TAG1 to something like "10AM" and TAG2 to something like "14PM".
Note that this script will start its output with the -first- instance of the TAG1 value in the input and will stop its output with the -first- instance of the TAG2 value in the input.
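Tags only work if a line containing exactly that text exists in the log. For a time window, comparing timestamps may be more robust. The following is a sketch only: it assumes every log line begins with a zero-padded 24-hour HH:MM:SS timestamp, which may not match your log format, and the sample log is invented.

```shell
#!/bin/sh
# Hypothetical log; the real format is not known from this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
09:59:58 service starting
10:00:03 first request
12:30:00 midday job
13:59:59 last request in window
14:00:00 window closed
EOF

# Zero-padded HH:MM:SS strings sort the same way as the times they represent,
# so a plain string comparison on the first field is enough.
window=$(awk -v from="10:00:00" -v to="14:00:00" '$1 >= from && $1 < to' "$log")

printf '%s\n' "$window"
rm -f "$log"
```

The window here includes 10:00:00 and excludes 14:00:00; change "<" to "<=" if the end time should be included.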

Code:

: >"${OUT_FILE}"
TAGFLAG=""

The first of these lines says, "do nothing (the colon) and redirect the output of that (which, surprisingly enough, is nothing) to the output file". This is slightly different from "touch $OUT_FILE" in that touch creates an empty file if the file does not already exist but only updates the date/time of a file that -does- already exist. The ": >" command creates the file if it does not exist and makes it empty if it does exist.
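The difference between the two is easy to check on a file that already has something in it. A small sketch using a throwaway temporary file:

```shell
#!/bin/sh
f=$(mktemp)
echo "old contents" > "$f"       # 13 bytes including the newline

touch "$f"                       # file exists: only the timestamp changes
size_after_touch=$(wc -c < "$f" | tr -d ' ')

: > "$f"                         # file exists: truncated to zero bytes
size_after_colon=$(wc -c < "$f" | tr -d ' ')

echo "after touch: $size_after_touch bytes"
echo "after ': >': $size_after_colon bytes"
rm -f "$f"
```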

TAGFLAG will be an indicator that we have or have not encountered TAG1 in the input. We initialize it to "" (empty string) to indicate "not found". We'll set it to "1" when we find the first instance of TAG1 and we'll reset it to empty when we find the first instance of TAG2

Now for the good stuff:

Code:

cat "${IN_FILE}" |while read REC ; do
  # echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
  [ "${REC}" = "${TAG1}" ] && TAGFLAG="1"

  # echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
  [ "${REC}" = "${TAG2}" ] && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break

  [ -n "${TAGFLAG}" ]      &&               echo "${REC}" >> "${OUT_FILE}"
done

The first line of this part says "print out the entire source file but, instead of dumping it to the terminal screen, send it as input to ("pipe" it to) the loop", which will read each record and decide what to do with it. When the read command fails, it means we have reached the end of the input and the while loop will terminate. The loop body ends with "done" (the counterpart to "do") and there are only three non-empty, non-comment lines inside the loop.
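One caveat worth knowing about a plain "read": it strips leading and trailing whitespace and treats a backslash as an escape character, so REC is not always an exact copy of the input line. A sketch of the difference, using a made-up one-line file and redirecting the file into the loop instead of piping from cat:

```shell
#!/bin/sh
f=$(mktemp)
# One line with leading spaces and a literal backslash followed by "t".
printf '    indented \\t line\n' > "$f"

# Plain "read": leading whitespace is dropped and the backslash is consumed.
while read REC; do plain=$REC; done < "$f"

# "IFS= read -r": the line is kept exactly as it appears in the file.
while IFS= read -r REC; do exact=$REC; done < "$f"

printf '[%s]\n' "$plain" "$exact"
rm -f "$f"
```

Whether the stripping helps or hurts depends on the data: with plain "read", an indented tag line would still compare equal to the tag, while with "IFS= read -r" it would not.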

For the processing inside the loop, REC will contain the line most recently read from the input.

(Note that '[' is simply the "test" command with the only difference being that '[' requires a closing ']' whereas "test" does not. If the tested condition is true then test "succeeds" otherwise, test "fails")
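The equivalence of the two spellings can be seen in a few lines. The values here are just the tag from the script compared against itself and against a made-up string:

```shell
#!/bin/sh
REC="/start/beginTransaction"
TAG1="/start/beginTransaction"

# "test" form: no closing bracket.
if test "$REC" = "$TAG1"; then via_test="match"; else via_test="no match"; fi

# "[" form: same check, but the closing "]" is required.
if [ "$REC" = "$TAG1" ]; then via_bracket="match"; else via_bracket="no match"; fi

# A comparison that fails, for contrast.
if [ "$REC" = "something else" ]; then mismatch="match"; else mismatch="no match"; fi

echo "test: $via_test, [: $via_bracket, different string: $mismatch"
```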

So...
test whether the value of REC is exactly the same as the value of TAG1. AND, if that succeeded (&&), set TAGFLAG to "1"

test whether the value of REC is exactly the same as the value of TAG2. And if that succeeded (&&), reset TAGFLAG to "". AND, if that succeeded (which it always will), echo the value of REC onto the end of the output file. AND, if that succeeded (which it will unless your disk is full, in which case you've got bigger fish to fry), "break" out of the loop because we're done. (Note that resetting TAGFLAG to "" here is extraneous because we're going to "break" anyway.)

test whether TAGFLAG is non-empty ( -n ). AND, if that succeeded, echo the value of REC onto the end of the output file.

But what about those two lines inside the loop that are commented out? Each of those is a possible substitute for the line below it because...

The values of TAG1 and TAG2 may or may not appear as a complete line in the original (source) file. If they don't each appear -alone- on a line, then those places (above) where we tested whether REC is -exactly- the same as TAG1 (or TAG2) will always fail and we'll get no output. In that case we can un-comment (remove the leading '#' from) the two commented lines and comment out (add a leading '#' to) the line immediately below each of them. That way, instead of testing whether each incoming REC -matches- TAG1 or TAG2, we can test whether each incoming REC -contains- TAG1 or TAG2:

first line in loop...

Code:

echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"

echo the value of REC but pipe the output to grep -q (quiet, no output, just succeed or fail) ${TAG1}. AND, if that succeeded (TAG1 value was found anywhere in the record), set TAGFLAG to "1"

that other line that was originally commented inside the loop...

Code:

echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break

echo the value of REC but pipe the output to grep -q (quiet, no output, just succeed or fail) ${TAG2}. AND, if that succeeded (TAG2 value was found anywhere in the record), reset TAGFLAG to "". AND, if that succeeded, echo the value of REC onto the end of the output file. AND, if that succeeded, "break" out of the loop because we're done. (Note that resetting TAGFLAG to "" here is extraneous because we're going to "break" anyway.)
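Note that grep treats the tag as a regular expression by default, so a tag containing characters such as "." or "[" can match more than intended. A sketch of the difference, using a hypothetical tag and record that are not from the thread:

```shell
#!/bin/sh
REC="elapsed 10x00 seconds"
TAG="10.00"     # hypothetical tag containing a regex metacharacter

# Plain grep treats "." as "any character", so this matches by accident.
if printf '%s\n' "$REC" | grep -q -e "$TAG"; then
  regex_result="match"
else
  regex_result="no match"
fi

# -F compares the tag as a literal string; -e protects tags that start with "-".
if printf '%s\n' "$REC" | grep -qF -e "$TAG"; then
  fixed_result="match"
else
  fixed_result="no match"
fi

echo "regex: $regex_result, fixed string: $fixed_result"
```

For the tags used in this thread the two behave the same, since "/start/beginTransaction" and "/end/endTransaction" contain no regex metacharacters.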

The final line of the script, after the loop ends, just shows a directory listing ( ls -l ) for the output file so you can see if it's empty or not. In the listings below, I'll print a line count ( wc -l ) instead. I'll also throw out the bit about resetting TAGFLAG.

---

Code:

#!/bin/sh
# exact match version - begin and end tags appear alone on their lines

IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"

: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
  [ "${REC}" = "${TAG1}" ] && TAGFLAG="1"
  [ "${REC}" = "${TAG2}" ] && echo "${REC}" >> "${OUT_FILE}" && break
  [ -n "${TAGFLAG}" ]      && echo "${REC}" >> "${OUT_FILE}"
done
wc -l "${OUT_FILE}"

Code:

#!/bin/sh
# inexact match version - begin and end tags appear anywhere on their lines

IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"

: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
  # -F: look for the tag as a literal string rather than a regular expression
  echo "${REC}" |grep -qF "${TAG1}" && TAGFLAG="1"
  echo "${REC}" |grep -qF "${TAG2}" && echo "${REC}" >> "${OUT_FILE}" && break
  [ -n "${TAGFLAG}" ] &&              echo "${REC}" >> "${OUT_FILE}"
done
wc -l "${OUT_FILE}"

---

The following run results are based on the following input file:

Code:

not to be captured!
outside of bounds
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction
document is over
stop reading
this is just an example...

---

Note: I saved the bludgeon script in ~/bin, which -is- listed in my PATH, so I can just name it to run it w/o having to specify its location.

Code:

tc@box:~$ bludgeon
7 /tmp/boo

tc@box:~$ cat /tmp/boo
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction

---

On this input data, the inexact-match version (using grep) produces the same results

Code:

tc@box:~$ bludgeon_grep
7 /tmp/boo

tc@box:~$ cat /tmp/boo
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction

wendy-lebaron · Jul 18, 2023

One disadvantage with sed is that it lacks variables. There are a few things where grep or a "big" interpreter such as Python could shine brighter. Otherwise it's hard to beat for search and replacement and other quick-and-dirty duties inside text files.
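sed has no variables of its own, but shell variables can be expanded into the sed script before sed runs, which covers many cases. A sketch using the tags from the original question; the sample file is made up, and the approach assumes the tags contain no "#" and no regex metacharacters:

```shell
#!/bin/sh
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"

# Hypothetical sample input.
f=$(mktemp)
cat > "$f" <<'EOF'
outside of bounds
/start/beginTransaction
89 point blahblah
/end/endTransaction
document is over
EOF

# Double quotes let the shell substitute the variables into the sed script.
# \#...# makes "#" the regex delimiter, so the slashes in the tags are harmless.
block=$(sed -n "\#${TAG1}#,\#${TAG2}#p" "$f")

printf '%s\n' "$block"
rm -f "$f"
```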

This document is fun to read in places and helped me out with this tool. A lot of information is outdated, and the last update to the text file seems to have been in 2003, so be aware of that. EDIT: you want to go for any information about "GNU sed v4.05" or alike, and sometimes "supersed". The others are older and/or less capable.

Frequently-Asked Questions about sed, the stream editor

MikeRocor · Jul 19, 2023

Thanks for the link - I'll dig into that the next time I'm looking for an evening "characterized by much sitting and little physical exercise."
