NB: For those of you not invested in this thread, this is waaayyyyy tl;dr. @ Brief-Wishbone9091 - you did ask for it. Sorry if I went overboard.
I'll go back up to the script I posted earlier because I'm -still- not really a sed guy to the point where I can explain that part well (and I'm too lazy to look it up right now).
Reposting the "bludgeon" script (but I'll probably make some changes to it and list it again further down):
Code:
#!/bin/sh
# bludgeon - no doubt there's a better way to do this. 2023-07-15 03:27
#
# assumes "start/beginTransaction" and "end/endTransaction" will be unique
# otherwise, remove the break command.
# assumes "start/beginTransaction" and "end/endTransaction" will each constitute
# an entire line (no other text, no indentation, etc) otherwise uncomment the two
# lines with the grep commands and comment the line below each of those.
IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
#
: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
# echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
[ "${REC}" = "${TAG1}" ] && TAGFLAG="1"
# echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
[ "${REC}" = "${TAG2}" ] && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
[ -n "${TAGFLAG}" ] && echo "${REC}" >> "${OUT_FILE}"
done
ls -l "${OUT_FILE}"
The lines beginning with '#' are comments. The only one of these that matters to the code is the very first one, which is a specially formatted comment on the very first line of the script, beginning with "#!" and followed immediately by the path to the interpreter that will interpret (run) the script. In this case, the interpreter is /bin/sh, which is the command shell.
If you are saving this in your home directory, note that it will not be seen as an executable program until you issue the command
chmod u+x ~/bludgeon
which says, "for the user (owner), add execute permission to the file ~/bludgeon". Then also note that your home directory is probably not listed in your command search path so, even if your present working directory is your home directory, you have to specify where to find it every time you run it. Just saying
bludgeon (or whatever you called it) won't do the trick. You'll have to say
~/bludgeon
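Here's a small throwaway sketch of that chmod-then-run-by-path business. The script name "hello_demo" and the temp directory are made up for the demo (they are not part of bludgeon), and it assumes your temp directory allows running programs from it.

```shell
#!/bin/sh
# Sketch only: "hello_demo" and the temp directory are invented names.
DEMO_DIR="$(mktemp -d)"
DEMO_SCRIPT="${DEMO_DIR}/hello_demo"

# Write a tiny two-line script.
printf '%s\n' '#!/bin/sh' 'echo "hello from the demo"' > "${DEMO_SCRIPT}"

# A freshly written file normally has no execute permission.
[ -x "${DEMO_SCRIPT}" ] && BEFORE="executable" || BEFORE="not executable"

# Add execute permission for the owner only.
chmod u+x "${DEMO_SCRIPT}"
[ -x "${DEMO_SCRIPT}" ] && AFTER="executable" || AFTER="not executable"

# Run it by giving its location explicitly, since DEMO_DIR is not in PATH.
DEMO_OUTPUT="$("${DEMO_SCRIPT}")"

echo "before chmod: ${BEFORE}"
echo "after chmod:  ${AFTER}"
echo "output:       ${DEMO_OUTPUT}"
rm -rf "${DEMO_DIR}"
```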
Meanwhile, back inside the script, the next four lines after the beginning comment block:
Code:
IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
are initializing some variables.
- You would want to set IN_FILE to the pathname of your source file
- You would want to set OUT_FILE to whatever file you want to leave your output in
- TAG1 is the marker that indicates the start of the section you are interested in
- TAG2 is the marker that indicates the end of the section you are interested in
- It looks like you would want to set TAG1 to something like "10AM" and TAG2 to something like "14PM".
- Note that this script will start its output with the -first- instance of the TAG1 value in the input and will stop its output with the -first- instance of the TAG2 value in the input.
Code:
: >"${OUT_FILE}"
TAGFLAG=""
The first of these lines says, "do nothing (the colon) and redirect the output of that (which, surprisingly enough, is nothing) to the output file". This is slightly different from "touch $OUT_FILE" in that touch creates an empty file if the file does not already exist but only updates the date/time of a file that -does- already exist. This command creates it if it does not exist and makes it empty if it does exist.
TAGFLAG will be an indicator that we have or have not encountered TAG1 in the input. We initialize it to "" (empty string) to indicate "not found". We'll set it to "1" when we find the first instance of TAG1 and we'll reset it to empty when we find the first instance of TAG2.
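If you want to see the difference between ": >" and touch for yourself, something like this should do it. The temp file is just a stand-in for OUT_FILE so nothing real gets clobbered.

```shell
#!/bin/sh
# Sketch only: the temp file stands in for OUT_FILE.
DEMO_FILE="$(mktemp)"
echo "old contents" > "${DEMO_FILE}"

# touch leaves the existing contents alone (it only updates the timestamp).
touch "${DEMO_FILE}"
SIZE_AFTER_TOUCH="$(wc -c < "${DEMO_FILE}")"

# ": >" truncates the existing file to zero bytes.
: > "${DEMO_FILE}"
SIZE_AFTER_COLON="$(wc -c < "${DEMO_FILE}")"

echo "bytes after touch: ${SIZE_AFTER_TOUCH}"
echo "bytes after ': >': ${SIZE_AFTER_COLON}"
rm -f "${DEMO_FILE}"
```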
Now for the good stuff:
Code:
cat "${IN_FILE}" |while read REC ; do
# echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
[ "${REC}" = "${TAG1}" ] && TAGFLAG="1"
# echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
[ "${REC}" = "${TAG2}" ] && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
[ -n "${TAGFLAG}" ] && echo "${REC}" >> "${OUT_FILE}"
done
The first line of this part says, "print out the entire source file but, instead of dumping it to the terminal screen, send it as input to ("pipe" it to) the loop that will read each record and decide what to do with it". When the read command fails, it means we have reached the end of the input and the while loop will terminate. The loop body ends with "done" (the counterpart to "do") and there are only three non-empty, non-comment lines inside the loop.
For the processing inside the loop, REC will contain the line most recently read from the input.
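One thing worth knowing about a plain "read REC": it trims leading and trailing blanks and treats a backslash as an escape character, so REC is not always a byte-for-byte copy of the line. The sketch below (sample line and temp file invented for illustration) compares it with "IFS= read -r", which keeps the line as-is.

```shell
#!/bin/sh
# Sketch only: the sample line and temp file are invented for illustration.
DEMO_FILE="$(mktemp)"
# %s does not interpret escapes, so the file holds a literal backslash and 't'.
printf '%s\n' '   indented\tline   ' > "${DEMO_FILE}"

# Plain "read" trims surrounding blanks and drops the backslash.
read PLAIN < "${DEMO_FILE}"

# "IFS= read -r" keeps the line exactly as it appears in the file.
IFS= read -r EXACT < "${DEMO_FILE}"

printf 'plain: [%s]\n' "${PLAIN}"
printf 'exact: [%s]\n' "${EXACT}"
rm -f "${DEMO_FILE}"
```

In the loop this would mean writing "while IFS= read -r REC". The trade-off is that an indented tag line would then no longer pass the exact-match test, whereas with plain read it does pass but every indented line lands in the output without its indentation.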
(Note that '[' is simply the "test" command with the only difference being that '[' requires a closing ']' whereas "test" does not. If the tested condition is true then test "succeeds" otherwise, test "fails")
So...
test whether the value of REC is exactly the same as the value of TAG1. AND, if that succeeded (&&), set TAGFLAG to "1"
test whether the value of REC is exactly the same as the value of TAG2. And if that succeeded (&&), reset TAGFLAG to "". AND, if that succeeded (which it always will), echo the value of REC onto the end of the output file. AND, if that succeeded (which it will unless your disk is full, in which case you've got bigger fish to fry), "break" out of the loop because we're done. (Note that resetting TAGFLAG to "" here is extraneous because we're going to "break" anyway.)
test whether TAGFLAG is non-empty ( -n ). AND, if that succeeded, echo the value of REC onto the end of the output file.
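The test-and-chain pattern from those three lines can be tried outside the loop. In this sketch REC and TAG1 are set by hand rather than read from a file, and the VIA_TEST, VIA_BRACKET and RESULT variables exist only for the demo.

```shell
#!/bin/sh
# Sketch only: REC and TAG1 are set by hand instead of being read from a file.
REC="/start/beginTransaction"
TAG1="/start/beginTransaction"
TAGFLAG=""

# "test" and "[" perform the same check; "[" just needs the closing "]".
test "${REC}" = "${TAG1}" && VIA_TEST="match"
[ "${REC}" = "${TAG1}" ] && VIA_BRACKET="match"

# The assignment after && runs only when the comparison succeeds.
[ "${REC}" = "${TAG1}" ] && TAGFLAG="1"
[ "${REC}" = "something else" ] && TAGFLAG="changed"

# -n succeeds because TAGFLAG is now the non-empty string "1".
[ -n "${TAGFLAG}" ] && RESULT="would copy this record"

echo "test: ${VIA_TEST}, bracket: ${VIA_BRACKET}, TAGFLAG: ${TAGFLAG}, ${RESULT}"
```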
But what about those two lines inside the loop that are commented out? Each of those is a possible substitute for the line below it because...
The values of TAG1 and TAG2 may or may not appear as a complete line in the original (source) file. If they don't each appear -alone- on a line, then those places (above) where we tested if REC is -exactly- the same as TAG1 (or TAG2) will always fail and we'll get no output. In that case we can un-comment (remove the leading '#' from) the two commented lines and comment (add a leading '#' to) the line immediately below each of them. That way, instead of testing whether each incoming REC matches either TAG1 or TAG2, we can test whether each incoming REC -contains- TAG1 or TAG2:
first line in loop...
Code:
echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
echo the value of REC but pipe the output to grep -q (quiet, no output, just succeed or fail) ${TAG1}. AND, if that succeeded (TAG1 value was found anywhere in the record), set TAGFLAG to "1"
that other line that was originally commented inside the loop...
Code:
echo "${REC}" |grep -q "${TAG2}" && TAGFLAG="" && echo "${REC}" >> "${OUT_FILE}" && break
echo the value of REC but pipe the output to grep -q (quiet, no output, just succeed or fail) ${TAG2}. AND, if that succeeded (TAG2 value was found anywhere in the record), reset TAGFLAG to "". AND, if that succeeded, echo the value of REC onto the end of the output file. AND, if that succeeded, "break" out of the loop because we're done. (Note that resetting TAGFLAG to "" here is extraneous because we're going to "break" anyway.)
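A side note on the grep approach: grep treats the tag as a regular expression, so a tag containing characters like '.' or '*' can match more than the literal text (grep -F asks for a fixed-string match instead). Another option, sketched below, is a "contains" check done with the shell's own case statement, which also avoids starting a grep for every record. The helper name contains_tag and the sample lines are made up; they are not part of the original script.

```shell
#!/bin/sh
# Sketch only: contains_tag is a made-up helper, not part of the original script.
# Succeeds when the record (first argument) contains the tag (second argument).
contains_tag() {
    case "$1" in
        *"$2"*) return 0 ;;   # quoted "$2" is matched literally, not as a pattern
        *)      return 1 ;;
    esac
}

TAG1="/start/beginTransaction"
contains_tag "2023-07-15 /start/beginTransaction id=7" "${TAG1}" && FOUND_INSIDE="yes"
contains_tag "nothing to see here" "${TAG1}" || FOUND_ELSEWHERE="no"
echo "inside a longer line: ${FOUND_INSIDE}; unrelated line: ${FOUND_ELSEWHERE}"
```

Inside the loop, the first line could then read: contains_tag "${REC}" "${TAG1}" && TAGFLAG="1"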
The final line of the script, after the loop ends, just shows a directory listing ( ls -l ) for the output file so you can see if it's empty or not. In the listings below, I'll print a line count ( wc -l ) instead. I'll also throw out the bit about resetting TAGFLAG.
---
Code:
#!/bin/sh
# exact match version - begin and end tags appear alone on their lines
IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
[ "${REC}" = "${TAG1}" ] && TAGFLAG="1"
[ "${REC}" = "${TAG2}" ] && echo "${REC}" >> "${OUT_FILE}" && break
[ -n "${TAGFLAG}" ] && echo "${REC}" >> "${OUT_FILE}"
done
wc -l "${OUT_FILE}"
Code:
#!/bin/sh
# inexact match version - begin and end tags appear anywhere on their lines
IN_FILE="/tmp/far"
OUT_FILE="/tmp/boo"
TAG1="/start/beginTransaction"
TAG2="/end/endTransaction"
: >"${OUT_FILE}"
TAGFLAG=""
cat "${IN_FILE}" |while read REC ; do
echo "${REC}" |grep -q "${TAG1}" && TAGFLAG="1"
echo "${REC}" |grep -q "${TAG2}" && echo "${REC}" >> "${OUT_FILE}" && break
[ -n "${TAGFLAG}" ] && echo "${REC}" >> "${OUT_FILE}"
done
wc -l "${OUT_FILE}"
---
The following run results are based on the following input file:
Code:
not to be captured!
outside of bounds
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction
document is over
stop reading
this is just an example...
---
Note: I saved the bludgeon script in ~/bin, which -is- listed in my PATH, so I can just name it to run it w/o having to specify its location.
Code:
tc@box:~$ bludgeon
7 /tmp/boo
tc@box:~$ cat /tmp/boo
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction
---
On this input data, the inexact-match version (using grep) produces the same results:
Code:
tc@box:~$ bludgeon_grep
7 /tmp/boo
tc@box:~$ cat /tmp/boo
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction
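---
If you'd like to reproduce the exact-match run without touching /tmp/far or /tmp/boo, here is a self-contained sketch. It wraps the same loop in a function (the name extract_between and its argument order are my invention for the demo), feeds it the sample input from above via temp files, and reports the line count.

```shell
#!/bin/sh
# Sketch only: extract_between and the temp files are made up for this demo.
# Usage: extract_between IN_FILE OUT_FILE TAG1 TAG2
extract_between() {
    : > "$2"
    TAGFLAG=""
    while read REC ; do
        [ "${REC}" = "$3" ] && TAGFLAG="1"
        [ "${REC}" = "$4" ] && echo "${REC}" >> "$2" && break
        [ -n "${TAGFLAG}" ] && echo "${REC}" >> "$2"
    done < "$1"
}

DEMO_IN="$(mktemp)"
DEMO_OUT="$(mktemp)"
cat > "${DEMO_IN}" <<'EOF'
not to be captured!
outside of bounds
/start/beginTransaction
89 point blahblah
this forum is cool
iyem just testing
even more blahblah
do not take this seriously LOL
/end/endTransaction
document is over
stop reading
this is just an example...
EOF

extract_between "${DEMO_IN}" "${DEMO_OUT}" "/start/beginTransaction" "/end/endTransaction"
LINE_COUNT="$(wc -l < "${DEMO_OUT}")"
FIRST_LINE="$(head -n 1 "${DEMO_OUT}")"
LAST_LINE="$(tail -n 1 "${DEMO_OUT}")"
echo "${LINE_COUNT} lines, from '${FIRST_LINE}' to '${LAST_LINE}'"
rm -f "${DEMO_IN}" "${DEMO_OUT}"
```

Feeding the loop with "done < file" instead of "cat file |" is a deliberate swap here: it does the same job for this purpose and saves a process.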