[SOLVED] How to store lines between two patterns in array?

  • Thread starter Deleted member 143446

Deleted member 143446

Guest
Hi, I need command line help.
I have a file whose content looks like this:


HEAD: Chest1
apple
banana
orange
corriander
HEAD: Chest2
mapple
kakoa
papaya
lavander
HEAD: Chest3

..


First I need to get the data blocks between the "HEAD" patterns. I did it by using sed -n and p (I think).
Then I need to store each block in a variable. To do that, I tried to create an empty array and store the blocks in it. I prepared this code:

Bash:
blocks=()

cat $filename | sed -n '/HEAD:/, /HEAD:/p' $blocks

count=1
for block in ${blocks[@]}; do
  count=$(($count+1))
  echo -e "Content of chest  $count  is : $block\n"
done

But it did not work. Can anybody help me correct my code? Thanks.
 


Firstly, there's no need to cat the file and then pipe it to sed.
Simply pass the file to sed!
Secondly, if you want to capture the "blocks", as you call them, in a variable called $blocks, you'll need to do something like this:
Bash:
blocks=$(sed -n '/HEAD:/,/HEAD:/p' "$filename")
However, I don't think that will quite work either, as it will capture all of the output from sed in a single variable.

Thirdly, the sed command you're using will list everything from HEAD: Chest1 to HEAD: Chest2.
But you'll be missing the items under Chest2.
Then it will list everything between HEAD: Chest3 and HEAD: Chest4.
And you'll be missing the items for Chest4 too.
So you'll only end up getting the contents of every other "chest", with the "HEAD:" line of the next chest included at the end.

That pattern will continue throughout the file.
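To see the "every other chest" effect in isolation, here is a minimal sketch. The sample data is made up for the demonstration (four chests with one item each), and the variable names sample and sed_output are just illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: four chests with one item each.
sample='HEAD: Chest1
apple
HEAD: Chest2
mapple
HEAD: Chest3
papaya
HEAD: Chest4
lavander'

# The range opens on a HEAD: line and closes on the NEXT HEAD: line,
# so that closing header never gets the chance to open a range of its own.
sed_output=$(printf '%s\n' "$sample" | sed -n '/HEAD:/,/HEAD:/p')
printf '%s\n' "$sed_output"
```

In the output, apple and papaya show up (along with all four HEAD: lines), while mapple and lavander are skipped.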

So, there are a number of problems with your approach here.
I don't think sed will do quite what you want - not without writing a much more involved sed script.

So you may need to find another way of reading these records.
Perhaps do some kind of readline loop that will read each line from the file and then if it finds a line that starts with "HEAD:" it creates a new "Chest" and all following lines are added to that chest. So it sounds like you'll need some kind of two dimensional array, or an array of arrays, or something?!
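A rough sketch of that line-by-line idea follows. Bash has no true two-dimensional arrays, so this version assumes it is acceptable to keep each chest as one newline-separated string in an ordinary indexed array. The input file is created on the fly purely so the example runs on its own; the names blocks and index are illustrative choices.

```bash
#!/usr/bin/env bash
# Hypothetical input file, created here only so the sketch is self-contained.
filename=$(mktemp)
printf '%s\n' 'HEAD: Chest1' apple banana 'HEAD: Chest2' mapple kakoa > "$filename"

blocks=()     # blocks[i] holds the items of chest i+1, one per line
index=-1      # no chest seen yet

# "|| [ -n "$line" ]" also handles a last line without a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line == HEAD:* ]]; then
        index=$((index + 1))          # a HEAD: line starts a new chest
        blocks[index]=""
    elif (( index >= 0 )) && [[ -n $line ]]; then
        blocks[index]+="$line"$'\n'   # add the item to the current chest
    fi
done < "$filename"
rm -f "$filename"

count=0
for block in "${blocks[@]}"; do
    count=$((count + 1))
    printf 'Content of chest %d is:\n%s\n' "$count" "$block"
done
```

Quoting "${blocks[@]}" in the for loop matters: without the quotes, each chest would be split back into separate words.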

Failing that, is this a personal project? A work project? or a homework question? The reason I ask is - is there any way the format of the file could be changed? If you perhaps added some kind of 'END:' tag to the end of each "chest" /"block"?
Just wondering what sort of scope there is for change in any of this.

I don't have time to work on anything right now, but I might fire up a terminal and have a play a little later in the week, if I get time!
 
@KGIII , actually it is not real homework, nobody assigned it to me, but it is my own homework for learning Linux command shells.
I want to improve my shell coding skills. I did similar coding in Java to get my street's planned water shortages from the city water department's web page, since they are announced regularly. I tried to do the same thing in shell, which seems less expensive than writing a whole Java program (creating lots of code for a GUI etc.).
@JasKinasis , I just understood the problem you mentioned. The pattern is exactly as you described: there is no end "tag" to close a block. It is like a half-open interval [a, b), closed at the start and open at the end. Maybe the first thing to do is to add an end tag ("HEAD") to the end of the file, or to read until the end of the file.
Therefore, maybe reading line by line is the smarter approach. I was just looking for a way to make sed store the lines between the patterns in its buffer, as described in https://www.educative.io/answers/what-is-sed-pattern-buffer (by suppressing the printing of lines in the buffer).
So I understand that I have to change my code. I will do it, and if I can solve it, I will share the solution here. Thanks.
 
but it is my own homework for learning Linux command shells.

Excellent, thanks.

We don't do homework - but we *will* point folks in the right direction, so that they can figure it out on their own.

In this case, you've got Jas already working on it - so you really aren't going to get anyone much more adept at this sort of thing. He's far more adept than I am.
 
Saying that, I've had a bit of a play with awk and come up with this little script, which does the following:
1. Ensures it is passed a single parameter
2. Ensures the parameter is a path to a valid file
3. If 1 or 2 are not met, it exits with an error
4. Creates a temporary directory in /tmp/
5. Moves into the temporary directory
6. Runs the input file through awk, which extracts everything between each line matching HEAD:, or the end of the file.
And outputs each group of results to a separate file in the temporary directory e.g. /tmp/tmp.wiorueru/chest1.txt /tmp/tmp.wiorueru/chest2.txt etc etc.
7. Moves back to the original working directory
8. Displays the number of chests and the content of each chest.
9. Cleans up by removing the temporary directory.

The only thing left to do is to modify the code that displays the number of chests and their contents, so that the content of each chest is read into an array, or an array of arrays.
I've just run out of time for that part! It's getting late here and I have work in the morning! Ha ha!
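For the part that is left to do, one possible approach (a sketch, not something from the script itself) is bash's mapfile builtin, which reads a file into an array with one element per line. It needs bash 4 or later. The temporary directory and file contents below are stand-ins invented for the example, and sizes is a hypothetical helper array.

```bash
#!/usr/bin/env bash
# Stand-in for the script's temporary directory, with hypothetical contents.
tempDir=$(mktemp -d)
printf '%s\n' apple banana > "$tempDir/chest1.txt"
printf '%s\n' mapple kakoa papaya > "$tempDir/chest2.txt"

count=1
sizes=()      # number of items found in each chest
for chest in "$tempDir"/*.txt; do
    mapfile -t items < "$chest"     # one array element per line (bash 4+)
    sizes+=("${#items[@]}")
    echo "chest $count contains ${#items[@]} items; the first is ${items[0]}"
    count=$((count + 1))
done
rm -r "$tempDir"
```

One thing to watch: the glob expands in lexical order, so with ten or more chests chest10.txt would sort before chest2.txt.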

Without further ado here's the script I have so far:
getchests.sh
Bash:
#!/usr/bin/env bash

# handle errors
die()
{
    echo "ERROR: $1"
    echo "Usage: getchests.sh /path/to/file"
    exit 1
}

# ensure we have a single parameter which is a path to a valid file
if [ $# -ne 1 ]; then
    die "Incorrect number of parameters!"
elif [ ! -f "$1" ]; then
    die "$1 does not exist, or is NOT a valid file!"
fi

# Get the absolute path to the passed-in file
inFile="$(realpath "$1")"

# Create a temporary directory in /tmp/
tempDir=$(mktemp -d)
#echo "tempDir=$tempDir" # debug - shows the path to created temp directory

# cd into the temporary directory
cd "$tempDir" || die "Could not cd into $tempDir"

# Process the input file with awk - get everything between each HEAD: tag and write each chest out to a separate .txt file (in the temporary directory)
# A HEAD: line starts the next chest; every other line after the first HEAD: goes to the current chest's file
awk 'BEGIN { num = 0 } /HEAD:/ { num++; next } num > 0 { print >> ("chest" num ".txt") }' "$inFile"

# cd back to the original working directory
cd - > /dev/null

# count the number of Chest-files in the temporary directory
echo "$inFile contains $(find "$tempDir" -type f | wc -l) Chests"

# This just loops through each file and cats it to the screen
# What the OP needs to do here is read each line of each file into a separate array - so will need an array of arrays, or something.
count=1;
for chest in "$tempDir"/*
do
    echo "chest $count contains;"
    cat "$chest"
    echo
    count=$((count+1))
done

# clean up - remove the temporary directory and temporary files
rm -r "$tempDir"

As always - ensure it has executable permissions with chmod +x ./getchests.sh (assuming you're in the directory containing the script)
And run it as:
Bash:
./getchests.sh /path/to/inputfile
If you forget to pass it a path to a file, or pass it an invalid path to a file - it will just exit with an error message.

I hope this helps!
I've used quite a few little shell-scripting tricks in various places in the script. If you have any questions about any of it - feel free to fire away!

The awk part was the bit that took the longest. That was a little tricky! I couldn't find a clean way to pass awk the path to the temporary directory, so it could write the temporary files out there.

So instead of wasting tons of time trying to get that working, I just made the script cd into the temporary directory to make that the current working directory. And then ran awk in there and got it to write the "chest" files out to the new current working directory.

But this meant that I had to run realpath on the filename passed into the script, to ensure that we always had an absolute path to the input file.

So for example, if you were in the directory ~/someProject/ and you ran the script passing the path to the input file using a relative path like: ./file.txt:
Because the script changes the working directory to the temporary directory, then the relative path ./file.txt would not work, because the file is not in the empty, temporary directory the script just moved into. It's in a sub-directory of your home directory. So what we had to do before moving into the temporary directory was to use realpath to get the fully qualified, absolute path to the input file.
e.g. it would resolve to the absolute file-path/name /home/yourname/someProject/file.txt.
That way, after moving into the temporary directory - awk could still process the file because it had a fully qualified, absolute path to the file.
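As an alternative to the cd workaround described above, awk's standard -v option can hand a shell value to the awk program as a variable. The following is only a sketch under that assumption, not part of the original script: the variable name dir and the generated sample input are made up for the example.

```bash
#!/usr/bin/env bash
tempDir=$(mktemp -d)
inFile=$(mktemp)     # hypothetical input, created only for this example
printf '%s\n' 'HEAD: Chest1' apple 'HEAD: Chest2' mapple kakoa > "$inFile"

# -v dir=... makes the shell value available inside awk as the variable "dir".
awk -v dir="$tempDir" '
    /HEAD:/ { num++; next }                          # header: move on to the next chest
    num > 0 { print > (dir "/chest" num ".txt") }    # item: write to the current chest file
' "$inFile"

chestCount=$(find "$tempDir" -type f | wc -l)
echo "wrote $chestCount chest files to $tempDir"
rm -r "$tempDir" "$inFile"
```

With this, neither the cd calls nor realpath would be needed, because awk is given the full output path directly. Note that awk interprets backslash escapes in values passed with -v, so a directory name containing backslashes would need extra care.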

Anyway, I hope this helps.
I'm off to bed now!
 
I ran your code with my example @JasKinasis thanks! That worked! No need to store anything in an array!
Yesterday there was a water shortage and the water came back at 03:00 AM today! So I started today with two pieces of good news!
 
@JasKinasis , can I use some part of code in my future shell codes? For instance die function for argument checking ?
 
