Creating a grouped tab separated file with a flat list

subvtech · Feb 16, 2023

Crosspost: https://www.nixcraft.com/t/converting-a-list-into-a-tab-separated-file-grouped-by-values/4517

Hello Linux.org,

This is my first post here and I'm hoping the community is inviting to novice shell scripters. So let me share.

I have a text file with a list of values. Here is a snip:

Code:

|01BFRUITS|
^banana
^apple
^orange
^pear
|01AELECTRONICS|
^television
^radio
^dishwasher
^computer
|01AANIMAL|
^bear
^cat
^dog
^elephant
|01ASHAPE|
^circle
^square
^diamond
^star

Values starting with a PIPE can be considered headers and values with a CARET are values to the header it is under.

My goal is to create a 'tab separated value' file with the headers on the left. So after much googling, man files, and some other forum help... I've managed to come up with two commands.. both output differently in my efforts.

First command:

Code:

cat test.txt | awk -v OFS='\t' '/^\|/{ c1=$0; gsub(/\|/,"",c1) } /^\^/{ c2=$0; sub(/^\^/,"",c2); print c1,OFS,c2 }' | sed -z s/\\r\\t\\t//g

Which produced:

Code:

01BFRUITS       banana
01BFRUITS       apple
01BFRUITS       orange
01BFRUITS       pear
01AELECTRONICS  television
01AELECTRONICS  radio
01AELECTRONICS  dishwasher
01AELECTRONICS  computer
01AANIMAL       bear
01AANIMAL       cat
01AANIMAL       dog
01AANIMAL       elephant
01ASHAPE        circle
01ASHAPE        square
01ASHAPE        diamond
01ASHAPE        star

The second command is:

Code:

cat test.txt | sed -z 's/\r\n\^/\t/g' | tr -d '|'

Which produced:

Code:

01BFRUITS       banana  apple   orange  pear
01AELECTRONICS  television      radio   dishwasher      computer
01AANIMAL       bear    cat     dog     elephant
01ASHAPE        circle  square  diamond star

Now my list has unique values in my test run. My new list has duplicates like so:

Code:

|01BFRUITS|
^banana
^apple
^orange
^pear
^banana
^apple
^orange
^pear
|01AELECTRONICS|
^television
^radio
^dishwasher
^computer
^television
^radio
^dishwasher
^computer
^television
^radio
^dishwasher
^computer
|01AANIMAL|
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
|01ASHAPE|
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star

And the desired output I am looking for is this:

Code:

01BFRUITS    banana    banana          
01BFRUITS    apple    apple          
01BFRUITS    orange    orange          
01BFRUITS    pear    pear          
01AELECTRONICS    television    television    television      
01AELECTRONICS    radio    radio    radio      
01AELECTRONICS    dishwasher    dishwasher    dishwasher      
01AELECTRONICS    computer    computer    computer      
01AANIMAL    bear    bear    bear    bear  
01AANIMAL    cat    cat    cat    cat  
01AANIMAL    dog    dog    dog    dog  
01AANIMAL    elephant    elephant    elephant    elephant  
01ASHAPE    circle    circle    circle    circle    circle
01ASHAPE    square    square    square    square    square
01ASHAPE    diamond    diamond    diamond    diamond    diamond
01ASHAPE    star    star    star    star    star

My intention is to group all values of the same value together and maintain the left header. I have no idea how to approach it with awk, sed, or tr. I did manage a way in excel, but the processing power it takes on my old computer is annoying. I think the cli will speed things up greatly.

My hardware is a RPi 4 running raspbian.

Any help or solutions will be greatly appreciated

subvtech · Feb 17, 2023

I was able to get this resolved using ChatGPT:

Code:

cat test.txt | sed -z 's/\r\n\^/,/g' | tr -d '|' | awk -F, "{ key = \$1; for (i=2; i<=NF; i++) { values[key][\$i] = values[key][\$i]\",\"\$i } } END { for (key in values) { for (value in values[key]) { printf \"%s%s\n\", key, values[key][value] } } }"

I had to change the file from tab separated to tab.

mrcrossroads · Feb 17, 2023

subvtech said:
I was able to get this resolved using ChatGPT:

Code:

cat test.txt | sed -z 's/\r\n\^/,/g' | tr -d '|' | awk -F, "{ key = \$1; for (i=2; i<=NF; i++) { values[key][\$i] = values[key][\$i]\",\"\$i } } END { for (key in values) { for (value in values[key]) { printf \"%s%s\n\", key, values[key][value] } } }"

I had to change the file from tab separated to tab.

Welcome to the forum!

Creating a grouped tab separated file with a flat list

subvtech

New Member

subvtech

New Member

mrcrossroads

Well-Known Member

Members online

Latest posts