Could someone help me understand this command behavior?

soupmagnet

New Member
Joined
Nov 2, 2022
Messages
1
Reaction score
0
Credits
25
So, to explain the problem, I have about a hundred or so .tsv (tab separated values) files that are being used to store information on vendors for certain client accounts. I am trying to parse these files in order to generate a pipe delimited string of each of the individual columns for an import to a different platform. Several of the files in particular are displaying a strange behavior that I can't seem to make sense of.

One such .tsv file is similarly as follows...

vendor purpose website username password additional_info
Constant Contact Email/SMTP https://login.constantcontact.com/login/ [redacted] [redacted] empty
Digital Ocean Hosting Web Hosting https://cloud.digitalocean.com/login [redacted] [redacted] empty
Google Analytics Misc. http://www.google.com/analytics/ empty empty [redacted]
Nexcess Portal Web Hosting https://my.nexcess.net empty empty request invitaiton
Nexcess Siteworx Web Hosting [redacted] [redacted] [redacted] empty
Twitter Social Media https://twitter.com/login [redacted] [redacted] Twitter Profile: https://twitter.com/[redacted]

So, let's say I want to focus on the sixth column, "additional_info", while dropping the header line, I can run the following...
> awk -F "\t" '{print $6}' "./Client1234.tsv" | tail -n+2

which results in...
empty
empty
[redacted]
request invitation
empty
Twitter Profile: https://twitter.com/[redacted]
..as is expected

Now, I want to use that and turn it into a pipe delimited string by using the following...
> awk -F "\t" '{print $6}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'

which results in...
whereas what I would expect to see is this...
empty|empty|[redacted]|request invitation|empty|Twitter Profile: https://twitter.com/[redacted]


Whats strange is that if I perform the same task on any of the other columns in the file, it results in the expected behavior...

> awk -F "\t" '{print $2}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'
Constant Contact|Digital Ocean Hosting|Google Analytics|Nexcess Portal|Nexcess Sitworx|Twitter
> awk -F "\t" '{print $3}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'

I decided to use the 'cat' command to display all of the hidden characters to make sure that I am working with "tab" separated values and not just a bunch of space characters in a file, and sure enough the format is correct, as ^I represents the tab character in the cat command output.
> cat -A Client1234.tsv
vendor^Ipurpose^Iwebsite^Iusername^Ipassword^Iadditional_info^M$
Constant Contact^IEmail/SMTP^Ihttps://login.constantcontact.com/login/^[redacted]^I[redacted]^Iempty^M$
Digital Ocean Hosting^IWeb Hosting^Ihttps://cloud.digitalocean.com/login^I[redacted]^I[redacted]^Iempty^M$
Google Analytics^IMisc.^Ihttp://www.google.com/analytics/^Iempty^Iempty^I[redacted]^M$
Nexcess Portal^IWeb Hosting^Ihttps://my.nexcess.net^Iempty^Iempty^Irequest invitation^M$
Nexcess Siteworx^IWeb Hosting^I[redacted]^I[redacted]^[redacted]^Iempty^M$
Twitter^ISocial Media^Ihttps://twitter.com/login^I[redacted]^I[redacted]^ITwitter Profile: https://twitter.com/[redacted]^M$
^M$
**Where it says "empty" in the file is the literal string "empty", used as a placeholder for an empty cell because I thought maybe the string "null" might have been causing this problem in earlier versions. Also, I don't think that it matters, but I'm using zsh to run the commands as opposed to a traditional bash shell

There are several other .tsv files in the list that seem to display the same behavior, but not all of them. I can't figure out why it is or what the difference is, but if anyone could help me understand what's going on here it would be most appreciated. Thanks!
 
Last edited:

Members online


Latest posts

Top