soupmagnet
New Member
So, to explain the problem, I have about a hundred or so .tsv (tab separated values) files that are being used to store information on vendors for certain client accounts. I am trying to parse these files in order to generate a pipe delimited string of each of the individual columns for an import to a different platform. Several of the files in particular are displaying a strange behavior that I can't seem to make sense of.
One such .tsv file is similarly as follows...
So, let's say I want to focus on the sixth column, "additional_info", while dropping the header line, I can run the following...
which results in...
Now, I want to use that and turn it into a pipe delimited string by using the following...
which results in...
Whats strange is that if I perform the same task on any of the other columns in the file, it results in the expected behavior...
I decided to use the 'cat' command to display all of the hidden characters to make sure that I am working with "tab" separated values and not just a bunch of space characters in a file, and sure enough the format is correct, as ^I represents the tab character in the cat command output.
There are several other .tsv files in the list that seem to display the same behavior, but not all of them. I can't figure out why it is or what the difference is, but if anyone could help me understand what's going on here it would be most appreciated. Thanks!
One such .tsv file is similarly as follows...
vendor purpose website username password additional_info
Constant Contact Email/SMTP https://login.constantcontact.com/login/ [redacted] [redacted] empty
Digital Ocean Hosting Web Hosting https://cloud.digitalocean.com/login [redacted] [redacted] empty
Google Analytics Misc. http://www.google.com/analytics/ empty empty [redacted]
Nexcess Portal Web Hosting https://my.nexcess.net empty empty request invitaiton
Nexcess Siteworx Web Hosting [redacted] [redacted] [redacted] empty
Twitter Social Media https://twitter.com/login [redacted] [redacted] Twitter Profile: https://twitter.com/[redacted]
So, let's say I want to focus on the sixth column, "additional_info", while dropping the header line, I can run the following...
> awk -F "\t" '{print $6}' "./Client1234.tsv" | tail -n+2
which results in...
..as is expected
Now, I want to use that and turn it into a pipe delimited string by using the following...
> awk -F "\t" '{print $6}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'
which results in...
whereas what I would expect to see is this...|Twitter Profile: https://twitter.com/[redacted]
empty|empty|[redacted]|request invitation|empty|Twitter Profile: https://twitter.com/[redacted]
Whats strange is that if I perform the same task on any of the other columns in the file, it results in the expected behavior...
> awk -F "\t" '{print $2}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'
Constant Contact|Digital Ocean Hosting|Google Analytics|Nexcess Portal|Nexcess Sitworx|Twitter
> awk -F "\t" '{print $3}' "./Client1234.tsv" | tail -n+2 | paste -sd '|'
I decided to use the 'cat' command to display all of the hidden characters to make sure that I am working with "tab" separated values and not just a bunch of space characters in a file, and sure enough the format is correct, as ^I represents the tab character in the cat command output.
> cat -A Client1234.tsv
**Where it says "empty" in the file is the literal string "empty", used as a placeholder for an empty cell because I thought maybe the string "null" might have been causing this problem in earlier versions. Also, I don't think that it matters, but I'm using zsh to run the commands as opposed to a traditional bash shellvendor^Ipurpose^Iwebsite^Iusername^Ipassword^Iadditional_info^M$
Constant Contact^IEmail/SMTP^Ihttps://login.constantcontact.com/login/^[redacted]^I[redacted]^Iempty^M$
Digital Ocean Hosting^IWeb Hosting^Ihttps://cloud.digitalocean.com/login^I[redacted]^I[redacted]^Iempty^M$
Google Analytics^IMisc.^Ihttp://www.google.com/analytics/^Iempty^Iempty^I[redacted]^M$
Nexcess Portal^IWeb Hosting^Ihttps://my.nexcess.net^Iempty^Iempty^Irequest invitation^M$
Nexcess Siteworx^IWeb Hosting^I[redacted]^I[redacted]^[redacted]^Iempty^M$
Twitter^ISocial Media^Ihttps://twitter.com/login^I[redacted]^I[redacted]^ITwitter Profile: https://twitter.com/[redacted]^M$
^M$
There are several other .tsv files in the list that seem to display the same behavior, but not all of them. I can't figure out why it is or what the difference is, but if anyone could help me understand what's going on here it would be most appreciated. Thanks!
Last edited: