Binary file, wc and pipes

Discussion in 'Command Line' started by Violett, Oct 19, 2017.

  1. Violett

    Violett New Member

    Joined:
    Oct 19, 2017
    Messages:
    2
    Likes Received:
    2
    I have a binary file (a backup).
    My task is to list the number of names in the file using wc and piping.
    I have researched various ways of doing so, but none of them produces the desired output.
    Following the numbered output, I am to list the first 5 and last 5 names.

    ls -l of file:
    -rw-r--r--. 1 root root 3252373504 Jan 21 22:04 /mnt/tape/backup

    Currently I have the numbered output:
    strings /mnt/tape/backup | wc -w
    10123456

    The example for the first 5 names is:
    lib
    lib64
    usr/lib64/libgcc_s-4.8.5-20150702.so.1
    usr/lib64/libgcc_s.so.1

    I can see that the result is a list of the files that are backed up. However, I have not worked out the command that displays just these path names from the backup.

    Thank you.
     
    #1 Violett, Oct 19, 2017
    Last edited: Oct 19, 2017
  2. atanere

    atanere Moderator
    Gold Supporter

    Joined:
    Apr 6, 2017
    Messages:
    1,167
    Likes Received:
    1,158
    Hi @Violett, and welcome to the site!

    I'm not a programmer, so I can't help too much... but I don't think you've explained the problem well enough. What kind of "binary file" are you using? For example, a JPG image is a binary file... and maybe the image shows 20 lines of text... but I don't think command line tools can parse the image to extract that text. But again, I'm not a programmer so maybe I am wrong about this, and I'm sure someone will correct me.

    I also don't know what you mean by "names" when you say to "list the number of names" and to "list the first 5 and last 5 of the names." Do you mean "lines" instead? Your example shows the wc -l command, which does accurately count lines in a text file, but it seems to give unpredictable results on a binary LibreOffice document (it outputs a number, but one that does not match the number of actual lines).

    So maybe someone will be better able to help if you can give us a bit more info on your project. Thanks!

    Cheers
     
  3. Violett

    Thank you,
    Your answer suggested a syntax error.
    As a result, I found the wc -w command that gave the required result.

    I have edited the question to be more clear.
     
    #3 Violett, Oct 19, 2017
    Last edited: Oct 19, 2017
  4. atanere

    Hi again! That does help to clarify the task, but since this isn't my thing, it still takes me a bit to register everything. Sorry. Maybe others will join in soon with better understanding... but until then, I'll just try to muddle along and see if I can learn something here too.

    So, you say that wc -w is the correct syntax, but I am only having success with your code on a text file (not on any binary or .tar files that I have tried). The code below gives me an accurate word count:

    Code:
    strings mytextfile.txt | wc -w
    But this output is a number (word count)... and in your case above you show over 10 million "words" in your large backup file. But this code doesn't list the words inside like your example shows the filenames inside the backup. Are you using the strings command alone? (That works for me, but only on a text file.)
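One possibility worth checking, offered only as a guess: the path names in the example look like the member list of a tar archive. The thread never says what format /mnt/tape/backup is in, so the sketch below builds its own small archive (every file and directory name in it is made up) and lists the members with tar -tf, which prints one member per line.

```shell
# Assumption: the backup is a tar archive. Nothing in the thread confirms this.
work=$(mktemp -d)
mkdir -p "$work/src/usr/lib64"
: > "$work/src/usr/lib64/libexample.so.1"   # hypothetical file
: > "$work/src/notes.txt"                    # hypothetical file
tar -cf "$work/backup.tar" -C "$work/src" usr notes.txt

# -t lists the archive members, one per line; -f names the archive to read.
members=$(tar -tf "$work/backup.tar")
count=$(printf '%s\n' "$members" | wc -l)

printf '%s\n' "$members"
echo "members: $count"
rm -rf "$work"
```

If the real file is not a tar archive, tar will simply report an error, and strings remains the fallback.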

    Your example of a list of filenames also shows one filename per line... which makes me think that your first syntax of wc -l might count the files more accurately. For example, the list below (in a text file) shows 3 filenames, but it has 10 words. These are verified using the wc -l and the wc -w commands respectively. Your example did not include any filenames with spaces in them, and it is the spaces within a name that create this difference in the counts.

    My homework assignment.txt
    More-homework.odt
    How to get the best count.pdf
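
The difference between the two counts can be reproduced with a short sketch. It writes the three names above into a temporary file (the temporary file is only a stand-in for a real list) and counts them both ways: wc -l counts newline-terminated lines, while wc -w counts whitespace-separated words.

```shell
# Sample list: the three file names quoted in the post above.
list=$(mktemp)
printf '%s\n' \
  'My homework assignment.txt' \
  'More-homework.odt' \
  'How to get the best count.pdf' > "$list"

lines=$(wc -l < "$list")   # one per line, so one per file name
words=$(wc -w < "$list")   # one per word, so names with spaces count several times

echo "lines: $lines"
echo "words: $words"
rm -f "$list"
```

With one name per line, the line count matches the number of names, while the word count is inflated by every space inside a name.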


    OK, so if the first step of your task is to number the output, you will choose which of these numbering methods is most appropriate for you.

    And now the second step of your task... to list the first 5 and last 5 names (filenames) in the large backup file. Again, I'm back to questioning how you showed the list of the first 5 in the example above? The strings command will list the contents of a text file for me, but I can't get it to cleanly show the contents of a binary or .tar archive.... and maybe this is really where I'm goofing up. But also, at this point, I can't see how to separate the first 5 and last 5 with only the commands that you have offered.... are you permitted to use other commands? Without digging online, this part of your task seems like a job for sed or awk to help you out.
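
For what it's worth, here is a rough sketch of how sed and awk could pick out the two ends of a list. The twelve names are invented placeholders, and the temporary file stands in for whatever actually produces one name per line. sed prints lines 1 to 5; awk keeps a rolling buffer of the five most recent lines and prints it at the end of input.

```shell
# Hypothetical list of twelve names, one per line.
names=$(mktemp)
printf 'file%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$names"

# First five lines: -n suppresses default output, 1,5p prints lines 1 to 5.
first5=$(sed -n '1,5p' "$names")

# Last five lines: store each line in a 5-slot ring buffer keyed by NR % 5,
# then replay the final five slots in order once the input is exhausted.
last5=$(awk '{ buf[NR % 5] = $0 }
             END { for (i = NR - 4; i <= NR; i++) if (i > 0) print buf[i % 5] }' "$names")

printf 'first:\n%s\n' "$first5"
printf 'last:\n%s\n' "$last5"
rm -f "$names"
```

The awk version reads the input only once, which may matter when the list comes from a very large file.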

    OK, I've rambled on enough... much of this has just been me thinking out loud, but trying to edit for clarity. I'm sure I am not all that clear though. :confused::confused:
     
  5. JasKinasis

    JasKinasis Well-Known Member

    Joined:
    Apr 25, 2017
    Messages:
    244
    Likes Received:
    440
    Assuming strings gives you a line for each entry in your binary file, and that each entry is a filename, then the command:
    Code:
    strings /path/to/binfile | wc -l
    Should tell you how many file-names are in there. Disclaimer: I'm not familiar with the strings command. Currently on holiday, so I don't have access to a Linux terminal... However if the file in question is a text file, I'd just use cat instead of strings.... But for now, I'll just go with strings.

    To see only the first or last five entries, you would need to use the head or tail commands.
    Here's how you'd use head to see the first five file-names:
    Code:
    strings /path/to/file | head -n 5
    Tail works in exactly the same way and will show the last five file-names. I'll leave you to work that out...
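
As a sketch of the whole task in one place: on a file of several gigabytes it may be worth saving the list once and running wc, head and tail against the saved copy, rather than re-reading the backup for each step. The names below are invented placeholders, and the printf line stands in for the real command that produces one name per line.

```shell
# Stand-in for: strings /path/to/file > "$list"
list=$(mktemp)
printf 'name%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$list"

total=$(wc -l < "$list")     # number of names, one per line
head5=$(head -n 5 "$list")   # first five names
tail5=$(tail -n 5 "$list")   # last five names

echo "total: $total"
printf '%s\n' "$head5"
printf '%s\n' "$tail5"
rm -f "$list"
```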
     
    #5 JasKinasis, Oct 24, 2017
    Last edited: Oct 24, 2017
