I have a wordlist that is currently 40 GB in size. I need to convert it to UTF-8 and am not sure how to do so. I assume it can be done from the command line. Can someone help me out and give me a command that I can try? Thank you!
iconv -t UTF8 /path/to/inputfile -o /path/to/convertedFile
bash-5.0$ iconv --help
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.
Input/Output format specification:
-f, --from-code=NAME encoding of original text
-t, --to-code=NAME encoding for output
iconv --verbose -f UTF8 "/media/sf_Share/Newest Version/Full-Master-8-16-20b.txt" -o "/media/sf_Share/Newest Version/finished.txt"
iconv --verbose -f UTF8 /media/sf_Share/Newest\ Version/Full-Master-8-16-20b.txt -o /media/sf_Share/Newest\ Version/finished.txt
Each one gives the same error: "iconv: illegal input sequence at position 1"
-t UTF8
-f UTF8
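To make the difference between those two flags concrete, here is a minimal sketch, assuming a glibc or libiconv `iconv` is installed. The file names and the ISO-8859-1 source encoding are made up for illustration and have nothing to do with the actual wordlist. `-f` names the encoding the input is already in, and `-t` names the encoding you want out.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
tmp=$(mktemp -d)

# Write "café" as ISO-8859-1: octal 351 is the single byte for é in that encoding.
printf 'caf\351\n' > "$tmp/latin1.txt"

# -f = encoding the input is currently in, -t = encoding to produce.
iconv -f ISO-8859-1 -t UTF-8 "$tmp/latin1.txt" > "$tmp/utf8.txt"

# Dump the result as hex; é should now appear as a two-byte UTF-8 sequence.
od -An -tx1 "$tmp/utf8.txt"
```

When only `-f UTF8` is given, iconv treats the input as UTF-8 and, as far as I know, converts to the current locale's encoding. An "illegal input sequence" error in that case most likely means iconv could not process the input as UTF-8 at that position.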
Dude, first let me say thank you for your help. When someone puts this much effort into helping someone on a forum, they often don't get the respect and thanks they deserve. So I wanted to make sure you know how much I appreciate you.
OK, let me update. Yes, you are correct. However, I misspoke. This list is supposed to be ASCII. I think it is currently UTF-8. I have been in contact with the team that wrote the software I am attempting to run this list through. I know that there shouldn't be any difference between the two because they were initially created to support the English character set only and not any accented characters such as ç, ñ, á, ü, etc. They think my list could be in Unicode because of the way their software reacts when I run it through. They are adamant about no Unicode.
At the moment I am not sure exactly what encoding the list is in. I know I have made you roll your eyes a bit here. In my line of work I have never faced an issue where I needed to know some of this, because most of my tools are pre-built. I have done a lot of work for this team, and when they asked me to help them with some issues I was happy to do so. Once we were done, they wanted to know how this software would run under certain heavy conditions, such as running my list. I explained it was a bit out of my area but that I would be glad to do what I could. I love to learn new things, so I wanted to take it on.
At this point I guess I need to learn what encoding this list is in before I continue down the path I was on in this post. I started researching this morning, in between phone calls, how to determine what encoding it is in. So this is where I am. Can you tell I am confused? lol
At this point I guess I need to learn what encoding this list is
file -bi <filename>
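`file -bi` prints a MIME type plus a `charset=` guess, but it is only a guess from a sample of the file. As a rough cross-check, here is a small sketch with two helper functions. The function names and sample files are my own invention, and it assumes standard `tr`, `wc` and `iconv` are available. The first counts bytes outside the 7-bit ASCII range, the second checks whether the whole file decodes as UTF-8. Both stream the file, so they should cope with a very large list, just slowly.

```shell
# Hypothetical helper: count bytes outside the 7-bit ASCII range (0 means pure ASCII).
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: succeed only if iconv can decode the whole file as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Small demonstration on a throwaway sample file (not the real wordlist).
sample=$(mktemp)
printf 'password\ncaf\303\251\n' > "$sample"   # second line is "café" in UTF-8

echo "non-ASCII bytes: $(count_non_ascii "$sample")"
if is_valid_utf8 "$sample"; then
    echo "decodes cleanly as UTF-8"
else
    echo "not valid UTF-8"
fi
```

If the count is 0, the file is plain ASCII, which is also valid UTF-8, so no conversion should be needed. If the count is above 0 and the UTF-8 check fails, the list is probably in some other encoding that would need to be named with `-f`.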