Print bookmarks.html without the formatting

fixit7 · Mar 20, 2019

Is there a way to print my Firefox bookmarks without all the HTML formatting?

For example, I would want just "http://resources.hewitt.com/centerpoint/" to be printed.

<DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1352487818"

wizardfromoz · Mar 20, 2019

G'day @fixit7

You seem to have a penchant for placing Threads in Ubuntu, whereas some of the subject matter is more General.

I will move this to Command Line, because I am guessing a certain Member or two will spot it, and suggest a script.

Cheers, and I will watch with interest.

Wizard

fixit7 · Mar 20, 2019

wizardfromoz said:
G'day @fixit7

You seem to have a penchant for placing Threads in Ubuntu, whereas some of the subject matter is more General.

I will move this to Command Line, because I am guessing a certain Member or two will spot it, and suggest a script.

Cheers, and I will watch with interest.

Wizard

Sorry about that. I forgot that some topics are not specific to Ubuntu.

kenJackson · Mar 20, 2019

fixit7 said:
Is there a way to print my Firefox bookmarks without all the HTML formatting?

For example, I would want just "http://resources.hewitt.com/centerpoint/" to be printed.

Try this:

Code:

lynx -dump http://resources.hewitt.com/centerpoint/

Though that page has moved. This will be more interesting:

Code:

lynx -dump "https://aura.alight.com/proxypu/servlet/02017_auth?linkId=FRAUD"

BTW, if you want to use the clipboard, install xclip. Then you can right-click a link, choose "Copy Link Address", and use this command:

Code:

lynx -dump "$(xclip -sel clip -o)"

JasKinasis · Mar 20, 2019

And if you don't have lynx and xclip, you can use the standard GNU Unix toolset.

After a tiny bit of trial and error, this one-liner worked in Cygwin to extract the links from the bookmarks.html I exported from Firefox on my Windows PC at work.

Code:

\grep HREF= ~/bookmarks.html | awk '{print $2}' > ~/bookmarks.txt; sed -i -- '/place:/d; s/HREF=//g; s/"//g' ~/bookmarks.txt

Assuming that the Linux version exports the bookmarks in the same format, the above one-liner should work in Linux too.

The grep command searches for lines in ~/bookmarks.html that contain the string "HREF=".

Matching lines are piped to awk, which prints the 2nd whitespace-separated field. On Firefox's bookmark lines, that field is the HREF= attribute holding the website URL. The shell redirects awk's output to a new file called ~/bookmarks.txt.

Then we use sed on ~/bookmarks.txt to strip the HREF= prefixes and the double quotes that enclose the URLs of our bookmarks.

We're also deleting lines that contain "place:".
URLs with "place:" in them are used internally by Firefox and contain metadata about any folders/subfolders you have in Firefox's bookmark manager, so we want to exclude those URLs from our final output too.

The -i option tells sed to edit its input file in place, so the changes are written directly to ~/bookmarks.txt. (The -- simply marks the end of sed's options.)
In the end, we should be left with a text file containing nothing but website URLs.
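To see what each stage does without touching your real files, here is a sketch that runs the same pipeline on a tiny, made-up sample of the bookmarks.html layout. The sample entries and the scratch directory are assumptions for illustration, not taken from a real export. It needs GNU sed (as shipped with Cygwin and most Linux distributions) because of the bare -i.

```shell
#!/bin/sh
# Sketch: try the grep | awk | sed pipeline on a small made-up sample of
# Firefox's bookmarks.html layout, in a scratch directory so that the real
# ~/bookmarks.html and ~/bookmarks.txt are left alone.
# Assumes GNU sed, whose -i takes no suffix argument (BSD/macOS sed differs).
dir=$(mktemp -d)

cat > "$dir/bookmarks.html" <<'EOF'
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><H3 ADD_DATE="1352487800">Work</H3>
    <DL><p>
        <DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1352487818">Centerpoint</A>
        <DT><A HREF="place:sort=8&maxResults=10" ADD_DATE="1352487819">Most Visited</A>
        <DT><A HREF="https://www.kernel.org/" ADD_DATE="1352487820">Kernel</A>
    </DL><p>
</DL><p>
EOF

# 1. grep keeps lines containing HREF=; awk prints field 2, i.e. HREF="...".
grep HREF= "$dir/bookmarks.html" | awk '{print $2}' > "$dir/bookmarks.txt"
# 2. sed edits the file in place: drop place: lines, strip HREF= and quotes.
sed -i -- '/place:/d; s/HREF=//g; s/"//g' "$dir/bookmarks.txt"

result=$(cat "$dir/bookmarks.txt")
printf '%s\n' "$result"
rm -r "$dir"
```

On this sample it should print the centerpoint URL and the kernel.org URL, with the place: entry gone and the folder (H3) line never matched at all.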

Job done! Hopefully?!

And before anybody says anything: yes, I do have to use Windows at work, but I don't get any choice about that. I try to use as much free software as possible, though. Sometimes Cygwin is the only thing that keeps me sane!

But at home, I'm 100% Linux and free software! XD

kenJackson · Mar 20, 2019

JasKinasis said:
And if you don't have lynx and xclip, you can use the standard GNU Unix toolset. ...

Code:

\grep HREF= ~/bookmarks.html | awk '{print $2}' > ~/bookmarks.txt; sed -i -- '/place:/d; s/HREF=//g; s/"//g' ~/bookmarks.txt

I think this does the same, but only uses awk in one pass.

Code:

awk '/place:/{next}; /HREF=/{gsub(/HREF=|"/,"",$2); print $2}' bookmarks.html > bookmarks.txt

The problem is, both depend on a very particular HTML format. If you're only using them on bookmark pages, great. But if you then decide to try them on some web page you found, not so great.
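To make that point concrete, here is a sketch of a variant that leans less on the layout: instead of assuming the URL is the second whitespace-separated field, it uses grep -o to pull out every double-quoted href value, in either case, wherever it sits on the line. The helper name extract_hrefs and the sample HTML are made up for illustration, and -o is a GNU/BSD grep extension rather than POSIX. It is still a regex, not an HTML parser, so single-quoted or unquoted href values slip through.

```shell
#!/bin/sh
# Sketch (assumption: GNU or BSD grep, which both support -o): pull out every
# double-quoted href value wherever it sits on the line, in upper or lower
# case, then drop Firefox's internal place: URLs. Still a regex, not an HTML
# parser: single-quoted or unquoted href values are missed.
extract_hrefs() {   # hypothetical helper name; reads HTML on stdin
    grep -oiE 'href="[^"]*"' |   # one match per line, e.g. href="https://..."
        cut -d'"' -f2 |          # keep the text between the two double quotes
        grep -v '^place:'        # skip Firefox's internal place: query URLs
}

# Made-up input: two bookmark-style lines plus one ordinary web-page line.
sample='    <DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1352487818">Centerpoint</A>
    <DT><A HREF="place:sort=8&maxResults=10">Most Visited</A>
<p>See <a class="ext" href="https://www.kernel.org/">kernel.org</a> and <a href="https://www.gnu.org/">GNU</a>.</p>'

echo "Field-based awk one-liner:"
printf '%s\n' "$sample" | awk '/place:/{next}; /HREF=/{gsub(/HREF=|"/,"",$2); print $2}'

echo "grep -o based sketch:"
printf '%s\n' "$sample" | extract_hrefs
```

On this sample the field-based awk one-liner finds only the centerpoint bookmark, because the web-page line uses lowercase href and puts it in a different field; the grep -o version also picks up the kernel.org and gnu.org links.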

BTW, there's another package you can install to do this, python-html2text or python3-html2text. But I found it to be temperamental.

fixit7 · Mar 20, 2019

Thanks to all for your help.

Found this too.

Code:

lynx --dump ./bookmarks.html > file.txt

wizardfromoz · Mar 20, 2019

wizardfromoz said:
I will move this to Command Line, because I am guessing a certain Member or two will spot it, and suggest a script.

Wizard lucks in

Ken, meet Jas. Jas, meet Ken.

Cheers

Wizard

Vrai · Mar 21, 2019

fixit7 said:
Is there a way to print my Firefox bookmarks without all the HTML formatting?

For example, I would want just "http://resources.hewitt.com/centerpoint/" to be printed.

<DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1352487818"

I just tried this script from the support.mozilla.org site and it worked!
Firefox 65.0.1 on Linux Mint 19 Cinnamon
https://support.mozilla.org/en-US/questions/1132893

kenJackson · Mar 21, 2019

Vrai said:
I just tried this script from the support.mozilla.org site and it worked!

JavaScript. Gross. But it's an interesting approach.

JasKinasis · Mar 21, 2019

kenJackson said:
I think this does the same, but only uses awk in one pass.

Code:

awk '/place:/{next}; /HREF=/{gsub(/HREF=|"/,"",$2); print $2}' bookmarks.html > bookmarks.txt

The problem is, both depend on a very particular HTML format. If you're only using them on bookmark pages, great. But if you then decide to try them on some web page you found, not so great.

BTW, there's another package you can install to do this, python-html2text or python3-html2text. But I found it to be temperamental.

Thanks Ken.
Yes, I agree your awk one-liner is a much more elegant solution than my horrible hack!

In my defense, I was writing my post at the end of the work day yesterday, whilst one of my backup scripts was running. I had to shut my machine down and run out to catch my bus home as soon as my backups were finished. So I was in a bit of a rush and just posted what I had. I did mean to write a little more, but ran out of time.

I also meant to revisit this thread when I got home yesterday evening to explain that my solution was a bit of a hack, and to try to come up with a more elegant one-liner using only sed or awk. But I fell asleep last night shortly after eating, so it didn't happen!

With sed's and awk's built-in pattern matching, there was no need for me to use grep. I also know that, generally speaking, if you find yourself using sed and awk together, you usually only need awk. So I did break a few of the golden rules of scripting there. But again, it was a quick and dirty hack off the top of my head, composed after a quick look at the formatting of the links in Firefox's bookmarks.html.
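For the record, the grep-free idea can also be done with sed alone. This is a sketch assuming the one-bookmark-per-line layout of Firefox's export; the helper name bookmarks_to_urls and the sample lines are hypothetical. Because of the greedy .*, a line that ever held two HREF= attributes would keep only the last one.

```shell
#!/bin/sh
# Sketch of a sed-only variant (no grep, no awk), assuming the same
# one-bookmark-per-line layout that Firefox's bookmarks.html export uses.
#   -n              turn off sed's default printing of every line
#   /place:/d       delete lines holding Firefox's internal place: URLs
#   s/.../\1/p      keep only the text between HREF=" and the next ",
#                   printing just the lines where that substitution worked
bookmarks_to_urls() {   # hypothetical helper name; reads bookmarks.html on stdin
    sed -n '/place:/d; s/.*HREF="\([^"]*\)".*/\1/p'
}

# Made-up sample in the export's format.
printf '%s\n' \
    '<DL><p>' \
    '    <DT><H3 ADD_DATE="1352487800">Work</H3>' \
    '    <DT><A HREF="http://resources.hewitt.com/centerpoint/" ADD_DATE="1352487818">Centerpoint</A>' \
    '    <DT><A HREF="place:sort=8&maxResults=10">Most Visited</A>' \
    '</DL><p>' | bookmarks_to_urls

# On a real export: bookmarks_to_urls < ~/bookmarks.html > ~/bookmarks.txt
```

On the sample this prints only the centerpoint URL: the folder and list lines have no HREF=, and the place: line is deleted before the substitution runs.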

So it was very specific to Firefox's bookmarks.html and a bit of a hideous hack. But it did the job! XD
Thanks again for the awk one-liner... Much better than my initial effort!
