Bash script to output site status from the Google Safe Browsing page?

postcd (Guest):
Hello,

I would like a bash script to extract the site status message at

https://www.google.com/transparencyreport/safebrowsing/diagnostic/?#url=yahoo.com

You see:

Current status:
Not dangerous

Safe Browsing has not recently seen malicious content on yahoo.com.

The "Not dangerous , Safe Browsing has not recently seen malicious content on yahoo.com." appears to be dynamically generated (or how to name that, is it jquery?), because when i use:

curl URL_ADDRESS

lynx --dump URL_ADDRESS

the output contains the whole page text, including every possible status message.
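For example, something like this prints lines for several different statuses at once, not just the one the page actually shows (grepping on "dangerous" because all the status strings seem to contain it):

# Even a plain text dump of the page contains every possible
# status message, so grepping matches more than one status:
lynx --dump 'https://www.google.com/transparencyreport/safebrowsing/diagnostic/?#url=yahoo.com' | grep -i 'dangerous'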

Which Linux command can I use to get the real status of the webpage (for example: "Not dangerous")?

Thank you
 


I can't do this from work (proxy issues). curl or wget should return the web page. It may contain all the values, but only one of them would be displayed. I would think you could parse/grep for the proper HTML tags?

This sounds like a job for a Perl script: fetch the page, parse it, and return the value.
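A minimal sketch of that idea in bash (the <div> class name here is purely hypothetical, I haven't checked the page's real markup; and as noted above, all the status strings sit in the raw HTML, so a plain grep can't tell which one is actually displayed):

#!/bin/bash
# Hypothetical sketch: fetch the diagnostic page and cut out the
# element that might hold the status. 'current-status' is a guess,
# not the page's actual markup.
url='https://www.google.com/transparencyreport/safebrowsing/diagnostic/?#url=yahoo.com'
curl -s "$url" | grep -o '<div class="current-status">[^<]*</div>'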
 
I took a look at the output.

This site calls a JavaScript procedure and passes it a parameter; the procedure is stored at https://www.google.com/transparencyreport/safebrowsing/diagnostic/js/main.js?hl=en. This is a somewhat messy JS program: it does some analysis and passes parameters that render the output in that "Status of:" <div>. The procedure has no line breaks in it, and you would need to download and parse it every time; your code would need to know how to parse JavaScript, find the calls and structures, capture the parameters it creates, and so on. Basically, you would need to reverse engineer the code a bit to parse the status out. You could do a wget to get the js first, then the output from the call, and compare results, as sketched below.

Either way, this is not a simple process, as it turns out.
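As a starting point for that, something like this (file names are arbitrary; actually extracting the status would still mean working through main.js by hand):

#!/bin/bash
# Grab both the page and the JavaScript that builds the status,
# so the logic in main.js can be inspected offline.
wget -q -O diagnostic.html 'https://www.google.com/transparencyreport/safebrowsing/diagnostic/?#url=yahoo.com'
wget -q -O main.js 'https://www.google.com/transparencyreport/safebrowsing/diagnostic/js/main.js?hl=en'

# main.js has no line breaks; split on semicolons to make it
# somewhat readable before digging into it.
tr ';' '\n' < main.js > main-readable.js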
 
