python - How to determine if a webpage has been modified


I have snapshots of several webpages taken at two different times. What is a reliable method to determine which webpages have been modified?

I can't rely on something like an RSS feed, and I need to ignore minor noise such as date text.

Ideally I'm looking for a Python solution, but an intuitive algorithm would also be great.

Thank you!

Well, first you have to decide what is noise and what isn't. Then you can use an HTML parser to remove the noise, pretty-print the result, and compare it as a string.
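As a rough illustration of the parse, strip, and compare idea, here is a minimal sketch using only the standard library's `html.parser`. The names `TextExtractor`, `normalize`, `page_changed`, and the `DATE_RE` pattern are made up for this example. It assumes the only noise is script/style content, whitespace, and dates in a couple of common numeric formats, and it collapses whitespace instead of pretty-printing. Adjust it to whatever counts as noise on your pages.

```python
import re
from html.parser import HTMLParser

# Assumption: dates such as 2024-01-31 or 31/01/2024 are the noise to ignore.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize(html):
    """Reduce a page to its visible text with dates and extra whitespace removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = DATE_RE.sub("", " ".join(parser.chunks))
    return " ".join(text.split())  # collapse runs of whitespace


def page_changed(old_html, new_html):
    """Compare the two snapshots as normalized strings."""
    return normalize(old_html) != normalize(new_html)
```

With this, two snapshots that differ only in a date stamp, spacing, or an inline script compare as unchanged, while any change to the visible text is reported as a modification.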

If you are looking for an automatic solution, then you can compute the differences between the two versions of a page, calculate a similarity score from them, and compare it against a threshold.
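One way to get such a similarity score in Python is `difflib.SequenceMatcher` from the standard library. The sketch below is only an example: the function names `similarity` and `is_modified` and the default threshold of 0.95 are assumptions for illustration, and the right threshold has to be tuned on your own pages.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    # autojunk=False turns off difflib's heuristic that treats very frequent
    # elements as junk in long sequences, which can distort the ratio.
    return SequenceMatcher(None, old_text, new_text, autojunk=False).ratio()


def is_modified(old_text, new_text, threshold=0.95):
    """Treat a page as modified when its similarity falls below the threshold.

    The 0.95 default is an arbitrary starting point, not a recommended value.
    """
    return similarity(old_text, new_text) < threshold
```

`SequenceMatcher` can be slow on very large inputs, so for big pages it may help to compare lists of lines or extracted text rather than raw HTML strings.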

