python - how to determine if a webpage has been modified


I have snapshots of several webpages taken at two different times. What is a reliable way to determine which pages have been modified?

I can't rely on anything like an RSS feed, and I need to ignore minor noise such as date text.

Ideally I'm looking for a Python solution, but an intuitive algorithm would also be great.

Thank you!

OK, first you have to decide what counts as noise and what doesn't. Then use an HTML parser to remove the noise, pretty-print the result, and compare the two versions as strings.
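A minimal sketch of that approach, using only the standard library's `html.parser` instead of a third-party parser. Two assumptions to flag: the "noise" here is hypothetically just dates in `YYYY-MM-DD` or `D/M/YYYY` form (`DATE_PATTERN`), and instead of pretty-printing the markup it extracts the visible text and collapses whitespace before comparing. The names `TextExtractor`, `normalize` and `page_changed` are illustrative, not from any library.

```python
import re
from html.parser import HTMLParser

# Hypothetical noise pattern: dates such as 2024-01-31 or 31/01/2024.
# Replace this with whatever counts as noise on your pages.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            text = data.strip()
            if text:
                self.parts.append(text)


def normalize(html):
    """Return the page text with noise removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = DATE_PATTERN.sub("", " ".join(parser.parts))
    # Collapse runs of whitespace so layout-only changes are ignored.
    return " ".join(text.split())


def page_changed(old_html, new_html):
    """True if the two snapshots differ once noise is stripped."""
    return normalize(old_html) != normalize(new_html)
```

For example, `page_changed("<p>Updated 2024-01-31</p><p>Hello world</p>", "<p>Updated 2024-02-15</p><p>Hello   world</p>")` is `False`, because only the date and the spacing differ.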

If you are looking for a fully automatic solution, compute a similarity measure between the two versions of each page and compare it with a threshold.
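One way to get such a similarity measure is `difflib.SequenceMatcher.ratio()` from the standard library, which returns a value between 0 and 1. The sketch below compares word sequences rather than characters, so a changed date only affects a couple of tokens. `SIMILARITY_THRESHOLD` (0.95) is a hypothetical value; you would need to tune it against real snapshots of your pages.

```python
import difflib

# Hypothetical threshold: pages whose similarity ratio falls below this
# value are treated as modified. Tune it on real snapshots.
SIMILARITY_THRESHOLD = 0.95


def similarity(old_text, new_text):
    """Return a similarity ratio in [0, 1] between two versions of a page."""
    # Compare word lists instead of characters; autojunk=False stops difflib
    # from discarding frequent words as "junk" on long pages.
    matcher = difflib.SequenceMatcher(
        None, old_text.split(), new_text.split(), autojunk=False
    )
    return matcher.ratio()


def is_modified(old_text, new_text, threshold=SIMILARITY_THRESHOLD):
    """Treat the page as modified if it is less similar than the threshold."""
    return similarity(old_text, new_text) < threshold
```

The inputs can be raw page text or text that has already had the noise stripped out; stripping noise first makes the threshold less sensitive to dates and counters. With a 100-word page where one word changes, the ratio is 0.99, so it is not flagged.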

