Python - how to determine if a webpage has been modified
I have snapshots of several web pages taken at two different times. What is a reliable way to determine which pages have been modified?
I can't rely on anything like an RSS feed, and I need to ignore minor noise such as date text.
Ideally I'm looking for a Python solution, but an intuitive algorithm would also be great.
Thank you!
OK, first you have to decide what counts as noise and what doesn't. Then you can use an HTML parser to remove the noise, pretty-print the result, and compare the two versions as strings.
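A minimal sketch of the parse, clean, and compare idea, using only the standard library's `html.parser`. The definition of noise here (script and style content, plus ISO-style dates such as 2020-01-01) is an assumption for illustration, and the names `TextExtractor`, `normalize`, and `is_modified` are hypothetical; adjust them to whatever counts as noise on your pages.

```python
import re
from html.parser import HTMLParser

# Assumption: dates look like YYYY-MM-DD; replace with patterns that match your pages.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalize(html):
    """Return the page text with assumed noise removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = DATE_RE.sub("", " ".join(parser.parts))
    return " ".join(text.split())


def is_modified(old_html, new_html):
    """Plain string comparison of the two cleaned snapshots."""
    return normalize(old_html) != normalize(new_html)
```

With this, two snapshots that differ only in a date string compare as unchanged, while any difference in the remaining text is reported as a modification.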
If you are looking for an automatic solution, you can compute the differences between the two versions of each page, calculate a similarity score from them, and compare that score with a threshold.
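One way to sketch the similarity-and-threshold idea is with `difflib.SequenceMatcher` from the standard library. This is only an illustration: the function names and the 0.95 threshold are assumptions, and the inputs are expected to be page text that has already been cleaned of noise.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Word-level similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, old_text.split(), new_text.split()).ratio()


def has_changed(old_text, new_text, threshold=0.95):
    """Treat the page as modified when similarity drops below the threshold.

    The threshold is an assumed starting point; tune it on pages you know
    have or have not changed.
    """
    return similarity(old_text, new_text) < threshold
```

Comparing word lists rather than raw characters keeps the score from being dominated by whitespace differences. A lower threshold tolerates more noise but may miss small edits.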