multithreading - How to check if a task is already in a Python Queue?
I am writing a simple crawler in Python using the threading and Queue modules. I fetch a page, check its links and put them into a queue; when a thread finishes processing a page, it grabs the next one from the queue. I use a list of the pages I've already seen to filter out links that were already added to the queue, but when several threads find the same link on different pages, duplicate links still get put into the queue. Is there a way to find out whether a URL is already in the queue, so I can avoid putting it in again?
If you don't care about the order in which the items are processed, I'd try a subclass of Queue that uses a set internally:
    import queue

    class SetQueue(queue.Queue):
        # Keep pending items in a set instead of a deque, so duplicates collapse
        def _init(self, maxsize):
            self.maxsize = maxsize
            self.queue = set()

        def _put(self, item):
            self.queue.add(item)

        def _get(self):
            # set.pop() removes an arbitrary element, so order is not preserved
            return self.queue.pop()

As Paul McGuire pointed out, this still allows a duplicate item to be added after it has been removed from the "to-be-processed" set but before it has been added to the "processed" set. To resolve this you could store both sets in the Queue instance, but since you are using the larger set to check whether an item has been processed anyway, you can just as well go back to a regular queue, which will order the requests properly:
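    import queue

    class SetQueue(queue.Queue):
        def _init(self, maxsize):
            queue.Queue._init(self, maxsize)
            self.all_items = set()  # every item ever accepted

        def _put(self, item):
            # Only enqueue items that have never been seen before
            if item not in self.all_items:
                queue.Queue._put(self, item)
                self.all_items.add(item)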
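The advantage of this, as opposed to keeping a separate set, is that Queue's methods are thread-safe: put() calls _put() while holding the queue's internal lock, so the membership check and the insertion happen atomically and you don't need extra locking around the other set.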
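One caveat with the second version, worth checking if your crawler waits for completion with Queue.join(): put() increments the queue's unfinished-task counter even when _put() silently drops a duplicate, and no worker will ever get() that dropped item and call task_done() for it, so join() can block forever. Below is a minimal sketch that avoids this, assuming Python 3's queue module and a hypothetical in-memory link graph (LINKS) standing in for real page fetching. It overrides put() instead, so duplicates are rejected before they reach the counter. UniqueQueue and crawl are illustrative names, not part of any library, and the seen-set is guarded by its own lock rather than the queue's internal one.

```python
import queue
import threading


class UniqueQueue(queue.Queue):
    """Hypothetical FIFO queue that ignores items it has already accepted once.

    Duplicates are filtered in put(), before Queue.put() bumps the
    unfinished-task counter, so join() still returns once all work is done.
    """

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        self._seen = set()
        self._seen_lock = threading.Lock()

    def put(self, item, block=True, timeout=None):
        # Check-and-record under a lock so two threads that find the
        # same link cannot both enqueue it.
        with self._seen_lock:
            if item in self._seen:
                return False
            self._seen.add(item)
        try:
            super().put(item, block, timeout)
        except queue.Full:
            # Bounded queue was full: forget the item so it can be offered again
            with self._seen_lock:
                self._seen.discard(item)
            raise
        return True


def crawl(start, links, num_workers=4):
    """Visit every URL reachable from `start` in the hypothetical `links` graph."""
    q = UniqueQueue()
    processed = []
    processed_lock = threading.Lock()

    def worker():
        while True:
            url = q.get()
            try:
                with processed_lock:
                    processed.append(url)
                # Stand-in for fetching the page and extracting its links
                for link in links.get(url, []):
                    q.put(link)
            finally:
                q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    q.put(start)
    q.join()  # returns once every accepted URL has been processed
    return processed


# Hypothetical link graph: page -> links found on that page
LINKS = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["d", "a"],
    "d": ["a"],
}

if __name__ == "__main__":
    visited = crawl("a", LINKS)
    print(sorted(visited))  # ['a', 'b', 'c', 'd']
```

Returning a boolean from put() deviates from Queue.put(), which returns None; it just lets a caller count or log skipped duplicates. Because each link is put() before its parent page's task_done(), the counter cannot reach zero while work is still pending.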