python, lxml and xpath - html table parsing
I am new to lxml, fairly new to Python, and could not find a solution to the following: I need to import a few tables with 3 columns and an undefined number of rows, starting at row 3. When the second column of any row is empty, that row is discarded and processing of the table is aborted.

The following code prints the table's data fine (but I cannot reuse the data afterwards):

    from lxml.html import parse

    def process_row(row):
        for cell in row.xpath('./td'):
            print cell.text_content()
            yield cell.text_content()

    def process_table(table):
        return [process_row(row) for row in table.xpath('./tr')]

    doc = parse(url).getroot()
    tbl = doc.xpath("/html//table[2]")[0]
    data = process_table(tbl)

This prints the first column only :(

    for i in data:
        print i.next()

The following only imports the third row, and not the later ones:

    tbl = doc.xpath("//body/table[2]//tr[position()>2]")[0]

Does anyone know a fancy solution to get all the data f...
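
One possible approach, offered as a rough sketch and not a definitive answer: a function containing `yield` returns a lazy generator, so calling `next()` once on each row yields only its first cell, and indexing an XPath result with `[0]` keeps only the first matching row. Building plain lists avoids both problems. The sample HTML, the helper name `table_rows`, and the use of `fromstring` in place of `parse(url)` are assumptions made for illustration, and the code is written for Python 3.

```python
from lxml.html import fromstring

# Hypothetical sample page; the real document would come from parse(url).getroot().
SAMPLE_HTML = """
<html><body>
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>title</td><td></td><td></td></tr>
  <tr><td>h1</td><td>h2</td><td>h3</td></tr>
  <tr><td>a</td><td>1</td><td>x</td></tr>
  <tr><td>b</td><td>2</td><td>y</td></tr>
  <tr><td>c</td><td></td><td>z</td></tr>
  <tr><td>d</td><td>4</td><td>w</td></tr>
</table>
</body></html>
"""


def table_rows(table):
    """Return rows 3 onward as lists of strings (hypothetical helper).

    Stops at the first row whose second cell is empty; that row is dropped.
    """
    rows = []
    # No [0] here: keep every row after the second, not just the first match.
    for tr in table.xpath('./tr[position() > 2]'):
        cells = [td.text_content().strip() for td in tr.xpath('./td')]
        if len(cells) < 2 or not cells[1]:
            break  # empty second column: discard this row and abort
        rows.append(cells)
    return rows


doc = fromstring(SAMPLE_HTML.strip())
tbl = doc.xpath('//body/table[2]')[0]
data = table_rows(tbl)
print(data)
```

Because `data` is an ordinary list of lists of strings, it can be iterated as often as needed and handed to code that has no lxml dependency.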