This book analyses the shortcomings of existing methods for extracting information from web pages. Our analysis shows that existing methods make inefficient use of the high-level information in these pages, which ultimately degrades their performance. We develop a series of optimized extraction techniques that improve on the state of the art. Experimental tests show that our techniques can outperform existing techniques on a wide range of data records.
Dr. David Hong Jer Lang received his BSc in Computer Science from the University of Nottingham and his PhD from Monash University. He has authored 18 papers and 3 books, served as a reviewer for 14 conferences and journals, and acted as examiner for 2 PhD theses. He won student travel grants during his PhD studies and has obtained four sponsorships for overseas travel.