To extract structured data from a web page with customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used to guide the user through the labeling process, which minimizes user effort, and to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern must be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the document object model (DOM) tree into a simpler form while preserving useful properties such as the order of the attribute sequence. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show that the approach performs well in terms of precision, recall, and structural correctness.
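Reducing the execution time of distributed programs is an important problem that grid scheduling systems must solve. Because distributed programs are commonly modeled as directed acyclic graphs (DAGs), the problem is also known as the heterogeneous DAG scheduling problem. The proposed Permutation Scheduling Ant Colony System (PSACS) represents a DAG schedule as a task permutation list and explores the solution space with standard ant colony search techniques. Experiments show that the algorithm clearly outperforms genetic algorithms and particle swarm optimization: it finds the optimal solution for most (65%) homogeneous DAG scheduling problems in a single run and obtains very good schedules for heterogeneous DAG scheduling problems.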
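
To make the permutation representation concrete, the following is a minimal sketch of how a task permutation list could be decoded into a schedule and scored by its makespan. It is not the PSACS algorithm itself: the ant colony search is omitted, and the decoding rule (assign each task, in permutation order, to the processor that finishes it earliest), the neglect of communication costs, and all names (`decode_permutation`, `preds`, `cost`) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def decode_permutation(
    perm: List[int],
    preds: Dict[int, List[int]],
    cost: Dict[int, List[float]],
) -> Tuple[float, Dict[int, int]]:
    """Decode a task permutation into a schedule (illustrative assumption).

    perm  -- tasks in scheduling order; must respect precedence constraints
    preds -- preds[t] lists the tasks that must finish before t starts
    cost  -- cost[t][q] is the execution time of task t on processor q
    Returns (makespan, assignment), where assignment maps task -> processor.
    Communication costs are ignored in this sketch.
    """
    n_procs = len(next(iter(cost.values())))
    proc_free = [0.0] * n_procs      # time at which each processor is idle
    finish: Dict[int, float] = {}    # finish time of each scheduled task
    assignment: Dict[int, int] = {}

    for task in perm:
        parents = preds.get(task, [])
        if any(p not in finish for p in parents):
            raise ValueError(f"task {task} precedes one of its predecessors")
        # A task becomes ready once all of its predecessors have finished.
        ready = max((finish[p] for p in parents), default=0.0)
        # Pick the processor that gives the earliest finish time.
        best = min(
            range(n_procs),
            key=lambda q: max(ready, proc_free[q]) + cost[task][q],
        )
        start = max(ready, proc_free[best])
        finish[task] = start + cost[task][best]
        proc_free[best] = finish[task]
        assignment[task] = best

    return max(finish.values()), assignment


# Hypothetical diamond-shaped DAG: 0 -> {1, 2} -> 3, on two processors.
example_preds = {1: [0], 2: [0], 3: [1, 2]}
example_cost = {0: [2, 3], 1: [3, 2], 2: [4, 4], 3: [1, 2]}
makespan, assignment = decode_permutation([0, 1, 2, 3], example_preds, example_cost)
# makespan is 7; assignment is {0: 0, 1: 1, 2: 0, 3: 0}
```

Under this encoding, a search procedure only has to propose precedence-respecting permutations; the decoder turns each one into a feasible schedule whose makespan serves as the objective value.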