NekoHTML usage

gcgmh

Blog category:

Parser_html

XHTML HTML XML

// Using NekoHTML together with XPath
import java.io.InputStream;
import java.net.URL;

import javax.xml.transform.TransformerException;

import org.apache.xpath.XPathAPI;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NekoHtmlXPathExample {
    public static void main(String[] args) throws Exception {
        DOMParser parser = new DOMParser();
        // Set the default encoding of the page
        parser.setProperty("http://cyberneko.org/html/properties/default-encoding", "gb2312");
        /* The Xerces HTML DOM implementation does not support namespaces
           and cannot represent XHTML documents with namespace information.
           Therefore, in order to use the default HTML DOM implementation with NekoHTML's
           DOMParser to parse XHTML documents, you must turn off namespace processing. */
        parser.setFeature("http://xml.org/sax/features/namespaces", false);

        String strURL = "http://product.dangdang.com/product.aspx?product_id=9317290";
        // Hand the parser the raw byte stream so the default-encoding property applies.
        // Wrapping it in an InputStreamReader would decode the bytes with the platform
        // charset before NekoHTML ever sees them.
        try (InputStream in = new URL(strURL).openStream()) {
            parser.parse(new InputSource(in));
        }

        Document doc = parser.getDocument();
        // NekoHTML reports element names in upper case by default,
        // so the tags in the XPath must be upper case as well
        String productsXpath = "/HTML/BODY/DIV[2]/DIV[4]/DIV[2]/DIV/DIV[3]/UL[@class]/LI[9]";
        try {
            NodeList products = XPathAPI.selectNodeList(doc, productsXpath);
            System.out.println("found: " + products.getLength());
            for (int i = 0; i < products.getLength(); i++) {
                Node node = products.item(i);
                System.out.println(i + ":\n" + node.getTextContent());
            }
        } catch (TransformerException e) {
            e.printStackTrace();
        }
    }
}
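
The note that tags should be in upper case is easy to check without a network connection. The sketch below is only an illustration under stated assumptions: it uses the JDK's own javax.xml.xpath and DocumentBuilder instead of Xalan's XPathAPI and NekoHTML's DOMParser, and the SAMPLE string is a made-up, well-formed stand-in for the upper-case DOM that NekoHTML would produce. The class and method names (UpperCaseXPathDemo, parse, select) are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UpperCaseXPathDemo {
    // Hypothetical sample: element names are upper case, as in a NekoHTML DOM.
    static final String SAMPLE =
        "<HTML><BODY><DIV>header</DIV><DIV><UL class=\"list\">"
        + "<LI>first</LI><LI>second</LI></UL></DIV></BODY></HTML>";

    /** Parses a well-formed string into a DOM without namespace processing. */
    static Document parse(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    /** Evaluates an XPath expression against the document and returns the matches. */
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Same shape of path as in the post: positional steps plus an attribute test.
        NodeList upper = select(doc, "/HTML/BODY/DIV[2]/UL[@class]/LI[2]");
        // XPath name tests are case-sensitive, so the lower-case spelling matches nothing.
        NodeList lower = select(doc, "/html/body/div[2]/ul[@class]/li[2]");
        System.out.println("upper-case path found: " + upper.getLength());
        System.out.println("lower-case path found: " + lower.getLength());
        for (int i = 0; i < upper.getLength(); i++) {
            System.out.println(i + ": " + upper.item(i).getTextContent());
        }
    }
}
```

Run against this sample, the upper-case path selects the single LI containing "second" and the lower-case path selects nothing. The same applies to a hard-coded path such as the productsXpath above: if the page layout changes, the path simply returns zero nodes, so it is worth checking getLength() before using the result.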

2009-09-21 15:02