
【parser】Using the stanford-parser demo

Test site:

http://nlp.stanford.edu:8080/parser/index.jsp

To start, here is the code for the stanford-parser demo:

import java.util.Collection;
import java.util.List;
import java.io.StringReader;

import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {

  /**
   * The main method demonstrates the easiest way to load a parser.
   * Simply call loadModel and specify the path, which can either be a
   * file or any resource in the classpath.  For example, this
   * demonstrates loading from the models jar file, which you need to
   * include in the classpath for ParserDemo to work.
   */
  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  /**
   * demoDP demonstrates turning a file into tokens and then parse
   * trees.  Note that the trees are printed by calling pennPrint on
   * the Tree object.  It is also possible to pass a PrintWriter to
   * pennPrint if you want to capture the output.
   */
  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading and sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor.
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();

      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
      System.out.println(tdl);
      System.out.println();
    }
  }

  /**
   * demoAPI demonstrates other ways of calling the parser with
   * already tokenized text, or in some cases, raw text that needs to
   * be tokenized as a single sentence.  Output is handled with a
   * TreePrint object.  Note that the options used when creating the
   * TreePrint can determine what results to print out.  Once again,
   * one can capture the output by passing a PrintWriter to
   * TreePrint.printTree.
   */
  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized words
    String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // This option shows loading and using an explicit tokenizer
    String sent2 = "This is another sentence.";
    TokenizerFactory<CoreLabel> tokenizerFactory =
      PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    List<CoreLabel> rawWords2 =
      tokenizerFactory.getTokenizer(new StringReader(sent2)).tokenize();
    parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only

}
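The Javadoc above notes that the models jar must be on the classpath for `loadModel` to find `englishPCFG.ser.gz`. A minimal compile-and-run sketch (the jar filenames are assumptions and vary by release; adjust them to the files in your download):

```shell
# Compile the demo against the parser jar (filenames assumed; check your download)
javac -cp stanford-parser.jar ParserDemo.java

# Run with both the code jar and the models jar on the classpath,
# so englishPCFG.ser.gz can be resolved as a classpath resource
java -cp .:stanford-parser.jar:stanford-parser-3.x-models.jar ParserDemo

# Passing a filename exercises demoDP instead of demoAPI
java -cp .:stanford-parser.jar:stanford-parser-3.x-models.jar ParserDemo input.txt
```

On Windows, replace the `:` classpath separator with `;`.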

Result (from the test site above, which uses the Chinese model):

Your query
猴子喜欢吃香蕉。
Segmentation
猴子
喜欢
吃
香蕉
。
Tagging
猴子/NR
喜欢/VV
吃/VV
香蕉/NN
。/PU
Parse

(ROOT
  (IP
    (NP (NR 猴子))
    (VP (VV 喜欢)
      (IP
        (VP (VV 吃)
          (NP (NN 香蕉)))))
    (PU 。)))

Typed dependencies

nsubj(喜欢-2, 猴子-1)
root(ROOT-0, 喜欢-2)
ccomp(喜欢-2, 吃-3)
dobj(吃-3, 香蕉-4)

Typed dependencies, collapsed

nsubj(喜欢-2, 猴子-1)
root(ROOT-0, 喜欢-2)
ccomp(喜欢-2, 吃-3)
dobj(吃-3, 香蕉-4)
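The output above comes from the online demo's Chinese model, while the code listing loads `englishPCFG.ser.gz`. A similar parse can be attempted locally through the parser's command-line entry point by pointing it at the Chinese grammar instead; this is a sketch, and the jar names and exact flags are assumptions based on the standard distribution:

```shell
# One pre-segmented sentence per line, tokens separated by spaces
echo '猴子 喜欢 吃 香蕉 。' > chinese-sent.txt

# Parse with the Chinese PCFG grammar and print both the tree and
# the typed dependencies (jar names assumed; adjust to your release)
java -mx1g -cp 'stanford-parser.jar:stanford-parser-3.x-models.jar' \
  edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -tokenized -encoding utf-8 \
  -outputFormat 'penn,typedDependencies' \
  edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz chinese-sent.txt
```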


Reposted from: https://www.cnblogs.com/549294286/archive/2013/05/08/3067534.html
