Usage Example of wordninja, a Python Module for Segmenting English Text Without Spaces

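In NLP, data cleaning and word segmentation are often the first steps of a project. In most work, only Chinese corpora need segmentation, and many segmentation tools already exist, so they are not covered here. English corpora contain spaces, so they do not need the same treatment as Chinese. But what should you do when the spaces in English data are missing?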

This post introduces a tool built for exactly this case: wordninja. Its repository URL appears in the docstring of the example code.

The following simple example shows what it does:

import wordninja  # third-party package: pip install wordninja


def wordinjaFunc():
    '''
    Split unspaced English strings into word lists with wordninja.
    https://github.com/yishuihanhan/wordninja
    '''
    # wordninja.split returns a list of words for each input string.
    print(wordninja.split('derekanderson'))
    print(wordninja.split('imateapot'))
    print(wordninja.split('wethepeopleoftheunitedstatesinordertoformamoreperfectunionestablishjusticeinsuredomestictranquilityprovideforthecommondefencepromotethegeneralwelfareandsecuretheblessingsoflibertytoourselvesandourposteritydoordainandestablishthisconstitutionfortheunitedstatesofamerica'))
    print(wordninja.split('littlelittlestar'))


wordinjaFunc()
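To give a feel for how this kind of splitting can work, here is a minimal sketch of a frequency-based dynamic programming splitter. It illustrates the general idea only and is not wordninja's actual implementation: the toy vocabulary `WORDS`, the Zipf-style cost formula, the `UNKNOWN_COST` penalty, and the function name `split_words` are all assumptions made for this example.

```python
import math

# Hypothetical toy vocabulary, ordered from most to least frequent.
# A real tool would load a large frequency-ranked word list instead.
WORDS = ["the", "a", "im", "little", "star", "tea", "pot", "teapot"]

# Assumed Zipf-style cost: the word at 0-based rank r costs
# log((r + 1) * log(N)), so more frequent words are cheaper.
WORD_COST = {
    word: math.log((rank + 1) * math.log(len(WORDS)))
    for rank, word in enumerate(WORDS)
}
MAX_WORD_LEN = max(len(word) for word in WORDS)
UNKNOWN_COST = 1e9  # heavy penalty for substrings outside the vocabulary


def split_words(text):
    """Return the cheapest split of `text` into vocabulary words."""
    # best[i] holds (total cost, start index of the last word) for text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - MAX_WORD_LEN), i):
            cost = WORD_COST.get(text[j:i], UNKNOWN_COST)
            candidates.append((best[j][0] + cost, j))
        best.append(min(candidates))

    # Walk backwards through the stored start indices to recover the words.
    words = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))


print(split_words("imateapot"))         # ['im', 'a', 'teapot']
print(split_words("littlelittlestar"))  # ['little', 'little', 'star']
```

Because "teapot" as one word is cheaper than "tea" plus "pot", the sketch prefers the single word. With a vocabulary this small, any input containing other words falls back to the unknown penalty, which is why a real splitter needs a large word list.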
