How the re Module Supports Regular Expressions for Python Web Scraping

This article looks at how Python's re module supports the regular expressions commonly used when writing web scrapers. It is shared here as a practical reference.

Python ships with the re module, which provides regular expression support. The main functions are listed below.

# Returns a Pattern object
re.compile(pattern[, flags])
# The functions below perform matching
re.match(pattern, string[, flags])
re.search(pattern, string[, flags])
re.split(pattern, string[, maxsplit])
re.findall(pattern, string[, flags])
re.finditer(pattern, string[, flags])
re.sub(pattern, repl, string[, count])
re.subn(pattern, repl, string[, count])
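
To give a feel for the functions this article does not cover in detail, here is a minimal sketch that applies them to a small HTML fragment, as a scraper might. The fragment, the variable names, and the patterns are illustrative assumptions, not part of the original article, and regular expressions are only suitable for simple, predictable markup like this.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page (an assumption).
html = '<a href="/page/1">First</a>, <a href="/page/2">Second</a>; <a href="/page/3">Third</a>'

# Two capture groups: the href value and the link text.
link_pattern = re.compile(r'<a href="([^"]+)">([^<]+)</a>')

# findall returns a list of tuples when the pattern has more than one group.
links = link_pattern.findall(html)
print(links)

# finditer yields Match objects, which also expose positions.
starts = [m.start() for m in link_pattern.finditer(html)]
print(starts)

# split cuts the string wherever the pattern matches (here: a comma or semicolon).
parts = re.split(r'[,;]\s*', html)
print(len(parts))

# sub replaces every match; here it strips the tags and keeps the text.
text = re.sub(r'<[^>]+>', '', html)
print(text)

# subn does the same but also reports how many replacements were made.
text_n, count = re.subn(r'<[^>]+>', '', html)
print(count)
```

findall is usually the quickest way to pull every occurrence out of a page, while finditer is the better fit when the position of each match matters.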

This article walks through the two most commonly used functions as examples.

1.re.match(pattern, string[, flags])

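This function starts at the beginning of string (the text being matched) and tries to match pattern, proceeding forward through the string. If it encounters a character that cannot match, it returns None immediately; if it reaches the end of string before the pattern has been fully matched, it also returns None. Both cases mean the match failed. Otherwise the match succeeds and stops there, without examining the rest of string. The following example illustrates this.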

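# -*- coding: utf-8 -*-
__author__ = 'CQC'

# Import the re module
import re

# Compile the regular expression into a Pattern object;
# the r prefix before 'hello' marks a raw string
pattern = re.compile(r'hello')

# Match the text with re.match; it returns None when there is no match
result1 = re.match(pattern, 'hello')
result2 = re.match(pattern, 'helloo CQC!')
result3 = re.match(pattern, 'helo CQC!')
result4 = re.match(pattern, 'hello CQC!')

# If match 1 succeeded
if result1:
    # Use the Match object to get the matched text
    print(result1.group())
else:
    print('Match 1 failed!')

# If match 2 succeeded
if result2:
    # Use the Match object to get the matched text
    print(result2.group())
else:
    print('Match 2 failed!')

# If match 3 succeeded
if result3:
    # Use the Match object to get the matched text
    print(result3.group())
else:
    print('Match 3 failed!')

# If match 4 succeeded
if result4:
    # Use the Match object to get the matched text
    print(result4.group())
else:
    print('Match 4 failed!')

Output

hello
hello
Match 3 failed!
hello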
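
The signature of re.match also accepts an optional flags argument, which the example above does not use. The sketch below shows one common flag, re.IGNORECASE; the input strings and variable names are assumptions chosen for illustration.

```python
import re

# With re.IGNORECASE the pattern matches regardless of letter case.
result = re.match(r'hello', 'Hello CQC!', re.IGNORECASE)
matched_text = result.group() if result else None
print(matched_text)

# Without the flag, the capital H prevents a match at position 0.
no_flag = re.match(r'hello', 'Hello CQC!')
print(no_flag)

# Flags can also be fixed once at compile time.
ci_pattern = re.compile(r'hello', re.IGNORECASE)
compiled_text = ci_pattern.match('HELLO world').group()
print(compiled_text)
```

When a pattern is precompiled, the flags belong in re.compile; passing a non-zero flags value to re.match together with an already compiled pattern raises ValueError.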

2.re.search(pattern, string[, flags])

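The search function is very similar to match. The difference is that match() only checks whether the regular expression matches at the start of string, while search() scans the whole string for a match. match() returns a result only if the match succeeds at position 0; if the pattern would only match somewhere later in the string, match() returns None. search() returns the same kind of Match object as match(), with the same methods and attributes. Here is an example.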

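# Import the re module
import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'world')
# Use search() to find a matching substring; it returns None if there is none
# In this example match() would not succeed
match = re.search(pattern, 'hello world!')
if match:
    # Use the Match object to get the matched text
    print(match.group())
### Output ###
# world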
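
To make the difference concrete, here is a small sketch that runs both calls on the same string and inspects where the match was found. It uses the methods of the compiled Pattern object rather than the module-level functions; the variable names are assumptions for illustration.

```python
import re

text = 'hello world!'
pattern = re.compile(r'world')

# match() only tries position 0, where the text starts with 'hello'.
at_start = pattern.match(text)
print(at_start)          # None

# search() scans forward until it finds the first match.
anywhere = pattern.search(text)
print(anywhere.group())  # world
print(anywhere.span())   # (6, 11)

# A compiled pattern's match() accepts a start position, so matching
# at a known offset is possible without slicing the string.
from_offset = pattern.match(text, 6)
print(from_offset.group())  # world
```

In scraping code search() is usually the safer default, because the target text rarely sits at the very beginning of a page.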

Thanks for reading! That concludes this look at how the re module supports regular expressions for Python web scraping. If you found it helpful, feel free to share it.
