这篇文章将为大家详细讲解有关python中去除标点符号的方法,小编觉得挺实用的,因此分享给大家做个参考,希望大家阅读完这篇文章后可以有所收获。
Python去掉标点符号的方法如下:
方法一:
str.isalnum:
S.isalnum() -> bool
返回值:如果string至少有一个字符并且所有字符都是字母或数字则返回True,否则返回False。
实例:
>>> string = "Special $#! characters spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'
只能识别字母和数字,杀伤力大,会把中文、空格之类的也干掉
方法二:
string.punctuation
import re, string
s ="string. With. Punctuation?" # Sample string
# 写法一:
out = s.translate(string.maketrans("",""), string.punctuation)
# 写法二:
out = s.translate(None, string.punctuation)
# 写法三:
exclude = set(string.punctuation)
out = ''.join(ch for ch in s if ch not in exclude)
# 写法四:
>>> for c in string.punctuation:
s = s.replace(c,"")
>>> s
'string With Punctuation'
# 写法五:
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
## re.escape:对字符串中所有可能被解释为正则运算符的字符进行转义
# 写法六:
# string.punctuation 只包括 ascii 格式; 想要一个包含更广(但是更慢)的方法是使用: unicodedata module :
from unicodedata import category
s = u'String — with - «Punctuation »...'
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
print 'Stripped', out
# 输出:u'Stripped String \u2014 with \xabPunctuation \xbb'
out = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'Stripped', out
# 输出:u'Stripped String with Punctuation '
# For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.
# To remove (some?) punctuation then, use:
import string
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)
# Your method doesn't work in Python 3, as the translate method doesn't accept the second argument any more.
import unicodedata
import sys
tbl = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P'))
def remove_punctuation(text):
return text.translate(tbl)
方法三:
re
例:
import re
s ="string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
测试:
import re, string, timeit
s ="string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))
def test_set(s):
return ''.join(ch for ch in s if ch not in exclude)
def test_re(s):
return regex.sub('', s)
def test_trans(s):
return s.translate(table, string.punctuation)
def test_repl(s):
for c in string.punctuation:
s=s.replace(c,"")
return s
print"sets :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print"regex :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print"translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print"replace :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)
out_put:
# sets : 19.8566138744
# regex : 6.86155414581
# translate : 2.12455511093
# replace : 28.4436721802
关于python中去除标点符号的方法就分享到这里了,希望以上内容可以对大家有一定的帮助,可以学到更多知识。如果觉得文章不错,可以把它分享出去让更多的人看到。
亿速云「云服务器」,即开即用、新一代英特尔至强铂金CPU、三副本存储NVMe SSD云盘,价格低至29元/月。点击查看>>
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。