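This article explains how to use a Python splitter: a script that splits a plain-text (txt) novel into multiple HTML files, one per section, and generates an index page linking to them. The full script follows.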
# Split a txt novel into multiple HTML files
# @author : GreatGhoul
# @email  : greatghoul@gmail.com
# @blog   : http://greatghoul.javaeye.com
import os
import re

# regex for the section title
# sec_re = re.compile(r'第.+卷\s+.+\s+第.+章\s+.+')

# txt book's path.
source_path = 'f:\\佣兵天下.txt'
# Encoding used for the source file and the generated pages. GBK is an
# assumption taken from the charset declared in the HTML header below.
encoding = 'gbk'

path_pieces = os.path.split(source_path)
novel_title = os.path.splitext(path_pieces[1])[0]
target_path = os.path.join(path_pieces[0], '%s_html' % novel_title)
section_re = re.compile(r'^\s*第.+卷\s+.*$')
# page header; the link text 去页尾 means "go to bottom"
section_head = '''<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=GBK"/>
<title>%s</title>
</head>
<body style="font-family:楷体,宋体;font-size:16px; margin:0; padding: 20px; background:#FAFAD2;color:#2B4B86;text-align:center;">
<h3>%s</h3><a href="#bottom">去页尾</a><hr/>'''


# escape xml/html
def escape_xml(code):
    text = code.rstrip('\r\n')
    # '&' must be escaped first, otherwise the entities produced for
    # '<' and '>' would be escaped a second time.
    text = re.sub(r'&', '&amp;', text)
    text = re.sub(r'<', '&lt;', text)
    text = re.sub(r'>', '&gt;', text)
    # A tab becomes four non-breaking spaces (the width is an assumption),
    # any other whitespace character becomes one.
    text = re.sub(r'\t', '&nbsp;' * 4, text)
    text = re.sub(r'\s', '&nbsp;', text)
    return text


# entry of the script
def main():
    # create the output folder
    if not os.path.exists(target_path):
        os.mkdir(target_path)

    # open the source file
    source = open(source_path, 'r', encoding=encoding)

    sec_count = 0
    sec_cache = []
    idx_cache = []

    # section 0 is the preface (前言): everything before the first title
    output = open(os.path.join(target_path, '%d.html' % sec_count), 'w', encoding=encoding)
    preface_title = '%s 前言' % novel_title
    output.writelines([section_head % (preface_title, preface_title)])
    idx_cache.append('<li><a href="%d.html">%s</a></li>' % (sec_count, novel_title))

    for line in source:
        if line.strip() == '':
            # skip blank lines
            pass
        elif re.match(section_re, line):
            # the line is a section title
            line = re.sub(r'\s+', ' ', line).strip()
            print('converting %s...' % line)

            # write the footer of the previous section:
            # 上一篇 = previous, 目录 = index, 下一篇 = next, 回页首 = back to top
            sec_cache.append('<hr/><p>')
            if sec_count == 0:
                sec_cache.append('<a href="index.html">目录</a> | ')
                sec_cache.append('<a href="%d.html">下一篇</a> | ' % (sec_count + 1))
            else:
                sec_cache.append('<a href="%d.html">上一篇</a> | ' % (sec_count - 1))
                sec_cache.append('<a href="index.html">目录</a> | ')
                sec_cache.append('<a href="%d.html">下一篇</a> | ' % (sec_count + 1))
            sec_cache.append('<a name="bottom" href="#">回页首</a></p>')
            sec_cache.append('</body></html>')
            output.writelines(sec_cache)
            output.flush()
            output.close()
            sec_cache = []
            sec_count += 1

            # create a new section
            output = open(os.path.join(target_path, '%d.html' % sec_count), 'w', encoding=encoding)
            output.writelines([section_head % (line, line)])
            idx_cache.append('<li><a href="%d.html">%s</a></li>' % (sec_count, line))
        else:
            sec_cache.append('<p style="text-align:left;">%s</p>' % escape_xml(line))
    source.close()

    # write the remaining lines and the footer of the last section
    # (it has no next page, and no previous page if it is the only one)
    sec_cache.append('<hr/><p>')
    if sec_count > 0:
        sec_cache.append('<a href="%d.html">上一篇</a> | ' % (sec_count - 1))
    sec_cache.append('<a href="index.html">目录</a> | ')
    sec_cache.append('<a name="bottom" href="#">回页首</a></p></body></html>')
    output.writelines(sec_cache)
    output.flush()
    output.close()
    sec_cache = []

    # write the menu (index page)
    output = open(os.path.join(target_path, 'index.html'), 'w', encoding=encoding)
    menu_head = '%s 目录' % novel_title
    output.writelines([section_head % (menu_head, menu_head), '<ul style="text-align:left">'])
    output.writelines(idx_cache)
    output.writelines(['</ul></body></html>'])
    output.flush()
    output.close()

    print('completed. %d chapter(s) in total.' % sec_count)


if __name__ == '__main__':
    main()
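The script decides where one output file ends and the next begins by matching each line against `section_re`. That step can be tried on its own, without touching the file system. The following is a minimal sketch under stated assumptions: the pattern is the one from the script, but the helper `split_sections`, its `preface_title` parameter, and the sample lines are hypothetical and exist only for illustration.

```python
import re

# Same pattern as the script: optional leading whitespace, then "第…卷",
# then whitespace and the rest of the title.
section_re = re.compile(r'^\s*第.+卷\s+.*$')


def split_sections(lines, preface_title='前言'):
    """Group lines into (title, paragraphs) pairs. Hypothetical helper."""
    # Everything before the first title belongs to the preface.
    sections = [(preface_title, [])]
    for line in lines:
        if line.strip() == '':
            continue  # blank lines are skipped, as in the script
        if section_re.match(line):
            # Collapse runs of whitespace in the title and start a new section.
            title = re.sub(r'\s+', ' ', line).strip()
            sections.append((title, []))
        else:
            sections[-1][1].append(line.strip())
    return sections


# Made-up sample input: one preface line and two titled sections.
sample = [
    '序言内容\n',
    '\n',
    '第一卷  第一章 开始\n',
    '正文第一段\n',
    '第二卷 第一章 继续\n',
    '正文第二段\n',
]

for title, paragraphs in split_sections(sample):
    print(title, len(paragraphs))
```

Because `re.match` anchors at the start of the line, a body line that merely contains 第 somewhere in the middle is not treated as a title. If your book marks sections differently (for example with 章 only), the pattern would need to be adjusted accordingly.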
That covers how to use the Python splitter. Thanks for reading; run the script against your own text files to verify how it behaves in practice.