This article shows how to implement an RNN text generation model with just a few lines of code.
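Generating text is one of those projects that looks appealing to newcomers to machine learning and NLP, but it is also a daunting one. Fortunately, there are all sorts of great materials online for learning how RNNs can be used to generate text, ranging from the theoretical to the deeply technical. What all of these resources have in common is that, at some point in the process, you have to build and tune an RNN model to do the work.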
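Building such a model is a worthwhile exercise, especially for the sake of learning, but what if you are comfortable working at a much higher level of abstraction? What if you are a data scientist who needs a building block in the form of an RNN text generator to plug into a project? Or what if you are a newcomer who simply wants to experiment and improve your skills? In either case, take a look at the textgenrnn project, which lets you train a text-generating neural network of any size and complexity on any text dataset with a few lines of code. textgenrnn was developed by data scientist Max Woolf.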
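textgenrnn is built on top of Keras and TensorFlow and can generate both character-level and word-level text. Its network architecture uses attention weighting to speed up training and improve quality, and it exposes a number of tunable hyperparameters, such as the RNN size, the number of RNN layers, and whether to use bidirectional RNNs. You can read more about textgenrnn, its features, and its architecture in its GitHub repository or in the accompanying introductory blog post.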
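To make the difference between character-level and word-level text concrete, here is a minimal sketch of the two ways a tweet can be split into tokens. It is only an illustration using the standard library: the helper names `char_tokens` and `word_tokens` are made up for this example, and this is not textgenrnn's own tokenizer.

```python
import re


def char_tokens(text):
    """Character-level view: every character, including spaces, is a token."""
    return list(text)


def word_tokens(text):
    """Word-level view: runs of word characters, plus single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


# A short sample string, for illustration only.
sample = "Thank you! #Trump2016"
chars = char_tokens(sample)
words = word_tokens(sample)

print(len(chars), chars[:6])
print(len(words), words)
```

A character-level model works with a small vocabulary but has to learn spelling from scratch, while a word-level model sees far fewer tokens per tweet at the cost of a much larger vocabulary.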
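The data used here consists of Donald Trump's tweets from January 1, 2014 to June 11, 2018, which covers tweets from both before and after his inauguration as President of the United States (taken from the Trump Twitter Archive). From the tweets in that date range, only the text is kept and saved to a text file named trump-tweets.txt.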
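The preparation step described above can be sketched roughly as follows. This is not the script used for the article: it assumes the archive export is a list of records with hypothetical `created_at` and `text` fields, and the helper names `select_tweet_texts` and `save_texts` are invented for illustration.

```python
from datetime import date, datetime

# Date range used in the article (inclusive on both ends).
START = date(2014, 1, 1)
END = date(2018, 6, 11)


def select_tweet_texts(records, start=START, end=END):
    """Return the text of every record dated within [start, end].

    Each record is assumed to be a dict with a 'created_at' string in
    'YYYY-MM-DD HH:MM:SS' format and a 'text' string (hypothetical schema).
    """
    texts = []
    for record in records:
        created = datetime.strptime(record["created_at"], "%Y-%m-%d %H:%M:%S").date()
        if start <= created <= end:
            # Collapse internal line breaks so each tweet occupies one line.
            texts.append(" ".join(record["text"].split()))
    return texts


def save_texts(texts, path):
    """Write one tweet per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for text in texts:
            handle.write(text + "\n")


# Tiny made-up sample, for illustration only.
sample_records = [
    {"created_at": "2013-12-31 23:59:00", "text": "too early"},
    {"created_at": "2016-07-04 09:30:00", "text": "inside the\nrange"},
    {"created_at": "2018-06-12 00:00:01", "text": "too late"},
]
selected = select_tweet_texts(sample_records)
print(selected)
```

With real data, calling `save_texts(selected, "trump-tweets.txt")` would produce the one-tweet-per-line file that the training step reads.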
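Now let's see how simple it is to generate text with textgenrnn. The following four lines are all we need to import the library, create a text generation object, train the model on trump-tweets.txt for 10 epochs, and then generate a few sample tweets.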
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('trump-tweets.txt', num_epochs=10)
textgen.generate(5)
After about 30 minutes (training time depends on your hardware), the model produces the following output at the 10th epoch:
My @FoxNews will be self finally complaining about me that so he is a great day and companies and is starting to report the president in safety and more than any mention of the bail of the underaches to the construction and freedom and efforts the politicians and expensive meetings should have bee The world will be interviewed on @foxandfriends at 7:30pm. Enjoy! .@JebBush and Fake News Media is a major place in the White House in the service and sense where the people of the debate and his show of many people who is a great press considering the GREAT job on the way to the U.S. A the best and people in the biggest! Thank you! New Hampshire Trump Int'l Hotel Leadership Barrier Lou Clinton is a forever person politically record supporters have really beginning in the media on the heart of the bad and women who have been succeeded and before you can also work the people are there a time strong and send out the world with Join me in Maryland at 7:00 A.M. and happened to the WALL and be true the longer of the same sign into the Fake News Media will be a great honor to serve that the Republican Party will be a great legal rate the media with the Best Republican Party and the American people that will be the bill by a...
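Politics aside, and considering that we trained on only about 12,000 tweets for just 10 epochs, these generated tweets are not too bad. The temperature in textgenrnn defaults to 0.5; raising it should yield more creative tweets. Let's see what happens when we turn it up: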
textgen.generate(5, temperature=0.9)
“Via-can see this Democrats were the opening at GREAT ENSUS CALL! .@GovSeptorald Taster is got to that the subcent Vote waiting them. @Calkers Major President Obama will listen for the disaster! Grateful and South Carolina so his real ability and much better-- or big crisis on many signing!It is absolutely dumbers for well tonight. Love us in the great inherition of fast. With bill of badly to forget the greatest puppet at my wedds. No Turnberry is "bigger.” - All
That output is less convincing. What happens if we lower the temperature instead? The results show that the model becomes more stable:
textgen.generate(5, temperature=0.1)
The Fake News Media is a great people of the president was a great people of the many people who would be a great people of the president was a big crowd of the statement of the media is a great people of the people of the statement of the people of the people of the world with the statement of th Thank you @TrumpTowerNY #Trump2016 https://t.co/25551R58350Thank you for your support! #Trump2016 https://t.co/7eN53P55cThe people of the U.S. has been a great people of the presidential country is a great time and the best thing that the people of the statement of the media is the people of the state of the best thing that the people of the statement of the statement of the problem in the problem and success and t Thank you @TheBrodyFile tonight at 8:00 A.M. Enjoy!
Comparing the two examples gives a clearer picture of how the project behaves.
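The contrast between the high and low temperature outputs can be illustrated with a small sketch of temperature scaling. It shows the general technique (divide log-probabilities by the temperature, then renormalize) under the assumption that textgenrnn's sampling behaves along these lines; the function name `apply_temperature` and the probabilities are made up for illustration.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature.

    Divides the log-probabilities by the temperature and renormalizes.
    All probabilities must be strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up next-character probabilities, for illustration only.
base = [0.5, 0.3, 0.15, 0.05]
for t in (0.1, 0.5, 0.9):
    adjusted = apply_temperature(base, t)
    print(t, [round(p, 3) for p in adjusted])
```

At a temperature of 0.1 almost all of the probability mass lands on the single most likely choice, which matches the repetitive "a great people of the..." output, while at 0.9 the less likely choices keep a meaningful share, which leads to more varied but less coherent text.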