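Text generation is one of those projects that seems fun to machine learning and NLP beginners, but it is also a genuinely difficult one. Thankfully, the web has plenty of excellent resources on how RNNs can be used for text generation, ranging from theory to in-depth treatments of specific techniques. All of them share one thing: at some point in the process, you have to build an RNN model and tune its parameters to get the job done.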
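Building a text generator yourself is worthwhile, especially as a learning exercise. But what if you need to work at a higher level of abstraction? What if you are a data scientist who needs an RNN text generator as a drop-in module for a project? Or what if you are a newcomer who just wants to try it out or sharpen your skills? In either case, take a look at the textgenrnn project, which lets you train a text-generating neural network of any size and complexity on any text dataset with just a few lines of code. textgenrnn was developed by data scientist Max Woolf.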
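Since generating Trump tweets has become something of a "Hello, World!" for text generation, this article does the same. textgenrnn's default pretrained model can easily be trained further on new text. You can also use textgenrnn to train a new model from scratch: just pass new_model=True to any of its training functions.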
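For this article, Trump's tweets from January 1, 2014 through June 11, 2018 were collected from the Trump Twitter Archive, covering the periods both before and after his inauguration as US president. Only tweets within that date range were kept, and their text was saved to a text file named trump-tweets.txt.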
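The article does not show how the tweets were filtered. Below is a minimal sketch of that step, assuming the archive was exported as a JSON list of objects with "text" and "created_at" fields, and that timestamps use Twitter's "Wed Jan 01 00:00:00 +0000 2014" style. The field names, the timestamp format, the file name archive.json and the helper functions are all assumptions for illustration. Newlines inside a tweet are collapsed because textgenrnn's train_from_file treats each line of the file as a separate text by default.

```python
import json
from datetime import datetime, timezone

# Assumed timestamp format of the archive export (Twitter-style).
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"
START = datetime(2014, 1, 1, tzinfo=timezone.utc)
END = datetime(2018, 6, 12, tzinfo=timezone.utc)  # exclusive bound, so June 11 is included


def load_records(path):
    """Load the archive export (hypothetical file layout: a JSON list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tweets_in_range(records, start=START, end=END):
    """Yield the text of each tweet whose timestamp falls in [start, end)."""
    for rec in records:
        ts = datetime.strptime(rec["created_at"], CREATED_AT_FMT)
        if start <= ts < end:
            # One tweet per line: collapse newlines and runs of whitespace.
            yield " ".join(rec["text"].split())


def write_corpus(records, path="trump-tweets.txt"):
    """Write the filtered tweets to a text file, one per line."""
    lines = list(tweets_in_range(records))
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

Usage would be along the lines of write_corpus(load_records("archive.json")). Using an exclusive upper bound of June 12 keeps every tweet posted at any time on June 11.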
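The following code loads textgenrnn's default pretrained model, trains it on this file for 10 epochs, and then generates 5 samples:
from textgenrnn import textgenrnn
# Load textgenrnn with its default pretrained weights
textgen = textgenrnn()
# Continue training on the tweet corpus for 10 epochs
textgen.train_from_file('trump-tweets.txt', num_epochs=10)
# Generate and print 5 samples
textgen.generate(5)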
My @FoxNews will be self finally complaining about me that so he is a great day and companies and is starting to report the president in safety and more than any mention of the bail of the underaches to the construction and freedom and efforts the politicians and expensive meetings should have bee
The world will be interviewed on @foxandfriends at 7:30pm. Enjoy!
.@JebBush and Fake News Media is a major place in the White House in the service and sense where the people of the debate and his show of many people who is a great press considering the GREAT job on the way to the U.S. A the best and people in the biggest! Thank you!
New Hampshire Trump Int'l Hotel Leadership Barrier Lou Clinton is a forever person politically record supporters have really beginning in the media on the heart of the bad and women who have been succeeded and before you can also work the people are there a time strong and send out the world with
Join me in Maryland at 7:00 A.M. and happened to the WALL and be true the longer of the same sign into the Fake News Media will be a great honor to serve that the Republican Party will be a great legal rate the media with the Best Republican Party and the American people that will be the bill by a...
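By default textgenrnn generates text one character at a time: the RNN predicts a probability distribution over the next character, one character is sampled from it, appended to the output, and fed back in as context for the next prediction. The sketch below illustrates only that sampling loop. As a deliberately simple, hypothetical stand-in for the trained RNN it uses character-bigram counts (so it conditions on just the previous character), and the helper names and start/end markers are made up for illustration. It is not textgenrnn's implementation.

```python
import random
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # hypothetical start/end-of-text markers


def fit_bigrams(texts):
    """Count which character follows which: a toy stand-in for the RNN."""
    counts = defaultdict(Counter)
    for text in texts:
        padded = START + text + END
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def generate_text(counts, max_len=140, seed=None):
    """Sample one character at a time, feeding each choice back as context."""
    rng = random.Random(seed)
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        chars = list(options)
        # Sample the next character in proportion to how often it followed `prev`.
        nxt = rng.choices(chars, weights=[options[c] for c in chars], k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)
```

A real RNN replaces the bigram table with a network that looks at a longer window of preceding characters, which is why its samples hold together far better than a bigram model's, but the generate-sample-feed-back loop has the same shape.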
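Sampling at a relatively high temperature (0.9) gives more varied output, but also more invented words and broken grammar:
textgen.generate(5, temperature=0.9)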
“Via-can see this Democrats were the opening at GREAT ENSUS CALL! .@GovSeptorald Taster is got to that the subcent Vote waiting them. @Calkers Major President Obama will listen for the disaster! Grateful and South Carolina so his real ability and much better-- or big crisis on many signing!It is absolutely dumbers for well tonight. Love us in the great inherition of fast. With bill of badly to forget the greatest puppet at my wedds. No Turnberry is "bigger.” - All
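At a low temperature (0.1), the model sticks to its most likely predictions, and the output becomes conservative and repetitive:
textgen.generate(5, temperature=0.1)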
The Fake News Media is a great people of the president was a great people of the many people who would be a great people of the president was a big crowd of the statement of the media is a great people of the people of the statement of the people of the people of the world with the statement of th
Thank you @TrumpTowerNY #Trump2016 https://t.co/25551R58350
Thank you for your support! #Trump2016 https://t.co/7eN53P55c
The people of the U.S. has been a great people of the presidential country is a great time and the best thing that the people of the statement of the media is the people of the state of the best thing that the people of the statement of the statement of the problem in the problem and success and t
Thank you @TheBrodyFile tonight at 8:00 A.M. Enjoy!
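The difference between the 0.9 and 0.1 samples comes from how temperature reshapes the model's next-character distribution before sampling. A common way to apply it in character-level RNN sampling is to divide the log-probabilities by the temperature T and renormalize, i.e. softmax(log p / T), which is the same as raising each probability to the power 1/T. The sketch below assumes textgenrnn does something equivalent; treat that as an assumption. The example distribution and the helper name apply_temperature are hypothetical.

```python
import math


def apply_temperature(probs, temperature):
    """Return softmax(log(p) / T): each p_i raised to 1/T, then renormalized.

    Assumes every probability is > 0 and temperature > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(p) / temperature for p in probs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A hypothetical next-character distribution over three candidates
probs = [0.5, 0.3, 0.2]
for t in (1.0, 0.9, 0.1):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

At T = 1.0 the distribution is unchanged. At T = 0.1 roughly 99% of the probability lands on the most likely candidate, so sampling becomes nearly greedy; this is why the 0.1 samples keep looping back to phrases like "the people of the". At T = 0.9 the distribution is only slightly sharper than the original, so less likely characters are still picked fairly often, which accounts for the invented words in the 0.9 samples.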