Open-Ended Text Generation

Computational Geomechanics (计算岩土力学)

July 27, 2021, 13:05

1 Introduction

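Put simply, open-ended text generation means continuing a sentence. An earlier post, "Generating New Documents at Random with a Markov Chain", used a Markov chain to produce new text, which can be regarded as one way of continuing a sentence. In open-ended text generation, a sentence is given and the goal is to produce coherent text that follows on from it. The Transformers pipeline for this task is named "text-generation". It is built on causal language modeling; the default model is GPT-2, and it uses Top-K sampling.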

from transformers import pipeline
text_generator = pipeline("text-generation")

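The pipeline object calls the generate() method to produce text. The defaults can be overridden with the max_length and do_sample arguments. In the tests below, distilgpt2 and gpt2-large (3.25 GB) were also tried in addition to GPT-2.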
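A minimal sketch of overriding those defaults is given here. The helper names generate_continuation, strip_prompt, and demo are introduced only for illustration, and the values max_length=100 and model_name="distilgpt2" are assumptions rather than the settings used in the original tests. The first call downloads the model weights, so the transformers import is kept inside the function.

```python
def generate_continuation(prompt, model_name="gpt2", max_length=100, do_sample=True):
    """Continue `prompt` with a Hugging Face text-generation pipeline."""
    # Imported lazily so strip_prompt() can be used without transformers installed.
    from transformers import pipeline

    text_generator = pipeline("text-generation", model=model_name)
    # max_length counts tokens in the prompt plus the continuation;
    # do_sample=True switches from greedy decoding to sampling.
    outputs = text_generator(prompt, max_length=max_length, do_sample=do_sample)
    # The pipeline returns a list of dicts, each with a "generated_text" key.
    return outputs[0]["generated_text"]


def strip_prompt(prompt, generated_text):
    """Return only the new text; by default the pipeline echoes the prompt first."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


def demo():
    """Hypothetical usage: continue the first example sentence with distilgpt2."""
    text = ("These solutions have been combined with probabilistic "
            "Monte Carlo methods to identify critical step paths.")
    full_text = generate_continuation(text, model_name="distilgpt2")
    print(strip_prompt(text, full_text))
```

Calling demo() prints one sampled continuation. Because sampling is random, the output differs from run to run and will not match the examples in this post.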

[Figure 1: Open-Ended Text Generation]
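The Top-K sampling mentioned in the introduction can be illustrated with a short standalone sketch. This is not the Transformers implementation; the function name top_k_sample and the toy logits are assumptions made for the example. The idea is to keep only the k highest-scoring tokens, renormalize their probabilities, and draw the next token from that reduced set.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample one token index from the k highest-scoring entries of `logits`."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    largest = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - largest) for i in top]
    # Draw one index in proportion to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]


# Toy scores for a hypothetical four-token vocabulary.
toy_logits = [2.0, 0.5, 1.5, -1.0]
next_token = top_k_sample(toy_logits, k=2)  # always index 0 or index 2
```

With k=1 the procedure reduces to greedy decoding, since only the single best token can be drawn. Larger values of k allow more varied continuations.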

2 Examples

(1) Input text: text = "These solutions have been combined with probabilistic Monte Carlo methods to identify critical step paths."

The generated text is as follows:

The study was supported by the National Science Foundation. The results of the study are published in the journal Nature. "We have shown that the probabilistic approach is able to identify the critical step paths that are necessary for the development of a new system," said lead author Dr. Michael J. H. Hirsch, a professor of physics at the University of California, Berkeley. "This is a major step forward in the development of a new system."

(2) Input text: text = "Recent research indicates that brittle fracture and step-path failure are important considerations in both natural high-mountain and engineered rock slopes."

The generated text is as follows:

The following table summarizes the results of a series of studies on the effects of fracture and step-path failure on rock slopes. Table 1. Effects of fracture and step-path failure on rock slopes (in inches)

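3 Conclusion

Open-ended text generation offers a way to continue a sentence automatically. However, although the generated sentences are grammatically acceptable, they remain semantically and logically incoherent, as the examples above show.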
