Text Generation Experiments with the Autoregressive Language Model XLNet

Computational Geomechanics (计算岩土力学)

August 30, 2021, 11:04

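1 Introduction

Autoregressive language generation rests on the assumption that the probability distribution of a word sequence can be factored into a product of conditional distributions, one for each next word given the words before it. Many models now support autoregressive generation, and each can be paired with different decoding strategies. The most popular are GPT2, XLNet, OpenAI-GPT, CTRL, TransfoXL, XLM, Bart, and T5. We have already done a good deal of exploratory work with GPT2: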

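Training GPT2 on the GeotechSet dataset

Comparing decoding methods for the GPT2-Large model

Decoding methods for the GPT2-Large model: Top-K and Top-p

A new exploration: EleutherAI's GPT Neo/GPT-3 models

Extending and optimizing the GeotechSet model with aitextgen

Open-Ended Text Generation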

We have also done exploratory work with T5:

A new method for summarization

Testing the Text2TextGeneration pipeline in Transformers

This note explores another model, XLNet.
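The factorization assumption stated at the start of this section can be written out explicitly. The notation below ($w_t$ for the $t$-th word, $T$ for the sequence length, $W_0$ for an optional initial context such as a prompt) is introduced here for illustration only and does not appear in the original note.

```latex
P(w_{1:T} \mid W_0) \;=\; \prod_{t=1}^{T} P\left(w_t \mid w_{1:t-1},\, W_0\right),
\qquad w_{1:0} = \emptyset
```

A decoding strategy such as greedy search, beam search, Top-K, or Top-p sampling is then a rule for choosing each $w_t$ from the conditional distribution on the right-hand side.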

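2 The XLNet Model

XLNet was introduced by Yang et al. (2019), researchers at Carnegie Mellon University and Google, in the paper "XLNet: Generalized Autoregressive Pretraining for Language Understanding". XLNet is an unsupervised language representation learning method based on a novel generalized permutation language modeling objective. It extends the Transformer-XL model, is pretrained autoregressively, and performs very well on language tasks that involve long contexts. At the time of publication, XLNet achieved state-of-the-art (SOTA) results on a range of downstream language tasks, including question answering, natural language inference, sentiment analysis, and document ranking.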
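For reference, the permutation language modeling objective can be written as follows. This is the form given in the XLNet paper, not something derived in this note: $\mathcal{Z}_T$ is the set of all permutations of the index sequence $[1, \dots, T]$, $z_t$ is the $t$-th element of a permutation $\mathbf{z}$, and $\mathbf{z}_{<t}$ denotes its first $t-1$ elements.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]
```

Each sampled permutation gives a different factorization order, so in expectation every token learns to use context from both sides while the model remains autoregressive.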

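Two main XLNet checkpoints are available: the smaller xlnet-base-cased and the larger xlnet-large-cased. This experiment uses the latter, xlnet-large-cased.

3 Testing XLNet

The test script is geotech-XLNnet-Text-Generation.py. The following paragraph serves as the padding text: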

"Shear failure commonly takes place along the joint fissure at the lower part of rock slope, while tensile fracture often occurs at the top part of rock slope. While large-scale stability can be modelled using equivalent rock mass properties, at smaller scale the local variations become significant and failure along the fracture planes is possible. If rock bridge failure represents a brittle failure process, can fracture mechanics principles be used to simulate rock bridge failure better and more realistically? Even when incorporating complex fracture networks, continuum models cannot adequately simulate the relative displacement of blocks, interlocking, internal moments produced by block rotations, and fracturing of the intact rock material. Of interest, Kemeny (2003) recognized the importance of considering the time-dependency of progressive failure of intact rock bridges."

Using "shear failure in rock slope" as the prompt and beam search with 6 beams, the model generated the following text:

"shear failure in rock slope as a mechanism of fracture failure. Shear loss is an important mechanism for fracture loss in rocky slope and is a major cause of erosion and erosion-induced sedimentation. The erosion of rocks and sediments by erosion is the most important source of sedimentary and igneous rocks in the U.S. Geological Survey has estimated that about 1.5 million tons per year are lost to erosion by the United States. The erosion rate is estimated to be about 1% per million years for the entire continental shelf and at 2% per Million Years for all other continental shelves. In addition, erosion rates are estimated at about 2.0 % per 100 m2 of surface area for both continental and sub-continental shelf. "
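The test script itself is not reproduced in this note. A minimal sketch of how such a run might look with the Hugging Face Transformers library is given below, assuming `torch`, `transformers`, and `sentencepiece` are installed. The helper names `build_input` and `generate_with_xlnet`, the shortened `PADDING_TEXT`, the 150-token budget, and `no_repeat_ngram_size=2` are all illustrative assumptions; only the checkpoint name, the prompt, and the beam count of 6 come from the text above.

```python
# Shortened stand-in for the full padding paragraph quoted above (assumption).
PADDING_TEXT = (
    "Shear failure commonly takes place along the joint fissure at the lower "
    "part of rock slope, while tensile fracture often occurs at the top part "
    "of rock slope."
)


def build_input(padding_text: str, prompt: str) -> str:
    """Prepend the padding text so XLNet sees some context before the short prompt."""
    return padding_text.strip() + " " + prompt.strip()


def generate_with_xlnet(
    prompt: str,
    padding_text: str = PADDING_TEXT,
    num_beams: int = 6,            # beam count used in the experiment
    max_new_tokens: int = 150,     # hypothetical length budget
    model_name: str = "xlnet-large-cased",
) -> str:
    # Heavy dependencies are imported lazily so build_input stays usable without them.
    import torch
    from transformers import XLNetLMHeadModel, XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained(model_name)
    model = XLNetLMHeadModel.from_pretrained(model_name)
    model.eval()

    full_text = build_input(padding_text, prompt)
    # No special tokens: the model should continue the text, not classify it.
    input_ids = tokenizer.encode(
        full_text, add_special_tokens=False, return_tensors="pt"
    )
    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_length=input_ids.shape[1] + max_new_tokens,
            num_beams=num_beams,
            no_repeat_ngram_size=2,  # assumption: a common guard against repetition
            early_stopping=True,
        )
    # Keep only the newly generated tokens, dropping padding text and prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    continuation = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return prompt.strip() + " " + continuation
```

Calling `generate_with_xlnet("shear failure in rock slope")` downloads the checkpoint on first use. The output will not necessarily match the text quoted above, since the original script's settings are not known.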

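This is clearly not the result we hoped for: the generated text drifts away from slope failure toward erosion and sedimentation. For comparison, we ran gpt2-large as well, and the text it generated was of better quality than XLNet's.

4 Concluding Remarks

Tests by others suggest that XLNet is not quite as remarkable as its authors claim, and some tests show BERT outperforming XLNet. In addition, XLNet runs much more slowly than gpt2-large. Still, XLNet gives us another route to text generation. With minor modifications, run_language_modeling.py can be used to fine-tune XLNet, but the GPU on this machine has only 6 GB of memory, so the run could not complete and no fine-tuning results are available yet. In principle, the current approach could be used to train a model based on GeotechSet.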
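A rough back-of-the-envelope estimate helps explain why 6 GB is not enough for fine-tuning. The sketch below assumes about 340 million parameters for xlnet-large-cased (the approximate figure listed for this checkpoint in the Hugging Face documentation) and plain fp32 training with the Adam optimizer. Both are assumptions, since the note does not state the training configuration.

```python
def training_memory_gb(n_params: int, bytes_per_param: int = 16) -> float:
    """Rough lower bound on GPU memory (in GB) for fp32 fine-tuning with Adam.

    16 bytes per parameter = 4 (weights) + 4 (gradients) + 8 (two Adam moments).
    Activations, buffers, and framework overhead are NOT included.
    """
    return n_params * bytes_per_param / 1e9


# Assumption: approximate parameter count of xlnet-large-cased.
XLNET_LARGE_PARAMS = 340_000_000

needed = training_memory_gb(XLNET_LARGE_PARAMS)
print(f"about {needed:.2f} GB before activations, versus 6 GB available")
```

Under these assumptions, weights, gradients, and optimizer state alone take roughly 5.4 GB, leaving almost no room for activations on a 6 GB card. Smaller batches, mixed precision, or the xlnet-base-cased checkpoint would reduce the requirement, though none of these was tried here.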
