The next frontier: optimization
Self-attention is required: the model must contain at least one self-attention layer. This is the defining feature of a transformer; without it, you have an MLP or an RNN, not a transformer.
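To make that concrete, here is a minimal single-head self-attention layer, sketched in PyTorch. The class name, projection layout, and dimensions are illustrative assumptions, not taken from any particular codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Minimal single-head self-attention (illustrative sketch)."""

    def __init__(self, d_model: int):
        super().__init__()
        # Learned projections mapping each token to queries, keys, values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product attention: every position attends to every position.
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        weights = F.softmax(scores, dim=-1)
        return weights @ v

# Usage: a batch of 2 sequences, 5 tokens each, 16-dim embeddings.
layer = SelfAttention(d_model=16)
out = layer(torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
```

The key point is in the `scores` line: unlike an MLP (fixed per-position weights) or an RNN (sequential state), every token interacts directly with every other token in a single step.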
Well, it’s easy to see several benefits to this design.