Pointer-prototype fusion network for few-shot named entity recognition

doi:10.19682/j.cnki.1005-8885.2023.0011

The Journal of China Universities of Posts and Telecommunications, 2023, Vol. 30, Issue (5): 32-41.

Special Topic: Digital Human


Hai-Ying ZHAO², Xuan GUO¹

1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received: 2023-07-20 Revised: 2023-09-02 Online: 2023-10-31 Published: 2023-10-30
Contact: Hai-Ying ZHAO, E-mail: zhaohaiying@bupt.edu.cn
Supported by: the National Key Research and Development Project (2021YFF0901701).

Abstract

Few-shot named entity recognition (NER) aims to identify named entities in new domains using a limited amount of annotated data. Previous methods divided this task into entity span detection and entity classification and achieved good results. However, because these methods use sequence labeling for entity span detection, they are limited by the imbalance between the entity and non-entity categories. To address this, a point-proto network (PPN) combining pointer and prototypical networks is proposed. Specifically, in the entity span detection stage, the pointer network generates the positions of entities in sentences. In the entity classification stage, the prototypical network builds semantic prototypes of entity types and classifies entities based on their distance from these prototypes. Moreover, the low-rank adaptation (LoRA) fine-tuning method, which freezes the pre-trained weights and injects trainable decomposition matrices, reduces the number of parameters that need to be trained and saved. Extensive experiments on the few-shot NER dataset (Few-NERD) and Cross-Dataset demonstrate the superiority of PPN in this domain.
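
To make two of the ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows prototype-based classification (each entity type is represented by the mean of its support-span embeddings, and a span is assigned to the nearest prototype) and a LoRA-style forward pass (a frozen weight matrix plus a trainable low-rank product). All function names, the toy two-dimensional embeddings, the use of squared Euclidean distance, and the `scale` parameter are assumptions made for illustration; the abstract only says "distance" and does not give these details. The pointer-network span detection stage is not covered here.

```python
from collections import defaultdict


def build_prototypes(support_embeddings, support_labels):
    """Average the support-span embeddings of each entity type."""
    grouped = defaultdict(list)
    for vector, label in zip(support_embeddings, support_labels):
        grouped[label].append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(u, v):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify_span(span_embedding, prototypes):
    """Return the entity type whose prototype is closest to the span."""
    return min(
        prototypes,
        key=lambda label: squared_distance(span_embedding, prototypes[label]),
    )


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def trainable_parameters(d_in, d_out, rank):
    """Entries in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Toy support set: two spans per entity type, 2-dimensional embeddings.
support = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["person", "person", "location", "location"]
prototypes = build_prototypes(support, labels)
print(classify_span([0.9, 0.1], prototypes))  # person

# A 768 x 768 layer adapted with rank 8: full matrix vs. low-rank factors.
print(768 * 768, trainable_parameters(768, 768, 8))  # 589824 12288
```

With `B` set to all zeros, `lora_forward` returns exactly `W x`, which is why an adapter initialized this way leaves the pre-trained model's output unchanged at the start of fine-tuning.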

Key words: few-shot named entity recognition (NER), pointer network, prototypical network, low-rank adaptation

ZHAO Haiying, GUO Xuan. Pointer-prototype fusion network for few-shot named entity recognition[J]. The Journal of China Universities of Posts and Telecommunications, 2023, 30(5): 32-41.

