RiBERT: Leveraging Rule Information for Chemical Production Named Entity Recognition via Pretrained BERT
Abstract
Chemical Named Entity Recognition (NER) faces critical challenges in accurately
identifying domain-specific entities such as complex compound nomenclature and
safety protocols. To address the underutilization of structural rules in chemical literature,
this study proposes RiBERT, a novel framework integrating rule-informed boundary
definitions with a hybrid BERT-KNN architecture. We systematically define 15 entity
types across 4 categories based on regulatory documents. The model combines
BERT's contextual embeddings with KNN's non-parametric retrieval to adapt
dynamically to sparse entities and lexical variations. Evaluations on specialized
chemical and public maintenance corpora yield F1 scores of 69.77% and 88.12%,
respectively, outperforming state-of-the-art baselines by up to 3.91%. The results indicate that
incorporating rule-based information significantly enhances NER performance for
chemical data.
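
The abstract does not spell out how the BERT and KNN components are combined. As a rough illustration only, the sketch below assumes a common retrieval-augmented pattern: a token embedding is compared against a datastore of (embedding, label) pairs, the k nearest neighbours are turned into a label distribution, and that distribution is interpolated with the classifier's own probabilities. The function names, the interpolation weight `lam`, the temperature, the toy 2-dimensional embeddings, and the `B-CHEM` label are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def knn_label_distribution(query, datastore, k, temperature=1.0):
    """Build a label distribution from the k stored embeddings nearest to `query`.

    `datastore` is a list of (embedding, label) pairs; in a real system the
    embeddings would come from a BERT encoder (assumption, not from the paper).
    """
    # Squared Euclidean distance from the query to every stored embedding,
    # sorted ascending, keeping only the k closest entries.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, emb)), label)
        for emb, label in datastore
    )[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, label) in zip(weights, scored):
        probs[label] += weight / total
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix classifier and retrieval distributions: lam * kNN + (1 - lam) * model."""
    labels = set(model_probs) | set(knn_probs)
    return {
        label: lam * knn_probs.get(label, 0.0)
        + (1.0 - lam) * model_probs.get(label, 0.0)
        for label in labels
    }


# Toy example with hypothetical 2-d "embeddings" and labels.
datastore = [([0.0, 0.0], "O"), ([1.0, 1.0], "B-CHEM"), ([0.9, 1.1], "B-CHEM")]
model_probs = {"O": 0.6, "B-CHEM": 0.4}  # classifier alone would predict "O"

knn_probs = knn_label_distribution([1.0, 1.0], datastore, k=2)
combined = interpolate(model_probs, knn_probs, lam=0.5)
predicted = max(combined, key=combined.get)
```

In this toy case both nearest neighbours carry the `B-CHEM` label, so retrieval overrides the classifier's weak preference for `O`. This is the kind of behaviour that can help with sparse entities, since a handful of stored examples can influence a prediction without retraining.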