
Jingyang Ge, Jianbin Wang, Qingtao Chang, Yali Chen, Xun Wang, Hongxun Shi

Abstract

Chemical Named Entity Recognition (NER) faces critical challenges in accurately
identifying domain-specific entities such as complex compound nomenclature and
safety protocols. To address the underutilization of structural rules in chemical literature,
this study proposes RiBERT, a novel framework integrating rule-informed boundary
definitions with a hybrid BERT-KNN architecture. We systematically define 15 entity
types across 4 categories based on regulatory documents. The model synergizes
BERT's contextual embeddings with KNN's non-parametric retrieval to dynamically
adapt to sparse entities and lexical variations. Evaluations on specialized chemical and
public maintenance corpora demonstrate F1 scores of 69.77% and 88.12%, respectively,
outperforming state-of-the-art baselines by up to 3.91%. The results indicate that
incorporating rule-based information significantly enhances NER performance for
chemical data.
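
The abstract does not spell out how the BERT and KNN components are combined. As a rough illustration only, the sketch below assumes a common retrieval-augmented pattern: a datastore of (embedding, label) pairs is searched for the nearest neighbours of a token, and the resulting label distribution is interpolated with the classifier's own probabilities. The function names, the mixing weight `lam`, the temperature, the tag set, and the 2-D toy vectors standing in for BERT embeddings are all assumptions, not details of RiBERT.

```python
import numpy as np

def knn_label_distribution(query, keys, labels, num_labels, k=3, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over labels."""
    # Squared Euclidean distance from the query to every stored embedding.
    dists = np.sum((keys - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    probs = np.zeros(num_labels)
    # Accumulate neighbour weights per label (handles repeated labels).
    np.add.at(probs, labels[nearest], weights)
    return probs

def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix retrieval and classifier distributions; lam is a hypothetical weight."""
    return lam * knn_probs + (1.0 - lam) * model_probs

# Toy datastore: 2-D vectors stand in for BERT token embeddings (assumption).
keys = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
labels = np.array([0, 0, 1, 1])  # 0 = "O", 1 = "B-CHEM" (illustrative tags)

query = np.array([1.0, 0.95])
model_probs = np.array([0.6, 0.4])  # pretend classifier output, leaning to "O"
knn_probs = knn_label_distribution(query, keys, labels, num_labels=2, k=2)
final_probs = interpolate(model_probs, knn_probs, lam=0.5)
predicted = int(np.argmax(final_probs))
```

In this toy case the two nearest neighbours both carry the entity label, so retrieval shifts the final prediction away from the classifier's initial preference. This is the kind of behaviour that could help with sparse entities, although the actual mechanism in the paper may differ.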


Section
Articles