Abstract
Abstractive summarization models rely on a single gold reference for training and evaluation, assuming it is optimal. We show that machine-generated summaries frequently exhibit stronger semantic alignment with source articles than gold references on CNN/DM and XSum. To address this, we propose a unified framework comprising the Multi-Reference Training Framework (MRTF) and Multi-Reference Evaluation Framework (MREF), both driven by an LSI-based Semantic Selection Mechanism (LSI-SSM). Grounded in the Eckart–Young–Mirsky theorem, LSI-SSM uses only k=2 dimensions to achieve 98.9% of neural embedding quality at zero GPU cost. Across BRIO, BART, PEGASUS, and DistilBART, MRTF yields statistically significant (p < 0.001) ROUGE-1 gains of +2.13 to +2.61 on CNN/DM. MREF reveals that traditional evaluation undervalues summaries by 3–5 ROUGE-1 points, with distributional bias analysis confirming 83–87% of corrections reflect genuine quality. Cross-domain validation on Multi-News, SAMSum, and PubMed confirms generalizability.
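The abstract does not spell out how LSI-SSM scores a summary, so the following is only a minimal sketch of one plausible reading: build a term-document matrix over the source and the candidate summaries, project it onto k=2 latent dimensions with a truncated SVD (which, by the Eckart–Young–Mirsky theorem, is the best rank-k approximation in the Frobenius norm), and pick the candidate whose latent vector is closest to the source. The function name `lsi_select`, whitespace tokenization, raw term counts, and cosine similarity as the selection score are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def lsi_select(source, candidates, k=2):
    """Hypothetical helper: rank candidate summaries against a source text
    in a k-dimensional LSI space. Returns (best_index, cosine_scores)."""
    docs = [source] + list(candidates)
    # Assumption: naive lowercase whitespace tokenization.
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-document matrix of raw counts (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokens):
        for t in doc:
            A[index[t], j] += 1.0

    # Truncated SVD: keep the k largest singular values and their vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (s[:k, None] * Vt[:k]).T  # one k-dimensional row per document

    src, cands = doc_vecs[0], doc_vecs[1:]
    norms = np.linalg.norm(cands, axis=1) * np.linalg.norm(src)
    # Guard against zero-length vectors before dividing.
    scores = cands @ src / np.where(norms == 0, 1.0, norms)
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    best, scores = lsi_select(
        "the cat sat on the mat",
        ["a cat sat on a mat", "stocks fell sharply on friday"],
    )
    print(best, scores)
```

Because the SVD here runs on a small dense matrix with NumPy alone, the sketch needs no GPU, which is in the spirit of the abstract's "zero GPU cost" claim; a realistic setup would likely use TF-IDF weighting and a sparse solver instead of raw counts.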
| Original language | English |
|---|---|
| Journal | IEEE Access |
| Publication status | Accepted/In press - 2026 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Materials Science
- General Engineering
Title: A Unified Multi-Reference Framework for Training and Evaluation in Abstractive Summarization