Learning attention for historical text normalization by learning to pronounce
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Standard
Learning attention for historical text normalization by learning to pronounce. / Bollmann, Marcel; Bingel, Joachim; Søgaard, Anders.
ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics, 2017. pp. 332-344.
Bibtex
@inproceedings{194945376,
  title     = {Learning attention for historical text normalization by learning to pronounce},
  author    = {Bollmann, Marcel and Bingel, Joachim and S{\o}gaard, Anders},
  booktitle = {ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)},
  publisher = {Association for Computational Linguistics},
  year      = {2017},
  pages     = {332--344},
  doi       = {10.18653/v1/P17-1031},
}
RIS
TY - GEN
T1 - Learning attention for historical text normalization by learning to pronounce
AU - Bollmann, Marcel
AU - Bingel, Joachim
AU - Søgaard, Anders
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.
AB - Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named task. We address this problem by using several novel encoder-decoder architectures, including a multi-task learning (MTL) architecture using a grapheme-to-phoneme dictionary as auxiliary data, pushing the state-of-the-art by an absolute 2% increase in performance. We analyze the induced models across 44 different texts from Early New High German. Interestingly, we observe that, as previously conjectured, multi-task learning can learn to focus attention during decoding, in ways remarkably similar to recently proposed attention mechanisms. This, we believe, is an important step toward understanding how MTL works.
UR - http://www.scopus.com/inward/record.url?scp=85040931354&partnerID=8YFLogxK
U2 - 10.18653/v1/P17-1031
DO - 10.18653/v1/P17-1031
M3 - Article in proceedings
AN - SCOPUS:85040931354
SP - 332
EP - 344
BT - ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PB - Association for Computational Linguistics
T2 - 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Y2 - 30 July 2017 through 4 August 2017
ER -
ID: 194945376