The multilingual model was the obvious pick and it lost by six F1 points. Why tokenizer coverage beat parameter breadth on Bundesdatenschutzgesetz entities, what disentangled attention did to German compound nouns, and the cost of specializing a model inside a regulated pipeline.
NLPDeBERTaFine-tuningGDPR