Authors: Villa Cueva, Emilio; Aragón Saenzpardo, Mario Ezra; López Monroy, Adrián; Sánchez Vega, Fernando

Dates: 2026-04-22; 2026-04-22; 2026-03-31

Citation: Villa-Cueva, E., Aragón, M.E., López-Monroy, A.P., & Sánchez-Vega, F. (2026). Clever domain adaptation strategies for BERT in the task of hostile-language detection. Multimedia Tools and Applications, 85(323). https://doi.org/10.1007/s11042-026-21521-1

Handle: https://hdl.handle.net/10347/46880

Abstract: Cyberbullying has surged in recent years, driven largely by the widespread adoption of social media platforms. It manifests in multiple ways, hostile language being one of the most common, which underscores the urgent need for robust detection methods. To address this problem, we propose a novel pipeline to enhance hostile-language detection in social media. Our approach combines two ideas. First, we conduct a Domain Adaptation procedure that specializes the knowledge of a pre-trained BERT for the social media domain; for this adaptation, we modify the traditional random Masked Language Modeling technique and propose three novel strategies for cleverly selecting the subset of tokens to mask out. Second, we tailor an Adversarial Regularizer when fine-tuning the adapted BERT on specific hostile-language datasets. We evaluate our method on the detection of hate speech, aggressiveness, offensiveness, and sexism. Our results show that the Domain Adaptation procedure significantly outperforms vanilla BERT, and that the Adversarial Regularizer leads to more robust fine-tuning, thereby enhancing performance. Moreover, we demonstrate that these methods can be used together to achieve an even greater performance boost.

Language: eng

License: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Rights: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)

Keywords: Hostile language; Domain adaptation; Social media; Text classification

Title: Clever domain adaptation strategies for BERT in the task of hostile-language detection

Type: journal article

DOI: 10.1007/s11042-026-21521-1

ISSN: 1573-7721

Access: open access
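Note: the abstract's key departure from standard Masked Language Modeling is that the tokens to mask are chosen by a strategy rather than uniformly at random. The paper's three strategies are not spelled out in this record, so the sketch below is only a minimal, hypothetical illustration of the general idea, using a made-up frequency-based score (rarer tokens are masked first) as a stand-in selection criterion; the function names and the scoring rule are assumptions, not the authors' method.

```python
MASK = "[MASK]"

def token_scores(tokens, freq):
    # Hypothetical scoring: tokens that are rare in a general corpus get
    # higher scores, steering masking toward domain-specific vocabulary.
    return [1.0 / (1 + freq.get(t, 0)) for t in tokens]

def selective_mask(tokens, freq, mask_rate=0.15):
    """Mask the top-scoring fraction of tokens instead of a uniform
    random subset. Returns the masked sequence and the positions of the
    original tokens, which serve as MLM prediction targets."""
    k = max(1, int(len(tokens) * mask_rate))
    scores = token_scores(tokens, freq)
    # Indices of the k highest-scoring tokens (stable sort: ties keep
    # their original left-to-right order).
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    masked, labels = list(tokens), {}
    for i in top:
        labels[i] = masked[i]
        masked[i] = MASK
    return masked, labels

# Example: the rare token "troll" is masked in preference to frequent ones.
masked, labels = selective_mask(
    ["the", "troll", "posted", "the", "meme"],
    freq={"the": 100, "posted": 10},
    mask_rate=0.2,
)
```

Swapping `token_scores` for another criterion (e.g., model loss or domain-term lists) changes the strategy without touching the masking loop, which is presumably why selection-strategy variants are easy to compare.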