Using Translated Data to Improve Deep Learning Author Profiling Models

Veenhoven, R., Snijders, S., van der Hall, D. & van Noord, R., 10-Sep-2018, Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). CLEF, 12 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

In this report on our participation in the PAN shared task on author profiling, we describe our attempt to identify the gender of authors from their posted tweets and images. The data consist of tweets in English, Spanish and Arabic, as well as images. The report covers our final submitted system, a bi-LSTM model with attention, and explains the less effective solutions we explored. We also detail an approach to obtaining more training data by simply translating the gold standard data of the other languages into the language of interest. This proved to be a cheap and robust method for increasing accuracy in all three languages. Official test accuracy scores are 79.3, 80.4 and 74.9 for English, Spanish and Arabic respectively.
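The translation-based augmentation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `augment_with_translations`, the `(text, label)` data layout, and the `translate` callable are all assumptions made for illustration, and the stand-in translator below only tags the text rather than calling a real translation service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: one (tweet text, gender label) pair per example.
Example = Tuple[str, str]


def augment_with_translations(
    datasets: Dict[str, List[Example]],
    target_lang: str,
    translate: Callable[[str, str, str], str],
) -> List[Example]:
    """Return the target-language data plus translated copies of the
    gold-standard examples from every other language.

    `translate(text, source_lang, target_lang)` is an assumed interface;
    any machine translation backend could be wrapped to match it.
    """
    # Start from the original target-language examples.
    augmented = list(datasets[target_lang])
    for lang, examples in datasets.items():
        if lang == target_lang:
            continue
        for text, label in examples:
            # The label is kept unchanged; only the text is translated.
            augmented.append((translate(text, lang, target_lang), label))
    return augmented


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in translator for demonstration; it only tags the text."""
    return f"[{source_lang}->{target_lang}] {text}"


# Toy data, invented for this example.
datasets = {
    "en": [("good morning", "female")],
    "es": [("buenos dias", "male")],
    "ar": [("sabah alkhayr", "female")],
}

train_en = augment_with_translations(datasets, "en", fake_translate)
```

Here `train_en` holds the one original English example followed by the translated Spanish and Arabic examples, each with its original label, so the combined list could be passed to whatever classifier is trained for the target language.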
Original language: English
Title of host publication: Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018)
Number of pages: 12
Publication status: Published - 10-Sep-2018


  • author profiling, deep learning, additional training data

ID: 78990582