Using Translated Data to Improve Deep Learning Author Profiling Models
Veenhoven, R., Snijders, S., van der Hall, D. & van Noord, R., 10-Sep-2018, Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018). CLEF, 12 p.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review
In this report on our participation in the PAN shared task on author profiling, we describe our attempt to identify the gender of authors from their posted tweets and images. The data consist of tweets in English, Spanish and Arabic, together with images. We present our final submitted system, a bi-LSTM model with attention, and explain the less effective solutions we explored. We also detail an approach to obtaining more training data by simply translating the gold standard data of other languages into the language of interest. This proved to be a cheap and robust method for increasing accuracy in all three languages. Official test accuracy scores are 79.3, 80.4 and 74.9 for English, Spanish and Arabic, respectively.
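The translation-based augmentation described in the abstract could look roughly like the following minimal sketch. It is not the authors' implementation: the function `augment_with_translations`, the `translate` callable and its signature, the placeholder `toy_translate`, and the toy examples are all hypothetical, and the sketch assumes that the gender label of each example carries over unchanged when its text is translated.

```python
from typing import Callable, Dict, List, Tuple

# One training example: (tweet text, gender label).
Example = Tuple[str, str]


def augment_with_translations(
    gold: Dict[str, List[Example]],
    target_lang: str,
    translate: Callable[[str, str, str], str],
) -> List[Example]:
    """Combine the target-language gold data with translations of the
    gold data from every other language (labels assumed unchanged)."""
    augmented = list(gold[target_lang])
    for source_lang, examples in gold.items():
        if source_lang == target_lang:
            continue
        for text, label in examples:
            translated = translate(text, source_lang, target_lang)
            augmented.append((translated, label))
    return augmented


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Hypothetical placeholder for a real machine translation system.
    return f"[{source_lang}->{target_lang}] {text}"


# Invented toy data, for illustration only.
gold = {
    "en": [("good morning", "female")],
    "es": [("buenos dias", "male")],
    "ar": [("sabah alkhayr", "female")],
}

train_en = augment_with_translations(gold, "en", toy_translate)
```

Here `train_en` holds the original English example followed by the Spanish and Arabic examples passed through the placeholder translator. In practice, `toy_translate` would be swapped for a call to whichever translation system is available.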
| Title of host publication | Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018) |
| Number of pages | 12 |
| Publication status | Published - 10-Sep-2018 |
Keywords: author profiling, deep learning, additional training data