N-GrAM: New Groningen Author-profiling Model

Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H. & Nissim, M., 2017, 11 p.

Research output: Scientific - Paper

We describe our participation in the PAN 2017 shared task on Author Profiling, which involves identifying authors’ gender and language variety for English, Spanish, Arabic and Portuguese. We describe both the final submitted system and a series of negative results. Our aim was to create a single model for both gender and language, and for all language varieties. Our best-performing system (on cross-validated results) is a linear support vector machine (SVM) with word unigrams and character 3- to 5-grams as features. A set of additional features, including POS tags, additional datasets, geographic entities, and Twitter handles, hurt rather than improved performance. Results from cross-validation indicated high performance overall, and results on the test set confirmed this, with an averaged accuracy of 0.86 and performance on sub-tasks ranging from 0.68 to 0.98.
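As a rough illustration of the feature set named in the abstract (word unigrams plus character 3- to 5-grams), the following Python sketch counts both kinds of features for a single text. It is not the authors' implementation: the whitespace tokenization, the lowercasing of words, the feature-name prefixes, and the function names are all assumptions made for this example, and the linear SVM that would consume these counts is left out.

```python
from collections import Counter


def word_unigrams(text):
    """Count lowercased, whitespace-separated tokens (tokenization is an assumption)."""
    return Counter(f"w:{token}" for token in text.lower().split())


def char_ngrams(text, n_min=3, n_max=5):
    """Count every character n-gram with n_min <= n <= n_max, spaces included."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            counts[f"c{n}:{text[start:start + n]}"] += 1
    return counts


def extract_features(text):
    """Merge word-unigram and character n-gram counts into one feature map."""
    features = word_unigrams(text)
    features.update(char_ngrams(text))  # Counter.update adds counts
    return features


# Hypothetical input text, used only to show the shape of the output.
features = extract_features("good morning")
```

In a full pipeline, sparse count maps like this one would typically be vectorized (often with some weighting scheme) and passed to a linear classifier; the details of that step are not given in this record.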
Original language: English
Number of pages: 11
State: Published - 2017
Event: Conference and Labs of the Evaluation Forum (CLEF 2017) - Trinity College, Dublin, Ireland
Duration: 11-Sep-2017 to 14-Sep-2017

Conference and Labs of the Evaluation Forum (CLEF 2017): Information Access Evaluation meets Multilinguality, Multimodality, and Visualization

Event: Conference

Related Activities
  1. Conference and Labs of the Evaluation Forum (CLEF 2017)

    Haagsma, H. (Speaker)

    Activity: Scientific - Participation in conference

Related Prizes
  1. Best Performance in the 5th International Competition on Author Profiling

    Angelo Basile (Recipient), Gareth Dwyer (Recipient), Maria Medvedeva (Recipient), Josine Rawee (Recipient), Hessel Haagsma (Recipient) & Malvina Nissim (Recipient), 2017

    Prize: Other distinction

ID: 48217275