Exploiting Grammatical Relations for Protein Relation Extraction and Role Labeling

Fayruzov, T., De Cock, M., Cornelis, C., & Hoste, V.
E. Hoenkamp, M. De Cock, and V. Hoste
Proceedings of the Dutch-Belgian Information Retrieval workshop (DIR-2008)


Automatic mining of protein interactions from natural language text and automatic identification of the agent and target proteins (i.e., role labeling) are challenging problems that attract considerable attention because of the growing volume of biomedical text resources. We propose a novel approach that relies exclusively on parsing and dependency information. We deliberately omit context information such as keywords or parts of speech in order to abstract as much as possible from the given corpora, and we examine whether, and how closely, the grammatical relations correspond to the semantic relations in the text. In particular, we construct a feature vector for each sentence from only the grammatical relations and some parsing information. We then feed this vector to standard machine learning algorithms to decide whether a sentence describes a protein interaction and which roles the interaction participants play. Evaluation on benchmark datasets shows that our method is competitive with existing state-of-the-art algorithms for the extraction of protein interactions and gives promising results for protein role detection.
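
To make the idea of a relation-only feature vector concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sentence has already been parsed into (relation, head, dependent) triples. The relation inventory `RELATIONS`, the function name `relation_features`, and the toy parse are all hypothetical choices made for illustration; the abstract does not specify which relations or which parsing information are used.

```python
from collections import Counter

# Hypothetical inventory of grammatical relation labels (an assumption;
# the abstract does not list the relations actually used).
RELATIONS = ["nsubj", "dobj", "prep", "conj", "amod"]


def relation_features(dependencies, relations=RELATIONS):
    """Count how often each relation label occurs in a parsed sentence.

    `dependencies` is a list of (relation, head, dependent) triples, as a
    dependency parser might produce. The words themselves are ignored on
    purpose, so the vector carries no keyword or part-of-speech information.
    """
    counts = Counter(rel for rel, _head, _dep in dependencies)
    return [counts.get(rel, 0) for rel in relations]


# Toy parse of a sentence such as "PROT1 activates PROT2 in cells"
# (hand-written for illustration, not parser output).
example = [
    ("nsubj", "activates", "PROT1"),
    ("dobj", "activates", "PROT2"),
    ("prep", "activates", "in"),
]
vector = relation_features(example)
```

A vector of this kind could then be passed to any standard classifier, once to decide whether the sentence describes an interaction and once to assign agent and target roles. Which classifier to use and how to encode the parsing information are left open here.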