Multiple Features Based Approach to Extract Bio-molecular Event Triggers Using Conditional Random Field




Amit Majumder 1,*

1. Department of Computer Application, Academy of Technology, Hooghly, West Bengal, India

* Corresponding author.


Received: 20 Feb. 2012 / Revised: 11 Jun. 2012 / Accepted: 17 Aug. 2012 / Published: 8 Nov. 2012

Index Terms

BioNLP, Conditional Random Field (CRF), Event, Event Trigger, Template


The purpose of Biomedical Natural Language Processing (BioNLP) is to capture biomedical phenomena from textual data by extracting relevant entities, information, and relations between biomedical entities (i.e., proteins and genes). Most published work has extracted only binary relations. Recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events, which may involve several entities or other relations. In this paper we propose an approach to extracting the triggers of such relatively complex bio-molecular events. We treat the problem as the detection of bio-molecular event triggers using a well-known algorithm, the Conditional Random Field (CRF). Experiments on the development set yield overall average recall, precision, and F-measure values of 64.27504%, 69.97559%, and 67.00429%, respectively, for event detection.
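As a quick check on the reported scores, the sketch below computes a balanced F-measure as the harmonic mean of precision and recall. It assumes the balanced (F1) form of the measure, which the abstract does not state explicitly. The function names and the small count example are illustrative and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Scores reported in the abstract, in percent.
reported_recall = 64.27504
reported_precision = 69.97559
reported_f = f_measure(reported_precision, reported_recall)
print(f"F-measure from reported P and R: {reported_f:.5f}%")

# Hypothetical counts (not from the paper): 3 correct triggers,
# 1 spurious trigger, 2 missed triggers.
p, r = precision_recall(tp=3, fp=1, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F={f_measure(p, r):.4f}")
```

Under this assumption, the harmonic mean of the reported precision and recall agrees with the reported F-measure of 67.00429% to the stated precision.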

Cite This Paper

Amit Majumder, "Multiple Features Based Approach to Extract Bio-molecular Event Triggers Using Conditional Random Field", International Journal of Intelligent Systems and Applications (IJISA), vol. 4, no. 12, pp. 41-47, 2012. DOI: 10.5815/ijisa.2012.12.06

