public class StanfordTokenizer
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase
Uses the Stanford POS Tagger to split text into sentences and words, and annotates each
word with a part-of-speech (POS) feature.
The code is derived from the TaggerDemo example provided by Stanford.
The part-of-speech tags assigned by the Stanford tagger are Penn Treebank tags, described here:
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
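As a rough illustration of how a consumer of the POS feature might interpret these tags, the following sketch maps a handful of common Penn Treebank tags to their descriptions. The class name PennTagLookup and the method describe are hypothetical and not part of this project; the table covers only a small subset of the tag set.

```java
import java.util.Map;

// Hypothetical helper, not part of this project: looks up a short
// description for a few common Penn Treebank POS tags.
final class PennTagLookup {

    // Deliberately partial: only a small subset of the full tag set.
    private static final Map<String, String> DESCRIPTIONS = Map.of(
        "DT", "Determiner",
        "IN", "Preposition or subordinating conjunction",
        "JJ", "Adjective",
        "NN", "Noun, singular or mass",
        "NNS", "Noun, plural",
        "PRP", "Personal pronoun",
        "RB", "Adverb",
        "VB", "Verb, base form",
        "VBZ", "Verb, 3rd person singular present"
    );

    // Returns the description for a tag, or a fallback for tags
    // that are missing from the partial table above.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"DT", "JJ", "NN", "VBZ"}) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

Returning a fallback string rather than throwing keeps the lookup usable for tags the table omits, such as punctuation tags.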
Author:
Benjamin Paassen - bpaassen(at)techfak.uni-bielefeld.de
Copyright (C) 2013, 2014 Raphael Dickfelder, Jan Göpfert, Benjamin Paassen, Andreas Stöckel, licensed under the AGPL v. 3: http://openresearch.cit-ec.de/projects/scie