I have been writing a Natural Language Processing (NLP) library for Mongolian text. As most of us already know, there are several NLP packages out there, such as NLTK, spaCy, and gensim. However, none of them handle Mongolian text, so I am writing my own.
So far, I have implemented basic pre-processing (stop-word and punctuation removal) and a tokenizer. The next step, I think, is a stemmer or a lemmatizer, but I have no idea how to implement either one. Should I build a rule-based system? Should I use machine learning? Or both?
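For the rule-based direction, the simplest thing I can picture is a longest-match suffix stripper. Here is a minimal sketch of that idea; the suffix list below is just a few Cyrillic Mongolian endings I picked for illustration (e.g. the plural -ууд/-үүд), not anything close to a complete morphology:

```python
# Longest-match suffix stripper: a minimal rule-based stemming sketch.
# SUFFIXES is an illustrative, incomplete sample, not a real Mongolian suffix inventory.
SUFFIXES = sorted(["ууд", "үүд", "ийн", "ын", "ийг", "ыг"], key=len, reverse=True)

def stem(token: str, min_stem_len: int = 2) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token  # no rule applied; return the token unchanged

print(stem("номууд"))  # "ном" ("books" -> "book")
```

Something like this is easy to start with, but I suspect real Mongolian morphology (vowel harmony, suffix chains) would need either much richer rules or a learned model, which is exactly what I am unsure about.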
Could anyone shed some light on how to implement these features?