Statistics for A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data