A very short blog post this time. I wrote a paper on how ROC curves are constructed to measure and visualize the performance of binary classifiers. If you are interested, you may download the paper by clicking here. Enjoy and cheers! Sib ntsib dua nawb mog. إلى اللقاء
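For readers who want a feel for the construction before opening the paper, here is a minimal sketch of the common threshold-sweep approach. It is an illustration only and not necessarily the procedure the paper follows; the function names `roc_points` and `auc` are hypothetical, and the code assumes both classes are present in `labels`.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a ROC curve.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank observations from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Each point corresponds to one decision threshold: everything scored at or above it is called positive, and the curve traces how the true positive rate grows against the false positive rate as the threshold is lowered.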
In this blog post, I’m going to be referring to a paper that I wrote about some Bayesian scoring functions for learning the structure of Bayesian belief networks (BBNs). The paper may be downloaded by clicking here, and there is also an accompanying slide deck that may be downloaded by clicking here. These documents are […]
Introduction: I am reading a book by Lin and Dyer (2010). This book is very informative about designing efficient algorithms under the Map/Reduce (M/R) programming paradigm. Of particular interest is the “in-mapper combining” design pattern that I came across while reading it. As if engineers and data miners did not have to change their […]
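To make the "in-mapper combining" idea concrete, here is a minimal sketch in plain Python rather than Hadoop code. Word counting is my assumed example, and the names `naive_mapper`, `InMapperCombiningMapper`, `map`, and `close` are hypothetical stand-ins for a real mapper's lifecycle methods. The point is only that the mapper keeps partial counts in memory across calls and emits them once at the end, instead of emitting one pair per word.

```python
from collections import defaultdict


def naive_mapper(lines):
    """Emit one (word, 1) pair per token, as a plain mapper would."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


class InMapperCombiningMapper:
    """Accumulate counts across map() calls and emit them once in close()."""

    def __init__(self):
        # State that survives between map() calls on the same mapper instance.
        self.counts = defaultdict(int)

    def map(self, line):
        # Nothing is emitted here; counts are only aggregated in memory.
        for word in line.split():
            self.counts[word] += 1

    def close(self):
        # Emit one (word, total) pair per distinct word seen by this mapper.
        yield from self.counts.items()
```

For the two input lines `"a b a"` and `"b a"`, the naive mapper emits five pairs while the combining mapper emits two, which is where the reduction in intermediate data comes from. The trade-off is that the in-memory table must fit in the mapper's memory.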
In this blog post, I am going to continue with the Map/Reduce Text Mining Toolkit (MRTMT) API. I have worked on it a little more, and I am now releasing v0.3. The improvements include allowing the user to specify the local and global weights used to build the vector space model (VSM).
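As a rough illustration of what local and global weights mean in a VSM, here is a minimal sketch that assumes raw term frequency as the local weight and inverse document frequency as the global weight. These choices, the whitespace tokenization, and the function name `build_vsm` are my assumptions for illustration; they are not MRTMT's API or its defaults.

```python
from collections import Counter
from math import log


def build_vsm(docs):
    """Return one {term: weight} vector per document.

    weight = local weight (term frequency in the document)
             * global weight (log of N / document frequency of the term)
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # local weight: raw count within this document
        vectors.append({t: tf[t] * log(n_docs / df[t]) for t in tf})
    return vectors
```

With the documents `"a b"` and `"a c"`, the term `a` appears in every document, so its global weight is zero, while `b` and `c` each receive a positive weight. Swapping in a different local or global weighting only changes the two factors in the product.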
In this blog post, I will discuss a simple toolkit you may use to create a vector space model (VSM) (Salton 75). The toolkit is called the Map/Reduce Text Mining Toolkit (MRTMT); for now, however, it does not cover the full scope of text mining and merely creates a VSM from text documents. The purpose […]
Last time I analyzed Mahout’s collaborative filtering algorithm. In this blog post, I will be writing about computing the canonical Pearson correlation between two variables for a set of data using Hadoop’s M/R paradigm. If you have already written your own M/R tasks for Jobs, this tutorial is not for you. If you are just starting […]
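To show why Pearson correlation fits the M/R paradigm, here is a minimal sketch in plain Python rather than Hadoop code. It assumes the single-pass formulation in which each observation contributes six partial sums that can be added in any order; the function names `map_pair`, `combine`, and `pearson` are hypothetical, and the tutorial's actual job layout may differ.

```python
from functools import reduce
from math import sqrt


def map_pair(x, y):
    """Map step: turn one (x, y) observation into its partial sums."""
    return (1, x, y, x * x, y * y, x * y)


def combine(a, b):
    """Reduce step: add two tuples of partial sums element by element."""
    return tuple(p + q for p, q in zip(a, b))


def pearson(pairs):
    """Compute Pearson's r from the aggregated sums.

    Assumes at least two observations and non-zero variance in both variables.
    """
    n, sx, sy, sxx, syy, sxy = reduce(combine, (map_pair(x, y) for x, y in pairs))
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy)
    return numerator / denominator
```

Because `combine` is associative and commutative, the partial sums can be computed on separate splits and merged afterwards, so the data only needs to be read once.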