In this blog post, I’m going to be referring to a paper that I wrote about some Bayesian scoring functions for learning the structure of Bayesian belief networks (BBNs). The paper may be downloaded by clicking here, and there is also an accompanying slide deck that may be downloaded by clicking here. These documents are […]
Introduction Sometimes, we would like to sort the values coming into the Reducer of a Hadoop Map/Reduce (MR) Job. You can indirectly sort the values by using a combination of implementations. They are as follows. Use a composite key. Extend org.apache.hadoop.mapreduce.Partitioner. Extend org.apache.hadoop.io.WritableComparator. Other tutorials that explains this approach on sorting values going into a […]
I am going to continue on the Map/Reduce Text Mining Toolkit (MRTMT) API in this blog post. I have worked on it a little bit more, and now I will be releasing v0.3. The added improvements include allowing the user to specify the local and global weights used to build the vector space model (VSM).
I recently released MRTMT v0.1. In that article, I stated there were still a lot of work to be done. Since, I have simplified the process of building a vector space model (VSM) in a newer version, MRTMT v0.2. The full source code may be downloaded here. Again, this API is released under the Apache […]
In this blog I will discuss a simple toolkit you may use to create a vector space model (VSM) (Salton 75). The toolkit is called, Map/Reduce Text Mining Toolkit (MRTMT), however, for now, its accomplishments does not entirely cover the scope of text mining and just merely creating a VSM from text documents. The purpose […]
In this blog entry, I will show with a few lines of code how to create an in-memory PDF report and send it as an email attachment. Why is this exercise or illustration important? It is important to me because I’m involved in a lot of report generation projects where the reporting logic and display […]