- downloads webpages that have already been assigned a class
- counts the unique words on each page
- aggregates those counts by class and by site
- gives a pretty good approximation of textdiff
- outputs weighted rankings
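The counting and ranking steps in the list above could look roughly like the sketch below. It is not the actual tool: the function names, the `(class_label, text)` input shape, and especially the weighting (share of pages in the class that contain a word, minus the share of pages outside the class that contain it) are assumptions made for illustration. Downloading and the per-site breakdown are left out.

```python
import re
from collections import Counter, defaultdict


def unique_words(text):
    """Return the set of lowercase words that appear in a page's text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def count_by_class(pages):
    """Count, per class, how many pages contain each word.

    pages: iterable of (class_label, text) pairs (hypothetical input shape).
    Returns (doc_freq, n_pages), where doc_freq[label][word] is the number
    of pages of that class containing the word.
    """
    doc_freq = defaultdict(Counter)
    n_pages = Counter()
    for label, text in pages:
        n_pages[label] += 1
        doc_freq[label].update(unique_words(text))
    return doc_freq, n_pages


def weighted_rankings(doc_freq, n_pages):
    """Rank each class's words by an assumed weight:
    (share of in-class pages with the word) - (share of other pages with it).
    """
    total = sum(n_pages.values())
    rankings = {}
    for label, counts in doc_freq.items():
        others = total - n_pages[label]
        scores = {}
        for word, in_count in counts.items():
            outside = sum(doc_freq[o][word] for o in doc_freq if o != label)
            in_share = in_count / n_pages[label]
            out_share = outside / others if others else 0.0
            scores[word] = in_share - out_share
        # Highest weight first; ties broken alphabetically.
        rankings[label] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return rankings


pages = [
    ("sports", "goal team goal"),
    ("sports", "team match"),
    ("news", "election team"),
]
doc_freq, n_pages = count_by_class(pages)
rankings = weighted_rankings(doc_freq, n_pages)
```

With this weighting, a word such as "team" that appears in every class ends up near the bottom of each list, while class-specific words rise to the top.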
I need to see if the generated list is a good classifier of pages by class.
I would like a statistician to develop a metric of this performance.
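Until a proper metric exists, one possible starting point is to hold out some labeled pages, classify each one with the ranked list, and look at accuracy and a confusion table. The sketch below assumes the rankings are a dict mapping each class to a list of `(word, weight)` pairs and that a page is assigned to the class whose words sum to the highest weight; both the data shape and the scoring rule are assumptions, not the tool's actual behavior.

```python
import re
from collections import Counter


def predict(rankings, text):
    """Assign a page to the class whose ranked words sum to the highest weight."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {
        label: sum(weight for word, weight in ranked if word in words)
        for label, ranked in rankings.items()
    }
    # Ties go to the alphabetically first class label.
    return max(sorted(scores), key=scores.get)


def evaluate(rankings, held_out):
    """Return (accuracy, confusion) over held-out (true_label, text) pairs.

    confusion[(true_label, predicted_label)] is a page count.
    """
    confusion = Counter()
    for true_label, text in held_out:
        confusion[(true_label, predict(rankings, text))] += 1
    total = sum(confusion.values())
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    accuracy = correct / total if total else 0.0
    return accuracy, confusion


# Hypothetical rankings and held-out pages, for illustration only.
example_rankings = {
    "sports": [("goal", 0.5), ("match", 0.5)],
    "news": [("election", 1.0)],
}
held_out = [
    ("sports", "a late goal"),
    ("news", "election night"),
    ("news", "match report"),
]
accuracy, confusion = evaluate(example_rankings, held_out)
```

Accuracy alone can mislead when the classes are very different in size, so the confusion counts are returned as well; per-class precision and recall can be derived from them.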