boilerpipeR: Interface to the boilerpipe Java library by Christian Kohlschütter (http://code.google.com/p/boilerpipe/)

Generic extraction of the main text content from HTML files, with removal of ads, sidebars and headers, using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of web site templates.

Version: 1.0
Depends: rJava
Published: 2012-12-21
Author: Mario Annau [aut, cre]
Maintainer: Mario Annau <mario.annau at gmail.com>
License: Apache License (== 2.0)
NeedsCompilation: no
CRAN checks: boilerpipeR results

Downloads:

Package source: boilerpipeR_1.0.tar.gz
Mac OS X binary: boilerpipeR_1.0.tgz
Windows binary: boilerpipeR_1.0.zip
Reference manual: boilerpipeR.pdf
Vignettes: Introduction to the tm.plugin.webmining Package

Reverse dependencies:

Reverse depends: tm.plugin.webmining