Gumbo is an implementation of the HTML5 parsing algorithm, written as a pure
C99 library with no outside dependencies.
Goals and features of the C library:
- Fully conformant with the HTML5 spec.
- Robust and resilient to bad input.
- Simple API that can be easily wrapped by other languages.
(This is one such wrapper.)
- Support for source locations and pointers back to the original text.
(Not currently exposed by this wrapper.)
- Relatively lightweight, with no outside dependencies.
- Passes all html5lib-0.95 tests.
- Tested on over 2.5 billion pages from Google's index.
WWW: https://metacpan.org/pod/HTML::Gumbo
WWW: https://github.com/ruz/HTML-Gumbo