-
unpdf
High-performance PDF content extraction to Markdown, text, and JSON
-
keyword_extraction
Collection of algorithms for keyword extraction from text
-
pdfvec
High-performance PDF text extraction library for vectorization pipelines
-
pdf_oxide
The Complete PDF Toolkit: extract, create, and edit PDFs. Rust core with bindings for Python, Node, WASM, Go, and more.
-
docx-lite
Lightweight, fast DOCX text extraction library with minimal dependencies
-
heavy-pdf-parser
Extract text from PDF files with support for multiple output formats
-
epub-parser
Extract metadata, table of contents, text, cover, and images from EPUB files
-
arabic_pdf_to_text
A CLI tool to convert Arabic PDFs to text using Google's Gemini API
-
parser-core
Extract text from various file formats, including PDF, DOCX, XLSX, PPTX, and images via OCR
-
parser-web
Web API for extracting text from various file formats
-
the-daily-stallman
Read the news like Stallman would. No JavaScript required.