#text-extraction

  1. unpdf

    High-performance PDF content extraction to Markdown, text, and JSON

    v0.1.6 #pdf #markdown #text-extraction #document-parser #pdf-parser #text-document
  2. keyword_extraction

    Collection of algorithms for keyword extraction from text

    v1.5.0 110 #extract #tf-idf #algorithm #text-extraction
  3. pdfvec

    High-performance PDF text extraction library for vectorization pipelines

    v0.1.1 #pdf #vectorization #nlp #text-extraction
  4. pdf_oxide

    The Complete PDF Toolkit: extract, create, and edit PDFs. Rust core with bindings for Python, Node, WASM, Go, and more.

    v0.3.2 150 #pdf #pdf-parser #text-extraction
  5. docx-lite

    Lightweight, fast DOCX text extraction library with minimal dependencies

    v0.2.0 13K #docx #text-extraction #parser #word #office
  6. heavy-pdf-parser

    Extract text from PDF files with support for multiple output formats

    v0.1.0 #pdf #text-extraction #document-processing #rust
  7. epub-parser

    Extracts metadata, table of contents, text, cover, and images from EPUB files

    v0.3.4 #ebook #epub #text-extraction #metadata #parser
  8. arabic_pdf_to_text

    A CLI tool to convert Arabic PDFs to text using Google's Gemini API

    v0.1.0 #gemini-api #pdf #arabic #text-extraction
  9. parser-core

    Extracts text from various file formats, including PDF, DOCX, XLSX, PPTX, images (via OCR), and more

    v0.1.3 120 #docx #text-parser #pdf #ocr #text-extraction
  10. parser-web

    Web API for extracting text from various file formats

    v0.1.3 #web-api #pdf #text-extraction #parser
  11. the-daily-stallman

    Read the news like Stallman would. No JavaScript required.

    v0.3.1 #stallman #text-extraction #rms #news
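The keyword_extraction entry lists TF-IDF among its tags. The following is a rough, hypothetical illustration of what TF-IDF keyword scoring does. It is not that crate's API: the function name `tf_idf_top_keywords`, the naive non-alphanumeric tokenizer, and the absence of stopword filtering or IDF smoothing are all assumptions made for this sketch, which uses only the Rust standard library.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not part of any crate listed above): scores each term
/// in `docs[doc_index]` by TF-IDF and returns the `k` highest-scoring terms.
/// Tokenization is a naive lowercase split on non-alphanumeric characters.
fn tf_idf_top_keywords(docs: &[&str], doc_index: usize, k: usize) -> Vec<(String, f64)> {
    // Tokenize every document once.
    let tokenized: Vec<Vec<String>> = docs
        .iter()
        .map(|d| {
            d.split(|c: char| !c.is_alphanumeric())
                .filter(|t| !t.is_empty())
                .map(|t| t.to_lowercase())
                .collect()
        })
        .collect();

    // Document frequency: the number of documents each term appears in.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    // Raw term counts within the target document.
    let target = &tokenized[doc_index];
    let mut tf: HashMap<&str, usize> = HashMap::new();
    for term in target {
        *tf.entry(term.as_str()).or_insert(0) += 1;
    }

    // score = (count / doc_length) * ln(n_docs / df), with no smoothing.
    let n_docs = docs.len() as f64;
    let len = target.len().max(1) as f64;
    let mut scores: Vec<(String, f64)> = tf
        .into_iter()
        .map(|(term, count)| {
            let idf = (n_docs / df[term] as f64).ln();
            (term.to_string(), (count as f64 / len) * idf)
        })
        .collect();

    // Highest score first; ties broken alphabetically for deterministic output.
    scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    scores.truncate(k);
    scores
}

fn main() {
    let docs = [
        "extract text from pdf files",
        "extract text from docx files",
        "read the news without javascript",
    ];
    for (term, score) in tf_idf_top_keywords(&docs, 0, 2) {
        println!("{term}: {score:.3}");
    }
}
```

Running `main` prints `pdf: 0.220` followed by `extract: 0.081`. "pdf" occurs only in the first document, so it receives the largest IDF (ln 3). The four terms it shares with the second document tie at ln(3/2)/5, and the alphabetical tie-break selects "extract". A production extractor would likely also drop stopwords and smooth the IDF term.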