Rifa Achrinza aa4c270c90
feat: initial commit
Signed-off-by: Rifa Achrinza <25147899+achrinza@users.noreply.github.com>
2024-12-06 01:05:55 +08:00

XML SAX Parser for Node.js / JavaScript (Work in Progress)

This is an experimental attempt at implementing a validating XML SAX parser in pure JavaScript.

Warning

This library cannot parse XML documents in any meaningful way at this moment.

What can it do now?

It can only read an XML declaration (e.g. <?xml version="1.0" encoding="UTF-8" standalone="yes"?>).

To test this out:

$ git clone https://github.com/achrinza/xml-parser
$ cd xml-parser
$ npm ci
$ npm run build
$ npm start

Overview

Goals

  • Flexible
  • Pure-JavaScript
  • Browser and Node.js-compatible with no bundling
  • SAX-compatible streaming API
  • Validating
  • XML 1.0 and XML 1.1 spec-compliant
  • Markup declaration-aware (i.e. resolves entity references against the DTD subset)
  • Security boundaries similar to JAXP
  • Namespace-aware
  • XInclude-aware

Out of scope for now, but kept in mind:

  • DOM parser: the focus is on the SAX parser first; the plan is to build a DOM parser on top of the SAX parser.
  • XML Schema-aware (why does this spec have 5 parts?)
  • RELAX NG
  • XML Catalog-aware
  • XSLT 3.0 streamable transformations
  • Fast Infoset
  • XOP (needs a MIME parser)

Flexibility

The SAXParser class has static members which can be altered to change its behaviour.

| Static member | Description |
| --- | --- |
| SAXParser.KEYWORDS | Tokens used to identify fundamental XML elements |
| SAXParser.RESERVED_XMLNS_PREFIXES | XML namespace prefixes that cannot be overridden by the parsed document |
| SAXParser.PREDEFINED_GENERAL_ENTITIES | XML general entities that are predefined for substitution |
| SAXParserConfig | Configuration options; most options have no effect at the moment |
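
To illustrate what altering a static member means in practice, here is a minimal sketch. It uses a stand-in class, SketchParser, because this README does not document the real shape of SAXParser.PREDEFINED_GENERAL_ENTITIES; the Map layout, the expand() helper, and the regex-based substitution are all assumptions made for illustration and are not how the library works.

```typescript
// Stand-in class; NOT the library's SAXParser. The member shape is assumed.
class SketchParser {
  // Assumed shape: entity name -> replacement text.
  // These five are the entities predefined by the XML specification.
  static PREDEFINED_GENERAL_ENTITIES = new Map<string, string>([
    ["lt", "<"],
    ["gt", ">"],
    ["amp", "&"],
    ["apos", "'"],
    ["quot", '"'],
  ]);

  // Hypothetical helper: replaces known &name; references and leaves
  // unknown ones untouched.
  static expand(text: string): string {
    return text.replace(
      /&([A-Za-z]+);/g,
      (whole: string, name: string) =>
        SketchParser.PREDEFINED_GENERAL_ENTITIES.get(name) ?? whole,
    );
  }
}

// Altering the static member changes the behaviour of every later call.
SketchParser.PREDEFINED_GENERAL_ENTITIES.set("nbsp", "\u00A0");
```

Because the table lives on the class rather than on an instance, the change above applies process-wide, which is worth keeping in mind when several callers share one parser class.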

For SAXParser.KEYWORDS, strings and data types with O(1)-like lookup are used where possible (e.g. for tokens with only one value, or tokens with multiple values of equal, known length). Otherwise, arrays of strings are used.
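
The choice of data type can be sketched roughly as follows. KEYWORDS_SKETCH, its keys, and matchAt() are hypothetical names made up for this example; they are not taken from SAXParser.KEYWORDS, whose actual layout may differ.

```typescript
// Hypothetical keyword table; names and values are illustrative only.
const KEYWORDS_SKETCH = {
  // One value: a plain string, compared with startsWith().
  XML_DECL_START: "<?xml",
  // Several values of equal, known length (1): a Set gives O(1)-like lookup.
  WHITESPACE: new Set<string>([" ", "\t", "\r", "\n"]),
  // Several values of differing length: fall back to an array and scan it.
  STANDALONE_VALUES: ["yes", "no"],
};

// Returns the length of the keyword matched at `pos`, or 0 if none matches.
function matchAt(
  input: string,
  pos: number,
  keyword: string | Set<string> | string[],
): number {
  if (typeof keyword === "string") {
    return input.startsWith(keyword, pos) ? keyword.length : 0;
  }
  if (keyword instanceof Set) {
    // Every member is assumed to be a single character in this sketch.
    return keyword.has(input.charAt(pos)) ? 1 : 0;
  }
  for (const candidate of keyword) {
    if (input.startsWith(candidate, pos)) return candidate.length;
  }
  return 0;
}
```

The Set branch needs only one lookup because the candidate length is known up front; the array branch has to try each candidate in turn, which is why it is the fallback.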

Principles

  • Parser works with strings. Hence, constants should be "precompiled" as strings before parsing.

  • Reduce the need for object property traversal. Leverage existing local variables where possible, but do not reinstantiate complex variables.

  • Reduce the number of hot functions. Exceptions to this are functions that should be rarely triggered (e.g. raising validation/well-formedness errors) and event emitters.

  • Use POJOs when all the keys are numbers.

  • Use Sets for constants with multiple valid values where possible.

  • Fall back to arrays for multi-value constants with variable length.

  • Logic for each state must be self-contained. States must not be aware of the next state's expected values. The exception to this is junction states.

  • Use junction states where there are multiple valid next states.

  • Use entry states with fallthrough (i.e. no break) during explicit state change to reduce redundant state reinitialization, such as to assign keyword* variables.
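
The last three principles can be sketched with a toy state machine. Everything here is an assumption made for illustration: the state names, the parsePairs() function, and the name="value" input format are hypothetical and are not the library's real states or API. The sketch relies on switch fallthrough, so it will not compile if the TypeScript option noFallthroughCasesInSwitch is enabled.

```typescript
// Hypothetical states for a toy parser of name="value" pairs.
const BETWEEN_JUNCTION = 0; // junction: more whitespace, or the start of a name
const ENTER_NAME = 1; // entry state: resets `name`, then falls through
const NAME = 2;
const QUOTE_JUNCTION = 3; // junction: either quote character is valid
const ENTER_VALUE = 4; // entry state: resets `value`, then falls through
const VALUE = 5;

function parsePairs(input: string): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  let state: number = BETWEEN_JUNCTION;
  let name = "";
  let value = "";
  let quote = "";

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    switch (state) {
      case BETWEEN_JUNCTION:
        if (ch === " ") break; // stay in the junction
        // Any other character starts a name: fall through to the entry state.
      case ENTER_NAME:
        name = ""; // one-time initialisation for the NAME state
        state = NAME;
        // No break: the current character is handled by NAME below.
      case NAME:
        if (ch === "=") state = QUOTE_JUNCTION;
        else name += ch;
        break;
      case QUOTE_JUNCTION:
        if (ch !== '"' && ch !== "'") {
          throw new Error(`Expected a quote at index ${i}`);
        }
        quote = ch; // remember which quote must close the value
        state = ENTER_VALUE; // explicit state change to an entry state
        break;
      case ENTER_VALUE:
        value = ""; // one-time initialisation for the VALUE state
        state = VALUE;
        // No break: the current character is handled by VALUE below.
      case VALUE:
        if (ch === quote) {
          pairs.push([name, value]);
          state = BETWEEN_JUNCTION;
        } else {
          value += ch;
        }
        break;
    }
  }
  return pairs;
}
```

NAME and VALUE only know their own terminating character; the decision about what may come next is left to the two junction states, and the per-state reset happens once in the entry states rather than on every character.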

License

The source code is dual-licensed. Hence, you may modify and re-release it under either or both licenses.

EPL-2.0 OR GPL-2.0-or-later

Config files (e.g. .gitignore) are licensed under FSFAP.

Check the individual files to verify the license.