jsontext

package standard library
go1.26.3
Published: May 7, 2026 License: BSD-3-Clause Imports: 15 Imported by: 172

Documentation

Overview

Package jsontext implements syntactic processing of JSON as specified in RFC 4627, RFC 7159, RFC 7493, RFC 8259, and RFC 8785. JSON is a simple data interchange format that can represent primitive data types such as booleans, strings, and numbers, in addition to structured data types such as objects and arrays.

This package (encoding/json/jsontext) is experimental, and not subject to the Go 1 compatibility promise. It only exists when building with the GOEXPERIMENT=jsonv2 environment variable set. Most users should use encoding/json.

The Encoder and Decoder types are used to encode or decode a stream of JSON tokens or values.

Tokens and Values

A JSON token refers to the basic structural elements of JSON:

  • a JSON literal (i.e., null, true, or false)
  • a JSON string (e.g., "hello, world!")
  • a JSON number (e.g., 123.456)
  • a begin or end delimiter for a JSON object (i.e., '{' or '}')
  • a begin or end delimiter for a JSON array (i.e., '[' or ']')

A JSON token is represented by the Token type in Go. Technically, there are two additional structural characters (i.e., ':' and ','), but there is no Token representation for them since their presence can be inferred by the structure of the JSON grammar itself. For example, there must always be an implicit colon between the name and value of a JSON object member.
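
The absence of ':' and ',' tokens can be observed with any tokenizing JSON reader. The following is a minimal sketch, not this package's API: it uses the stable encoding/json Decoder.Token method as a stand-in (jsontext only builds with GOEXPERIMENT=jsonv2), and the helper name tokenStrings is hypothetical. The token stream it collects carries delimiters, names, and values, but never a colon or comma.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// tokenStrings is a hypothetical helper that returns a printable form of
// every token that encoding/json reports for the input.
func tokenStrings(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var toks []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return toks, nil
		}
		if err != nil {
			return nil, err
		}
		toks = append(toks, fmt.Sprint(tok))
	}
}

func main() {
	toks, err := tokenStrings(`{"name":"value","array":[true,null]}`)
	if err != nil {
		panic(err)
	}
	// The colon and commas of the input never show up as tokens.
	fmt.Println(len(toks), toks)
}
```

The same reasoning applies to a jsontext Token: the position of a token within an object or array is enough to tell where a colon or comma must appear.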

A JSON value refers to a complete unit of JSON data:

  • a JSON literal, string, or number
  • a JSON object (e.g., `{"name":"value"}`)
  • a JSON array (e.g., `[1,2,3]`)

A JSON value is represented by the Value type in Go and is a []byte containing the raw textual representation of the value. There is some overlap between tokens and values as both contain literals, strings, and numbers. However, only a value can represent the entirety of a JSON object or array.

The Encoder and Decoder types contain methods to read or write the next Token or Value in a sequence. They maintain a state machine to validate whether the sequence of JSON tokens and/or values produces valid JSON. Options may be passed to the NewEncoder or NewDecoder constructors to configure the syntactic behavior of encoding and decoding.

Terminology

The terms "encode" and "decode" are used for syntactic functionality that is concerned with processing JSON based on its grammar, and the terms "marshal" and "unmarshal" are used for semantic functionality that determines the meaning of JSON values as Go values and vice-versa. This package (i.e., jsontext) deals with JSON at a syntactic layer, while encoding/json/v2 deals with JSON at a semantic layer. The goal is to provide a clear distinction between functionality that is purely concerned with encoding versus that of marshaling. For example, one can directly encode a stream of JSON tokens without needing to marshal a concrete Go value representing them. Similarly, one can decode a stream of JSON tokens without needing to unmarshal them into a concrete Go value.
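
The split between syntactic and semantic processing can be illustrated with the stable encoding/json package, used here only as a stand-in because jsontext requires GOEXPERIMENT=jsonv2. In this sketch, the hypothetical helper compactJSON rewrites JSON text without building any Go value (syntactic), while the hypothetical helper titleOf unmarshals the same text into a Go map and interprets one member (semantic).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compactJSON works purely on syntax: it rewrites the text of a JSON value
// without ever building a Go value from it.
func compactJSON(src []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, src); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// titleOf works on semantics: it unmarshals the JSON object into a Go map
// and interprets the "title" member as a Go string.
func titleOf(src []byte) (string, error) {
	var m map[string]any
	if err := json.Unmarshal(src, &m); err != nil {
		return "", err
	}
	title, _ := m["title"].(string)
	return title, nil
}

func main() {
	src := []byte(`{ "title" : "Go 1" , "year" : 2012 }`)

	compact, err := compactJSON(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("syntactic:", compact)

	title, err := titleOf(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("semantic:", title)
}
```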

This package uses JSON terminology when discussing JSON, which may differ from related concepts in Go or elsewhere in computing literature.

  • a JSON "object" refers to an unordered collection of name/value members.
  • a JSON "array" refers to an ordered sequence of elements.
  • a JSON "value" refers to either a literal (i.e., null, false, or true), string, number, object, or array.

See RFC 8259 for more information.

Specifications

Relevant specifications include RFC 4627, RFC 7159, RFC 7493, RFC 8259, and RFC 8785. Each RFC is generally a stricter subset of another RFC. In increasing order of strictness:

  • RFC 4627 and RFC 7159 do not require (but recommend) the use of UTF-8 and also do not require (but recommend) that object names be unique.
  • RFC 8259 requires the use of UTF-8, but does not require (but recommends) that object names be unique.
  • RFC 7493 requires the use of UTF-8 and also requires that object names be unique.
  • RFC 8785 defines a canonical representation. It requires the use of UTF-8 and also requires that object names be unique and in a specific ordering. It specifies exactly how strings and numbers must be formatted.

The primary difference between RFC 4627 and RFC 7159 is that the former restricted top-level values to only JSON objects and arrays, while RFC 7159 and subsequent RFCs permit top-level values to additionally be JSON nulls, booleans, strings, or numbers.

By default, this package operates on RFC 7493, but can be configured to operate according to the other RFC specifications. RFC 7493 is a stricter subset of RFC 8259 and fully compliant with it. In particular, it makes specific choices about behavior that RFC 8259 leaves as undefined in order to ensure greater interoperability.
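
One practical consequence of the RFC 7493 default is that duplicate object names are rejected. As a rough illustration of that check (an assumption-laden sketch, not this package's implementation), the hypothetical helper firstDuplicateName below scans the members of a single top-level object with the stable encoding/json token API, which by itself does not reject duplicates. Nested objects are skipped rather than checked.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// firstDuplicateName is a hypothetical helper. It reads one top-level JSON
// object and reports the first member name that appears more than once.
// Member values are skipped, so nested objects are not inspected.
func firstDuplicateName(input string) (string, bool, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	if _, err := dec.Token(); err != nil { // consume the opening '{'
		return "", false, err
	}
	seen := make(map[string]bool)
	for dec.More() {
		tok, err := dec.Token() // the next member name
		if err != nil {
			return "", false, err
		}
		name, ok := tok.(string)
		if !ok {
			return "", false, fmt.Errorf("expected a member name, got %v", tok)
		}
		if seen[name] {
			return name, true, nil
		}
		seen[name] = true
		var value json.RawMessage // skip over the member value
		if err := dec.Decode(&value); err != nil {
			return "", false, err
		}
	}
	return "", false, nil
}

func main() {
	name, dup, err := firstDuplicateName(`{"a":1,"b":2,"a":3}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, dup)
}
```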

Security Considerations

See the "Security Considerations" section in encoding/json/v2.

Example (StringReplace)

This example demonstrates the use of the Encoder and Decoder to parse and modify JSON without unmarshaling it into a concrete Go type.

package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"

	"encoding/json/jsontext"
)

func main() {
	// Example input with non-idiomatic use of "Golang" instead of "Go".
	const input = `{
		"title": "Golang version 1 is released",
		"author": "Andrew Gerrand",
		"date": "2012-03-28",
		"text": "Today marks a major milestone in the development of the Golang programming language.",
		"otherArticles": [
			"Twelve Years of Golang",
			"The Laws of Reflection",
			"Learn Golang from your browser"
		]
	}`

	// Using a Decoder and Encoder, we can parse through every token,
	// check and modify the token if necessary, and
	// write the token to the output.
	var replacements []jsontext.Pointer
	in := strings.NewReader(input)
	dec := jsontext.NewDecoder(in)
	out := new(bytes.Buffer)
	enc := jsontext.NewEncoder(out, jsontext.Multiline(true)) // expand for readability
	for {
		// Read a token from the input.
		tok, err := dec.ReadToken()
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}

		// Check whether the token contains the string "Golang" and
		// replace each occurrence with "Go" instead.
		if tok.Kind() == '"' && strings.Contains(tok.String(), "Golang") {
			replacements = append(replacements, dec.StackPointer())
			tok = jsontext.String(strings.ReplaceAll(tok.String(), "Golang", "Go"))
		}

		// Write the (possibly modified) token to the output.
		if err := enc.WriteToken(tok); err != nil {
			log.Fatal(err)
		}
	}

	// Print the list of replacements and the adjusted JSON output.
	if len(replacements) > 0 {
		fmt.Println(`Replaced "Golang" with "Go" in:`)
		for _, where := range replacements {
			fmt.Println("\t" + where)
		}
		fmt.Println()
	}
	fmt.Println("Result:", out.String())

}
Output:
Replaced "Golang" with "Go" in:
	/title
	/text
	/otherArticles/0
	/otherArticles/2

Result: {
	"title": "Go version 1 is released",
	"author": "Andrew Gerrand",
	"date": "2012-03-28",
	"text": "Today marks a major milestone in the development of the Go programming language.",
	"otherArticles": [
		"Twelve Years of Go",
		"The Laws of Reflection",
		"Learn Go from your browser"
	]
}

Constants

This section is empty.

Variables

var ErrDuplicateName = errors.New("duplicate object member name")

ErrDuplicateName indicates that a JSON token could not be encoded or decoded because it results in a duplicate JSON object name. This error is directly wrapped within a SyntacticError when produced.

The name of a duplicate JSON object member can be extracted as:

err := ...
serr, ok := errors.AsType[*jsontext.SyntacticError](err)
if ok && serr.Err == jsontext.ErrDuplicateName {
	ptr := serr.JSONPointer // JSON pointer to duplicate name
	name := ptr.LastToken() // duplicate name itself
	...
}

This error is only returned if AllowDuplicateNames is false.

var ErrNonStringName = errors.New("object member name must be a string")

ErrNonStringName indicates that a JSON token could not be encoded or decoded because it is not a string, as required for JSON object names according to RFC 8259, section 4. This error is directly wrapped within a SyntacticError when produced.

var Internal exporter

Internal is for internal use only. This is exempt from the Go compatibility agreement.

Functions

func AppendFormat

func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)

AppendFormat formats the JSON value in src and appends it to dst according to the specified options. See Value.Format for more details about the formatting behavior.

The dst and src may overlap. If an error is reported, then the entirety of src is appended to dst.

func AppendQuote

func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

AppendQuote appends a double-quoted JSON string literal representing src to dst and returns the extended buffer. It uses the minimal string representation per RFC 8785, section 3.2.2.2. Invalid UTF-8 bytes are replaced with the Unicode replacement character and an error is returned at the end indicating the presence of invalid UTF-8. The dst must not overlap with the src.

func AppendUnquote

func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)

AppendUnquote appends the decoded interpretation of src as a double-quoted JSON string literal to dst and returns the extended buffer. The input src must be a JSON string without any surrounding whitespace. Invalid UTF-8 bytes are replaced with the Unicode replacement character and an error is returned at the end indicating the presence of invalid UTF-8. Any trailing bytes after the JSON string literal result in an error. The dst must not overlap with the src.

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is a streaming decoder for raw JSON tokens and values. It is used to read a stream of top-level JSON values, each separated by optional whitespace characters.

Decoder.ReadToken and Decoder.ReadValue calls may be interleaved. For example, the following JSON value:

{"name":"value","array":[null,false,true,3.14159],"object":{"k":"v"}}

can be parsed with the following calls (ignoring errors for brevity):

d.ReadToken() // {
d.ReadToken() // "name"
d.ReadToken() // "value"
d.ReadValue() // "array"
d.ReadToken() // [
d.ReadToken() // null
d.ReadToken() // false
d.ReadValue() // true
d.ReadToken() // 3.14159
d.ReadToken() // ]
d.ReadValue() // "object"
d.ReadValue() // {"k":"v"}
d.ReadToken() // }

The above is one of many possible sequences of calls and may not represent the most sensible method to call for any given token/value. For example, it is probably more common to call Decoder.ReadToken to obtain a string token for object names.
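
The idea of a stream of top-level JSON values separated by optional whitespace can be sketched with the stable encoding/json Decoder, used as a stand-in here since jsontext requires GOEXPERIMENT=jsonv2. The hypothetical helper readStream collects each top-level value as raw text, which is roughly the role that Decoder.ReadValue plays in this package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readStream is a hypothetical helper that reads every top-level JSON value
// from the input and returns the raw text of each one.
func readStream(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var values []string
	for {
		var raw json.RawMessage
		err := dec.Decode(&raw)
		if err == io.EOF {
			return values, nil // no more top-level values
		}
		if err != nil {
			return nil, err
		}
		values = append(values, string(raw))
	}
}

func main() {
	// Three top-level values separated by a space and a newline.
	values, err := readStream("{\"a\":1} [2]\n\"three\"")
	if err != nil {
		panic(err)
	}
	for i, v := range values {
		fmt.Println(i, v)
	}
}
```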

func NewDecoder

func NewDecoder(r io.Reader, opts ...Options) *Decoder

NewDecoder constructs a new streaming decoder reading from r.

If r is a bytes.Buffer, then the decoder parses directly from the buffer without first copying the contents to an intermediate buffer. Additional writes to the buffer must not occur while the decoder is in use.

func (*Decoder) InputOffset

func (d *Decoder) InputOffset() int64

InputOffset returns the current input byte offset. It gives the location of the next byte immediately after the most recently returned token or value. The number of bytes actually read from the underlying io.Reader may be more than this offset due to internal buffering effects.

func (*Decoder) Options

func (d *Decoder) Options() Options

Options returns the options used to construct the decoder and may additionally contain semantic options passed to an encoding/json/v2.UnmarshalDecode call.

If operating within an encoding/json/v2.UnmarshalerFrom.UnmarshalJSONFrom method call or an encoding/json/v2.UnmarshalFromFunc function call, then the returned options are only valid within the call.

func (*Decoder) PeekKind

func (d *Decoder) PeekKind() Kind

PeekKind retrieves the next token kind, but does not advance the read offset.

It returns KindInvalid if an error occurs. Any such error is cached until the next read call and it is the caller's responsibility to eventually follow up a PeekKind call with a read call.

func (*Decoder) ReadToken

func (d *Decoder) ReadToken() (Token, error)

ReadToken reads the next Token, advancing the read offset. The returned token is only valid until the next Peek, Read, or Skip call. It returns io.EOF if there are no more tokens.

func (*Decoder) ReadValue

func (d *Decoder) ReadValue() (Value, error)

ReadValue returns the next raw JSON value, advancing the read offset. The value is stripped of any leading or trailing whitespace and contains the exact bytes of the input, which may contain invalid UTF-8 if AllowInvalidUTF8 is specified.

The returned value is only valid until the next Peek, Read, or Skip call and may not be mutated while the Decoder remains in use. If the decoder is currently at the end token for an object or array, then it reports a SyntacticError and the internal state remains unchanged. It returns io.EOF if there are no more values.

func (*Decoder) Reset

func (d *Decoder) Reset(r io.Reader, opts ...Options)

Reset resets a decoder such that it is reading afresh from r and configured with the provided options. Reset must not be called on a Decoder passed to the encoding/json/v2.UnmarshalerFrom.UnmarshalJSONFrom method or the encoding/json/v2.UnmarshalFromFunc function.

func (*Decoder) SkipValue

func (d *Decoder) SkipValue() error

SkipValue is semantically equivalent to calling Decoder.ReadValue and discarding the result except that memory is not wasted trying to hold the entire result.

func (*Decoder) StackDepth

func (d *Decoder) StackDepth() int

StackDepth returns the depth of the state machine for read JSON data. Each level on the stack represents a nested JSON object or array. It is incremented whenever a BeginObject or BeginArray token is encountered and decremented whenever an EndObject or EndArray token is encountered. The depth is zero-indexed, where zero represents the top-level JSON value.

func (*Decoder) StackIndex

func (d *Decoder) StackIndex(i int) (Kind, int64)

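StackIndex returns information about the specified stack level. It must be a number between 0 and Decoder.StackDepth, inclusive. For each level, it reports the kind:

  • 0 for a level of zero,
  • '{' for a level representing a JSON object, and
  • '[' for a level representing a JSON array.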

It also reports the length of that JSON object or array. Each name and value in a JSON object is counted separately, so the effective number of members would be half the length. A complete JSON object must have an even length.
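
The bookkeeping behind StackDepth and StackIndex can be approximated in a few lines. The following is a sketch of the idea only, not this package's state machine: it reads tokens with the stable encoding/json Decoder.Token method (jsontext requires GOEXPERIMENT=jsonv2), and the level type and closedLengths helper are hypothetical names. Each name and value increments the length of the enclosing level, so an object with two members closes with a length of 4.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// level holds the two facts tracked per stack level in this sketch:
// the kind of container and how many names/values it has seen so far.
type level struct {
	kind   byte  // 0 for the top level, '{' for an object, '[' for an array
	length int64 // object names and values are counted separately
}

// closedLengths is a hypothetical helper that tracks a stack of levels while
// reading tokens and records "<kind>:<length>" whenever a container closes.
func closedLengths(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	stack := []level{{kind: 0}}
	var closed []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return closed, nil
		}
		if err != nil {
			return nil, err
		}
		delim, isDelim := tok.(json.Delim)
		if isDelim && (delim == '}' || delim == ']') {
			top := stack[len(stack)-1]
			closed = append(closed, fmt.Sprintf("%c:%d", top.kind, top.length))
			stack = stack[:len(stack)-1] // pop: the depth decreases
			continue
		}
		stack[len(stack)-1].length++ // every name or value counts once
		if isDelim {
			stack = append(stack, level{kind: byte(delim)}) // push: the depth increases
		}
	}
}

func main() {
	closed, err := closedLengths(`{"name":"value","array":[null,false,true]}`)
	if err != nil {
		panic(err)
	}
	// The array closes with length 3; the object closes with length 4,
	// which corresponds to 2 members.
	fmt.Println(closed)
}
```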

func (*Decoder) StackPointer

func (d *Decoder) StackPointer() Pointer

StackPointer returns a JSON Pointer (RFC 6901) to the most recently read value.
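
For readers unfamiliar with RFC 6901, a JSON Pointer is a sequence of reference tokens, each prefixed by '/', where '~' is escaped as "~0" and '/' is escaped as "~1". The sketch below shows that construction using only the standard strings package. The helper name pointerTo is hypothetical and is not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// pointerTo is a hypothetical helper that builds an RFC 6901 JSON Pointer
// from a path of object names and array indices given as strings.
func pointerTo(path ...string) string {
	// RFC 6901 escapes '~' as "~0" and '/' as "~1". A strings.Replacer makes
	// a single pass over the input, so escaped output is never escaped again.
	esc := strings.NewReplacer("~", "~0", "/", "~1")
	var sb strings.Builder
	for _, tok := range path {
		sb.WriteByte('/')
		sb.WriteString(esc.Replace(tok))
	}
	return sb.String()
}

func main() {
	fmt.Println(pointerTo("otherArticles", "2")) // /otherArticles/2
	fmt.Println(pointerTo("a/b", "m~n"))         // /a~1b/m~0n
}
```

An empty path yields the empty string, which RFC 6901 defines as a reference to the whole document.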

func (*Decoder) UnreadBuffer

func (d *Decoder) UnreadBuffer() []byte

UnreadBuffer returns the data remaining in the unread buffer, which may contain zero or more bytes. The returned buffer must not be mutated while Decoder continues to be used. The buffer contents are valid until the next Peek, Read, or Skip call.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder is a streaming encoder from raw JSON tokens and values. It is used to write a stream of top-level JSON values, each terminated with a newline character.

Encoder.WriteToken and Encoder.WriteValue calls may be interleaved. For example, the following JSON value:

{"name":"value","array":[null,false,true,3.14159],"object":{"k":"v"}}

can be composed with the following calls (ignoring errors for brevity):

e.WriteToken(BeginObject)        // {
e.WriteToken(String("name"))     // "name"
e.WriteToken(String("value"))    // "value"
e.WriteValue(Value(`"array"`))   // "array"
e.WriteToken(BeginArray)         // [
e.WriteToken(Null)               // null
e.WriteToken(False)              // false
e.WriteValue(Value("true"))      // true
e.WriteToken(Float(3.14159))     // 3.14159
e.WriteToken(EndArray)           // ]
e.WriteValue(Value(`"object"`))  // "object"
e.WriteValue(Value(`{"k":"v"}`)) // {"k":"v"}
e.WriteToken(EndObject)          // }

The above is one of many possible sequences of calls and may not represent the most sensible method to call for any given token/value. For example, it is probably more common to call Encoder.WriteToken with a string for object names.

func NewEncoder

func NewEncoder(w io.Writer, opts ...Options) *Encoder

NewEncoder constructs a new streaming encoder writing to w configured with the provided options. It flushes the internal buffer when the buffer is sufficiently full or when a top-level value has been written.

If w is a bytes.Buffer, then the encoder appends directly into the buffer without copying the contents from an intermediate buffer.

func (*Encoder) AvailableBuffer

func (e *Encoder) AvailableBuffer() []byte

AvailableBuffer returns a zero-length buffer with a possible non-zero capacity. This buffer is intended to be used to populate a Value being passed to an immediately succeeding Encoder.WriteValue call.

Example usage:

b := e.AvailableBuffer()
b = append(b, '"')
b = appendString(b, v) // append the string formatting of v
b = append(b, '"')
... := e.WriteValue(b)

It is the user's responsibility to ensure that the value is valid JSON.

func (*Encoder) Options

func (e *Encoder) Options() Options

Options returns the options used to construct the encoder and may additionally contain semantic options passed to an encoding/json/v2.MarshalEncode call.

If operating within an encoding/json/v2.MarshalerTo.MarshalJSONTo method call or an encoding/json/v2.MarshalToFunc function call, then the returned options are only valid within the call.

func (*Encoder) OutputOffset

func (e *Encoder) OutputOffset() int64

OutputOffset returns the current output byte offset. It gives the location of the next byte immediately after the most recently written token or value. The number of bytes actually written to the underlying io.Writer may be less than this offset due to internal buffering effects.

func (*Encoder) Reset

func (e *Encoder) Reset(w io.Writer, opts ...Options)

Reset resets an encoder such that it is writing afresh to w and configured with the provided options. Reset must not be called on an Encoder passed to the encoding/json/v2.MarshalerTo.MarshalJSONTo method or the encoding/json/v2.MarshalToFunc function.

func (*Encoder) StackDepth

func (e *Encoder) StackDepth() int

StackDepth returns the depth of the state machine for written JSON data. Each level on the stack represents a nested JSON object or array. It is incremented whenever a BeginObject or BeginArray token is encountered and decremented whenever an EndObject or EndArray token is encountered. The depth is zero-indexed, where zero represents the top-level JSON value.

func (*Encoder) StackIndex

func (e *Encoder) StackIndex(i int) (Kind, int64)

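StackIndex returns information about the specified stack level. It must be a number between 0 and Encoder.StackDepth, inclusive. For each level, it reports the kind:

  • 0 for a level of zero,
  • '{' for a level representing a JSON object, and
  • '[' for a level representing a JSON array.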

It also reports the length of that JSON object or array. Each name and value in a JSON object is counted separately, so the effective number of members would be half the length. A complete JSON object must have an even length.

func (*Encoder) StackPointer

func (e *Encoder) StackPointer() Pointer

StackPointer returns a JSON Pointer (RFC 6901) to the most recently written value.

func (*Encoder) WriteToken

func (e *Encoder) WriteToken(t Token) error

WriteToken writes the next token and advances the internal write offset.

The provided token kind must be consistent with the JSON grammar. For example, it is an error to provide a number when the encoder is expecting an object name (which is always a string), or to provide an end object delimiter when the encoder is finishing an array. If the provided token is invalid, then it reports a SyntacticError and the internal state remains unchanged. The offset reported in SyntacticError will be relative to the Encoder.OutputOffset.

func (*Encoder) WriteValue

func (e *Encoder) WriteValue(v Value) error