JSONReader
A simple JSON data loader with various options. It either parses the entire string, cleans it, and treats each line as an embedding, or performs a recursive depth-first traversal that yields JSON paths.
Usage
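import { JSONReader } from "llamaindex";

// Placeholder path and content; replace them with your own.
const file = "../../PATH/TO/FILE";
const content = new TextEncoder().encode("JSON_CONTENT");

// levelsBack: 0 includes all levels in the depth-first output.
const reader = new JSONReader({ levelsBack: 0, collapseLength: 100 });
// The loaders are awaited here on the assumption that they return promises.
const docsFromFile = await reader.loadData(file);
const docsFromContent = await reader.loadDataAsContent(content);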
Options
Basic:
- ensureAscii?: Whether to ensure only ASCII characters are present in the output by converting non-ASCII characters to their Unicode escape sequences. Default is false.
- isJsonLines?: Whether the JSON is in JSON Lines format. If true, the input is split into lines, empty lines are removed, and each line is parsed as JSON. Default is false.
- cleanJson?: Whether to clean the JSON by filtering out structural characters ({}, [], and ,). If set to false, the JSON is only parsed and structural characters are kept. Default is true.
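The three basic options can be illustrated with a few standalone helpers. This is a minimal sketch under stated assumptions, not the llamaindex implementation: the names escapeNonAscii, parseJsonLines, and cleanJsonText are hypothetical, and the cleaning step assumes the JSON is pretty-printed first and that lines made up only of structural characters are then dropped.

```typescript
// Hypothetical helpers illustrating the basic options; not the llamaindex source.

// ensureAscii: replace each non-ASCII UTF-16 code unit with a \uXXXX escape.
function escapeNonAscii(text: string): string {
  return text.replace(/[^\x00-\x7f]/g, (ch) =>
    "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4)
  );
}

// isJsonLines: split on newlines, drop empty lines, parse each remaining line.
function parseJsonLines(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// cleanJson (assumed behavior): pretty-print, trim each line, then drop
// lines that consist only of {}, [], and commas.
function cleanJsonText(data: unknown): string {
  return JSON.stringify(data, null, 2)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => !/^[{}\[\],]*$/.test(line))
    .join("\n");
}

console.log(escapeNonAscii("こんにちは世界"));
// \u3053\u3093\u306b\u3061\u306f\u4e16\u754c

console.log(parseJsonLines('{"tweet": "Hello world"}\n\n{"tweet": "こんにちは世界"}').length);
// 2

console.log(cleanJsonText({ a: { "1": { key1: "value1" } } }));
// "a": {
// "1": {
// "key1": "value1"
```

In this sketch, value lines keep any trailing comma they had in the pretty-printed form; only lines that are purely structural are removed.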
Depth-First-Traversal:
- levelsBack?: Specifies how many levels up the JSON structure to include in the output. When set, cleanJson is ignored. If set to 0, all levels are included. If undefined, the reader parses the entire JSON, treats each line as an embedding, and creates a document per top-level array. Default is undefined.
- collapseLength?: The maximum length of a JSON string representation to be collapsed into a single line. Only applicable when levelsBack is set. Default is undefined.
Examples
Input:
{"a": {"1": {"key1": "value1"}, "2": {"key2": "value2"}}, "b": {"3": {"k3": "v3"}, "4": {"k4": "v4"}}}
Default options:
levelsBack = undefined & cleanJson = true
Output:
"a": {
"1": {
"key1": "value1"
"2": {
"key2": "value2"
"b": {
"3": {
"k3": "v3"
"4": {
"k4": "v4"
Depth-First Traversal all levels:
levelsBack = 0
Output:
a 1 key1 value1
a 2 key2 value2
b 3 k3 v3
b 4 k4 v4
Depth-First Traversal and Collapse:
levelsBack = 0 & collapseLength = 35
Output:
a 1 {"key1":"value1"}
a 2 {"key2":"value2"}
b {"3":{"k3":"v3"},"4":{"k4":"v4"}}
Depth-First Traversal limited levels:
levelsBack = 2
Output:
1 key1 value1
2 key2 value2
3 k3 v3
4 k4 v4
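The three depth-first outputs above can be reproduced with a small standalone traversal. This is a minimal sketch, not the llamaindex implementation: depthFirstLines and the Json type are hypothetical names, and the sketch assumes that the collapse check compares the compact JSON.stringify length of a subtree against collapseLength and that array items do not add a path segment.

```typescript
// Hypothetical sketch of the depth-first traversal; not the llamaindex source.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function depthFirstLines(
  data: Json,
  levelsBack: number,
  collapseLength?: number,
  path: string[] = []
): string[] {
  // levelsBack = 0 keeps the full path; otherwise keep only the last N keys.
  const kept = levelsBack === 0 ? path : path.slice(-levelsBack);

  // Leaf value: emit the kept path followed by the value.
  if (typeof data !== "object" || data === null) {
    return [kept.concat(String(data)).join(" ")];
  }

  // Collapse a subtree whose compact JSON form is short enough.
  const compact = JSON.stringify(data);
  if (collapseLength !== undefined && compact.length <= collapseLength) {
    return [kept.concat(compact).join(" ")];
  }

  const lines: string[] = [];
  if (Array.isArray(data)) {
    // Assumption: array items are visited without extending the path.
    for (const item of data) {
      lines.push(...depthFirstLines(item, levelsBack, collapseLength, path));
    }
  } else {
    for (const key of Object.keys(data)) {
      lines.push(...depthFirstLines(data[key], levelsBack, collapseLength, path.concat(key)));
    }
  }
  return lines;
}

const sampleInput: Json = {
  a: { "1": { key1: "value1" }, "2": { key2: "value2" } },
  b: { "3": { k3: "v3" }, "4": { k4: "v4" } },
};

console.log(depthFirstLines(sampleInput, 0).join("\n"));
// a 1 key1 value1
// a 2 key2 value2
// b 3 k3 v3
// b 4 k4 v4

console.log(depthFirstLines(sampleInput, 0, 35).join("\n"));
// a 1 {"key1":"value1"}
// a 2 {"key2":"value2"}
// b {"3":{"k3":"v3"},"4":{"k4":"v4"}}
```

With collapseLength = 35, the subtree under "a" is 45 characters in compact form, so it is traversed further, while the subtree under "b" is 33 characters and is emitted as a single line.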
Uncleaned JSON:
levelsBack = undefined & cleanJson = false
Output:
{"a":{"1":{"key1":"value1"},"2":{"key2":"value2"}},"b":{"3":{"k3":"v3"},"4":{"k4":"v4"}}}
ASCII-Conversion:
Input:
{ "message": "こんにちは世界" }
Output:
"message": "\u3053\u3093\u306b\u3061\u306f\u4e16\u754c"
JSON Lines Format:
Input:
{"tweet": "Hello world"}\n{"tweet": "こんにちは世界"}
Output:
"tweet": "Hello world"
"tweet": "こんにちは世界"