Diffstat (limited to 'docs/JOURNAL_EXPORT_FORMATS.md')
-rw-r--r-- docs/JOURNAL_EXPORT_FORMATS.md 36
1 files changed, 28 insertions, 8 deletions
diff --git a/docs/JOURNAL_EXPORT_FORMATS.md b/docs/JOURNAL_EXPORT_FORMATS.md
index e1eb0d36d1..83336784b1 100644
--- a/docs/JOURNAL_EXPORT_FORMATS.md
+++ b/docs/JOURNAL_EXPORT_FORMATS.md
@@ -15,12 +15,24 @@ The binary format on disk is documented as the [Journal File Format](JOURNAL_FIL
_Before reading on, please make sure you are aware of the [basic properties of journal entries](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html). In particular, realize that they may include binary non-text data (though usually don't), and that the same field might have multiple values assigned within the same entry (though usually doesn't)._
-When exporting journal data for other uses or transferring it via the network/local IPC the _journal export format_ is used. It's a simple serialization of journal entries, that is easy to read without any special tools, but still binary safe where necessary. The format is like this:
+When exporting journal data for other uses or transferring it via the network/local IPC, the _journal export format_ is used.
+It's a simple serialization of journal entries that is easy to read without any special tools, but still binary safe where necessary.
+The format is like this:
* Two journal entries that follow each other are separated by a double newline.
-* Journal fields consisting only of valid non-control UTF-8 codepoints are serialized as they are (i.e. the field name, followed by '=', followed by field data), followed by a newline as separator to the next field. Note that fields containing newlines cannot be formatted like this. Non-control UTF-8 codepoints are the codepoints with value at or above 32 (' '), or equal to 9 (TAB).
-* Other journal fields are serialized in a special binary safe way: field name, followed by newline, followed by a binary 64-bit little endian size value, followed by the binary field data, followed by a newline as separator to the next field.
-* Entry metadata that is not actually a field is serialized like it was a field, but beginning with two underscores. More specifically, `__CURSOR=`, `__REALTIME_TIMESTAMP=`, `__MONOTONIC_TIMESTAMP=`, `__SEQNUM=`, `__SEQNUM_ID` are introduced this way. Note that these meta-fields are only generated when actual journal files are serialized. They are omitted for entries that do not originate from a journal file (for example because they are transferred for the first time to be stored in one). Or in other words: if you are generating this format you shouldn't care about these special double-underscore fields. But you might find them usable when you deserialize the format generated by us. Additional fields prefixed with two underscores might be added later on, your parser should skip over the fields it does not know.
+* Journal fields consisting only of valid non-control UTF-8 codepoints are serialized as they are
+ (i.e. the field name, followed by '=', followed by field data), followed by a newline as separator to the next field.
+ Note that fields containing newlines cannot be formatted like this.
+ Non-control UTF-8 codepoints are the codepoints with value at or above 32 (' '), or equal to 9 (TAB).
+* Other journal fields are serialized in a special binary safe way:
+ field name, followed by newline, followed by a binary 64-bit little endian size value, followed by the binary field data, followed by a newline as separator to the next field.
+* Entry metadata that is not actually a field is serialized as if it were a field, but beginning with two underscores.
+ More specifically, `__CURSOR=`, `__REALTIME_TIMESTAMP=`, `__MONOTONIC_TIMESTAMP=`, `__SEQNUM=`, `__SEQNUM_ID=` are introduced this way.
+ Note that these meta-fields are only generated when actual journal files are serialized.
+ They are omitted for entries that do not originate from a journal file (for example because they are transferred for the first time to be stored in one).
+ Or in other words: if you are generating this format you shouldn't care about these special double-underscore fields.
+ But you might find them useful when you deserialize the format generated by us.
+ Additional fields prefixed with two underscores might be added later on; your parser should skip over the fields it does not know.
* The order in which fields appear in an entry is undefined and might be different for each entry that is serialized.
And that's already it.
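
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names (`serialize_entry`, `parse_entries`) are hypothetical, entries are modelled as lists of `(name, value)` byte pairs, and the parser assumes well-formed, complete input.

```python
import struct


def _is_plain_text(value: bytes) -> bool:
    """True if value is valid UTF-8 made only of codepoints >= 32 or TAB (9)."""
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ord(ch) >= 32 or ch == "\t" for ch in text)


def serialize_entry(fields):
    """Serialize one entry given as a list of (name, value) byte pairs."""
    out = bytearray()
    for name, value in fields:
        if _is_plain_text(value):
            # Text form: NAME=value, terminated by a newline.
            out += name + b"=" + value + b"\n"
        else:
            # Binary-safe form: NAME, newline, 64-bit little-endian size,
            # raw data, newline.
            out += name + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
    out += b"\n"  # empty line, so consecutive entries are separated by a double newline
    return bytes(out)


def parse_entries(data):
    """Yield each entry in data as a list of (name, value) byte pairs."""
    pos, fields = 0, []
    while pos < len(data):
        eol = data.index(b"\n", pos)
        line = data[pos:eol]
        if not line:
            # Empty line: the current entry is complete.
            if fields:
                yield fields
            fields = []
            pos = eol + 1
            continue
        name, sep, value = line.partition(b"=")
        if sep:
            # Text form; the value may itself contain further '=' characters.
            fields.append((name, value))
            pos = eol + 1
        else:
            # Binary-safe form: the line is the bare field name, and the size
            # is read by offset so that newlines in size or data are harmless.
            (size,) = struct.unpack_from("<Q", data, eol + 1)
            start = eol + 1 + 8
            fields.append((line, data[start:start + size]))
            pos = start + size + 1  # skip the newline that follows the data
    if fields:
        yield fields
```

Fields are kept as a list rather than a dictionary because the same field name may occur more than once in an entry. Unknown double-underscore fields need no special handling here: they are returned like any other field, and the caller can simply ignore them.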
@@ -130,10 +142,18 @@ _Before reading on, please make sure you are aware of the [basic properties of j
In most cases the Journal JSON serialization is the obvious mapping of the entry field names (as JSON strings) to the entry field values (also as JSON strings) encapsulated in one JSON object. However, there are a few special cases to handle:
-* A field that contains non-printable or non-UTF8 is serialized as a number array instead. This is necessary to handle binary data in a safe way without losing data, since JSON cannot embed binary data natively. Each byte of the binary field will be mapped to its numeric value in the range 0…255.
-* The JSON serializer can optionally skip huge (as in larger than a specific threshold) data fields from the JSON object. If that is enabled and a data field is too large, the field name is still included in the JSON object but assigned _null_.
-* Within the same entry, Journal fields may have multiple values assigned. This is not allowed in JSON. The serializer will hence create a single JSON field only for these cases, and assign it an array of values (which the can be strings, _null_ or number arrays, see above).
-* If the JSON data originates from a journal file it may include the special addressing fields `__CURSOR`, `__REALTIME_TIMESTAMP`, `__MONOTONIC_TIMESTAMP`, `__SEQNUM`, `__SEQNUM_ID`, which contain the cursor string of this entry as string, the realtime/monotonic timestamps of this entry as formatted numeric string of usec since the respective epoch, and the sequence number and associated sequence number ID, both formatted as strings.
+* A field that contains non-printable or non-UTF-8 data is serialized as a number array instead.
+ This is necessary to handle binary data in a safe way without losing data, since JSON cannot embed binary data natively.
+ Each byte of the binary field will be mapped to its numeric value in the range 0…255.
+* The JSON serializer can optionally skip huge (as in larger than a specific threshold) data fields from the JSON object.
+ If that is enabled and a data field is too large, the field name is still included in the JSON object but assigned _null_.
+* Within the same entry, Journal fields may have multiple values assigned. This is not allowed in JSON.
+ The serializer will hence create only a single JSON field for these cases, and assign it an array of values
+ (which can then be strings, _null_ or number arrays, see above).
+* If the JSON data originates from a journal file it may include the special addressing fields
+ `__CURSOR`, `__REALTIME_TIMESTAMP`, `__MONOTONIC_TIMESTAMP`, `__SEQNUM`, `__SEQNUM_ID`, which contain the cursor string of this entry as string,
+ the realtime/monotonic timestamps of this entry as formatted numeric string of usec since the respective epoch,
+ and the sequence number and associated sequence number ID, both formatted as strings.
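
As a rough sketch of how these special cases combine, the following Python maps a list of `(name, value)` byte pairs to a JSON-ready object. The names `json_value` and `entry_to_json` and the `max_size` parameter are hypothetical, and the exact definition of "non-printable" used here (control characters below 32 other than TAB and newline) is an assumption made for illustration.

```python
import json


def json_value(value, max_size=None):
    """Map one field value (bytes) to a string, None, or a list of byte values."""
    if max_size is not None and len(value) > max_size:
        return None  # oversized field: the name is kept, the value becomes null
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        return list(value)  # not UTF-8: one number in 0..255 per byte
    # Assumption: control characters other than TAB and newline are non-printable.
    if any(ord(ch) < 32 and ch not in "\t\n" for ch in text):
        return list(value)
    return text


def entry_to_json(fields, max_size=None):
    """Build a JSON-ready dict from a list of (name, value) byte pairs."""
    grouped = {}
    for name, value in fields:
        grouped.setdefault(name.decode("ascii"), []).append(json_value(value, max_size))
    # A field with a single value maps to that value directly; a field with
    # several values maps to an array of them.
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in grouped.items()}


def entry_to_json_text(fields, max_size=None):
    """Serialize the entry as one JSON object string."""
    return json.dumps(entry_to_json(fields, max_size))
```

Values are grouped by field name first and collapsed afterwards, so a single binary value (a number array) is never confused with a field that genuinely has several values.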
Here's an example, illustrating all cases mentioned above. Consider this entry: