| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's ensure that entry seqnums remain stable and monotonic across the
entire runtime of the system, even if local storage is turned off. Let's
do this by maintainer a counter file in /run/ which we mmap() and
wherein we maintain the counter from early-boot on till late shutdown.
This takes inspiration of the kernel-seqnum file we already maintain
like that that tracks which kmsg messages we already processed. In fact,
we reuse the same code for maintaining it.
This should allow the behaviour entry seqnums to be more predictable, in
particular when journal local storage is turned off. Previously, we'd
maintain the seqnum simply by always bumping it to the maximum of the
last written entry seqnum plus one, and the biggest seqnum so far
written to the journal file on disk. If we'd never write a file on disk,
or if no journal file was existing during the initrd→seqnum transition
we'd completely lose the current seqnum position during daemon restarts
(such as the one happening during the switch-root operation).
This also will cause a journal file rotation whenever we try to write to
a journal file with multiple sequence number IDs, so that we know that
from early boot trhough the entire runtime we'll have stable sequence
numbers that do not jump, and thus can be used to determine "lost"
messages.
|
|
|
|
|
| |
Instead of maintaining two different constants, move the constant
to journal-internal.h and share it between files.
|
|
|
|
|
|
| |
The compiler should recognize that these are constant expressions, but
let's better make this explicit, so that the linker can safely share the
initializations all over the place.
|
|
|
|
|
|
| |
systemd-journald is prone to spamming logs if the system gets into
a messy state. Let's improve the situation by ratelimiting logs on
the hot code paths to 3 times per minute.
|
|
|
|
|
| |
The _INITRD field is a boolean field (0 or 1) that specifies whether
a message was processed by systemd-journald in the initrd or not.
|
|
|
|
| |
to void type
|
|
|
|
|
| |
JournalFile and JournaldFile are hard to distinguish from each other.
Let's use ManagedJournalFile instead to make the distinction more clear.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, all the logic related to writing journal files lives in
journal-file.c which is part of libsystemd (sd-journal). Because it's
part of libsystemd, we can't depend on any code from src/shared.
To allow using code from src/shared when writing journal files, let's
gradually move the write related logic from journal-file.c to
journald-file.c in src/journal. This directory is not part of libsystemd
and as such can use code from src/shared.
We can safely remove any journal write related logic from libsystemd as
it's not used by any public APIs in libsystemd.
This commit introduces the new file along with the JournaldFile struct
which wraps an instance of JournalFile. The goal is to gradually move
more functions from journal-file.c and fields from JournalFile to
journald-file.c and JournaldFile respectively.
This commit also modifies all call sites that write journal files to
use JournaldFile instead of JournalFile. All sd-journal tests that
write journal files are moved to src/journal so they can make use of
journald-file.c.
Because the deferred closes logic is only used by journald, we move it
out of journal-file.c as well. In journal_file_open(), we would wait for
any remaining deferred closes for the file we're about to open to complete
before continuing if the file was not newly created. In journald_file_open(),
we call this logic unconditionally since it stands that if a file is newly
created, it can't have any outstanding deferred closes.
No changes in behavior are introduced aside from the earlier execution
of waiting for any deferred closes to complete when opening a new journal
file.
|
|\
| |
| | |
journal: Limit the number of audit fields we add to a message
|
| |
| |
| |
| |
| |
| |
| | |
Similar to the kmsg handler, let's also limit the number of fields
we parse from audit messages.
Fixes #19799
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we discarded any kmsg messages coming from journald
itself to avoid infinite loops where potentially the processing
of a kmsg message causes journald to log one or more messages to
kmsg which then get read again by the kmsg handler, ...
However, if we completely disable logging whenever we're processing
a kmsg message coming from journald itself, we also prevent any
infinite loops as we can be sure that journald won't accidentally
generate logging messages while processing a kmsg log message.
This change allows us to store all journald logs generated during
the processing of log messages from other services in the system
journal. Previously these could only be found in kmsg which has
low retention, can't be queried using journalctl and whose logs
don't survive reboots.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We recently started making more use of malloc_usable_size() and rely on
it (see the string_erase() story). Given that we don't really support
sytems where malloc_usable_size() cannot be trusted beyond statistics
anyway, let's go fully in and rework GREEDY_REALLOC() on top of it:
instead of passing around and maintaining the currenly allocated size
everywhere, let's just derive it automatically from
malloc_usable_size().
I am mostly after this for the simplicity this brings. It also brings
minor efficiency improvements I guess, but things become so much nicer
to look at if we can avoid these allocation size variables everywhere.
Note that the malloc_usable_size() man page says relying on it wasn't
"good programming practice", but I think it does this for reasons that
don't apply here: the greedy realloc logic specifically doesn't rely on
the returned extra size, beyond the fact that it is equal or larger than
what was requested.
(This commit was supposed to be a quick patch btw, but apparently we use
the greedy realloc stuff quite a bit across the codebase, so this ends
up touching *a*lot* of code.)
|
|
|
|
|
|
|
|
|
| |
As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617.
This does not touch anything exposed in src/systemd. Changing the defines there
would be a compatibility break.
Note that tests are broken after this commit. They will be fixed in the next one.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
journald startup
Let's make it optional whether auditing is enabled at journald start-up
or not.
Note that this only controls whether audit is enabled/disabled in the
kernel. Either way we'll still collect the audit data if it is
generated, i.e. if some other tool enables it, we'll collect it.
Fixes: #959
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we do, we operate on a separate set of logs and runtime objects
The namespace is configured via argv[1].
Fixes: #12123
Fixes: #10230 #9519
(These latter two issues ask for slightly different stuff, but the
usecases generally can be solved by running separate instances of
journald now, hence also declaring that as "Fixes:")
|
| |
|
|
|
|
|
|
| |
"ratelimit" is a real word, so we don't need to use the other form anywhere.
We had both forms in various places, let's standarize on the shorter and more
correct one.
|
|
|
|
|
|
| |
This makes the operations previously available via asynchronous signals
also available as regular varlink method calls, i.e. with sane
completion.
|
|
|
|
|
| |
This means we need to include many more headers in various files that simply
included util.h before, but it seems cleaner to do it this way.
|
|
|
|
|
|
|
| |
In normal use, this allow us to drop dead entries from the cache and reduces
the cache size so that we don't evict entries unnecessarily. The time limit is
there mostly to serve as a guard against malicious logging from many different
PIDs.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
|
|
|
|
|
|
|
|
|
|
|
| |
This makes most header files easier to look at. Also Emacs gets really
slow when browsing through large sections of overly long prototypes,
which is much improved by this macro.
We should probably not do something similar with too many other cases,
as macros like this might help readability for some, but make it worse
for others. But I think given the complexity of this specific prototype
and how often we use it, it's worth doing.
|
|
|
|
|
|
|
|
|
|
| |
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
|
|
|
|
|
| |
Allow a user to set a number of bytes as Compress to use as the compression
threshold.
|
|
|
|
|
| |
Let's employ coccinelle to do this for us.
Follow-up for #7625.
|
|
|
|
|
|
|
|
|
|
| |
N_IOVEC_OBJECT_FIELDS is bumped 14 → 18 (see dispatch_message_real() and
count!)
N_IOVEC_PAYLOAD_FIELDS is bumped 15 → 16 (see
server_space_usage_message() and count!)
Also, add comments, to make clear what is what.
|
| |
|
|
|
|
|
| |
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PID 1 to journald
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089
Fixes: #3041
Fixes: #4441
|
|
|
|
|
|
|
|
|
|
|
| |
When we drop messages of a unit, we log about. Let's add some structured
data to that. Let's include how many messages we dropped, but more
importantly, let's link up the message we generate to the unit we
dropped the messages from by using the "OBJECT" logic, i.e. by
generating OBJECT_SYSTEMD_UNIT= fields and suchlike, that "journalctl
-u" and friends already look for.
Fixes: #6494
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
48K (#6838)
This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.
This also makes three related changes:
- When a NUL byte is read we'll not recognize this as alternative line
break, instead of silently dropping everything after it. (see #4863)
- The reason for a line-break is now encoded in the log record, if it
wasn't a plain newline. Specifically, we distuingish "nul",
"line-max" and "eof", for line breaks due to NUL byte, due to the
maximum line length as configured with LineMax= or due to end of
stream. This data is stored in the new implicit _LINE_BREAK= field.
It's not synthesized for plain \n line breaks.
- A randomized 128bit ID is assigned to each log stream.
With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: #4863
Replaces: #4875
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cache client metadata, in order to be improve runtime behaviour under
pressure.
This is inspired by @vcaputo's work, specifically:
https://github.com/systemd/systemd/pull/2280
That code implements related but different semantics.
For a longer explanation what this change implements please have a look
at the long source comment this patch adds to journald-context.c.
After this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.2783 s, 95.2 MB/s
real 0m11.283s
user 0m0.007s
sys 0m6.216s
Before this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 52.0788 s, 20.6 MB/s
real 0m52.099s
user 0m0.014s
sys 0m7.170s
As side effect, this corrects the journal's rate limiter feature: we now
always use the unit name as key for the ratelimiter.
|
|
|
| |
Closes #6022
|
|
|
|
| |
Fixes #5516.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This changes journald to not write to /var/log/journal until it received
SIGUSR1 for the first time, thus having been requested to flush the runtime
journal to disk.
This makes the journal work nicer with systems which have the root file system
writable early, but still need to rearrange /var before journald should start
writing and creating files to it, for example because ACLs need to be applied
first, or because /var is to be mounted from another file system, NFS or tmpfs
(as is the case for systemd.volatile=state).
Before this change we required setupts with /var split out to mount the root
disk read-only early on, and ship an /etc/fstab that remounted it writable only
after having placed /var at the right place. But even that was racy for various
preparations as journald might end up accessing the file system before it was
entirely set up, as soon as it was writable.
With this change we make scheduling when to start writing to /var/log/journal
explicit. This means persistent mode now requires
systemd-journal-flush.service in the mix to work, as otherwise journald would
never write to the directory.
See: #1397
|
|/
|
|
|
|
| |
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
|
|
|
|
|
|
|
|
|
|
| |
Updating min_use is rather an unusual operation that is limited when we first
open the journal files, therefore extracts it from determine_space_for() and
create a function of its own and call this new function when needed.
determine_space_for() is now dealing with storage space (cached) values only.
There should be no functional changes.
|
|
|
|
|
|
|
|
| |
The set of storage space values we cache are calculated according to a couple
of filesystem statistics (free blocks, block size).
This patch caches the vfs stats we're interested in so these values are
available later and coherent with the rest of the space cached values.
|
|
|
|
|
|
|
|
| |
This commit simply extracts from determine_space_for() the code which emits the
storage usage message and put it into a function of its own so it can be reused
by others paths later.
No functional changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This structure keeps track of specificities for a given journal type
(persistent or volatile) such as metrics, name, etc...
The cached space values are now moved in this structure so that each
journal has its own set of cached values.
Previously only one set existed and we didn't know if the cached
values were for the runtime journal or the persistent one.
When doing:
determine_space_for(s, runtime_metrics, ...);
determine_space_for(s, system_metrics, ...);
the second call returned the cached values for the runtime metrics.
|
|
|
|
|
|
|
|
|
|
| |
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
|
|
|
|
|
| |
We are already attaching the system slice information to log messages, now add
theuser slice info too, as well as the object slice info.
|