After the split-off and the addition of jobs, the comment was a bit
outdated and out of place, but still useful enough to keep, so reword
it and move it to a more relevant place.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This spawns some jobs, each of which waits for messages from the main
process. A message is either a number followed by that many packages
to handle (a batch), or a command to shut down when there are no more
packages left to process. In the other direction, a job can send a
message to the main process saying that it is done with the batch and
is ready for the next one. The main process prints any other message
on the terminal.
After the packages are processed, the main process collects the job
reports and merges them into the main one.
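The protocol above can be sketched as follows, under assumptions: messages are single lines on the job's standard input, a batch is a line with a count followed by that many package names, `shutdown` stops the job, and `done` is the reply meaning "ready for the next batch". The message strings and the functions `handle_package`, `job_loop` and `merge_reports` are hypothetical, and the `printf` pipeline merely stands in for the main process.

```shell
#!/bin/bash
# Hypothetical per-package work: record the package in this job's report.
handle_package() {
    local report=${1} pkg=${2}
    printf 'handled %s\n' "${pkg}" >>"${report}"
}

# Worker side: wait for messages on stdin, reply on stdout.
job_loop() {
    local report=${1} msg count pkg i
    while read -r msg; do
        if [[ ${msg} == 'shutdown' ]]; then
            break
        fi
        # Otherwise the message starts a batch: a count, then that many packages.
        count=${msg}
        for ((i = 0; i < count; i++)); do
            read -r pkg
            handle_package "${report}" "${pkg}"
        done
        # Tell the main process that this job is ready for the next batch.
        echo 'done'
    done
}

# Main side, after all jobs shut down: merge per-job reports into the main one.
merge_reports() {
    local main_report=${1}
    shift
    cat "${@}" >>"${main_report}"
}

tmpdir=$(mktemp -d)
printf '%s\n' 2 sys-devel/gcc dev-lang/rust 1 sys-libs/libsepol shutdown |
    job_loop "${tmpdir}/job-0.txt"
merge_reports "${tmpdir}/main.txt" "${tmpdir}/job-0.txt"
```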
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This fills the silent moment between report generation in the SDKs
and the beginning of package update handling. It also adds the missing
information about handling non-package updates.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
There is no functional change, other than the fact that the new
function now uses a bunch of maps to access some package
information. The split-off inches us closer to running the package
handling in multiple jobs.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
The purpose of this struct is to collect in one place all the
information needed for handling package updates. It is not really used
right now, but it will come in handy when the package handling is split
off into a separate function, as we can then pass a couple of
parameters to the new function instead of many.
Also, the struct will grow in the future, when we add ignoring
irrelevant information in summary stubs, or license filtering.
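A minimal sketch of such a struct, assuming it is emulated with a bash associative array whose name is passed to functions and dereferenced with a nameref (`local -n`, bash 4.3 or newer). The keys, paths and the `handle_package_update` function are made up for illustration.

```shell
#!/bin/bash
# Hypothetical "struct": everything package update handling needs, in one place.
declare -A pkg_handling_info=(
    [old_repo]='/path/to/old-repo'
    [new_repo]='/path/to/new-repo'
    [output_dir]='/tmp/reports'
)

# Takes the name of the struct plus the package, instead of many separate parameters.
handle_package_update() {
    local -n info_ref=${1}
    local pkg=${2}
    printf '%s: %s -> %s (reports in %s)\n' "${pkg}" \
        "${info_ref[old_repo]}" "${info_ref[new_repo]}" "${info_ref[output_dir]}"
}

handle_package_update pkg_handling_info sys-devel/gcc
```

Adding a field later, for example for license filtering, then only touches the array, not every function signature.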
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a continuation of passing the explicit location of an output
directory instead of hardcoding `${REPORTS_DIR}`.
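The general idea, as a sketch: the function receives the output directory as a parameter instead of reading the global, so the main process can keep passing `${REPORTS_DIR}` while a job passes its own directory. `write_summary` and `summary.txt` are hypothetical names; here `REPORTS_DIR` defaults to a temporary directory only so the snippet is self-contained.

```shell
#!/bin/bash
# Default location for this sketch only.
REPORTS_DIR=${REPORTS_DIR:-$(mktemp -d)}

# Hypothetical report writer: the output directory is a parameter, not a global.
write_summary() {
    local output_dir=${1} text=${2}
    mkdir -p "${output_dir}"
    printf '%s\n' "${text}" >>"${output_dir}/summary.txt"
}

# The main process keeps using REPORTS_DIR...
write_summary "${REPORTS_DIR}" 'main: started'
# ...while a job can pass its own directory without touching the global.
job_dir=$(mktemp -d)/job-1
write_summary "${job_dir}" 'sys-devel/gcc: updated'
```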
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This is a step towards using a different output directory in package
handling. This will be needed for the eventual package handling jobs
system, where each job has its own output directory.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
The slots were only used to repeatedly generate the same path to the
directory where the package ebuild diff is saved. So instead, generate
the output paths in an outer scope, put them into a struct and pass
that around. That means that:
- We pass one parameter fewer (the name of a struct instead of two
  slots).
- It is easier to change the output directory later (changing it in a
  function like update_dir or update_dir_non_slot may affect locations
  we didn't want to change, whereas changing the value in the struct
  limits the affected areas). This will come in handy later, when we
  put package update handling into jobs, where each job will have its
  own output directory.
This does not remove the repeated generation of the paths yet, but it
is a first step.
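A sketch of the paths-in-a-struct approach, assuming the struct is an associative array declared by the caller and filled in once by a helper through a nameref. The function names, key names and the `updates/<pkg>/ebuild.diff` layout are hypothetical, not the real report layout.

```shell
#!/bin/bash
# Fill a caller-declared associative array with the output paths for one
# package, once, in the outer scope.
gen_update_paths() {
    local -n paths_ref=${1}
    local output_dir=${2} pkg=${3}
    paths_ref=(
        [update_dir]="${output_dir}/updates/${pkg}"
        [ebuild_diff]="${output_dir}/updates/${pkg}/ebuild.diff"
    )
}

# Consumers take the struct name instead of old and new slots.
save_ebuild_diff() {
    local -n paths=${1}
    local diff_text=${2}
    mkdir -p "${paths[update_dir]}"
    printf '%s\n' "${diff_text}" >"${paths[ebuild_diff]}"
}

declare -A gcc_paths
# A job would pass its own output directory here instead.
gen_update_paths gcc_paths "$(mktemp -d)" sys-devel/gcc
save_ebuild_diff gcc_paths '-KEYWORDS="~amd64"'
```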
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
This will come in handy for spawning jobs for handling package
updates. Since we don't want to spawn as many jobs as there are
packages, limiting ourselves to a job count matching the processor or
core count sounds like a better idea.
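One way to get the processor count in a shell script, as a sketch: `nproc` from GNU coreutils prints the number of processing units available to the current process, and `getconf _NPROCESSORS_ONLN` is a common fallback. The function name and the capping at the package count are assumptions for illustration.

```shell
#!/bin/bash
# Print how many jobs to spawn for the given number of packages: one per
# available processor, but never more jobs than packages.
get_job_count() {
    local pkg_count=${1} cpus
    # nproc (GNU coreutils) respects CPU affinity; getconf is a common fallback.
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if (( pkg_count < cpus )); then
        echo "${pkg_count}"
    else
        echo "${cpus}"
    fi
}

echo "jobs for 250 packages: $(get_job_count 250)"
```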
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
We can run report generation for old and new in parallel in two
separate processes. That should mean a bit less waiting.
This is a more or less straightforward parallelization, since there
are only two jobs running. The only things that need taking care of
are forwarding the jobs' output to the terminal and handling job
failures.
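A minimal sketch of this kind of two-job parallelization, assuming each job is a background subshell whose output is tagged with its name before reaching the terminal and whose exit status is collected with `wait`. `generate_reports` is a hypothetical stand-in that fails for the state `broken`, so the failure path can be tried out.

```shell
#!/bin/bash
# Hypothetical stand-in for generating the reports of one state.
generate_reports() {
    local state=${1}
    echo "generating ${state} reports"
    [[ ${state} != 'broken' ]]
}

# Forward a job's output to the terminal, tagged with the job name, and keep
# the job's exit status rather than sed's.
run_prefixed() {
    local name=${1}
    shift
    "${@}" 2>&1 | sed -e "s/^/[${name}] /"
    return "${PIPESTATUS[0]}"
}

# Run both generations in the background, then reap them and report failures.
generate_old_and_new() {
    local old_state=${1} new_state=${2} old_pid new_pid failed=0
    run_prefixed "${old_state}" generate_reports "${old_state}" &
    old_pid=${!}
    run_prefixed "${new_state}" generate_reports "${new_state}" &
    new_pid=${!}
    wait "${old_pid}" || { echo "${old_state} reports generation failed" >&2; failed=1; }
    wait "${new_pid}" || { echo "${new_state} reports generation failed" >&2; failed=1; }
    return "${failed}"
}

generate_old_and_new old new
```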
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
The library will be used for running emerge report and package update
report generation in separate processes to make them faster.
I initially wanted to use a relatively unknown bash feature called
coprocs, but it was still unfinished as of bash 5.2, so I decided to
write my own instead.
The library is rather basic: it allows forking a subprocess that runs
some bash function, communicating with it through the subprocess's
standard input/output, and reaping the subprocess.
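Roughly how such a library can work, as a sketch under assumptions: two named pipes created with `mkfifo` serve as the child's stdin and stdout, the function runs in a background subshell, the parent keeps its ends open through `exec {var}>` and `exec {var}<` redirections (bash 4.1 or newer), and `wait` reaps the child. All names (`subprocess_fork`, `subprocess_reap`, the `SUBPROC_*` globals, `upcase_worker`) are hypothetical, and the real library's interface may well differ.

```shell
#!/bin/bash
# Hypothetical subprocess state; a fuller library would keep one set per subprocess.
SUBPROC_PID=''
SUBPROC_IN_FD=''
SUBPROC_OUT_FD=''

# Fork a subprocess running the given bash function, wired to two FIFOs.
subprocess_fork() {
    local func=${1} dir
    dir=$(mktemp -d)
    mkfifo "${dir}/in" "${dir}/out"
    # The child opens "in" for reading first, then "out" for writing.
    "${func}" <"${dir}/in" >"${dir}/out" &
    SUBPROC_PID=${!}
    # Opening a FIFO blocks until the other end is opened, so open the ends in
    # the same order as the child to avoid a deadlock.
    exec {SUBPROC_IN_FD}>"${dir}/in"
    exec {SUBPROC_OUT_FD}<"${dir}/out"
    # Both ends are open now, so the FIFO files themselves are no longer needed.
    rm -rf "${dir}"
}

# Close the child's stdin so it sees EOF, then collect its exit status.
# Callers should read all the replies they expect before reaping.
subprocess_reap() {
    local rc=0
    exec {SUBPROC_IN_FD}>&-
    wait "${SUBPROC_PID}" || rc=${?}
    exec {SUBPROC_OUT_FD}<&-
    return "${rc}"
}

# Example subprocess: upper-case every line it receives.
upcase_worker() {
    local line
    while read -r line; do
        echo "${line^^}"
    done
}

subprocess_fork upcase_worker
echo 'hello' >&"${SUBPROC_IN_FD}"
read -r reply <&"${SUBPROC_OUT_FD}"
echo "${reply}"
subprocess_reap
```

Forking the child before the parent opens its ends matters: otherwise the child would inherit the write end of its own stdin FIFO and never see EOF.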
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com>
The occurrences file shows where the package name shows up in the
repository. It tries to be smart, so that checking for sys-devel/gcc
does not show sys-devel/gcc-config. But the smart check was flawed, as
it ignored forms like sys-devel/gcc-${PV}. I noticed it when checking
the occurrences for sys-libs/libsepol, where not enough occurrences
were shown.
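A sketch of a check that accepts both forms, assuming the search is done with `grep -E`. A line matches if the package name is followed by the end of the line, by a character that cannot continue a package name, by `-` and a digit (a version), or by `-${` (a version variable such as `${PV}`). The expression is an illustration, not the one used by `occurences.sh`, and `find_occurrences` is a hypothetical name.

```shell
#!/bin/bash
# Print the lines (with line numbers) of a file that mention the package as
# "cat/pkg" at the end of a line, before a character that cannot be part of a
# package name (":", "/", space, ...), before "-<digit>" (a version), or before
# "-${" (a version variable such as ${PV}). "cat/pkg-config" does not match.
find_occurrences() {
    local pkg=${1} file=${2} re suffix
    # "." and "+" may appear in category or package names; match them literally.
    re=${pkg//./[.]}
    re=${re//+/[+]}
    suffix='($|[^A-Za-z0-9+_-]|-[0-9]|-\$\{)'
    grep -nE "${re}${suffix}" "${file}"
}

sample=$(mktemp)
cat >"${sample}" <<'EOF'
sys-devel/gcc
>=sys-devel/gcc-12.3.1
sys-devel/gcc-${PV}
sys-devel/gcc-config
sys-devel/gcc:12
EOF
find_occurrences sys-devel/gcc "${sample}"
```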
Two warnings, SC2034 and SC2178, pop up very often with references
(namerefs): shellcheck handles them poorly and produces a ton of bogus
warnings about them. Silence the warnings and drop most of the
"shellcheck disable" clauses.
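For context, a minimal example of the nameref pattern behind these warnings. Shellcheck typically treats a write through a `local -n` reference as an assignment to an unused local (SC2034), and reusing one nameref name for both arrays and strings can trigger SC2178. The file-wide directive right after the shebang is one way to silence both; the text does not say which mechanism was used, so treat that part as an assumption. The function names are hypothetical.

```shell
#!/bin/bash
# shellcheck disable=SC2034,SC2178
# A disable directive placed before the first command applies to the whole file.

# Return an array through a nameref: the write lands in the caller's variable,
# which shellcheck does not see, so result_ref looks unused (SC2034).
get_array_result() {
    local -n result_ref=${1}
    result_ref=(one two three)
}

# The same nameref name, now assigned a string: shellcheck tracks variables by
# name, so it may complain about the array/string mix (SC2178).
get_string_result() {
    local -n result_ref=${1}
    result_ref='a string'
}

declare -a arr
declare str
get_array_result arr
get_string_result str
echo "${str} / ${arr[*]}"
```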
This way, shellcheck sources some prepared files and learns about the
variables they define. Thanks to that, we can remove some of the
"shellcheck disable" clauses.
Report generation used to be executed four times. The number of runs
was the result of the Cartesian product of two sets: old and new
state, and the amd64 and arm64 architectures. It was a rather slow
process, because egencache was implicitly called four times and ran
single-threaded, and the SDK reports were duplicated (they were the
same for the old-amd64 and old-arm64 runs, and the same for the
new-amd64 and new-arm64 runs).
This changes the generation so that it runs only twice: once for the
old state and once for the new state. Every run generates the SDK
package reports and per-architecture board package reports. Egencache
now uses more threads too.
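The change in the number of runs can be sketched as two loops. `run_in_sdk` and the report names are hypothetical placeholders; only the old/new states and the amd64/arm64 architectures come from the text above.

```shell
#!/bin/bash
# Hypothetical stand-in for one SDK container run producing the listed reports.
run_in_sdk() {
    local state=${1}
    shift
    echo "${state}: ${*}"
}

# Before: one run per (state, arch) pair; SDK reports are generated twice per state.
generate_before() {
    local state arch
    for state in old new; do
        for arch in amd64 arm64; do
            run_in_sdk "${state}" sdk-reports "${arch}-board-reports"
        done
    done
}

# After: one run per state, generating SDK reports once plus board reports for both arches.
generate_after() {
    local state
    for state in old new; do
        run_in_sdk "${state}" sdk-reports amd64-board-reports arm64-board-reports
    done
}

generate_before
generate_after
```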
It used to be possible to override the SDK image used for each
architecture, but the need for that disappeared once SDK images
started to contain the initial form of the board rootfs for both amd64
and arm64. This eliminated the cyclic dependency errors that popped up
while gathering the package reports. So with this change, it is only
possible to specify a single SDK image, used for all architectures.
This feature is not used all that often anyway.
This adds an explicit generation of the md5-metadata cache before we
do any emerge invocations. That way we can have a copy of the reports
even if emerge fails for some reason. But the main reason for this
copying is to consume the data later, outside the SDK container.
Declaring structs differs a bit from declaring typical variables in
that it takes one initializer and applies it to all the declared
variables.
It will be used a lot by upcoming libraries.
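A minimal sketch of such a declaration helper, assuming structs are emulated with associative arrays and the initializer is the name of an existing associative array whose keys get copied. `struct_declare` and its calling convention are hypothetical. Contrast this with `declare a=1 b=2`, where each variable gets its own value.

```shell
#!/bin/bash
# Declare every named variable as a global associative array and copy all the
# keys of one initializer (itself an associative array) into each of them.
struct_declare() {
    local -n init_ref=${1}
    shift
    local name key
    for name in "${@}"; do
        declare -gA "${name}"
        local -n target_ref=${name}
        for key in "${!init_ref[@]}"; do
            target_ref["${key}"]=${init_ref["${key}"]}
        done
        # Drop the reference itself (not the target) before reusing the name.
        unset -n target_ref
    done
}

declare -A pkg_info_defaults=([version]='' [slot]='0')
struct_declare pkg_info_defaults gcc_info rust_info
gcc_info[version]=13.2
echo "gcc ${gcc_info[version]} slot ${gcc_info[slot]}, rust slot ${rust_info[slot]}"
```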
Some upcoming libraries will use this for their global variables. The
function uses a single counter, which ensures that the generated names
are globally unique.
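A sketch of such a generator, assuming it hands the name back through a nameref; `gen_varname` and `GEN_VARNAME_COUNTER` are hypothetical names. Returning through a nameref rather than printing matters here: `$(...)` runs in a subshell, so the counter increment would be lost and the same name would come back every time.

```shell
#!/bin/bash
# A single global counter shared by every caller.
declare -gi GEN_VARNAME_COUNTER=0

# Store a new unique variable name in the variable named by $1, optionally
# using a prefix given in $2.
gen_varname() {
    local -n name_ref=${1}
    local prefix=${2:-VAR}
    name_ref="__${prefix}_${GEN_VARNAME_COUNTER}"
    GEN_VARNAME_COUNTER+=1
}

gen_varname first PKG
gen_varname second PKG
gen_varname third JOB
echo "${first} ${second} ${third}"
```

With the snippet as written, the three names are `__PKG_0`, `__PKG_1` and `__JOB_2`: a different prefix does not reset the counter.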
I found these tools useful when I had to investigate why report
generation failed. Since report generation had failed, I had no
reports about ebuild diffs or package occurrences. The new scripts,
`diff_pkg.sh` and `occurences.sh`, allowed me to get these reports for
the troublesome package.
coreos-devel/sdk-depends is a metapackage that is emerged in stage4 of
the catalyst SDK build. But the fsscript that runs at the end of the
stage4 build also pulls in the dev-lang/rust package, so pull both
packages for the report.
This should result in the updated dev-lang/rust appearing in the
reports.
An eclass is a file, as opposed to a package, which is a directory. So
running `git -C eclass/foo.eclass log -- .` fails, because git can't
do "cd eclass/foo.eclass", which is effectively what `-C` tries to do.
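One way around this, as a sketch: keep `-C` pointed at the repository root and pass the package directory or eclass file as a pathspec after `--`, which works for both. `path_log` is a hypothetical helper; the throwaway repository only exists to make the snippet self-contained.

```shell
#!/bin/bash
# Print the commit subjects touching a path relative to the repository root.
# -C has to name a directory, so point it at the repository root and pass the
# package directory or eclass file as the pathspec after "--".
path_log() {
    local repo=${1} path=${2}
    git -C "${repo}" log --format='%s' -- "${path}"
}

# Throwaway repository with a single eclass, for demonstration.
repo=$(mktemp -d)
git -C "${repo}" init -q
mkdir -p "${repo}/eclass"
echo '# foo eclass' >"${repo}/eclass/foo.eclass"
git -C "${repo}" add eclass/foo.eclass
git -C "${repo}" -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m 'add foo.eclass'

path_log "${repo}" eclass/foo.eclass
# Fails: git cannot change into a file.
git -C "${repo}/eclass/foo.eclass" log -- . || echo 'git -C on a file failed'
```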