This patch solves the PENDING_UPDATE and PENDING_ADD_RETRY issue
observed on the ONS EU topology.
The P4Runtime action profile group handling has been re-implemented to
be robust against inconsistencies of the device mirror, which is now
periodically synchronized with the device state. Similarly, we implement
a routine in the P4RuntimeClient to cleanup unused action profile
members.
This patch includes also:
- Refactor PI handle classes to allow creating handles without the
entity instance
- Use list instead of collections in P4RuntimeClient methods, as order
of updates sent and/or entities received from the device is important
Change-Id: I2e7964ce90f43d66680131b47ab52aca32ab55d2
Spec says:
the default entry for a table is always set. It can be set at
compile-time by the P4 programmer - or defaults to NoAction (which is a
no-op) otherwise - and assuming it is not declared as const, can be
modified by the P4Runtime client. Because the default entry is always
set, we do not allow INSERT and DELETE updates on the default entry and
the P4Runtime server must return an INVALID_ARGUMENT error code if the
client attempts one.
With this patch we convert insert or delete operations into modify ones
(unless specified by a driver property, to support non-compliant devices).
For delete, we use the interpreter to suggest a default action that is
the same as the one when the pipeline was originally deployed.
Also, we introduce the capability of synchronizing the device mirror
with the device state.
Change-Id: I3758fc11780eb0f1cf4ed5a295bd98b54b182e29
Also reduces default pipeconf watchdog probe interval
Also fixes NPE on GDP when device is unreachable
Change-Id: Ie2fe1874b0883a037596d9a555a2f8cc030a55a6
(cherry picked from commit d797e2c28505fbdbb8597038ebcf977a053bae72)
This patch is an attempt to solve issues observed when restarting both
switches and ONOS nodes. Most of the issues seemed to depend on a
brittle mastership handling when deploying the pipeline.
With this patch, GDP registers devices to the core with available=false
(i.e. offline) and marks them online only when the P4 pipeline has been
deployed to the device. A new PiPipeconfWatchdogService takes care of
deploying pipelines and producing event when devices are ready.
Moreover, we fix a race condition where pipeconf-related behaviors
were not found. This was caused by GDP enforcing the merged
driver name in the network config, while external entities (e.g.
Mininet) were pushing a JSON blob with the base driver name. This patch
removes the need to rely on such a trick and instead uses
pipeconf-aware logic directly in the driver manager (change #19622).
Finally, we fix issues in P4RuntimeClientImpl that were causing the
stream channel not detecting unreachable devices. The solution is to
follow gRPC APIs and re-instantiate a new channel once the first fails.
Change-Id: I6fbc91859c0fb58a6db3bc197b7081a8fe9f97f7
A convenient macro for packaging together all proto and gRPC libraries
in an OSGi jar is provided. Also re-packaging of gRPC core (to avoid OSGi
split problem) is simplified by depending on a patched fork of grpc-java.
Change-Id: Idb79a5bea8ae0bc57b146bda1fc47a4568d12c60
- NPE when removing agent listeners
- Don't block netcfg event dispatch thread on GDP
- Avoid unnecessary warn logs during disconnection in GDP
Change-Id: I612f7f7914579eea9ba393e952377a3933d92e8d
Most notably, we fix a bug in which some nodes were not able to find
pipeconf-specific behaviors for a given device. The problem is not
completelly solved but it's mitigated.
There's a race condition caused by the fact that the GDP updates the cfg
with the merged driver name before advertising the device to the core.
Some nodes might receive the cfg update after the device has been
advertised. We mitigate the problem by performing the pipeline deploy
(slow operation) after the cfg update, giving more time for nodes
to catch up. Perhaps we should listen for cfg update events before
advertising the device to the core?
Also:
- NPE when getting P4Runtime client
- Detect if a base driver is already merged in pipeconf manager
- Longer timeouts in P4Runtime driver and protocol (for slow networks)
- Configurable timeout in P4Runtime driver and GDP
- NPE when adding/removing device agent listeners in P4Rtunime handshaker
- Various exceptions due to race conditions in GDP when disconnecting
devices (by serializing disconnect tasks per device)
- NPE when cancelling polling tasks in GDP
- Refactored PipeconfService to distinguish between driver merge,
pipeconf map update, and cfg update (now performed in the GDP)
- Fixed PipeconfManagerTest, not testing driver behaviours
- Use Guava striped locks when possible (more memory-efficient than maps,
and with strict atomicity guarantees w.r.t. to caches).
Change-Id: I30f3887541ba0fd44439a86885e9821ac565b64c
The P4Runtime client was hanging (deadlock) on a master arbitration
request. As such, all other requests (e.g. table write) were waiting
for the client's request lock to become available.
Apart from fixing those deadlocks, this patch brings a number of
improvements that all together allow to run networks of 100+ P4Runtime
devices on a single ONOS instance (before only ~20 devices)
Includes:
- Asynchrounous mastership handling in DevicHandshaker (as defined in
the P4Runtime and OpenFlow spec)
- Refactored arbitration handling in the P4RuntimeClient
to be consistent with the P4Runtime spec
- Report suspect deadlocks in P4RuntimeClientImpl
- Exploit write errors in P4RuntimeClient to quickly report
channel/mastership errors to upper layers
- Complete all futures with deadlines in P4Runtime driver
- Dump all tables in one request
- Re-purposed ChannelEvent to DeviceAgentEvent to carry also mastership
response events
- Fixed IntelliJ warnings
- Various code and log clean-ups
Change-Id: I9376793a9fe69d8eddf7e8ac2ef0ee4c14fbd198
Includes:
- Use new P4Runtime "v1" package names
- Removed VALID match
- New table entry priority spec (1 is min priority, not 0)
- Fixed p4c-bm2-ss to include arch flag
- Re-compiled P4 programs with more recent p4c (with updated p4info)
Change-Id: I05908f40eda0f0c755009268fd261fb8bcc9be35
Includes fixes for:
- ONOS-7593: Support for indirect resource Index type
- ONOS-7595: Removed ID from direct resources
- P4Runtime requires unset bits to be 0 in ternary field matches
- Incorrect parsing of flow rule byte counters
- Full entity names in P4Info with top-level control block (fixed only
for basic.p4, other programs need to be re-compiled and PI IDs in
respective pipeconf changed)
Change-Id: Ia19aa949c02e363a550e692915c6d6516a2d13d7