- Do not read counters with table entries for Barefoot drivers
- If driver behavior setup fails, log which operation we are aborting
- Remove unnecessary setup steps in Stratum-related drivers
- Always get clients by their key in gRPC-based drivers
- Log when P4Runtime group operation fails because of missing group in
store
- Fix polling of table entry counters for P4Runtime driver
Change-Id: Ic9bf19b76d8cb5a191aec24852af4410fea8b998
- Workaround for PI bug that ignores max_group_size
- Use max_group_size and not buckets size when translating groups
Change-Id: Id12a12311b20ca8fb4e785e1c5a4f0f4215d1bbf
Implementations of P4Runtime like Stratum are expected to persist table
entries and other data plane state across reboots.
Change-Id: I4395e9e60b395bfca85c71c9d3bc604a2269a3ce
The fix is simple: when cleaning up inconsistent entries from device,
we update the mirror when the response is received, and not before
sending the request. Otherwise, if delete goes wrong, writes happening
right after reconciliation cycle might find an inconsistent mirror state.
When writing entries (e.g. apply group/flow rule) we keep updating the
mirror before sending the request to handled the case of back-to-back
writes.
Change-Id: I9e1cc5cac3f8746c67e93e2cee17aff78d3f1d7e
This patch also introduces a new driver property flag to indicate
whether a P4Runtime target supports default table entries or not.
Stratum targets built against the current version of p4lang/PI do not.
Change-Id: I1fbb57521516bee99057319ed1695cb05b68ee7c
This is achieved by optimistically updating the P4Runtime mirror using
the write request (instead of waiting for a response) and by serializing
building write requests for the same device.
This change requires updating the P4Runtime protocol classes to expose
the content of the write request.
It also includes:
- force member weight to 1 when reading groups (some server
implementation still fails to be compliant to the spec)
- remove unused operation timeout handling in GDP (now all RPCz have a
timeout)
Change-Id: Ib4f99a6085c1283f46a2797e0c883d96954e02e9
This (big) change aims at solving the issue observed with mastership flapping
and device connection/disconnection with P4Runtime.
Channel handling is now based on the underlying gRPC channel state. Before,
channel events (open/close/error) were generated as a consequence of P4Runtime
StreamChannel events, making device availability dependent on mastership. Now
Stream Channel events only affect mastership (MASTER/STANDBY or NONE when the
SteamChannel RPC is not active).
Mastership handling has been refactored to generate P4Runtime election IDs that
are compatible with the mastership preference decided by the MastershipService.
GeneralDeviceProvider has been re-implemented to support in-order
device event processing and to reduce implementation complexity. Stats polling
has been moved to a separate component, and netcfg handling updated to only
depend on BasicDeviceConfig, augmented with a pipeconf field, and re-using the
managementAddress field to set the gRPC server endpoints (e.g.
grpc://myswitch.local:50051). Before it was depending on 3 different config
classes, making hard to detect changes.
Finally, this change affects some core interfaces:
- Adds a method to DeviceProvider and DeviceHandshaker to check for device
availability, making the meaning of availability device-specific. This is needed
in cases where the device manager needs to change the availability state of a
device (as in change #20842)
- Support device providers not capable of reconciling mastership role responses
with requests (like P4Runtime).
- Clarify the meaning of "connection" in the DeviceConnect behavior.
- Allows driver-based providers to check devices for reachability and
availability without probing the device via the network.
Change-Id: I7ff30d29f5d02ad938e3171536e54ae2916629a2
The new client API supports batching and provides detailed response for
write requests (e.g. if entity already exists when inserting), which was
not possible with the old one.
This patch includes:
- New more efficient implementation of P4RuntimeClient (no more locking,
use native gRPC executor, use stub deadlines)
- Ported all codecs to new AbstractCodec-based implementation (needed to
implement codec cache in the future)
- Uses batching in P4RuntimeFlowRuleProgrammable and
P4RuntimeGroupActionProgrammable
- Minor changes to PI framework runtime classes
Change-Id: I3fac42057bb4e1389d761006a32600c786598683
Also includes:
- New abstract P4Runtime codec implementation. Currently used for action
profile members/groups encoding/deconding, the plan is to handle all
other codecs via this.
- Improved read requests in P4RuntimeClientImpl
- Removed handling of max group size in P4Runtime driver. Instead, added
modified group translator to specify a max group size by using
information from the pipeline model.
Change-Id: I684bae0184d683bb448ba19863c561f9848479d2
Members can exist outside of a group. Previous naming was ambiguous
about this.
Action group -> action profile group
Action group member -> action profile member
Change-Id: I5097e92253353d355b864e689f9653df2d318230
Includes also other minor changes to gRPC channel creation/connection
process, such as:
- More compact logs showing the gRPC client key
- GrpcChannelController.connectChannel() now returns the same
StatusRuntime exception, no need to wrap it in an IOException
- Wait for channel shutdown after initial connection error
Change-Id: Ib7d2b728b8c82d9f9b2097cffcebd31cac891b27
This patch affects both the P4 pipeline implementation and the
Java pipeconf.
P4 PIPELINE
- Less tables and smarter use of metadata to reduce inter-tables
dependencies and favor parallel execution of tables.
- Removed unused actions / renamed existing ones to make forwarding
behavior clearer (e.g. ingress_port_vlan table)
- Remove co-existence of simple and hansed table. Hashed should be the
default one, but implementations that do not support action profiles
might compile fabric.p4 to use the simple one.
- Use @name annotations for match fields to make control plane
independent of table implementation.
- Use @hidden to avoid showing actions and table on the p4info that
cannot be controlled at runtime.
- First attempt to support double VLAN cross-connect (xconnect table).
- New design has been tested with "fabric-refactoring" branch of
fabric-p4test:
github.com/opennetworkinglab/fabric-p4test/tree/fabric-refactoring
JAVA PIPECONF
This patch brings a major refactoring that reflects the experience
gathered in the past months of working on fabric.p4 and reasoning on its
pipeconf implementation. Indeed, the FlowObjective API is
under-specified and sometimes ambiguous which makes the process of
creating and maintaining a pipeliner implementation tedious. This
refactoring brings a simplified implementation by removing unused/
unnecessary functionalities and by recognizing commonality when possible
(e.g. by means of abstract and utility classes). It also makes design
patterns more explicit and consistent. Overall, the goal is to reduce
technical debt and to make it easier to support new features as we
evolve fabric.p4
Changes include:
- Changes in pipeliner/interpreter to reflect new pipeline design.
- By default translate objective treatment to PiAction. This favors
debuggability of flow rules in ONOS.
- Support new NextObjective’s NextTreatment class.
- Remove lots of unused/unnecessary code (e.g. async callback handling
for pending objective install status in pipeliner as current
implementation was always returning success)
- Gather commonality in abstract classes and simplify implementation
for objective translator (filtering, forwarding, next)
- New implementation of ForwardingFunctionTypes (FFT) that looks at
criterion instance values along with their types (to avoid relying on
case-specific if-else conditions to recognize variants of an FFT)
- Adaptive translation of NextObjective based on presence of simple or
hashed table.
- Support DENY FilteringObjective
Also:
- Fix onos-p4-gen-constants to avoid generating conflicting
PiMatchFieldId variable names.
- Install Graphviz tools in p4vm to generate p4c graphs
- Generate p4c graphs by default when compiling fabric.p4
- Use more compact Hex string when printing PI values
Change-Id: Ife79e44054dc5bc48833f95d0551a7370150eac5
This patch solves the PENDING_UPDATE and PENDING_ADD_RETRY issue
observed on the ONS EU topology.
The P4Runtime action profile group handling has been re-implemented to
be robust against inconsistencies of the device mirror, which is now
periodically synchronized with the device state. Similarly, we implement
a routine in the P4RuntimeClient to cleanup unused action profile
members.
This patch includes also:
- Refactor PI handle classes to allow creating handles without the
entity instance
- Use list instead of collections in P4RuntimeClient methods, as order
of updates sent and/or entities received from the device is important
Change-Id: I2e7964ce90f43d66680131b47ab52aca32ab55d2
Spec says:
the default entry for a table is always set. It can be set at
compile-time by the P4 programmer - or defaults to NoAction (which is a
no-op) otherwise - and assuming it is not declared as const, can be
modified by the P4Runtime client. Because the default entry is always
set, we do not allow INSERT and DELETE updates on the default entry and
the P4Runtime server must return an INVALID_ARGUMENT error code if the
client attempts one.
With this patch we convert insert or delete operations into modify ones
(unless specified by a driver property, to support non-compliant devices).
For delete, we use the interpreter to suggest a default action that is
the same as the one when the pipeline was originally deployed.
Also, we introduce the capability of synchronizing the device mirror
with the device state.
Change-Id: I3758fc11780eb0f1cf4ed5a295bd98b54b182e29
Since we cannot read the device state, try to delete and re-insert if
insert or modify operation fails.
Change-Id: I35733c7e0af509317e92f991978c0a4ef36b9bc8
This patch is an attempt to solve issues observed when restarting both
switches and ONOS nodes. Most of the issues seemed to depend on a
brittle mastership handling when deploying the pipeline.
With this patch, GDP registers devices to the core with available=false
(i.e. offline) and marks them online only when the P4 pipeline has been
deployed to the device. A new PiPipeconfWatchdogService takes care of
deploying pipelines and producing event when devices are ready.
Moreover, we fix a race condition where pipeconf-related behaviors
were not found. This was caused by GDP enforcing the merged
driver name in the network config, while external entities (e.g.
Mininet) were pushing a JSON blob with the base driver name. This patch
removes the need to rely on such a trick and instead uses
pipeconf-aware logic directly in the driver manager (change #19622).
Finally, we fix issues in P4RuntimeClientImpl that were causing the
stream channel not detecting unreachable devices. The solution is to
follow gRPC APIs and re-instantiate a new channel once the first fails.
Change-Id: I6fbc91859c0fb58a6db3bc197b7081a8fe9f97f7
- NPE when removing agent listeners
- Don't block netcfg event dispatch thread on GDP
- Avoid unnecessary warn logs during disconnection in GDP
Change-Id: I612f7f7914579eea9ba393e952377a3933d92e8d
Most notably, we fix a bug in which some nodes were not able to find
pipeconf-specific behaviors for a given device. The problem is not
completelly solved but it's mitigated.
There's a race condition caused by the fact that the GDP updates the cfg
with the merged driver name before advertising the device to the core.
Some nodes might receive the cfg update after the device has been
advertised. We mitigate the problem by performing the pipeline deploy
(slow operation) after the cfg update, giving more time for nodes
to catch up. Perhaps we should listen for cfg update events before
advertising the device to the core?
Also:
- NPE when getting P4Runtime client
- Detect if a base driver is already merged in pipeconf manager
- Longer timeouts in P4Runtime driver and protocol (for slow networks)
- Configurable timeout in P4Runtime driver and GDP
- NPE when adding/removing device agent listeners in P4Rtunime handshaker
- Various exceptions due to race conditions in GDP when disconnecting
devices (by serializing disconnect tasks per device)
- NPE when cancelling polling tasks in GDP
- Refactored PipeconfService to distinguish between driver merge,
pipeconf map update, and cfg update (now performed in the GDP)
- Fixed PipeconfManagerTest, not testing driver behaviours
- Use Guava striped locks when possible (more memory-efficient than maps,
and with strict atomicity guarantees w.r.t. to caches).
Change-Id: I30f3887541ba0fd44439a86885e9821ac565b64c
The P4Runtime client was hanging (deadlock) on a master arbitration
request. As such, all other requests (e.g. table write) were waiting
for the client's request lock to become available.
Apart from fixing those deadlocks, this patch brings a number of
improvements that all together allow to run networks of 100+ P4Runtime
devices on a single ONOS instance (before only ~20 devices)
Includes:
- Asynchrounous mastership handling in DevicHandshaker (as defined in
the P4Runtime and OpenFlow spec)
- Refactored arbitration handling in the P4RuntimeClient
to be consistent with the P4Runtime spec
- Report suspect deadlocks in P4RuntimeClientImpl
- Exploit write errors in P4RuntimeClient to quickly report
channel/mastership errors to upper layers
- Complete all futures with deadlines in P4Runtime driver
- Dump all tables in one request
- Re-purposed ChannelEvent to DeviceAgentEvent to carry also mastership
response events
- Fixed IntelliJ warnings
- Various code and log clean-ups
Change-Id: I9376793a9fe69d8eddf7e8ac2ef0ee4c14fbd198