This should often not matter much, but in b/412468630 it was cleary
visible that child creation order can skew load for the first batch of
RPCs. This doesn't solve all the cases, as further-away backends will
still be less likely chosen initially and it is ignorant of the LB
policy. But this doesn't impact correctness, is easy, and is one fewer
cases to worry about.
TimeProvider provides wall time. That can move forward and backward as time is adjusted. OutlierDetection is measuring durations, so it should use a monotonic clock.
Fixes#11622
I think at some point there were more usages in the tests. But now it
is pretty easy.
PriorityLb.ChildLbState.picker is initialized to
FixedResultPicker(NoResult). So now that GracefulSwitchLb is using the
same picker, equals() is able to de-dup an update.
Previously it would wait for the new LB to enter READY. However, that
prevents there being an upper-bound on how long the old policy will
continue to be used. The point of graceful switch is to avoid RPCs
seeing increased latency when we swap config. We don't want it to
prevent the system from becoming eventually consistent.
We want to move away from handleResolvedAddresses(). These are "easy" in
that they need no logic. LBs extending ForwardingLoadBalancer had the
method duplicated from handleResolvedAddresses() and swapped away from
`super` because ForwardingLoadBalancer only forwards
handleResolvedAddresses() reliably today. Duplicating small methods was
less bug-prone than dealing with ForwardingLoadBalancer.
A failing Status from acceptResolvedAddresses means something is wrong
with the config, but parts of the config may still have been applied.
Thus there are now two possible flows: errors that should prevent
updateOverallBalancingState() and errors that should have no effect
other than the return code. To manage that, MultChildLb must always be
responsible for calling updateOverallBalancingState().
acceptResolvedAddressesInternal() was inlined to make that error
processing easier. No existing usages actually needed to have logic
between updating the children and regenerating the picker.
RingHashLb already was verifying that the address list was not empty, so
the short-circuiting when acceptResolvedAddressesInternal() returned an
error was impossible to trigger. WrrLb's updateWeightTask() calls the
last picker, so it can run before acceptResolvedAddressesInternal(); the
only part that matters is re-creating the weightUpdateTimer.
I noticed an old JDK 8u275 failed on the test because the modification
time's resolution was one second. A newer JDK 8u432 worked fine, so it's
not really a problem for me, but increasing the time difference is
cheap. I used two seconds as that's the resolution available on FAT
(which is unlikely to be TMPDIR, even on Windows).
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.
But while it was broken violations crept in. Most violations weren't
too important and we're not surprised went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.
One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment. The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.
[PR11066]: https://github.com/grpc/grpc-java/pull/11066
This is a step toward removing ResolvedAddresses from ChildLbState,
which isn't actually used by MultiChildLb. Most usages of the EAG usages
can be served more directly without peering into MultiChildLb's
internals or even accessing ChildLbStates, which make the tests less
sensitive to implementation changes. Some changes do leverage the new
behavior of MultiChildLb where it preserves the order of the entries.
This does fix an important bug in shutdown tests. The tests looped over
the ChildLbStates after shutdown, but shutdown deleted all the children
so it looped over an entry collection. Fixing that exposed that
deliverSubchannelState() didn't function after shutdown, as the listener
was removed from the map when the subchannel was shut down. Moving the
listener onto the TestSubchannel allowed having access to the listener
even after shutdown.
A few places in LeastRequestLb lines were just deleted, but that's
because an existing assertion already provided the same check but
without digging into MultiChildLb.
A map of children is still needed, but is created temporarily on update.
The order of children is currently preserved, but we could use regular
HashMaps if that is not useful.
Instead of doing a dance of supplementing config so the later
createChildAddressesMap() won't delete children, just look at the
existing children and don't delete any that shouldn't be deleted.
This makes ClusterManagerLB more straight-forward, focusing on just the
things that are relevant to it, and it avoids specialized map key
handling in updateChildrenWithResolvedAddresses().
The child policy config should be refreshed every address update, so it
shouldn't be stored in the ChildLbState. In addition, none of the
current usages actually used what was stored in the ChildLbState in a
meaningful way (it was always null).
ResolvedAddresses was also removed from createChildLbState(), as nothing
in it should be needed for creation; it varies over time and the values
passed at creation are immutable.
Bazel had the dependency added because of #5046, where Guava was
depending on it as compile-only and Bazel build have "unknown enum
constant" warnings. Guava now has a compile dependency on j2objc, so
this workaround is no longer needed. There are currently no version skew
issues in Gradle, which was the only usage.
Since 04474970 RingHashLB has not used
acceptResolvedAddressesInternal(). At the time that was needed because
deactivated children were part of MultiChildLB. But in 9de8e443, the
logic of RingHashLB and MultiChildLB.acceptResolvedAddressesInternal()
converged, so it can now swap back to using the base class for more
logic.
One LB no longer needs to extend ChildLbState and one has to start, so
it is a bit of a wash. There are more LBs that need the auto-request
logic, but if we have an API where subclasses override it without
calling super then we can't change the implementation in the future.
Adding behavior on top of a base class allows subclasses to call super,
which lets the base class change over time.
Some addresses are equal even though their toString is different
(InetSocketAddress ignores the hostname when it has an address). And
some addresses are not equal even though their toString might be the
same (AnonymousInProcessSocketAddress doesn't override toString()).
InetSocketAddress/InetAddress do not cache the toString() result. Thus,
even in the worst case that uses a HashSet, this should use less memory
than the earlier approach, as no strings are formatted. It probably also
significantly improves performance in the reasonably common case when an
Endpoint is created just for looking up a key, because the string
creation in the constructor isn't then amorized.
updateChildrenWithResolvedAddresses(), for example, creates n^2 Endpoint
objects for lookups.
They share very little code, and we really don't want RoundRobinLb to be
public and non-final. Originally, WRR was expected to share much more
code with RR, and even delegated to RR at times. The delegation was
removed in 111ff60e. After dca89b25, most of the sharing has been moved
out into general-purpose tools that can be used by any LB policy.
FixedResultPicker now has equals to makes it as a EmptyPicker
replacement. RoundRobinLb still uses EmptyPicker because fixing its
tests is a larger change. OutlierDetectionLbTest was changed because
FixedResultPicker is used by PickFirstLeafLb, and now RoundRobinLb can
squelch some of its updates for ready pickers.
This is to replace switchTo(), to allow composing GracefulSwitchLb with
other helpers like MultiChildLb. It also prevents users of
GracefulSwitchLb from needing to use ServiceConfigUtil.
This will be used by the metadata exchange of CSM. When recording
per-attempt metrics, we really need per-attempt data and can't leverage
ClientInterceptors.
gRFC A78 has WRR and pick-first include a `grpc.target` label, defined
in A66:
> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`). For channels such as inprocess channels
> where a target URI is not available, implementations can synthesize a
> target URI.
As part of gRFC A78:
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.