343 Commits

Author SHA1 Message Date
AgraVator daf901b1d5 otel: add LongUpDownCounterMetricInstrument 2025-07-22 23:25:35 +05:30
Patrick Strawderman 80217275db
api: Size Sets and Maps correctly in handling of Metadata values to be exchanged during a call (#12229)
Fix HashSet / HashMap initializations to have sufficient capacity allocated based on the number of keys to be inserted, without which it would always lead to a rehash / resize operation.
2025-07-22 09:14:08 +05:30
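The sizing problem described in the commit above can be sketched with the JDK alone. A `HashMap` or `HashSet` resizes once its entry count exceeds capacity times the load factor (0.75 by default), so holding `n` entries without a resize needs an initial capacity of at least `n / 0.75`. The class and method names below are hypothetical and are not taken from the commit.

```java
import java.util.HashMap;
import java.util.Map;

final class CapacitySketch {
  // Hypothetical helper: the smallest initial capacity that holds expectedSize
  // entries without crossing the default 0.75 load factor.
  static int capacityFor(int expectedSize) {
    if (expectedSize < 0) {
      throw new IllegalArgumentException("expectedSize must be >= 0");
    }
    return (int) Math.ceil(expectedSize / 0.75);
  }

  // Creates a map that should not need to resize while expectedSize entries are added.
  static <K, V> Map<K, V> newMapWithExpectedSize(int expectedSize) {
    return new HashMap<>(capacityFor(expectedSize));
  }
}
```

On Java 19 and later, `HashMap.newHashMap(int)` performs the same calculation.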
Eric Anderson 1fc4ab0bb2 LBs should avoid calling LBs after lb.shutdown()
LoadBalancers shouldn't be called after shutdown(), but RingHashLb could
have enqueued work to the SynchronizationContext that executed after
shutdown(). This commit fixes problems discovered when auditing all LBs'
usage of the syncContext for that type of problem.

Similarly, PickFirstLb could have requested a new connection after
shutdown(). We want to avoid that sort of thing too.

RingHashLb's test changed from CONNECTING to TRANSIENT_FAILURE to get
the latest picker. Because two subchannels have failed it will be in
TRANSIENT_FAILURE. Previously the test was using an older picker with
out-of-date subchannelView, and the verifyConnection() was too imprecise
to notice it was creating the wrong subchannel.

As discovered in b/430347751, where ClusterImplLb was seeing a new
subchannel being called after the child LB was shutdown (the shutdown
itself had been caused by RingHashConfig not implementing equals() and
was fixed by a8de9f07ab, which caused ClusterResolverLb to replace its
state):

```
java.lang.NullPointerException
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createClusterLocalityFromAttributes(ClusterImplLoadBalancer.java:322)
	at io.grpc.xds.ClusterImplLoadBalancer$ClusterImplLbHelper.createSubchannel(ClusterImplLoadBalancer.java:236)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.util.ForwardingLoadBalancerHelper.createSubchannel(ForwardingLoadBalancerHelper.java:47)
	at io.grpc.internal.PickFirstLeafLoadBalancer.createNewSubchannel(PickFirstLeafLoadBalancer.java:527)
	at io.grpc.internal.PickFirstLeafLoadBalancer.requestConnection(PickFirstLeafLoadBalancer.java:459)
	at io.grpc.internal.PickFirstLeafLoadBalancer.acceptResolvedAddresses(PickFirstLeafLoadBalancer.java:174)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.activate(LazyLoadBalancer.java:64)
	at io.grpc.xds.LazyLoadBalancer$LazyDelegate.requestConnection(LazyLoadBalancer.java:97)
	at io.grpc.util.ForwardingLoadBalancer.requestConnection(ForwardingLoadBalancer.java:61)
	at io.grpc.xds.RingHashLoadBalancer$RingHashPicker.lambda$pickSubchannel$0(RingHashLoadBalancer.java:440)
	at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:96)
	at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:128)
	at io.grpc.xds.client.XdsClientImpl$ResourceSubscriber.onData(XdsClientImpl.java:817)
```
2025-07-17 12:56:33 +00:00
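The race described in the commit above, where work is enqueued before shutdown() but runs after it, can be illustrated without any gRPC classes. This is a minimal sketch, not the code from the commit: `TaskQueue` is a hypothetical stand-in for a serializing executor such as SynchronizationContext, and `Policy` is a hypothetical stand-in for a load balancer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class ShutdownGuardSketch {
  // Hypothetical stand-in for a serializing executor: tasks are queued and run later, in order.
  static final class TaskQueue {
    private final Queue<Runnable> tasks = new ArrayDeque<>();

    void execute(Runnable task) {
      tasks.add(task);
    }

    void drain() {
      Runnable task;
      while ((task = tasks.poll()) != null) {
        task.run();
      }
    }
  }

  // Hypothetical stand-in for a load balancing policy.
  static final class Policy {
    private final TaskQueue queue;
    private boolean shutdown;
    int connectionRequests;

    Policy(TaskQueue queue) {
      this.queue = queue;
    }

    // Called from a picker: the real work is deferred to the queue.
    void requestConnectionLater() {
      queue.execute(() -> {
        // shutdown() may have run between enqueueing and execution, so re-check here.
        if (shutdown) {
          return;
        }
        connectionRequests++;
      });
    }

    void shutdown() {
      shutdown = true;
    }
  }
}
```

The check lives inside the queued task rather than before `execute()`, because only the task itself observes the state at the time it actually runs.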
Kannan J d352540a02
api: Add more Javadoc for NameResolver.Listener2 interface (#12220) 2025-07-16 14:39:43 +05:30
Eric Anderson d2d8ed8efa xds: Add logical dns cluster support to XdsDepManager
ClusterResolverLb gets the NameResolverRegistry from
LoadBalancer.Helper, so a new API was added in NameResolver.Args to
propagate the same object to the name resolver tree.

RetryingNameResolver was exposed to xds. This is expected to be
temporary, as the retrying is being removed from ManagedChannelImpl and
moved into the resolvers. At that point, DnsNameResolverProvider would
wrap DnsNameResolver with a similar API to RetryingNameResolver and xds
would no longer be responsible.
2025-06-17 22:14:20 +00:00
Kim Jin Young 12aaf88d86
Fix comment's typo (#12045) 2025-05-05 22:32:31 +05:30
Kannan J 7952afdd56
Add some documentation to StatusOr.equals regarding how underlying statuses are compared, to avoid any confusion, as suggested in issue #11949. (#12036)
2025-04-23 18:42:24 +05:30
Eric Anderson 9619453799
Implement grpc.lb.backend_service optional label
This completes gRFC A89. 7162d2d66 and fc86084df had already implemented
the LB plumbing for the optional label on RPC metrics. This observes the
value in OpenTelemetry and adds it to WRR metrics as well.

https://github.com/grpc/proposal/blob/master/A89-backend-service-metric-label.md
2025-04-21 06:17:43 -07:00
Kurt Alfred Kluever 84bd01454b context: Remove mention of "epoch" from Ticker.nanoTime() javadocs, plus other minor touchups
In Java, when people hear "epoch", they think "unix epoch".

cl/747082451
2025-04-15 13:00:35 -07:00
Eric Anderson f79ab2f16f api: Remove deprecated SubchannelPicker.requestConnection()
It has been deprecated since cec9ee368, six years ago. It was replaced
with LoadBalancer.requestConnection().
2025-04-09 12:51:33 -07:00
Abhishek Agrawal d4c46a7f1f
refactor: prevents global stats config freeze in ConfiguratorRegistry.getConfigurators() (#11991) 2025-04-04 11:23:08 +05:30
Alex Panchenko d60e6fc251
Replace usages of deprecated ExpectedException in grpc-api and grpc-core (#11962) 2025-03-21 13:00:24 +05:30
Eric Anderson e80c197455
xds: Use XdsDependencyManager for XdsNameResolver
Contributes to the gRFC A74 effort.
https://github.com/grpc/proposal/blob/master/A74-xds-config-tears.md

The alternative to using Mockito's ArgumentMatcher is to use Hamcrest.
However, Hamcrest did not impress me. ArgumentMatcher is trivial if you
don't care about the error message.

This fixes a pre-existing issue where ConfigSelector.releaseCluster
could revert the LB config back to using cluster manager after all RPCs
using a cluster had committed and released it.

Co-authored-by: Larry Safran <lsafran@google.com>
2025-03-18 14:05:01 -07:00
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305, and Error Prone's annotation has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305, and Error Prone's annotation has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
Eric Anderson 805cad3782 bazel: Restore DoNotCall ErrorProne check
In e08b9db20 we added `@DoNotCall` annotations to some call sites, but
Bazel used an older version of ErrorProne that complained at times it
shouldn't. The minimum version of Bazel we test/support is now Bazel 6,
well past Bazel 3.4+.
2024-12-23 12:45:42 -08:00
Eric Anderson aafab74087 api: Use package-private IgnoreJRERequirement
This avoids the dependency on animalsniffer-annotations. grpc-api, and
particularly grpc-context, are used in many low-level places and it is
beneficial for them to be very low dependency. This brings grpc-context
back to zero-dependency.
2024-12-23 12:45:26 -08:00
Alex Panchenko ebe2b48677
api: StatusRuntimeException without stacktrace - Android compatibility (#11072)
This is an alternative to e36f099be9 that avoids the "fillInStackTrace"
constructor which is only available starting at Android API level 24.
2024-12-21 17:09:57 -08:00
John Cormie 0b2d44098f
Introduce custom NameResolver.Args (#11669)
grpc-binder's upcoming AndroidIntentNameResolver needs to know the target Android user so it can resolve target URIs in the correct place. Unfortunately, Android's built-in intent:// URI scheme has no way to specify a user, and in fact the android.os.UserHandle object can't reasonably be encoded as a String at all.

We solve this problem by extending NameResolver.Args with the same type-safe and domain-specific Key<T> pattern used by CallOptions, Context and CreateSubchannelArgs. New "custom" arguments could apply to all NameResolvers of a certain URI scheme, to all NameResolvers producing a particular type of java.net.SocketAddress, or even to a specific NameResolver subclass.
2024-12-19 23:32:47 -08:00
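The type-safe `Key<T>` pattern mentioned in the commit above can be sketched as follows. This is an illustration of the general pattern only: `TypedArgsSketch`, its `Key` and `Args` classes, and their methods are hypothetical and do not reproduce the NameResolver.Args API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class TypedArgsSketch {
  // A key carries the value type, so lookups need no casts at the call site.
  static final class Key<T> {
    private final String debugName;
    private final T defaultValue;

    private Key(String debugName, T defaultValue) {
      this.debugName = debugName;
      this.defaultValue = defaultValue;
    }

    static <T> Key<T> create(String debugName, T defaultValue) {
      return new Key<>(debugName, defaultValue);
    }

    @Override
    public String toString() {
      return debugName;
    }
  }

  // Hypothetical argument holder. Keys are compared by identity, so two keys
  // with the same debug name never collide.
  static final class Args {
    private final Map<Key<?>, Object> values = new IdentityHashMap<>();

    <T> Args set(Key<T> key, T value) {
      values.put(key, value);
      return this;
    }

    @SuppressWarnings("unchecked") // safe: set() only stores a T under a Key<T>
    <T> T get(Key<T> key) {
      Object value = values.get(key);
      return value != null ? (T) value : key.defaultValue;
    }
  }
}
```

Because each key is an object rather than a string, a domain-specific argument such as a user handle can be passed through without being encoded in the target URI.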
Eric Anderson 8ea3629378
Re-enable animalsniffer, fixing violations
In 61f19d707a I swapped the signatures to use the version catalog. But I
failed to preserve the `@signature` extension and it all seemed to
work... But in fact all the animalsniffer tasks were completing as
SKIPPED as they lacked signatures. The build.gradle changes in this
commit are to fix that while still using version catalog.

But while it was broken, violations crept in. Most violations weren't
too important and we're not surprised they went unnoticed. For example, Netty
with TLS has long required the Java 8 API
`setEndpointIdentificationAlgorithm()`, so using `Optional` in the same
code path didn't harm anything in particular. I still swapped it to
Guava's `Optional` to avoid overuse of `@IgnoreJRERequirement`.

One important violation has not been fixed and instead I've disabled the
android signature in api/build.gradle for the moment.  The violation is
in StatusException using the `fillInStackTrace` overload of Exception.
This problem [had been noticed][PR11066], but we couldn't figure out
what was going on. AnimalSniffer is now noticing this and agreeing with
the internal linter. There is still a question of why our interop tests
failed to notice this, but given they are no longer running on pre-API
level 24, that may forever be a mystery.

[PR11066]: https://github.com/grpc/grpc-java/pull/11066
2024-12-19 07:54:54 -08:00
Eric Anderson 0192bece47 api: DeadlineSubject should include actual on failure
This was noticed because of a CallOptionsTest flake that had a
surprising error:
```
expected                    : 59.983387319
but was                     : 59.983387319
outside tolerance in seconds: 0.01
```
2024-11-27 10:55:34 -08:00
Vindhya Ningegowda 20d09cee57
xds: Add counter and gauge metrics (#11661)
Adds the following xDS client metrics defined in [A78](https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#xdsclient).

Counters
- grpc.xds_client.server_failure
- grpc.xds_client.resource_updates_valid
- grpc.xds_client.resource_updates_invalid

Gauges
- grpc.xds_client.connected
- grpc.xds_client.resources
2024-11-25 16:47:32 -08:00
Kannan J dae078c0a6
api: When forwarding from Listener onAddresses to Listener2 continue to use onResult (#11666)
When forwarding from Listener onAddresses to Listener2, continue to use onResult and not onResult2, because the latter must be called from within the synchronization context, which would break existing code that didn't need to do so when using the old Listener interface.
2024-11-05 23:52:20 +05:30
Eric Anderson 1993e68b03
Upgrade dependencies (#11655) 2024-11-01 07:50:08 -07:00
Kannan J c167ead851
xds: Per-rpc rewriting of the authority header based on the selected route. (#11631)
Implementation of A81.
2024-10-30 21:11:41 +05:30
SreeramdasLavanya 766b92379b
api: Add java.time.Duration overloads for the CallOptions and AbstractStub methods that take a TimeUnit and a time value (#11562) 2024-10-30 18:49:53 +05:30
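One detail any `Duration` overload of a `(long, TimeUnit)` method has to settle is what happens when the `Duration` does not fit in a `long` count of nanoseconds, since `Duration.toNanos()` throws `ArithmeticException` on overflow. The sketch below saturates instead of throwing. It is an assumption about one reasonable approach, not the code from the commit, and `toNanosSaturated` is a hypothetical name.

```java
import java.time.Duration;

final class DurationSketch {
  // Converts to nanoseconds, clamping to the long range instead of throwing on overflow.
  static long toNanosSaturated(Duration duration) {
    try {
      return duration.toNanos();
    } catch (ArithmeticException overflow) {
      return duration.isNegative() ? Long.MIN_VALUE : Long.MAX_VALUE;
    }
  }
}
```

A `Duration` overload could then forward to the existing method with `TimeUnit.NANOSECONDS` and the converted value.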
Lucas Mirelmann 00c8bc78dd
Minor grammar fix in Javadoc (#11609) 2024-10-18 11:29:35 +05:30
Eng Zer Jun 1e0928fb79 api: fix javadoc of CallCredentials.applyRequestMetadata
It is the `Executor appExecutor` that should be given an asynchronous
task, not `CallCredentials.MetadataApplier applier`.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2024-10-17 10:13:12 -07:00
Kannan J 1ded8aff81
onResult2: ResolutionResult has addresses or error (#11330)
Combined success / error status passed via ResolutionResult to the NameResolver.Listener2 interface's onResult2 method - Addresses in the success case or address resolution error in the failure case now get set in ResolutionResult::addressesOrError by the internal name resolvers.
2024-10-07 17:55:56 +05:30
Larry Safran 9bb06af963
Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time (#11520)
* Change PickFirstLeafLoadBalancer to only have 1 subchannel at a time if environment variable GRPC_SERIALIZE_RETRIES == true.

Cache serializingRetries value so that it doesn't have to look up the flag every time.

Clear the correct task when READY in processSubchannelState and move the logic to cancelScheduledTasks

Cleanup based on PR review

remove unneeded checks for shutdown.

* Fix previously broken tests

* Shut down the previous subchannel when running off the end of the index.

* Provide option to disable subchannel retries to let PFLeafLB take control of retries.

* InternalSubchannel internally goes to IDLE when it sees TF while reconnect is disabled.
Remove an extra index.increment in LeafLB
2024-10-02 17:03:47 -07:00
Vindhya Ningegowda 1dae144f0a
xds: Fix load reporting when pick first is used for locality-routing. (#11495)
* Determine subchannel's network locality from connected address, instead of assuming that all addresses for a subchannel are in the same locality.
2024-08-31 16:07:53 -07:00
Eric Anderson ff8e413760
Remove direct dependency on j2objc
Bazel had the dependency added because of #5046, where Guava was
depending on it as compile-only and Bazel builds had "unknown enum
constant" warnings. Guava now has a compile dependency on j2objc, so
this workaround is no longer needed. There are currently no version skew
issues in Gradle, which was the only usage.
2024-08-13 21:33:55 -07:00
Kurt Alfred Kluever 06135a0745 Migrate from the deprecated `Charsets` constants (in Guava) to the `StandardCharsets` constants (in the JDK)
cl/658539667
2024-08-05 13:31:08 -07:00
Eric Anderson 780e4ba086 api: Move ClientStreamTracerTest from core to api
It uses nothing from core and tests an api class.
2024-08-02 09:06:04 -07:00
Kannan J 90d0fabb1f
Introduce onResult2 in NameResolver Listener2 that returns Status
Lets the Name Resolver receive the status of the acceptance of the name resolution by the load balancer.
2024-08-02 20:40:31 +05:30
Eric Anderson ebffb0a6b2 Revert "Introduce onResult2 in NameResolver Listener2 that returns Status (#11313)"
This reverts commit 9ba2f9dec5.

It causes a channel panic due to unimplemented onResult2().

```
java.lang.UnsupportedOperationException: Not implemented.
        at io.grpc.NameResolver$Listener2.onResult2(NameResolver.java:257)
        at io.grpc.internal.DnsNameResolver$Resolve.lambda$run$0(DnsNameResolver.java:334)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:94)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:126)
        at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:333)
```

b/356669977
2024-07-31 14:16:01 -07:00
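The failure in the revert above is a general API-evolution hazard: a new method whose default implementation throws breaks every existing subclass the moment a caller switches to it. One way to avoid that is a default that delegates to the old method. The sketch below uses hypothetical stand-in types (`Listener`, `LegacyListener`, a `String` result, a `boolean` status) and is not the gRPC implementation.

```java
final class ListenerEvolutionSketch {
  // Hypothetical stand-in for a listener type that gains a new method over time.
  abstract static class Listener {
    abstract void onResult(String result);

    // Newer method that also reports whether the result was accepted.
    // Delegating to onResult() keeps subclasses written before this method existed working.
    boolean onResult2(String result) {
      onResult(result);
      return true; // assumption: treat the result as accepted when nothing more is known
    }
  }

  // A subclass written against the old API only.
  static final class LegacyListener extends Listener {
    String last;

    @Override
    void onResult(String result) {
      last = result;
    }
  }
}
```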
Eric Anderson dc83446d98 xds: Stop extending RR in WRR
They share very little code, and we really don't want RoundRobinLb to be
public and non-final. Originally, WRR was expected to share much more
code with RR, and even delegated to RR at times. The delegation was
removed in 111ff60e. After dca89b25, most of the sharing has been moved
out into general-purpose tools that can be used by any LB policy.

FixedResultPicker now has equals() so that it can serve as an EmptyPicker
replacement. RoundRobinLb still uses EmptyPicker because fixing its
tests is a larger change. OutlierDetectionLbTest was changed because
FixedResultPicker is used by PickFirstLeafLb, and now RoundRobinLb can
squelch some of its updates for ready pickers.
2024-07-31 13:32:49 -07:00
Kannan J 9ba2f9dec5
Introduce onResult2 in NameResolver Listener2 that returns Status (#11313)
Introducing NameResolver listener method "Status Listener2::onResult2(ResolutionResult)" that returns Status of the acceptance of the name resolution by the load balancer, and the Name Resolver will call this method for both success and error cases.
2024-07-26 15:43:36 +05:30
Eric Anderson b108ed3ddf
api: Give instruments a toString() including their name
This makes it much easier when testing to understand what the
values/arguments are at various parts of the code.
2024-07-24 21:30:10 -07:00
Eric Anderson 7ba293f49f
Upgrade ErrorProne Core to 2.28.0 2024-07-12 14:59:20 -07:00
cooper 25a8b7c507 Support setting onReadyThreshold through AbstractStub
Copy the onReadyThreshold property when copying CallOptions (bug fix)
2024-06-27 09:36:49 -07:00
Vindhya Ningegowda 4849e0a191
core: Add label values size validation in MetricRecorder (#11306)
Enhance MetricRecorder: Validate label values count against registered label keys count for default record APIs
2024-06-21 17:06:55 -07:00
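The validation described in the commit above amounts to comparing the number of supplied label values with the number of registered label keys before recording. A minimal sketch, assuming labels are passed as lists of strings. `checkLabelCount` is a hypothetical helper, not the MetricRecorder API.

```java
import java.util.List;

final class LabelValidationSketch {
  // Rejects a record call whose label values do not line up one-to-one with the registered keys.
  static void checkLabelCount(String kind, List<String> registeredKeys, List<String> values) {
    if (registeredKeys.size() != values.size()) {
      throw new IllegalArgumentException(
          "Incorrect number of " + kind + " label values: expected "
              + registeredKeys.size() + " but got " + values.size());
    }
  }
}
```

Failing at the call site makes a mismatch visible immediately, instead of surfacing later as mislabeled data in a metrics backend.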
Terry Wilson 85ed053006
api: Stabilize ServerBuilder.addServices() (#11285) 2024-06-13 13:06:01 -07:00
Eric Anderson 960012d76e api: Add ClientStreamTracer.inboundHeaders(Metadata)
This will be used by the metadata exchange of CSM. When recording
per-attempt metrics, we really need per-attempt data and can't leverage
ClientInterceptors.
2024-05-24 11:28:40 -07:00
Eric Anderson 7a663f633c api: Hide internal metric APIs
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.

All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
2024-05-08 10:24:24 -07:00
Eric Anderson 54ac06ae30 rls: Add metric test with real channel 2024-05-07 10:06:46 -07:00
hakusai22 6ec744f2a0
Fix various typos (#11144) 2024-05-06 20:29:44 -07:00
Eric Anderson 354b028cae
Add gauge metric API and Otel implementation
This is needed by gRFC A78 for xds metrics, and for RLS metrics. Since
gauges need to acquire a lock (or other synchronization) in the
callback, the callback allows batching multiple gauges together to avoid
repeatedly acquiring such locks.

Unlike other metrics, gauges are reported on-demand to the MetricSink.
This means not all sinks will receive the same data, as the sinks will
ask for the gauges at different times.
2024-05-06 11:38:04 -07:00
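The batching rationale in the commit above can be sketched as a single callback that reports several gauges under one lock acquisition. `BatchRecorder` and `ConnectionStats` are hypothetical types, not the gRPC metric API, and the gauge names are used only as sample labels.

```java
final class BatchGaugeSketch {
  // Hypothetical sink-facing interface: one call per gauge value within a batch.
  interface BatchRecorder {
    void record(String gaugeName, long value);
  }

  // Hypothetical component whose gauge values are guarded by a single lock.
  static final class ConnectionStats {
    private final Object lock = new Object();
    private long connected;
    private long resources;
    int reportLockAcquisitions; // counted only to illustrate the batching effect

    void update(long connected, long resources) {
      synchronized (lock) {
        this.connected = connected;
        this.resources = resources;
      }
    }

    // Batch callback: both gauges are read under one lock acquisition, so they
    // form a consistent snapshot and the lock is not taken once per gauge.
    void reportGauges(BatchRecorder recorder) {
      synchronized (lock) {
        reportLockAcquisitions++;
        recorder.record("grpc.xds_client.connected", connected);
        recorder.record("grpc.xds_client.resources", resources);
      }
    }
  }
}
```

Because the callback runs only when a sink asks for values, two sinks polling at different times may see different snapshots, which matches the on-demand behavior the commit describes.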
Eric Anderson ca35577327 Add internal channel builder API to get target
This will be used for gRFC A66's OTel per-RPC metric label:

> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`).

The majority of the changes are to move target computation from
ManagedChannelImpl into the builder. A small hack API was added to
ManagedChannelBuilder to get the target to create an interceptor.
2024-05-06 10:53:46 -07:00
Eric Anderson c368a0f9f8
Migrate GlobalInterceptors to ConfiguratorRegistry
This should preserve all the existing behavior of GlobalInterceptors as
used by grpc-gcp-observability, including it disabling the implicit
OpenCensus integration.

Both the old and new APIs are internal. I hid Configurator and
ConfiguratorRegistry behind Internal-prefixed classes, as had been
done with GlobalInterceptors, to further discourage use until the API is
ready.

GlobalInterceptorsTest was modified to become ConfiguratorRegistryTest.
2024-05-06 07:27:41 -07:00