Commit Graph

6814 Commits

Author SHA1 Message Date
Larry Safran 764a4e3f08
xds: Cleanup by moving methods in XdsDependencyManager ahead of classes (#11890)
* Move private methods ahead of classes
2025-02-11 14:34:46 -08:00
Larry Safran ade2dd2038
xds: Change XdsClusterConfig to have children field instead of endpoint (#11888)
* Change XdsConfig to match spec with a `children` object holding either `a list of leaf cluster names` or `an EdsUpdate`.  Removed intermediate aggregate nodes from `XdsConfig.clusters`.
2025-02-11 12:38:52 -08:00
Kannan J fc8571a0e5
Version upgrades (#11874) 2025-02-12 01:08:46 +05:30
Naveen Prasanna V 302342cfce
core: logging the error message when onClose() itself fails (#11880) 2025-02-11 11:38:07 +05:30
Benjamin Peterson dc316f7fd9 Add missing frame release to Http2ClientStreamTransportState.
If a data frame is received before headers, processing of the frame is abandoned. The frame must be released in that case.
2025-02-10 21:40:05 -08:00
Sergii Tkachenko bd6af59221
xds: improve code readability of server FilterChain parsing
- Improve code flow and variable names
- Reduce nesting
- Add comments between logical blocks
- Add comments explaining some xDS/gRPC nuances
2025-02-10 17:14:07 -08:00
Larry Safran 67fc2e156a
Add new classes for eliminating xds config tears (#11740)
* Framework definition to support A74
2025-02-07 16:33:17 -08:00
MV Shiva 90b1c4fe94
protobuf: Stabilize marshallerWithRecursionLimit (#11884) 2025-02-07 14:16:13 +05:30
Abhishek Agrawal 44e92e2c2c
core: updates the backoff range as per the A6 redefinition (#11858)
* core: updates the backoff range being used from [0, 1] to [0.8, 1.2] as per the A6 redefinition

* adds a flag for experimental jitter

* xds: Allow FaultFilter's interceptor to be reused

This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filter's interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.

* netty: Removed 4096 min buffer size (#11856)

* netty: Removed 4096 min buffer size

* turns the flag in a var for better efficiency

---------

Co-authored-by: Eric Anderson <ejona@google.com>
2025-02-05 14:18:20 -08:00
Alex Panchenko 3142928fa3
use gradle java-platform in grpc-bom; Fixes #5530 (#11875)
* use gradle java-platform in grpc-bom; Fixes #5530

* fix withXml

* explicitly exclude grpc-compiler
2025-02-05 11:10:39 -08:00
Eric Anderson 199a7ea3e8
xds: Improve XdsNR's selectConfig() variable handling
The variables from the do-while are no longer initialized to let the
compiler verify that the loop sets each. Unnecessary comparisons to null
are also removed and is more obvious as the variables are never set to
null. Added a minor optimization of computing the RPCs path once instead
of once for each route. The variable declarations were also sorted to
match their initialization order.

This does fix an unlikely bug where if the old code could successfully
matched a route but fail to retain the cluster, then when trying a
second time if the route was _not_ matched it would re-use the prior route
and thus infinite-loop failing to retain that same cluster.

It also adds a missing cast to unsigned long for a uint32 weight. The old
code would detect if the _sum_ was negative, but a weight using 32 bits
would have been negative and never selected.
2025-02-05 10:37:22 -08:00
Eric Anderson ea3f644eef
Replace Kokoro ARM build with GitHub Actions
The Kokoro aarch64 build runs on x86 with an emulator, and has always
been flaky due to the slow execution speed. At present it is continually
failing due to deadline exceededs. GitHub Actions is running on aarch64
hardware, so is much faster (4 minutes vs 30 minutes, without including
the speedup from GitHub Action's caching).
2025-02-04 10:51:13 -08:00
Alex Panchenko 4a10a38166
servlet: remove 4096 min buffer size
Similar to 7153ff8
2025-02-04 10:49:30 -08:00
Eric Anderson b1bc0a9d24 alts: Add ClientCall support to AltsContextUtil
This adds a createFrom(Attributes) to mirror the check(Attributes) added
in ba8ab79. It also adds conveniences for ClientCall for both
createFrom() and check(). This allows getting peer information from
ClientCall and CallCredentials.RequestInfo, as was already available
from ServerCall.

The tests were reworked to test the Attribute-based methods and then
only basic tests for client/server.

Fixes #11042
2025-01-30 16:10:19 -08:00
Eric Anderson 04f1cc5845 xds: Make XdsNR.RoutingConfig.empty a constant
The field was made final in 4b52639aa but was soon reverted in 3ebb3e192
because of what I assume was a bad merge conflict resolution. The field
has contained an immutable object since its introduction in d25f5acf1,
so it is pretty likely to remain a constant in the future.
2025-01-30 15:10:12 -08:00
Eric Anderson c506190b0f
xds: Reuse filter interceptors across RPCs
This moves the interceptor creation from the ConfigSelector to the
resource update handling.

The code structure changes will make adding support for filter
lifecycles (for RLQS) a bit easier. The filter lifecycles will allow
filters to share state across interceptors, and constructing all the
interceptors on a single thread will mean filters wouldn't need to be
thread-safe (but their interceptors would be thread-safe).
2025-01-30 12:43:51 -08:00
Eric Anderson 90aefb26e7 core: Propagate authority override from LB exactly once
Setting the authority is only useful when creating a real stream, as
there will be a following pick otherwise. In addition, DelayedStream
will buffer each call to setAuthority() in a list and we don't want that
memory usage. Note that no LBs are using this feature yet, so users
would not have been exposed to the memory use.

We also needed to setAuthority() when the LB selected a subchannel on
the first pick attempt.
2025-01-30 08:08:17 -08:00
Abhishek Agrawal 7153ff8522
netty: Removed 4096 min buffer size (#11856)
* netty: Removed 4096 min buffer size
2025-01-30 12:52:37 +05:30
Eric Anderson b3db8c2489 xds: Allow FaultFilter's interceptor to be reused
This is the only usage of PickSubchannelArgs when creating a filter's
ClientInterceptor, and a follow-up commit will remove the argument and
actually reuse the interceptors. Other filter's interceptors can
already be reused.

There doesn't seem to be any significant loss of legibility by making
FaultFilter a more ordinary interceptor, but the change does cause the
ForwardingClientCall to be present when faultDelay is configured,
independent of whether the fault delay ends up being triggered.

Reusing interceptors will move more state management out of the RPC path
which will be more relevant with RLQS.
2025-01-29 14:21:53 -08:00
vinodhabib 9e8629914f
examples: Added README files for all missing Examples (#11676) 2025-01-28 13:02:00 +05:30
Larry Safran 87aa6deadf
core:Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful (#11849)
* Have acceptResolvedAddresses() do a seek when in CONNECTING state and cleanup removed subchannels when a seek was successful.
Move cleanup of removed subchannels into a method so it can be called from 2 places in acceptResolvedAddresses.
Since the seek could mean we never looked at the first address, if we go off the end of the index and haven't looked at the all of the addresses then instead of scheduleBackoff() we reset the index and request a connection.
2025-01-24 16:42:56 -08:00
Boddu Harshavardhan 67351c0c53
gcp-observability: Optimize GcpObservabilityTest.enableObservability execution time (#11783) 2025-01-24 16:50:41 +05:30
MV Shiva bf8eb24a30
Update README etc to reference 1.70.0 (#11854) 2025-01-24 15:21:14 +05:30
Kannan J 0f5503ebb1
xds: Include max concurrent request limit in the error status for concurre… (#11845)
Include max concurrent request limit in the error status for concurrent connections limit exceeded
2025-01-23 21:40:21 +05:30
Eric Anderson 495a8906b2 xds: Fix fallback test FakeClock TSAN failure
d65d3942e increased the test speed of
connect_then_mainServerDown_fallbackServerUp by using FakeClock.
However, it introduced a data race because FakeClock is not thread-safe.
This change injects a single thread for gRPC callbacks such that
syncContext is run on a thread under the test's control.

A simpler approach would be to expose syncContext from XdsClientImpl for
testing. However, this test is in a different package and I wanted to
avoid adding a public method.

```
  Read of size 8 at 0x00008dec9d50 by thread T25:
    #0 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Lio/grpc/internal/FakeClock$ScheduledTask;JLjava/util/concurrent/TimeUnit;)V FakeClock.java:140
    #1 io.grpc.internal.FakeClock$ScheduledExecutorImpl.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;)Ljava/util/concurrent/ScheduledFuture; FakeClock.java:150
    #2 io.grpc.SynchronizationContext.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/ScheduledExecutorService;)Lio/grpc/SynchronizationContext$ScheduledHandle; SynchronizationContext.java:153
    #3 io.grpc.xds.client.ControlPlaneClient$AdsStream.handleRpcStreamClosed(Lio/grpc/Status;)V ControlPlaneClient.java:491
    #4 io.grpc.xds.client.ControlPlaneClient$AdsStream.lambda$onStatusReceived$0(Lio/grpc/Status;)V ControlPlaneClient.java:429
    #5 io.grpc.xds.client.ControlPlaneClient$AdsStream$$Lambda+0x00000001004a95d0.run()V ??
    #6 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:96
    #7 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:128
    #8 io.grpc.xds.client.ControlPlaneClient$AdsStream.onStatusReceived(Lio/grpc/Status;)V ControlPlaneClient.java:428
    #9 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V GrpcXdsTransportFactory.java:149
    #10 io.grpc.PartialForwardingClientCallListener.onClose(Lio/grpc/Status;Lio/grpc/Metadata;)V PartialForwardingClientCallListener.java:39
    ...

  Previous write of size 8 at 0x00008dec9d50 by thread T4 (mutexes: write M0, write M1, write M2, write M3):
    #0 io.grpc.internal.FakeClock.forwardTime(JLjava/util/concurrent/TimeUnit;)I FakeClock.java:368
    #1 io.grpc.xds.XdsClientFallbackTest.connect_then_mainServerDown_fallbackServerUp()V XdsClientFallbackTest.java:358
    ...
```
2025-01-22 16:00:00 -08:00
Eric Anderson fc86084df5 xds: Rename grpc.xds.cluster to grpc.lb.backend_service
The name is being changed to allow the value to be used in more metrics
where xds-specifics are awkward.
2025-01-17 17:16:32 -08:00
Eric Anderson a0a42fc8e6 Update README etc to reference 1.69.1 2025-01-17 13:30:22 -08:00
Eric Anderson f299ef5e58 compiler: Prepare for C++ protobuf using string_view
Protobuf is interested in using absl::string_view instead of const
std::string&. Just copy to std::string as the C++17 build isn't yet
operational and that level of performance doesn't matter.

cl/711732759 b/353571051
2025-01-17 10:44:01 -08:00
MV Shiva b44ebce45d
xds: Envoy proto sync to 2024-11-11 (#11816) 2025-01-17 14:58:52 +05:30
Larry Safran 4d8aff72dc
stub: Eliminate invalid test cases where different threads were calling close from the thread writing. (#11822)
* Eliminate invalid test cases where different threads were calling close from the thread writing.
* Remove multi-thread cancel/write test
2025-01-15 10:43:48 -08:00
Eric Anderson 87c7b7a375 interop-testing: Move soak out of AbstractInteropTest
The soak code grew considerably in 6a92a2a22e. Since it isn't a JUnit
test and doesn't resemble the other tests, it doesn't belong in
AbstractInteropTest. AbstractInteropTest has lots of users, including it
being re-compiled for use on Android, so moving it out makes the
remaining code more clear for the more common cases.
2025-01-14 13:28:10 -08:00
zbilun 87b27b1545
interop-testing: fix peer extraction issue in soak test iterations
This PR resolves an issue with peer address extraction in the soak
test. 

In current `TestServiceClient` implementation, the same
`clientCallCapture` atomic is shared across threads, leading to
incorrect peer extraction. This fix ensures that each thread uses a
local variable for capturing the client call.
2025-01-13 17:57:39 -08:00
Larry Safran 228dcf7a01
core: Alternate ipV4 and ipV6 addresses for Happy Eyeballs in PickFirstLeafLoadBalancer (#11624)
* Interweave ipV4 and ipV6 addresses as per gRFC.
2025-01-13 16:38:16 -08:00
Eric Anderson 7162d2d661 xds: Pass grpc.xds.cluster label to tracer
This is in service to gRFC A89. Since the gRFC isn't finalized this
purposefully doesn't really do anything yet. The grpc-opentelemetry
change to use this optional label will be done after the gRFC is merged.
grpc-opentelemetry currently has a hard-coded list (one entry) of labels
that it looks for, and this label will need to be added.

b/356167676
2025-01-13 11:54:36 -08:00
Larry Safran 176f3eed12
xds: Enable Xds Client Fallback by default (#11817) 2025-01-10 15:25:15 -08:00
Eric Anderson d65d3942e6 xds: Increase speed of fallback test
These changes reduce connect_then_mainServerDown_fallbackServerUp test
time from 20 seconds to 5 s by faking time for the the does-no-exist
timer.

XdsClientImpl only uses the TimeProvider for CSDS cache details, so any
implementation should be fine. FakeXdsClient provides an implementation,
so might as well use it as it is one less clock to think about.
2025-01-10 08:28:48 -08:00
Eric Anderson 82403b9bfa util: Increase modtime in AdvancedTlsX509TrustManagerTest
I noticed an old JDK 8u275 failed on the test because the modification
time's resolution was one second. A newer JDK 8u432 worked fine, so it's
not really a problem for me, but increasing the time difference is
cheap. I used two seconds as that's the resolution available on FAT
(which is unlikely to be TMPDIR, even on Windows).
2025-01-10 08:17:12 -08:00
Eric Anderson 70825adce6 Replace jsr305's GuardedBy with Error Prone's
We should avoid jsr305 and error prone's has the same semantics.
2025-01-10 08:16:48 -08:00
Eric Anderson 7b5d0692cc
Replace jsr305's CheckReturnValue with Error Prone's (#11811)
We should avoid jsr305 and error prone's has the same semantics.

Fixes #8687
2025-01-09 13:45:35 -08:00
Albumen Kevin 73721acc0d
Add UnitTest to verify updateTrustCredentials rotate (#11798)
* Add lastUpdateTime to avoid read
2025-01-09 11:53:36 -08:00
Eric Anderson e61b03cb9f netty: Fix getAttributes() data races in NettyClientTransportTest
Since approximately the LBv2 API (the current API) was introduced, gRPC
won't use a transport until it is ready. Long ago, transports could be
used before they were ready and these old tests were not waiting for the
negotiator to complete before starting. We need them to wait for the
handshake to complete to avoid a test-only data race in getAttributes()
noticed by TSAN.

Throwing away data frames in the Noop handshaker is necessary to act
like a normal handshaker; they don't allow data frames to pass until the
handshake is complete. Without the handling, it goes through invalid
code paths in NettyClientHandler where a terminated transport becomes
ready, and a similar data race.

```
  Write of size 4 at 0x00008db31e2c by thread T37:
    #0 io.grpc.netty.NettyClientHandler.handleProtocolNegotiationCompleted(Lio/grpc/Attributes;Lio/grpc/InternalChannelz$Security;)V NettyClientHandler.java:517
    #1 io.grpc.netty.ProtocolNegotiators$GrpcNegotiationHandler.userEventTriggered(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V ProtocolNegotiators.java:937
    #2 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Ljava/lang/Object;)V AbstractChannelHandlerContext.java:398
    #3 io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V AbstractChannelHandlerContext.java:376
    #4 io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(Ljava/lang/Object;)Lio/netty/channel/ChannelHandlerContext; AbstractChannelHandlerContext.java:368
    #5 io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1107
    #6 io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.channelActive(Lio/netty/channel/ChannelHandlerContext;)V ProtocolNegotiators.java:1011
    ...

  Previous read of size 4 at 0x00008db31e2c by thread T4 (mutexes: write M0, write M1, write M2, write M3):
    #0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
    #1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
    #2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
    #3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:953
    #4 io.grpc.netty.NettyClientTransportTest.huffmanCodingShouldNotBePerformed()V NettyClientTransportTest.java:631
    ...
```

```
  Read of size 4 at 0x00008f983a3c by thread T4 (mutexes: write M0, write M1):
    #0 io.grpc.netty.NettyClientHandler.getAttributes()Lio/grpc/Attributes; NettyClientHandler.java:345
    #1 io.grpc.netty.NettyClientTransport.getAttributes()Lio/grpc/Attributes; NettyClientTransport.java:387
    #2 io.grpc.netty.NettyClientTransport.newStream(Lio/grpc/MethodDescriptor;Lio/grpc/Metadata;Lio/grpc/CallOptions;[Lio/grpc/ClientStreamTracer;)Lio/grpc/internal/ClientStream; NettyClientTransport.java:198
    #3 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;Lio/grpc/Metadata;)V NettyClientTransportTest.java:973
    #4 io.grpc.netty.NettyClientTransportTest$Rpc.<init>(Lio/grpc/netty/NettyClientTransport;)V NettyClientTransportTest.java:969
    #5 io.grpc.netty.NettyClientTransportTest.handlerExceptionDuringNegotiatonPropagatesToStatus()V NettyClientTransportTest.java:425
    ...

  Previous write of size 4 at 0x00008f983a3c by thread T56:
    #0 io.grpc.netty.NettyClientHandler$FrameListener.onSettingsRead(Lio/netty/channel/ChannelHandlerContext;Lio/netty/handler/codec/http2/Http2Settings;)V NettyClientHandler.java:960
    ...
```
2025-01-07 10:25:22 -08:00
MV Shiva ae109727d3
Start 1.71.0 development cycle (#11801) 2025-01-07 14:33:09 +05:30
MV Shiva 1edc4d84d4
xds: Parsing xDS Cluster Metadata (#11741) 2025-01-07 10:03:13 +05:30
Larry Safran 4222f77587
xds:Move creating the retry timer in handleRpcStreamClosed to as late as possible and call close() (#11776)
* Move creating the retry timer in handleRpcStreamClosed to as late as possible and call `close` so that the `call` is cancelled.
Also add some debug logging.
2025-01-06 13:09:42 -08:00
Eric Anderson 6c12c2bd24 xds: Remember nonces for unknown types
If the control plane sends a resource type the client doesn't understand
at-the-moment, the control plane will still expect the client to include
the nonce if the client subscribes to the type in the future.

This most easily happens when unsubscribing the last resource of a type.
Which meant 1cf1927d1 was insufficient.
2025-01-06 11:54:35 -08:00
Eric Anderson 4a0f707331 xds: Avoid depending on io.grpc.xds.Internal* classes
Internal* classes should generally be accessors that are used outside of
the package/project. Only one attribute was used outside of xds, so
leave only that one attribute in InternalXdsAttributes. One attribute
was used by the internal.security package, so move the definition to the
same package to reduce the circular dependencies.
2025-01-03 16:01:10 -08:00
Eric Anderson 1cf1927d1a
xds: Preserve nonce when unsubscribing type
This fixes a regression introduced in 19c9b998.

b/374697875
2025-01-03 12:34:47 -08:00
Eric Anderson 9a712c3f77 xds: Make XdsClient.ResourceStore package-private
There's no reason to use the interface outside of
XdsClientImpl/ControlPlaneClient. Since XdsClientImpl implements the
interface directly, its methods are still public. That can be a future
cleanup.
2025-01-03 11:45:55 -08:00
Benjamin Peterson bac8b32043
Fix equality and hashcode of CancelServerStreamCommand. (#11785)
In e036b1b198, CancelServerStreamCommand got another field. But, its hashCode and equals methods were not updated.
2025-01-03 10:42:42 -08:00
Eric Anderson b272f634c1 Disable Gradle Module Metadata resolution
The module metadata in Guava causes the -jre version to be selected even
when you choose the -android version. Gradle did not give any clues that
this was happening, and while
`println(configurations.compileClasspath.resolve())` shows the different
jar in use, most other diagonstics don't. dependencyInsight can show you
this is happening, but only if you know which dependency has a problem
and read Guava's module metadata first to understand the significance of
the results.

You could argue this is a Guava-specific problem. I was able to get
parts of our build working with attributes and resolutionStrategy
configurations mentioned at
https://github.com/google/guava/releases/tag/v32.1.0 , so that only
Guava would be changed. But it was fickle giving poor error messages or
silently swapping back to the -jre version.

Given the weak debuggability, the added complexity, and the lack of
value module metadata is providing us, disabling module metadata for our
entire build seems prudent.

See https://github.com/google/guava/issues/7575
2025-01-03 09:29:31 -08:00