This PR improves the observability of the xDS workflow to some extent. Users can set the level of the Java logger `io.grpc.xds` (or `io.grpc.xds.XdsLogger`) to enable different log verbosities.
Logging verbosity levels:
- FINE: mostly silent unless something abnormal happens, such as the xDS RPC stream being closed.
- FINER: informative log messages showing the main xDS workflow happening under the hood.
- FINEST: verbose log messages for debugging; original RPC messages and data types are printed.
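As a minimal sketch, the level can be raised programmatically with plain `java.util.logging`. The class name `XdsLoggingSetup` is hypothetical; only the logger name `io.grpc.xds` comes from this PR. Because the default root `ConsoleHandler` only prints INFO and above, the sketch attaches its own handler so FINE/FINER/FINEST records actually reach the console.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that turns on xDS logging at a chosen verbosity. */
final class XdsLoggingSetup {
  // Keep a strong reference: java.util.logging may garbage-collect an
  // unreferenced logger, which would discard the level configured here.
  private static final Logger XDS_LOGGER = Logger.getLogger("io.grpc.xds");

  /** Enables xDS log output at FINE, FINER or FINEST. Call once at startup. */
  static Logger enableXdsLogging(Level level) {
    XDS_LOGGER.setLevel(level);
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(level); // the handler filters too, not only the logger
    XDS_LOGGER.addHandler(handler);
    // Avoid printing INFO+ records a second time through the root handler.
    XDS_LOGGER.setUseParentHandlers(false);
    return XDS_LOGGER;
  }

  private XdsLoggingSetup() {}
}
```

Child loggers such as `io.grpc.xds.XdsLogger` inherit the effective level from `io.grpc.xds`. The same result can be had without code through a `logging.properties` file containing `io.grpc.xds.level=FINER` and `java.util.logging.ConsoleHandler.level=FINER`.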
Previously, the internal prod test name resolver returned grpclb balancer addresses in `ResolutionResult.addresses`, so we had filtering code to keep those addresses from being used. We have since changed the internal resolver so that it never mixes grpclb balancer addresses with normal backend addresses. Therefore, this code is no longer needed.
Fixes the load reporting integration, which was broken by flaws in the LRS design.
- Updated the LRS protocol. The Node sent in LRS requests uses a special metadata entry, "PROXYLESS_CLIENT_HOSTNAME", whose value is the hostname (including port) used to create the gRPC channel. From it, the management server can infer the clusters the gRPC client may send load to, so the initial LRS request does not need to list the clusters it wants to report load for.
- Each ClusterStats message in LRS requests represents the loads for one (cluster, cluster_service) pair, where the cluster_service field is optional. The EDS LB policy should track loads per (cluster, cluster_service) and take the cluster name from the upstream CDS policy.
- Modified CdsUpdate, the converted form of a CDS response. Its edsServiceName field can be null when a CDS response does not provide it; we want to preserve the null value for LRS requests.
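Tracking loads per (cluster, cluster_service) while keeping a missing cluster service as `null` (rather than, say, an empty string) can be sketched as below. `ClusterStatsKey` and `LoadTracker` are hypothetical names, and counting only completed requests is a simplification of what a real load store records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key for one ClusterStats entry: cluster plus optional cluster service. */
final class ClusterStatsKey {
  final String cluster;
  final String clusterService; // null when the CDS response gave no edsServiceName

  ClusterStatsKey(String cluster, String clusterService) {
    this.cluster = Objects.requireNonNull(cluster, "cluster");
    this.clusterService = clusterService;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ClusterStatsKey)) {
      return false;
    }
    ClusterStatsKey other = (ClusterStatsKey) o;
    // Objects.equals treats two null cluster services as equal.
    return cluster.equals(other.cluster) && Objects.equals(clusterService, other.clusterService);
  }

  @Override
  public int hashCode() {
    return Objects.hash(cluster, clusterService);
  }
}

/** Hypothetical per-(cluster, cluster_service) counter of completed requests. */
final class LoadTracker {
  private final Map<ClusterStatsKey, Long> completedRequests = new HashMap<>();

  void recordCompletedRequest(String cluster, String clusterService) {
    completedRequests.merge(new ClusterStatsKey(cluster, clusterService), 1L, Long::sum);
  }

  long completedRequests(String cluster, String clusterService) {
    return completedRequests.getOrDefault(new ClusterStatsKey(cluster, clusterService), 0L);
  }
}
```

Keeping `null` distinct from any real service name lets the LRS request omit cluster_service entirely instead of sending a placeholder value.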
The current implementation of client-side load reporting is incorrect. Mainly, each gRPC channel should have at most one LRS stream, given the current design of using a single management server. In this change:
- Each LoadStatsStore instance is associated with clusterName:clusterServiceName. Both clusterName and clusterServiceName (nullable) are required to construct a LoadStatsStore instance.
- The semantics are that a LoadStatsStore is responsible for recording loads sent to that cluster service of the cluster. The load report queried via LoadStatsStore#generateLoadReport() has cluster_name and cluster_service_name (if not null) set.
- A LoadReportClient is responsible for reporting loads for all clusters. LoadStatsStore instances are added to it via LoadReportClient#addLoadStatsStore(clusterName, clusterServiceName, loadStatsStore). This must be done before LoadReportClient#startLoadReporting() is called, due to the above open question.
- An XdsClient contains a single LoadReportClient instance. Its API XdsClient#reportClientStats(clusterName, clusterServiceName, loadStatsStore) calls LoadReportClient#addLoadStatsStore(clusterName, clusterServiceName, loadStatsStore) and then starts load reporting. XdsClient#cancelClientStatsReport(clusterName, clusterServiceName) calls LoadReportClient#removeLoadStatsStore(clusterName, clusterServiceName) and then stops it. LoadReportClient#addLoadStatsStore(clusterName, clusterServiceName, loadStatsStore) cannot be called repeatedly: once load reporting has started, we cannot change the cluster to report loads for. However, we can report, then cancel, then report, then cancel, and so on.
- Refactored EdsLoadBalancer a bit to accommodate the new APIs for enabling/disabling load reporting. Each ClusterEndpointsLoadBalancer instance carries its own LoadStatsStore and controls starting/canceling load reporting.
- The LoadReportClient interface is eliminated. LoadReportClient is now entirely a subcomponent of XdsClient.
Note: Currently we assume no cluster/EDS service switch, which means we report load for a single cluster/EDS service. So we restrict the LoadReportClient#addLoadStatsStore() API so that it cannot be called after load reporting has already started. This restriction will be removed after the above open question is resolved.
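The add/start and remove/stop lifecycle, including the restriction on adding stores once reporting has started, can be modeled roughly as follows. Every class here is a hypothetical stand-in (`FakeLoadStatsStore`, `SketchLoadReportClient`, `SketchXdsClient`); the method names mirror the ones listed above, but no LRS stream is actually opened.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical placeholder for LoadStatsStore. */
final class FakeLoadStatsStore {}

/** Hypothetical model of LoadReportClient's store registry and started flag. */
final class SketchLoadReportClient {
  // Arrays.asList permits null elements, so a null cluster service is a valid key part.
  private final Map<List<String>, FakeLoadStatsStore> stores = new HashMap<>();
  private boolean started;

  void addLoadStatsStore(String clusterName, String clusterServiceName, FakeLoadStatsStore store) {
    // The set of reported clusters is fixed while reporting is running.
    if (started) {
      throw new IllegalStateException("load reporting already started");
    }
    stores.put(Arrays.asList(clusterName, clusterServiceName), store);
  }

  void removeLoadStatsStore(String clusterName, String clusterServiceName) {
    stores.remove(Arrays.asList(clusterName, clusterServiceName));
  }

  void startLoadReporting() { started = true; } // a real client would open the LRS stream
  void stopLoadReporting() { started = false; } // ...and close it here

  boolean isStarted() { return started; }
  int storeCount() { return stores.size(); }
}

/** Hypothetical XdsClient wrapper: exactly one LoadReportClient per XdsClient. */
final class SketchXdsClient {
  final SketchLoadReportClient lrsClient = new SketchLoadReportClient();

  void reportClientStats(String clusterName, String clusterServiceName, FakeLoadStatsStore store) {
    lrsClient.addLoadStatsStore(clusterName, clusterServiceName, store);
    lrsClient.startLoadReporting();
  }

  void cancelClientStatsReport(String clusterName, String clusterServiceName) {
    lrsClient.removeLoadStatsStore(clusterName, clusterServiceName);
    lrsClient.stopLoadReporting();
  }
}
```

Under this model a report/cancel/report/cancel cycle works, while a second add during an active report fails fast.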
Previously, when the EDS service name changed, the old endpoint watcher was canceled immediately, even during the graceful switch period, so the old ClusterEndpointsBalancer would not receive any new updates. This behavior is less clean than canceling the old watch only once the old ClusterEndpointsBalancer is shut down.
Previously, when CdsConfig changed, the old cluster watcher was canceled immediately, even during the graceful switch period, so the old cluster balancer would not receive any new updates. This behavior is less clean than canceling the old watch only once the old cluster balancer is shut down.
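The revised approach, where each child balancer owns its watch and cancels it only in its own shutdown, can be sketched as below. `WatchRegistry` and `ChildBalancer` are hypothetical; they stand in for the XdsClient watch API and for ClusterEndpointsBalancer or the cluster balancer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for the XdsClient watch API. */
final class WatchRegistry {
  private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

  void watch(String resourceName, Consumer<String> watcher) {
    watchers.computeIfAbsent(resourceName, k -> new ArrayList<>()).add(watcher);
  }

  void cancelWatch(String resourceName, Consumer<String> watcher) {
    List<Consumer<String>> list = watchers.get(resourceName);
    if (list != null) {
      list.remove(watcher);
      if (list.isEmpty()) {
        watchers.remove(resourceName);
      }
    }
  }

  void deliver(String resourceName, String update) {
    // Iterate over a copy so a watcher may cancel itself during delivery.
    for (Consumer<String> w : new ArrayList<>(watchers.getOrDefault(resourceName, Collections.emptyList()))) {
      w.accept(update);
    }
  }

  boolean hasWatchers(String resourceName) {
    return watchers.containsKey(resourceName);
  }
}

/** Hypothetical child balancer that owns its watch for its whole lifetime. */
final class ChildBalancer {
  final List<String> received = new ArrayList<>();
  private final WatchRegistry registry;
  private final String resourceName;
  private final Consumer<String> watcher = received::add;

  ChildBalancer(WatchRegistry registry, String resourceName) {
    this.registry = registry;
    this.resourceName = resourceName;
    registry.watch(resourceName, watcher);
  }

  /** The watch is canceled here, not when a replacement balancer is created. */
  void shutdown() {
    registry.cancelWatch(resourceName, watcher);
  }
}
```

During a graceful switch, the old balancer keeps receiving updates until the switch completes and it is shut down.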
Although the current LRS client API takes load stats data for multiple cluster services, it expects the management server to ask for loads for only a single cluster service (the LRS response is ignored if the management server asks for more than one). This change removes that assumption/restriction: the loads actually reported are for the intersection of the services we have loads for and the services the management server asks for.
This change also cleans up the LRS client's tests.
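The selection rule reduces to a set intersection. In this sketch, `LoadReportSelection.servicesToReport` is a hypothetical name, and cluster services are represented as plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for choosing which cluster services to include in an LRS report. */
final class LoadReportSelection {
  /**
   * Returns the services that we have loads for AND that the management server
   * asked for. Iteration order follows servicesWithLoads.
   */
  static Set<String> servicesToReport(Set<String> servicesWithLoads, Set<String> requestedByServer) {
    Set<String> result = new LinkedHashSet<>(servicesWithLoads);
    result.retainAll(requestedByServer); // drop anything the server did not ask for
    return result;
  }

  private LoadReportSelection() {}
}
```

Services the server asks for but that have no recorded loads are simply absent from the report, rather than causing the whole LRS response to be ignored.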
We noticed that load_balancer.proto had local changes introduced
in #6549. Bazel did not catch this because grpclb was not using
the io_grpc_grpc_proto repository. These issues have been fixed.
In the v1.27.0 release, the grpc-interop-testing artifact in Maven lists grpc-xds as a dependency, but grpc-xds is not yet published. It should be removed from the dependency list of the Maven artifact.
Creates an internal accessor for attribute keys in the grpclb package, which name resolver implementations use to set balancer addresses as name resolution result attributes.
First take on grpclb selection stabilization:
1. Changed DnsNameResolver to return balancer addresses as a GrpcAttributes.ATTR_LB_ADDRS attribute in ResolutionResult, instead of among the addresses.
2. AutoConfiguredLoadBalancerFactory decides the LB policy solely based on the parsed service config, without looking at resolved addresses. Behavior changes:
- If no LB policy is specified in the service config, it defaults to pick_first, even if balancer addresses exist (in attributes).
- If grpclb is specified but not available, and no other specified policy is available, it fails without falling back to round_robin.
3. GrpclbLoadBalancer populates balancer addresses from the ResolvedAddresses attribute (GrpclbConstants.ATTR_LB_ADDRS) instead of sieving them from the addresses.
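The policy decision in step 2 can be sketched as a pure function of the service config's policy list and the set of registered policies. `LbPolicySelector` is hypothetical, and throwing `IllegalStateException` stands in for however the real factory reports the failure.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of choosing an LB policy from the service config only. */
final class LbPolicySelector {
  static final String DEFAULT_POLICY = "pick_first";

  /**
   * Returns the first configured policy that is available. Resolved addresses
   * (including any balancer addresses in attributes) are deliberately ignored.
   */
  static String selectPolicy(List<String> configuredPolicies, Set<String> availablePolicies) {
    if (configuredPolicies == null || configuredPolicies.isEmpty()) {
      return DEFAULT_POLICY; // even if balancer addresses were resolved
    }
    for (String policy : configuredPolicies) {
      if (availablePolicies.contains(policy)) {
        return policy;
      }
    }
    // No implicit fallback to round_robin when nothing specified is available.
    throw new IllegalStateException("None of " + configuredPolicies + " is available");
  }

  private LbPolicySelector() {}
}
```

Keeping the decision independent of the addresses means the chosen policy no longer flips just because a resolver happened to return balancer addresses.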
Implements the precise logic for choosing the virtual host in the RouteConfiguration of RDS responses. Specifically, fixes the logic for the domain search order. Also includes a minor fix for checking the match field in RouteConfiguration. See the RouteConfiguration Proto section in the gRPC Client xDS API Flow design doc for the specification.
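A sketch of domain matching, assuming the search order Envoy documents for `VirtualHost.domains`: exact match, then suffix wildcard (`*.foo.com`), then prefix wildcard (`foo.*`), then the universal `*`, with the longest pattern winning within a category and a wildcard never matching an empty string. The design doc remains the authority; `VirtualHostMatcher` and the map-of-domains representation are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical virtual host selection by domain pattern. */
final class VirtualHostMatcher {
  // Match categories, lower is better.
  private static final int EXACT = 0, SUFFIX = 1, PREFIX = 2, UNIVERSE = 3, NO_MATCH = 4;

  private static int matchType(String pattern, String host) {
    if (pattern.equals("*")) {
      return UNIVERSE;
    }
    if (pattern.startsWith("*")) { // e.g. "*.foo.com"
      String fixed = pattern.substring(1);
      // The wildcard must cover at least one character.
      return host.length() > fixed.length() && host.endsWith(fixed) ? SUFFIX : NO_MATCH;
    }
    if (pattern.endsWith("*")) { // e.g. "foo.*"
      String fixed = pattern.substring(0, pattern.length() - 1);
      return host.length() > fixed.length() && host.startsWith(fixed) ? PREFIX : NO_MATCH;
    }
    return pattern.equals(host) ? EXACT : NO_MATCH;
  }

  /** Returns the name of the best matching virtual host, or null if none matches. */
  static String findVirtualHost(Map<String, List<String>> domainsByVirtualHost, String host) {
    String target = host.toLowerCase(Locale.ROOT); // domains compare case-insensitively
    String best = null;
    int bestType = NO_MATCH;
    int bestLength = -1;
    for (Map.Entry<String, List<String>> vh : domainsByVirtualHost.entrySet()) {
      for (String domain : vh.getValue()) {
        String pattern = domain.toLowerCase(Locale.ROOT);
        int type = matchType(pattern, target);
        if (type == NO_MATCH) {
          continue;
        }
        // Better category wins; within a category, the longer pattern wins.
        if (type < bestType || (type == bestType && pattern.length() > bestLength)) {
          best = vh.getKey();
          bestType = type;
          bestLength = pattern.length();
        }
      }
    }
    return best;
  }

  private VirtualHostMatcher() {}
}
```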
These methods were used to migrate the Java toolchains to use toolchain
resolution. Now that the migration is complete, the toolchain providers
can be used directly.