What traffic mirroring is and how to use this feature across clusters at the service mesh layer - Alibaba Cloud Service Mesh

You can use the traffic mirroring feature to mirror production traffic to a test cluster or test service version. Testing that uses the mirrored production traffic mitigates risks involved in version changes without affecting the production environment. This topic describes what traffic mirroring is and how to use this feature across clusters at the service mesh layer.

What is traffic mirroring?

The microservice architecture speeds up application development and deployment, but changing service versions carries risks. Service Mesh (ASM) provides the traffic mirroring feature to mitigate these risks. The feature, also called traffic shadowing, sends a copy of production traffic to a mirrored service in real time. The mirrored traffic is handled out of band of the critical request path for the production service. When traffic is mirrored, the requests that are sent to the mirrored service version have the -shadow suffix appended to their Host/Authority headers. For example, a request whose Host header is myapp reaches the mirrored service with the Host header myapp-shadow. This distinguishes mirrored traffic from production traffic. You can use the feature to mirror production traffic to a test cluster or test service version before the cluster or service version goes live in the production environment. This mitigates the risks involved in version changes.

Benefits

Benefit	Description
Lower-risk version deployment with a more production-like test environment	You can copy production traffic to a test cluster or test service version and run tests with real use cases and traffic. More accurate test results reduce deployment risks in the production environment.
Unaffected production environment	The mirrored traffic happens out of band of the critical request path for the production service. Any issues caused by the mirrored traffic do not affect the production environment. Requests are mirrored as "fire and forget", which means that the responses are discarded.

Scenarios

Traffic mirroring allows you to test a service that is running in the production environment without affecting the end users. You can perform benchmark testing for two versions of a service to determine whether the new version can process inbound requests in the same way as the existing version.

The following table describes typical scenarios where you can use traffic mirroring.

Scenario	Description
Production traffic mirroring for trial runs and simulation tests	You can mirror traffic from a production cluster to a test cluster for testing. This does not affect the critical request path in the production environment. For example, when you replace an old system with a new one or transform it into a new one, you can mirror the production traffic of the old system to the new system for a trial run. If you want to try out an experimental architecture adjustment, you can also mirror the production traffic for simulation tests.
New version verification	You can compare the output of production traffic and mirrored traffic in real time, and use the mirrored traffic in drills before you release a new service. All production traffic can be mirrored. Traditional manual drills rely on sample data, which makes it hard to predict how a service will respond to production traffic. With mirrored production traffic, you can simulate all the situations that occur in the production environment, such as abnormal special characters and tokens used in malicious attacks. This helps you understand the processing and troubleshooting capabilities of the service to be released.
Isolation of database data from test data	If you want to test data processing performance, you can import test data to an empty database and then mirror production traffic to this test database. This isolates test data from data in the production database.
Running service troubleshooting	When an unexpected issue occurs in a running service, the issue is often hard to reproduce on an on-premises network. In this case, you can start a temporary service and mirror traffic from the running service to the temporary service for debugging. This troubleshooting method does not affect the running service.
User behavior logging	Samples and data are critical for recommendation system algorithms. The biggest challenge of traditional automated testing for algorithm-dependent applications is the lack of real-world user behavior data. Traffic mirroring allows you to store user behavior data in logs. The log data can be used in simulation tests for building recommendation system algorithms. It can also be used as a big data source for user profile analysis.

Sample code for using traffic mirroring

The following example YAML file shows how to use traffic mirroring in Istio. In the example, VirtualService routes all traffic to the v1 subset and mirrors the traffic to the v1-mirroring subset. When requests are sent to the v1 subset, the requests are copied and sent to the v1-mirroring subset.

After you send requests to the application, you can view the application logs. The logs show that the response comes from the v1 subset and that the requests are also mirrored to the v1-mirroring subset.

```
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-traffic-mirroring
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp.default.svc.cluster.local
            port:
              number: 8000
            subset: v1
          weight: 100
      mirror:
        host: myapp.default.svc.cluster.local
        port:
          number: 8000
        subset: v1-mirroring
```
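
The v1 and v1-mirroring subsets referenced by the VirtualService are resolved through a DestinationRule for the same host. The source does not show that rule, so the following is only a minimal sketch. The resource name and the pod labels (version: v1 and version: v1-mirroring) are assumptions and must match the labels on your own workloads.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp                    # hypothetical name
spec:
  host: myapp.default.svc.cluster.local
  subsets:
    - name: v1                   # subset that serves production traffic
      labels:
        version: v1              # assumed pod label
    - name: v1-mirroring         # subset that receives the mirrored copy
      labels:
        version: v1-mirroring    # assumed pod label
```

If a subset name used in the VirtualService has no matching entry in a DestinationRule, the route cannot be resolved to a set of pods, so both names should be defined before the mirroring rule is applied.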

Enable traffic mirroring across clusters

Traffic mirroring at the service mesh layer is mostly used to mirror production traffic to an environment that is about to be released. Therefore, cross-cluster traffic mirroring is common. In this example, Cluster A is the production environment and Cluster B is the test environment. Requests are sent to Cluster A, and the ingress gateway in Cluster A mirrors the traffic to Cluster B.

Figure: Traffic mirroring at the service layer within a cluster

Step 1: Deploy a sample application service in Cluster B

Create an httpbin.yaml file that contains the following content:

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
    service: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      serviceAccountName: httpbin
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
---
```

Run the following command in Cluster B to deploy version v1 of the httpbin application:
```
kubectl apply -f httpbin.yaml
```

Step 2: Configure a routing rule for the ingress gateway in Cluster B

Create an httpbin-gateway.yaml file that contains the following content:

```
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /headers
    route:
    - destination:
        host: httpbin
        port:
          number: 8000
```

Run the following command to deploy the routing rule:
```
kubectl apply -f httpbin-gateway.yaml
```

Run the following command to access the ingress gateway in Cluster B and check whether the service works as expected:

```
curl http://{IP address of the ingress gateway in Cluster B}/headers
```

Sample output:

```
{
  "headers": {
    "Accept": "*/*",
    "Host": "47.99.XX.XX",
    "User-Agent": "curl/7.79.1",
    "X-Envoy-Attempt-Count": "1",
    "X-Envoy-External-Address": "120.244.XXX.XXX",
    "X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/default/sa/httpbin;Hash=158e4ef69876550c34d10e3bfbd8d43f5ab481b16ba0e90b4e38a2d53ac****;Subject=\"\";URI=spiffe://cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
  }
}
```

If output similar to the preceding sample is returned, the service works as expected.

Step 3: Configure an external access rule in the service mesh of Cluster A

The host of the mirrored service uses an external domain name. You need to create a service entry to specify the DNS resolution method of the host.

Create an httpbin-cluster-b.yaml file that contains the following content:

```
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-cluster-b
spec:
  hosts:
  - httpbin.mirror.cluster-b
  location: MESH_EXTERNAL
  ports:
  - number: 80   # Specifies the port of the ingress gateway in Cluster B.
    name: http
    protocol: HTTP
  resolution: STATIC
  endpoints:
  - address: 47.95.XX.XX # Specifies the IP address of the ingress gateway in Cluster B.
```

Run the following command to create the service entry:
```
kubectl apply -f httpbin-cluster-b.yaml
```
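
The service entry above uses STATIC resolution because the ingress gateway in Cluster B is reached through a fixed IP address. If that gateway is exposed through a domain name instead, one possible variant is to set the resolution field to DNS and put the domain name in the endpoint address. This is only a sketch under that assumption: gateway.cluster-b.example.com is a hypothetical placeholder and does not come from this topic.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-cluster-b
spec:
  hosts:
  - httpbin.mirror.cluster-b
  location: MESH_EXTERNAL
  ports:
  - number: 80   # Port of the ingress gateway in Cluster B.
    name: http
    protocol: HTTP
  resolution: DNS   # The proxy resolves the endpoint address through DNS.
  endpoints:
  - address: gateway.cluster-b.example.com   # Hypothetical domain name of the ingress gateway in Cluster B.
```

Use only one of the two variants, because both define the same host and resource name.
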
Create an httpbin-gateway.yaml file for Cluster A that contains the following content. The YAML configurations route all traffic to the v1 version of the httpbin service in Cluster A and mirror 50% of the traffic (the mirrorPercentage value) to the httpbin service in Cluster B. httpbin.mirror.cluster-b is the host that is defined in the preceding service entry and is used to reach the external service.
```
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  gateways:
    - httpbin-gateway
  hosts:
    - '*'
  http:
    - match:
        - uri:
            prefix: /headers
      mirror:
        host: httpbin.mirror.cluster-b
        port:
          number: 80
      mirrorPercentage:
        value: 50
      route:
        - destination:
            host: httpbin
            port:
              number: 8000
            subset: v1
```
Note: The traffic sent to httpbin.mirror.cluster-b is a copy of the traffic sent to the original destination. The only difference is that the -shadow suffix is appended to the Host/Authority header. In the preceding YAML configurations, the Host/Authority header of the mirrored traffic is therefore not httpbin.mirror.cluster-b, but the Host header of the original request with the -shadow suffix. The host field in the mirror section is used only to look up the destination address to which the mirrored traffic is forwarded. It does not replace the original Host header.
Run the following command to deploy the routing rule in Cluster A:
```
kubectl apply -f httpbin-gateway.yaml
```
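
The routing rule sends production traffic to the v1 subset of the httpbin service in Cluster A. This topic does not show how that subset is defined, so the following DestinationRule is a minimal sketch rather than the exact configuration used here. It assumes that the httpbin application is also deployed in Cluster A with the same version: v1 pod label that is used in Step 1.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin          # hypothetical name
spec:
  host: httpbin
  subsets:
  - name: v1             # subset referenced by the VirtualService in Cluster A
    labels:
      version: v1        # assumed to match the pod label of httpbin-v1 in Cluster A
```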

View the Envoy config dump of the ingress gateway pod in Cluster A. The route that matches the /headers prefix contains content similar to the following excerpt:

```
"routes": [
         {
          "match": {
           "prefix": "/headers",
           "case_sensitive": true
          },
          "route": {
           "cluster": "outbound|8000|v1|httpbin.default.svc.cluster.local",
           "timeout": "0s",
           "retry_policy": {
            "retry_on": "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes",
            "num_retries": 2,
            "retry_host_predicate": [
             {
              "name": "envoy.retry_host_predicates.previous_hosts",
              "typed_config": {
               "@type": "type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate"
              }
             }
            ],
            "host_selection_retry_max_attempts": "5",
            "retriable_status_codes": [
             503
            ]
           },
           "request_mirror_policies": [
            {
             "cluster": "outbound|80||httpbin.mirror.cluster-b",
             "runtime_fraction": {
              "default_value": {
               "numerator": 500000,
               "denominator": "MILLION"
              }
             },
             "trace_sampled": false
            }
           ],
```

In the preceding excerpt, the request_mirror_policies field specifies the policy for request traffic mirroring, the cluster field specifies the upstream cluster to which mirrored traffic is sent, and the runtime_fraction field specifies the fraction of traffic to be mirrored. The numerator field is set to 500000 and the denominator field to MILLION (1,000,000), so 500000/1000000 = 50% of requests are mirrored. This matches the mirrorPercentage value of 50 in the VirtualService.
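
To make the percentage arithmetic concrete, the following variant of the VirtualService in Cluster A mirrors a smaller share of traffic. The value 10 is an assumed example and is not part of this topic. By the same proportion as 50 and 500000, a value of 10 would be expected to appear in the config dump as a numerator of 100000 with a denominator of MILLION, but treat this as an inference to verify against your own config dump.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  gateways:
    - httpbin-gateway
  hosts:
    - '*'
  http:
    - match:
        - uri:
            prefix: /headers
      mirror:
        host: httpbin.mirror.cluster-b
        port:
          number: 80
      mirrorPercentage:
        value: 10   # assumed example value: 10% = 100000 / 1000000
      route:
        - destination:
            host: httpbin
            port:
              number: 8000
            subset: v1
```

A lower percentage reduces the load that mirroring places on the test cluster while still exercising it with real requests.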