Istio-Based Exploration and the Practice of the Comprehensive-Procedure Canary Solution

This article discusses the core capabilities of ASM Pro, including traffic tagging, tag routing, and traffic fallback.

By Zeng Yuxing (Yuzeng)
Auditing & Proofreading by Zeng Yuxing (Yuzeng)

Background

In a microservice architecture, building a complete testing system to verify new business features before launch is time-consuming, and the difficulty grows as the number of microservices increases. Such a testing system also has high machine costs and must be maintained exclusively to keep feature verification efficient before each new application version is launched. As the business grows larger and more complex, multiple testing systems are required. This is a cost and efficiency challenge that the entire industry faces. If new features could be verified in the production system itself before the new version is launched, human and financial costs could be reduced significantly.

In addition to feature verification in the development phase, introducing canary release in the production environment helps control the risk and blast radius of a new software version. Canary release routes a certain proportion of production traffic with specific characteristics to the service version under verification, so you can observe whether the new version runs as expected after it is launched.

Alibaba Cloud ASM Pro provides a comprehensive-procedure canary solution built on Service Mesh that can help solve the problems in both of the preceding scenarios.

The following diagram shows the feature-oriented architecture of ASM Pro services:

[Figure 1: Feature-oriented architecture of ASM Pro services]

The diagram shows the core capabilities of ASM Pro, including traffic tagging, tag routing, and traffic fallback. We will discuss them in detail throughout this article.

Scenario

The following figure shows the common scenarios of the comprehensive-procedure canary release:

[Figure 2: Common scenarios of comprehensive-procedure canary release]

Take Bookinfo as an example. Inbound traffic carries the tag of the expected group. The Sidecar obtains the expected tag from the request context (a header or the protocol context) and routes the traffic to the corresponding tag group. If that tag group does not exist, the traffic falls back to the base group by default. You can configure the fallback policy as needed.

Inbound traffic is tagged at the gateway, for example, by a tagging plug-in. You can, for instance, add a canary tag to requests from users whose user IDs fall within a certain range. Because gateway implementations and choices vary widely across real environments, gateway implementation is not covered in this article.
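Gateway implementations are out of scope here, but the idea of a user-ID range rule can be illustrated. The following is a minimal Python sketch, assuming hypothetical header names (x-user-id for the user ID and x-asm-traffic-tag for the tag); it is not an ASM Pro or gateway API.

```python
from typing import Dict

# Hypothetical header names: ASM Pro reads the tag from whatever field the
# TrafficLabel CR points at; these two names are illustrative only.
USER_HEADER = "x-user-id"
TAG_HEADER = "x-asm-traffic-tag"


def tag_request(headers: Dict[str, str], low: int, high: int,
                tag: str = "canary") -> Dict[str, str]:
    """Return a copy of the request headers, adding `tag` when the
    numeric user ID falls in the inclusive range [low, high]."""
    tagged = dict(headers)  # never mutate the caller's headers
    user_id = headers.get(USER_HEADER, "")
    if user_id.isdigit() and low <= int(user_id) <= high:
        tagged[TAG_HEADER] = tag
    return tagged
```

For example, `tag_request({"x-user-id": "1500"}, 1000, 1999)` returns headers that include `x-asm-traffic-tag: canary`, while a request from user 42 passes through untagged.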

The following sections describe how to implement comprehensive-procedure traffic tagging and canary release with ASM Pro.

Implementation

[Figure 3: Inbound and outbound traffic path (p1 to p4) of an application with a Sidecar]

Inbound refers to the inbound traffic of requests sent to the application, and outbound refers to the outbound traffic of requests sent by the application.

The preceding figure shows a typical traffic path of a business application after the mesh is enabled. The application receives an external request p1 and then calls a service that it depends on. The request path is p1->p2->p3->p4: the Sidecar forwards p1 as p2 and forwards p3 as p4. To achieve comprehensive-procedure canary release, both p3 and p4 must obtain the traffic tag carried by p1 so that the request is routed to the backend service instances that correspond to the tag, and p3 and p4 must carry the same tag. The key is to pass tags through transparently, so that the application does not need to be aware of them (tag pass-through). ASM Pro achieves this by using the traceId of distributed tracing technologies such as OpenTracing and OpenTelemetry.

The distributed tracing analysis technology uses a traceId to uniquely identify a complete call trace. Fanout calls issued by each application on the trace carry the source traceId through the distributed tracing analysis SDK. The implementation of the comprehensive-procedure canary solution by using ASM Pro is based on this widely adopted practice of distributed application architecture.
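Because only the application sees both its inbound request and the calls it issues, the traceId has to be copied by the application (or its tracing SDK or agent). The sketch below simulates that step in Python; `fan_out` is a hypothetical function, and x-request-id is the trace header this article names.

```python
import uuid
from typing import Dict, List

# The article names x-request-id as the trace header the Sidecar uses;
# with Alibaba Cloud ARMS it would be x-b3-traceid instead.
TRACE_HEADER = "x-request-id"


def fan_out(inbound_headers: Dict[str, str], downstream: List[str]) -> List[Dict[str, str]]:
    """Simulate one application handling a request and calling several
    downstream services, copying the inbound traceId onto every outbound
    call, as a tracing SDK or Java Agent would."""
    # Start a new trace only when the caller did not send one.
    trace_id = inbound_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return [{"host": service, TRACE_HEADER: trace_id} for service in downstream]
```

For example, `fan_out({"x-request-id": "abc"}, ["details", "reviews"])` produces two outbound requests that both carry `x-request-id: abc`, which is the property the Sidecar relies on to associate them with the inbound request.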

From the Sidecar's point of view, the inbound and outbound traffic in the preceding figure are independent. The Sidecar cannot perceive the correspondence between them and cannot tell whether one inbound request causes multiple outbound requests. In other words, the Sidecar does not know whether p3 is related to p1.

In the comprehensive-procedure canary solution of ASM Pro, the Sidecar associates p1 and p3 by traceId, specifically by the trace header x-request-id. The Sidecar maintains a mapping table that records the correspondence between traceIds and tags. When the Sidecar receives p1, it stores the traceId and tag in the table. When the Sidecar receives p3, it looks up the tag for that traceId in the table and adds the tag to p4. The following figure shows this implementation principle:

[Figure 4: Sidecar associating p1 and p3 through a traceId-to-tag mapping table]
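The traceId-to-tag mapping table can be sketched as a small class. This is an illustration under stated assumptions, not the Sidecar's actual implementation: the class name, the tag header x-asm-traffic-tag, and the bounded, oldest-first eviction are all hypothetical, since the article does not say how entries expire.

```python
from collections import OrderedDict
from typing import Dict

TRACE_HEADER = "x-request-id"
TAG_HEADER = "x-asm-traffic-tag"  # hypothetical field that carries the tag


class TraceTagTable:
    """Sketch of the Sidecar's traceId -> tag mapping table."""

    def __init__(self, max_entries: int = 10_000) -> None:
        # Bounded size with oldest-first eviction is an assumption.
        self._table: "OrderedDict[str, str]" = OrderedDict()
        self._max_entries = max_entries

    def on_inbound(self, headers: Dict[str, str]) -> None:
        """p1 arrives: remember the tag carried by this trace."""
        trace_id = headers.get(TRACE_HEADER)
        tag = headers.get(TAG_HEADER)
        if trace_id and tag:
            self._table[trace_id] = tag
            self._table.move_to_end(trace_id)
            if len(self._table) > self._max_entries:
                self._table.popitem(last=False)  # evict the oldest stored entry

    def on_outbound(self, headers: Dict[str, str]) -> Dict[str, str]:
        """p3 leaves the app: look up the tag by traceId and attach it to p4."""
        forwarded = dict(headers)
        tag = self._table.get(headers.get(TRACE_HEADER, ""))
        if tag is not None:
            forwarded[TAG_HEADER] = tag
        return forwarded
```

The application never touches the tag: it only forwards the traceId, and the table lets the Sidecar restore the tag on the outbound side.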

In other words, the comprehensive-procedure canary feature of ASM Pro requires applications to use distributed tracing. Applications that do not use distributed tracing yet must be modified to adopt it before they can use the canary feature. Java applications can avoid code changes by using a Java Agent that passes traceIds from inbound to outbound traffic through Aspect-Oriented Programming (AOP).

Traffic Tagging

ASM Pro introduces a new TrafficLabel CRD that defines where the Sidecar obtains the traffic tag to be passed through. The following sample YAML file defines the source of the traffic tag and specifies that the tag is stored in OpenTracing (specifically, the x-trace header). In the example, the traffic tag is named trafficLabel, and its value is obtained, in order, from $getContext(x-request-id) and then from $(localLabel) of the local environment.

```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: TrafficLabel
metadata:
  name: default
spec:
  rules:
  - labels:
      - name: trafficLabel
        valueFrom:
        - $getContext(x-request-id)  # If Alibaba Cloud ARMS is used, use x-b3-traceid instead.
        - $(localLabel)
    attachTo:
    - opentracing
    # Protocols the rule applies to. Empty means no protocol; an asterisk (*) means all protocols.
    protocols: "*"
```

The CR definition consists of two parts, namely tag acquisition and storage.

  • Acquisition logic: The Sidecar first looks for the traffic tag in the fields defined in the protocol context or header. If the tag is not found there, the Sidecar uses the traceId to look it up in its locally recorded map, which stores the mapping between traceIds and traffic tags. If a mapping is found, the traffic is tagged with the corresponding tag. Otherwise, the traffic tag is set to localLabel, which is the value of the ASM_TRAFFIC_TAG label of the local deployment.

The deployment label that identifies the local environment is ASM_TRAFFIC_TAG. In practice, this label can be set by your CI/CD system at deployment time.

  • Storage logic: The attachTo parameter specifies the field of the protocol context in which the traffic tag is stored, for example, a header for HTTP or the RPC context for Dubbo. You can configure which fields store the traffic tag.
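The acquisition order and the attachTo storage step can be expressed as two small Python functions. The function names and parameters are hypothetical; the sketch only mirrors the lookup order described above (context or header, then the traceId map, then localLabel).

```python
from typing import Dict


def acquire_label(context: Dict[str, str], label_field: str, trace_field: str,
                  trace_map: Dict[str, str], local_label: str) -> str:
    """Resolve the traffic tag in the order the acquisition logic describes."""
    # 1. The tag is already present in the protocol context or header.
    if context.get(label_field):
        return context[label_field]
    # 2. The tag was recorded earlier for this traceId.
    trace_id = context.get(trace_field)
    if trace_id in trace_map:
        return trace_map[trace_id]
    # 3. Fall back to the ASM_TRAFFIC_TAG label of the local deployment.
    return local_label


def store_label(context: Dict[str, str], attach_field: str, label: str) -> Dict[str, str]:
    """Write the tag into the configured field (for example, an HTTP header)."""
    updated = dict(context)
    updated[attach_field] = label
    return updated
```

Keeping localLabel as the last resort means a request that enters a workload without any tag inherits the environment of that workload, which is what makes a closed lane possible.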

Once TrafficLabel is defined, traffic can be tagged and the tags can be passed through. However, this is not enough for comprehensive-procedure canary release. We also need to route traffic based on trafficLabel, which is tag routing, and we need logic such as routing fallback so that traffic can be degraded when the route destination does not exist.

Routing by Traffic Tag

The implementation of this feature extends the VirtualService and DestinationRule of Istio.

Define Subset in DestinationRule

Each custom subset corresponds to a value of trafficLabel.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp/*
  subsets:
  - name: myproject      # Project environment
    labels:
      env: abc
  - name: isolation      # Isolated environment
    labels:
      env: xxx           # Machine group
  - name: testing-trunk  # Trunk environment
    labels:
      env: yyy
  - name: testing        # Daily environment
    labels:
      env: zzz
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: myapp
spec:
  hosts:
  - myapp/*
  ports:
  - number: 12200
    name: http
    protocol: HTTP
  resolution: STATIC     # endpoints below are listed statically
  endpoints:
  - address: 0.0.0.0
    labels:
      env: abc
  - address: 1.1.1.1
    labels:
      env: xxx
  - address: 2.2.2.2
    labels:
      env: zzz
  - address: 3.3.3.3
    labels:
      env: yyy
```

You can use one of the following methods to specify a subset:

  • Use labels to match the application endpoints that carry specific labels.
  • Use a ServiceEntry to specify the IP addresses that belong to a subset. Unlike label-based matching, this method specifies IP addresses directly in the configuration instead of obtaining them from the Kubernetes registry or another registry. It suits the Mock environment, whose nodes are not registered with the service registry.
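In both cases, a subset is resolved by matching its labels against endpoint labels, whatever the source of the endpoints. The following Python sketch is a simplification of Istio-style subset matching, not ASM Pro code; it uses the endpoints and subsets from the example above, and `Endpoint` and `select_subset` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Endpoint:
    address: str
    labels: Dict[str, str] = field(default_factory=dict)


# Endpoints and subsets copied from the DestinationRule / ServiceEntry above.
ENDPOINTS = [
    Endpoint("0.0.0.0", {"env": "abc"}),
    Endpoint("1.1.1.1", {"env": "xxx"}),
    Endpoint("2.2.2.2", {"env": "zzz"}),
    Endpoint("3.3.3.3", {"env": "yyy"}),
]
SUBSETS = {
    "myproject": {"env": "abc"},
    "isolation": {"env": "xxx"},
    "testing-trunk": {"env": "yyy"},
    "testing": {"env": "zzz"},
}


def select_subset(endpoints: List[Endpoint], subset_labels: Dict[str, str]) -> List[str]:
    """Return the addresses of endpoints whose labels include every subset label."""
    return [ep.address for ep in endpoints
            if all(ep.labels.get(k) == v for k, v in subset_labels.items())]
```

For example, `select_subset(ENDPOINTS, SUBSETS["testing"])` selects only 2.2.2.2, the endpoint labeled `env: zzz`.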

Subset-Based Routing in VirtualService

1. Global Default Configuration

  • You can specify multiple destinations in sequence in the route section. Traffic is distributed among the destinations in proportion to their weight values.
  • You can specify a fallback policy for each destination. The case field specifies the condition that triggers fallback. Valid values: noinstances (no service instances exist) and noavailabled (service instances exist but none are available). The target field specifies the environment to fall back to. If target is not specified, fallback is performed in the destination environment.
  • To support this routing logic, we extended VirtualService so that subset accepts the placeholder $trafficLabel. The placeholder means that the target environment is taken from the traffic label of the request, as defined in the TrafficLabel CR.

The global default mode corresponds to a lane, that is, a single closed environment, and it also specifies environment-level fallback policies. Each custom subset corresponds to a value of trafficLabel.

The following sample code provides an example of the configuration:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: default-route
spec:
  hosts:                     # Takes effect for all applications
  - "*/*"                    # Quoted: an unquoted leading * is a YAML alias
  http:
  - name: default-route
    route:
    - destination:
        subset: $trafficLabel
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: "*/*"
        subset: testing-trunk  # Trunk environment
      weight: 0
      fallback:
        case: noavailabled
        target: testing
    - destination:
        subset: testing        # Daily environment
      weight: 0
      fallback:
        case: noavailabled
        target: mock
    - destination:
        host: "*/*"
        subset: mock           # Mock center
      weight: 0
```
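The fallback chain in default-route can be traced with a small resolver. In this minimal Python sketch, the weight 0 destinations act only as fallback targets, noinstances fires when a subset has no instances, and noavailabled fires when instances exist but none are healthy. This is our reading of the case values described above, and `resolve_subset`, `Fallback`, and the (total, healthy) instance counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Fallback:
    case: str    # "noinstances" or "noavailabled"
    target: str  # subset to fall back to


# Policies from the default-route VirtualService above.
LABEL_FALLBACK = Fallback("noinstances", "testing-trunk")  # for subset: $trafficLabel
SUBSET_FALLBACKS: Dict[str, Fallback] = {
    "testing-trunk": Fallback("noavailabled", "testing"),
    "testing": Fallback("noavailabled", "mock"),
}


def resolve_subset(traffic_label: str, instances: Dict[str, Tuple[int, int]],
                   max_hops: int = 8) -> Optional[str]:
    """instances maps subset -> (total, healthy). Returns the subset that
    serves the request, or None if no fallback condition applies."""
    subset: str = traffic_label
    policy: Optional[Fallback] = LABEL_FALLBACK
    for _ in range(max_hops):  # guard against misconfigured fallback cycles
        total, healthy = instances.get(subset, (0, 0))
        if healthy > 0:
            return subset
        triggered = policy is not None and (
            (policy.case == "noinstances" and total == 0)
            or (policy.case == "noavailabled" and total > 0)
        )
        if not triggered:
            return None
        subset = policy.target
        policy = SUBSET_FALLBACKS.get(subset)
    return None
```

For example, a request labeled dev-x goes to testing-trunk when no dev-x instances exist, and falls through testing to mock when the trunk and daily instances exist but are all unhealthy.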

2. Customization of Personal Development Environment

  • Traffic labeled dev-x is routed to the daily environment first. If the daily environment has no service instances, the traffic falls back to the trunk environment.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: projectx-route
spec:
  hosts:                   # Takes effect only for myapp
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: dev-x       # dev environment: x
    route:
    - destination:
        host: myapp/*
        subset: testing          # Daily environment
      weight: 100
      fallback:
        case: noinstances
        target: testing-trunk
    - destination:
        host: myapp/*
        subset: testing-trunk    # Trunk environment
      weight: 0
```
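The projectx-route rule reads as: match an exact trafficLabel, then apply the noinstances fallback. The following Python sketch models only that logic; `Rule`, `route`, and the instance_count map are hypothetical names, not ASM Pro types.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    match_label: str      # exact match on trafficLabel
    subset: str           # primary destination (weight 100)
    fallback_case: str    # condition that triggers fallback
    fallback_target: str  # subset used when the condition holds


# In-memory form of the projectx-route VirtualService above.
PROJECTX_RULES = (Rule("dev-x", "testing", "noinstances", "testing-trunk"),)


def route(traffic_label: str, instance_count: Dict[str, int],
          rules: Sequence[Rule] = PROJECTX_RULES) -> Optional[str]:
    """Return the subset for a request to myapp, or None when no rule matches
    (in that case the global default route would apply)."""
    for rule in rules:
        if rule.match_label != traffic_label:
            continue
        if rule.fallback_case == "noinstances" and instance_count.get(rule.subset, 0) == 0:
            return rule.fallback_target
        return rule.subset
    return None
```

For example, `route("dev-x", {"testing": 0, "testing-trunk": 3})` returns testing-trunk because the daily environment has no instances.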

3. Weight Configuration

When the source workload belongs to the dev-x environment, 80% of the traffic labeled testing-trunk is routed to the trunk environment and 20% to the daily environment. If the trunk environment has no available instances, its traffic falls back to the daily environment.

sourceLabels matches the labels of the local (source) workload.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dev-x-route
spec:
  hosts:                   # Applications the rule applies to (multi-application configuration is not supported)
  - myapp/*
  http:
  - name: dev-x-route
    match:
      trafficLabel:
      - exact: testing-trunk  # Trunk environment label
      sourceLabels:
      - exact: dev-x          # Traffic comes from a project environment
    route:
    - destination:
        host: myapp/*
        subset: testing-trunk  # 80% of the traffic goes to the trunk environment
      weight: 80
      fallback:
        case: noavailabled
        target: testing
    - destination:
        host: myapp/*
        subset: testing        # 20% of the traffic goes to the daily environment
      weight: 20
```
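Weighted selection with per-destination fallback can be sketched as follows. This Python sketch assumes that each request is assigned to a destination at random in proportion to the weights, and it omits the sourceLabels match; `Destination` and `pick` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Destination:
    subset: str
    weight: int
    fallback_case: Optional[str] = None
    fallback_target: Optional[str] = None


# In-memory form of the dev-x-route VirtualService above.
DEV_X_ROUTE = [
    Destination("testing-trunk", 80, "noavailabled", "testing"),
    Destination("testing", 20),
]


def pick(dests: List[Destination], health: Dict[str, Tuple[int, int]],
         rng: random.Random) -> str:
    """health maps subset -> (total, healthy). Choose a destination by weight,
    then apply its noavailabled fallback if its instances are all unhealthy."""
    chosen = rng.choices(dests, weights=[d.weight for d in dests], k=1)[0]
    total, healthy = health.get(chosen.subset, (0, 0))
    if chosen.fallback_case == "noavailabled" and total > 0 and healthy == 0:
        return chosen.fallback_target or chosen.subset
    return chosen.subset
```

With both environments healthy, roughly 80% of picks land on testing-trunk; if testing-trunk has instances but none are healthy, every pick resolves to testing.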

Routing by (Environment) Tag

This solution relies on an identifier attached to each service when it is deployed. In the examples, this is the label ASM_TRAFFIC_TAG: xxx. The identifier, commonly the environment, is part of the deployment metadata of the service, so the solution depends on integration with the upstream deployment system (CI/CD). The following figure shows the general process.

  • In Kubernetes scenarios, the corresponding environment or group label can be added automatically when the service is deployed, and Kubernetes itself serves as the metadata management center.
  • In non-Kubernetes scenarios, you can implement the solution through the service registry or metadata server that the microservices already integrate with.

[Figure 5: General process for obtaining environment labels from the deployment system]

Note: ASM Pro includes a self-developed ServiceDirectory component (see the feature-oriented architecture diagram of ASM Pro services) that connects to multiple registries and dynamically acquires deployment metadata.

More Application Scenarios

The following figure shows a typical multi-set governance setup for development environments, built on traffic tagging and tag routing. Each developer only needs to deploy the services they have updated to their own Dev X environment. To work with other developers, you can configure fallback so that requests are forwarded to the required development environment. In the following example, a request from B in the Dev Y environment falls back to C in the Dev X environment.

Similarly, you can treat the Dev X environment as the online canary environment. This addresses comprehensive-procedure canary release in the production environment.

[Figure 6: Multi-set development environment governance with traffic tagging and fallback]
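The Dev X / Dev Y scenario can be simulated hop by hop. In this Python sketch, the services A, B, and C and the deployment map are hypothetical. The lane tag stays dev-y for the whole call chain, and each hop falls back first along the configured environment fallback and then to the base environment, as described earlier for missing tag groups.

```python
from typing import Dict, List, Set


def serve_chain(chain: List[str], lane: str, deployments: Dict[str, Set[str]],
                fallback: Dict[str, str], base: str = "base") -> List[str]:
    """For each service in a call chain, return "service@environment".

    The lane tag is the same on every hop (tag pass-through); each hop
    independently falls back along `fallback` and finally to `base`."""
    served = []
    for service in chain:
        env, visited = lane, set()
        while service not in deployments.get(env, set()) and env not in visited:
            visited.add(env)
            env = fallback.get(env, base)
        served.append(f"{service}@{env}")
    return served
```

With B deployed only in dev-y, C only in dev-x, and a dev-y to dev-x fallback, a dev-y request is served as A@base, B@dev-y, C@dev-x, even though A was served by the base environment.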

Summary

The traffic tagging and tag routing capabilities described in this article form a general solution to problems in testing environment governance and online comprehensive-procedure canary release. Because the solution is built on service mesh technology, it is independent of development languages and applies to different Layer 7 protocols. Currently, HTTP/gRPC and Dubbo are supported.

Other service providers also offer comprehensive-procedure canary solutions. ASM Pro has the following advantages:

  • Supports multiple languages and protocols
  • Provides TrafficLabel as a unified configuration template, which keeps configuration simple and flexible and supports multiple levels (global, namespace, and pod)
  • Supports routing fallback for service degradation

The traffic tagging and tag routing capabilities can also be used in the following scenarios:

  • Stress testing before a major promotion: In online stress testing, a common way to isolate test data from production data is to use shadow message queues, caches, and databases. This relies on traffic tagging to tell test traffic from production traffic. However, it also requires the Sidecar to support middleware such as Redis and RocketMQ.
  • Unitized routing: In a typical unitized routing scenario, the unit that serves a request is determined by configuration from request metadata, such as the uid. In this scenario, you can extend the TrafficLabel definition with a function that derives the unit tag, tag the traffic with that unit tag, and then route it to the corresponding service unit.
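A unit tag function of the kind mentioned above could be as simple as bucketing the uid. The following Python sketch is purely hypothetical: the modulo-100 buckets, the unit names, and `unit_tag` are illustrative assumptions, not part of TrafficLabel.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

# Hypothetical unit configuration: (exclusive upper bound of uid % 100, unit tag),
# sorted by bound; the last bound must be 100 so that every bucket is covered.
UNIT_RULES: Tuple[Tuple[int, str], ...] = ((50, "unit-a"), (100, "unit-b"))


def unit_tag(uid: int, rules: Sequence[Tuple[int, str]] = UNIT_RULES) -> str:
    """Map a uid to the unit tag that tag routing would then act on."""
    bucket = uid % 100
    bounds = [upper for upper, _ in rules]
    return rules[bisect_right(bounds, bucket)][1]
```

For example, uid 12345 falls in bucket 45 and is tagged unit-a, while uid 150 falls in bucket 50 and is tagged unit-b.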
