自适应过载保护 PART1

背景

当思考服务可用性时，传统上会考虑 TPS，通常会进行压力测试来确定服务崩溃的 TPS峰值。然后将 TPS 限制设置在此崩溃点以下的某个位置（比如该值的 75%），并通过令牌桶（Rate Limiting Alg）进行强制执行

然而，在大型分布式系统中，由于auto-scaling，这个值很快就会过时，并且服务会因为无法优雅地摆脱过多负载而变得不可响应而崩溃

我们不应该按照 TPS 的方式思考，而应该考虑并发请求，通过应用排队理论来确定服务在排队开始、延迟增加并最终耗尽诸如 CPU、内存、磁盘或网络等硬限制之前可以处理的并发请求数量。这种关系非常好地解释了 Little's Law:

Limit = Average TPS * Average Latency

同时，可以得到另一个版本的Little’s Law，Concurrency=Throughput * Latency

利用这个公式，我们可以进一步去分析计算机系统处理请求时的吞吐量、处理时延、并发处理能力之间的关系

实践

Concurrency limits

Vegas Limit

借鉴自TCP拥塞算法Vegas

简介
1. 观察RTT、接收速率等指标，预判拥塞，并提前调整
2. BaseRTT = min(SampleRTT)
3. ExpectedRate = CWnd / BaseRTT
4. ActualRate: CWnd / RTT
5. Diff = (ExpectedRate - ActualRate) * BaseRTT
6. 如果Diff < alpha 增加CWnd
7. 如果Diff > beta 减少CWnd
8. 否则CWnd - Congestion Window 不变
Netflix 实现

/**
 * Limiter based on TCP Vegas where the limit increases by alpha if the queue_use is small ({@literal <} alpha)
 * and decreases by alpha if the queue_use is large ({@literal >} beta).
 * 
 * Queue size is calculated using the formula, 
 *  queue_use = limit − BWE×RTTnoLoad = limit × (1 − RTTnoLoad/RTTactual)
 *
 * For traditional TCP Vegas alpha is typically 2-3 and beta is typically 4-6.  To allow for better growth and 
 * stability at higher limits we set alpha=Max(3, 10% of the current limit) and beta=Max(6, 20% of the current limit)
 */

private Function<Integer, Integer> alphaFunc = (limit) -> 3 * LOG10.apply(limit.intValue());
private Function<Integer, Integer> betaFunc = (limit) -> 6 * LOG10.apply(limit.intValue());
private Function<Integer, Integer> thresholdFunc = (limit) -> LOG10.apply(limit.intValue());
private Function<Double, Double> increaseFunc = (limit) -> limit + LOG10.apply(limit.intValue());
private Function<Double, Double> decreaseFunc = (limit) -> limit - LOG10.apply(limit.intValue());

final int queueSize = (int) Math.ceil(estimatedLimit * (1 - (double)rtt_noload / rtt));
int alpha = alphaFunc.apply((int)estimatedLimit);
int beta = betaFunc.apply((int)estimatedLimit);
int threshold = this.thresholdFunc.apply((int)estimatedLimit);
// Aggressive increase when no queuing
if (queueSize <= threshold) {
    newLimit = estimatedLimit + beta;
// Increase the limit if queue is still manageable
} else if (queueSize < alpha) {
    newLimit = increaseFunc.apply(estimatedLimit);
// Detecting latency so decrease
} else if (queueSize > beta) {
    newLimit = decreaseFunc.apply(estimatedLimit);
// We're within he sweet spot so nothing to do
} else {
    return (int)estimatedLimit;
}
newLimit = Math.max(1, Math.min(maxLimit, newLimit));
newLimit = (1 - smoothing) * estimatedLimit + smoothing * newLimit;
estimatedLimit = newLimit;
//使用
 VegasLimit.newBuilder()
                .alpha(3)
                .beta(6)
                .smoothing(1.0)
                .initialLimit(10)
                .maxConcurrency(20)
                .build();

estimatedLimit 即为动态调整的CWnd

初始值 estimatedLimit 用户设置10 ，maxLimit 用户设置20,queueSize 初始值estimatedLimit=10

queueSize即 Diff

rtt_noload 即min(SampleRTT)

rtt 即 lastSampleRTT

alpha 3 * LOG10(limit) limit如👇代码设置

betaFunc 6 * LOG10(limit)

smoothing 平滑参数设置，如上1.0,无平滑

缺点
- RTT 是一个宽泛值，包括请求发起-请求处理-ACK用户大致三个阶段。如果仅是ack阶段出现RTT增大，导致queuesize增大，从而降低limit，显得过于谨慎
- 基于minRTT，不够精确反应服务级RTT状况。很可能某个request 处理过快(业务数据不符合要求之类)，即请求无法均匀的反应RTT

Gradient Limit

Gradient算法通过跟踪当前平均往返时延和长期指数平滑的平均往返时延之间的差异来动态地调整并发限制。与Vegas拥塞控制算法不同，Gradient算法使用平均值而不是最小值，以应对由于RPC方法的突发性造成的异常情况

RPC方法可能由于各种因素（例如非均匀的请求处理复杂性以及数据大小的广泛分布）RTT变得波动较大使用最小值可能会导致对不切实际的低基础RTT的偏向，从而导致过度保护，不利于资源利用率

核心算法每个采样窗口（例如1秒）重新计算限制，使用以下公式：

// 计算Gradient，限制在[0.5, 1.0]范围内以过滤异常值

gradient = max(0.5, min(1.0, longtermRtt / currentRtt));

// 通过应用Gradient并允许一些排队来计算新限制

newLimit = gradient * currentLimit + queueSize;

// 使用smoothing（默认0.2）更新限制

newLimit = currentLimit (1-smoothing) + newLimit smoothing

        final double queueSize = this.queueSize.apply((int)this.estimatedLimit);

        this.lastRtt = rtt;
        final double shortRtt = (double)rtt;
        final double longRtt = this.longRtt.add(rtt).doubleValue();

        shortRttSampleListener.addSample(shortRtt);
        longRttSampleListener.addSample(longRtt);
        queueSizeSampleListener.addSample(queueSize);

        // If the long RTT is substantially larger than the short RTT then reduce the long RTT measurement.
        // This can happen when latency returns to normal after a prolonged prior of excessive load.  Reducing the
        // long RTT without waiting for the exponential smoothing helps bring the system back to steady state.
        if (longRtt / shortRtt > 2) {
            this.longRtt.update(current -> current.doubleValue() * 0.95);
        }

        // Don't grow the limit if we are app limited
        if (inflight < estimatedLimit / 2) {
            return (int) estimatedLimit;
        }

        // Rtt could be higher than rtt_noload because of smoothing rtt noload updates
        // so set to 1.0 to indicate no queuing.  Otherwise calculate the slope and don't
        // allow it to be reduced by more than half to avoid aggressive load-shedding due to 
        // outliers.
        final double gradient = Math.max(0.5, Math.min(1.0, tolerance * longRtt / shortRtt));
        double newLimit = estimatedLimit * gradient + queueSize;
        newLimit = estimatedLimit * (1 - smoothing) + newLimit * smoothing;
        newLimit = Math.max(minLimit, Math.min(maxLimit, newLimit));
        estimatedLimit = newLimit;

注意：

如果longRtt过大，超过shortRtt 2倍，适当减少 - currentRtt*0.95
inflight表示当前请求数，少于estimatedLimit一半，则不增加estimatedLimit

Sentinel BBR Limit

借鉴TCP BBR拥塞算法，利用吞吐量和延时控制请求访问

此算法没有完全抛弃QPS，而是加入了动态调整

根据系统能够处理的请求，和允许进来的请求，来做平衡，而不是根据一个间接的指标（系统 load）来做限流。最终我们追求的目标是在系统不被拖垮的情况下，提高系统的吞吐率，而不是 load 一定要到低于某个阈值。如果我们还是按照固有的思维，超过特定的 load 就禁止流量进入，系统 load 恢复就放开流量，这样做的结果是无论我们怎么调参数，调比例，都是按照果来调节因，都无法取得良好的效果

系统保护规则是从应用级别的入口流量进行控制，从单台机器的总体 Load、RT、入口 QPS 和线程数四个维度监控应用数据，让系统尽可能跑在最大吞吐量的同时保证系统整体的稳定性。
系统保护规则是应用整体维度的，而不是资源维度的，并且仅对入口流量生效。入口流量指的是进入应用的流量（EntryType.IN），比如 Web 服务或 Dubbo 服务端接收的请求，都属于入口流量。

系统规则支持以下的阈值类型：

Load（仅对 Linux/Unix-like 机器生效）：当系统 load1 超过阈值，且系统当前的并发线程数超过系统容量时才会触发系统保护。系统容量由系统的 maxQps minRt 计算得出。设定参考值一般是 CPU cores 2.5。 CPU usage（1.5.0+ 版本）：当系统 CPU 使用率超过阈值即触发系统保护（取值范围 0.0-1.0）。 RT：当单台机器上所有入口流量的平均 RT 达到阈值即触发系统保护，单位是毫秒。线程数：当单台机器上所有入口流量的并发线程数达到阈值即触发系统保护。入口 QPS：当单台机器上所有入口流量的 QPS 达到阈值即触发系统保护

// SystemRuleManager

private static boolean checkBbr(int currentThread) {
        if (currentThread > 1 &&
            currentThread > Constants.ENTRY_NODE.maxSuccessQps() * Constants.ENTRY_NODE.minRt() / 1000) {
            return false;
        }
        return true;
}

// StatisticNode
//rollingCounterInSecond 是一个ArrayMetric，详细见下面 SAMPLE_COUNT= 2 INTERVAL=1s
 private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);
 @Override
    public double maxSuccessQps() {
        return (double) rollingCounterInSecond.maxSuccess() * rollingCounterInSecond.getSampleCount()
                / rollingCounterInSecond.getWindowIntervalInSec();//窗口数量
    }
     @Override
    public double minRt() {
        return rollingCounterInSecond.minRt();
    }

// ArrayMetric
    @Override
    public long maxSuccess() {
        data.currentWindow();
        long success = 0;

        List<MetricBucket> list = data.values();
        for (MetricBucket window : list) {
            if (window.success() > success) {
                success = window.success();
            }
        }
        return Math.max(success, 1);
    }
     @Override
    public long minRt() {
        data.currentWindow();
        long rt = SentinelConfig.statisticMaxRt();
        List<MetricBucket> list = data.values();
        for (MetricBucket window : list) {
            if (window.minRt() < rt) {
                rt = window.minRt();
            }
        }

        return Math.max(1, rt);
    }

在 Sentinel 中估算系统的容量是以 1s时间窗口为度量，用该秒内通过的最大 qps 与最小响应时间的乘积来表示

maxSuccessQps 取当前采样窗口的最大值乘以1s内滑动窗口的个数获得总数
minRt 最小响应时间取自当前采样窗口中的最小响应时间

缺点：

maxSuccessQps并不是一个准确值，可能偏大，也可能偏小 - 像是一句废话
基于slidewindow minRT，存在rt时间分布不均衡的问题，同vegas

参考

TCP BBR

Netflix Concurrency limits

Sentinel