Redis - rehash扩容分析

Redis - @Ted - 2021/07/09 2004

这篇文章主要分享的是 redis中的 rehash机制，结合着redis源码来学习

用过redis的朋友们都知道 redis是一个 KV类型的键值数据库，一个key 对应一个value

比如：

set testkey testvalue

键testkey ：对应值testvalue

这种一个key 对应一个value的结构组织其实是redis的索引模块，redis的索引模块是使用hash表（散列表）实现的

索引模块

一个hash表其实就是一个数组，数组每个元素称为一个hash桶，一个hash表由多个hash桶组成，每个hash桶中保存键值对数据

hash冲突

当redis写入大量数据的时候就会有一个风险点，就是hash冲突表示多个key被分配到了一个hash桶中

如何解决hash冲突？

redis会通过链表的方式来解决冲突，就是将冲突的数据以链表的方式连接起来对应桶的位置

查询的时候通过hash值定位到hash桶的位置后遍历链表进行查询

为什么要 rehash（扩容）？

随着写入的数据越来越多，冲突就会增多，链表也会越来越长，进而导致查询冲突链上的内容会耗时长，效率低

redis为了解决这个问题会对hash表做 rehash操作（hash表扩容），这个过程分三步：

1、创建一个新hash表2，给hash表2分配更大的空间，新hash表的大小至少是旧hash表1已使用节点数的两倍

2、把hash表1 中的数据重新映射并拷贝到hash表2中

3、释放hash表1的空间

渐进式 rehash

到这里完成了hash表的扩容，但是这里有个问题，如果一次性把hash表1中的数据都迁移完，就会造成redis线程阻塞，无法处理其他服务请求，为了避免这个问题，redis使用渐进式 rehash

这里概括一下什么是渐进式rehash，就是每处理一个请求时，从hash表1中的第一个索引位置开始，顺带着将这个索引位置上的所有内容拷贝到hash表2中，等处理下一个请求时，再顺带拷贝hash表1中的下一个索引为的内容

上源码：源码1

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * 执行 N 步渐进式 rehash 。
 *
 * 返回 1 表示仍有键需要从 0 号哈希表移动到 1 号哈希表，
 * 返回 0 则表示所有键都已经迁移完毕。
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * 注意，每步 rehash 都是以一个哈希表索引（桶）作为单位的，
 * 一个桶里可能会有多个节点，
 * 被 rehash 的桶里的所有节点都会被移动到新哈希表。
 *
 * T = O(N)
 */
int dictRehash(dict *d, int n) {

    // 只可以在 rehash 进行中时执行
    if (!dictIsRehashing(d)) return 0;

    // 进行 N 步迁移
    // T = O(N)
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        // 如果 0 号哈希表为空，那么表示 rehash 执行完毕
        // T = O(1)
        if (d->ht[0].used == 0) {
            // 释放 0 号哈希表
            zfree(d->ht[0].table);
            // 将原来的 1 号哈希表设置为新的 0 号哈希表
            d->ht[0] = d->ht[1];
            // 重置旧的 1 号哈希表
            _dictReset(&d->ht[1]);
            // 关闭 rehash 标识
            d->rehashidx = -1;
            // 返回 0 ，向调用者表示 rehash 已经完成
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        // 确保 rehashidx 没有越界
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // 略过数组中为空的索引，找到下一个非空索引
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // 指向该索引的链表表头节点
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        // 将链表中的所有节点迁移到新哈希表
        // T = O(1)
        while(de) {
            unsigned int h;

            // 保存下个节点的指针
            nextde = de->next;

            /* Get the index in the new hash table */
            // 计算新哈希表的哈希值，以及节点插入的索引位置
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // 插入节点到新哈希表
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // 更新计数器
            d->ht[0].used--;
            d->ht[1].used++;

            // 继续处理下个节点
            de = nextde;
        }
        // 将刚迁移完的哈希表索引的指针设为空
        d->ht[0].table[d->rehashidx] = NULL;
        // 更新 rehash 索引
        d->rehashidx++;
    }

    return 1;
}

什么时候 rehash？

以下两个条件之一为真时，对字典进行扩展

1）字典已使用节点数和字典大小之间的比率接近 1：1

并且 dict_can_resize 为真

2）已使用节点数和字典大小之间的比率超过 dict_force_resize_ratio

dict_force_resize_ratio在源码里默认为 5

上源码：源码2

// 指示字典是否启用 rehash 的标识
static int dict_can_resize = 1;
// 强制 rehash 的比率
static unsigned int dict_force_resize_ratio = 5;

/*
 * 根据需要，初始化字典（的哈希表），或者对字典（的现有哈希表）进行扩展
 *
 * T = O(N)
 */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    // 渐进式 rehash 已经在进行了，直接返回
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    // 如果字典（的 0 号哈希表）为空，那么创建并返回初始化大小的 0 号哈希表
    // T = O(1)
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    // 一下两个条件之一为真时，对字典进行扩展
    // 1）字典已使用节点数和字典大小之间的比率接近 1：1
    //    并且 dict_can_resize 为真
    // 2）已使用节点数和字典大小之间的比率超过 dict_force_resize_ratio
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        // 新哈希表的大小至少是目前已使用节点数的两倍
        // T = O(N)
        return dictExpand(d, d->ht[0].used*2);
    }

    return DICT_OK;
}

会不会存在hash表3？

有一个小可爱提出了这个问题，如果hash表2满了会不会创建hash表3？这里我们来一起分析一下

首先渐进式rehash是在每次执行查询和更新操作的时候将hash表1中的一个hash桶内的数据同步分配到hash表2中，在源码1中有展示

hash表扩展的规则是扩展后的大小是扩展前的2倍

所以说根据渐进式rehash的规则当hash表2满了的时候都够做同步hash表1的2轮操作了

得出的结论是 rehash只会有两张hash表交替使用

上源码：源码3

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * 在字典不存在安全迭代器的情况下，对字典进行单步 rehash 。
 *
 * 字典有安全迭代器的情况下不能进行 rehash ，
 * 因为两种不同的迭代和修改操作可能会弄乱字典。
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. 
 *
 * 这个函数被多个通用的查找、更新操作调用，
 * 它可以让字典在被使用的同时进行 rehash 。
 *
 * T = O(1)
 */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

Ted's Blog

Redis - rehash扩容分析

索引模块

hash冲突

如何解决hash冲突？

为什么要 rehash（扩容）？

渐进式 rehash

什么时候 rehash？

会不会存在hash表3？

评论列表

笔名 : 安琪拉的小迷妹

写评论

Hot

Contact ME

友情链接