[Reading the Source] HashMap Source Code Analysis (JDK 1.7 and JDK 1.8)
I. HashMap in JDK 1.7
[1] Data structure
HashMap is the simplest of the map implementations: we are all familiar with it, and it does not support concurrent access, so its source code is also quite simple.
First, the diagram below illustrates the structure of a HashMap.
This is only a schematic, since it does not account for array resizing; the details come later.
At a high level, a HashMap contains an array, and each element of the array is the head of a singly linked list.
In the diagram, each green box is an instance of the nested class Entry, which has four fields: key, value, the hash value, and next for the singly linked list.
capacity: the current array size. It is always a power of two (2^n); the array can grow, and each resize doubles it.
loadFactor: the load factor, 0.75 by default.
threshold: the resize threshold, equal to capacity * loadFactor.
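The relationship between these fields can be sketched with a tiny helper (the method here is illustrative, not part of the JDK):

```java
public class ThresholdDemo {
    // Hypothetical helper mirroring threshold = capacity * loadFactor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults (capacity 16, load factor 0.75), the map
        // resizes once the number of entries exceeds 12
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```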
[2] The put process
It is fairly straightforward; let's walk through the code.
public V put(K key, V value) {
    // On the first insert, the backing array must be initialized first
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is handled separately; if you follow the call, the entry ends up in table[0]
    if (key == null)
        return putForNullKey(value);
    // 1. Compute the key's hash
    int hash = hash(key);
    // 2. Find the corresponding array index
    int i = indexFor(hash, table.length);
    // 3. Walk the list at that index to check whether the key already exists;
    //    if so, overwrite the value and return the old one
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // 4. No duplicate key: add this entry to the list (details below)
    addEntry(hash, key, value, i);
    return null;
}
1. Array initialization
The array is initialized when the first element is inserted into the HashMap: the initial array size is determined and the resize threshold computed.
private void inflateTable(int toSize) {
    // Keep the array size a power of two.
    // For example, new HashMap(20) yields an initial array size of 32
    int capacity = roundUpToPowerOf2(toSize);
    // Compute the resize threshold: capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Initialize the array
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity); // ignore
}
Keeping the array size a power of two is a requirement shared by HashMap and ConcurrentHashMap in both Java 7 and Java 8, though the implementations differ slightly; we will see the Java 8 version later.
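The rounding itself can be reproduced with Integer.highestOneBit. This is a sketch that matches the JDK 7 behavior for ordinary sizes; the JDK's actual roundUpToPowerOf2 is written differently and also clamps at MAXIMUM_CAPACITY:

```java
public class PowerOfTwoDemo {
    // Round up to the next power of two (sketch; no MAXIMUM_CAPACITY clamp)
    static int roundUpToPowerOf2(int size) {
        int highest = Integer.highestOneBit(size);
        return (size == highest) ? size : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(20)); // 32
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```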
2. Computing the array index
This part is simple; we could come up with it ourselves: take the key's hash modulo the array length.
static int indexFor(int hash, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return hash & (length-1);
}
This method simply keeps the low n bits of the hash. For example, when the array length is 32, the index is just the low 5 bits of the key's hash.
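A quick check of this low-bits view (the constants are arbitrary examples):

```java
public class IndexForDemo {
    // Mirrors indexFor: for a power-of-two length, masking keeps the low n bits
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        // With length 32 only the low 5 bits of the hash matter:
        // 0b10110110 -> low 5 bits are 0b10110 = 22
        System.out.println(indexFor(0b10110110, 32)); // 22
        // For non-negative hashes this matches the naive modulo
        System.out.println(indexFor(77, 32) == 77 % 32); // true
    }
}
```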
3. Adding a node to the list
Once the array index is found, the key is first checked for duplicates; if there is no duplicate, the new value is inserted at the head of the list.
void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the map has reached the threshold and the target slot is occupied, resize
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the capacity (covered below)
        resize(2 * table.length);
        // Recompute the hash after resizing (the hash seed may have changed)
        hash = (null != key) ? hash(key) : 0;
        // Recompute the index in the new, larger array
        bucketIndex = indexFor(hash, table.length);
    }
    // See below
    createEntry(hash, key, value, bucketIndex);
}

// Simple: put the new value at the head of the list, then size++
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
The main logic here is: first check whether a resize is needed and perform it if so, then insert the new entry at the head of the list at the appropriate index of the (possibly resized) array.
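Head insertion itself can be sketched independently of HashMap (Node here is a stand-in for Entry, not the JDK class):

```java
public class HeadInsertDemo {
    // Minimal singly linked node mirroring the shape of Java 7's Entry
    static class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // createEntry-style insertion: the new node becomes the head
    // and points at the previous head
    static Node insertHead(Node head, int key) {
        return new Node(key, head);
    }

    // Insert 1, 2, 3 and read the list back as a string
    static String demo() {
        Node head = null;
        for (int k = 1; k <= 3; k++) head = insertHead(head, k);
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Insertion order 1, 2, 3 yields traversal order 3, 2, 1
        System.out.println(demo()); // 321
    }
}
```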
[3] The resize method
As seen above, when inserting a new value, if size has reached the threshold and the target array slot already holds an element, a resize is triggered; afterwards the array is twice its previous size.
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // The new array
    Entry[] newTable = new Entry[newCapacity];
    // Migrate the values from the old array into the new, larger one
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
Resizing replaces the small array with a new, larger one and migrates the old array's values into it.
Because capacity doubles, during migration each node in the list at table[i] ends up at either newTable[i] or newTable[i + oldLength]. For example, if the old length is 16, every element in the list at table[0] lands at either newTable[0] or newTable[16] in the new array. The code is simple enough that we skip it here.
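The split property follows directly from the masking arithmetic, which a small check makes concrete:

```java
public class ResizeIndexDemo {
    // After doubling, every hash lands either at its old index or at
    // old index + oldCap; no other destination is possible
    static boolean splitsCleanly(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        int newIdx = hash & (2 * oldCap - 1);
        return newIdx == oldIdx || newIdx == oldIdx + oldCap;
    }

    public static void main(String[] args) {
        boolean allOk = true;
        for (int hash = 0; hash < 10_000; hash++)
            allOk &= splitsCleanly(hash, 16);
        System.out.println(allOk); // true
    }
}
```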
[4] The get process
Compared with put, get is very simple.
- Compute the key's hash.
- Find the array index: hash & (length - 1).
- Walk the list at that position until a matching key (by == or equals) is found.
public V get(Object key) {
    // As noted earlier, a null key lives in table[0], so only that list needs scanning
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}
getEntry(key):
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null) ? 0 : hash(key);
    // Determine the array index, then walk the list from its head until found
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
II. HashMap in JDK 1.8
[1] Data structure
Java 8 made several changes to HashMap; the biggest is the use of red-black trees, so the structure becomes array + linked list + red-black tree.
From the Java 7 discussion we know that a lookup quickly locates the array index from the hash, but must then compare along the linked list node by node, which takes O(n) in the list length.
To reduce this cost, Java 8 converts a list into a red-black tree once the list reaches 8 elements and the array length is at least 64 (with a shorter array, a resize happens instead); lookups in such buckets then take O(log n).
Java 7 uses Entry for each HashMap data node; Java 8 uses Node, which is essentially the same, with the same four fields: key, value, hash, and next. However, Node is used only for lists; red-black trees use TreeNode.
Whether a bucket holds a list or a tree is determined by whether its first node is a Node or a TreeNode.
[2] Official documentation
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
*
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
*
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Doug Lea
* @author Josh Bloch
* @author Arthur van Hoff
* @author Neal Gafter
* @see Object#hashCode()
* @see Collection
* @see Map
* @see TreeMap
* @see Hashtable
* @since 1.2
*/
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
*
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/
[3] put()
- put computes the key's hash and calls the putVal method to add the element.
- If table is null or has length 0, resize() is called to initialize it.
- The insertion index is computed as (n - 1) & hash; if that slot holds no node, a new node is placed there directly.
- Otherwise the first node is compared with the new one: if both hash and key match, the operation overwrites the value.
- If not, and the first node is a TreeNode, putTreeVal() inserts the entry into the red-black tree.
- If the bucket holds a linked list, it is traversed: a node whose hash and key both match the new element has its value updated; otherwise the new node is appended at the tail. After insertion, if the list length exceeds the treeify threshold, the list is converted to a tree.
- Finally, if a new element was inserted into the array and size exceeds the threshold, resize() is called to grow it.
Why if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))?
The hashCode comparison is a fast pre-check: comparing two ints is far cheaper than calling equals, so equals runs only when the hashes already match; if the hashes differ, the keys cannot be equal and equals is skipped entirely.
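A matching hash does not by itself prove equality, which is why equals must still run after a hash match. "Aa" and "BB" are a well-known pair of distinct strings with the same hashCode:

```java
public class HashCompareDemo {
    // Same check shape as in putVal: cheap int comparison first,
    // equals only when the hashes already match
    static boolean sameKey(int storedHash, Object storedKey, int hash, Object key) {
        return storedHash == hash
                && (storedKey == key || (key != null && key.equals(storedKey)));
    }

    public static void main(String[] args) {
        String a = "Aa", b = "BB";
        System.out.println(a.hashCode() == b.hashCode()); // true: hashes collide
        System.out.println(sameKey(a.hashCode(), a, b.hashCode(), b)); // false: equals breaks the tie
        System.out.println(sameKey(a.hashCode(), a, "Aa".hashCode(), "Aa")); // true
    }
}
```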
Why if ((tab = table) == null || (n = tab.length) == 0)?
The two conditions cannot be swapped: evaluating (n = tab.length) first would throw a NullPointerException when table is null. And if (tab = table) == null is false, why still check the length? Because a non-null table only means the array object exists; it may still have length 0. Compare the three cases: a merely declared array variable (int[] n;) cannot be used at all ("variable n might not be initialized" at compile time); a variable assigned null (int[] num = null;) points at no object, so .length throws a NullPointerException; new int[0] is a real, allocated object whose length is 0.
What does putTreeVal do?
putTreeVal adds a tree node: each step compares the node to insert with the current node, descending into the left subtree if the new node is smaller and the right subtree otherwise. Once an empty position is found, two methods run: balanceInsertion inserts the node and restores balance, and moveRootToFront resets the root, since rebalancing may have moved it.
What is Map.Entry?
Map.Entry is an inner interface declared by Map. It is generic, defined as Entry<K,V>, and represents one entity in the map: a key-value pair.
Reference: https://segmentfault.com/a/1190000012926722?utm_source=tag-newest
// Add a node
public V put(K key, V value) {
    // Adding a node is actually done by putVal()
    return putVal(hash(key), key, value, false, true);
}
// Compute the hash value
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
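The point of XORing in the high 16 bits is visible with two hash codes that differ only above bit 15; without the spreading step, both would mask to the same bucket (the constants below are arbitrary examples):

```java
public class HashSpreadDemo {
    // Java 8's hash(): fold the high 16 bits into the low 16 so that
    // table masking with (n - 1) can see them
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                         // small table: the mask keeps only 4 bits
        Integer k1 = 0x10000, k2 = 0x20000; // identical low 16 bits (all zero)
        // Raw hashCodes collide after masking...
        System.out.println((k1.hashCode() & (n - 1)) == (k2.hashCode() & (n - 1))); // true
        // ...but the spread hashes land in different buckets
        System.out.println((hash(k1) & (n - 1)) == (hash(k2) & (n - 1))); // false
    }
}
```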
// The bucket array
transient Node<K,V>[] table;
// Set of key-value pairs; the stored class is an inner class of Map
transient Set<Map.Entry<K,V>> entrySet;
// Number of key-value pairs
transient int size;
// Number of structural modifications to the HashMap
transient int modCount;
// Resize threshold: exceeding this many key-value pairs triggers a resize
int threshold;
// Load factor
final float loadFactor;
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, i;
    // A null table means the map is uninitialized; length 0 means no node has
    // been inserted yet. Either way, resize() initializes the array
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // No hash collision at the target slot: place a new list node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // If the first node in the bucket has the same key as the new node,
        // this is an overwrite of its value; point e at p
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        /* If the node stored in the bucket is a tree node, let the red-black tree
           resolve the collision; a duplicate key found during insertion has its
           value overwritten */
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        /* The bucket holds a linked list: traverse the whole list looking for a
           node whose key duplicates the new node's */
        else {
            for (int binCount = 0; ; ++binCount) {
                // Not found: append the new node at the tail. Afterwards, if the
                // list length exceeds the treeify threshold, convert it to a tree
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with a duplicate key: stop; its value is updated below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        /* If e != null, only an existing node's value was replaced, so the map's
           size is unchanged; no resize check is needed, just return the old value */
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Reaching here means a new key-value pair was added;
    // check whether the array needs to grow
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
[4] get()
Find the entry at the table index derived from the key's hash, then return that key's value.
(1) First the hash method computes the key's hash; if the bucket array is empty, return null directly.
(2) Otherwise, tab[(n - 1) & hash] locates the bucket; if it holds no node, return null.
(3) If it does, search the corresponding structure (list or tree) in order for a node with an equal key; return that node if found, otherwise null.
Fast lookup of a value by its key owes as much to the node design as to the overall data structure. HashMap does not store keys and values separately; it treats each key-value pair as a single unit, the Entry object (Node in Java 8), with the value effectively an attachment of the key. On insertion, the key's hash code decides where in the table the entry is stored; on retrieval, the same hash code locates the corresponding entry, which contains the value.
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
[5] resize()
resize is called in three situations:
- On the first call to put, resize initializes the table array: the default size is 16 with the no-arg constructor; with a sizing constructor, the new capacity equals the old threshold.
- When a list grows past 8 elements while the array length is still below 64, the map resizes instead of treeifying.
- At the end of put(), if a newly occupied slot pushed size past the threshold, the array capacity is doubled and the resize threshold doubled with it.
The resize operation itself:
(1) Create a new bucket array with twice the capacity; the nodes in the old array must be migrated into it and repositioned.
(2) Walk the old array, and for each non-empty bucket handle the nodes by type:
- A single node is placed directly at the new array's [hash & (newCap - 1)] position;
- A tree is split and mapped into the new array;
- A multi-node list is split into two lists, placed at [original index] and [original index + old array length]: a node with hash & oldCap == 0 joins the low (l1) list, the others join the high (l2) list.
How should (e.hash & oldCap) == 0 be understood?
https://blog.csdn.net/u010425839/article/details/106620440
When (e.hash & oldCap) == 0 is true, element e keeps its index. Take e.hash = 14 and oldCap = 16: 16 is a power of two, so its binary form has a single 1 bit, and 14 & 16 is 0. When the test is false, e's index must change, to old index + oldCap. For example, with e.hash = 16, oldCap = 16 and newCap = 32: e.hash & oldCap != 0, e.hash & (oldCap - 1) is 0, but e.hash & (newCap - 1) is 16. The resize thus redistributes hash collisions while computing each new index cheaply.
Because e.hash is unchanged, ANDing it with oldCap yields either 0 (stay in the low list) or oldCap (move to the high list). oldCap always has the binary form 10000..., and since & produces 1 only where both bits are 1, e.hash & oldCap tests exactly the one hash bit that becomes significant after doubling: if that bit is 0 the node stays put; if it is 1 the node moves.
If e.hash & oldCap is 0, that bit of e.hash is 0 and e.hash & (newCap - 1) is still the old index.
If e.hash & oldCap is nonzero, that bit is 1 and e.hash & (newCap - 1) is the old index plus oldCap.
When a list is migrated, nodes are assigned so that each ends up at e.hash & (newCap - 1) in the new array; the hash & oldCap == 0 test is precisely how the code checks whether a node's index is unchanged by the resize.
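Concretely, with oldCap = 16 and newCap = 32:

```java
public class SplitDemo {
    // The resize split test: the hash bit at oldCap's position decides
    // lo (index unchanged) vs hi (index + oldCap)
    static int newIndex(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        System.out.println(newIndex(14, oldCap)); // 14: 14 & 16 == 0, index unchanged
        System.out.println(newIndex(21, oldCap)); // 21: 21 & 16 != 0, moves to 5 + 16
        // Both agree with recomputing hash & (newCap - 1) from scratch
        System.out.println(newIndex(21, oldCap) == (21 & (newCap - 1))); // true
    }
}
```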
https://www.jianshu.com/p/45474a025c50
What is HashMap's resize mechanism?
HashMap's initial capacity is 16.
HashMap tracks the number of occupied key-value pairs in size; once size exceeds the resize threshold, resize() doubles the capacity (the check runs at the end of put):
if (++size > threshold) // note: size is the number of key-value pairs, not the table length
    resize();           // note: threshold = capacity * loadFactor
An example: with the default capacity 16 and load factor 0.75, the threshold is 16 * 0.75 = 12. Whenever the number of entries exceeds the threshold, the capacity doubles: 16 to 32, then 64, 128, and so on. Growth is not unbounded, though: the capacity is capped at MAXIMUM_CAPACITY = 1 << 30, after which the threshold is simply set to Integer.MAX_VALUE and no further resizing occurs.
/**
 * Resize
 */
final Node<K, V>[] resize() {
    Node<K, V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the table array is non-empty
    if (oldCap > 0) {
        // If the capacity has already reached the limit
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Set the threshold to 2^31 - 1
            threshold = Integer.MAX_VALUE;
            return oldTab;
        } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The new capacity, assigned in the condition, is double the old;
            // double the threshold as well
            newThr = oldThr << 1;
    } else if (oldThr > 0) // Table not initialized but a threshold exists: new HashMap(...) was given a capacity
        newCap = oldThr; // The new capacity equals the old threshold
    else { // None of the branches above assigned newCap: use the default capacity and threshold
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // newThr == 0 here means the else if (oldThr > 0) branch ran;
    // compute the threshold for the new array
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                (int) ft : Integer.MAX_VALUE);
    }
    // Update the threshold
    threshold = newThr;
    @SuppressWarnings({"rawtypes", "unchecked"})
    Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap]; // Create the new array with the capacity computed above
    table = newTab;
    // Migrate the old node array into the new one
    if (oldTab != null) {
        // Iterate over the old node array
        for (int j = 0; j < oldCap; ++j) {
            Node<K, V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null) // A node with no successor goes straight into the new array
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // A red-black tree is split into the new array
                    ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
                else { // A linked list is walked to decide which nodes move to a new index
                    // Head and tail of the low-index list
                    Node<K, V> loHead = null, loTail = null;
                    // Head and tail of the high-index list
                    Node<K, V> hiHead = null, hiTail = null;
                    // Next node
                    Node<K, V> next;
                    // Walk the list. The idea: if a node's hash ANDed with the old
                    // capacity is 0, its index is the same before and after the
                    // resize, so no recomputation is needed (a JDK 1.8 optimization)
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        } else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Non-empty low list: indexes are unchanged; attach it at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Non-empty high list: indexes changed; attach it at original index + old capacity
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}