Building an Efficient and Scalable Result Cache
Like many other "reinvented wheels", caching looks deceptively simple. A naive cache, however, can turn a performance problem into a scalability problem, even when the cache is only meant to improve single-threaded performance. Let's start with the following implementation.
Computable.java
package com.usoft;
/**
* @author: Lenovo(2015-05-27 13:42)
*/
public interface Computable<A, V> {
V compute(A arg) throws InterruptedException;
}
Memoizer2.java
package com.usoft;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
/**
* @author: Lenovo(2015-05-27 13:42)
*/
public class Memoizer2<A, V> implements Computable<A, V> {
private final Map<A, V> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer2(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(A arg) throws InterruptedException {
V result = cache.get(arg);
if (result == null) {
result = c.compute(arg);
cache.put(arg, result);
}
return result;
}
}
The cache field is a ConcurrentHashMap, so the map can safely be accessed by many threads at once and offers good concurrency. It still has a flaw, though: when two threads call compute at the same time, both may end up computing the same value. For a general-purpose memoizing cache this is merely inefficient; for a cache of objects that are supposed to be initialized exactly once, the same hole can even become a safety problem.
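The window is easy to reproduce. The sketch below is mine, not from the original: a CountDownLatch forces two threads past the null check before either has published a result, so the same value is deterministically computed twice (the cache logic is inlined from Memoizer2):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class Memoizer2RaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> cache = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch bothInside = new CountDownLatch(2);
        Runnable task = () -> {
            Integer result = cache.get(7);
            if (result == null) {                  // both threads see a miss...
                calls.incrementAndGet();
                bothInside.countDown();
                try { bothInside.await(); } catch (InterruptedException ignored) { }
                cache.put(7, 7 * 7);               // ...and both compute and publish
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("calls = " + calls.get()); // prints "calls = 2"
    }
}
```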
The problem with Memoizer2 is that if one thread starts an expensive computation, other threads are unaware that the computation is already in progress and may duplicate it. What we would like is a way to express "thread X is currently computing f(27)", so that when another thread looks up f(27), it knows the most efficient approach is to wait until X finishes and then fetch the cached result of f(27).
There is already a class that does almost exactly this: FutureTask. A FutureTask represents a computation that may or may not have completed yet. If a result is available, FutureTask.get returns it immediately; otherwise it blocks until the result has been computed and then returns it.
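A minimal illustration of that contract (the sleep duration and value are arbitrary):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        FutureTask<Integer> task = new FutureTask<>(() -> {
            Thread.sleep(200);       // simulate an expensive computation
            return 42;
        });
        new Thread(task).start();    // run the computation in another thread
        // get() blocks until the computation finishes, then returns its result
        System.out.println("result = " + task.get()); // prints "result = 42"
        // a second get() returns immediately with the already-computed result
        System.out.println("again  = " + task.get()); // prints "again  = 42"
    }
}
```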
Rebuilding the cache around FutureTask gives the following implementation.
package com.usoft;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
/**
* @author: Lenovo(2015-05-27 13:58)
*/
public class Memoizer3<A, V> implements Computable<A, V> {
private final Map<A, Future<V>> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer3(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(final A arg) throws InterruptedException {
Future<V> f = cache.get(arg);
if (f == null) {
FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg)); //a Java 8 lambda expression
f = ft;
cache.put(arg, ft);
ft.run(); //run the computation
}
try {
return f.get();
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
Memoizer3 is nearly perfect: it exhibits very good concurrency (derived from ConcurrentHashMap's concurrency), and if a result has already been computed it is returned immediately. If another thread is computing the result, newly arriving threads wait for that computation to finish. It has only one remaining defect: there is still a window in which two threads can compute the same value. The probability is far smaller than with Memoizer2, but because the if block in compute is still a non-atomic check-then-act sequence, two threads can still call compute at the same time, both fail to find the value in the cache, and both start the computation.
The root cause is that the compound put-if-absent operation is performed on the backing Map, and the client cannot make it atomic with locking without throwing away the concurrency the map provides.
The fix is to use ConcurrentHashMap's atomic putIfAbsent method, as in the following implementation.
package com.usoft;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
/**
* @author: Lenovo(2015-05-27 14:31)
*/
public class Memoizer<A, V> implements Computable<A, V> {
private final Map<A, Future<V>> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(final A arg) throws InterruptedException {
while (true) {
Future<V> f = cache.get(arg);
if (f == null) {
FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg)); //a Java 8 lambda expression
f = cache.putIfAbsent(arg, ft);
if (f == null) {
//no other thread installed a task for arg first: run the computation
f = ft;
ft.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
cache.remove(arg, f); //the task was cancelled: remove it so the while loop retries
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
}
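The putIfAbsent contract the loop above relies on can be checked in isolation: it returns null only for the caller that actually installed the mapping, and the existing value for everyone else.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        System.out.println(map.putIfAbsent("k", 1)); // null: "k" was absent, 1 is installed
        System.out.println(map.putIfAbsent("k", 2)); // 1: mapping already present, 2 is discarded
        System.out.println(map.get("k"));            // 1
    }
}
```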
Caching a Future rather than a value creates the possibility of cache pollution: if a computation is cancelled or fails, any future lookup of that result will also report cancellation or failure. To avoid this, Memoizer removes the Future from the cache when it detects that the computation was cancelled; it would also be reasonable to remove the Future upon detecting a RuntimeException, so that a later attempt at the computation can succeed.
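As an aside: on Java 8 and later, the check-then-act sequence can also be delegated to ConcurrentHashMap.computeIfAbsent, which installs the FutureTask atomically. The class below is my sketch of that variant, not part of the original text. Note that run() is invoked outside the mapping function, so the computation does not execute while the map bin is locked; FutureTask.run is a no-op for every caller except the first, so the value is still computed only once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

public class ChmMemoizer<A, V> {
    private final ConcurrentHashMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<A, V> f;

    public ChmMemoizer(Function<A, V> f) { this.f = f; }

    public V compute(A arg) throws InterruptedException, ExecutionException {
        // atomic put-if-absent: exactly one FutureTask is ever installed per key
        Future<V> future = cache.computeIfAbsent(arg, k -> new FutureTask<V>(() -> f.apply(k)));
        // run outside computeIfAbsent; only the first caller actually executes the task
        if (future instanceof FutureTask) {
            ((FutureTask<V>) future).run();
        }
        return future.get();
    }

    public static void main(String[] args) throws Exception {
        ChmMemoizer<Integer, Integer> memo = new ChmMemoizer<>(x -> x * x);
        System.out.println(memo.compute(7)); // prints 49
        System.out.println(memo.compute(7)); // cached: still 49, computed only once
    }
}
```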
Although the Memoizer above is essentially perfect with respect to concurrency and safety, it is still not a complete caching solution. It has the following shortcomings: it does not expire cache entries, and it does not evict entries, i.e. remove old results to make room for new ones, so the cache can eventually consume enough memory to cause an OutOfMemoryError.
Both problems can be addressed by defining a FutureTask subclass that carries an expiration time, together with a thread that periodically scans for and removes expired entries. Going one step further, SoftReferences can be used so that the cache shrinks under memory pressure instead of triggering an OOM.
The code follows. SoftHashMap.java
package com.usoft;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;
/**
* A <code><em>Soft</em>HashMap</code> is a memory-constrained map that stores its <em>values</em> in
* {@link SoftReference SoftReference}s. (Contrast this with the JDK's
* {@link WeakHashMap WeakHashMap}, which uses weak references for its <em>keys</em>, which is of little value if you
* want the cache to auto-resize itself based on memory constraints).
* <p>
* Having the values wrapped by soft references allows the cache to automatically reduce its size based on memory
* limitations and garbage collection. This ensures that the cache will not cause memory leaks by holding strong
* references to all of its values.
* <p>
* This class is a generics-enabled Map based on initial ideas from Heinz Kabutz's and Sydney Redelinghuys's
* <a href="http://www.javaspecialists.eu/archive/Issue015.html">publicly posted version (with their approval)</a>, with
* continued modifications.
* <p>
* This implementation is thread-safe and usable in concurrent environments.
*
* @since 1.0
*/
public class SoftHashMap<K, V> implements Map<K, V> {
/**
* The default value of the RETENTION_SIZE attribute, equal to 100.
*/
private static final int DEFAULT_RETENTION_SIZE = 100;
/**
* The internal HashMap that will hold the SoftReference.
*/
private final Map<K, SoftValue<V, K>> map;
/**
* The number of strong references to hold internally, that is, the number of instances to prevent
* from being garbage collected automatically (unlike other soft references).
*/
private final int RETENTION_SIZE;
/**
* The FIFO list of strong references (not to be garbage collected), order of last access.
*/
private final Queue<V> strongReferences; //guarded by 'strongReferencesLock'
private final ReentrantLock strongReferencesLock;
/**
* Reference queue for cleared SoftReference objects.
*/
private final ReferenceQueue<? super V> queue;
/**
* Creates a new SoftHashMap with a default retention size of
* {@link #DEFAULT_RETENTION_SIZE DEFAULT_RETENTION_SIZE} (100 entries).
*
* @see #SoftHashMap(int)
*/
public SoftHashMap() {
this(DEFAULT_RETENTION_SIZE);
}
/**
* Creates a new SoftHashMap with the specified retention size.
* <p>
* The retention size (n) is the total number of most recent entries in the map that will be strongly referenced
* (ie 'retained') to prevent them from being eagerly garbage collected. That is, the point of a SoftHashMap is to
* allow the garbage collector to remove as many entries from this map as it desires, but there will always be (n)
* elements retained after a GC due to the strong references.
* <p>
* Note that in highly concurrent environments the exact total number of strong references may differ slightly
* from the actual <code>retentionSize</code> value. This number is intended to be a best-effort retention low
* water mark.
*
* @param retentionSize the total number of most recent entries in the map that will be strongly referenced
* (retained), preventing them from being eagerly garbage collected by the JVM.
*/
@SuppressWarnings({"unchecked"})
public SoftHashMap(int retentionSize) {
super();
RETENTION_SIZE = Math.max(0, retentionSize);
queue = new ReferenceQueue<V>();
strongReferencesLock = new ReentrantLock();
map = new ConcurrentHashMap<K, SoftValue<V, K>>();
strongReferences = new ConcurrentLinkedQueue<V>();
}
/**
* Creates a {@code SoftHashMap} backed by the specified {@code source}, with a default retention
* size of {@link #DEFAULT_RETENTION_SIZE DEFAULT_RETENTION_SIZE} (100 entries).
*
* @param source the backing map to populate this {@code SoftHashMap}
* @see #SoftHashMap(Map, int)
*/
public SoftHashMap(Map<K, V> source) {
this(DEFAULT_RETENTION_SIZE);
putAll(source);
}
/**
* Creates a {@code SoftHashMap} backed by the specified {@code source}, with the specified retention size.
* <p>
* The retention size (n) is the total number of most recent entries in the map that will be strongly referenced
* (ie 'retained') to prevent them from being eagerly garbage collected. That is, the point of a SoftHashMap is to
* allow the garbage collector to remove as many entries from this map as it desires, but there will always be (n)
* elements retained after a GC due to the strong references.
* <p>
* Note that in highly concurrent environments the exact total number of strong references may differ slightly
* from the actual <code>retentionSize</code> value. This number is intended to be a best-effort retention low
* water mark.
*
* @param source the backing map to populate this {@code SoftHashMap}
* @param retentionSize the total number of most recent entries in the map that will be strongly referenced
* (retained), preventing them from being eagerly garbage collected by the JVM.
*/
public SoftHashMap(Map<K, V> source, int retentionSize) {
this(retentionSize);
putAll(source);
}
public V get(Object key) {
processQueue();
V result = null;
SoftValue<V, K> value = map.get(key);
if (value != null) {
//unwrap the 'real' value from the SoftReference
result = value.get();
if (result == null) {
//The wrapped value was garbage collected, so remove this entry from the backing map:
//noinspection SuspiciousMethodCalls
map.remove(key);
} else {
//Add this value to the beginning of the strong reference queue (FIFO).
addToStrongReferences(result);
}
}
return result;
}
private void addToStrongReferences(V result) {
strongReferencesLock.lock();
try {
strongReferences.add(result);
trimStrongReferencesIfNecessary();
} finally {
strongReferencesLock.unlock();
}
}
//Guarded by the strongReferencesLock in the addToStrongReferences method
private void trimStrongReferencesIfNecessary() {
//trim the strong ref queue if necessary:
while (strongReferences.size() > RETENTION_SIZE) {
strongReferences.poll();
}
}
/**
* Traverses the ReferenceQueue and removes garbage-collected SoftValue objects from the backing map
* by looking them up using the SoftValue.key data member.
*/
private void processQueue() {
SoftValue sv;
while ((sv = (SoftValue) queue.poll()) != null) {
//noinspection SuspiciousMethodCalls
map.remove(sv.key); // we can access private data!
}
}
public boolean isEmpty() {
processQueue();
return map.isEmpty();
}
public boolean containsKey(Object key) {
processQueue();
return map.containsKey(key);
}
public boolean containsValue(Object value) {
processQueue();
Collection<V> values = values();
return values != null && values.contains(value);
}
public void putAll(Map<? extends K, ? extends V> m) {
if (m == null || m.isEmpty()) {
processQueue();
return;
}
for (Entry<? extends K, ? extends V> entry : m.entrySet()) {
put(entry.getKey(), entry.getValue());
}
}
public Set<K> keySet() {
processQueue();
return map.keySet();
}
public Collection<V> values() {
processQueue();
Collection<K> keys = map.keySet();
if (keys.isEmpty()) {
//noinspection unchecked
return Collections.EMPTY_SET;
}
Collection<V> values = new ArrayList<V>(keys.size());
for (K key : keys) {
V v = get(key);
if (v != null) {
values.add(v);
}
}
return values;
}
/**
* Creates a new entry, but wraps the value in a SoftValue instance to enable auto garbage collection.
*/
public V put(K key, V value) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> sv = new SoftValue<V, K>(value, key, queue);
SoftValue<V, K> previous = map.put(key, sv);
addToStrongReferences(value);
return previous != null ? previous.get() : null;
}
@Override
public V putIfAbsent(K key, V value) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> sv = new SoftValue<V, K>(value, key, queue);
SoftValue<V, K> previous = map.putIfAbsent(key, sv);
if (previous == null) {
addToStrongReferences(value); //retain the value only if it was actually inserted
}
}
public V remove(Object key) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> raw = map.remove(key);
return raw != null ? raw.get() : null;
}
public void clear() {
strongReferencesLock.lock();
try {
strongReferences.clear();
} finally {
strongReferencesLock.unlock();
}
processQueue(); // throw out garbage collected values
map.clear();
}
public int size() {
processQueue(); // throw out garbage collected values first
return map.size();
}
public Set<Entry<K, V>> entrySet() {
processQueue(); // throw out garbage collected values first
Collection<K> keys = map.keySet();
if (keys.isEmpty()) {
//noinspection unchecked
return Collections.EMPTY_SET;
}
Map<K, V> kvPairs = new HashMap<K, V>(keys.size());
for (K key : keys) {
V v = get(key);
if (v != null) {
kvPairs.put(key, v);
}
}
return kvPairs.entrySet();
}
/**
* We define our own subclass of SoftReference which contains
* not only the value but also the key to make it easier to find
* the entry in the HashMap after it's been garbage collected.
*/
private static class SoftValue<V, K> extends SoftReference<V> {
private final K key;
/**
* Constructs a new instance, wrapping the value, key, and queue, as
* required by the superclass.
*
* @param value the map value
* @param key the map key
* @param queue the soft reference queue to poll to determine if the entry had been reaped by the GC.
*/
private SoftValue(V value, K key, ReferenceQueue<? super V> queue) {
super(value, queue);
this.key = key;
}
}
}
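The ReferenceQueue plumbing that processQueue() relies on can be seen in isolation: a SoftReference is only enqueued after the collector clears it, so while the referent is strongly reachable, get() keeps returning it and poll() stays empty. Provoking an actual collection deterministically is JVM-dependent, so this sketch only shows the reachable case:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

public class SoftReferenceDemo {
    public static void main(String[] args) {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        byte[] value = new byte[16];                 // a strong reference keeps the value alive
        SoftReference<byte[]> ref = new SoftReference<>(value, queue);
        // the referent is strongly reachable, so the reference has not been cleared...
        System.out.println("cleared  = " + (ref.get() == null));   // prints "cleared  = false"
        // ...and therefore nothing has been enqueued yet
        System.out.println("enqueued = " + (queue.poll() != null)); // prints "enqueued = false"
        System.out.println("length   = " + value.length);           // keeps 'value' reachable to the end
    }
}
```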
Memoizer4.java
package com.usoft;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
/**
* @author: Lenovo(2015-05-27 14:31)
*/
public class Memoizer4<A, V> implements Computable<A, V> {
/**
* Default expiry time: 40 seconds, kept short for easy testing
*/
private static final int DEFAULT_OVER_DUE_TIME = 40 * 1000; //milliseconds
/**
* Scan period: 30 seconds, kept short for easy testing
*/
private static final int DEFAULT_PERIOD = 30; //seconds
/**
* Initial delay of 1 second, for easy testing
*/
private static final int DEFAULT_INITIAL_DELAY = 1; //seconds
/**
* SoftHashMap is backed internally by a ConcurrentHashMap, so it also supports concurrent access
*/
private final Map<A, OverdueFutureTask<V>> cache = new SoftHashMap<>();
private final Computable<A, V> c;
private int initialDelay;
private int period;
private ScheduledExecutorService scheduledExecutorService;
/**
* Constructor using the default parameters
*
* @param c
*/
public Memoizer4(Computable<A, V> c) {
this(DEFAULT_INITIAL_DELAY, DEFAULT_PERIOD, c);
}
/**
* Constructor with custom parameters
*
* @param initialDelay
* @param period
* @param c
*/
public Memoizer4(int initialDelay, int period, Computable<A, V> c) {
this.initialDelay = initialDelay;
this.period = period;
this.c = c;
scheduledExecutorService = Executors.newSingleThreadScheduledExecutor
(new OverdueHandlerThreadFactory());
processOverDue();
}
/**
* Returns the result for arg,
* fetched from the cache or freshly computed
*
* @param arg
* @return
* @throws InterruptedException
*/
@Override
public V compute(final A arg) throws InterruptedException {
while (true) {
Future<V> f = cache.get(arg);
if (f == null) {
//a Java 8 lambda expression
OverdueFutureTask<V> ft = new OverdueFutureTask<>(() -> c.compute(arg));
ft.setTimeout(DEFAULT_OVER_DUE_TIME + System.currentTimeMillis()); //absolute expiry time
ft.setExpired(false);
f = cache.putIfAbsent(arg, ft);
if (f == null) {
//no other thread installed a task for arg first: run the computation
f = ft;
ft.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
cache.remove(arg, f);
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
/**
* Evicts expired cache entries
*/
private void processOverDue() {
// a lambda expression
this.scheduledExecutorService.scheduleAtFixedRate(() -> {
System.out.println("overdue handler exec start,the cache size is " + this.cache.size());
try {
Set<Map.Entry<A, OverdueFutureTask<V>>> entries = this.cache.entrySet();
for (Map.Entry<A, OverdueFutureTask<V>> entry : entries) {
A key = entry.getKey();
OverdueFutureTask<V> overdueFutureTask = entry.getValue();
long timeout = overdueFutureTask.getTimeout();
long current = System.currentTimeMillis();
if (current >= timeout) {
System.out.println("overdue item,key is " + key + ",value is " + overdueFutureTask.get());
overdueFutureTask.setExpired(true);
this.cache.remove(key); //remove the expired entry
}
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
} finally {
//exceptions are caught above, so the periodic task keeps running
System.out.println("overdue handler exec end,the cache size is " + this.cache.size());
}
}, initialDelay, period, TimeUnit.SECONDS);
}
/**
* Thread factory for the expiry-handler thread.
* Lets us customize thread attributes such as daemon status and priority,
* so that expired entries are cleaned up with high priority
*/
private static class OverdueHandlerThreadFactory implements ThreadFactory {
@Override
public Thread newThread(Runnable r) {
System.out.println("newThread>>>>>>>>>>");
ThreadGroup tg = Thread.currentThread().getThreadGroup();
//walk up to the root thread group
for (ThreadGroup tgn = tg; tgn != null; tg = tgn, tgn = tg.getParent()) ;
Thread t = new Thread(tg, r, "Overdue Handler");
if (!t.isDaemon()) //make it a daemon thread
t.setDaemon(true);
t.setPriority(Thread.MAX_PRIORITY); //set the thread priority
return t;
}
}
/**
* A FutureTask that can report whether it has expired
*
* @author: Lenovo(2015-05-27 15:30)
*/
public static class OverdueFutureTask<V> extends FutureTask<V> {
/**
* The absolute time, in milliseconds, at which this task expires
*/
private long timeout;
private boolean expired; //whether the entry has expired
public OverdueFutureTask(Callable<V> callable) {
super(callable);
}
public long getTimeout() {
return timeout;
}
public void setTimeout(long timeout) {
this.timeout = timeout;
}
public boolean isExpired() {
return expired;
}
public void setExpired(boolean expired) {
this.expired = expired;
}
}
}
The test driver,
package com.usoft;
/**
* @author: Lenovo(2015-05-28 16:16)
*/
public class Memoizer4Test {
private static final Memoizer4<Integer, Integer> memory =
new Memoizer4<>(new ComparableImpl());
public static void main(String args[]) throws InterruptedException {
for (int i = 0; i < 10; i++) {
final int finalI = i;
Runnable r = () -> {
try {
memory.compute(finalI);
} catch (InterruptedException e) {
e.printStackTrace();
}
};
new Thread(r).start();
}
Thread.sleep(60 * 1000);
System.out.println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>");
for (int i = 20; i < 40; i++) {
final int finalI = i;
Runnable r = () -> {
try {
memory.compute(finalI);
} catch (InterruptedException e) {
e.printStackTrace();
}
};
new Thread(r).start();
}
Thread.sleep(Long.MAX_VALUE);
}
public static class ComparableImpl implements Computable<Integer, Integer> {
@Override
public Integer compute(Integer key) throws InterruptedException {
return key * key; //as value
}
}
}
Reference: Java Concurrency in Practice
==============END==============