Building an Efficient and Scalable Result Cache
Like many other "reinvented wheels", caching looks deceptively simple. A naive cache, however, can turn a performance problem into a scalability problem, even when the cache is only meant to improve single-threaded performance. Let's start with the following implementation.
Computable.java
package com.usoft;
/**
* @author: Lenovo(2015-05-27 13:42)
*/
public interface Computable<A, V> {
V compute(A arg) throws InterruptedException;
}
Memoizer2.java
package com.usoft;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
/**
* @author: Lenovo(2015-05-27 13:42)
*/
public class Memoizer2<A, V> implements Computable<A, V> {
private final Map<A, V> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer2(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(A arg) throws InterruptedException {
V result = cache.get(arg);
if (result == null) {
result = c.compute(arg);
cache.put(arg, result);
}
return result;
}
}
The cache field is a ConcurrentHashMap, so the map can safely be accessed by many threads at once and offers good concurrency. It still has a flaw, though: when two threads call compute at the same time, both may end up computing the same value. For a general-purpose memoizing cache this is merely inefficient; for a cache of objects that are supposed to be initialized exactly once, the same hole can even become a safety problem.
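The window is easy to reproduce. The sketch below is mine, not from the original: a CountDownLatch forces two threads past the null check before either has published a result, so the same value is deterministically computed twice (the cache logic is inlined from Memoizer2):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class Memoizer2RaceDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> cache = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch bothInside = new CountDownLatch(2);
        Runnable task = () -> {
            Integer result = cache.get(7);
            if (result == null) {                  // both threads see a miss...
                calls.incrementAndGet();
                bothInside.countDown();
                try { bothInside.await(); } catch (InterruptedException ignored) { }
                cache.put(7, 7 * 7);               // ...and both compute and publish
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("calls = " + calls.get()); // prints "calls = 2"
    }
}
```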
The problem with Memoizer2 is that if one thread starts an expensive computation, other threads are unaware that the computation is already in progress and may duplicate it. What we would like is a way to express "thread X is currently computing f(27)", so that when another thread looks up f(27), it knows the most efficient approach is to wait until X finishes and then fetch the cached result of f(27).
There is already a class that does almost exactly this: FutureTask. A FutureTask represents a computation that may or may not have completed yet. If a result is available, FutureTask.get returns it immediately; otherwise it blocks until the result has been computed and then returns it.
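A minimal illustration of that contract (the sleep duration and value are arbitrary):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        FutureTask<Integer> task = new FutureTask<>(() -> {
            Thread.sleep(200);       // simulate an expensive computation
            return 42;
        });
        new Thread(task).start();    // run the computation in another thread
        // get() blocks until the computation finishes, then returns its result
        System.out.println("result = " + task.get()); // prints "result = 42"
        // a second get() returns immediately with the already-computed result
        System.out.println("again  = " + task.get()); // prints "again  = 42"
    }
}
```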
Rebuilding the cache around FutureTask gives the following implementation.
package com.usoft;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
/**
* @author: Lenovo(2015-05-27 13:58)
*/
public class Memoizer3<A, V> implements Computable<A, V> {
private final Map<A, Future<V>> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer3(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(final A arg) throws InterruptedException {
Future<V> f = cache.get(arg);
if (f == null) {
FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg)); //a Java 8 lambda expression
f = ft;
cache.put(arg, ft);
ft.run(); //run the computation
}
try {
return f.get();
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
Memoizer3 is nearly perfect: it exhibits very good concurrency (derived from ConcurrentHashMap's concurrency), and if a result has already been computed it is returned immediately. If another thread is computing the result, newly arriving threads wait for that computation to finish. It has only one remaining defect: there is still a window in which two threads can compute the same value. The probability is far smaller than with Memoizer2, but because the if block in compute is still a non-atomic check-then-act sequence, two threads can still call compute at the same time, both fail to find the value in the cache, and both start the computation.
The root cause is that the compound put-if-absent operation is performed on the backing Map, and the client cannot make it atomic with locking without throwing away the concurrency the map provides.
The fix is to use ConcurrentHashMap's atomic putIfAbsent method, as in the following implementation.
package com.usoft;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
/**
* @author: Lenovo(2015-05-27 14:31)
*/
public class Memoizer<A, V> implements Computable<A, V> {
private final Map<A, Future<V>> cache = new ConcurrentHashMap<>();
private final Computable<A, V> c;
public Memoizer(Computable<A, V> c) {
this.c = c;
}
@Override
public V compute(final A arg) throws InterruptedException {
while (true) {
Future<V> f = cache.get(arg);
if (f == null) {
FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg)); //a Java 8 lambda expression
f = cache.putIfAbsent(arg, ft);
if (f == null) {
//no other thread installed a task for arg first: run the computation
f = ft;
ft.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
cache.remove(arg, f); //the task was cancelled: remove it so the while loop retries
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
}
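The putIfAbsent contract the loop above relies on can be checked in isolation: it returns null only for the caller that actually installed the mapping, and the existing value for everyone else.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        System.out.println(map.putIfAbsent("k", 1)); // null: "k" was absent, 1 is installed
        System.out.println(map.putIfAbsent("k", 2)); // 1: mapping already present, 2 is discarded
        System.out.println(map.get("k"));            // 1
    }
}
```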
Caching a Future rather than a value creates the possibility of cache pollution: if a computation is cancelled or fails, any future lookup of that result will also report cancellation or failure. To avoid this, Memoizer removes the Future from the cache when it detects that the computation was cancelled; it would also be reasonable to remove the Future upon detecting a RuntimeException, so that a later attempt at the computation can succeed.
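As an aside: on Java 8 and later, the check-then-act sequence can also be delegated to ConcurrentHashMap.computeIfAbsent, which installs the FutureTask atomically. The class below is my sketch of that variant, not part of the original text. Note that run() is invoked outside the mapping function, so the computation does not execute while the map bin is locked; FutureTask.run is a no-op for every caller except the first, so the value is still computed only once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

public class ChmMemoizer<A, V> {
    private final ConcurrentHashMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<A, V> f;

    public ChmMemoizer(Function<A, V> f) { this.f = f; }

    public V compute(A arg) throws InterruptedException, ExecutionException {
        // atomic put-if-absent: exactly one FutureTask is ever installed per key
        Future<V> future = cache.computeIfAbsent(arg, k -> new FutureTask<V>(() -> f.apply(k)));
        // run outside computeIfAbsent; only the first caller actually executes the task
        if (future instanceof FutureTask) {
            ((FutureTask<V>) future).run();
        }
        return future.get();
    }

    public static void main(String[] args) throws Exception {
        ChmMemoizer<Integer, Integer> memo = new ChmMemoizer<>(x -> x * x);
        System.out.println(memo.compute(7)); // prints 49
        System.out.println(memo.compute(7)); // cached: still 49, computed only once
    }
}
```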
Although the Memoizer above is essentially perfect with respect to concurrency and safety, it is still not a complete caching solution. It has the following shortcomings: it does not expire cache entries, and it does not evict entries, i.e. remove old results to make room for new ones, so the cache can eventually consume enough memory to cause an OutOfMemoryError.
Both problems can be addressed by defining a FutureTask subclass that carries an expiration time, together with a thread that periodically scans for and removes expired entries. Going one step further, SoftReferences can be used so that the cache shrinks under memory pressure instead of triggering an OOM.
The code follows. SoftHashMap.java
package com.usoft;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;
/**
* A <code><em>Soft</em>HashMap</code> is a memory-constrained map that stores its <em>values</em> in
* {@link SoftReference SoftReference}s. (Contrast this with the JDK's
* {@link WeakHashMap WeakHashMap}, which uses weak references for its <em>keys</em>, which is of little value if you
* want the cache to auto-resize itself based on memory constraints).
* <p>
* Having the values wrapped by soft references allows the cache to automatically reduce its size based on memory
* limitations and garbage collection. This ensures that the cache will not cause memory leaks by holding strong
* references to all of its values.
* <p>
* This class is a generics-enabled Map based on initial ideas from Heinz Kabutz's and Sydney Redelinghuys's
* <a href="http://www.javaspecialists.eu/archive/Issue015.html">publicly posted version (with their approval)</a>, with
* continued modifications.
* <p>
* This implementation is thread-safe and usable in concurrent environments.
*
* @since 1.0
*/
public class SoftHashMap<K, V> implements Map<K, V> {
/**
* The default value of the RETENTION_SIZE attribute, equal to 100.
*/
private static final int DEFAULT_RETENTION_SIZE = 100;
/**
* The internal HashMap that will hold the SoftReference.
*/
private final Map<K, SoftValue<V, K>> map;
/**
* The number of strong references to hold internally, that is, the number of instances to prevent
* from being garbage collected automatically (unlike other soft references).
*/
private final int RETENTION_SIZE;
/**
* The FIFO list of strong references (not to be garbage collected), order of last access.
*/
private final Queue<V> strongReferences; //guarded by 'strongReferencesLock'
private final ReentrantLock strongReferencesLock;
/**
* Reference queue for cleared SoftReference objects.
*/
private final ReferenceQueue<? super V> queue;
/**
* Creates a new SoftHashMap with a default retention size of
* {@link #DEFAULT_RETENTION_SIZE DEFAULT_RETENTION_SIZE} (100 entries).
*
* @see #SoftHashMap(int)
*/
public SoftHashMap() {
this(DEFAULT_RETENTION_SIZE);
}
/**
* Creates a new SoftHashMap with the specified retention size.
* <p>
* The retention size (n) is the total number of most recent entries in the map that will be strongly referenced
* (ie 'retained') to prevent them from being eagerly garbage collected. That is, the point of a SoftHashMap is to
* allow the garbage collector to remove as many entries from this map as it desires, but there will always be (n)
* elements retained after a GC due to the strong references.
* <p>
* Note that in highly concurrent environments the exact total number of strong references may differ slightly
* from the actual <code>retentionSize</code> value. This number is intended to be a best-effort retention low
* water mark.
*
* @param retentionSize the total number of most recent entries in the map that will be strongly referenced
* (retained), preventing them from being eagerly garbage collected by the JVM.
*/
@SuppressWarnings({"unchecked"})
public SoftHashMap(int retentionSize) {
super();
RETENTION_SIZE = Math.max(0, retentionSize);
queue = new ReferenceQueue<V>();
strongReferencesLock = new ReentrantLock();
map = new ConcurrentHashMap<K, SoftValue<V, K>>();
strongReferences = new ConcurrentLinkedQueue<V>();
}
/**
* Creates a {@code SoftHashMap} backed by the specified {@code source}, with a default retention
* size of {@link #DEFAULT_RETENTION_SIZE DEFAULT_RETENTION_SIZE} (100 entries).
*
* @param source the backing map to populate this {@code SoftHashMap}
* @see #SoftHashMap(Map, int)
*/
public SoftHashMap(Map<K, V> source) {
this(DEFAULT_RETENTION_SIZE);
putAll(source);
}
/**
* Creates a {@code SoftHashMap} backed by the specified {@code source}, with the specified retention size.
* <p>
* The retention size (n) is the total number of most recent entries in the map that will be strongly referenced
* (ie 'retained') to prevent them from being eagerly garbage collected. That is, the point of a SoftHashMap is to
* allow the garbage collector to remove as many entries from this map as it desires, but there will always be (n)
* elements retained after a GC due to the strong references.
* <p>
* Note that in highly concurrent environments the exact total number of strong references may differ slightly
* from the actual <code>retentionSize</code> value. This number is intended to be a best-effort retention low
* water mark.
*
* @param source the backing map to populate this {@code SoftHashMap}
* @param retentionSize the total number of most recent entries in the map that will be strongly referenced
* (retained), preventing them from being eagerly garbage collected by the JVM.
*/
public SoftHashMap(Map<K, V> source, int retentionSize) {
this(retentionSize);
putAll(source);
}
public V get(Object key) {
processQueue();
V result = null;
SoftValue<V, K> value = map.get(key);
if (value != null) {
//unwrap the 'real' value from the SoftReference
result = value.get();
if (result == null) {
//The wrapped value was garbage collected, so remove this entry from the backing map:
//noinspection SuspiciousMethodCalls
map.remove(key);
} else {
//Add this value to the beginning of the strong reference queue (FIFO).
addToStrongReferences(result);
}
}
return result;
}
private void addToStrongReferences(V result) {
strongReferencesLock.lock();
try {
strongReferences.add(result);
trimStrongReferencesIfNecessary();
} finally {
strongReferencesLock.unlock();
}
}
//Guarded by the strongReferencesLock in the addToStrongReferences method
private void trimStrongReferencesIfNecessary() {
//trim the strong ref queue if necessary:
while (strongReferences.size() > RETENTION_SIZE) {
strongReferences.poll();
}
}
/**
* Traverses the ReferenceQueue and removes garbage-collected SoftValue objects from the backing map
* by looking them up using the SoftValue.key data member.
*/
private void processQueue() {
SoftValue sv;
while ((sv = (SoftValue) queue.poll()) != null) {
//noinspection SuspiciousMethodCalls
map.remove(sv.key); // we can access private data!
}
}
public boolean isEmpty() {
processQueue();
return map.isEmpty();
}
public boolean containsKey(Object key) {
processQueue();
return map.containsKey(key);
}
public boolean containsValue(Object value) {
processQueue();
Collection<V> values = values();
return values != null && values.contains(value);
}
public void putAll(Map<? extends K, ? extends V> m) {
if (m == null || m.isEmpty()) {
processQueue();
return;
}
for (Entry<? extends K, ? extends V> entry : m.entrySet()) {
put(entry.getKey(), entry.getValue());
}
}
public Set<K> keySet() {
processQueue();
return map.keySet();
}
public Collection<V> values() {
processQueue();
Collection<K> keys = map.keySet();
if (keys.isEmpty()) {
//noinspection unchecked
return Collections.EMPTY_SET;
}
Collection<V> values = new ArrayList<V>(keys.size());
for (K key : keys) {
V v = get(key);
if (v != null) {
values.add(v);
}
}
return values;
}
/**
* Creates a new entry, but wraps the value in a SoftValue instance to enable auto garbage collection.
*/
public V put(K key, V value) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> sv = new SoftValue<V, K>(value, key, queue);
SoftValue<V, K> previous = map.put(key, sv);
addToStrongReferences(value);
return previous != null ? previous.get() : null;
}
@Override
public V putIfAbsent(K key, V value) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> sv = new SoftValue<V, K>(value, key, queue);
SoftValue<V, K> previous = map.putIfAbsent(key, sv);
if (previous == null) {
addToStrongReferences(value); //retain the value only if it was actually inserted
}
}
public V remove(Object key) {
processQueue(); // throw out garbage collected values first
SoftValue<V, K> raw = map.remove(key);
return raw != null ? raw.get() : null;
}
public void clear() {
strongReferencesLock.lock();
try {
strongReferences.clear();
} finally {
strongReferencesLock.unlock();
}
processQueue(); // throw out garbage collected values
map.clear();
}
public int size() {
processQueue(); // throw out garbage collected values first
return map.size();
}
public Set<Entry<K, V>> entrySet() {
processQueue(); // throw out garbage collected values first
Collection<K> keys = map.keySet();
if (keys.isEmpty()) {
//noinspection unchecked
return Collections.EMPTY_SET;
}
Map<K, V> kvPairs = new HashMap<K, V>(keys.size());
for (K key : keys) {
V v = get(key);
if (v != null) {
kvPairs.put(key, v);
}
}
return kvPairs.entrySet();
}
/**
* We define our own subclass of SoftReference which contains
* not only the value but also the key to make it easier to find
* the entry in the HashMap after it's been garbage collected.
*/
private static class SoftValue<V, K> extends SoftReference<V> {
private final K key;
/**
* Constructs a new instance, wrapping the value, key, and queue, as
* required by the superclass.
*
* @param value the map value
* @param key the map key
* @param queue the soft reference queue to poll to determine if the entry had been reaped by the GC.
*/
private SoftValue(V value, K key, ReferenceQueue<? super V> queue) {
super(value, queue);
this.key = key;
}
}
}
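The ReferenceQueue plumbing that processQueue() relies on can be seen in isolation: a SoftReference is only enqueued after the collector clears it, so while the referent is strongly reachable, get() keeps returning it and poll() stays empty. Provoking an actual collection deterministically is JVM-dependent, so this sketch only shows the reachable case:

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

public class SoftReferenceDemo {
    public static void main(String[] args) {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        byte[] value = new byte[16];                 // a strong reference keeps the value alive
        SoftReference<byte[]> ref = new SoftReference<>(value, queue);
        // the referent is strongly reachable, so the reference has not been cleared...
        System.out.println("cleared  = " + (ref.get() == null));   // prints "cleared  = false"
        // ...and therefore nothing has been enqueued yet
        System.out.println("enqueued = " + (queue.poll() != null)); // prints "enqueued = false"
        System.out.println("length   = " + value.length);           // keeps 'value' reachable to the end
    }
}
```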
Memoizer4.java
package com.usoft;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
/**
* @author: Lenovo(2015-05-27 14:31)
*/
public class Memoizer4<A, V> implements Computable<A, V> {
/**
* Default expiry time: 40 seconds, kept short for easy testing
*/
private static final int DEFAULT_OVER_DUE_TIME = 40 * 1000; //milliseconds
/**
* Scan period: 30 seconds, kept short for easy testing
*/
private static final int DEFAULT_PERIOD = 30; //seconds
/**
* Initial delay of 1 second, for easy testing
*/
private static final int DEFAULT_INITIAL_DELAY = 1; //seconds
/**
* SoftHashMap is backed internally by a ConcurrentHashMap, so it also supports concurrent access
*/
private final Map<A, OverdueFutureTask<V>> cache = new SoftHashMap<>();
private final Computable<A, V> c;
private int initialDelay;
private int period;
private ScheduledExecutorService scheduledExecutorService;
/**
* Constructor using the default parameters
*
* @param c
*/
public Memoizer4(Computable<A, V> c) {
this(DEFAULT_INITIAL_DELAY, DEFAULT_PERIOD, c);
}
/**
* Constructor with custom parameters
*
* @param initialDelay
* @param period
* @param c
*/
public Memoizer4(int initialDelay, int period, Computable<A, V> c) {
this.initialDelay = initialDelay;
this.period = period;
this.c = c;
scheduledExecutorService = Executors.newSingleThreadScheduledExecutor
(new OverdueHandlerThreadFactory());
processOverDue();
}
/**
* Returns the result for arg,
* fetched from the cache or freshly computed
*
* @param arg
* @return
* @throws InterruptedException
*/
@Override
public V compute(final A arg) throws InterruptedException {
while (true) {
Future<V> f = cache.get(arg);
if (f == null) {
//a Java 8 lambda expression
OverdueFutureTask<V> ft = new OverdueFutureTask<>(() -> c.compute(arg));
ft.setTimeout(DEFAULT_OVER_DUE_TIME + System.currentTimeMillis()); //absolute expiry time
ft.setExpired(false);
f = cache.putIfAbsent(arg, ft);
if (f == null) {
//no other thread installed a task for arg first: run the computation
f = ft;
ft.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
cache.remove(arg, f);
} catch (ExecutionException e) {
e.printStackTrace();
throw new InterruptedException(e.getMessage());
}
}
}
/**
* Evicts expired cache entries
*/
private void processOverDue() {
// a lambda expression
this.scheduledExecutorService.scheduleAtFixedRate(() -> {
System.out.println("overdue handler exec start,the cache size is " + this.cache.size());
try {
Set<Map.Entry<A, OverdueFutureTask<V>>> entries = this.cache.entrySet();
for (Map.Entry<A, OverdueFutureTask<V>> entry : entries) {
A key = entry.getKey();
OverdueFutureTask<V> overdueFutureTask = entry.getValue();
long timeout = overdueFutureTask.getTimeout();
long current = System.currentTimeMillis();
if (current >= timeout) {
System.out.println("overdue item,key is " + key + ",value is " + overdueFutureTask.get());
overdueFutureTask.setExpired(true);
this.cache.remove(key); //remove the expired entry
}
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
} finally {
//exceptions are caught above, so the periodic task keeps running
System.out.println("overdue handler exec end,the cache size is " + this.cache.size());
}
}, initialDelay, period, TimeUnit.SECONDS);
}
/**
* Thread factory for the expiry-handler thread.
* Lets us customize thread attributes such as daemon status and priority,
* so that expired entries are cleaned up with high priority
*/
private static class OverdueHandlerThreadFactory implements ThreadFactory {
@Override
public Thread newThread(Runnable r) {
System.out.println("newThread>>>>>>>>>>");
ThreadGroup tg = Thread.currentThread().getThreadGroup();
//walk up to the root thread group
for (ThreadGroup tgn = tg; tgn != null; tg = tgn, tgn = tg.getParent()) ;
Thread t = new Thread(tg, r, "Overdue Handler");
if (!t.isDaemon()) //make it a daemon thread
t.setDaemon(true);
t.setPriority(Thread.MAX_PRIORITY); //set the thread priority
return t;
}
}
/**
* A FutureTask that can report whether it has expired
*
* @author: Lenovo(2015-05-27 15:30)
*/
public static class OverdueFutureTask<V> extends FutureTask<V> {
/**
* The absolute time, in milliseconds, at which this task expires
*/
private long timeout;
private boolean expired; //whether the entry has expired
public OverdueFutureTask(Callable<V> callable) {
super(callable);
}
public long getTimeout() {
return timeout;
}
public void setTimeout(long timeout) {
this.timeout = timeout;
}
public boolean isExpired() {
return expired;
}
public void setExpired(boolean expired) {
this.expired = expired;
}
}
}
The test driver,
package com.usoft;
/**
* @author: Lenovo(2015-05-28 16:16)
*/
public class Memoizer4Test {
private static final Memoizer4<Integer, Integer> memory =
new Memoizer4<>(new ComparableImpl());
public static void main(String args[]) throws InterruptedException {
for (int i = 0; i < 10; i++) {
final int finalI = i;
Runnable r = () -> {
try {
memory.compute(finalI);
} catch (InterruptedException e) {
e.printStackTrace();
}
};
new Thread(r).start();
}
Thread.sleep(60 * 1000);
System.out.println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>");
for (int i = 20; i < 40; i++) {
final int finalI = i;
Runnable r = () -> {
try {
memory.compute(finalI);
} catch (InterruptedException e) {
e.printStackTrace();
}
};
new Thread(r).start();
}
Thread.sleep(Long.MAX_VALUE);
}
public static class ComparableImpl implements Computable<Integer, Integer> {
@Override
public Integer compute(Integer key) throws InterruptedException {
return key * key; //as value
}
}
}
Reference: Java Concurrency in Practice
==============END==============