
Time Series Analysis 45 -- Converting Time Series Data to Spatial Data (IV): Gramian Angular Field, Python Practice (Part 2)

Gramian Angular Field

Python practice (Part 2)

… continued from the previous part
Below we use a CNN model for financial prediction, building on the results of the Python practice in the previous part, where the time series data was already converted to image data.
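As a quick recap of the previous part: the Gramian Angular Summation Field (GASF) rescales a series into [-1, 1], maps each value to an angle, and takes pairwise cosines of angle sums. A minimal NumPy sketch of the transform (not the exact pipeline used to generate the images in part one):

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a 1-D series."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale into [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))                # polar-coordinate angles
    return np.cos(phi[:, None] + phi[None, :])        # cos(phi_i + phi_j)

img = gasf([1.0, 2.0, 3.0, 4.0])   # a 4x4 single-channel "image"
```

Each n-point window of prices thus becomes an n×n matrix that can be saved as a PNG, which is what the LONG/SHORT image folders below contain.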

import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.layers import *
import datetime as dt
import glob
import os
import pandas as pd                  # needed by chunker / ensemble_data below
from typing import Generator, List   # needed by the type hints below

Define helper functions to process the data

# Chunks DataFrames in a way that part of the data points is found in the previous chunk
def chunker(seq: pd.DataFrame, size: int, loops: int) -> Generator:
    """
    :param seq: As DataFrame
    :param size: As Integer
    :param loops: As integer
    :return: Generator with overlapping index DataFrames
    """
    rem = (seq.shape[0] - size)
    rem_split = rem // loops
    for i in range(loops):  # was hard-coded range(10); iterate once per requested chunk
        yield seq.iloc[(i * rem_split): -(rem - (i * rem_split))]
def ensemble_data(networks_chunks: int, path: str) -> List[pd.DataFrame]:
    """
    :param networks_chunks: As Integer
    :param path: As String
    :return: List of overlapping index DataFrames
    """
    dataframes = []
    for sub_folder in ['LONG', 'SHORT']:
        images = glob.glob(path + '/{}/*.png'.format(sub_folder))  # Get path to images
        dates = [img.split('/')[-1].split('\\')[-1].split('.')[0].replace('_', '-') for img in images]  # loop variable renamed from 'dt' to avoid shadowing the datetime import
        data_slice = pd.DataFrame({'Images': images, 'Labels': [sub_folder] * len(images), 'Dates': dates})
        data_slice['Dates'] = pd.to_datetime(data_slice['Dates'])
        dataframes.append(data_slice)
    data = pd.concat(dataframes)
    data.sort_values(by='Dates', inplace=True)
    del data['Dates']
    shape = (data.shape[0] // 5) * 4
    loops = networks_chunks
    return list(chunker(data, shape, loops))
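To make the overlap produced by chunker/ensemble_data concrete, here is a toy run on a 10-row frame split into two chunks (the slicing below mirrors the arithmetic inside chunker):

```python
import pandas as pd

df = pd.DataFrame({'v': range(10)})
size, loops = 8, 2
rem = df.shape[0] - size        # 2 rows left over
rem_split = rem // loops        # each chunk shifts forward by 1 row
chunks = [df.iloc[i * rem_split: -(rem - i * rem_split)] for i in range(loops)]
# chunks[0] covers rows 0-7, chunks[1] covers rows 1-8: 7 rows overlap
```

Each network therefore trains on a slightly shifted view of the same history, which is what makes averaging them an ensemble rather than three copies of one model.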
#  Ensemble CNN network to train a CNN model on GAF images labeled Long and Short
PATH = "G:\\financial_data"
IMAGES_PATH = os.path.join(PATH, 'TRAIN')
REPO = os.path.join(PATH, 'Models')
PATH_DOC = os.path.join(PATH, 'Documents')
PATH_OUT = os.path.join(PATH, 'Output')
EPOCHS = 5
SPLIT = 0.30
LR = 0.001
TIMESTAMP = dt.datetime.now().strftime("%Y%m%d%H%M%S")

We use an ensemble: three CNN models are trained and their prediction scores averaged. Each CNN stacks seven convolutional layers topped by a single dense layer for the binary prediction; the activation function is ReLU, a dropout rate of 0.4 regularizes the layer inputs, and batch normalization is applied throughout.

cnn_networks = 3
model = []
for j in range(cnn_networks):
    model.append(
        tf.keras.models.Sequential([
            #  First Convolution
            Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(255, 255, 3)),
            BatchNormalization(),
            Conv2D(32, kernel_size=(3, 3), activation='relu'),
            BatchNormalization(),
            Conv2D(32, kernel_size=(3, 3), strides=2, padding='same', activation='relu'),
            BatchNormalization(),
            Dropout(0.4),
            # Second Convolution
            Conv2D(64, kernel_size=(3, 3), activation='relu'),
            BatchNormalization(),
            Conv2D(64, kernel_size=(3, 3), activation='relu'),
            BatchNormalization(),
            Conv2D(64, kernel_size=(3, 3), strides=2, padding='same', activation='relu'),
            BatchNormalization(),
            Dropout(0.4),
            # Third Convolution
            Conv2D(128, kernel_size=4, activation='relu'),
            BatchNormalization(),
            Flatten(),
            Dropout(0.4),
            # Output layer
            Dense(1, activation='sigmoid')]
        ))
    # Compile each model
    model[j].compile(optimizer=Adam(learning_rate=LR), loss='binary_crossentropy', metrics=['acc'])

Each model is compiled with the Adam optimizer and a learning rate of 0.001.

# All images will be rescaled by 1./255
train_validate_datagen = ImageDataGenerator(rescale=1/255, validation_split=SPLIT)  # set validation split
test_datagen = ImageDataGenerator(rescale=1/255)
data_chunks = ensemble_data(cnn_networks, IMAGES_PATH)
for j in range(cnn_networks):
    print('Net : {}'.format(j+1))
    df_train = data_chunks[j].iloc[:-60]
    df_test = data_chunks[j].iloc[-60:]
    train_generator = train_validate_datagen.flow_from_dataframe(
        dataframe=df_train,
        directory=IMAGES_PATH,
        target_size=(255, 255),
        x_col='Images',
        y_col='Labels',
        batch_size=32,
        class_mode='binary',
        subset='training')

    validation_generator = train_validate_datagen.flow_from_dataframe(
        dataframe=df_train,
        directory=IMAGES_PATH,
        target_size=(255, 255),
        x_col='Images',
        y_col='Labels',
        batch_size=32,
        class_mode='binary',
        subset='validation')

    test_generator = test_datagen.flow_from_dataframe(
        dataframe=df_test,
        x_col='Images',
        y_col='Labels',
        directory=IMAGES_PATH,
        target_size=(255, 255),
        class_mode='binary')

Net : 1
Found 3495 validated image filenames belonging to 2 classes.
Found 1497 validated image filenames belonging to 2 classes.
Found 60 validated image filenames belonging to 2 classes.
Net : 2
Found 3495 validated image filenames belonging to 2 classes.
Found 1497 validated image filenames belonging to 2 classes.
Found 60 validated image filenames belonging to 2 classes.
Net : 3
Found 3495 validated image filenames belonging to 2 classes.
Found 1497 validated image filenames belonging to 2 classes.
Found 60 validated image filenames belonging to 2 classes.

The ensemble_data function defined earlier splits the data into as many chunks as there are CNN networks, i.e. three.
Keras's ImageDataGenerator is then used to resize the images and split the image data into training, validation, and test sets.
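The printed counts can be reproduced from the split arithmetic: Keras reserves roughly the first int(n * validation_split) samples of each dataframe for the validation subset (a sketch of the arithmetic; the exact rounding is a Keras implementation detail):

```python
n = 4992                 # rows in df_train for each network (3495 + 1497 in the logs above)
SPLIT = 0.30
n_val = int(n * SPLIT)   # validation subset size
n_train = n - n_val      # training subset size
```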

Model training and validation

steps_per_epoch = train_generator.n // train_generator.batch_size
validation_steps = validation_generator.n // validation_generator.batch_size
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=0, factor=0.5, min_lr=0.00001)
history = model[j].fit(train_generator,  # fit_generator is deprecated in TF 2.x; fit() accepts generators
                       epochs=EPOCHS,
                       steps_per_epoch=steps_per_epoch,
                       validation_data=validation_generator,
                       callbacks=[learning_rate_reduction],
                       verbose=0)
print('CNN Model {0:d}: '
          'Epochs={1:d}, '
          'Training Accuracy={2:.5f}, '
          'Validation Accuracy={3:.5f}'.format(j + 1,
                                               EPOCHS,
                                               max(history.history['acc']),
                                               max(history.history['val_acc'])))

scores = model[j].evaluate(test_generator, steps=5)  # evaluate_generator is deprecated in TF 2.x
print("{0}s: {1:.2f}%".format(model[j].metrics_names[1], scores[1]*100))
string_list = []
model[j].summary(print_fn=lambda x: string_list.append(x))
summary = "\n".join(string_list)
logging = ['{0}: {1}'.format(key, val[-1]) for key, val in history.history.items()]
log = 'Results:\n' + '\n'.join(logging)
model[j].save(os.path.join(REPO, 'computer_vision_model_{0}_{1}_of_{2}.h5'.format(TIMESTAMP, j+1, cnn_networks)))
f = open(os.path.join(REPO, 'computer_vision_summary_{0}_{1}_of_{2}.txt'.format(TIMESTAMP, j+1, cnn_networks)), 'w')  # plain-text summary, so .txt rather than .h5
f.write("EPOCHS: {0}\nSteps per epoch: {1}\nValidation steps: {2}\nVal Split:{3}\nLearning RT:{5}\n\n\n{4}"
            "\n\n=========TRAINING LOG========\n{6}".format(EPOCHS, steps_per_epoch, validation_steps,  SPLIT, summary,
                                                            LR, log))
f.close()

CNN Model 3: Epochs=5, Training Accuracy=0.55270, Validation Accuracy=0.53173
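The averaging of the three networks' scores mentioned earlier is not shown in the training code; a minimal sketch of how the sigmoid outputs could be combined (preds stands in for the three model[j].predict(...) results on the same test batch, with hypothetical values):

```python
import numpy as np

# Hypothetical sigmoid outputs from the three CNNs on the same two test images
preds = [np.array([0.62, 0.41]),
         np.array([0.55, 0.48]),
         np.array([0.70, 0.35])]
ensemble_prob = np.mean(preds, axis=0)               # average score per sample
ensemble_label = (ensemble_prob > 0.5).astype(int)   # 0 = LONG, 1 = SHORT (Keras assigns class indices alphabetically)
```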

During model fitting, ReduceLROnPlateau is registered as a callback: when model performance stops improving for a number of epochs, it shrinks the learning rate so the optimizer can keep making progress.
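Numerically, ReduceLROnPlateau with factor=0.5 and min_lr=1e-5 halves the learning rate at each triggered plateau and floors it at min_lr; a quick illustration of the resulting schedule:

```python
lr = 0.001          # initial LR, matching the compile step above
schedule = []
for _ in range(8):  # eight hypothetical plateau events
    lr = max(lr * 0.5, 1e-5)   # halve, but never drop below min_lr
    schedule.append(lr)
# schedule: 5e-4, 2.5e-4, 1.25e-4, 6.25e-5, 3.125e-5, 1.5625e-5, 1e-5, 1e-5
```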

In the end, the model achieved just over 53% accuracy, which by itself is not an impressive score. But in stock trading, consistently and reliably beating 50% accuracy over the long run would be a very encouraging result. One could also deepen the convolutional network, or incorporate fundamental analysis, risk factors, scenario analysis, ESG scores, and so on, so that the model can handle more complex situations.
