[Python] Stock Price Prediction with a Deep-Learning CNN in Python (2)

sjblog 2022. 9. 29. 12:42

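1. Overview of the series

The series is split into three posts, in this order: data preparation, training, and prediction.

[Python] Stock Price Prediction with a Deep-Learning CNN in Python (1)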
https://sjblog1.tistory.com/65

[Python] Stock Price Prediction with a Deep-Learning CNN in Python (3)
https://sjblog1.tistory.com/67

2. Recommended package versions

import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import os
from glob import glob
from datetime import datetime
import numpy as np
import math

python==3.10
tensorflow==2.10.0
keras==2.10.0

3. Changing the output color

# ANSI escape codes for colored console output
formatters = {
    'Red': '\033[91m',
    'Green': '\033[92m',
    'Blue': '\033[94m',
    'END': '\033[0m'
}

def printlog(content, color):
    if color == 'Green':
        print('{Green}'.format(**formatters) + datetime.now().strftime('[%m/%d %H:%M:%S] ') + str(content) + '{END}'.format(**formatters))
    elif color == 'Blue':
        print('{Blue}'.format(**formatters) + datetime.now().strftime('[%m/%d %H:%M:%S] ') + str(content) + '{END}'.format(**formatters))
    else:
        print('{Red}'.format(**formatters) + datetime.now().strftime('[%m/%d %H:%M:%S] ') + str(content) + '{END}'.format(**formatters))
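
The three branches of printlog differ only in the color code, so the same behavior can be sketched with a single lookup. The helper below is an illustration, not part of the original script: the names ANSI and format_log and the optional now argument are assumptions added so that the result can be inspected as a string instead of being printed directly.

```python
from datetime import datetime

# Same escape codes as the formatters dictionary above
ANSI = {
    'Red': '\033[91m',
    'Green': '\033[92m',
    'Blue': '\033[94m',
    'END': '\033[0m',
}

def format_log(content, color='Red', now=None):
    """Return the colored, timestamped line that printlog would print."""
    now = now or datetime.now()
    # Anything other than 'Green' or 'Blue' falls back to red, as in printlog
    prefix = ANSI[color] if color in ('Green', 'Blue') else ANSI['Red']
    return prefix + now.strftime('[%m/%d %H:%M:%S] ') + str(content) + ANSI['END']

# A fixed timestamp (hypothetical) keeps the example reproducible
line = format_log('Number of images in train: 100', 'Green',
                  now=datetime(2022, 9, 29, 12, 42, 0))
print(line)
```

Returning the string rather than printing it makes the formatting easy to check or to write to a log file as well.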

4. Training the model

Let's train the CNN model.

The steps are: set variables, load the data, build the model, compile it, fit it, and analyze the test results.

5. Setting variables

Base_dir = 'data/'
img_height, img_width = 150, 150
batch_size = 8
epochs = 10

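Base_dir is the folder that contains the train and test data.

So take the folder created in [Python] Stock Price Prediction with a Deep-Learning CNN in Python (1),

"3.dataset/last_dataset_merge_20_150/train" => "data/train"

and move it to that location.

Move the test folder to "data/test" in the same way.

If you want to use other data, such as cats and dogs, instead of stock prices, place it in the data folder under train and test, with one sub-folder per class.

Set batch_size and epochs to whatever values you like. (This post uses 8 and 10 as the defaults.)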
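Before loading anything with TensorFlow, it can help to confirm that the folders really have the expected layout of one sub-folder per class. The following is a minimal sketch using only the standard library; count_images_per_class and make_demo_tree are hypothetical helper names, and the demo tree with empty files stands in for a real data folder.

```python
import os
import tempfile

def count_images_per_class(split_dir):
    """Return {class_name: file_count} for every class sub-folder of split_dir."""
    counts = {}
    for entry in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, entry)
        if os.path.isdir(class_dir):
            files = [f for f in os.listdir(class_dir)
                     if os.path.isfile(os.path.join(class_dir, f))]
            counts[entry] = len(files)
    return counts

def make_demo_tree(root):
    """Create a toy data/train and data/test layout with empty placeholder files."""
    for split, n_files in (('train', 3), ('test', 2)):
        for class_name in ('0', '1'):
            class_dir = os.path.join(root, split, class_name)
            os.makedirs(class_dir)
            for i in range(n_files):
                open(os.path.join(class_dir, 'img_{}.png'.format(i)), 'w').close()

with tempfile.TemporaryDirectory() as root:
    make_demo_tree(root)
    train_counts = count_images_per_class(os.path.join(root, 'train'))
    test_counts = count_images_per_class(os.path.join(root, 'test'))

print(train_counts, test_counts)
```

On a real dataset, calling count_images_per_class('data/train') would also reveal a strongly imbalanced class, which matters when reading the accuracy later.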

6. Loading the train and test data

train_dir = Base_dir + 'train/'
test_dir = Base_dir + 'test/'

# Number of train and test images
printlog('Number of images in train: ' + str(len(glob(train_dir + '*/*'))), 'Green')
printlog('Number of images in test: ' + str(len(glob(test_dir + '*/*'))), 'Green')

# Data loading
train_data = tf.keras.utils.image_dataset_from_directory(
    Base_dir + 'train/',
    labels='inferred',
    color_mode='rgb',
    validation_split=0.2,
    subset='training',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
val_data = tf.keras.utils.image_dataset_from_directory(
    Base_dir + 'train/',
    labels='inferred',
    color_mode='rgb',
    validation_split=0.2,
    subset='validation',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size
)
test_data = tf.keras.utils.image_dataset_from_directory(
    Base_dir + 'test/',
    labels='inferred',
    color_mode='rgb',
    # Keep a fixed order so predictions and labels line up during evaluation
    shuffle=False,
    image_size=(img_height, img_width),
    batch_size=batch_size
)

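This loads the train and test data saved in the data folder. 20% of the train folder is held out as validation data.

Check the green text in the console output to confirm that the train and test image counts are correct.

6-1. Checking the classes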

class_names_train = train_data.class_names
printlog('Train class: ' + str(class_names_train), 'Green')
class_names_val = val_data.class_names
printlog('Validation class: ' + str(class_names_val), 'Green')
class_names_test = test_data.class_names
printlog('Test class: ' + str(class_names_test) + '\n', 'Green')

Check the output to confirm that both classes, "0" and "1", were detected.

6-2. Checking the shapes

for image_batch, labels_batch in train_data:
    printlog('Train batch: ' + str(image_batch.shape), 'Green')
    #print(labels_batch.shape)
    break
for image_batch, labels_batch in val_data:
    printlog('Validation batch: ' + str(image_batch.shape), 'Green')
    #print(labels_batch.shape)
    break
for image_batch, labels_batch in test_data:
    printlog('Test batch: ' + str(image_batch.shape), 'Green')
    #print(labels_batch.shape)
    break

The shape describes the layout of the data.

For example, (8, 150, 150, 3) means (batch size, image height, image width, channels).
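The sketch below works through the arithmetic behind these shapes: how many images end up in the training and validation subsets, and why the last batch can be smaller than batch_size. It assumes that Keras sets aside int(validation_split * n) images for validation, and the image count of 1003 is made up for illustration; split_sizes and batch_shapes are hypothetical helper names.

```python
def split_sizes(n_images, validation_split):
    """Return (n_train, n_val), assuming int(validation_split * n) validation images."""
    n_val = int(validation_split * n_images)
    return n_images - n_val, n_val

def batch_shapes(n_images, batch_size, height, width, channels=3):
    """List the shape of every batch; the final batch may be partial."""
    shapes = []
    remaining = n_images
    while remaining > 0:
        current = min(batch_size, remaining)
        shapes.append((current, height, width, channels))
        remaining -= current
    return shapes

# 1003 images is a made-up count; 0.2, 8 and 150 match the settings in this post
n_train, n_val = split_sizes(1003, 0.2)
shapes = batch_shapes(n_train, 8, 150, 150)

print(n_train, n_val)          # 803 200
print(len(shapes))             # 101
print(shapes[0], shapes[-1])   # (8, 150, 150, 3) (3, 150, 150, 3)
```

Because the loops in 6-2 stop after the first batch, they always report a full batch; a partial batch, if any, only appears at the end of an epoch.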

7. Building the model

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu',padding='same'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu',padding='same'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu',padding='same'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu',padding='same'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu',padding='same'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

This is the CNN model used here.

Since there are only two classes, the output layer uses a sigmoid activation.
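To see what the stack of layers does to a 150 x 150 image, the sketch below traces the spatial size through the five pooling steps and shows how the sigmoid turns the final value into a probability. It relies on two facts about the model above: the 'same'-padded 3x3 convolutions keep the spatial size, and each 2x2 max-pooling layer halves it, rounding down. The function names are hypothetical.

```python
import math

def feature_map_sizes(size, n_blocks):
    """Spatial size after each Conv2D('same') + MaxPooling2D(2, 2) block."""
    sizes = [size]
    for _ in range(n_blocks):
        size = size // 2  # 2x2 pooling with default 'valid' padding floors odd sizes
        sizes.append(size)
    return sizes

def sigmoid(z):
    """Map any real value to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

sizes = feature_map_sizes(150, 5)
flat_units = sizes[-1] * sizes[-1] * 64   # the last convolution has 64 filters
dense_params = flat_units * 512 + 512     # weights plus biases of Dense(512)

print(sizes)          # [150, 75, 37, 18, 9, 4]
print(flat_units)     # 1024
print(dense_params)   # 524800
print(sigmoid(0.0))   # 0.5
```

The single sigmoid output is read as the probability of class "1"; values above the chosen threshold are assigned to class "1" and the rest to class "0".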

8. Compiling the model

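model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    # Explicit names keep the history keys fixed as 'precision' and 'recall'
    metrics=['accuracy',
             tf.keras.metrics.Precision(name='precision'),
             tf.keras.metrics.Recall(name='recall')])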

Likewise, since there are only two classes, the loss is binary_crossentropy.
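As a rough illustration of what this loss measures, the sketch below computes binary cross-entropy by hand for a few predictions. It is a plain-Python approximation of the formula, not the Keras implementation; the clipping constant eps and the example probabilities are assumptions chosen for the demonstration.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0]
unsure = binary_crossentropy(labels, [0.5, 0.5])      # model cannot decide
confident = binary_crossentropy(labels, [0.9, 0.1])   # confident and correct
wrong = binary_crossentropy(labels, [0.1, 0.9])       # confident and wrong

print(unsure, confident, wrong)
```

A model that always outputs 0.5 scores ln(2), about 0.693, so a validation loss that stays near that value suggests the network has not learned anything useful from the images.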

# Cache the datasets and prefetch batches so input loading overlaps with training
AUTOTUNE = tf.data.AUTOTUNE

train_data = train_data.cache().prefetch(buffer_size=AUTOTUNE)
val_data = val_data.cache().prefetch(buffer_size=AUTOTUNE)
test_data = test_data.cache().prefetch(buffer_size=AUTOTUNE)

9. Fitting the model

model_result = model.fit(
    train_data,
    validation_data=val_data,
    epochs=epochs,
    verbose=1
)

acc = model_result.history['accuracy']
val_acc = model_result.history['val_accuracy']

train_precision = model_result.history['precision']
val_precision = model_result.history['val_precision']

train_recall = model_result.history['recall']
val_recall = model_result.history['val_recall']

loss = model_result.history['loss']
val_loss = model_result.history['val_loss']

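The model is trained on the train data, and the validation split is used to monitor it after every epoch. The test data is not used at this stage.

The per-epoch training history is also stored.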
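Each entry of the history is a list with one value per epoch, so it can be scanned to find the epoch where the validation metric was best. The following is a small sketch with made-up numbers in place of a real training run; best_epoch is a hypothetical helper, not a Keras function.

```python
def best_epoch(history, key='val_loss', mode='min'):
    """Return (1-based epoch, value) where history[key] is lowest or highest."""
    values = history[key]
    pick = min if mode == 'min' else max
    best_index = pick(range(len(values)), key=lambda i: values[i])
    return best_index + 1, values[best_index]

# Made-up values for illustration only; a real run would use model_result.history
history = {
    'val_loss': [0.70, 0.66, 0.64, 0.69, 0.75],
    'val_accuracy': [0.50, 0.55, 0.58, 0.56, 0.54],
}

print(best_epoch(history, 'val_loss', 'min'))       # (3, 0.64)
print(best_epoch(history, 'val_accuracy', 'max'))   # (3, 0.58)
```

If the best epoch comes well before the last one while the training loss keeps falling, the model is probably overfitting, and fewer epochs may work better.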

10. Training results and saving

os.makedirs('Results', exist_ok=True)
epochs_range = range(len(acc))

# (metric name, training values, validation values)
curves = [
    ('accuracy', acc, val_acc),
    ('precision', train_precision, val_precision),
    ('recall', train_recall, val_recall),
    ('loss', loss, val_loss),
]

# Draw and save one figure per metric
for name, train_values, val_values in curves:
    plt.figure()
    plt.plot(epochs_range, train_values, 'r', label='Training ' + name)
    plt.plot(epochs_range, val_values, 'b', label='Validation ' + name)
    plt.title('Training and validation ' + name)
    plt.legend()
    plt.tight_layout()
    plt.savefig('Results/Training and validation ' + name + '.png')
    #plt.show()
    plt.close()

model.save('Results/CNN.h5')

A Results folder is created to hold the output files.

The accuracy, precision, recall, and loss plots and the trained model file (CNN.h5) are saved there.

11. Evaluating the model on the test data

test_eval = model.evaluate(test_data, verbose=1)
printlog('loss: ' + str(test_eval[0]), 'Green')
printlog('accuracy: ' + str(test_eval[1]), 'Green')
printlog('Precision: ' + str(test_eval[2]), 'Green')
printlog('Recall: ' + str(test_eval[3]), 'Green')

This prints a brief evaluation of the model.

11-1. Evaluating the model on the test data: confusion matrix

# Decision threshold for binary classification
threshold = 0.5

# Collect scores and labels in one pass so they stay aligned batch by batch
score, test_classes = [], []
for images, labels in test_data:
    score.append(model.predict_on_batch(images))
    test_classes.append(labels.numpy())
score = np.concatenate(score).ravel()
test_classes = np.concatenate(test_classes)
Y_pred = (score > threshold).astype(int)

# Confusion matrix: rows are true labels, columns are predictions
cm = tf.math.confusion_matrix(test_classes, Y_pred, num_classes=2).numpy()
report = classification_report(test_classes, Y_pred)
tn, fp, fn, tp = (int(v) for v in cm.ravel())

def safe_div(num, den):
    # Return 0.0 for an empty denominator instead of altering the counts
    return num / den if den else 0.0

TPR = safe_div(tp, tp + fn)
FPR = safe_div(fp, fp + tn)
accuracy = round(safe_div(tp + tn, tp + fp + fn + tn), 3)
specificity = round(safe_div(tn, tn + fp), 3)
sensitivity = round(TPR, 3)
mcc = round(safe_div(
    tp * tn - fp * fn,
    math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
), 3)

# Append the results of this run to the text report
with open('Results/CNN_outputresult.txt', 'a') as f_output:
    f_output.write('============================\n')
    f_output.write('{}epochs_{}batch_cnn_{}_{}\n'.format(epochs, batch_size, img_height, img_width))
    f_output.write('TN: {}\n'.format(tn))
    f_output.write('FN: {}\n'.format(fn))
    f_output.write('TP: {}\n'.format(tp))
    f_output.write('FP: {}\n'.format(fp))
    f_output.write('TPR: {}\n'.format(TPR))
    f_output.write('FPR: {}\n'.format(FPR))
    f_output.write('accuracy: {}\n'.format(accuracy))
    f_output.write('specificity: {}\n'.format(specificity))
    f_output.write('sensitivity: {}\n'.format(sensitivity))
    f_output.write('mcc: {}\n'.format(mcc))
    f_output.write('{}'.format(report))
    f_output.write('============================\n')

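This builds the confusion matrix of the model on the test data.

Put simply, it is a matrix of four counts: positives predicted as positive, negatives predicted as negative, positives predicted as negative, and negatives predicted as positive.

As before, the results are saved in the Results folder.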
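The four counts and the metrics derived from them can be checked by hand on a tiny example. The sketch below uses made-up labels and predictions and plain Python only; confusion_counts and safe_div are hypothetical helpers, and safe_div returns 0.0 when a denominator is empty, which is one possible convention rather than the only one.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

# Made-up labels and predictions for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = safe_div(tp + tn, tp + tn + fp + fn)
sensitivity = safe_div(tp, tp + fn)   # true positive rate
specificity = safe_div(tn, tn + fp)   # true negative rate
mcc = safe_div(tp * tn - fp * fn,
               math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))

print(tn, fp, fn, tp)                    # 3 1 2 2
print(accuracy, sensitivity, specificity)  # 0.625 0.5 0.75
print(round(mcc, 3))                     # 0.258
```

An MCC near 0 means the predictions are barely better than chance even when the accuracy looks acceptable, which is worth checking for stock data where the two classes are often close to balanced.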

11-2. Evaluating the model on the test data: ROC

def plot_roc(pred, y, model):
    fpr, tpr, _ = roc_curve(y, pred)
    roc_auc = auc(fpr, tpr)
    plt.figure()
    plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC)')
    plt.legend(loc="lower right")
    plt.savefig('Results/'+str(model)+'_ROC AUC_' + str(img_height) + '_' + str(img_width) + '.png')
    #plt.show()

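# Use the continuous scores, not the thresholded labels, so the curve has more than one operating point
plot_roc(np.ravel(score), test_classes, 'CNN')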

This plots the ROC curve.

As before, it is saved in the Results folder.
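An ROC curve is traced by sweeping the decision threshold, so it is most informative when it is computed from the continuous sigmoid scores. The sketch below illustrates this with a made-up example in plain Python: it computes the AUC as the probability that a random positive sample scores higher than a random negative one (ties count as one half), once from the scores and once from labels already thresholded at 0.5. The function name auc_from_scores is hypothetical.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up labels and sigmoid outputs for illustration
y_true = [0, 0, 1, 1]
scores = [0.2, 0.4, 0.45, 0.9]
hard_labels = [1 if s > 0.5 else 0 for s in scores]

auc_scores = auc_from_scores(y_true, scores)
auc_hard = auc_from_scores(y_true, hard_labels)

print(auc_scores)   # 1.0
print(auc_hard)     # 0.75
```

Here the scores rank every positive above every negative, yet thresholding first hides that and lowers the AUC, because the curve then has only a single operating point between (0, 0) and (1, 1).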

The next post applies the trained CNN model to arbitrary real data.

[Python] Stock Price Prediction with a Deep-Learning CNN in Python (3)
https://sjblog1.tistory.com/67
