Loss Function

성지우 2022. 8. 25. 20:14

Classification

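Classification means predicting a class.

For example, given some text as input, the model predicts which class that text belongs to.

In other words, the set of targets (classes) to predict is fixed in advance.

The output is a discrete value.

The task is to pick one of several predefined candidate class labels.

There is no in-between value.

Types of classification
1. binary classification - there are two classes to predict
2. multi-class classification - there are three or more classes to predict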

Binary Classification

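Binary classification sorts outputs into two groups, such as T / F or front / back.

The output y is binary: 0 or 1.

→ The output layer must therefore use sigmoid as its activation function.

→ The sigmoid function returns a value between 0 and 1, which can be interpreted as a probability; thresholding that probability gives the binary output.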
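To make the role of sigmoid concrete, here is a minimal sketch in plain Python. The logit values and the 0.5 cut-off are assumptions chosen for illustration; they do not come from the original post.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw outputs (logits) of the last layer
logits = [-4.0, -0.5, 0.0, 0.5, 4.0]
probs = [sigmoid(z) for z in logits]

# Assumed cut-off of 0.5: probabilities at or above it become class 1
labels = [1 if p >= 0.5 else 0 for p in probs]

for z, p, y in zip(logits, probs, labels):
    print(f"z={z:+.1f}  sigmoid(z)={p:.4f}  label={y}")
```

Every probability stays strictly between 0 and 1, and only the thresholding step turns it into a binary label.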

$\left( \overrightarrow{x}^{\left( 1\right) }\right) ^{T}=\left( x_{1}^{\left( 1\right) }\ x_{2}^{\left( 1\right) }\ \ldots\ x_{l_{I}}^{\left( 1\right) }\right) \in \mathbb{R}^{1\times l_{I}},\quad y^{\left( 1\right) }\in \mathbb{B}$

For a minibatch input
$X^{T}\in \mathbb{R}^{N\times l_{I}},\quad Y\in \mathbb{B}^{N\times 1}$

dataset for binary classification

import tensorflow as tf
 
# 8 samples with 5 input features (x values) each
N, n_feature = 8, 5

t_weights = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)
t_bias = tf.constant([10], dtype=tf.float32)

# X is an 8 by 5 matrix
X = tf.random.normal(mean=0, stddev=1, shape=(N, n_feature))
# multiply each row by the weights, then sum along axis=1 (one value per row)
Y = tf.reduce_sum(t_weights*X, axis=1) + t_bias
print(Y)
# threshold at 5 to get boolean labels
Y = Y > 5
print(Y)
# tf.cast() casts a tensor to a new type
# binary cross entropy expects the y values as 0 / 1
Y = tf.cast(Y, tf.int32)
print(Y)
 
 
print("X(shape/dtype/data): {} / {}\n{}\n".format(X.shape, X.dtype, X.numpy()))
print("Y(shape/dtype/data): {} / {}\n{}\n".format(Y.shape, Y.dtype, Y.numpy()))

result
tf.Tensor( [17.701649 1.3944702 -6.108162 24.025766 21.450665 22.115883 16.152294 10.989155 ], shape=(8,), dtype=float32)
tf.Tensor([ True False False True True True True True], shape=(8,), dtype=bool)
tf.Tensor([1 0 0 1 1 1 1 1], shape=(8,), dtype=int32)
X(shape/dtype/data): (8, 5) / <dtype: 'float32'>
[[-0.19602071 3.5278618 -0.53532726 0.06394964 0.4384257 ]
[ 0.4994513 0.7593835 -0.23085591 -2.7708595 0.23045158]
[-2.3285213 -0.7243898 -0.3982318 0.61186767 -2.7167273 ]
[ 1.2796409 1.2185017 1.0416803 1.1382111 0.5262473 ]
[ 0.5170105 -0.5073481 0.8406941 0.11632203 1.792196 ]
[ 0.20692994 0.00453574 1.9546796 1.4234673 0.06839468]
[ 0.73877287 1.1789272 -0.43556634 2.7570875 -1.3331966 ]
[ 2.2626398 0.568935 0.36409232 -0.4555514 -0.33628532]]
Y(shape/dtype/data): (8,) / <dtype: 'int32'>
[1 0 0 1 1 1 1 1]

dataset for multi-class classification

import tensorflow as tf
import matplotlib.pyplot as plt
 
plt.style.use('seaborn')
N, n_feature = 8, 2
# number of classes to classify
n_class = 3

X = tf.zeros(shape=(0, n_feature))
Y = tf.zeros(shape=(0,1), dtype=tf.int32)

# plt.subplots() creates a figure and a set of subplots
# figsize = (width, height) in inches
fig, ax = plt.subplots(figsize=(5, 5))

for class_idx in range(n_class):
  # uniform = uniform distribution; draw a random center for this class
  center = tf.random.uniform(minval=-15, maxval=15, shape=(2, ))

  # normal = normal distribution
  # shape = N by 1 matrix
  x1 = center[0] + tf.random.normal(shape=(N, 1))
  x2 = center[1] + tf.random.normal(shape=(N, 1))

  # tf.concat() concatenates tensors along one dimension
  x = tf.concat((x1, x2), axis=1)
  # tf.ones() creates a tensor with all elements set to one
  y = class_idx*tf.ones(shape=(N, 1), dtype=tf.int32)

  # alpha = transparency
  ax.scatter(x[:, 0].numpy(), x[:, 1].numpy(), alpha = 0.3)
 
  X = tf.concat((X, x), axis = 0)
  Y = tf.concat((Y, y), axis = 0)
 
print("X(shape/dtype/data): {} / {}\n{}\n".format(X.shape, X.dtype, X.numpy()))
print("Y(shape/dtype/data): {} / {}\n{}\n".format(Y.shape, Y.dtype, Y.numpy()))

results
X(shape/dtype/data): (24, 2) / <dtype: 'float32'>
[[ -1.5621597 -1.9767454 ]
[ -0.6325841 -1.4243827 ]
[ -0.75733 -1.5698142 ]
[ -0.6095841 -3.2135358 ]
[ -1.6365058 -0.9978051 ]
[ -1.7076635 -2.727295 ]
[ -0.3138085 -1.4532664 ]
[ 0.12239981 -4.3673816 ]
[ -3.529436 -12.923902 ]
[ -3.3372843 -9.140044 ]
[ -3.1290126 -9.953018 ]
[ -3.70641 -8.324943 ]
[ -3.5355918 -10.14569 ]
[ -4.492819 -8.53738 ]
[ -3.4877377 -9.6594925 ]
[ -4.361145 -11.409944 ]
[ -6.056776 8.591808 ]
[ -6.1167493 6.2502174 ]
[ -4.878406 7.1599135 ]
[ -2.8514028 5.589812 ]
[ -6.0454645 6.8232994 ]
[ -6.2219915 5.519359 ]
[ -5.163896 6.400252 ]
[ -3.4757054 8.505857 ]]
Y(shape/dtype/data): (24, 1) / <dtype: 'int32'>
[[0] [0] [0] [0] [0] [0] [0] [0] [1] [1] [1] [1] [1] [1] [1] [1] [2] [2] [2] [2] [2] [2] [2] [2]]

Regression

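Regression is used for problems where the predicted value is a continuous number, that is, a float.

The output is continuous.

An example is predicting land prices from features such as the distance to a subway station or the number of school districts.

A price is quoted as a specific number, such as 10 million won or 12 million won per pyeong,

but in practice that number is a representative value chosen from infinitely many real numbers within some range.

Regression predicts where on that continuum the point should be placed.

It predicts a continuous value.

It predicts continuous, real-valued numbers: for example, predicting height from weight, or salary from education level, age, and so on.

Distinguishing classification from regression

Regression does not predict a probability. A regression output is continuous, and the problem is to decide where on that continuum to place the point.

Probability, by contrast, is about counting the number of ways an event can occur within a sample space under trials that may be consecutive, independent, or repeated.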

regression dataset

import tensorflow as tf
 
# 5 input features (x values) per sample
N, n_feature = 8, 5

# target weights and bias
# tf.constant() creates a constant tensor
t_weights = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)
t_bias = tf.constant([10], dtype=tf.float32)

# normal = normal distribution, stddev = standard deviation
# 8 by 5 matrix
X = tf.random.normal(mean=0, stddev=1, shape=(N, n_feature))

# axis=1 -> sum across each row
# y = 1*x1 + 2*x2 + ... + 5*x5 + 10
Y = tf.reduce_sum(t_weights*X, axis=1) + t_bias
 
print("X(shape/dtype/data): {} / {}\n{}\n".format(X.shape, X.dtype, X.numpy()))
print("Y(shape/dtype/data): {} / {}\n{}\n".format(Y.shape, Y.dtype, Y.numpy()))

results
X(shape/dtype/data): (8, 5) / <dtype: 'float32'>
[[-0.0189618 -1.3154941 -0.1602881 -2.2343993 0.76081836]
[ 0.40219802 0.74753827 0.80659896 1.4320812 -0.34528545]
[ 1.0887271 -0.21967593 -0.641356 -1.8579254 0.7775483 ]
[ 1.5513896 0.21550678 0.2684123 0.93970996 -1.0917537 ]
[ 1.5324857 -0.9955133 1.2049017 0.6821309 0.71609735]
[-0.39242515 -1.6992102 -0.46254358 -0.8380801 0.8570319 ]
[ 0.8124035 -1.8311317 -1.1388721 -1.1330283 -0.90182674]
[ 0.45129856 0.2747502 -0.5540982 1.2864829 0.11932994]]
Y(shape/dtype/data): (8,) / <dtype: 'float32'>
[ 1.7356796 18.31897 5.1813474 11.087711 19.465174 5.7543626 -5.307724 15.081085 ]

one hot vector

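one hot encoding
A way to represent data simply and without overlap
For a computer it is convenient to represent everything as numbers → apple = 1, strawberry = 2
An encoding that identifies each item with many 0s and a single 1
A way to represent a discrete value as a vector

Procedure
1. Assign a unique index to each word
2. Put 1 at the index position of the word to be represented and 0 at every other position

The vector size grows rapidly as the amount of data increases
Word attributes are not reflected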
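The two-step procedure above can be sketched in plain Python. The three-word vocabulary and the helper name `one_hot` are hypothetical, chosen only for illustration.

```python
def one_hot(index, depth):
    """Return a list of `depth` zeros with a single 1 at position `index`."""
    vec = [0] * depth
    vec[index] = 1
    return vec

# Hypothetical vocabulary
words = ["apple", "strawberry", "banana"]

# Step 1: assign a unique index to each word
word_to_index = {word: idx for idx, word in enumerate(words)}

# Step 2: put 1 at the word's index and 0 everywhere else
encoded = {word: one_hot(idx, len(words)) for word, idx in word_to_index.items()}

for word, vec in encoded.items():
    print(word, vec)
```

Each vector is as long as the vocabulary, which is why the size grows quickly as more distinct items are added.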

dataset for multi-class classification with one-hot encoding

# build Y as a vector instead of a matrix, then apply one_hot()
import tensorflow as tf
import matplotlib.pyplot as plt
 
plt.style.use('seaborn')
N, n_feature = 8, 2
n_class = 3
 
X = tf.zeros(shape=(0, n_feature))
Y = tf.zeros(shape=(0,), dtype=tf.int32)
 
fig, ax = plt.subplots(figsize=(5, 5))
for class_idx in range(n_class):
  center = tf.random.uniform(minval=-15, maxval=15, shape=(2, ))

  x1 = center[0] + tf.random.normal(shape=(N, 1))
  x2 = center[1] + tf.random.normal(shape=(N, 1))

  # tf.concat() concatenates tensors along one dimension
  x = tf.concat((x1, x2), axis=1)
  # tf.ones() creates a tensor with all elements set to one
  y = class_idx*tf.ones(shape=(N, ), dtype=tf.int32)

  # alpha = transparency
  ax.scatter(x[:, 0].numpy(), x[:, 1].numpy(), alpha = 0.3)

  X = tf.concat((X, x), axis = 0)
  Y = tf.concat((Y, y), axis = 0)

# tf.one_hot() returns a one-hot tensor, depth = number of classes
Y = tf.one_hot(Y, depth=n_class, dtype=tf.int32)
 
print("X(shape/dtype/data): {} / {}\n{}\n".format(X.shape, X.dtype, X.numpy()))
print("Y(shape/dtype/data): {} / {}\n{}\n".format(Y.shape, Y.dtype, Y.numpy()))

results
X(shape/dtype/data): (24, 2) / <dtype: 'float32'>
[[ -4.3062663 13.379668 ]
[ -4.9414043 13.61981 ]
[ -4.875393 13.83252 ]
[ -5.4358377 13.022475 ]
[ -3.810069 13.920698]
[ -7.136821 12.596723 ]
[ -5.49953 13.314609 ]
[ -5.875171 13.166781 ]
[ -6.65676 -8.477809 ]
[ -8.260426 -8.53915 ]
[ -8.409581 -11.377134 ]
[ -9.125273 -9.4650955]
[ -6.875134 -9.213383 ]
[ -8.152935 -7.4656453]
[ -7.5853257 -8.724476 ]
[ -8.689825 -6.4988613]
[ 11.109833 13.5281105]
[ 12.069455 14.3689785]
[ 10.176313 14.365111 ]
[ 10.68454 17.21193 ]
[ 10.643147 14.787056 ]
[ 8.826957 15.8386 ]
[ 9.311601 14.334457 ]
[ 11.039435 14.324162 ]]
Y(shape/dtype/data): (24, 3) / <dtype: 'int32'>
[[1 0 0]
[1 0 0]
[1 0 0]
[1 0 0]
[1 0 0]
[1 0 0]
[1 0 0]
[1 0 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 1 0]
[0 0 1]
[0 0 1]
[0 0 1]
[0 0 1]
[0 0 1]
[0 0 1]
[0 0 1]
[0 0 1]]

Data Objects

# split the full dataset into minibatches and use it piece by piece
import tensorflow as tf
 
N, n_feature = 100, 5
batch_size = 32
 
t_weights = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)
t_bias = tf.constant([10], dtype=tf.float32)
 
X = tf.random.normal(mean=0, stddev=1, shape=(N, n_feature))
Y = tf.reduce_sum(t_weights*X, axis =1) + t_bias
 
# # 100 // 32 = 3 full minibatches (the last 4 samples are skipped)
# for batch_idx in range(N // batch_size):
#   # take minibatch x out of X
#   x = X[batch_idx * batch_size : (batch_idx+1)*batch_size, ...]
#   y = Y[batch_idx * batch_size : (batch_idx+1)*batch_size, ...]

#   print(x.shape, y.shape)

# the simplest way to build a dataset in TensorFlow:
# create the data directly, then wrap it in a data object; suitable when the dataset is small
# tf.data.Dataset represents a potentially large set of elements
dataset = tf.data.Dataset.from_tensor_slices((X,Y))
dataset = dataset.batch(batch_size)
 
for x,y in dataset:
  print(x.shape, y.shape)

result
(32, 5) (32,)
(32, 5) (32,)
(32, 5) (32,)
(4, 5) (4,)
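The batch shapes printed above follow from simple slicing arithmetic. Here is a minimal sketch in plain Python; the helper name `batch_sizes` is hypothetical and is not part of TensorFlow. It keeps the final partial batch, as the result above shows `dataset.batch(batch_size)` doing, whereas the commented-out manual loop with `N // batch_size` would stop after the three full batches.

```python
def batch_sizes(n_samples, batch_size):
    """Return the size of each minibatch when the last partial batch is kept."""
    sizes = []
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        sizes.append(end - start)
    return sizes

sizes = batch_sizes(100, 32)
print(sizes)       # sizes of all minibatches, including the partial one
print(100 // 32)   # number of full minibatches only
```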

loss function

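A loss function measures the error between the model's output and the output the user wants.

It takes the ground truth and the prediction as input and returns a real number; the larger the value, the worse the model performs.

The goal of deep learning training is to find the weights and biases that minimize the value of the loss function.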

Mean Squared Error (MSE)

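The most commonly used loss function for regression

The formula is simple

Because the difference is squared, the loss grows quickly as the difference gets larger, so large errors stand out

$\widehat{y}$: the value predicted by the output layer
$y$: the actual value from the dataset

For a single data sample
$J=\left( y-\widehat{y}\right) ^{2}$

For a minibatch input
$J=\dfrac{1}{N}\sum ^{N}_{i=1}\left( y^{\left( i\right) }-\widehat{y}^{\left( i\right) }\right) ^{2}$

The smaller the magnitude of $y-\widehat{y}$, the better the model has been trained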
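As a worked instance of the minibatch formula, here is a small sketch in plain Python. The four label and prediction values are made up for illustration.

```python
# Hypothetical labels (y) and predictions (y hat) for a minibatch of N = 4
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Squared difference for each sample: (y - y_hat)^2
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

# Mean over the minibatch: J = (1/N) * sum of squared errors
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)
print(mse)
```

The squared errors are 0.25, 0.25, 0.0 and 1.0, so the sample that misses by 1 contributes four times as much as a sample that misses by 0.5.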

MSE Calculation

import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError
 
loss_object = MeanSquaredError()
 
batch_size = 32
# predictions (y hat) = 32 by 1 matrix
predictions = tf.random.normal(shape=(batch_size, 1))
 
# labels (y) taken from the dataset = 32 by 1 matrix
labels = tf.random.normal(shape=(batch_size,1))
 
mse = loss_object(labels, predictions)
mse_manual = tf.reduce_mean(tf.math.pow(labels - predictions,2 ))
print("MSE(Tensorflow): ", mse.numpy())
print("MSE(Manual): ", mse_manual.numpy())
 

results
MSE(Tensorflow): 1.1205966 
MSE(Manual): 1.1205966 

MSE with Model/Dataset

import tensorflow as tf
 
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import MeanSquaredError
 
N, n_feature = 100, 5
batch_size = 32  # not defined in the original block; 32 matches the 4 batches shown in the result

X = tf.random.normal(shape=(N, n_feature))
Y = tf.random.normal(shape=(N, 1))

# create the dataset
dataset = tf.data.Dataset.from_tensor_slices((X,Y))
dataset = dataset.batch(batch_size)

model = Dense(units=1, activation = 'linear')
loss_object = MeanSquaredError()

# for each batch, pass it through the model and compare the output with the actual value
for x, y in dataset:
  predictions = model(x)  # y hat
  loss = loss_object(y, predictions)
  print(loss.numpy())

result
3.991858 
4.9235067 
2.6310182 
3.5861173 

Binary Cross Entropy

The loss function used for binary classification

Activation function of the output layer = sigmoid

→ because the output has to be a probability (a value between 0 and 1)

$J=H_{b}\left( y,\widehat{y}\right) = -\left[ y\log\left( \widehat{y}\right) +\left( 1-y\right) \log\left( 1-\widehat{y}\right) \right] $

For a minibatch input
$J=H_{b}\left( Y,\widehat{Y}\right) =-\dfrac{1}{N}\sum ^{N}_{i=1}\left[ y^{\left( i\right) }\log\left( \widehat{y}^{\left( i\right) }\right) +\left( 1-y^{\left( i\right) }\right) \log\left( 1-\widehat{y}^{\left( i\right) }\right)\right] $
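To see how the minibatch formula behaves, here is a small sketch in plain Python. The helper name `binary_cross_entropy` and all label and prediction values are assumptions made for illustration.

```python
import math

def binary_cross_entropy(labels, preds):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the minibatch."""
    total = 0.0
    for y, p in zip(labels, preds):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical minibatch: labels are 0/1, predictions are sigmoid outputs
labels = [1, 0, 1, 0]
preds = [0.9, 0.1, 0.6, 0.4]
bce = binary_cross_entropy(labels, preds)

# One confident but wrong prediction (label 1, predicted 0.01)
bce_wrong = binary_cross_entropy([1], [0.01])

print(bce)
print(bce_wrong)
```

For each sample only one of the two terms is non-zero: the `y*log(p)` term when the label is 1 and the `(1-y)*log(1-p)` term when the label is 0. A confident wrong prediction is penalized far more heavily than a hesitant one.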

BCE Calculation

import tensorflow as tf
 
from tensorflow.keras.losses import BinaryCrossentropy
 
batch_size = 4
#binary
n_class = 2 
 
predictions = tf.random.uniform(shape=(batch_size,1), minval = 0, maxval = 1, dtype = tf.float32)
labels = tf.random.uniform(shape=(batch_size, 1), minval = 0, maxval = n_class, dtype = tf.int32)
 
loss_object = BinaryCrossentropy()
loss = loss_object(labels, predictions)
 
labels = tf.cast(labels, tf.float32)
bce_man = -(labels*tf.math.log(predictions) + (1 - labels)*tf.math.log(1-predictions))
bce_man = tf.reduce_mean(bce_man)
 
print("BCE(Tensorflow): ", loss.numpy())
print("BCE(Manual): ", bce_man.numpy())

results
BCE(Tensorflow): 0.5767325
BCE(Manual): 0.57673275

BCE with Model/Dataset

import tensorflow as tf
 
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy
 
N, n_feature = 100, 5
batch_size = 4  # not defined in the original block; 4 matches the 25 batches shown in the results
t_weights = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)
t_bias = tf.constant([10], dtype = tf.float32)
 
X = tf.random.normal(mean=0, stddev=1, shape=(N, n_feature))
Y = tf.reduce_sum(t_weights*X, axis=1) + t_bias
Y = tf.cast(Y > 5, tf.int32)
 
dataset = tf.data.Dataset.from_tensor_slices((X,Y))
dataset = dataset.batch(batch_size)
 
model = Dense(units=1, activation="sigmoid")
loss_object = BinaryCrossentropy()
 
for x, y in dataset:
  predictions = model(x)
  loss = loss_object(y, predictions)
  print(loss.numpy())

results
1.0593712 1.0552192 1.0264971 0.8616506 1.3043821 0.7344996 1.9628189 1.0690482 0.92835575 1.1381103 0.8492911 0.73413676 0.8294357 1.1688179 0.7638643 0.9190137 0.7453877 1.7516015 1.3086495 0.72122455 1.2472175 0.6259262 1.0984674 1.4808633 1.318607

Sparse Categorical Cross Entropy

Difference from CCE: SCCE does not use one-hot encoding; the labels being compared are of int type

The loss function to use when the label (= y) is an integer
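The relationship between the two label formats can be sketched in plain Python. The probability rows and labels below are invented for illustration. With integer labels the loss just picks the predicted probability at the label's index; with one-hot labels the same value is reached by multiplying and summing.

```python
import math

# Hypothetical softmax outputs for 3 samples and 3 classes (each row sums to 1)
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
]
int_labels = [0, 1, 2]  # integer labels, as used by SCCE

# Sparse form: index the probability of the true class directly
scce = sum(-math.log(p[y]) for p, y in zip(probs, int_labels)) / len(probs)

# One-hot form: same labels expanded to vectors, as used by CCE
one_hot_labels = [[1 if c == y else 0 for c in range(3)] for y in int_labels]
cce = sum(
    -sum(t * math.log(q) for t, q in zip(row_t, row_p))
    for row_t, row_p in zip(one_hot_labels, probs)
) / len(probs)

print(scce)
print(cce)
```

Both forms give the same number, so the choice between them depends only on how the labels are stored.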

SCCE Calculation

import tensorflow as tf
 
from tensorflow.keras.losses import SparseCategoricalCrossentropy
 
# batch_size = number of rows, n_class = number of columns
# the prediction matrix is 16 by 5
batch_size, n_class = 16,5

# predictions = y_hat, 16 by 5 matrix, values between 0 and 1
# number of neurons in the last layer = number of classes
predictions = tf.random.uniform(shape=(batch_size, n_class), minval=0, maxval=1, dtype=tf.float32)

# sum each row, then reshape the sums into a column vector
pred_sum = tf.reshape(tf.reduce_sum(predictions, axis = 1), (-1,1))
predictions = predictions / pred_sum
# labels = y
labels = tf.random.uniform(shape=(batch_size, ),minval = 0, maxval = n_class, dtype=tf.int32)
loss_object = SparseCategoricalCrossentropy()
loss = loss_object(labels, predictions)
print(loss.numpy())
 
ce = 0
for label, prediction in zip(labels, predictions):
  ce += -tf.math.log(prediction[label])
ce /= batch_size
print(ce.numpy())
 
# loss = ce

results
1.8806874 (Tensorflow)
1.8806875 (Manual)

SCCE with Model/Dataset

import tensorflow as tf
 
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy
 
N, n_feature = 100, 2
n_class = 5
batch_size = 16  # not defined in the original block; 16 matches the 32 batches shown in the results

X = tf.zeros(shape=(0, n_feature))
Y = tf.zeros(shape=(0,1), dtype=tf.int32)

for class_idx in range(n_class):
  center = tf.random.uniform(minval=-15, maxval=15, shape=(2, ))

  x1 = center[0] + tf.random.normal(shape=(N, 1))
  x2 = center[1] + tf.random.normal(shape=(N, 1))

  x = tf.concat((x1, x2), axis=1)
  y = class_idx*tf.ones(shape=(N, 1), dtype=tf.int32)

  X = tf.concat((X, x), axis = 0)
  Y = tf.concat((Y, y), axis = 0)

# the dataset is built from tensors, so (X, Y) is passed as a tuple
# from_tensor_slices slices along the first axis, so each element is one (x, y) pair;
# batch() then groups batch_size pairs into each batch

dataset = tf.data.Dataset.from_tensor_slices((X,Y))
print(dataset)
dataset = dataset.batch(batch_size)

model = Dense(units=n_class, activation='softmax')
loss_object = SparseCategoricalCrossentropy()

# 500 samples with batch size 16 -> 32 losses are printed (the last batch holds 4 samples)
for x,y in dataset:
  predictions = model(x)
  loss = loss_object(y, predictions)
  print(loss.numpy())

results
<TensorSliceDataset element_spec=(TensorSpec(shape=(2,), dtype=tf.float32, name=None), TensorSpec(shape=(1,), dtype=tf.int32, name=None))>
7.0570326 7.0201874 6.9119596 6.921844 6.9838066 6.9927797 10.15341 11.246576 11.018984 11.169491 10.913704 11.154956 5.610979 0.09014476 0.07343918 0.075236864 0.06897332 0.080975465 3.604432 13.280009 13.460102 13.393946 13.663402 13.775893 13.933668 11.471234 11.71542 11.487019 12.066933 11.396588 12.009353 11.740828

Categorical Cross Entropy

The loss function used for multi-class classification

The labels being compared are already one-hot encoded

Use it when the label (= y) is a one-hot vector
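As a sketch of the formula behind this loss, written in the same notation as the MSE and BCE formulas: the symbol $K$ for the number of classes and the class index $c$ are notation assumed here, not taken from the original post. For a minibatch of $N$ samples with one-hot labels:

```latex
J = H\left( Y,\widehat{Y}\right) = -\dfrac{1}{N}\sum ^{N}_{i=1}\sum ^{K}_{c=1} y_{c}^{\left( i\right) }\log\left( \widehat{y}_{c}^{\left( i\right) }\right)
```

Because each one-hot label contains a single 1, only the log-probability of the true class survives the inner sum. This corresponds to the manual computation `tf.reduce_mean(tf.reduce_sum(-labels*tf.math.log(predictions), axis = 1))` in the CCE Calculation code.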

CCE Calculation

import tensorflow as tf
 
from tensorflow.keras.losses import CategoricalCrossentropy
 
# the prediction matrix is 16 by 5
batch_size, n_class = 16, 5

# 0 <= y_hat = predictions <= 1
# number of neurons in the last layer = number of classes
predictions = tf.random.uniform(shape=(batch_size, n_class), minval=0, maxval=1, dtype=tf.float32)

# sum each row, then reshape the sums into a column vector
# pred_sum = 16 by 1 matrix
pred_sum = tf.reshape(tf.reduce_sum(predictions, axis = 1), (-1,1))
predictions = predictions / pred_sum
print(predictions.shape)
# labels = y
labels = tf.random.uniform(shape=(batch_size, ),minval = 0, maxval = n_class, dtype=tf.int32)
# one_hot(indices, depth) -> batch_size by n_class matrix
labels = tf.one_hot(labels, n_class)
 
loss_object = CategoricalCrossentropy()
loss = loss_object(labels, predictions)
 
print("CCE(Tensorflow): ",loss.numpy())

cce_man = tf.reduce_mean(tf.reduce_sum(-labels*tf.math.log(predictions), axis = 1))
print("CCE(Manual): ",cce_man.numpy())

results
(16, 5)
CCE(Tensorflow): 1.4721774
CCE(Manual): 1.4721774

CCE with Model/Dataset

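import tensorflow as tf

from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import CategoricalCrossentropy

N, n_feature = 8, 2
n_class = 5
batch_size = 16  # not defined in the original block; 16 matches the 3 batches shown in the results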
 
X = tf.zeros(shape=(0, n_feature))
Y = tf.zeros(shape=(0,), dtype=tf.int32)
 
for class_idx in range(n_class):
  center = tf.random.uniform(minval=-15, maxval=15, shape=(2, ))
 
  x1 = center[0] + tf.random.normal(shape=(N, 1))
  x2 = center[1] + tf.random.normal(shape=(N, 1))
 
  # tf.concat() concatenates tensors along one dimension
  x = tf.concat((x1, x2), axis=1)
  # tf.ones() creates a tensor with all elements set to one
  y = class_idx*tf.ones(shape=(N, ), dtype=tf.int32)

  X = tf.concat((X, x), axis = 0)
  Y = tf.concat((Y, y), axis = 0)

# tf.one_hot() returns a one-hot tensor, depth = number of classes
Y = tf.one_hot(Y, depth=n_class, dtype=tf.int32)

# from_tensor_slices slices along the first axis, so each element is one (x, y) pair;
# batch() then groups batch_size pairs into each batch
dataset = tf.data.Dataset.from_tensor_slices((X,Y))
dataset = dataset.batch(batch_size)
 
model = Dense(units=n_class, activation='softmax')
loss_object = CategoricalCrossentropy()
 
for x,y in dataset:
  predictions = model(x)
  loss = loss_object(y, predictions)
  print(loss.numpy())

results
8.960773
4.633885
0.11723381

References and sources: https://nexablue.tistory.com/29 , https://knight76.tistory.com/entry/%ED%9A%8C%EA%B7%80regression%EC%99%80-%EB%B6%84%EB%A5%98classification-%EA%B0%9C%EB%85%90 , https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=qbxlvnf11&logNo=221528102803 , https://needjarvis.tistory.com/565 , https://velog.io/@rcchun/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EC%86%90%EC%8B%A4%ED%95%A8%EC%88%98%EC%9D%98-%EC%A2%85%EB%A5%98 , https://guru.tistory.com/67