ML | Rain Hu's Workspace

[ML] Choosing a loss function / optimizer / metrics

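Building the model
Dense(units, activation): units is the output size. Keras works out the input size automatically.
Activation functions:
- relu: Rectified Linear Unit (ReLU), max(0, x)
- softmax: exponentiates every element of the array (e^x) and then normalizes by the sum. This amplifies the element with the largest value and maps every value into the range 0 to 1 with a total of 1, so the outputs can be read like probabilities.

    from tensorflow import keras
    from tensorflow.keras.layers import Dense

    model = keras.Sequential([
        Dense(32, activation="relu"),
        Dense(64, activation="relu"),
        Dense(32, activation="relu"),
        Dense(10, activation="softmax"),
    ])

Compiling
Loss function (objective function): CategoricalCrossentropy, SparseCategoricalCrossentropy, BinaryCrossentropy, MeanSquaredError, KLDivergence, CosineSimilarity, …
Optimizer: SGD (optionally with momentum), RMSprop, Adam, Adagrad, …
Metrics: CategoricalAccuracy, SparseCategoricalAccuracy, BinaryAccuracy, AUC, Precision, Recall, …
Both forms below work. Passing objects instead of strings lets you customize the settings, for example the learning rate.

    # string form: default settings
    model.compile(optimizer="rmsprop", loss="mean_squared_error", metrics=["accuracy"])

    # object form: customizable settings
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
                  loss=keras.losses.MeanSquaredError(),
                  metrics=[keras.metrics.BinaryAccuracy()])

Shuffling
Once the data is collected, the goal is not only a model that does well on the training data. We want one that does well in most situations. So we split the collected data into a training set and a validation set. The code below shuffles the data with np.random.permutation() and then splits it with slicing, holding out 30% for validation.

    import numpy as np

    # data, labels: NumPy arrays of samples and their labels
    indices_permutation = np.random.permutation(len(data))
    shuffled_inputs = data[indices_permutation]
    shuffled_targets = labels[indices_permutation]   # same permutation keeps samples and labels aligned

    num_validation_samples = int(0.3 * len(data))
    val_inputs = shuffled_inputs[:num_validation_samples]
    val_targets = shuffled_targets[:num_validation_samples]
    training_inputs = shuffled_inputs[num_validation_samples:]
    training_targets = shuffled_targets[num_validation_samples:]

    model.fit(
        training_inputs,
        training_targets,
        epochs=5,
        batch_size=16,
        validation_data=(val_inputs, val_targets),
    )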
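The softmax description and the choice between CategoricalCrossentropy and SparseCategoricalCrossentropy can be made concrete with a small NumPy sketch. It is not Keras's implementation. The helper names below are hypothetical and only mirror the math. It shows that softmax preserves the ranking of the inputs and produces rows that sum to 1. It also shows that the two crossentropy losses compute the same value and differ only in label format: one-hot vectors versus integer class indices.

```python
import numpy as np

def softmax(logits):
    """Exponentiate each element, then normalize so each row sums to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probs):
    """Targets as one-hot vectors (the format CategoricalCrossentropy expects)."""
    return -(one_hot_targets * np.log(probs)).sum(axis=-1).mean()

def sparse_categorical_crossentropy(int_targets, probs):
    """Targets as integer class indices (the format SparseCategoricalCrossentropy expects)."""
    picked = probs[np.arange(len(int_targets)), int_targets]  # probability of the true class per row
    return -np.log(picked).mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
probs = softmax(logits)
labels = np.array([0, 1])        # integer class indices
one_hot = np.eye(3)[labels]      # the same labels, one-hot encoded

print(probs)
print(probs.sum(axis=1))         # each row sums to 1
print(categorical_crossentropy(one_hot, probs))
print(sparse_categorical_crossentropy(labels, probs))  # same value as the previous line
```

In practice, pick the Sparse variant when the labels are stored as integers and the plain variant when they are already one-hot encoded.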

[ML] General guide on ML

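Loss on training data
- Large:
  - Model bias -> make the model more flexible (e.g. add features)
  - Optimization -> change the optimization method
- Small: check the loss on testing data
  - Large:
    - Overfitting -> (1) more training data, data augmentation (2) make the model simpler
    - Mismatch (the training and testing data come from different distributions)
  - Small: done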

[ML] Start TensorFlow Environment with Conda

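Setting up the environment
1. Install Anaconda.
2. Create a virtual environment. Including python gives the environment its own Python and pip. Without it the environment is empty and pip would come from another installation. Pick a Python version that your TensorFlow release supports; 3.11 matches the output shown below.
    conda create -n tensorflow python=3.11
3. Enter the virtual environment (macOS/Linux). On conda 4.4 and later, conda activate tensorflow is the recommended form.
    source activate tensorflow
4. Install TensorFlow inside the environment:
    pip install tensorflow
5. Install Jupyter Notebook inside the environment:
    pip install jupyter notebook
6. Install pandas inside the environment:
    pip install pandas
7. Start Jupyter Notebook:
    jupyter notebook

For terminal users
1. Open Anaconda Navigator and install the required packages (e.g. tensorflow, keras) under Environments.
2. In a terminal, run conda activate {environment name}:
    conda activate tensorflow
3. Start Python:
    python
   If everything worked, Python prints its version information:
    Python 3.11.5 (main, Sep 11 2023, 08:17:37) [Clang 14.0.6 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.

All set. Next, try training on your first dataset ...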
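A short check script can confirm that the activated environment is the one you expect. It is a sketch and not part of the original steps. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` and the package list are illustrative choices. It reports which Python is running and which of the packages installed above are present, without importing them.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_versions(dist_names):
    """Return {distribution name: version string, or None if it is not installed}."""
    result = {}
    for name in dist_names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

print(sys.version)     # Python version of the running interpreter
print(sys.executable)  # should point inside the conda environment, e.g. .../envs/tensorflow/...
for name, ver in installed_versions(["tensorflow", "pandas", "notebook"]).items():
    print(f"{name}: {ver if ver else 'MISSING'}")
```

Suppose `sys.executable` points outside the environment, or a package shows as MISSING even though pip reported success. Then pip most likely installed into a different Python.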

[ML] 01. Introduction to Basic Machine Learning Concepts

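Introduction
What is machine learning? Machine learning means using machines to help find a function.
- Input can be a vector, a matrix, or a sequence.
- Output can be Regression, Classification, or Structured Learning (making the machine produce something with structure, e.g. text or an image).
What is deep learning? Deep learning uses a neural network to produce the function.

How does a machine learn?
1. Basic principle (three training steps)
Step 1: Choose a suitable model \(y=f(\text{data})\), a function with unknown parameters.
Model: \(\boxed{y=b+wx_1}\), where \(w\): weight, \(b\): bias, \(x_1\): feature.
Step 2: Define the loss function from the training data.
The loss is computed from the model parameters \(w,b\), so it is written \(L(w,b)\). Interpretation: a larger loss means worse parameters, and a smaller loss means better parameters. It measures the gap between the estimated value \(y\) and the actual value (the label \(\hat{y}\)).
Loss function: \(\boxed{L=\frac{1}{N}\sum_n e_n}\)
- MAE (mean absolute error): \(e=|y-\hat{y}|\)
- MSE (mean square error): \(e=(y-\hat{y})^2\)
- Cross-entropy: measures the difference between probability distributions
Error surface: the contour plot of the loss computed for different parameter values.
Step 3: Optimization: find the parameters \((w,b)\) with the smallest loss.
Method: gradient descent, repeated from each new point:
\(\boxed{w^1 = w^0 - \eta\frac{\partial L}{\partial w}\Big|_{w=w^0,b=b^0}}\)
\(\boxed{b^1 = b^0 - \eta\frac{\partial L}{\partial b}\Big|_{w=w^0,b=b^0}}\)
\(\eta\): the learning rate, which determines how large each gradient descent step is.

2. Linear Model
\(\boxed{y=b+\sum_{j=1}^{n}w_jx_j}\)
Use the view count of the previous day \(x_1\) and also the counts from two to seven days earlier, \(x_2, x_3, \ldots, x_7\). With more informative features, the prediction accuracy can be expected to improve.

3. Piecewise Linear Curves (Sigmoid)
Sigmoid function: \(\boxed{y=c\,\frac{1}{1+e^{-(b+wx_1)}}=c\,\text{sigmoid}(b+wx_1)}\)
Replace each \(w_ix_i\) term with \(c_i\,\text{sigmoid}(b_i+w_ix_i)\):
- With a single feature: \(\boxed{y=b+\sum_i c_i\,\text{sigmoid}(b_i+w_ix_1)}\)
- With more than one feature: \(\boxed{y=b+\sum_i c_i\,\text{sigmoid}\big(b_i+\sum_j w_{ij}x_j\big)}\)
Meaning: a piecewise linear curve can be written as a constant plus a sum of several hard sigmoids (ramp-shaped segments), and each hard sigmoid can be approximated by a sigmoid. The number of sigmoids equals the number of neurons (nodes) in one layer of the neural network. How many to use is a hyperparameter.
The formula can be rewritten as a matrix computation followed by an activation function. In linear algebra notation:
\(\boxed{y=b+\boldsymbol{c}^T\sigma(\boldsymbol{b}+W\boldsymbol{x})}\)
Here \(\boldsymbol{b}\) is the vector of the \(b_i\), \(W\) is the matrix of the \(w_{ij}\), \(\boldsymbol{c}\) is the vector of the \(c_i\), and \(\sigma\) is applied element-wise.
Collect all parameters \(b\), \(\boldsymbol{b}\), \(W\), \(\boldsymbol{c}\) into one vector \(\theta\), so the loss can be written as \(L(\theta)\). Then repeat gradient descent to update the parameters.
Gradient: \(\boldsymbol{g}=\begin{bmatrix}\frac{\partial L}{\partial \theta_1}\big|_{\theta=\theta^0}\\\frac{\partial L}{\partial \theta_2}\big|_{\theta=\theta^0}\\\vdots\end{bmatrix}=\nabla L(\theta^0)\)
Update: \(\begin{bmatrix}\theta_1^1\\\theta_2^1\\\vdots\end{bmatrix}\leftarrow\begin{bmatrix}\theta_1^0\\\theta_2^0\\\vdots\end{bmatrix}-\begin{bmatrix}\eta\frac{\partial L}{\partial \theta_1}\big|_{\theta=\theta^0}\\\eta\frac{\partial L}{\partial \theta_2}\big|_{\theta=\theta^0}\\\vdots\end{bmatrix}\), or in short \(\theta^1\leftarrow\theta^0-\eta\boldsymbol{g}\)
Batch training: split the samples into batches, compute the loss on one batch at a time, and update the parameters once per batch. One pass through all batches is called an epoch.

4. ReLU
A hard sigmoid can also be represented exactly: each hard sigmoid is the sum of two Rectified Linear Units (ReLU), and each ReLU is written as \(\boxed{c\,\max(0,b+wx_1)}\).
So the model can be written as \(\boxed{y=b+\sum_{2i}c_i\max\big(0,b_i+\sum_j w_{ij}x_j\big)}\), where the sum runs over twice as many terms as the sigmoid version (two ReLUs per hard sigmoid).
The function chosen to do this approximation (sigmoid, ReLU, …) is called the activation function.

Deep learning
Neural network layer: \(\boxed{y=b+\boldsymbol{c}^T\sigma(\boldsymbol{b}+W\boldsymbol{x})}\)
Multiple hidden layers -> deep learning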
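The NumPy sketch below makes the matrix form concrete. It evaluates one layer both in matrix form and as the explicit double sum over \(i\) and \(j\), with sigmoid or ReLU as the activation. It also builds a hard sigmoid from two ReLUs. The sizes (7 features, 3 neurons) and the random parameter values are made up for illustration, and `forward`, `forward_loop` and `hard_sigmoid` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, W, b_vec, c, b, activation=sigmoid):
    """Matrix form: y = b + c^T * activation(b_vec + W x)."""
    return b + c @ activation(b_vec + W @ x)

def forward_loop(x, W, b_vec, c, b, activation=sigmoid):
    """Same model as the explicit sum: y = b + sum_i c_i * activation(b_i + sum_j w_ij x_j)."""
    y = b
    for i in range(len(c)):
        z = b_vec[i] + sum(W[i, j] * x[j] for j in range(len(x)))
        y += c[i] * activation(z)
    return y

def hard_sigmoid(x, x0, x1, height):
    """0 for x <= x0, linear ramp between x0 and x1, flat at `height` for x >= x1.
    Built from two ReLUs of the form c * max(0, b + w x) with w = 1."""
    slope = height / (x1 - x0)
    return slope * relu(x - x0) - slope * relu(x - x1)

rng = np.random.default_rng(0)
n_features, n_hidden = 7, 3      # e.g. views from the previous 7 days, 3 neurons
x = rng.normal(size=n_features)
W = rng.normal(size=(n_hidden, n_features))
b_vec = rng.normal(size=n_hidden)
c = rng.normal(size=n_hidden)
b = 0.5

print(forward(x, W, b_vec, c, b), forward_loop(x, W, b_vec, c, b))  # two equal values
print(forward(x, W, b_vec, c, b, relu), forward_loop(x, W, b_vec, c, b, relu))
print(hard_sigmoid(np.array([0.0, 1.0, 1.5, 2.0, 3.0]), x0=1.0, x1=2.0, height=4.0))  # -> [0. 0. 2. 4. 4.]
```

The two versions agree. The matrix form is preferred because it hands the work to vectorized linear algebra instead of Python loops.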

[ML] Simple Implementation Exercises

Linear regression modeling
Loading the data

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib as mpl

    url = "sample.csv"
    data = pd.read_csv(url)
    x = data["x-axis"]
    y = data["y-axis"]

Plotting

    def plot(x, y, w, b):
        line = w * x + b
        plt.plot(x, line, color="red", label="prediction")
        plt.scatter(x, y, color="blue", label="data", marker="x")
        plt.title("Title")
        plt.xlabel("x Axis")
        plt.ylabel("y Axis")
        plt.xlim([0, 12])
        plt.ylim([20, 140])
        plt.legend()   # show the "prediction" / "data" labels
        plt.show()

    plot(x, y, 10, 20)

Defining the cost function (mean squared error)

    def cost_function(x, y, w, b):
        y_pred = w * x + b
        cost = (y - y_pred) ** 2
        return cost.mean()

    cost_function(x, y, 10, 20)

With b fixed at 20, find the w with the smallest cost (w from 9.00 to 11.00 in steps of 0.01)

    w_arr = []
    costs = []
    for w in range(-100, 101):
        w2 = 10 + w / 100
        cost = cost_function(x, y, w2, 20)
        w_arr.append(w2)
        costs.append(cost)

    plt.title("cost function - when b = 20")
    plt.xlabel("w")
    plt.ylabel("cost function")
    plt.plot(w_arr, costs)
    plt.show()

Computing the cost matrix with NumPy (w and b each from -100 to 100)

    import numpy as np

    ws = np.arange(-100, 101)
    bs = np.arange(-100, 101)
    costs = np.zeros((201, 201))   # rows: w values, columns: b values

    for i, w in enumerate(ws):
        for j, b in enumerate(bs):
            costs[i, j] = cost_function(x, y, w, b)

    print(costs)

3D plot

    plt.figure(figsize=(7, 7))     # create the figure first so the size applies to the 3D axes
    ax = plt.axes(projection="3d")
    ax.xaxis.set_pane_color((1, 1, 1))
    ax.yaxis.set_pane_color((1, 1, 1))
    ax.zaxis.set_pane_color((1, 1, 1))
    ax.view_init(30, -110)

    b_grid, w_grid = np.meshgrid(bs, ws)   # rows follow ws, columns follow bs, matching costs[i, j]
    ax.plot_surface(w_grid, b_grid, costs, cmap="Spectral_r", alpha=0.7)
    ax.plot_wireframe(w_grid, b_grid, costs, alpha=0.1)
    ax.set_title("loss function")
    ax.set_xlabel("w")
    ax.set_ylabel("b")
    ax.set_zlabel("loss")

    # mark the grid point with the smallest loss
    w_index, b_index = np.where(costs == np.min(costs))
    ax.scatter(ws[w_index], bs[b_index], costs[w_index, b_index], color="red", s=40)
    plt.show()

Computing the gradient
\(\text{cost} = (y_\text{pred}-y)^2 = (y-(wx+b))^2\)
\(m_w = \frac{\partial\,\text{cost}}{\partial w} = -2x\,(y-wx-b)\)
\(m_b = \frac{\partial\,\text{cost}}{\partial b} = -2\,(y-wx-b)\)
Because the cost function averages over all samples, its gradient is the mean of these per-sample values.

    def compute_gradient(x, y, w, b):
        w_gradient = (2 * x * (w * x + b - y)).mean()   # mean of the whole product, not only of the parentheses
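        b_gradient = (2 * (w * x + b - y)).mean()
        return w_gradient, b_gradient

Using gradient descent to find the minimum cost
\(w_2 = w - m_w \times \text{learning\_rate}\)
\(b_2 = b - m_b \times \text{learning\_rate}\)

    w, b = 0, 0            # starting point (chosen arbitrarily)
    learning_rate = 0.001
    for i in range(10):
        w_gradient, b_gradient = compute_gradient(x, y, w, b)
        w = w - w_gradient * learning_rate
        b = b - b_gradient * learning_rate
        cost = cost_function(x, y, w, b)
        print(f"Iteration {i} : Cost {cost}, w: {w}, b: {b}")

The gradient_descent function (also records the history of w, b and the cost)

    def gradient_descent(x, y, w_init, b_init, learning_rate, cost_function, gradient_function, run_iteration):
        c_hist = []
        w_hist = []
        b_hist = []
        w = w_init
        b = b_init
        for i in range(run_iteration):
            w_gradient, b_gradient = gradient_function(x, y, w, b)
            w = w - w_gradient * learning_rate
            b = b - b_gradient * learning_rate
            cost = cost_function(x, y, w, b)
            w_hist.append(w)
            b_hist.append(b)
            c_hist.append(cost)
        return w, b, w_hist, b_hist, c_hist

Prediction with multiple features

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # x: features with shape (n_samples, 4), y: targets; test_size=0.2 is an example value
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

    scaler = StandardScaler()
    scaler.fit(x_train)                 # learn the mean and std from the training set only
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)

    # w_final, b_final: parameters found by gradient descent on the scaled x_train
    x_real = np.array([[5.3, 2, 1, 0], [7.2, 0, 0, 1]])
    x_real = scaler.transform(x_real)   # new data must be scaled the same way
    y_real = (w_final * x_real).sum(axis=1) + b_final
    y_real

"Feature scaling" speeds up gradient descent
In \(w_1x_1+w_2x_2+w_3x_3+w_4x_4+b\) the features span different ranges. As a result, a change of the same size in different \(w_i\) has very different effects on the prediction, and a single learning rate is hard to choose. It is best to bring every feature to a comparable scale, which amounts to standardization: \(\frac{x-\text{mean}}{\text{standard deviation}}\)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)

Logistic Regression
Sigmoid function: use it when the model describes a 0/1 relationship (logistic regression). It maps any \(z\) to a value between 0 and 1.
\(\text{Sigmoid Function}=\frac{1}{1+e^{-z}}\)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    w = np.array([1, 2, 3, 4])
    b = 1
    z = (w * x_train).sum(axis=1) + b
    sigmoid(z)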
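The multi-feature prediction section uses w_final and b_final without showing how they were obtained. Below is a minimal sketch, assuming they come from the same gradient descent as the single-feature case, vectorized so that w holds one weight per feature. The synthetic data (four features on deliberately different scales), the `standardize` helper (which mirrors what StandardScaler does) and the hyperparameters are made up for illustration. As a check, the result is compared with the closed-form least-squares solution from `np.linalg.lstsq`.

```python
import numpy as np

def standardize(x_train, x_test):
    """(x - mean) / std per feature, using statistics from the training set only (like StandardScaler)."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    return (x_train - mean) / std, (x_test - mean) / std

def cost_function(x, y, w, b):
    y_pred = (w * x).sum(axis=1) + b          # w1*x1 + ... + w4*x4 + b for every row
    return ((y - y_pred) ** 2).mean()

def compute_gradient(x, y, w, b):
    error = (w * x).sum(axis=1) + b - y       # shape (n_samples,)
    w_gradient = (2 * x * error[:, None]).mean(axis=0)  # one partial derivative per feature
    b_gradient = (2 * error).mean()
    return w_gradient, b_gradient

def gradient_descent(x, y, w, b, learning_rate, run_iteration):
    for _ in range(run_iteration):
        w_gradient, b_gradient = compute_gradient(x, y, w, b)
        w = w - w_gradient * learning_rate
        b = b - b_gradient * learning_rate
    return w, b

# Synthetic data for this sketch: 4 features whose scales differ by up to 1000x.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
true_w = np.array([3.0, -0.5, 0.02, 40.0])
y = x @ true_w + 7.0 + rng.normal(scale=0.1, size=200)

x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]
x_train, x_test = standardize(x_train, x_test)

w_final, b_final = gradient_descent(x_train, y_train, np.zeros(4), 0.0,
                                    learning_rate=0.1, run_iteration=1000)

# Closed-form least squares on the same standardized data, for comparison.
A = np.hstack([x_train, np.ones((len(x_train), 1))])
solution, *_ = np.linalg.lstsq(A, y_train, rcond=None)

print("gradient descent:", w_final, b_final)
print("least squares:   ", solution)
print("test cost:", cost_function(x_test, y_test, w_final, b_final))
```

Because the features are standardized first, one learning rate (0.1 here) suits all four weights. With the raw features, a learning rate small enough to stay stable for the widest feature makes progress on the narrowest one extremely slow.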