MLs | Rain Hu's Workspace

[ML] Choosing a loss function / optimizer / metrics

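Building the model

Dense(units, activation): units is the output size; Keras computes the input size automatically.

Activation functions
- relu: Rectified Linear Unit (ReLU)
- softmax: applies the natural exponential to every element of the array and then normalizes by the sum. This amplifies the elements with the largest weight and maps all values into the range 0~1 so that they add up to 1, which makes them behave like probabilities.

    from tensorflow import keras
    from tensorflow.keras.layers import Dense

    model = keras.Sequential([
        Dense(32, activation="relu"),
        Dense(64, activation="relu"),
        Dense(32, activation="relu"),
        Dense(10, activation="softmax"),
    ])

Compiling

Loss function (objective function)
- CategoricalCrossentropy
- SparseCategoricalCrossentropy
- BinaryCrossentropy
- MeanSquaredError
- KLDivergence
- CosineSimilarity
- …

Optimizer
- SGD (can be combined with momentum)
- RMSprop
- Adam
- Adagrad
- …

Metrics
- CategoricalAccuracy
- SparseCategoricalAccuracy
- BinaryAccuracy
- AUC
- Precision
- Recall
- …

Both forms in the example below work. The object form lets you pass customized settings.

    # String form: uses the default settings of each component
    model.compile(optimizer="rmsprop", loss="mean_squared_error", metrics=["accuracy"])
    # Object form: allows custom arguments such as the learning rate
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4), loss=keras.losses.MeanSquaredError(), metrics=[keras.metrics.BinaryAccuracy()])

Shuffling

After collecting the data, our goal is not just a model that does well on the training data, but a model that performs well in most situations. So we need to split the collected data into a training set and a validation set. Below, np.random.permutation() and slicing are used to sample the data.

    import numpy as np

    # Shuffle inputs and targets with the same random permutation
    indices_permutation = np.random.permutation(len(data))
    shuffled_inputs = data[indices_permutation]
    shuffled_targets = labels[indices_permutation]
    # Hold out 30% of the samples for validation
    num_validation_samples = int(0.3 * len(data))
    val_inputs = shuffled_inputs[:num_validation_samples]
    val_targets = shuffled_targets[:num_validation_samples]
    training_inputs = shuffled_inputs[num_validation_samples:]
    training_targets = shuffled_targets[num_validation_samples:]
    model.fit(
        training_inputs,
        training_targets,
        epochs=5,
        batch_size=16,
        validation_data=(val_inputs, val_targets)
    )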
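To make the softmax description concrete, here is a minimal NumPy sketch. The function name `softmax` and the sample scores are illustrative assumptions, not Keras code; subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(scores):
    """Exponentiate each element, then normalize so the outputs sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # stability shift; the result is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax([1.0, 2.0, 3.0])  # hypothetical raw scores
print(probs)           # three values between 0 and 1, largest for the last score
print(probs.argmax())  # 2
```

The largest raw score ends up with the largest share, which is why the output of a softmax layer is usually read as a probability per class.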

[ML] General guide on ML

Loss on training data
- large:
  - model bias -> add features
  - optimization -> change optimization methods
- small: loss on testing data
  - large: overfitting: (1) more training data, data augmentation (2) make model simpler
  - small: mismatch

[ML] sample1 - Handwritten digit recognition

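MNIST

NIST (National Institute of Standards and Technology) is the U.S. national standards body. MNIST is a classic machine learning dataset provided by NIST and can be thought of as the "Hello World!" of deep learning. It consists of 60,000 training images and 10,000 test images of handwritten digits in grayscale, each 28 * 28 pixels, classified into the 10 digits 0 to 9. The data can be obtained directly through the keras module:

    >>> from tensorflow.keras.datasets import mnist

Calling mnist.load_data() returns the MNIST dataset. The return value is a 2*2 tuple of ndarrays, that is, two (images, labels) pairs.

    >>> (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The tuples hold NumPy ndarray objects. We can use o.shape to get the shape of an ndarray and len(o) to get the number of entries along its first axis.

    >>> train_images.shape
    (60000, 28, 28)    # 3-axis array of size 60000 * 28 * 28
    >>> test_images.shape
    (10000, 28, 28)    # 3-axis array of size 10000 * 28 * 28
    >>> len(train_labels), len(test_labels)
    (60000, 10000)     # the training and test sets have 60000 and 10000 labels
    >>> train_labels
    array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)    # the answers (digits 0-9) for the 60000 training samples

We can use matplotlib to display an image:

    import matplotlib.pyplot as plt

    plt.matshow(train_images[0], cmap=plt.get_cmap('gray'))
    plt.show()

Building a neural network with Dense layers

First we need to build the network architecture. The layer is the basic building block of a neural network; a layer is a data-processing module. More specifically, each layer extracts a particular transformation or representation from the data. After several layers of data distillation, the data is "filtered" into the final transformation or representation. ...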
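The shape checks above can be tried without downloading MNIST. This is a small sketch that assumes a randomly generated stand-in array (`fake_images` and `fake_labels` are hypothetical names) with the same layout as the real training set, but only 100 samples.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for train_images: 100 grayscale images of 28 * 28 pixels, dtype uint8
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
# Stand-in for train_labels: one digit (0-9) per image
fake_labels = rng.integers(0, 10, size=100, dtype=np.uint8)

print(fake_images.shape)     # (100, 28, 28)
print(fake_images.ndim)      # 3
print(len(fake_labels))      # 100
print(fake_images[0].shape)  # (28, 28)
```

Indexing the first axis, as in `fake_images[0]`, selects one image, which is the same access pattern used by `plt.matshow(train_images[0], ...)`.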

[ML] Start Tensorflow Environment with Conda

Environment setup

Install Anaconda

Create a virtual environment
    conda create -n tensorflow
Enter the virtual environment (macOS/Linux)
    source activate tensorflow
Install tensorflow inside the environment
    pip install tensorflow
Install jupyter notebook inside the environment
    pip install jupyter notebook
Install pandas inside the environment
    pip install pandas
Start jupyter notebook
    jupyter notebook

For terminal users

Open Anaconda Navigator and, under Environments, install the modules you need, e.g. tensorflow, keras.
In a terminal, enter conda activate {environment name}
    conda activate tensorflow
Start python
    python
If this succeeds, the Python installation information is displayed:
    Python 3.11.5 (main, Sep 11 2023, 08:17:37) [Clang 14.0.6 ] on darwin
    Type "help", "copyright", "credits" or "license" for more information.

All done. Next, try training on the first dataset ...
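Once the environment is activated, one way to confirm the installs is to ask Python whether each package can be found. This is a minimal sketch using only the standard library; the helper name `installed` and the list of package names are assumptions for illustration.

```python
import importlib.util
import sys

def installed(package_name):
    """Return True if the package can be found in the active environment."""
    return importlib.util.find_spec(package_name) is not None

print(sys.version)  # the same version information the interpreter banner shows
for name in ("tensorflow", "pandas", "notebook"):
    status = "installed" if installed(name) else "missing"
    print(f"{name}: {status}")
```

A "missing" result usually means the package was installed into a different environment than the one currently active.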

[ML] 01. A brief introduction to basic machine learning concepts

Preface

What is machine learning

Machine Learning means using the power of machines to help find a function.

Input can be:
- vector
- matrix
- sequence

Output can be:
- Regression
- Classification
- Structured Learning (having the machine produce something structured, e.g. text, image)

[Diagram]

What is deep learning

Deep Learning means using a neural network to produce the function.

How a machine learns

1. Basic principle (the three training steps)

Step 1: Choose a suitable Model \(y=f(\text{\red{data}})\)
- Function with unknown parameters
- Model: \(\boxed{y=b+wx_1}\)
- \(w: \text{weight}\)
- \(b: \text{bias}\)
- \(x: \text{feature}\)

Step 2: Define the Loss function
- Define loss from training data
- The loss is computed from the Model parameters \(w,b\)
- Meaning: the larger the loss, the worse the parameters; the smaller the loss, the better the parameters.
- How to compute it: measure the gap between the estimated value and the actual value (label)
- Loss function: \(\boxed{L=\frac{1}{N}\sum_n e_n}\)
- MAE (mean absolute error): \(e=|y-\hat{y}|\)
- MSE (mean square error): \(e=(y-\hat{y})^2\)
- Cross-entropy: measures the gap between probability distributions
- Error Surface: the contour plot obtained by computing the loss for different parameter values.

Step 3: Optimization
- Find the parameter combination \((w,b)\) with the smallest loss
- Method: Gradient Descent
- \(\boxed{w^1 = w^0 - \red{\eta}\frac{\partial L}{\partial w}|_{w=w^0,b=b^0}}\)
- \(\boxed{b^1 = b^0 - \red{\eta}\frac{\partial L}{\partial b}|_{w=w^0,b=b^0}}\)
- \(\red{\eta}\): learning rate, which determines how large each gradient descent step is

2. Linear Model

\(\boxed{f\leftarrow y=b+\sum_{j=1}^{n}{w_jx_j}}\)

Consider not only the number of views on the previous day \(x_1\), but also on the two to seven days before, \(x_2, x_3, \dots , x_7\). With more parameters, prediction accuracy can be expected to improve.

3. Piecewise Linear Curves (Sigmoid)

\(\text{Sigmoid Function:} \boxed{y=\red{c}\frac{1}{1+e^{-(\green{b}+\blue{w}x_1)}}}=\boxed{\red{c}\text{ sigmoid}(\green{b}+\blue{w}x_1)}\)

Replace \(w_ix_i\) with \(c_i\text{ sigmoid}(b_i+w_ix_i)\):
- With one feature: \(\boxed{y=b+\sum_i{c_i\text{ sigmoid}(b_i+ w_ix_1)}}\)
- With more than one feature: \(\boxed{y=b+\sum_i{c_i\text{ sigmoid}(b_i+\sum_j w_{ij}x_j)}}\)

Meaning: a curve can be written as the sum of several piecewise-linear segments (hard sigmoids), and each hard sigmoid can be approximated by a sigmoid function. In fact, the number of sigmoids is the number of nodes (neurons) in one layer of the neural network; how many sigmoids to use is a hyperparameter.

The formula can be rewritten as a matrix computation plus an activation function. In linear-algebra form: \(\boxed{y=b+c^T\sigma(b_i+Wx)}\)

Collect all the parameters \(b\), \(b_i\), \(W\), \(c^T\) into \(\theta\), so the loss can be written as \(L(\theta)\). Repeat gradient descent to update the parameters.

Gradient: \(g=\begin{bmatrix}\frac{\partial L}{\partial \theta_1}|_{\theta=\theta^0}\\\frac{\partial L}{\partial \theta_2}|_{\theta=\theta^0}\\\vdots\end{bmatrix}=\nabla L(\theta^0)\)

Update: \(\begin{bmatrix}\theta_1^1\\\theta_2^1\\\vdots\end{bmatrix}\leftarrow\begin{bmatrix}\theta_1^0\\\theta_2^0\\\vdots\end{bmatrix}-\begin{bmatrix}\eta \frac{\partial L}{\partial \theta_1}|_{\theta=\theta^0}\\\eta\frac{\partial L}{\partial\theta_2}|_{\theta=\theta^0}\\\vdots\end{bmatrix}\), or written as \(\theta^1\leftarrow \theta^0-\eta g\)

Batch training: the parameters are updated one batch of samples at a time. When all batches have been processed once, that is called one epoch.

4. ReLU

Alternatively, represent the curve with hard sigmoids directly. Each hard sigmoid is made of two Rectified Linear Units (ReLU), and each ReLU is written as: \(\boxed{\red{c}\text{ max}(0,\green{b}+\blue{w}x_1)}\)

So the Model can be written as: \(\boxed{y=b+\sum_{\red{2}i}c_i\text{ max}(0,b_i+\sum_j{w_{ij}x_j})}\)

The function we choose for the approximation is called the Activation function.

Deep learning

Neural Network: \(\boxed{y=b+c^T\sigma(b_i+Wx)}\)

Multiple hidden layers -> Deep learning
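The matrix form \(y=b+c^T\sigma(b_i+Wx)\) and the update \(\theta^1\leftarrow\theta^0-\eta g\) can be sketched in NumPy as follows. The function names (`forward`, `gradient_step`) and all numeric values are assumptions chosen for illustration; with `W` and `b_vec` set to zero every sigmoid outputs 0.5, which makes the result easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b_vec, c, b):
    """Compute y = b + c^T sigmoid(b_vec + W x) for one input vector x."""
    return b + c @ sigmoid(b_vec + W @ x)

def gradient_step(theta, grad, eta):
    """One update: theta^1 <- theta^0 - eta * g."""
    return theta - eta * grad

x = np.array([1.0, 2.0, 3.0])        # 3 features
W = np.zeros((4, 3))                 # 4 sigmoids, i.e. 4 nodes in the layer
b_vec = np.zeros(4)                  # one bias b_i per sigmoid
c = np.array([1.0, 2.0, 3.0, 4.0])   # output weights c_i
b = 0.5                              # output bias

y = forward(x, W, b_vec, c, b)
print(y)  # 5.5, because 0.5 + 0.5 * (1 + 2 + 3 + 4) = 5.5

theta0 = np.array([1.0, -2.0])       # hypothetical parameters
g = np.array([0.5, -1.0])            # hypothetical gradient at theta0
theta1 = gradient_step(theta0, g, eta=0.1)
print(theta1)
```

The number of rows of `W` is the number of sigmoids, so changing that hyperparameter only changes the shapes of `W`, `b_vec`, and `c`.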

[ML] A simple hands-on test

Linear regression modeling

Load the data

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib as mpl

    url = "sample.csv"
    data = pd.read_csv(url)
    x = data["x-axis"]
    y = data["y-axis"]

Plot

    def plot(x, y, w, b):
        line = w * x + b  # prediction line for the given w and b
        plt.plot(x, line, color="red", label="prediction")
        plt.scatter(x, y, color="blue", label="data", marker="x")
        plt.title("Title")
        plt.xlabel("x Axis")
        plt.ylabel("y Axis")
        plt.xlim([0, 12])
        plt.ylim([20, 140])
        plt.show()

    plot(x, y, 10, 20)

Define the cost function

    def cost_function(x, y, w, b):
        y2 = w * x + b          # predicted values
        cost = (y - y2) ** 2    # squared error per sample
        return cost.mean()

    cost_function(x, y, 10, 20)

With b = 20 fixed, find the w that minimizes the cost

    w_arr = []
    costs = []
    for w in range(-100, 101):
        w2 = 10 + w / 100   # scan w from 9 to 11 in steps of 0.01
        cost = cost_function(x, y, w2, 20)
        w_arr.append(w2)
        costs.append(cost)

    import matplotlib.pyplot as plt
    plt.title("cost function - when b = 20")
    plt.xlabel("w")
    plt.ylabel("cost function")
    plt.plot(w_arr, costs)
    plt.show()

Compute the cost matrix with numpy

    import numpy as np

    ws = np.arange(-100, 101)
    bs = np.arange(-100, 101)
    costs = np.zeros((201, 201))
    i = 0
    for w in ws:
        j = 0
        for b in bs:
            cost = cost_function(x, y, w, b)
            costs[i, j] = cost
            j = j + 1
        i = i + 1
    print(costs)

Draw the 3D plot

    plt.figure(figsize=(7, 7))  # create the figure before adding the 3D axes
    ax = plt.axes(projection="3d")
    ax.xaxis.set_pane_color((1, 1, 1))
    ax.yaxis.set_pane_color((1, 1, 1))
    ax.zaxis.set_pane_color((1, 1, 1))
    ax.view_init(30, -110)
    b_grid, w_grid = np.meshgrid(bs, ws)
    ax.plot_surface(w_grid, b_grid, costs, cmap="Spectral_r", alpha=0.7)
    ax.plot_wireframe(w_grid, b_grid, costs, alpha=0.1)
    ax.set_title("loss function")
    ax.set_xlabel("w")
    ax.set_ylabel("b")
    ax.set_zlabel("loss")
    # Mark the grid point with the smallest cost
    w_index, b_index = np.where(costs == np.min(costs))
    ax.scatter(ws[w_index], bs[b_index], costs[w_index, b_index], color="red", s=40)
    plt.show()

Compute the gradient

\(\text{cost} = (\text{y}_\text{pred}-\text{y})^2\\ \text{cost} = (\text{y}-(\text{w}\times\text{x}+\text{b}))^2\\ \text{m}_\text{w} = -2\times\text{x}(\text{y-wx-b})\\ \text{m}_\text{b} = -2\times(\text{y-wx-b})\\ \)

    def compute_gradient(x, y, w, b):
        # Average the per-sample gradients; the parentheses matter,
        # otherwise .mean() applies only to (w*x+b-y)
        w_gradient = (2 * x * (w * x + b - y)).mean()
        b_gradient = (2 * (w * x + b - y)).mean()
        return w_gradient, b_gradient

Use gradient descent to find the minimum of the cost

\(\text{w}_2=\text{w}-\text{m}_\text{w} \times \text{learning\_rate}\)
\(\text{b}_2=\text{b}-\text{m}_\text{b} \times \text{learning\_rate}\)

    # Starting values (assumed; the excerpt does not show the initial w and b)
    w = 0
    b = 0
    learning_rate = 0.001
    for i in range(10):
        w_gradient, b_gradient = compute_gradient(x, y, w, b)
        w = w - w_gradient * learning_rate
        b = b - b_gradient * learning_rate
        cost = cost_function(x, y, w, b)
        print(f"Iteration {i} : Cost {cost}, w: {w}, b: {b}")

The gradient_descent function

    def gradient_descent(x, y, w_init, b_init, learning_rate, cost_function, gradient_function, run_iteration):
        c_hist = []
        w_hist = []
        b_hist = []
        w = w_init
        b = b_init
        for i in range(run_iteration):
            w_gradient, b_gradient = gradient_function(x, y, w, b)
            w = w - w_gradient * learning_rate
            b = b - b_gradient * learning_rate
            cost = cost_function(x, y, w, b)
            # Keep the history so convergence can be inspected afterwards
            w_hist.append(w)
            b_hist.append(b)
            c_hist.append(cost)
        return w, b, w_hist, b_hist, c_hist

Prediction with multiple features

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # x_train and x_test are assumed to come from train_test_split (not shown in this excerpt)
    scaler = StandardScaler()
    scaler.fit(x_train)
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)
    # Two new samples with four features each
    x_real = np.array([[5.3, 2, 1, 0], [7.2, 0, 0, 1]])
    x_real = scaler.transform(x_real)
    y_real = (w_final * x_real).sum(axis=1) + b_final
    y_real

"Feature scaling" to speed up gradient descent

In w1x1+w2x2+w3x3+w4x4+b the features have different ranges, so it is best to rescale them until every product is of comparable size. This amounts to standardization: \(\frac{\text{x}-\text{mean}}{\text{standard deviation}}\)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaler.fit(x_train)   # learn mean and standard deviation from the training set only
    x_train = scaler.transform(x_train)
    x_test = scaler.transform(x_test)

Logistic Regression

Sigmoid Function

Usable when the model's output is a 0-1 relationship (logistic regression): \(\text{Sigmoid Function}=\frac{1}{1+e^{-z}}\)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    w = np.array([1, 2, 3, 4])
    b = 1
    z = (w * x_train).sum(axis=1) + b
    sigmoid(z)
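Since `sample.csv` is not included here, the gradient descent loop can be checked on synthetic data instead. This sketch assumes noise-free data generated from y = 3x + 5 (a made-up line) and compares the result with `np.polyfit`; the learning rate and the iteration count are illustrative choices, not values from the post.

```python
import numpy as np

def cost_function(x, y, w, b):
    """Mean squared error of the line w * x + b."""
    return ((y - (w * x + b)) ** 2).mean()

def compute_gradient(x, y, w, b):
    """Gradients of the mean squared error with respect to w and b."""
    error = w * x + b - y
    return (2 * x * error).mean(), (2 * error).mean()

# Synthetic data on an assumed line y = 3x + 5
x = np.linspace(0, 10, 50)
y = 3 * x + 5

w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(5000):
    w_gradient, b_gradient = compute_gradient(x, y, w, b)
    w -= w_gradient * learning_rate
    b -= b_gradient * learning_rate

# Closed-form least-squares fit for comparison
w_ref, b_ref = np.polyfit(x, y, 1)
print(w, b)
print(w_ref, b_ref)
print(cost_function(x, y, w, b))
```

Both approaches should land close to w = 3 and b = 5. The bias converges more slowly than the weight because the unscaled x makes the error surface elongated, which is the motivation for the feature scaling step above.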

[ML] Machine learning and statistics

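Introduction to ML

How do statistics and machine learning differ?
- Same: both turn data into information (info)
- Different: whether there are strong prior assumptions made by humans

Statistics

Statistics is the science that, on the basis of data analysis, studies how to measure, collect, organize, summarize, and analyze data in order to give correct information.

Machine learning

Machine learning algorithms are a class of algorithms that automatically analyze data to find regularities, and then use those regularities to make predictions on unseen data.

\(\begin{array}{lll}
\text{Item} & \text{Statistics} & \text{Machine Learning}\\\hline
\text{Characteristics} & \text{Comes with prior assumptions, relies on explicit rules, defines data relationships through a model, values interpretability} & \text{Almost no prior assumptions, does not rely on explicit rules, trusts experience}\\
 & \text{Prior assumptions (human)}\rightarrow\text{model estimation (machine)} & \text{Feature extraction (machine)}\rightarrow\text{network construction (machine)} \\\hline
\text{Advantages} & \text{The model is interpretable} & \text{No need to assume or understand data relationships in advance}\\
 & \text{Inference has a strong theoretical basis} & \text{Can capture (almost) all complex features of the data}\\
 & \text{When the prior assumptions hold, more inferences can be made}\\
 & \text{When the prior assumptions hold, large amounts of data are not needed} \\\hline
\text{Disadvantages} & \text{All inference rests on the prior assumptions, whose correctness is often hard to verify} & \text{The model is hard to interpret (black box)}\\
 & \text{Hard to capture overly complex features in the data} & \text{Inference has no strong theoretical basis} \\\hline
\text{Experts} & \text{Statistics background} & \text{Computer science and statistics background} \\\hline
\end{array}\)

Conclusion
- The key to a statistical model is reasonable prior assumptions. Given reasonable assumptions, a statistical model is effective (even with little data).
- The key to machine learning is a large amount of representative data. Given a large amount of valid data, machine learning is effective (even when humans understand little about the relationships in the data).

When to use statistical methods? When to use machine learning?
- When the relationships in the data are clear and a suitable model assumption is easy to state, a statistical model is recommended.
- When the data follows no explicit rules (as in image and speech recognition) and there is enough of it, machine learning is recommended (optionally supplemented with human hints).

Similar terms in statistics and machine learning

\(\begin{array}{ll}
\text{Statistics} & \text{Machine Learning} \\\hline
\text{response, dependent variable} & \text{label} \\\hline
\text{covariate, explanatory variable, independent variable} & \text{feature} \\\hline
\text{model} & \text{network} \\\hline
\text{parameter, coefficient} & \text{weight} \\\hline
\text{fitting} & \text{learning} \\\hline
\text{regression, classification} & \text{supervised learning} \\\hline
\text{density estimation, cluster} & \text{unsupervised learning} \\\hline
\end{array}\) ...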

[ML] introduction

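What are AI & ML & DL

Artificial intelligence is the goal we want to reach, and machine learning is the means of reaching it: we hope that machines, by learning, become as smart as people. Deep learning is one of the methods of machine learning.
- Artificial Intelligence (AI) → the goal
- Machine Learning (ML) → the means
- Deep Learning (DL) …

Before machine learning

The behavior of a living creature depends on two things: what it has learned after birth, and its innate instincts.

Hand-crafted rules: innate instincts that humans set up for the machine
- Rigid; cannot surpass its creator
- Requires a lot of manpower; not suitable for small companies

Machine learning

Write programs that let the machine learn → find the function that relates the data.
Examples: speech recognition, image recognition, Alpha Go, chatbots

Framework
- Set up a collection of functions
- Feed in data
- Evaluate how good each function is
- Find the best function

\(\begin{array}{rc} \text{step1}&\boxed{\text{Define a set of function}}\\ &\downarrow\\ \text{step2}&\boxed{\text{Evaluate goodness of function}}\\ &\downarrow\\ \text{step3}&\boxed{\text{Pick the best function}}\end{array}\)

Telling the machine the input and the correct output is called supervised learning.

Techniques related to machine learning

Task
- Regression: the output of the function is a scalar (a numeric value), e.g. PM2.5.
- Classification: the output of the function is a category. When the categories are Yes or No, it is Binary Classification, e.g. spam detection. When there are multiple options, it is Multi-Classification, e.g. news categorization.
- Structured Learning: the machine's output has structure. In speech recognition, the audio signal is the input and the sentence is the output. In image recognition, the picture is the input and the person's name is the output.

Method

Choosing a different function set means choosing a different model. ...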
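The three-step framework can be illustrated with a toy example in plain Python. The candidate functions, the data, and the use of mean squared error as the measure of goodness are all assumptions made up for this sketch; a real function set would be parameterized rather than listed by hand.

```python
# Step 1: define a set of candidate functions (hypothetical examples)
candidates = {
    "y = x": lambda x: x,
    "y = 2x": lambda x: 2 * x,
    "y = 2x + 1": lambda x: 2 * x + 1,
}

# Supervised learning: each input comes with its correct output
inputs = [0, 1, 2, 3]
targets = [1, 3, 5, 7]

# Step 2: evaluate the goodness of each function (lower loss is better)
def mean_squared_error(f):
    return sum((f(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(inputs)

losses = {name: mean_squared_error(f) for name, f in candidates.items()}

# Step 3: pick the best function
best = min(losses, key=losses.get)
print(losses)  # {'y = x': 7.5, 'y = 2x': 1.0, 'y = 2x + 1': 0.0}
print(best)    # y = 2x + 1
```

Choosing a different dictionary of candidates corresponds to choosing a different function set, that is, a different model.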