Bayesian Hierarchical Model 1

Who knows whether there will ever be a part 2.

Two common Bayesian hierarchical models; mainly a record of the derivations of the expectation and variance, and a chance to get familiar with the common distributions -_-

Beta-Binomial distribution

Data model: $y \mid \theta \sim \operatorname{Binomial}(n, \theta)$

Process model: $\theta \sim \operatorname{Beta}(\alpha, \beta)$

A note on the Beta distribution

For $\theta \sim \operatorname{Beta}(\alpha, \beta)$:

$$p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 \le \theta \le 1.$$

This is also a chance to review the properties of the distribution.

Expectation and variance

E(X): by the law of total expectation,

$$E(X) = E\big[E(X \mid \theta)\big] = E(n\theta) = \frac{n\alpha}{\alpha+\beta}.$$

Var(X): by the law of total variance,

$$\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid \theta)\big] + \operatorname{Var}\big[E(X \mid \theta)\big] = E\big[n\theta(1-\theta)\big] + n^{2}\operatorname{Var}(\theta) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^{2}(\alpha+\beta+1)}.$$

This uses the variance formula of the Beta distribution; to keep myself from forgetting it again, here it is too: $\operatorname{Var}(\theta) = \dfrac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$. A quick simulation check follows.
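As a sanity check, here is a minimal Monte Carlo sketch of the Beta-Binomial marginal moments (the values n = 20, a = 2, b = 5 are arbitrary choices, not from the post):

set.seed(1)
n <- 20; a <- 2; b <- 5
theta <- rbeta(1e5, a, b)                 # process model draws
x <- rbinom(1e5, size = n, prob = theta)  # data model draws given theta
c(mean(x), n * a / (a + b))               # simulated vs analytic E(X)
c(var(x), n * a * b * (a + b + n) / ((a + b)^2 * (a + b + 1)))  # simulated vs analytic Var(X)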

Poisson-Gamma distribution

Data model: $y \mid \lambda \sim \operatorname{Poisson}(\lambda)$

Process model: $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$

A note on the Gamma distribution

For $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$ (shape–rate parameterization):

$$p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}e^{-\beta\lambda}, \qquad E(\lambda) = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(\lambda) = \frac{\alpha}{\beta^{2}}.$$

Expectation and variance

E(X): $E(X) = E\big[E(X \mid \lambda)\big] = E(\lambda) = \dfrac{\alpha}{\beta}.$

Var(X): $\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid \lambda)\big] + \operatorname{Var}\big[E(X \mid \lambda)\big] = E(\lambda) + \operatorname{Var}(\lambda) = \dfrac{\alpha}{\beta} + \dfrac{\alpha}{\beta^{2}}.$
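The same kind of Monte Carlo sketch for the Poisson-Gamma marginal moments (a = 3, b = 2 are arbitrary):

set.seed(2)
a <- 3; b <- 2
lambda <- rgamma(1e5, shape = a, rate = b)  # process model draws
x <- rpois(1e5, lambda)                     # data model draws given lambda
c(mean(x), a / b)                           # simulated vs analytic E(X)
c(var(x), a / b + a / b^2)                  # simulated vs analytic Var(X)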

Over ^-^/

Validation for Bayesian model

A very incomplete summary/record of my painful discovery of Bayesian model validation.
There may be a follow-up.

Honestly, the overall idea of model validation is to split the dataset into training and testing data, then judge the model by how predictions from the training-data fit perform on the testing data. With that overall picture, the questions to work through are:

  • How to split the dataset: the definition of training/testing data; adapt it to the actual study.
  • How to predict the testing data from the training-data fit: the posterior predictive part (probably the main content of this post).
  • Once predictions are obtained, how to compare them with the testing-data observations: the usual validation metrics such as the median/mean and the 95% prediction interval.
  • How to adjust if the model falls short of expectations (I don't know yet either, ha). Not covered for now.

Definition of validation

Everyone who mentions validation already knows what it means, but here is a (not necessarily official) definition anyway:

Validation is the process of assessing how well a model performs on held-out data.

Why validate? To measure a model's predictive accuracy, for its own sake or for purposes of model comparison, selection, or averaging (Vehtari et al. 2017).

Splitting dataset

First, the definitions of training/testing data:

  • Training data: Data used to fit the model
  • Testing data: Data left-out during model fitting, used to evaluate model performance

Simple and clear. In practice, though, the split depends on the situation; generally 80% training and 20% testing is used (here the first 80% is training and the last 20% testing). Sometimes the split cannot follow this ratio exactly, and how to split is itself a question worth studying, but I won't go into it here (because I don't know how yet...).

Posterior predictive sample

Brief eg.

Example code:

set.seed(123)
n <- 100
x.i <- 1:n
year.i <- seq(1990, 1990 + n - 1)
y.i <- x.i + rbeta(n, 1, 10) * 3 # in principle this is unknown; only y.i is observed

data.full <- data.frame(x = x.i, year = year.i, y = y.i)

# split off the training set from the full dataset
train.cut <- quantile(data.full$year, 0.8) # 80% training
data.train <- subset(data.full, year <= train.cut) # training set

JAGS is used here for the model fit (the model itself is arbitrary; a minimal sketch is given below), with parameters.to.save = c("beta0", "beta1", "sigma") to obtain the posterior samples.
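For reference, here is a minimal R2jags sketch of such a fit, assuming a simple linear model with Normal observation error; the model and the priors are placeholders, not the model actually used in the study:

library(R2jags)

model.string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(beta0 + beta1 * x[i], tau)
  }
  beta0 ~ dnorm(0, 0.001)
  beta1 ~ dnorm(0, 0.001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 10)
}
"
model.file <- tempfile(fileext = ".txt")
writeLines(model.string, model.file)

fit.train <- jags(
  data = list(y = data.train$y, x = data.train$x, N = nrow(data.train)),
  parameters.to.save = c("beta0", "beta1", "sigma"),
  model.file = model.file,
  n.chains = 3, n.iter = 5000
)

With the fit in hand, the posterior draws are then extracted: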

b0.train <- fit.train$BUGSoutput$sims.list$beta0
b1.train <- fit.train$BUGSoutput$sims.list$beta1
sigma_train <- fit.train$BUGSoutput$sims.list$sigma

Posterior prediction

Next, for each record in the testing data, match the corresponding year value and compute the predictions, giving (conceptually) a matrix: rows = length(testing data), columns = posterior draws.

data.valid <- subset(data.full, year > train.cut)   # testing set (the last 20%)

for (yr in data.valid$year) {
row <- data.full[data.full$year == yr, ]
x_val <- row$x
y_obs <- row$y

# prediction (latent mean draws) from the training-data model
mu_draws <- b0.train + b1.train * x_val
}

Here is the key point: the mu_draws above are expected values, i.e. draws of the latent mean $\mu = \beta_0 + \beta_1 x$, not draws from the posterior predictive distribution (and not a posterior predictive density).
To obtain posterior predictive samples (adding the observation noise):

# treat the predicted mean as the distribution's parameter, then simulate from
# (or evaluate an observation's density under) that distribution
yrep <- rnorm(length(mu_draws), mean = mu_draws, sd = sigma_train)

Summary

To recap: 1) The predicted mean above is, for each testing record, the posterior sample of the latent mean; it is used to obtain the mean prediction, its interval, and so on from the posterior sample. 2) For distribution-based quantities, i.e. WAIC, LOO-CV, or predictive validation, the posterior estimate of the precision $\tau$ (or standard deviation $\sigma$) must enter the calculation.
For example, to compare against the testing data (the observed values), i.e. predictive validation, the observation noise $\sigma$ has to be taken into account.

Validation

This step looks at how the predicted samples perform on the testing data, using the following two metrics:

  • Median/mean value
  • 95% PI

Error

This includes the mean error and the mean absolute error.

# mean error (inside the loop over testing records, index j)
pred_mean[j] <- mean(yrep)
residuals[j] <- data.valid$y[j] - pred_mean[j]

# mean absolute error (after the loop)
mae <- mean(abs(residuals))

Coverage

If the 95% PI is used, the expected coverage is 95% (with 2.5% of observations below the lower bound and 2.5% above the upper bound).

pred_lower[j] <- quantile(yrep, 0.025)
pred_upper[j] <- quantile(yrep, 0.975)

coverage95 <- mean(data.valid$y >= pred_lower & data.valid$y <= pred_upper)

End

To see how the model trained on the training data differs from one trained on the full data, fit the same model to the full data and compare the predicted values (no observation error needed), using the same metrics.

During validation, if the predictions fall short (the error is too large or the coverage too low), look at the model first. A large error is a model-choice problem; consider the relevant covariates (e.g. control variables). If the error looks fine but the coverage is too low (say below 60%), check whether observation error has been accounted for. Before that, a PIT plot can help: a U shape points to an observation-error problem, in which case extract the posterior samples of sigma (fit.train$BUGSoutput$sims.list$sigma) and add that noise when generating the predictive samples.
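A minimal sketch of that PIT check under the setup above, assuming the yrep draws for each testing record have been collected into a matrix yrep.mat (a hypothetical name not defined earlier; rows = testing records, columns = draws):

# PIT value per testing record: P(yrep <= y_obs) under the predictive draws
pit <- sapply(seq_len(nrow(data.valid)),
              function(j) mean(yrep.mat[j, ] <= data.valid$y[j]))
hist(pit, breaks = 10, main = "PIT", xlab = "PIT value")
# roughly uniform: well calibrated; U-shaped: intervals too narrow (e.g. missing observation error)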

On how the posterior predictive sample should be generated, the dnorm-or-rnorm question was put to teacher G, whose answer was:

rnorm: the posterior predictive distribution; it gives the validation metrics you want directly.
dnorm: for density-based scoring rules (e.g. the log predictive density, elpd);
in that case use dnorm(y_obs, mean = mu_draw, sd = sigma_draw) to get the density, take the log, and average over the posterior samples.

So it is simply the difference between the density function and the random-number generator. A detailed explanation is given clearly here.
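Following that answer, a one-line sketch of the log pointwise predictive density for a single testing record, reusing mu_draws, sigma_train, and y_obs from above:

# average the density over posterior draws, then take the log
lpd_j <- log(mean(dnorm(y_obs, mean = mu_draws, sd = sigma_train)))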

Reference

  1. Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432.
  2. dnorm, pnorm, qnorm and rnorm in R, and random numbers. source

INLA installation

After installing JAGS on the HPC last time, I assumed I would never again hit a pitfall installing an R package; then INLA put me through it all over again. I'm not sure which is more annoying, this or not being able to get the special ticket in 界园 ^^

Official installation guide

Link: R-inla project
Find the matching version and install it in the R environment on the HPC.

But clearly, things are not that simple:

  • Whether the HPC can reach external networks at all (the filtering rules are quite arbitrary)
  • Download timeouts

So this mainly records how to install from the tar file downloaded from the official site and unpacked locally~

Local installation

Download

Address: inla download
As for which version to download:
inla-R

Transfer to the HPC

Nothing much to say: make a new dir and upload the file with WinSCP.

Install

install.packages("./INLA_24.05.10.tar.gz", repos = NULL, type = "source")

This way there are no network problems/timeouts/unreachable hosts; highly recommended (really).

Check the installation

library(INLA)
# this complained that fmesher was missing
# fix: conda/mamba install fmesher
# then library(INLA) again and it's fine

Running INLA

According to the inla discussion, running the official example code directly throws an error; you also need to run:

inla.binary.install()

Then choose the matching system (CentOS, Ubuntu, etc.).

That should be it (confetti).

Note

Some useful bits

  • Set the installation mirror
    # view the current mirrors
    options()$repos

    # set the mirror
    file.edit(file.path("~", ".Rprofile"))
    # write the mirror settings into .Rprofile
    options(repos = c(CRAN = "https://mirrors.pku.edu.cn/CRAN/", Aliyun = "http://mirrors.aliyun.com/CRAN"))
  • Increase the maximum download timeout (what is the default 60 s supposed to accomplish)
    # bump to 300 s, then install
    options(timeout = max(300, getOption("timeout")))

Reference

  1. How to install an R environment and R packages with conda on a Linux system

  2. r-inla project

  3. inla conversation

Estimating Child Mortality Rate

Age of Woman (AOW) and Time Since First Birth (TSFB) are two typical methods for estimating the child mortality rate from Summary Birth History (SBH) data. The two methods use the age of the woman and the time since her first birth, respectively, as the indicator for estimating child mortality.

AOW focuses on the age distribution of women, particularly looking at the number of women in different age groups, to estimate mortality rates. TSFB examines the time elapsed since a woman’s first birth, using this interval to infer mortality rates.

Age of Woman - AOW

AOW uses fertility data categorized by the age of the mother and adjusts the results using the Brass logit transformation and the African model life table. It assumes that the risk of dying is the same for all children whose mothers fall within a given age group.

Time Since First Birth - TSFB

TSFB estimates child mortality rates using the time since first birth as a proxy for children's exposure to the risk of dying.

How to configure R with Slurm

Conda environment

  • Create environment
    conda create -n R4.4-syf R=4.4.1 -y
    Creates an R environment with R version 4.4.1.
  • Check current environments
    conda env list
  • Activate & Deactivate environment
    # activate
    conda activate env_name # change to env_name created before
    conda activate R4.4-syf # eg.
    # deactivate (no name needed; it deactivates the active environment)
    conda deactivate
  • Delete env
    conda remove -n env_name --all -y

Install jags

  1. Download and Tar
    Download package from here
    mkdir ./JAGS-4.3.2
    tar -zxf ./jags_4.3.2.orig.tar.gz
    cd ./JAGS-4.3.2
    ./configure
    make
    make check
    make install
  2. Conda install rjags
    After building and installing JAGS, install rjags with conda from the command line.
    conda install -c conda-forge r-rjags
  3. Install package in R
    Start R from the shell with the command R, then:
    install.packages("R2jags")
    library(rjags)
    library(R2jags)

Running R script using Shell in Slurm

The test shell script: submit_test.sh

R script: test_hpc_parallel.R

To run the R script in the conda environment, the shell script is:

module purge
module load anaconda3/2021.11
conda activate /home/224030234/R/R4.4-syf

cd /home/224030234/R
#run the application:
Rscript test_hpc_parallel.R

The following error will occur:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

To solve this, source conda.sh before activating, and use the full path for the R script.

The revised shell script:

module purge
module load anaconda3/2021.11
source ~/anaconda3/etc/profile.d/conda.sh
conda activate /home/224030234/R/R4.4-syf

#run the application:
Rscript /home/224030234/R/test_hpc_parallel.R

Others

  1. conda install freezing
    When installing R packages, conda install often freezes at "solving environment"; to get around this, replace conda install with mamba.

    What is mamba

  2. conda channels operations

    conda config --show channels # show current channels
    conda config --remove channels defaults # remove defaults channel

    To add new channels:

    vim ~/.condarc
    # add the new channels in .condarc via vim
    conda config --show channels # check channels added status

Reference

  1. Complete infercnv installation walkthrough (covering jags and all the other pitfalls)
  2. Common conda commands
  3. Activating a conda virtual environment from a shell script
  4. How to remove the default channels in conda

Time Series analysis: AR(1) process

Autoregressive process

Consider a series whose value $x_t$ is determined by the previous value $x_{t-1}$. Assume

$$x_t = \phi\, x_{t-1} + \varepsilon_t,$$

where the random disturbance $\varepsilon_t \sim N(0, \sigma^2)$ is white noise.
The stability of the series is governed by $\phi$, which determines how strongly fluctuations persist.

More generally, the series value can be expressed through the values at the previous $p$ time points, i.e.

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t,$$

where $\varepsilon_t \sim N(0, \sigma^2)$; this is a $p$-th order autoregressive model AR(p), and $\phi_1, \dots, \phi_p$ are the autoregressive coefficients.

First-order autoregressive process AR(1)

The first-order autoregressive model is

$$x_t = \phi\, x_{t-1} + \varepsilon_t, \tag{1}$$

where $\varepsilon_t \sim N(0, \sigma^2)$ and $|\phi| < 1$.
Taking the variance of both sides (using stationarity, $\operatorname{Var}(x_t) = \operatorname{Var}(x_{t-1}) = \gamma_0$) gives

$$\gamma_0 = \phi^2 \gamma_0 + \sigma^2 \quad\Rightarrow\quad \gamma_0 = \frac{\sigma^2}{1 - \phi^2}.$$

From this the autocovariances follow. Multiplying both sides of (1) by $x_{t-k}$ and taking expectations gives

$$E(x_t x_{t-k}) = \phi\, E(x_{t-1} x_{t-k}) + E(\varepsilon_t x_{t-k}),$$

i.e.

$$\gamma_k = \phi\, \gamma_{k-1} + E(\varepsilon_t x_{t-k}).$$

Since $\varepsilon_t$ is independent of $x_{t-k}$ (for $k \ge 1$) and has mean 0, $E(\varepsilon_t x_{t-k}) = 0$, i.e.

$$\gamma_k = \phi\, \gamma_{k-1}.$$

Substituting in $k$ recursively gives

$$\gamma_k = \phi^{k} \gamma_0 = \phi^{k}\, \frac{\sigma^2}{1 - \phi^2},$$

i.e. the autocorrelation is

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \phi^{k}.$$

The autocorrelation depends only on the lag $k$ (not on $t$), so the process is stationary.

Simulation

For different values of $\phi$, simulate the process and plot the time series together with its autocorrelation function (ACF):
AR(1) simulation

Code:

set.seed(123)

simulate_AR1 <- function(rho, sigma = 1, T = 100) {
y <- numeric(T)
y[1] <- rnorm(1, 0, sigma / sqrt(1 - rho^2)) # stationarity
for (t in 2:T) {
y[t] <- rho * y[t-1] + rnorm(1, 0, sigma)
}
return(y)
}

rhos <- c(0, 0.3, 0.7, 0.95)
par(mfrow = c(4, 2), mar = c(4, 4, 2, 1), cex.main = 1.5)

for (rho in rhos) {
y <- simulate_AR1(rho)

# time series plot
plot(y, type = "l", main = paste("AR(1) with rho =", rho),
ylab = "y", xlab = "Time")

# ACF plot
acf(y, main = paste("ACF: rho =", rho))
}

As the figure shows, the smaller the autoregressive coefficient $\phi$, the faster the ACF decays; conversely, the larger $\phi$, the slower the ACF decays. A quick numerical check of $\rho_k = \phi^k$ follows.
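A minimal check of the derived result $\rho_k = \phi^k$ against the sample ACF, reusing simulate_AR1 from above (phi = 0.7 and T = 5000 are arbitrary choices):

y <- simulate_AR1(0.7, T = 5000)
round(acf(y, lag.max = 5, plot = FALSE)$acf[2:6], 2)  # sample ACF at lags 1..5
round(0.7^(1:5), 2)                                   # theoretical phi^k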

B-Splines

Lagrange interpolation

Consider 2 points $(x_0, y_0)$ and $(x_1, y_1)$.

A point $x$ between $x_0$ and $x_1$ is interpolated by

$$f(x) = \frac{x - x_1}{x_0 - x_1}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1,$$

where $\dfrac{x - x_1}{x_0 - x_1}$ and $\dfrac{x - x_0}{x_1 - x_0}$ are the basis functions.

Consider 3 points $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$.

These 3 points determine a unique quadratic function. To get this function, attach to every point $i$ a basis function

$$\ell_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j},$$

which is built so that $\ell_i(x_i) = 1$ and $\ell_i(x_j) = 0$ for $j \ne i$.

The interpolant is the sum over these 3 points:

$$f(x) = \sum_{i=0}^{2} y_i\, \ell_i(x).$$

Lagrange interpolation

In general, for $n + 1$ points $(x_0, y_0), \dots, (x_n, y_n)$,

$$L(x) = \sum_{i=0}^{n} y_i\, \ell_i(x), \qquad \ell_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}.$$

  • $L(x_i) = y_i$, therefore the curve passes through all the points.
  • $\ell_i(x)$ is the basis function.

A small numerical sketch follows.
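Here is a minimal R sketch of the formula above, interpolating through 3 arbitrary points:

# Lagrange interpolation through the points (xs, ys)
lagrange <- function(x, xs, ys) {
  sapply(x, function(x0) {
    sum(sapply(seq_along(xs), function(i) {
      ys[i] * prod((x0 - xs[-i]) / (xs[i] - xs[-i]))
    }))
  })
}
xs <- c(1, 2, 4); ys <- c(1, 3, 2)
lagrange(xs, xs, ys)  # reproduces the y values at the points: 1 3 2
lagrange(3, xs, ys)   # interpolated value between the points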

Bezier curve

Given 3 control points $P_0, P_1, P_2$, take a parameter $t \in [0, 1]$ and assume the points

$$P_0^1 = (1 - t)\, P_0 + t\, P_1, \qquad P_1^1 = (1 - t)\, P_1 + t\, P_2$$

lie on the segments $P_0P_1$ and $P_1P_2$ respectively.

Hence, interpolating once more between $P_0^1$ and $P_1^1$,

$$B(t) = (1 - t)\, P_0^1 + t\, P_1^1 = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2.$$

Suppose there are $n + 1$ control points $P_0, \dots, P_n$; the Bezier curve is

$$B(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{\,n-i}\, t^{\,i}\, P_i, \qquad t \in [0, 1].$$

The degree-$n$ curve is determined by the Bezier curve of the former $n$ points and that of the later $n$ points:

$$B_{P_0 \cdots P_n}(t) = (1 - t)\, B_{P_0 \cdots P_{n-1}}(t) + t\, B_{P_1 \cdots P_n}(t).$$

A small sketch of the quadratic case follows.
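A minimal R sketch of the quadratic Bezier curve above (the three control points are arbitrary):

# quadratic Bezier curve: B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2
bezier2 <- function(t, P0, P1, P2) {
  (1 - t)^2 %o% P0 + (2 * t * (1 - t)) %o% P1 + t^2 %o% P2
}
t <- seq(0, 1, length.out = 50)
pts <- bezier2(t, P0 = c(0, 0), P1 = c(1, 2), P2 = c(2, 0))
plot(pts, type = "l", xlab = "x", ylab = "y")  # curve from P0 to P2, pulled toward P1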

B-Splines

Basic concepts:

  • Control points: control the shape of the curve. Suppose there are $n + 1$ control points $P_0, \dots, P_n$.
  • Knots: determine where each basis function acts (its weight). With a knot vector $t_0 \le t_1 \le \cdots \le t_m$, the parameter domain is divided into knot spans (pieces).
  • Degree & Order: order = degree + 1; the degree is usually denoted $p$.

The B-spline curve is

$$C(t) = \sum_{i=0}^{n} N_{i,p}(t)\, P_i,$$

where $N_{i,p}(t)$ represents the $i$-th control point's weight (basis) function.

How to get $N_{i,p}(t)$ (the Cox–de Boor recursion):

$$N_{i,0}(t) = \begin{cases} 1, & t_i \le t < t_{i+1} \\ 0, & \text{otherwise} \end{cases}$$

$$N_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i}\, N_{i,p-1}(t) + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}}\, N_{i+1,p-1}(t).$$

B-splines regression

For multiple linear regression, $y_i = x_i^{\top}\beta + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$.
For a spline estimate, assume

$$y_i = \sum_{k} \alpha_k\, B_k(x_i) + \varepsilon_i,$$

where $B_k(\cdot)$ is the spline basis function.

For more details, see here for reference; a small fitting sketch follows.
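A minimal R sketch of a B-spline regression with the splines package (the data, knots, and degree are arbitrary choices):

library(splines)

set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

B <- bs(x, knots = c(2.5, 5, 7.5), degree = 3)  # cubic B-spline basis matrix
fit <- lm(y ~ B)                                # least squares on the basis

plot(x, y, col = "grey")
lines(x, fitted(fit), lwd = 2)                  # fitted spline curve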

Reference

  1. Introduction of Lagrange interpolation.

  2. 陈广雷, 王兆军. 多元部分线性模型的B-样条估计[J]. 应用概率统计, 2010, 26(2): 138-150.

Train your Own Detection Model - YOLO

Continuing from the previous blog, which successfully detected objects like cats and dogs but failed to detect Saber, this post applies YOLOv8 to train a model: data collection & labeling, training & validation, and finally predicting on the target image.

Background of Yolo

YOLO: You Only Look Once

The YOLO model (Redmon et al., 2016) was the very first attempt at building a fast real-time object detector. The main feature of the YOLO algorithm is that it treats object detection as a regression problem, directly predicting multiple bounding boxes and class probabilities for the image in a single forward pass of the network.

Basic Concept

YOLO abandons the sliding window and directly divides the original image into S×S non-overlapping cells. Convolution then produces an S×S feature map, where each element of the feature map corresponds to one cell of the original image.

  1. How to identify the Object
    Whether a cell contains an object can be viewed as a classification target: if an object's centre falls inside the cell the label is 1, otherwise 0.


  2. Determine the Border and Position
    The coordinates of a bounding box are defined by a tuple of 4 values, (center x-coord, center y-coord, width, height) — $(x, y, w, h)$. These are normalized by the image width and height, and thus all lie in (0, 1].


  3. Confidence Scores of the Border
    The confidence score includes two parts: the probability that a target is included and the precision of the border. The former can be represented by $\Pr(\text{object})$, and the latter is the overlap ratio between the prediction and the ground truth, called intersection over union (IoU).

    The confidence score is $\Pr(\text{object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}}$.


  4. Included Target or Not
    For each cell, it is also necessary to provide predicted probability values for the C categories, which represent the probabilities that the bounding boxes predicted by that cell belong to each category. These probability values are actually conditional probabilities $\Pr(\text{class}_i \mid \text{object})$ given the confidence of each bounding box. The class-specific confidence score can then be calculated as

    $$\Pr(\text{class}_i \mid \text{object}) \cdot \Pr(\text{object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}} = \Pr(\text{class}_i) \cdot \text{IoU}^{\text{truth}}_{\text{pred}}.$$


  5. Summary
    Grid size: $S \times S$;
    bounding boxes per cell: $B$.
    In total, one image contains $S \times S \times B$ bounding boxes; each box carries 4 location predictions and 1 confidence score, and each cell carries C conditional class probabilities. Every cell therefore needs $B \times 5 + C$ values, so the final prediction for one image is an $S \times S \times (B \times 5 + C)$ tensor, which is the tensor shape of the final conv layer of the model. For example, with $S = 7$, $B = 2$, $C = 20$ (PASCAL VOC), this is $7 \times 7 \times 30$.

Network Architecture
YOLO uses convolutional networks to extract features, and then uses fully connected layers to obtain the predicted values, in an architecture similar to GoogLeNet.
GoogLeNet
The final prediction is produced by two fully connected layers over the whole conv feature map.

Loss Function
The loss of the model includes three parts: localization loss, classification loss, and confidence loss.

Position:

$$\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$$

Classification:

$$\sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \big( p_i(c) - \hat{p}_i(c) \big)^2$$

Object (confidence):

$$\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \big( C_i - \hat{C}_i \big)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \big( C_i - \hat{C}_i \big)^2$$

$\mathbb{1}_{ij}^{\text{obj}}$ indicates whether the j-th bounding box of cell i is "responsible" for the object prediction.

The total loss is the sum of the three parts above.

(The detailed explanation for YOLOv5 can be found here.)

Training the Model

Environment

Python: 3.8

Yolo model: cv2, yolo v8

Label: labelImg for labelling the training dataset

Dataset

Here I downloaded around 150 images from Google with the keyword 'saber' for YOLO model training. With the help of LabelImg, the images were labeled in YOLO format. After processing, each image is paired with a txt file in the format: <class_id> <x_center> <y_center> <width> <height>
For example:
example

and the txt file for this image is : 0 0.545000 0.332447 0.790000 0.491135

There is also a class file to store the classification categories; here I have only one target, so it contains only saber.

Training

  1. Dataset Allocation

    Allocate about 60% of the images to train, 20% to val, and 20% to test, and split the corresponding label files in the same way. The class txt should sit at the same level as the images and labels. The directory layout for the images and txts is:

    • Dataset
      • images
      • labels
      • class.txt
  2. Config the Yaml

    The yaml gives the dataset locations and the number of classes for this training run.

    train: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/train  
    val: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/val
    test: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/test

    # number of classes
    nc: 1

    # class names
    names: ['saber']
  3. Model training

    With the downloaded yolov8n.pt, it's time to start training.

    from ultralytics import YOLO

    if __name__ == '__main__':
        # Load a pretrained model
        model = YOLO('yolov8n.pt')

        # Train the model
        model.train(data='./dataset/data.yaml', epochs=270, imgsz=320)

    After 270 epochs of training, the model for detecting 'saber' has been saved.

  4. Validation

    Code for the validation dataset:

    from ultralytics import YOLO

    if __name__ == '__main__':
        # Load the trained model
        model = YOLO('D:/PyCharm Community Edition 2024.1.3/TechBlog/runs/detect/train9/weights/best.pt')

        # Validate the model
        metrics = model.val()

    And the result for this dataset:

    val_res
    If we go to the /val output directory to see the predicted results, we find that the model successfully detects Saber in all the images!
    val_pred

Resolve the Legacy Issue

Now let's see how this solves the problem from Blog 1: run the model on the input image 'saber' and check the result.

from PIL import Image
from ultralytics import YOLO

if __name__ == '__main__':
    # load the model
    model = YOLO("./runs/detect/train9/weights/best.pt")

    # load the image with PIL
    im1 = Image.open("saber.jpg")
    results = model.predict(source=im1, save=True)

    # loop to print the position of each detected target
    for result in results:
        boxes = result.boxes  # bounding boxes

        for box in boxes:
            # position: top-left & bottom-right corners
            x_min, y_min, x_max, y_max = box.xyxy[0]
            print(f"Detected object at: x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}")

Time to see the result :-)
saber_detect
Good 🎉



Reference

[1] Joseph Redmon, et al. “You only look once: Unified, real-time object detection.” CVPR 2016.

[2] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going Deeper with Convolutions. ArXiv. https://arxiv.org/abs/1409.4842

[3] Loss function detail in Yolov5 blog

[4] YOLO code source from github

Basic intro and experiment of CV

In this blog, a simple CV pipeline is walked through: starting from the image gradient vector, followed by image segmentation, and finally an example of image classification using VGG.

Image Gradient Vector

  • Gradient: the gradient points in the direction of the greatest rate of change of a function, and is used to find extrema.

  • Image Gradient Vector: Treating the image as a function, the gradient measures the rate of change of the pixel values. An image can be regarded as a two-dimensional discrete function, and the image gradient is the derivative of this two-dimensional discrete function.

    Take the Sobel operator as an example. Mainly used for edge detection, it is a discrete difference operator that computes an approximation of the gradient of the image intensity:

    $$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A,$$

    and the gradient magnitude for an image $A$ is $G = \sqrt{G_x^2 + G_y^2}$.

  • Image Process Example: here is an example showing the difference between applying $G_x$ and $G_y$ to the image pixels.
    The example image is: mygo
    Code for this:
    import numpy as np
    import scipy
    import scipy.signal as sig
    import matplotlib.pyplot as plt


    img = scipy.misc.imread("mygo1.jpg", mode="L")

    # Define the Sobel operator kernels.
    kernel_x = np.array([ [-1, 0, 1],[-2, 0, 2],[-1, 0, 1] ])
    kernel_y = np.array([ [1, 2, 1], [0, 0, 0], [-1, -2, -1] ])

    G_x = sig.convolve2d(img, kernel_x, mode='same')
    G_y = sig.convolve2d(img, kernel_y, mode='same')

    # Plot
    fig = plt.figure()
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)

    # the transformation (G_x + 255) / 2.
    ax1.imshow((G_x + 255) / 2, cmap='gray'); ax1.set_xlabel("Gx")
    ax2.imshow((G_y + 255) / 2, cmap='gray'); ax2.set_xlabel("Gy")
    plt.show()

    The result shows the comparison of using $G_x$ and $G_y$.

    gradient

Image Segmentation

Felzenszwalb's algorithm was proposed for segmenting an image into similar regions. Each pixel is a vertex, and vertices are gradually merged to create regions; the connections between pixels form a minimum spanning tree (MST).

  • How to Balance the Difference between two Pixels
    Definitions:
    • Internal difference: $Int(C) = \max_{e \in MST(C, E)} w(e)$, the weight of the edge with the greatest dissimilarity in the MST of component $C$.
    • Difference between two components:
      $Dif(C_1, C_2) = \min_{v_i \in C_1,\, v_j \in C_2,\, (v_i, v_j) \in E} w(v_i, v_j)$, the dissimilarity of the least dissimilar edge among all edges connecting the two regions, i.e. the dissimilarity of the two regions where they are most similar.

The standard for merging two regions is the threshold

$$MInt(C_1, C_2) = \min\big(Int(C_1) + k/|C_1|,\; Int(C_2) + k/|C_2|\big).$$

Only when $Dif(C_1, C_2)$ exceeds $MInt(C_1, C_2)$ are $C_1$ and $C_2$ kept as different regions; otherwise they are regarded as the same region. A toy numerical check of this test is given after this list.
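Here is a toy R sketch of that merging test; the component statistics are arbitrary numbers, not values from a real image:

k <- 100                          # scale parameter of the algorithm
Int1 <- 12; size1 <- 40           # max MST edge weight and size of component C1
Int2 <- 18; size2 <- 25           # same for component C2
Dif  <- 20                        # lightest edge weight between C1 and C2
MInt <- min(Int1 + k / size1, Int2 + k / size2)
Dif <= MInt                       # TRUE -> merge, FALSE -> keep the regions separate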

  • Procedures of the Algorithm
    Given a graph $G = (V, E)$ with $n$ vertices and $m$ edges, and the scale parameter $k$.

    1. Edges are sorted by dissimilarity (non-decreasing), labeled $e_1, \dots, e_m$; each vertex starts as its own region.
    2. Choose the next edge $e_q = (v_i, v_j)$.
    3. Determine whether the currently selected edge merges the regions containing $v_i$ and $v_j$:
      1. if its dissimilarity is not greater than the internal dissimilarity (plus threshold) of the two regions, i.e. $w(e_q) \le MInt(C_i, C_j)$, go to step 4; otherwise go straight to step 5.
    4. Update the threshold and the class designators: merge the regions, $C_i \cup C_j \rightarrow C_i$.
    5. If $q \le m$, select the next edge and go to step 3.
  • Example Code
    Applying skimage segmentation to segment the example image. Set k = 100 and k = 500 to see how the scale parameter controls the size of the merged regions.

    The example image used for segmentation is:
    iceland

    import numpy as np
    import scipy
    import scipy.signal as sig
    import skimage.segmentation
    from matplotlib import pyplot as plt

    img2 = scipy.misc.imread("iceland.jpg", mode="L")
    segment_mask1 = skimage.segmentation.felzenszwalb(img2, scale=100)
    segment_mask2 = skimage.segmentation.felzenszwalb(img2, scale=500)

    fig = plt.figure(figsize=(12, 5))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    ax1.imshow(segment_mask1); ax1.set_xlabel("k=100")
    ax2.imshow(segment_mask2); ax2.set_xlabel("k=500")
    fig.suptitle("Felsenszwalb's efficient graph based image segmentation")
    plt.tight_layout()
    plt.show()

    Segment results (k = 100 & k = 500):
    seg

Image Classification

  • CNN for Image Classification
    Convolution operation: as we learned in the course, in short, convolution applies element-wise multiplication to the vector/matrix and then sums the results.

  • VGG (Visual Geometry Group)

    VGG uses 3×3 convolution layers and 2×2 pooling layers. It has two configurations, VGG16 and VGG19; there is no essential difference between the two apart from the network depth.

    Why small kernels work better: each convolutional layer is followed by an activation function, which is a nonlinear transformation, so stacking several small convolutions yields stronger nonlinearity.

  • Example Code

    Here I use VGG16 to implement a simple classification of animals. The npy weight file and a class file for looking up the results were downloaded from here.

    VGG16 contains 16 hidden layers (13 convolutional layers and 3 fully connected layers).

    The test code is from here. I downloaded 3 images from Google for testing, with minor changes to the code for classification:



import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from scipy.misc import imread, imresize, toimage
import matplotlib.pyplot as plt
import skimage
import skimage.io
import skimage.transform
from imageClass import class_names

VGG_MEAN = [103.939, 116.779, 123.68]


class VGG16(object):
"""
The VGG16 model for image classification
"""

def __init__(self, vgg16_npy_path=None, trainable=True):
"""
:param vgg16_npy_path: string, vgg16_npz path
:param trainable: bool, construct a trainable model if True
"""
# The pretained data
if vgg16_npy_path is None:
self._data_dict = None
else:
self._data_dict = np.load(vgg16_npy_path, encoding="latin1", allow_pickle= True).item()
self.trainable = trainable
# Keep all trainable parameters
self._var_dict = {}
self.__bulid__()

def __bulid__(self):
"""
The inner method to build VGG16 model
"""
# input and output
self._x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
self._y = tf.placeholder(tf.int64, shape=[None, ])
# Data preprocessiing
mean = tf.constant([103.939, 116.779, 123.68], dtype=tf.float32, shape=[1, 1, 1, 3])
x = self._x - mean
self._train_mode = tf.placeholder(tf.bool) # use training model is True, otherwise test model
# construct model
conv1_1 = self._conv_layer(x, 3, 64, "conv1_1")
conv1_2 = self._conv_layer(conv1_1, 64, 64, "conv1_2")
pool1 = self._max_pool(conv1_2, "pool1")

conv2_1 = self._conv_layer(pool1, 64, 128, "conv2_1")
conv2_2 = self._conv_layer(conv2_1, 128, 128, "conv2_2")
pool2 = self._max_pool(conv2_2, "pool2")

conv3_1 = self._conv_layer(pool2, 128, 256, "conv3_1")
conv3_2 = self._conv_layer(conv3_1, 256, 256, "conv3_2")
conv3_3 = self._conv_layer(conv3_2, 256, 256, "conv3_3")
pool3 = self._max_pool(conv3_3, "pool3")

conv4_1 = self._conv_layer(pool3, 256, 512, "conv4_1")
conv4_2 = self._conv_layer(conv4_1, 512, 512, "conv4_2")
conv4_3 = self._conv_layer(conv4_2, 512, 512, "conv4_3")
pool4 = self._max_pool(conv4_3, "pool4")

conv5_1 = self._conv_layer(pool4, 512, 512, "conv5_1")
conv5_2 = self._conv_layer(conv5_1, 512, 512, "conv5_2")
conv5_3 = self._conv_layer(conv5_2, 512, 512, "conv5_3")
pool5 = self._max_pool(conv5_3, "pool5")

# n_in = ((224 / (2**5)) ** 2) * 512
fc6 = self._fc_layer(pool5, 25088, 4096, "fc6", act=tf.nn.relu, reshaped=False)
# Use train_mode to control
fc6 = tf.cond(self._train_mode, lambda: tf.nn.dropout(fc6, 0.5), lambda: fc6)
fc7 = self._fc_layer(fc6, 4096, 4096, "fc7", act=tf.nn.relu)
fc7 = tf.cond(self._train_mode, lambda: tf.nn.dropout(fc7, 0.5), lambda: fc7)
fc8 = self._fc_layer(fc7, 4096, 1000, "fc8", act=tf.identity)

self._prob = tf.nn.softmax(fc8, name="prob")

if self.trainable:
self._cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(fc8, self._y))
correct_pred = tf.equal(self._y, tf.argmax(self._prob, 1))
self._accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
else:
self._cost = None
self._accuracy = None

def _conv_layer(self, inpt, in_channels, out_channels, name):
"""
Create conv layer
"""
with tf.variable_scope(name):
filters, biases = self._get_conv_var(3, in_channels, out_channels, name)
conv_output = tf.nn.conv2d(inpt, filters, strides=[1, 1, 1, 1], padding="SAME")
conv_output = tf.nn.bias_add(conv_output, biases)
conv_output = tf.nn.relu(conv_output)
return conv_output

def _fc_layer(self, inpt, n_in, n_out, name, act=tf.nn.relu, reshaped=True):
"""Create fully connected layer"""
if not reshaped:
inpt = tf.reshape(inpt, shape=[-1, n_in])
with tf.variable_scope(name):
weights, biases = self._get_fc_var(n_in, n_out, name)
output = tf.matmul(inpt, weights) + biases
return act(output)

def _avg_pool(self, inpt, name):
return tf.nn.avg_pool(inpt, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME",
name=name)

def _max_pool(self, inpt, name):
return tf.nn.max_pool(inpt, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME",
name=name)

def _get_fc_var(self, n_in, n_out, name):
"""Get the weights and biases of fully connected layer"""
if self.trainable:
init_weights = tf.truncated_normal([n_in, n_out], 0.0, 0.001)
init_biases = tf.truncated_normal([n_out, ], 0.0, 0.001)
else:
init_weights = None
init_biases = None
weights = self._get_var(init_weights, name, 0, name + "_weights")
biases = self._get_var(init_biases, name, 1, name + "_biases")
return weights, biases

def _get_conv_var(self, filter_size, in_channels, out_channels, name):
"""
Get the filter and bias of conv layer
"""
if self.trainable:
initial_value_filter = tf.truncated_normal([filter_size, filter_size, in_channels, out_channels], 0.0,
0.001)
initial_value_bias = tf.truncated_normal([out_channels, ], 0.0, 0.001)
else:
initial_value_filter = None
initial_value_bias = None
filters = self._get_var(initial_value_filter, name, 0, name + "_filters")
biases = self._get_var(initial_value_bias, name, 1, name + "_biases")
return filters, biases

def _get_var(self, initial_value, name, idx, var_name):
"""
Use this method to construct variable parameters
"""
if self._data_dict is not None:
value = self._data_dict[name][idx]
else:
value = initial_value

if self.trainable:
var = tf.Variable(value, dtype=tf.float32, name=var_name)
else:
var = tf.constant(value, dtype=tf.float32, name="var_name")
# Save
self._var_dict[(name, idx)] = var
return var

def get_train_op(self, lr=0.01):
if not self.trainable:
return
return tf.train.GradientDescentOptimizer(lr).minimize(self.cost,
var_list=list(self._var_dict.values()))

@property
def input(self):
return self._x

@property
def target(self):
return self._y

@property
def train_mode(self):
return self._train_mode

@property
def accuracy(self):
return self._accuracy

@property
def cost(self):
return self._cost

@property
def prob(self):
return self._prob


# returns image of shape [224, 224, 3]
# [height, width, depth]
def load_image(path):
# load image
img = skimage.io.imread(path)
img = img / 255.0
# assert (0 <= img).all() and (img <= 1.0).all()
# print "Original Image Shape: ", img.shape
# we crop image from center
short_edge = min(img.shape[:2])
yy = int((img.shape[0] - short_edge) / 2)
xx = int((img.shape[1] - short_edge) / 2)
crop_img = img[yy: yy + short_edge, xx: xx + short_edge]
# resize to 224, 224
resized_img = skimage.transform.resize(crop_img, (224, 224))
return resized_img


def test_not_trainable_vgg16():
path = "D:/PyCharm Community Edition 2024.1.3/TechBlog"
img1 = load_image(path + "/puppy.jpg") * 255.0
batch1 = img1.reshape((1, 224, 224, 3))

tf.compat.v1.disable_eager_execution()
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
vgg = VGG16(path + "/vgg16.npy", trainable=False)
probs = sess.run(vgg.prob, feed_dict={vgg.input: batch1, vgg.train_mode: False})
for i, prob in enumerate([probs[0]]):
preds = (np.argsort(prob)[::-1])[0:5]
print("The" + str(i + 1) + " image:")
for p in preds:
print("\t", p, class_names[p], prob[p])


if __name__ == "__main__":
path = "D:/PyCharm Community Edition 2024.1.3/TechBlog"
img1 = load_image(path + "/puppy.jpg") * 255.0
batch1 = img1.reshape((1, 224, 224, 3))
x = np.concatenate((batch1), 0)
y = np.array([292, 611], dtype=np.int64)
with tf.Graph().as_default():
with tf.Session() as sess:
vgg = VGG16(path + "/vgg16.npy", trainable=True)
sess.run(tf.global_variables_initializer())

train_op = vgg.get_train_op(lr=0.0001)
_, cost = sess.run([train_op, vgg.cost], feed_dict={vgg.input: x,
vgg.target: y, vgg.train_mode: True})
accuracy = sess.run(vgg.accuracy, feed_dict={vgg.input: x,
vgg.target: y, vgg.train_mode: False})
print(cost, accuracy)

Example images for VGG16:

  1. Puppy

    puppy
    vgg1
    The result generated by VGG16 for the puppy was "Japanese spaniel".

  2. Cat

    cat
    res2
    The result generated by VGG16 for this cat was "Egyptian cat".

  3. Saber

    saber
    res3
    Sadly, it can only assign labels already known from the class file to the image rather than recognizing Saber. To make that work, a dataset of her needs to be collected for training.

In the next blog, I intend to apply YOLO, from building the training dataset through to image recognition. Hopefully, Saber will be recognized :).

Reference

[1] Gradient Vector

[2] Pedro F. Felzenszwalb, and Daniel P. Huttenlocher. “Efficient graph-based image segmentation.” Intl. journal of computer vision 59.2 (2004): 167-181.article

[3] Image segmentation: graph-based image segmentation. blog address

[4] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[5] VGG in TensorFlow
