Bayesian Hierarchical Model 1

Who knows whether there will ever be a part 2.

Two common Bayesian hierarchical models; mainly a record of the derivations of the expectation and variance, and a chance to get familiar with the common distributions -_-

Beta-Binomial distribution

Data model: $y \mid \theta \sim \operatorname{Binomial}(n, \theta)$

Process model: $\theta \sim \operatorname{Beta}(\alpha, \beta)$

A note on the Beta distribution

For $\theta \sim \operatorname{Beta}(\alpha, \beta)$:

$$p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 \le \theta \le 1.$$

This is also a chance to review the properties of the distribution.

Expectation and variance

E(X): by the law of total expectation,

$$E(X) = E\big[E(X \mid \theta)\big] = E(n\theta) = \frac{n\alpha}{\alpha+\beta}.$$

Var(X): by the law of total variance,

$$\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid \theta)\big] + \operatorname{Var}\big[E(X \mid \theta)\big] = E\big[n\theta(1-\theta)\big] + n^{2}\operatorname{Var}(\theta) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^{2}(\alpha+\beta+1)}.$$

This uses the variance formula of the Beta distribution; to keep myself from forgetting it again, here it is too: $\operatorname{Var}(\theta) = \dfrac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$. A quick simulation check follows.
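As a sanity check, here is a minimal Monte Carlo sketch of the Beta-Binomial marginal moments (the values n = 20, a = 2, b = 5 are arbitrary choices, not from the post):

set.seed(1)
n <- 20; a <- 2; b <- 5
theta <- rbeta(1e5, a, b)                 # process model draws
x <- rbinom(1e5, size = n, prob = theta)  # data model draws given theta
c(mean(x), n * a / (a + b))               # simulated vs analytic E(X)
c(var(x), n * a * b * (a + b + n) / ((a + b)^2 * (a + b + 1)))  # simulated vs analytic Var(X)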

Poisson-Gamma distribution

Data model: $y \mid \lambda \sim \operatorname{Poisson}(\lambda)$

Process model: $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$

A note on the Gamma distribution

For $\lambda \sim \operatorname{Gamma}(\alpha, \beta)$ (shape–rate parameterization):

$$p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}e^{-\beta\lambda}, \qquad E(\lambda) = \frac{\alpha}{\beta}, \qquad \operatorname{Var}(\lambda) = \frac{\alpha}{\beta^{2}}.$$

Expectation and variance

E(X): $E(X) = E\big[E(X \mid \lambda)\big] = E(\lambda) = \dfrac{\alpha}{\beta}.$

Var(X): $\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid \lambda)\big] + \operatorname{Var}\big[E(X \mid \lambda)\big] = E(\lambda) + \operatorname{Var}(\lambda) = \dfrac{\alpha}{\beta} + \dfrac{\alpha}{\beta^{2}}.$
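The same kind of Monte Carlo sketch for the Poisson-Gamma marginal moments (a = 3, b = 2 are arbitrary):

set.seed(2)
a <- 3; b <- 2
lambda <- rgamma(1e5, shape = a, rate = b)  # process model draws
x <- rpois(1e5, lambda)                     # data model draws given lambda
c(mean(x), a / b)                           # simulated vs analytic E(X)
c(var(x), a / b + a / b^2)                  # simulated vs analytic Var(X)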

Over ^-^/

Validation for Bayesian model

A very incomplete summary/record of my painful discovery of Bayesian model validation.
There may be a follow-up.

Honestly, the overall idea of model validation is to split the dataset into training and testing data, then judge the model by how predictions from the training-data fit perform on the testing data. With that overall picture, the questions to work through are:

  • How to split the dataset: the definition of training/testing data; adapt it to the actual study.
  • How to predict the testing data from the training-data fit: the posterior predictive part (probably the main content of this post).
  • Once predictions are obtained, how to compare them with the testing-data observations: the usual validation metrics such as the median/mean and the 95% prediction interval.
  • How to adjust if the model falls short of expectations (I don't know yet either, ha). Not covered for now.

Definition of validation

Everyone who mentions validation already knows what it means, but here is a (not necessarily official) definition anyway:

Validation is the process of assessing how well a model performs on held-out data.

Why validate? To measure a model's predictive accuracy, for its own sake or for purposes of model comparison, selection, or averaging (Vehtari et al. 2017).

Splitting dataset

First, the definitions of training/testing data:

  • Training data: Data used to fit the model
  • Testing data: Data left-out during model fitting, used to evaluate model performance

Simple and clear. In practice, though, the split depends on the situation; generally 80% training and 20% testing is used (here the first 80% is training and the last 20% testing). Sometimes the split cannot follow this ratio exactly, and how to split is itself a question worth studying, but I won't go into it here (because I don't know how yet...).

Posterior predictive sample

Brief eg.

Example code:

set.seed(123)
n <- 100
x.i <- 1:n
year.i <- seq(1990, 1990 + n - 1)
y.i <- x.i + rbeta(n, 1, 10) * 3 # in principle this is unknown; only y.i is observed

data.full <- data.frame(x = x.i, year = year.i, y = y.i)

# split off the training set from the full dataset
train.cut <- quantile(data.full$year, 0.8) # 80% training
data.train <- subset(data.full, year <= train.cut) # training set

JAGS is used here for the model fit (the model itself is arbitrary; a minimal sketch is given below), with parameters.to.save = c("beta0", "beta1", "sigma") to obtain the posterior samples.
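For reference, here is a minimal R2jags sketch of such a fit, assuming a simple linear model with Normal observation error; the model and the priors are placeholders, not the model actually used in the study:

library(R2jags)

model.string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(beta0 + beta1 * x[i], tau)
  }
  beta0 ~ dnorm(0, 0.001)
  beta1 ~ dnorm(0, 0.001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 10)
}
"
model.file <- tempfile(fileext = ".txt")
writeLines(model.string, model.file)

fit.train <- jags(
  data = list(y = data.train$y, x = data.train$x, N = nrow(data.train)),
  parameters.to.save = c("beta0", "beta1", "sigma"),
  model.file = model.file,
  n.chains = 3, n.iter = 5000
)

With the fit in hand, the posterior draws are then extracted: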

b0.train <- fit.train$BUGSoutput$sims.list$beta0
b1.train <- fit.train$BUGSoutput$sims.list$beta1
sigma_train <- fit.train$BUGSoutput$sims.list$sigma

Posterior prediction

Next, for each record in the testing data, match the corresponding year value and compute the predictions, giving (conceptually) a matrix: rows = length(testing data), columns = posterior draws.

data.valid <- subset(data.full, year > train.cut)   # testing set (the last 20%)

for (yr in data.valid$year) {
row <- data.full[data.full$year == yr, ]
x_val <- row$x
y_obs <- row$y

# prediction (latent mean draws) from the training-data model
mu_draws <- b0.train + b1.train * x_val
}

Here is the key point: the mu_draws above are expected values, i.e. draws of the latent mean $\mu = \beta_0 + \beta_1 x$, not draws from the posterior predictive distribution (and not a posterior predictive density).
To obtain posterior predictive samples (adding the observation noise):

# treat the predicted mean as the distribution's parameter, then simulate from
# (or evaluate an observation's density under) that distribution
yrep <- rnorm(length(mu_draws), mean = mu_draws, sd = sigma_train)

Summary

To recap: 1) The predicted mean above is, for each testing record, the posterior sample of the latent mean; it is used to obtain the mean prediction, its interval, and so on from the posterior sample. 2) For distribution-based quantities, i.e. WAIC, LOO-CV, or predictive validation, the posterior estimate of the precision $\tau$ (or standard deviation $\sigma$) must enter the calculation.
For example, to compare against the testing data (the observed values), i.e. predictive validation, the observation noise $\sigma$ has to be taken into account.

Validation

This step looks at how the predicted samples perform on the testing data, using the following two metrics:

  • Median/mean value
  • 95% PI

Error

This includes the mean error and the mean absolute error.

# mean error (inside the loop over testing records, index j)
pred_mean[j] <- mean(yrep)
residuals[j] <- data.valid$y[j] - pred_mean[j]

# mean absolute error (after the loop)
mae <- mean(abs(residuals))

Coverage

If the 95% PI is used, the expected coverage is 95% (with 2.5% of observations below the lower bound and 2.5% above the upper bound).

pred_lower[j] <- quantile(yrep, 0.025)
pred_upper[j] <- quantile(yrep, 0.975)

coverage95 <- mean(data.valid$y >= pred_lower & data.valid$y <= pred_upper)

End

To see how the model trained on the training data differs from one trained on the full data, fit the same model to the full data and compare the predicted values (no observation error needed), using the same metrics.

During validation, if the predictions fall short (the error is too large or the coverage too low), look at the model first. A large error is a model-choice problem; consider the relevant covariates (e.g. control variables). If the error looks fine but the coverage is too low (say below 60%), check whether observation error has been accounted for. Before that, a PIT plot can help: a U shape points to an observation-error problem, in which case extract the posterior samples of sigma (fit.train$BUGSoutput$sims.list$sigma) and add that noise when generating the predictive samples.
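A minimal sketch of that PIT check under the setup above, assuming the yrep draws for each testing record have been collected into a matrix yrep.mat (a hypothetical name not defined earlier; rows = testing records, columns = draws):

# PIT value per testing record: P(yrep <= y_obs) under the predictive draws
pit <- sapply(seq_len(nrow(data.valid)),
              function(j) mean(yrep.mat[j, ] <= data.valid$y[j]))
hist(pit, breaks = 10, main = "PIT", xlab = "PIT value")
# roughly uniform: well calibrated; U-shaped: intervals too narrow (e.g. missing observation error)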

On how the posterior predictive sample should be generated, the dnorm-or-rnorm question was put to teacher G, whose answer was:

rnorm: the posterior predictive distribution; it gives the validation metrics you want directly.
dnorm: for density-based scoring rules (e.g. the log predictive density, elpd);
in that case use dnorm(y_obs, mean = mu_draw, sd = sigma_draw) to get the density, take the log, and average over the posterior samples.

So it is simply the difference between the density function and the random-number generator. A detailed explanation is given clearly here.
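Following that answer, a one-line sketch of the log pointwise predictive density for a single testing record, reusing mu_draws, sigma_train, and y_obs from above:

# average the density over posterior draws, then take the log
lpd_j <- log(mean(dnorm(y_obs, mean = mu_draws, sd = sigma_train)))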

Reference

  1. Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5):1413–1432.
  2. dnorm, pnorm, qnorm and rnorm in R, and random numbers. source

INLA installation

After installing JAGS on the HPC last time, I assumed I would never again hit a pitfall installing an R package; then INLA put me through it all over again. I'm not sure which is more annoying, this or not being able to get the special ticket in 界园 ^^

Official installation guide

Link: R-inla project
Find the matching version and install it in the R environment on the HPC.

But clearly, things are not that simple:

  • Whether the HPC can reach external networks at all (the filtering rules are quite arbitrary)
  • Download timeouts

So this mainly records how to install from the tar file downloaded from the official site and unpacked locally~

Local installation

Download

Address: inla download
As for which version to download:
inla-R

Transfer to the HPC

Nothing much to say: make a new dir and upload the file with WinSCP.

Install

install.packages("./INLA_24.05.10.tar.gz", repos = NULL, type = "source")

This way there are no network problems/timeouts/unreachable hosts; highly recommended (really).

Check the installation

library(INLA)
# this complained that fmesher was missing
# fix: conda/mamba install fmesher
# then library(INLA) again and it's fine

Running INLA

According to the inla discussion, running the official example code directly throws an error; you also need to run:

inla.binary.install()

Then choose the matching system (CentOS, Ubuntu, etc.).

That should be it (confetti).

Note

Some useful bits

  • Set the installation mirror
    # view the current mirrors
    options()$repos

    # set the mirror
    file.edit(file.path("~", ".Rprofile"))
    # write the mirror settings into .Rprofile
    options(repos = c(CRAN = "https://mirrors.pku.edu.cn/CRAN/", Aliyun = "http://mirrors.aliyun.com/CRAN"))
  • Increase the maximum download timeout (what is the default 60 s supposed to accomplish)
    # bump to 300 s, then install
    options(timeout = max(300, getOption("timeout")))

Reference

  1. How to install an R environment and R packages with conda on a Linux system

  2. r-inla project

  3. inla conversation

Estimating Child Mortality Rate

Age of Woman (AOW) and Time Since First Birth (TSFB) are two typical methods for estimating the child mortality rate from Summary Birth History (SBH) data. The two methods use the age of the woman and the time since her first birth, respectively, as the indicator for estimating child mortality.

AOW focuses on the age distribution of women, particularly looking at the number of women in different age groups, to estimate mortality rates. TSFB examines the time elapsed since a woman’s first birth, using this interval to infer mortality rates.

Age of Woman - AOW

AOW uses fertility data categorized by the age of the mother and adjusts the results using the Brass logit transformation and the African model life table. It assumes that the risk of dying is the same for all children whose mothers fall within a given age group.

Time Since First Birth - TSFB

TSFB estimates child mortality rates using the time since first birth as a proxy for children's exposure to the risk of dying.

How to configure R with Slurm

Conda environment

  • Create environment
    conda create -n R4.4-syf R=4.4.1 -y
    Creates an R environment with R version 4.4.1.
  • Check current environments
    conda env list
  • Activate & Deactivate environment
    # activate
    conda activate env_name # change to env_name created before
    conda activate R4.4-syf # eg.
    # deactivate (no name needed; it deactivates the active environment)
    conda deactivate
  • Delete env
    conda remove -n env_name --all -y

Install jags

  1. Download and Tar
    Download package from here
    mkdir ./JAGS-4.3.2
    tar -zxf ./jags_4.3.2.orig.tar.gz
    cd ./JAGS-4.3.2
    ./configure
    make
    make check
    make install
  2. Conda install rjags
    After building and installing JAGS, install rjags with conda from the command line.
    conda install -c conda-forge r-rjags
  3. Install package in R
    Start R from the shell with the command R, then:
    install.packages("R2jags")
    library(rjags)
    library(R2jags)

Running R script using Shell in Slurm

The test shell script: submit_test.sh

R script: test_hpc_parallel.R

To run the R script in the conda environment, the shell script is:

module purge
module load anaconda3/2021.11
conda activate /home/224030234/R/R4.4-syf

cd /home/224030234/R
#run the application:
Rscript test_hpc_parallel.R

The following error will occur:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

To solve this, source conda.sh before activating, and use the full path for the R script.

The revised shell script:

module purge
module load anaconda3/2021.11
source ~/anaconda3/etc/profile.d/conda.sh
conda activate /home/224030234/R/R4.4-syf

#run the application:
Rscript /home/224030234/R/test_hpc_parallel.R

Others

  1. conda install freezing
    When installing R packages, conda install often freezes at "solving environment"; to get around this, replace conda install with mamba.

    What is mamba

  2. conda channels operations

    conda config --show channels # show current channels
    conda config --remove channels defaults # remove defaults channel

    To add new channels:

    vim ~/.condarc
    # add the new channels in .condarc via vim
    conda config --show channels # check channels added status

Reference

  1. Complete infercnv installation walkthrough (covering jags and all the other pitfalls)
  2. Common conda commands
  3. Activating a conda virtual environment from a shell script
  4. How to remove the default channels in conda

Time Series analysis: AR(1) process

Autoregressive process

Consider a series whose value $x_t$ is determined by the previous value $x_{t-1}$. Assume

$$x_t = \phi\, x_{t-1} + \varepsilon_t,$$

where the random disturbance $\varepsilon_t \sim N(0, \sigma^2)$ is white noise.
The stability of the series is governed by $\phi$, which determines how strongly fluctuations persist.

More generally, the series value can be expressed through the values at the previous $p$ time points, i.e.

$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t,$$

where $\varepsilon_t \sim N(0, \sigma^2)$; this is a $p$-th order autoregressive model AR(p), and $\phi_1, \dots, \phi_p$ are the autoregressive coefficients.

First-order autoregressive process AR(1)

The first-order autoregressive model is

$$x_t = \phi\, x_{t-1} + \varepsilon_t, \tag{1}$$

where $\varepsilon_t \sim N(0, \sigma^2)$ and $|\phi| < 1$.
Taking the variance of both sides (using stationarity, $\operatorname{Var}(x_t) = \operatorname{Var}(x_{t-1}) = \gamma_0$) gives

$$\gamma_0 = \phi^2 \gamma_0 + \sigma^2 \quad\Rightarrow\quad \gamma_0 = \frac{\sigma^2}{1 - \phi^2}.$$

From this the autocovariances follow. Multiplying both sides of (1) by $x_{t-k}$ and taking expectations gives

$$E(x_t x_{t-k}) = \phi\, E(x_{t-1} x_{t-k}) + E(\varepsilon_t x_{t-k}),$$

i.e.

$$\gamma_k = \phi\, \gamma_{k-1} + E(\varepsilon_t x_{t-k}).$$

Since $\varepsilon_t$ is independent of $x_{t-k}$ (for $k \ge 1$) and has mean 0, $E(\varepsilon_t x_{t-k}) = 0$, i.e.

$$\gamma_k = \phi\, \gamma_{k-1}.$$

Substituting in $k$ recursively gives

$$\gamma_k = \phi^{k} \gamma_0 = \phi^{k}\, \frac{\sigma^2}{1 - \phi^2},$$

i.e. the autocorrelation is

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \phi^{k}.$$

The autocorrelation depends only on the lag $k$ (not on $t$), so the process is stationary.

Simulation

For different values of $\phi$, simulate the process and plot the time series together with its autocorrelation function (ACF):
AR(1) simulation

Code:

set.seed(123)

simulate_AR1 <- function(rho, sigma = 1, T = 100) {
y <- numeric(T)
y[1] <- rnorm(1, 0, sigma / sqrt(1 - rho^2)) # stationarity
for (t in 2:T) {
y[t] <- rho * y[t-1] + rnorm(1, 0, sigma)
}
return(y)
}

rhos <- c(0, 0.3, 0.7, 0.95)
par(mfrow = c(4, 2), mar = c(4, 4, 2, 1), cex.main = 1.5)

for (rho in rhos) {
y <- simulate_AR1(rho)

# time series plot
plot(y, type = "l", main = paste("AR(1) with rho =", rho),
ylab = "y", xlab = "Time")

# ACF plot
acf(y, main = paste("ACF: rho =", rho))
}

As the figure shows, the smaller the autoregressive coefficient $\phi$, the faster the ACF decays; conversely, the larger $\phi$, the slower the ACF decays. A quick numerical check of $\rho_k = \phi^k$ follows.
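A minimal check of the derived result $\rho_k = \phi^k$ against the sample ACF, reusing simulate_AR1 from above (phi = 0.7 and T = 5000 are arbitrary choices):

y <- simulate_AR1(0.7, T = 5000)
round(acf(y, lag.max = 5, plot = FALSE)$acf[2:6], 2)  # sample ACF at lags 1..5
round(0.7^(1:5), 2)                                   # theoretical phi^k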

B-Splines

Lagrange interpolation

Consider 2 points $(x_0, y_0)$ and $(x_1, y_1)$.

A point $x$ between $x_0$ and $x_1$ is interpolated by

$$f(x) = \frac{x - x_1}{x_0 - x_1}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1,$$

where $\dfrac{x - x_1}{x_0 - x_1}$ and $\dfrac{x - x_0}{x_1 - x_0}$ are the basis functions.

Consider 3 points $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$.

These 3 points determine a unique quadratic function. To get this function, attach to every point $i$ a basis function

$$\ell_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j},$$

which is built so that $\ell_i(x_i) = 1$ and $\ell_i(x_j) = 0$ for $j \ne i$.

The interpolant is the sum over these 3 points:

$$f(x) = \sum_{i=0}^{2} y_i\, \ell_i(x).$$

Lagrange interpolation

In general, for $n + 1$ points $(x_0, y_0), \dots, (x_n, y_n)$,

$$L(x) = \sum_{i=0}^{n} y_i\, \ell_i(x), \qquad \ell_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}.$$

  • $L(x_i) = y_i$, therefore the curve passes through all the points.
  • $\ell_i(x)$ is the basis function.

A small numerical sketch follows.
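Here is a minimal R sketch of the formula above, interpolating through 3 arbitrary points:

# Lagrange interpolation through the points (xs, ys)
lagrange <- function(x, xs, ys) {
  sapply(x, function(x0) {
    sum(sapply(seq_along(xs), function(i) {
      ys[i] * prod((x0 - xs[-i]) / (xs[i] - xs[-i]))
    }))
  })
}
xs <- c(1, 2, 4); ys <- c(1, 3, 2)
lagrange(xs, xs, ys)  # reproduces the y values at the points: 1 3 2
lagrange(3, xs, ys)   # interpolated value between the points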

Bezier curve

Given 3 control points $P_0, P_1, P_2$, take a parameter $t \in [0, 1]$ and assume the points

$$P_0^1 = (1 - t)\, P_0 + t\, P_1, \qquad P_1^1 = (1 - t)\, P_1 + t\, P_2$$

lie on the segments $P_0P_1$ and $P_1P_2$ respectively.

Hence, interpolating once more between $P_0^1$ and $P_1^1$,

$$B(t) = (1 - t)\, P_0^1 + t\, P_1^1 = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2.$$

Suppose there are $n + 1$ control points $P_0, \dots, P_n$; the Bezier curve is

$$B(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{\,n-i}\, t^{\,i}\, P_i, \qquad t \in [0, 1].$$

The degree-$n$ curve is determined by the Bezier curve of the former $n$ points and that of the later $n$ points:

$$B_{P_0 \cdots P_n}(t) = (1 - t)\, B_{P_0 \cdots P_{n-1}}(t) + t\, B_{P_1 \cdots P_n}(t).$$

A small sketch of the quadratic case follows.
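A minimal R sketch of the quadratic Bezier curve above (the three control points are arbitrary):

# quadratic Bezier curve: B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2
bezier2 <- function(t, P0, P1, P2) {
  (1 - t)^2 %o% P0 + (2 * t * (1 - t)) %o% P1 + t^2 %o% P2
}
t <- seq(0, 1, length.out = 50)
pts <- bezier2(t, P0 = c(0, 0), P1 = c(1, 2), P2 = c(2, 0))
plot(pts, type = "l", xlab = "x", ylab = "y")  # curve from P0 to P2, pulled toward P1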

B-Splines

Basic concepts:

  • Control points: control the shape of the curve. Suppose there are $n + 1$ control points $P_0, \dots, P_n$.
  • Knots: determine where each basis function acts (its weight). With a knot vector $t_0 \le t_1 \le \cdots \le t_m$, the parameter domain is divided into knot spans (pieces).
  • Degree & Order: order = degree + 1; the degree is usually denoted $p$.

The B-spline curve is

$$C(t) = \sum_{i=0}^{n} N_{i,p}(t)\, P_i,$$

where $N_{i,p}(t)$ represents the $i$-th control point's weight (basis) function.

How to get $N_{i,p}(t)$ (the Cox–de Boor recursion):

$$N_{i,0}(t) = \begin{cases} 1, & t_i \le t < t_{i+1} \\ 0, & \text{otherwise} \end{cases}$$

$$N_{i,p}(t) = \frac{t - t_i}{t_{i+p} - t_i}\, N_{i,p-1}(t) + \frac{t_{i+p+1} - t}{t_{i+p+1} - t_{i+1}}\, N_{i+1,p-1}(t).$$

B-splines regression

For multiple linear regression, $y_i = x_i^{\top}\beta + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$.
For a spline estimate, assume

$$y_i = \sum_{k} \alpha_k\, B_k(x_i) + \varepsilon_i,$$

where $B_k(\cdot)$ is the spline basis function.

For more details, see here for reference; a small fitting sketch follows.
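A minimal R sketch of a B-spline regression with the splines package (the data, knots, and degree are arbitrary choices):

library(splines)

set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

B <- bs(x, knots = c(2.5, 5, 7.5), degree = 3)  # cubic B-spline basis matrix
fit <- lm(y ~ B)                                # least squares on the basis

plot(x, y, col = "grey")
lines(x, fitted(fit), lwd = 2)                  # fitted spline curve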

Reference

  1. Introduction of Lagrange interpolation.

  2. 陈广雷, 王兆军. 多元部分线性模型的B-样条估计[J]. 应用概率统计, 2010, 26(2): 138-150.

Train your Own Detection Model - YOLO

Continuing from the previous blog, which successfully detected objects like cats and dogs but failed to detect Saber, this post applies YOLOv8 to train a model: data collection & labeling, training & validation, and finally predicting on the target image.

Background of Yolo

YOLO: You Only Look Once

The YOLO model (Redmon et al., 2016) was the very first attempt at building a fast real-time object detector. The main feature of the YOLO algorithm is that it treats object detection as a regression problem, directly predicting multiple bounding boxes and class probabilities for the image in a single forward pass of the network.

Basic Concept

YOLO abandons the sliding window and directly divides the original image into S×S non-overlapping cells. Convolution then produces an S×S feature map, where each element of the feature map corresponds to one cell of the original image.

  1. How to identify the Object
    Whether a cell contains an object can be viewed as a classification target: if an object's centre falls inside the cell the label is 1, otherwise 0.


  2. Determine the Border and Position
    The coordinates of a bounding box are defined by a tuple of 4 values, (center x-coord, center y-coord, width, height) — $(x, y, w, h)$. These are normalized by the image width and height, and thus all lie in (0, 1].


  3. Confidence Scores of the Border
    The confidence score includes two parts: the probability that a target is included and the precision of the border. The former can be represented by $\Pr(\text{object})$, and the latter is the overlap ratio between the prediction and the ground truth, called intersection over union (IoU).

    The confidence score is $\Pr(\text{object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}}$.


  4. Included Target or Not
    For each cell, it is also necessary to provide predicted probability values for the C categories, which represent the probabilities that the bounding boxes predicted by that cell belong to each category. These probability values are actually conditional probabilities $\Pr(\text{class}_i \mid \text{object})$ given the confidence of each bounding box. The class-specific confidence score can then be calculated as

    $$\Pr(\text{class}_i \mid \text{object}) \cdot \Pr(\text{object}) \cdot \text{IoU}^{\text{truth}}_{\text{pred}} = \Pr(\text{class}_i) \cdot \text{IoU}^{\text{truth}}_{\text{pred}}.$$


  5. Summary
    Grid size: $S \times S$;
    bounding boxes per cell: $B$.
    In total, one image contains $S \times S \times B$ bounding boxes; each box carries 4 location predictions and 1 confidence score, and each cell carries C conditional class probabilities. Every cell therefore needs $B \times 5 + C$ values, so the final prediction for one image is an $S \times S \times (B \times 5 + C)$ tensor, which is the tensor shape of the final conv layer of the model. For example, with $S = 7$, $B = 2$, $C = 20$ (PASCAL VOC), this is $7 \times 7 \times 30$.

Network Architecture
YOLO uses convolutional networks to extract features, and then uses fully connected layers to obtain the predicted values, in an architecture similar to GoogLeNet.
GoogLeNet
The final prediction is produced by two fully connected layers over the whole conv feature map.

Loss Function
The loss of the model includes three parts: localization loss, classification loss, and confidence loss.

Position:

$$\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$$

Classification:

$$\sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \big( p_i(c) - \hat{p}_i(c) \big)^2$$

Object (confidence):

$$\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \big( C_i - \hat{C}_i \big)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \big( C_i - \hat{C}_i \big)^2$$

$\mathbb{1}_{ij}^{\text{obj}}$ indicates whether the j-th bounding box of cell i is "responsible" for the object prediction.

The total loss is the sum of the three parts above.

(The detailed explanation for YOLOv5 can be found here.)

Training the Model

Environment

Python: 3.8

Yolo model: cv2, yolo v8

Label: labelImg for labelling the training dataset

Dataset

Here I downloaded around 150 images from Google with the keyword 'saber' for YOLO model training. With the help of LabelImg, the images were labeled in YOLO format. After processing, each image is paired with a txt file in the format: <class_id> <x_center> <y_center> <width> <height>
For example:
example

and the txt file for this image is : 0 0.545000 0.332447 0.790000 0.491135

There is also a class file to store the classification categories; here I have only one target, so it contains only saber.

Training

  1. Dataset Allocation

    Allocate about 60% of the images to train, 20% to val, and 20% to test, and split the corresponding label files in the same way. The class txt should sit at the same level as the images and labels. The directory layout for the images and txts is:

    • Dataset
      • images
      • labels
      • class.txt
  2. Config the Yaml

    The yaml gives the dataset locations and the number of classes for this training run.

    train: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/train  
    val: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/val
    test: D:/PyCharm Community Edition 2024.1.3/TechBlog/dataset/images/test

    # number of classes
    nc: 1

    # class names
    names: ['saber']
  3. Model training

    With the downloaded yolov8n.pt, it's time to start training.

    from ultralytics import YOLO

    if __name__ == '__main__':
        # Load a pretrained model
        model = YOLO('yolov8n.pt')

        # Train the model
        model.train(data='./dataset/data.yaml', epochs=270, imgsz=320)

    After 270 epochs of training, the model for detecting 'saber' has been saved.

  4. Validation

    Code for the validation dataset:

    from ultralytics import YOLO

    if __name__ == '__main__':
        # Load the trained model
        model = YOLO('D:/PyCharm Community Edition 2024.1.3/TechBlog/runs/detect/train9/weights/best.pt')

        # Validate the model
        metrics = model.val()

    And the result for this dataset:

    val_res
    If we go to the /val output directory to see the predicted results, we find that the model successfully detects Saber in all the images!
    val_pred

Resolve the Legacy Issue

Now let's see how this solves the problem from Blog 1: run the model on the input image 'saber' and check the result.

from PIL import Image
from ultralytics import YOLO

if __name__ == '__main__':
    # load the model
    model = YOLO("./runs/detect/train9/weights/best.pt")

    # load the image with PIL
    im1 = Image.open("saber.jpg")
    results = model.predict(source=im1, save=True)

    # loop to print the position of each detected target
    for result in results:
        boxes = result.boxes  # bounding boxes

        for box in boxes:
            # position: top-left & bottom-right corners
            x_min, y_min, x_max, y_max = box.xyxy[0]
            print(f"Detected object at: x_min={x_min}, y_min={y_min}, x_max={x_max}, y_max={y_max}")

Time to see the result :-)
saber_detect
Good 🎉



Reference

[1] Joseph Redmon, et al. “You only look once: Unified, real-time object detection.” CVPR 2016.

[2] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going Deeper with Convolutions. ArXiv. https://arxiv.org/abs/1409.4842

[3] Loss function detail in Yolov5 blog

[4] YOLO code source from github

Basic intro and experiment of CV

In this blog, a simple CV pipeline is walked through: starting from the image gradient vector, followed by image segmentation, and finally an example of image classification using VGG.

Image Gradient Vector

  • Gradient: the gradient points in the direction of the greatest rate of change of a function, and is used to find extrema.

  • Image Gradient Vector: Treating the image as a function, the gradient measures the rate of change of the pixel values. An image can be regarded as a two-dimensional discrete function, and the image gradient is the derivative of this two-dimensional discrete function.

    Take the Sobel operator as an example. Mainly used for edge detection, it is a discrete difference operator that computes an approximation of the gradient of the image intensity:

    $$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A,$$

    and the gradient magnitude for an image $A$ is $G = \sqrt{G_x^2 + G_y^2}$.

  • Image Process Example: here is an example showing the difference between applying $G_x$ and $G_y$ to the image pixels.
    The example image is: mygo
    Code for this:
    import numpy as np
    import scipy
    import scipy.signal as sig
    import matplotlib.pyplot as plt


    img = scipy.misc.imread("mygo1.jpg", mode="L")

    # Define the Sobel operator kernels.
    kernel_x = np.array([ [-1, 0, 1],[-2, 0, 2],[-1, 0, 1] ])
    kernel_y = np.array([ [1, 2, 1], [0, 0, 0], [-1, -2, -1] ])

    G_x = sig.convolve2d(img, kernel_x, mode='same')
    G_y = sig.convolve2d(img, kernel_y, mode='same')

    # Plot
    fig = plt.figure()
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)

    # the transformation (G_x + 255) / 2.
    ax1.imshow((G_x + 255) / 2, cmap='gray'); ax1.set_xlabel("Gx")
    ax2.imshow((G_y + 255) / 2, cmap='gray'); ax2.set_xlabel("Gy")
    plt.show()

    The result shows the comparison of using $G_x$ and $G_y$.

    gradient

Image Segmentation

Felzenszwalb's algorithm was proposed for segmenting an image into similar regions. Each pixel is a vertex, and vertices are gradually merged to create regions; the connections between pixels form a minimum spanning tree (MST).

  • How to Balance the Difference between two Pixels
    Definitions:
    • Internal difference: $Int(C) = \max_{e \in MST(C, E)} w(e)$, the weight of the edge with the greatest dissimilarity in the MST of component $C$.
    • Difference between two components:
      $Dif(C_1, C_2) = \min_{v_i \in C_1,\, v_j \in C_2,\, (v_i, v_j) \in E} w(v_i, v_j)$, the dissimilarity of the least dissimilar edge among all edges connecting the two regions, i.e. the dissimilarity of the two regions where they are most similar.

The standard for merging two regions is the threshold

$$MInt(C_1, C_2) = \min\big(Int(C_1) + k/|C_1|,\; Int(C_2) + k/|C_2|\big).$$

Only when $Dif(C_1, C_2)$ exceeds $MInt(C_1, C_2)$ are $C_1$ and $C_2$ kept as different regions; otherwise they are regarded as the same region. A toy numerical check of this test is given after this list.
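Here is a toy R sketch of that merging test; the component statistics are arbitrary numbers, not values from a real image:

k <- 100                          # scale parameter of the algorithm
Int1 <- 12; size1 <- 40           # max MST edge weight and size of component C1
Int2 <- 18; size2 <- 25           # same for component C2
Dif  <- 20                        # lightest edge weight between C1 and C2
MInt <- min(Int1 + k / size1, Int2 + k / size2)
Dif <= MInt                       # TRUE -> merge, FALSE -> keep the regions separate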

  • Procedures of the Algorithm
    Given a graph $G = (V, E)$ with $n$ vertices and $m$ edges, and the scale parameter $k$.

    1. Edges are sorted by dissimilarity (non-decreasing), labeled $e_1, \dots, e_m$; each vertex starts as its own region.
    2. Choose the next edge $e_q = (v_i, v_j)$.
    3. Determine whether the currently selected edge merges the regions containing $v_i$ and $v_j$:
      1. if its dissimilarity is not greater than the internal dissimilarity (plus threshold) of the two regions, i.e. $w(e_q) \le MInt(C_i, C_j)$, go to step 4; otherwise go straight to step 5.
    4. Update the threshold and the class designators: merge the regions, $C_i \cup C_j \rightarrow C_i$.
    5. If $q \le m$, select the next edge and go to step 3.
  • Example Code
    Applying skimage segmentation to segment the example image. Set k = 100 and k = 500 to see how the scale parameter controls the size of the merged regions.

    The example image used for segmentation is:
    iceland

    import numpy as np
    import scipy
    import scipy.signal as sig
    import skimage.segmentation
    from matplotlib import pyplot as plt

    img2 = scipy.misc.imread("iceland.jpg", mode="L")
    segment_mask1 = skimage.segmentation.felzenszwalb(img2, scale=100)
    segment_mask2 = skimage.segmentation.felzenszwalb(img2, scale=500)

    fig = plt.figure(figsize=(12, 5))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    ax1.imshow(segment_mask1); ax1.set_xlabel("k=100")
    ax2.imshow(segment_mask2); ax2.set_xlabel("k=500")
    fig.suptitle("Felsenszwalb's efficient graph based image segmentation")
    plt.tight_layout()
    plt.show()

    Segment results (k = 100 & k = 500):
    seg

Image Classification

  • CNN for Image Classification
    Convolution operation: as we learned in the course, in short, convolution applies element-wise multiplication to the vector/matrix and then sums the results.

  • VGG (Visual Geometry Group)

    VGG uses 3×3 convolution layers and 2×2 pooling layers. It has two configurations, VGG16 and VGG19; there is no essential difference between the two apart from the network depth.

    Why small kernels work better: each convolutional layer is followed by an activation function, which is a nonlinear transformation, so stacking several small convolutions yields stronger nonlinearity.

  • Example Code

    Here I use VGG16 to implement a simple classification of animals. The npy weight file and a class file for looking up the results were downloaded from here.

    VGG16 contains 16 hidden layers (13 convolutional layers and 3 fully connected layers).

    The test code is from here. I downloaded 3 images from Google for testing, with minor changes to the code for classification:



import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from scipy.misc import imread, imresize, toimage
import matplotlib.pyplot as plt
import skimage
import skimage.io
import skimage.transform
from imageClass import class_names

VGG_MEAN = [103.939, 116.779, 123.68]


class VGG16(object):
"""
The VGG16 model for image classification
"""

def __init__(self, vgg16_npy_path=None, trainable=True):
"""
:param vgg16_npy_path: string, vgg16_npz path
:param trainable: bool, construct a trainable model if True
"""
# The pretained data
if vgg16_npy_path is None:
self._data_dict = None
else:
self._data_dict = np.load(vgg16_npy_path, encoding="latin1", allow_pickle= True).item()
self.trainable = trainable
# Keep all trainable parameters
self._var_dict = {}
self.__bulid__()

def __bulid__(self):
"""
The inner method to build VGG16 model
"""
# input and output
self._x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
self._y = tf.placeholder(tf.int64, shape=[None, ])
# Data preprocessiing
mean = tf.constant([103.939, 116.779, 123.68], dtype=tf.float32, shape=[1, 1, 1, 3])
x = self._x - mean
self._train_mode = tf.placeholder(tf.bool) # use training model is True, otherwise test model
# construct model
conv1_1 = self._conv_layer(x, 3, 64, "conv1_1")
conv1_2 = self._conv_layer(conv1_1, 64, 64, "conv1_2")
pool1 = self._max_pool(conv1_2, "pool1")

conv2_1 = self._conv_layer(pool1, 64, 128, "conv2_1")
conv2_2 = self._conv_layer(conv2_1, 128, 128, "conv2_2")
pool2 = self._max_pool(conv2_2, "pool2")

conv3_1 = self._conv_layer(pool2, 128, 256, "conv3_1")
conv3_2 = self._conv_layer(conv3_1, 256, 256, "conv3_2")
conv3_3 = self._conv_layer(conv3_2, 256, 256, "conv3_3")
pool3 = self._max_pool(conv3_3, "pool3")

conv4_1 = self._conv_layer(pool3, 256, 512, "conv4_1")
conv4_2 = self._conv_layer(conv4_1, 512, 512, "conv4_2")
conv4_3 = self._conv_layer(conv4_2, 512, 512, "conv4_3")
pool4 = self._max_pool(conv4_3, "pool4")

conv5_1 = self._conv_layer(pool4, 512, 512, "conv5_1")
conv5_2 = self._conv_layer(conv5_1, 512, 512, "conv5_2")
conv5_3 = self._conv_layer(conv5_2, 512, 512, "conv5_3")
pool5 = self._max_pool(conv5_3, "pool5")

# n_in = ((224 / (2**5)) ** 2) * 512
fc6 = self._fc_layer(pool5, 25088, 4096, "fc6", act=tf.nn.relu, reshaped=False)
# Use train_mode to control
fc6 = tf.cond(self._train_mode, lambda: tf.nn.dropout(fc6, 0.5), lambda: fc6)
fc7 = self._fc_layer(fc6, 4096, 4096, "fc7", act=tf.nn.relu)
fc7 = tf.cond(self._train_mode, lambda: tf.nn.dropout(fc7, 0.5), lambda: fc7)
fc8 = self._fc_layer(fc7, 4096, 1000, "fc8", act=tf.identity)

self._prob = tf.nn.softmax(fc8, name="prob")

if self.trainable:
self._cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(fc8, self._y))
correct_pred = tf.equal(self._y, tf.argmax(self._prob, 1))
self._accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
else:
self._cost = None
self._accuracy = None

def _conv_layer(self, inpt, in_channels, out_channels, name):
"""
Create conv layer
"""
with tf.variable_scope(name):
filters, biases = self._get_conv_var(3, in_channels, out_channels, name)
conv_output = tf.nn.conv2d(inpt, filters, strides=[1, 1, 1, 1], padding="SAME")
conv_output = tf.nn.bias_add(conv_output, biases)
conv_output = tf.nn.relu(conv_output)
return conv_output

def _fc_layer(self, inpt, n_in, n_out, name, act=tf.nn.relu, reshaped=True):
"""Create fully connected layer"""
if not reshaped:
inpt = tf.reshape(inpt, shape=[-1, n_in])
with tf.variable_scope(name):
weights, biases = self._get_fc_var(n_in, n_out, name)
output = tf.matmul(inpt, weights) + biases
return act(output)

def _avg_pool(self, inpt, name):
return tf.nn.avg_pool(inpt, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME",
name=name)

def _max_pool(self, inpt, name):
return tf.nn.max_pool(inpt, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME",
name=name)

def _get_fc_var(self, n_in, n_out, name):
"""Get the weights and biases of fully connected layer"""
if self.trainable:
init_weights = tf.truncated_normal([n_in, n_out], 0.0, 0.001)
init_biases = tf.truncated_normal([n_out, ], 0.0, 0.001)
else:
init_weights = None
init_biases = None
weights = self._get_var(init_weights, name, 0, name + "_weights")
biases = self._get_var(init_biases, name, 1, name + "_biases")
return weights, biases

def _get_conv_var(self, filter_size, in_channels, out_channels, name):
"""
Get the filter and bias of conv layer
"""
if self.trainable:
initial_value_filter = tf.truncated_normal([filter_size, filter_size, in_channels, out_channels], 0.0,
0.001)
initial_value_bias = tf.truncated_normal([out_channels, ], 0.0, 0.001)
else:
initial_value_filter = None
initial_value_bias = None
filters = self._get_var(initial_value_filter, name, 0, name + "_filters")
biases = self._get_var(initial_value_bias, name, 1, name + "_biases")
return filters, biases

def _get_var(self, initial_value, name, idx, var_name):
"""
Use this method to construct variable parameters
"""
if self._data_dict is not None:
value = self._data_dict[name][idx]
else:
value = initial_value

if self.trainable:
var = tf.Variable(value, dtype=tf.float32, name=var_name)
else:
var = tf.constant(value, dtype=tf.float32, name="var_name")
# Save
self._var_dict[(name, idx)] = var
return var

def get_train_op(self, lr=0.01):
if not self.trainable:
return
return tf.train.GradientDescentOptimizer(lr).minimize(self.cost,
var_list=list(self._var_dict.values()))

@property
def input(self):
return self._x

@property
def target(self):
return self._y

@property
def train_mode(self):
return self._train_mode

@property
def accuracy(self):
return self._accuracy

@property
def cost(self):
return self._cost

@property
def prob(self):
return self._prob


# returns image of shape [224, 224, 3]
# [height, width, depth]
def load_image(path):
# load image
img = skimage.io.imread(path)
img = img / 255.0
# assert (0 <= img).all() and (img <= 1.0).all()
# print "Original Image Shape: ", img.shape
# we crop image from center
short_edge = min(img.shape[:2])
yy = int((img.shape[0] - short_edge) / 2)
xx = int((img.shape[1] - short_edge) / 2)
crop_img = img[yy: yy + short_edge, xx: xx + short_edge]
# resize to 224, 224
resized_img = skimage.transform.resize(crop_img, (224, 224))
return resized_img


def test_not_trainable_vgg16():
path = "D:/PyCharm Community Edition 2024.1.3/TechBlog"
img1 = load_image(path + "/puppy.jpg") * 255.0
batch1 = img1.reshape((1, 224, 224, 3))

tf.compat.v1.disable_eager_execution()
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
vgg = VGG16(path + "/vgg16.npy", trainable=False)
probs = sess.run(vgg.prob, feed_dict={vgg.input: batch1, vgg.train_mode: False})
for i, prob in enumerate([probs[0]]):
preds = (np.argsort(prob)[::-1])[0:5]
print("The" + str(i + 1) + " image:")
for p in preds:
print("\t", p, class_names[p], prob[p])


if __name__ == "__main__":
path = "D:/PyCharm Community Edition 2024.1.3/TechBlog"
img1 = load_image(path + "/puppy.jpg") * 255.0
batch1 = img1.reshape((1, 224, 224, 3))
x = np.concatenate((batch1), 0)
y = np.array([292, 611], dtype=np.int64)
with tf.Graph().as_default():
with tf.Session() as sess:
vgg = VGG16(path + "/vgg16.npy", trainable=True)
sess.run(tf.global_variables_initializer())

train_op = vgg.get_train_op(lr=0.0001)
_, cost = sess.run([train_op, vgg.cost], feed_dict={vgg.input: x,
vgg.target: y, vgg.train_mode: True})
accuracy = sess.run(vgg.accuracy, feed_dict={vgg.input: x,
vgg.target: y, vgg.train_mode: False})
print(cost, accuracy)

Example images for VGG16:

  1. Puppy

    puppy
    vgg1
    The result generated by VGG16 for the puppy was "Japanese spaniel".

  2. Cat

    cat
    res2
    The result generated by VGG16 for this cat was "Egyptian cat".

  3. Saber

    saber
    res3
    Sadly, it can only assign labels already known from the class file to the image rather than recognizing Saber. To make that work, a dataset of her needs to be collected for training.

In the next blog, I intend to apply YOLO, from building the training dataset through to image recognition. Hopefully, Saber will be recognized :).

Reference

[1] Gradient Vector

[2] Pedro F. Felzenszwalb, and Daniel P. Huttenlocher. “Efficient graph-based image segmentation.” Intl. journal of computer vision 59.2 (2004): 167-181.article

[3] Image segmentation: graph-based image segmentation. blog address

[4] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[5] VGG in TensorFlow
