0.1 NumPy

Numpy 库是 Python 科学计算的核心。

Numpy Array 是一个存放相同数据类型的数组(类型 numpy.ndarray),可以使用非负元组(Non-negative tuple)来索引。

我们称 Rank 为数组的维数,Shape 表示数组的各个维度的大小,使用整型元组表示。

0.1.1 Array Creation

初始化 Numpy Array 的方法如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import numpy as np

a = np.array([1, 2, 3])
print(type(a))
print(a.shape)
print(a[0], a[1], a[2])
# Reference Modification
a[0] = 5
print(a)

b = np.array(
[[1, 2, 3],
[4, 5, 6]]
)
print(b.shape)
print(len(b))
print(b[0, 0], b[1, 0], b[0, 1])

类似 Matlab,提供多种方法创建数组:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
a = np.zeros((2, 2))    # 2x2 0-matrix
print(a)
b = np.ones((3, 3)) # 3x3 1-matrix
print(b)
c = np.eye(2) # 2x2 identical matrix
print(c)
d = np.full((4, 4), 5) # 4x4 matrix filled with 5
print(d)
# 2x2 matrix filled with uniformed ([0,1]) random value
e = np.random.random((2, 2))
print(e)

f = np.diag((-3, -4, 5, 6)) # 4x4 matrix with -3, -4, 5, 6 on diagonal
print(f)
g_dup = f
g = f.copy() # Copy construction
f[0] = 1
print(g_dup, g)

h = np.linspace(0, 13, 5) # start=0, end(included)=13, num=5
print(h)

# args are similar with range(), but return np.array
k = np.arange(4, 5, 0.1, dtype=float)
print(k)

更多创建方法用到再说。

0.1.2 Array Indexing

索引 Numpy Array 的方法和 Python 原生数组类似:

  • 整型索引 + slice : 切片;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
a = np.array(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]]
)

# Use slice in each dimension (!!! Reference but NOT copy construction !!!)
b = a[:2, 1:3] # row: 0-1 (2 not included), column: 1-2
print(b)
# Reference Modification
b[0, 1] = 99
print(a)

# We need to copy construct explicitly
c = a[:2, 1:3].copy()
b[0, 1] = 88
print(c)

# get by only one axis
c = a[0]
print(c)
d = a[0, 2] # row=0, column=2
print(d)

注意引用传递的问题。

还可以将 整型索引 和 slice 混合使用:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
a = np.array(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]]
)

# Get the whole row 0 (equivalent to a[0])
row_r1 = a[0, :]
print(row_r1, row_r1.shape)
# row 0 (but keep dimension)
row_r2 = a[0:1, :]
print(row_r2, row_r2.shape)

# Same with column
col_r1 = a[:, 0]
col_r2 = a[:, 0:1]
print(col_r1, col_r1.shape)
print(col_r2, col_r2.shape)

此外,Numpy 还支持:

  • 整型数组索引 + 布尔数组索引(更类似 Matlab):仍然是引用传递。需要 copy() 来 copy construct;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
a = np.array(
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
)

# Equivalent to: np.array([ a[0, 0], a[1, 1], a[2, 0] ])
print(a[[0, 1, 2], [0, 1, 0]])

# Variable indexing
b = np.array([0, 2, 0, 1])
# Select each row, column: 0, 2, 0, 1, respectively.
print(a[np.arange(4), b])

# Reference modification
a[np.arange(4), b] += 10
print(a)
1
2
3
4
5
6
7
8
9
10
11
12
13
a = np.array([
[1, 2],
[3, 4],
[5, 6]
])

# Like Matlab
bool_idx = (a > 2)
print(bool_idx)

# NOTE: boolean indexing can only construct rank-1 array
print(a[bool_idx]) # [3 4 5 6]
print(a[a > 2])

0.1.3 Data Types

Numpy 中提供来多种数据类型,可以用来构建数组等操作。默认情况下构建 array,在不指定 dtype 参数时,Numpy 会猜测数组的数据类型。

1
2
3
4
5
6
7
8
x = np.array([1, 2])
print(x.dtype) # np.int64

x = np.array([1, 2], dtype=np.uint64)
print(x.dtype) # np.uint64

x = np.array([1., 2.])
print(x.dtype) # np.float64

Python 原生类型,如 int / bool / float / complex / bytes / str / object,分别对应 int_ / bool_ / float64 / complex128 / bytes_ / str_ / object_

当数据类型比较复杂时(例如结构体数组),我们可以手动创建 dtype 并指定,例如:

1
2
3
4
5
6
7
8
dt = np.dtype([
('name', np.str_, 16), # Like database CHAR(16)
('grades', np.float64, (2,)) # length=2, rank-1 array
])
print(dt) # < 表示小端序、U 代表 unicode char 数字就是占用的字节
x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)
# Get data like dict
print(x[1]['grades'])

更多类型的使用、dtype 信息,请参见 Array DTypes - Numpy Doc

0.1.4 Array Math

数组运算,在 Numpy 中也是向量 / 矩阵运算。

在 Matlab 中,我们知道这些运算可以是向量整体点积/叉积、矩阵整体乘法/取逆、向量矩阵乘法;也可以是 element-wise(逐元素)的运算。

逐元素的四则运算、开方运算如下(会产生新对象):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

print(x + y)
print(np.add(x, y))

print(x - y)
print(np.subtract(x, y))

# 和 matlab 不一样,* 符号是 element-wise 的
print(x * y)
print(np.multiply(x, y))

print(x / y)
print(np.divide(x, y))

print(np.sqrt(x))

向量点积(内积)/ 矩阵整体乘法 / 向量矩阵乘法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# matrices
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# vectors
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(v.dot(w))
print(np.dot(v, w)) # equivalence

# Matrix / vector product
# They have different answers!
print(v.dot(x)) # v is regarded as row vector
print(x.dot(v)) # v is regarded as column vector

# Matrix product
print(x.dot(y))

我们发现,在 Numpy 中,通过 array() 定义的 rank-1 array 很灵活,既可以作行向量,又可以作列向量

也正因如此,我们下面介绍的 转置操作 对 rank-1 array 毫无影响:

1
2
3
4
5
6
7
8
9
10
11
12
x = np.array([[1,2], [3,4]])
print(x) # Prints "[[1 2]
# [3 4]]"
# 注意:返回引用!
print(x.T) # Prints "[[1 3]
# [2 4]]"
# 等价于 x.transpose()

# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1,2,3])
print(v) # Prints "[1 2 3]"
print(v.T) # Prints "[1 2 3]"

此外,除了 element-wise 求和,Numpy 和 Matlab 一样提供了整体求和的方法 sum

1
2
3
4
5
x = np.array([[1,2],[3,4]])

print(np.sum(x)) # 默认全部元素求和
print(np.sum(x, axis=0)) # axis=0 对第一维度(列)求和
print(np.sum(x, axis=1)) # axis=1 对第二维度(行)求和

更多运算方法用到再说。

0.1.5 Broadcasting

广播(broadcasting)是 Numpy 运算中相当重要的性质之一,它为不同维度的 array 间的运算提供了更高效的方式。下面几种常用的手段:

将 rank-1 array 加到 matrix 的每一行上

假设现在有个需求,将 v 向量加到 x 的每一行上,最后放到 y 中。

1
2
3
4
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])
# empty_like() 按照 x.shape 来创建一个空矩阵
y = np.empty_like(x)

如果不用 broadcasting 就应该这么做:

1
2
for i in range(x.shape[1]):
y[i, :] = x[i, :] + v

或者使用 tile 来堆叠重复向量:

1
2
3
4
5
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
print(vv) # Prints "[[1 0 1]
# [1 0 1]
# [1 0 1]
# [1 0 1]]"

第一种方法虽然能实现需求,但是如果 x 相当大,那么 Python 自己的循环就非常慢,因此我们需要借助 broadcasting(写在 Numpy Library 中,C/C++ 实现):

1
2
# 不同维度的 array 通过 broadcast 直接运算
y = x + v # Add v to each row of x using broadcasting

但是如果使用不当会出现 shape mismatch 的情况。因此我们需要知道 broadcast 的算法:

  1. 如果参与运算的 arrays 的 rank 不同,那么向 rank 小的 array 的 shape 中 prepend 数字 1(可以在不添加元素的前提下扩展维度),直至它们的 rank(就是 shape 元组的长度)相等;

    例如 [1, 2, 3](shape (3,))扩展一次维度后变成 [[1, 2, 3]](shape (1, 3));

    定义:两个 arrays 在某个维度上 compatible(匹配),当且仅当它们的 shape 在这个维度上的值相同,或者一方为 1

    因此,第一步就是尝试让两个参与运算的 array 在每个维度上都匹配。数学上只有两个 array 在每个维度上都匹配,才能进行 broadcasting 操作

  2. 两个 arrays 运算时,如果在某个维度上一方为 1,另一方大于 1,那么说明需要在这个维度上 broadcasting,具体做法就是让这个维度上为 1 的 array,在当前维度上重复多次直至与另一个 array 的相等;

  3. 如果两个 arrays 是匹配的,那么在最终运算结束时,结果是二者 shapes 的 element-wise 的最大值。

思考 1:为什么 broadcasting 会将向量加到矩阵的每一行,而不是每一列?

答:由 broadcasting 算法决定。上面提到,会向 shape 中 prepend 数字 1,这就是原因(如果算法是 append 的话就是加到列上去了)。

思考 2:那么我们如何利用 broadcasting 将向量加到每一列上?

答:先用转置,然后最后再转置回去就行。或者使用 reshape 来手动预处理 shape(向它 append 1);

1
2
3
4
5
x = np.array([[1, 2, 3], [4, 5, 6]])
w = np.array([4, 5])
print((x.T + w).T)
# 或者
print(x + np.reshape(w, (2, 1)))

常量 Element-wise 运算

这应该是广播最为通俗易懂的形式。

我们直接用常量(相当于扩展前的 shape (1,))向矩阵乘:

1
2
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x * 2)

计算向量叉积

注:可以使用自带方法 numpy.outer(a, b),这里只是展示如何用 broadcasting 完成叉积(外积)。

本质上就是将被乘向量的每个元素乘到每个列上:

1
2
3
4
v = np.array([1, 2, 3])
w = np.array([4, 5])

print(np.reshape(v, (3, 1)) * w)

关于 Numpy 的基础用法掌握这么多就够了。如果希望更多的信息,请参见 Numpy 官方文档。

0.2 SciPy

SciPy 库提供了一些高性能方法来计算和掌控多维数组,对科学计算和工程有很大用处。

最好入门的方法是阅读这个文档:Scipy - Reference。这里我们尽快入门为主,有需要再自行翻阅。

0.2.1 Image Operations

Scipy 库提供了图像处理的基本函数。例如:

  • 从磁盘的图片中读入到 Numpy 数组;
  • 将 Numpy 数组写成图片;

下面是使用 Scipy resize 图片的例子:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# 注:这个写法在 scipy 1.2.0 以后被弃用。
# 你应该使用 opencv 来导入图片
# from cv2 import imread, resize, imwrite
from scipy.misc import imread, imsave, imresize

# Read an JPEG image into a numpy array
img = imread('assets/cat.jpg')
print(img.dtype, img.shape) # Prints "uint8 (400, 248, 3)"

# 补充计算机图形学知识:从这里能看出,jpg 格式的位深度为 8x3 = 24(bytes)
# 其中每个值是 uint8 存储,共 3 个通道(RGB)
# 如果是 png 格式,可以是 4 个通道(RGBA),可以存透明度,称为 PNG-32

# Broadcasting
# 将图像的 Green Channel 和 Blue Channel 像素值分别广播乘以 0.95 和 0.9
# 图像会微微泛红
img_tinted = img * [1, 0.95, 0.9]

# Resize
img_tinted = imresize(img_tinted, (300, 300))

# Save
imsave('assets/cat_tinted.jpg', img_tinted)

0.2.2 Matlab Files

我们可以从 Matlab 中加载通用的矩阵文件。例如下面的例子是从 Matlab 的矩阵文件中加载复数矩阵:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
from scipy.io import loadmat

# Load matlab mat file
trainDataRaw = loadmat("data/PA_data_train.mat")
testDataRaw = loadmat("data/PA_data_test.mat")
train_input = trainDataRaw['paInput'][0]
train_output = trainDataRaw['paOutput'][0]
assert len(train_input) == len(train_output), "The size of input vector should be equal to the output one."

test_input = testDataRaw['paInput'][0]
test_output = testDataRaw['paOutput'][0]
assert len(test_input) == len(test_output), "The size of input vector should be equal to the output one."

trainData = [(complex(train_input[k]), complex(train_output[k])) for k in range(len(train_input))]
testData = [(complex(test_input[k]), complex(test_output[k])) for k in range(len(test_input))]

print(f"[INFO] Train data loaded: {len(trainData)} groups (input, output)")
print(f"[INFO] Test data loaded: {len(testData)} groups (input, output)")

# Preparing for the module of the data
trainModule = [(abs(i[0]), abs(i[1])) for i in trainData]
testModule = [(abs(i[0]), abs(i[1])) for i in testData]

0.2.3 Distance between points

Scipy 求两点间距如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
import numpy as np
from scipy.spatial.distance import pdist, cdist, squareform

# 定义了一组二维点
x = np.array([
[0, 1],
[1, 0],
[2, 0]
])

# 计算点集中两两间的欧几里得距离,输出为矩阵
# 第 i 行、第 j 列表示点集中第 i 个点和第 j 个点的欧式距离
# [[ 0. 1.41421356 2.23606798]
# [ 1.41421356 0. 1. ]
# [ 2.23606798 1. 0. ]]
d = squareform(pdist(x, 'euclidean'))
# 等价于
e = cdist(x, x, 'euclidean')

print(d)
print(e)

还有一个 cdist 求的是两个点集间的距离(指定两个点集,而不像上面是在一个点集内),自动输出为矩阵形式。

0.3 Matplotlib

介绍一些模板用法。

0.3.1 并列折线图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import matplotlib.pyplot as plt

x = ['1', '2', '4', '8']

fig, axe = plt.subplots()

prim = [27138166.8, 9152368.0, 7887584.4, 5203370.6]
opt = [45508900.4, 91379932.6, 179513808.6, 289122555.6]

axe.scatter(x, prim, c='#219ebc', marker='x')
axe.scatter(x, opt, c='#feb705', marker='o')
# axe.scatter(x, ratio_3, c='#fa8600', marker='^')
axe.plot(x, prim, '-', c='#219ebc', label="Primitive")
axe.plot(x, opt, '-', c='#feb705', label="Optimized")
# axe.plot(x, ratio_3, '-', c='#fa8600', label="Workload 3")

axe.set_xticks(x)
axe.legend()
axe.minorticks_on()
axe.grid()

# axe.set_ylim([0.3, 1.2])
axe.set_xlabel('Thread Number')
axe.set_ylabel('Average Throughput (ops/s)')
axe.set_title('Average Throughput - Thread Number Relationship Plot')

axe.ticklabel_format(style='sci', scilimits=(-1, 2), axis='y', useMathText=True)

plt.show()

0.3.2 并列直方图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import matplotlib.pyplot as plt
import numpy as np


labels = ['Tiny', 'Small', 'Medium', 'Large']
x_labels = labels
width = 0.2
x = np.arange(len(labels))

qs = [5458, 36660, 155667, 1341294]
ls = [27247, 287699, 1146510, 10195125]
qs_r = [6469, 45830, 192725, 2354645]

fig, axe = plt.subplots()

axe.minorticks_on()
axe.grid()

axe.bar(x - width - 0.05, qs, width, color='#2a9d8c', label='Quick Select', zorder=10,
edgecolor='black', linewidth=1)
axe.bar(x, ls, width, label='Linear Select ($Q=5$)', color='#e9c46b', zorder=10,
edgecolor='black', linewidth=1)
axe.bar(x + width + 0.05, qs_r, width, color='#e66f51', label='Quick Select (random pivot)', zorder=10,
edgecolor='black', linewidth=1)

axe.set_xlabel('Data Scale')
axe.set_ylabel('Operation Latency (ns)')
axe.set_xticks(x)
axe.set_xticklabels(x_labels)

axe.set_yscale('log')
axe.legend()

axe.set_title('Operation Latency for Unordered Data')

plt.show()

0.3.3 堆叠条形图

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
import matplotlib.pyplot as plt
import numpy as np

labels = ['Short (80 B)', 'Middle (7.5 KB)', 'Large (740 KB)']
x_labels = labels
width = 0.2
x = np.arange(len(labels))

uncompressed = [80, 7500, 740_000]
uncompressed_ratio = [1, 1, 1]

single = [50, 4000, 392_000]
single_dict = [185, 477, 508]
single_ratio = [single[i] / uncompressed[i] for i in range(len(labels))]
single_dict_ratio = [single_dict[i] / uncompressed[i] for i in range(len(labels))]

multi = [48, 3900, 383_000]
multi_dict = [212, 504, 528]
multi_ratio = [multi[i] / uncompressed[i] for i in range(len(labels))]
multi_dict_ratio = [multi_dict[i] / uncompressed[i] for i in range(len(labels))]

fig, axe = plt.subplots()

axe.minorticks_on()
axe.grid()

axe.bar(x - width - 0.05, uncompressed_ratio, width, color='gray',label='Uncompressed', zorder=10)
axe.bar(x, single_ratio, width, label='Single', color='#56baf8', zorder=10)
axe.bar(x + width + 0.05, multi_ratio, width, color='#f8566a', label='Multiple', zorder=10)

axe.bar(x, single_dict_ratio, width, bottom=single_ratio, color='#9fd8fb', label='Single Dict', zorder=10)
axe.bar(x + width + 0.05, multi_dict_ratio, width, color='#fb9fab', bottom=multi_ratio, label='Multiple Dict', zorder=10)

axe.set_xlabel('File Type')
axe.set_ylabel('Compress Ratio')
axe.set_xticks(x)
axe.set_xticklabels(x_labels)
axe.legend()

axe.set_title('The Compressed Ratio Comparison')

plt.show()

0.3.4 折线-直方组合图 & 对数坐标

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
import matplotlib.pyplot as plt
import numpy as np

labels = ['Short (80 B)', 'Middle (7.5 KB)', 'Large (740 KB)']
x_labels = labels
width = 0.2
x = np.arange(len(labels))

uncompressed = [80, 7500, 740_000]

single = [50, 4000, 392_000]
single_dict = [185, 477, 508]

single_dict_ratio_in_comp = [single_dict[i] / (single_dict[i] + single[i]) for i in range(len(labels))]

multi = [48, 3900, 383_000]
multi_dict = [212, 504, 528]

multi_dict_ratio_in_comp = [multi_dict[i] / (multi_dict[i] + multi[i]) for i in range(len(labels))]

single_total_ratio = [(single[i] + single_dict[i]) / uncompressed[i] for i in range(len(labels))]
multi_total_ratio = [(multi[i] + multi_dict[i]) / uncompressed[i] for i in range(len(labels))]

fig, axe = plt.subplots()
axe2 = axe.twinx()

axe.bar(x - width / 2 - 0.03, single_dict_ratio_in_comp, width, color='#56baf8', label="Single")
axe.bar(x + width / 2 + 0.03, multi_dict_ratio_in_comp, width, color='#f8566a', label="Multiple")

axe.set_xticks(x)
axe.set_xticklabels(x_labels)

axe.set_xlabel('File Type')
axe.set_ylabel('Dictionary Size Ratio')

axe.minorticks_on()
axe.grid()
axe.set_yscale('log')
axe.legend()

axe2.scatter(x, single_total_ratio, c='k', marker='x')
axe2.scatter(x, multi_total_ratio, c='k', marker='x')
axe2.plot(x, single_total_ratio, 'g--', label="Single in Total")
axe2.plot(x, multi_total_ratio, 'y-', label="Multiple in Total")
axe2.legend(loc=(0.66, 0.7))

axe2.set_ylabel('Total Compressed Ratio')

axe.set_title('Dict Size Ratio & Compressed Ratio Relationship')
plt.show()

0.3.3 样条曲线和插值拟合

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
# File: fit_interp_1d.py
import numpy as np
from scipy import interpolate
from scipy.optimize import curve_fit
from typing import List, Union


class Interpolate1DGen:
def __init__(self, x: List[float], y: List[float], point_cnt: int):
self.x = x
self.y = y
self.point_cnt = point_cnt
self.sorted_x = sorted(self.x)
self.sorted_y = sorted(self.y)

def interpolate_polynomial(self, kind: Union[str, int]):
"""
Polynomial interpolate for 1D data.
:param kind: 'nearest', 'zero', 'linear', 'slinear', 'quadratic'(2), 'cubic'(3), 4, 5, ...
:return: tuple(interpolated_x, interpolated_y)
"""
ans_x = np.linspace(
self.sorted_x[0],
self.sorted_x[len(self.x)-1],
self.point_cnt
)
f_linear = interpolate.interp1d(
self.x, self.y, kind=kind
)
return ans_x, f_linear(ans_x)

def interpolate_B_spline(self):
"""
B-spline interpolate for 1D data.
**WARNING**: Data sort and non-duplicated will be needed.
:return: tuple(interpolated_x, interpolated_y)
"""
ans_x = np.linspace(
self.sorted_x[0],
self.sorted_x[len(self.x)-1],
self.point_cnt
)
tck = interpolate.splrep(self.sorted_x, self.sorted_y)
y_bSpline = interpolate.splev(ans_x, tck)
return ans_x, y_bSpline


class FitLine1DGen:
def __init__(self, x: List[float], y: List[float], point_cnt: int):
self.x = x
self.y = y
self.point_cnt = point_cnt
self.sorted_x = sorted(self.x)
self.sorted_y = sorted(self.y)

# S-shape curve fit.
# Or logistic model.
@staticmethod
def sigmoid(x, lp, x0, k, b):
y = lp / (1 + np.exp(-k * (x - x0))) + b
return y

def fit_polynomial_curve(self, n: int):
"""
Polynomial fitting for 1D data.
:param n: (int) the max power of the polynomial.
:return: tuple(fitting_curve_x, fitting_curve_y)
"""
params = np.polyfit(self.x, self.y, n)
ans_x = np.linspace(
self.sorted_x[0],
self.sorted_x[len(self.x)-1],
self.point_cnt
)
ans_y = np.polyval(params, ans_x)
return ans_x, ans_y

# logistic fit
def fit_S_curve(self):
"""
S Curve fitting for 1D data.
:return: tuple(fitting_curve_x, fitting_curve_y)
"""
# standardize data:
x_min = self.sorted_x[0]
y_min = self.sorted_y[0]
x_max = self.sorted_x[len(self.x)-1]
y_max = self.sorted_y[len(self.y)-1]
x_zoomed = (np.array(self.x) - x_min) / (x_max - x_min)
y_zoomed = (np.array(self.y) - y_min) / (y_max - y_min)
# A mandatory initial guess
# [max(yData), median(xData), 1, min(yData)]
p_guess = [
max(y_zoomed), np.median(x_zoomed),
1, min(y_zoomed)
]
popt, pcov = curve_fit(
self.sigmoid, x_zoomed, y_zoomed,
p_guess
)
print("[INFO] curve_fit: popt = ", popt)
print("[INFO] curve_fit: pcov = ", pcov)

fit_x = np.linspace(0, 1, self.point_cnt)
fit_y = self.sigmoid(fit_x, popt[0], popt[1], popt[2], popt[3])
ans_x = fit_x * (x_max - x_min) + x_min
ans_y = fit_y * (y_max - y_min) + y_min
return ans_x, ans_y
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
import math
import matplotlib.pyplot as plt
import numpy as np

from fit_interp_1d import Interpolate1DGen

element_test_list = [50, 100, 200, 500, 1000]
p_test_list = [1 / 2, 1 / math.e, 1 / 4, 1 / 8, 1 / 16]
ans = [[8.75, 7.45, 5.48, 5.3, 4.92],
[8.98, 7.67, 6.19, 5.58, 6.89],
[9.72, 8.25, 6.35, 5.53, 10.33],
[11.67, 9.22, 8.39, 8.81, 9.52],
[12.83, 10.17, 8.56, 9.26, 10.71]]

style_list = ['-', '-.', '--', '-', ':']
color_list = ['#04acf4', '#ff5722', '#ffeb3b', '#4caf50', '#9c27b0']
inter_sample_cnt = 1000
for idx, elem in enumerate(element_test_list):
aqd = ans[idx]
ans_p, ans_aqd = Interpolate1DGen(
p_test_list, aqd, inter_sample_cnt
).interpolate_polynomial(2)

# plt.scatter(ans_p, ans_aqd, color='k', marker='x')
plt.plot(ans_p, ans_aqd, color=color_list[idx], linestyle=style_list[idx])
plt.xlabel('Probability $p$')
plt.ylabel('Average Query Distance (AQD)')
plt.title('The curves for AQD-probability')
plt.minorticks_on()
plt.grid()
plt.legend(
labels=(
'element: 50',
'element: 100',
'element: 200',
'element: 500',
'element: 1000'
)
)
plt.show()

0.3.4 3D 曲面

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
import math
import matplotlib.pyplot as plt
import numpy as np

from mpl_toolkits.mplot3d import Axes3D


mu0 = 4 * math.pi * 10 ** (-3)
N = 310
R = 0.14

# i unit: A
# B unit: 1 Gauss = 10 ** -4 Tesla
def I2B(i: float, x: float):
p1 = mu0 * N * math.pow(R, 2) * i /(2 * math.pow(math.pow(R, 2) + math.pow(R / 2 + x, 2), 3 / 2))
p2 = mu0 * N * math.pow(R, 2) * i /(2 * math.pow(math.pow(R, 2) + math.pow(R / 2 - x, 2), 3 / 2))
return p1 + p2

def theoB(th_B0: float, x: float):
return th_B0 * math.pow(5, 3 / 2) / 16 * (1 / math.pow(1 + math.pow(0.5 + x / R, 2), 3 / 2) + 1 / math.pow(1 + math.pow(0.5 - x / R, 2), 3 / 2))

if __name__ == '__main__':
p_cnt = 1000

fig = plt.figure()
axe = plt.axes(projection='3d')
plt.rcParams['axes.unicode_minus'] = False # 用来正常显示负号
xx = np.array([0, 1, 2, 3, 4, 5, 6])
yy = np.array([0, 1, 2, 3, 4, 5, 6])

mesh_val = [
[1.025, 1.024, 1.024, 1.024, 1.023, 1.020, 1.016], # y = 0
[1.025, 1.024, 1.024, 1.024, 1.023, 1.020, 1.016], # y = 0.05R
[1.024, 1.027, 1.024, 1.024, 1.023, 1.022, 1.018], # y = 0.10R
[1.024, 1.024, 1.024, 1.024, 1.025, 1.024, 1.021], # y = 0.15R
[1.023, 1.023, 1.024, 1.025, 1.027, 1.026, 1.024], # y = 0.20R
[1.021, 1.021, 1.023, 1.026, 1.028, 1.030, 1.030], # y = 0.25R
[1.018, 1.019, 1.021, 1.027, 1.031, 1.033, 1.037] # y = 0.30R
]

def U2B(u: float):
return (u - 4.61538 * math.pow(10, -4)) / 0.25396

def Z_func(x: int, y: int):
return mesh_val[x][y]

for i in range(len(mesh_val)):
for j in range(len(mesh_val[0])):
mesh_val[i][j] = U2B(mesh_val[i][j])

rawX, rawY = np.meshgrid(xx, yy)

ap_all = np.vectorize(Z_func)
Z = ap_all(rawX, rawY)
print(Z)

toReal = np.vectorize(lambda x: x * 0.05 * R)
rxx = toReal(xx)
ryy = toReal(yy)
X, Y = np.meshgrid(rxx, ryy)

axe.plot_surface(X, Y, Z, cmap='jet')
axe.plot_wireframe(X, Y, Z, color='k', linewidth=0.3)
# axe.minorticks_on()
axe.set_xlabel('X position ($m$)')
axe.set_ylabel('Y position ($m$)')
axe.set_zlabel('Magnetic field density B ($G$)')
axe.set_title('Space magnetic field distribution')
axe.grid(which="major", alpha=0.6)
axe.grid(which="minor", alpha=0.3)
plt.show()

0.3.5 输出图片

除了使用 OpenCV 库的 imshow,Matplotlib 也支持输出图片:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import numpy as np
from cv2 import imread
import matplotlib.pyplot as plt

img = imread('assets/cat.jpg')
img_tinted = img * [1, 0.95, 0.9]

# Show the original image
plt.subplot(1, 2, 1)
plt.imshow(img)

# Show the tinted image
plt.subplot(1, 2, 2)

# A slight gotcha with imshow is that it might give strange results
# if presented with data that is not uint8. To work around this, we
# explicitly cast the image to uint8 before displaying it.
plt.imshow(np.uint8(img_tinted))
plt.show()