0%

简介

首先要解决的就是为什么需要空间金字塔池化(SPP)这个问题,它到底为了什么而出现。

对于以往的神经网络结构大部分所需要的都是固定的网络大小输入,但是现实中很多图片数据并不是固定大小的输入。以往的方法往往是通过裁剪(Crop)和扭曲(Warp),但是前者会导致信息的丢失,后者可能会导致图片的失真,都会使得数据分布发生一定变化。

SPP解决的就是图片大小不同的问题,使得输入可以是任意宽和高的图片。

Spatial Pyramid Pooling Layer

如上图所示的SPP-Net 中有若干个并行的池化层,将卷积层的结果 w×h×dw\times h\times d 池化成 [1×1],[2×2],[4×4],[1\times 1],[2\times 2],[4\times4],\cdots的一层层结果,再将其所有结果进行拼接之后与 FC 层相连。

由于只有最后的FC层对于输入的大小是存在硬性要求的,当输入为任意大小的图片时,我们可以随意进行卷积、池化。在过FC 层之前,通过 SPP 层,将图片抽象出固定大小的特征(即多尺度特征下的固定特征向量抽取)。

好处有以下几点:

  1. SPP可以针对于不同的input size输出固定长度的向量,这是原本的滑动窗口池化做不到的
  2. SPP用了多层级的空间信息,而滑动窗口池化操作使用的窗口大小是单一的
  3. 由于输入的大小是可以变化的,所以SPP可以提取到不同尺度上信息

Training

  • Single-size Training

单输入size大小的训练方法同普通的训练相同,这里所需要的就是设置好对应的pooling层的stride和window size,以便于之后的SPP层可以输出正确的结果。事实上,这里为了探究single-size的训练主要是为了来测试金字塔池化的行为是否符合预期。

  • Multi-size Training

为了防止切换数据带来的overhead过高,这里假设有两种size的输入图片,每一种size训练一个epoch之后切换到另一种。事实上发现采用多尺度的图片,收敛速率和单尺度图片是相似的,并没有带来收敛速率上的损失。

以上两种方法都是只针对训练阶段的,在测试阶段,可以直接将任何尺寸的图片输入到SPP-net当中。

代码实现

基于PyTorch框架的实现如下,在github上看了几个实现大多数都是通过论文当中提供的公式来进行实现的,少部分发现了公式在面对一些不太友好数据的情况会出现输出维度不同的问题,增加了padding的计算方法。

本着不重复造轮子的原则,在我使用的PyTorch-1.5.0当中提供了AdaptiveMaxPool2dAdaptiveAvgPool2d方法,直接采用其进行构造,代码逻辑会更为清晰和行数也会更短。

同时提供一个outputdim的辅助函数,通过输入的之前卷积层结果的channel数来计算输出维度。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
def __init__(self, levels = 3, pooling='max'):
super(SpatialPyramidPooling, self).__init__()
self.levels = levels
self.mode = pooling
self.pooling_method = nn.AdaptiveMaxPool2d if pooling == 'max' else nn.AdaptiveAvgPool2d
self.layers = [self.pooling_method(i) for i in range(1, levels+1)]

def forward(self, x):
b, c, _, _ = x.size()
pooled = []
for p in self.layers:
pooled.append(p(x).view(b, -1))
return torch.cat(pooled, -1)

def outputdim(self, previous_channel):
return previous_channel * sum([i*i for i in range(1, self.levels+1)])

测试如下:

1
2
3
4
5
6
7
8
spp = SpatialPyramidPooling()
input = torch.randn(8, 32, 224, 224)
output = spp(input)
print(output.shape)

input = torch.randn(8, 32, 128, 324)
output = spp(input)
print(output.shape)

输出结果为:

1
2
torch.Size([8, 448])
torch.Size([8, 448])

的确将不同大小的输入给调整成了统一大小。

原文

Arthur owns a ski resort on a mountain. There are nn landing spots on the mountain numbered from 11 to nn from the top to the foot of the mountain. The spots are connected with one-directional ski tracks. All tracks go towards the foot of the mountain, so there are no directed cycles formed by the tracks. There are at most two tracks leaving each spot, but many tracks may enter the same spot.

A skier can start skiing from one spot and stop in another spot if there is a sequence of tracks that lead from the starting spot and end in the ending spot. Unfortunately, recently there were many accidents, because the structure of the resort allows a skier to go through dangerous paths, by reaching high speed and endangering himself and the other customers. Here, a path is called dangerous, if it consists of at least two tracks.

Arthur wants to secure his customers by closing some of the spots in a way that there are no dangerous paths in the resort. When a spot is closed, all tracks entering and leaving that spot become unusable.

Formally, after closing some of the spots, there should not be a path that consists of two or more tracks.

Arthur doesn’t want to close too many spots. He will be happy to find any way to close at most 47n\frac47n spots so that the remaining part is safe. Help him find any suitable way to do so.

Input

The first line contains a single positive integer TT — the number of test cases. TT test case description follows.

The first line of each description contains two integers nn and mm (1n21051\le n\le 2\cdot 10^5) — the number of landing spots and tracks respectively.

The following mm lines describe the tracks. Each of these lines contains two integers xx and yy (1x<yn1\le x< y\le n) — indices of the starting and finishing spots for the respective track. It is guaranteed that at most two tracks start at each spot. There may be tracks in which starting and finishing spots both coincide.

It is guaranteed that the sum of nn over all test cases does not exceed 21052\cdot 10^5.

Output

For each test case, print a single integer kk (0k47n0\le k\le \frac47n) — the number of spots to be closed. In the next line, print kk distinct integers — indices of all spots to be closed, in any order.

If there are several answers, you may output any of them. Note that you don’t have to minimize kk. It can be shown that a suitable answer always exists.

Example

input

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
2
4 6
1 2
1 3
2 3
2 4
3 4
3 4
7 6
1 2
1 3
2 4
2 5
3 6
3 7

output

1
2
3
4
2
3 4
4
4 5 6 7

Note

In the first sample case, closing any two spots is suitable.

In the second sample case, closing only the spot 11 is also suitable.

大致意思

给出一个有向无环图,所有定点的出度不超过2,入度不做限制,同时限定边的起始点一定比终止点要小(是从山顶到山脚的滑雪道)。

要求去掉其中的一些顶点(不多于47n\frac47n),使得图中没有长度超过2的路径。

思路

贪心想法为从节点1(山顶)向后分层级来遍历,如果是第一条连边则保留,第二条则删除相关联节点。

如之后代码中所表示的level,这里通过level将所有顶点分成三类,下标为节点对应的level,形式化表述如下:

  • V0V_0包含只拥有来自V2V_2入边的顶点
  • V1V_1包含至少有一条来自V0V_0入边,但是没有来自V1V_1入边的顶点
  • V2V_2包含至少有一条来自V1V_1入边的顶点

对应的就是删除所有V2V_2中的顶点,下面来证明为什么可以保证不超过47n\frac47n

由于V2V_2至少有一条来自V1V_1的边,而一个顶点最多两条出边,所以最多有2V12|V_1|条出边,那么有着V22V1|V_2|\le 2|V_1|,同理有着V12V0|V_1|\le 2|V_0|。于是n=V0+V1+V274V2n = |V_0|+|V_1|+|V_2|\ge \frac74|V_2|,所以V247n|V_2|\le \frac47n

只需要从山顶到山脚扫描一遍就可以确定哪些是V2|V_2|当中的顶点,这里的时间复杂度为O(n)O(n)

注意:如果对整个level数组进行memset操作,有一个t=2105t=2\cdot 10^5的case会超时!

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <bits/stdc++.h>
using namespace std;

#define IOS ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);
#define ll long long
#define maxn 200010

int level[maxn];
vector<int> adj[maxn];

int main(){
IOS;
int t;
cin>>t;
while(t--){
int n, m;
cin>>n>>m;
for(int i=0;i<m;++i){
int x, y;
cin>>x>>y;
adj[y].push_back(x);
}
vector<int> ans;
for(int u=1;u<=n;++u){
for(int v: adj[u]){
level[u] = max(level[u], (level[v]+1)%3);
}
if(level[u] == 2){
ans.push_back(u);
}
}

cout<<ans.size()<<endl;
for(auto i: ans){
cout<<i<<' ';
}
cout<<endl;

memset(level, 0, sizeof(int)*(n+1));
for(int i=1;i<=n;++i)
adj[i].clear();
}
}

因为就是准备笔试时候自己刷了刷题,所以基本都只在本地跑过了样例

1.最近点对

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#include<bits/stdc++.h>
using namespace std;

#define maxn 10010
#define ll long long

ll x[maxn], y[maxn];

ll dist(ll x1, ll y1, ll x2, ll y2){
return (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2);
}

int main(){
int t;
cin>>t;
while(t--){
int n;
ll mindist = 0x7fffffff;
cin>>n;
for(int i=0;i<n;++i){
cin>>x[i]>>y[i];
for(int j=0;j<i;++j){
mindist = min(mindist, dist(x[i], y[i], x[j], y[j]));
}
}
cout<<mindist<<endl;
}
}

2. 病人排队

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
#include<bits/stdc++.h>
using namespace std;

#define ll long long

struct person{
string id;
int age;
int order;
person(string id, int age, int order):
id(id), age(age), order(order){};
};

bool operator< (const person a, const person b){
if(a.age>=60 && b.age<60){
return false;
}
else if(a.age<60 && b.age>=60){
return true;
}
else if(a.age>=60 && b.age>=60){
if(a.age != b.age){
return a.age < b.age;
}
else{
return a.order > b.order;
}
}
else{
return a.order > b.order;
}
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int n;
cin>>n;
priority_queue<person> q;
for(int i=0;i<n;++i){
string id;
int age;
cin>>id>>age;
q.push(person(id, age, i));
}
while(!q.empty()){
auto tmp = q.top();
q.pop();
cout<<tmp.id<<endl;
}
}

3. 网线主管

Binary Search

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include<bits/stdc++.h>
using namespace std;

#define maxn 10010
#define ll long long

int len[maxn];
int n, k;

bool test(int l){
int cnt = 0;
for(int i=0;i<n;++i){
cnt += len[i] / l;
}
return cnt >= k;
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n>>k;
for(int i=0;i<n;++i){
double tmp;
cin>>tmp;
len[i] = round(tmp * 100);
}

if(!test(1)){
cout<<"0.00"<<endl;
return 0;
}

int l = 1, r = 10000001;
while(r - l > 1){
int mid = (l + r) >> 1;
if(test(mid)){
l = mid;
}
else{
r = mid;
}
}
printf("%.2lf\n", ((double)l/100));
}

4. 棋盘问题

Search

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#include<bits/stdc++.h>
using namespace std;

int n, k;
int res;
char mat[10][10];
int col[10];

bool isok(int r, int c){
for(int i=0;i<r;++i){
if(col[i] == c){
return false;
}
}
return true;
}

void solve(int r, int rest){
if(rest == 0){
++res;
return;
}
if(r == n){
return;
}
for(int i=0;i<n;++i){
if(mat[r][i] == '#' && isok(r, i)){
col[r] = i;
solve(r+1, rest-1);
}
}
col[r] = -1;
solve(r+1, rest);
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

while(cin>>n>>k && n!=-1 && k !=-1){
for(int i=0;i<n;++i){
cin>>mat[i];
}
res = 0;
solve(0, k);
cout<<res<<endl;
}
}

5. 移动办公

DP

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include<bits/stdc++.h>
using namespace std;

#define maxn 101

int p[maxn], n[maxn];
int dp[2][maxn];

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int t, m;
cin>>t>>m;
for(int i=0;i<t;++i){
cin>>p[i]>>n[i];
}

dp[0][t] = dp[1][t] = 0;
for(int i=t-1;i>=0;--i){
dp[0][i] = max(dp[0][i+1]+p[i], dp[1][i+1]+p[i]-m);
dp[1][i] = max(dp[1][i+1]+n[i], dp[0][i+1]+n[i]-m);
}

cout<<max(dp[0][0], dp[1][0])<<endl;
}

6. 螺旋加密

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
#include<bits/stdc++.h>
using namespace std;

int row, col;
char mat[24][24];

string getcode(char c){
if(c == ' '){
return "00000";
}
int n = c - 'A' + 1;
string res;
for(int i=0;i<5;++i){
if(n & 1){
res.push_back('1');
}
else{
res.push_back('0');
}
n >>= 1;
}
reverse(res.begin(), res.end());
return res;
}

void fillblock(const string & code){
for(int i=0;i<24;++i){
for(int j=0;j<24;++j){
mat[i][j] = '0';
}
}

int u = 0, d = row-1, l = 0, r = col-1;
int i = 0, j = 0;
int dir[4][2] = {{0, 1}, {1, 0}, {0, -1}, {-1, 0}};
int dd = 0;
for(int k=0;k<code.length();++k){
mat[i][j] = code[k];
if(j + dir[dd][1] > r){
++u;
dd=1;
}
else if(j + dir[dd][1] < l){
--d;
dd=3;
}
else if(i + dir[dd][0] > d){
--r;
dd=2;
}
else if(i + dir[dd][0] < u){
++l;
dd=0;
}
i += dir[dd][0];
j += dir[dd][1];
}
}


int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>row>>col;
string s;
getline(cin, s);

string mycode = "";
for(int i=1;i<s.length();++i){
mycode += getcode(s[i]);
}

fillblock(mycode);
for(int i=0;i<row;++i){
for(int j=0;j<col;++j){
cout<<mat[i][j];
}
}
cout<<endl;
}

7. 单词序列

BFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#include<bits/stdc++.h>
using namespace std;

int row, col;
char mat[24][24];
map<string, int> dist;

bool diff1(const string& a, const string& b){
int len = a.length();
int cnt = 0;
for(int i=0;i<len;++i){
if(a[i] != b[i]){
++cnt;
}
}
return cnt == 1;
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

string s, e;
cin>>s>>e;
dist[s] = 1;
dist[e] = 42;
string t;
while(cin>>t){
dist[t] = 42;
}

queue<string> q;
q.push(s);
while(!q.empty()){
string v = q.front();
q.pop();
for(auto tmp: dist){
if(tmp.second == 42 && diff1(tmp.first, v)){
dist[tmp.first] = dist[v] + 1;
q.push(tmp.first);
}
}
}
if(dist[e] == 42){
cout<<0<<endl;
}
else{
cout<<dist[e]<<endl;
}
}

8. 区间异或和

Binary Indexed Tree

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include<bits/stdc++.h>
using namespace std;

#define maxn 100010

int n, q;
int arr[maxn], nums[maxn];

int lb(int x){
return x & (-x);
}

void update(int k, int x){
for(int i=k;i<=n;i+=lb(i))
arr[i] ^= x;
}

int getsum(int k){
int res = 0;
for(int i=k;i>0;i-=lb(i))
res ^= arr[i];
return res;
}

int main(){
while(cin>>n>>q){
memset(arr, 0, sizeof(arr));
int res = 0;
for(int i=1;i<=n;++i){
update(i, i);
nums[i] = i;
}
for(int i=0;i<q;++i){
int op, l, r;
cin>>op>>l>>r;
if(op){
update(l, r ^ nums[l]);
nums[i] = r;
}
else{
res ^= getsum(r) ^ getsum(l-1);
}
}
cout<<res<<endl;
}
}

9. 炮兵阵地

DP

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#include<bits/stdc++.h>
using namespace std;
int n, m;
char grid[104][10];
int row[104];
int state[104];
int countone[104];
int dp[104][64][64];

int count_one(int k)
{
int cnt = 0;
while (k)
{
++cnt;
k = k & (k - 1);
}
return cnt;
}

int preprocess()
{
int cnt = 0;
for (int i = 0; i < (1 << m); ++i)
{
if ((i&(i >> 1)) == 0 && (i&(i >> 2)) == 0)
{
countone[cnt] = count_one(i);
state[cnt++] = i;
}
}
return cnt;
}

int main()
{
cin >> n >> m;
int ms = preprocess();
for (int i = 0; i < n; ++i)
cin >> grid[i];
for (int i = 0; i < n; ++i)
for (int j = 1; j <= m; ++j)
row[i+1] += (grid[i][j-1] == 'P') << (m - j);
for (int j = 0; j < ms; ++j)
if ((state[j] | row[1]) == row[1])
dp[1][0][j] = countone[j];
for (int i = 2; i <= n; ++i)
{
for (int ppre = 0; ppre < ms; ++ppre)
{
if ((state[ppre] | row[i - 2]) != row[i - 2])
continue;
for (int pre = 0; pre < ms; ++pre)
{
if ((state[pre] | row[i - 1]) != row[i - 1] || (state[ppre] & state[pre]) != 0)
continue;
for (int j = 0; j < ms; ++j)
if ((state[j] | row[i]) == row[i] && (state[j] & state[pre]) == 0 && (state[j] & state[ppre]) == 0)
dp[i][pre][j] =max(dp[i][pre][j], countone[j]+ dp[i - 1][ppre][pre]);
}
}
}
int ans = 0;
for (int i = 0; i < ms; ++i)
for (int j = 0; j < ms; ++j)
ans = max(ans,dp[n][i][j]);
cout << ans << endl;
}

a. 迷宫路口

DFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
#include<bits/stdc++.h>
using namespace std;

int s, n;
int cnt[12];
int mat[32][32];

bool can_place(int x, int y, int size){
if(x+size<=s && y+size<=s){
for(int i=0;i<size;++i){
for(int j=0;j<size;++j){
if(mat[x+i][y+j] != 0){
return false;
}
}
}
return true;
}
else{
return false;
}
}

void place(int x, int y, int size){
for(int i=0;i<size;++i){
for(int j=0;j<size;++j){
mat[x+i][y+i] = size;
}
}
--cnt[size];
}

void unplace(int x, int y){
int size = mat[x][y];
for(int i=0;i<size;++i){
for(int j=0;j<size;++j){
mat[x+i][y+j] = 0;
}
}
++cnt[size];
}

pair<int, int> next_loc(int x, int y){
for(int i=x;i<s;++i){
for(int j=y;j<s;++j){
if(mat[i][j] == 0){
return make_pair(i, j);
}
}
}
return make_pair(-1, -1);
}

bool dfs(int x, int y){
for(int i=10;i>=1;--i){
if(cnt[i]){
if(can_place(x, y, i)){
place(x, y, i);
auto loc = next_loc(x, y);
if(loc.first == -1 && loc.second == -1){
return true;
}
if(dfs(loc.first, loc.second)){
return true;
}
unplace(x, y);
}
}
}
return false;
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int t;
cin>>t;
while(t--){
cin>>s>>n;
for(int i=0;i<n;++i){
int tmp;
cin>>tmp;
cnt[tmp]++;
}
cout<<(dfs(0, 0)? "YES": "NO")<<endl;
}
}

因为就是准备笔试时候自己刷了刷题,所以基本都只在本地跑过了样例

A. 第n小的质数

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include<bits/stdc++.h>
using namespace std;

#define maxn 200010

bool isprime[maxn];
vector<int> primes;

void solve(){
fill(isprime, isprime+maxn, true);
int k=2;
while(k<maxn-1){
for(int i=2*k;i<maxn-1;i+=k){
isprime[i] = false;
}
for(++k;!isprime[k];++k){
;
}
}
for(int i=2;i<maxn;++i){
if(isprime[i]){
primes.push_back(i);
}
}
}


int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

solve();

int n;
cin>>n;
cout<<primes[n-1]<<endl;
}

B. 潜伏者

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include<bits/stdc++.h>
using namespace std;

map<char, char> trans;

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

string s, t;
cin>>s>>t;
bool failed = false;
for(int i=0;i<s.length();++i){
if(trans.find(s[i]) != trans.end() && trans[s[i]] != t[i]){
failed = true;
break;
}
else{
trans[s[i]] = t[i];
}
}

for(int i=0;i<26;++i){
if(trans.find('A'+i) == trans.end()){
failed = true;
break;
}
}

cin>>s;
if(failed){
cout<<"Failed"<<endl;
}
else{
for(int i=0;i<s.length();++i){
cout<<trans[s[i]];
}
cout<<endl;
}

}

C. 逃离迷宫

BFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#include<bits/stdc++.h>
using namespace std;

int m, t;
char mat[12][12];
int cnt[12][12];
int sx, sy, ex, ey;
int dir[4][2] = {{0, 1}, {1, 0}, {0, -1}, {-1, 0}};

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int k;
cin>>k;
while(k--){
cin>>m>>t;
for(int i=0;i<m;++i){
for(int j=0;j<m;++j){
cin>>mat[i][j];
if(mat[i][j] == 'S'){
sx = i;
sy = j;
}
if(mat[i][j] == 'E'){
ex = i;
ey = j;
}
}
}

queue<pair<int, int>> q;
for(int i=0;i<m;++i){
for(int j=0;j<m;++j){
cnt[i][j] = -1;
}
}
q.push(make_pair(sx, sy));
cnt[sx][sy] = 0;
while(!q.empty()){
int x = q.front().first;
int y = q.front().second;
q.pop();

for(int i=0;i<4;++i){
int nx = x + dir[i][0];
int ny = y + dir[i][1];
if(0 <= nx && nx < m && 0 <= ny && ny < m && (mat[nx][ny] == '.' || mat[nx][ny] == 'E') && cnt[nx][ny] == -1){
cnt[nx][ny] = cnt[x][y] + 1;
q.push(make_pair(nx, ny));
}
}
}

if(cnt[ex][ey] == -1 || cnt[ex][ey] > t){
cout<<"NO"<<endl;
}
else{
cout<<"YES"<<endl;
}
}
}

D. 跑步

DP

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include<bits/stdc++.h>
using namespace std;

#define maxn 10010
#define maxm 502

int n, m;
int d[maxn];
int dp[maxn][maxm];

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n>>m;
for(int i=0;i<n;++i){
cin>>d[i];
}

for(int i=0;i<n;++i){
for(int j=0;j<m;++j){
dp[i][j] = -1;
}
}

dp[0][0] = 0;

for(int i=0;i<n;++i){
for(int j=0;j<=m;++j){
if(dp[i][j] != -1){
if(j<m){
dp[i+1][j+1] = max(dp[i+1][j+1], dp[i][j] + d[i]);
}
if(i+j <= n){
dp[i+j][0] = max(dp[i+j][0], dp[i][j]);
}
if(j == 0){
dp[i+1][0] = max(dp[i+1][0], dp[i][0]);
}
}
}
}

cout<<dp[n][0]<<endl;
}

E. What time is it?

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#include<bits/stdc++.h>
using namespace std;

string s[3];

bool check_num(int num, int idx){
switch (num)
{
case 0:
return s[1][idx+1]==' ';
case 1:
return s[0][idx+1]==' '&&s[1][idx]==' '&&s[1][idx+1]==' '&&s[2][idx]==' '&&s[2][idx+1]==' ';
case 2:
return s[1][idx]==' '&&s[2][idx+2]==' ';
case 3:
return s[1][idx]==' '&&s[2][idx]==' ';
case 4:
return s[0][idx+1]==' '&&s[2][idx]==' '&&s[2][idx+1]==' ';
case 5:
return s[1][idx+2]==' '&&s[2][idx]==' ';
case 6:
return s[1][idx+2]==' ';
case 7:
return s[1][idx]==' '&&s[1][idx+1]==' '&&s[2][idx]==' '&&s[2][idx+1]==' ';
case 8:
return true;
case 9:
return s[2][idx]==' ';
default:
return false;
}
}

bool check_time(int hour, int minute, int idx){
return check_num(hour/10, idx) && check_num(hour%10, idx+3) && check_num(minute/10, idx+6) && check_num(minute%10, idx+9);
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int t;
cin>>t;
getline(cin, s[0]);
while(t--){
for(int i=0;i<3;++i){
getline(cin, s[i]);
}
int ans = -1;

for(int hour=0;hour<24;++hour){
for(int minute=0;minute<59;++minute){
int delayminute = minute + 15;
int delayhour = hour + (delayminute / 60);
delayminute %= 60;
delayhour %= 24;
if(check_time(hour, minute, 0) && check_time(delayhour, delayminute, 13)){
if(ans == -1){
ans = hour * 100 + minute;
}
else{
ans = -2;
}
}
}
}
if(ans >= 0){
printf("%04d\n", ans);
}
else{
printf("Not Sure\n");
}
}
}

F. Sorting It All Out

Sortings

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
#include<bits/stdc++.h>
using namespace std;

int edge[26][26];
int indeg[26], outdeg[26];
int n, m;
bool is_unique, no_circle;

string test(){
memset(indeg, 0, sizeof(indeg));
memset(outdeg, 0, sizeof(outdeg));
is_unique = true;
no_circle = true;
string res;

for(int i=0;i<n;++i){
for(int j=0;j<n;++j){
if(edge[i][j]){
++outdeg[i];
++indeg[j];
}
}
}

while(true){
int tmp = -1;
for(int i=0;i<n;++i){
if(indeg[i] == 0){
if(tmp == -1){
tmp = i;
}
else{
is_unique=false;
}
}
}
if(tmp == -1){
break;
}
res.push_back('A'+tmp);
indeg[tmp] = -1;
for(int i=0;i<n;++i){
if(edge[tmp][i]){
--indeg[i];
}
}
}

for(int i=0;i<n;++i){
if(indeg[i] != -1){
no_circle = false;
}
}
return res;
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

while(cin>>n>>m, n && m){
bool finished = false;
memset(edge, 0, sizeof(edge));
for(int i=0;i<m;++i){
char a, c, b;
cin>>a>>c>>b;
if(finished){
continue;
}
edge[a-'A'][b-'A'] = 1;
string res = test();
if(!no_circle){
finished = true;
cout<<"Inconsistency found after "<<i+1<<" relations."<<endl;
}
if(no_circle && is_unique){
finished = true;
cout<<"Sorted sequence determined after "<<i+1<<" relations: "<<res<<"."<<endl;
}
}
if(!finished){
cout<<"Sorted sequence cannot be determined."<<endl;
}
}
}

G. Rails

Stack

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include<bits/stdc++.h>
using namespace std;

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

stack<int> s;
int n;
while(cin>>n, n){
while(true){
bool impossible = false;
int cnt = 1;
while(!s.empty()){
s.pop();
}
for(int i=0;i<n;++i){
int tmp;
cin>>tmp;
if(i==0 && tmp == 0){
goto end_loop;
}
if(impossible){
continue;
}
while(s.empty() || s.top() < tmp){
s.push(cnt);
++cnt;
}
if(s.top() == tmp){
s.pop();
}
else{
impossible = true;
}
}
cout<<(impossible? "No":"Yes")<<endl;
}
end_loop:
cout<<endl;
}
}

H. The Rotation Game

IDA*

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
#include<bits/stdc++.h>
using namespace std;

int opposite[8] = { 5, 4, 7, 6, 1, 0, 3, 2 };
int maxdepth;
int line[8][7] = {
{ 0, 2, 6, 11, 15, 20, 22 },
{ 1, 3, 8, 12, 17, 21, 23 },
{ 10, 9, 8, 7, 6, 5, 4 },
{ 19, 18, 17, 16, 15, 14, 13 },
{ 23, 21, 17, 12, 8, 3, 1 },
{ 22, 20, 15, 11, 6, 2, 0 },
{ 13, 14, 15, 16, 17, 18, 19 },
{ 4, 5, 6, 7, 8, 9, 10 }
};
string ans;
int circle[8] = { 6,7,8,11,12,15,16,17 };
int board[24];

int estimate()
{
int cnt[4] = { 0 }, ans = 0;
for (int i = 0; i<8; ++i)
ans = max(ans, ++cnt[board[circle[i]]]);
return 8 - ans;
}

void doit(int op)
{
int temp = board[line[op][0]];
for (int i = 0; i<6; ++i)
board[line[op][i]] = board[line[op][i + 1]];
board[line[op][6]] = temp;
}

bool dfs(int depth)
{
int guess = estimate();
if (guess == 0)
return true;
else if (depth + guess>maxdepth)
return false;
for (int i = 0; i<8; ++i)
{
doit(i);
ans.push_back('A' + i);
if (dfs(depth + 1))
return true;
ans.pop_back();
doit(opposite[i]);
}
return false;
}

void ida()
{
for (int i = 1;; ++i)
{
maxdepth = i;
if (dfs(0))
break;
}
}

int main()
{
while (cin >> board[0], board[0])
{
ans.clear();
for (int i = 1; i<24; ++i)
cin >> board[i];
if (estimate() == 0)
{
cout << "No moves needed" << endl;
cout << board[7] << endl;
}
else
{
ida();
cout << ans << endl;
cout << board[7] << endl;
}
}
}

在神经网络训练过程当中,如果是利用jupyter-notebook来进行代码编写的话,可能断开之后不会看到输出的结果。利用logging模块可以将内容同时输出到文件和终端,首先定义构造logger的函数内容如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
import logging

def get_logger(filename, verbosity=1, name=None):
level_dict = {0: logging.DEBUG, 1: logging.INFO, 2: logging.WARNING}
formatter = logging.Formatter(
"[%(asctime)s][%(filename)s][line:%(lineno)d][%(levelname)s] %(message)s"
)
logger = logging.getLogger(name)
logger.setLevel(level_dict[verbosity])

# Output to file
fh = logging.FileHandler(filename, "w")
fh.setFormatter(formatter)
logger.addHandler(fh)

# Output to terminal
sh = logging.StreamHandler()
sh.setFormatter(formatter)
logger.addHandler(sh)

return logger

之后在训练流程当中通过以上的函数生成logger,再采用info方法进行保存就可以了:

1
2
3
4
5
6
7
8
9
10
logger = get_logger('./train.log')

logger.info('start training!')
for epoch in range(EPOCHS):
...
loss = ...
acc = ...
logger.info('Epoch:[{}/{}]\t loss={:.5f}\t acc={:.3f}'.format(epoch, EPOCHS, loss, acc))
...
logger.info('finish training!')

Introduction

正常的分布式训练方法会传递所有的梯度,但是实际上,可能有超过99%的梯度交换都是无关紧要的,所以可以通过只传送重要的梯度来减少在深度学习的分布式训练当中通信带宽。DGC方法通过稀疏更新来只传递重要的梯度,同时利用Momentum Correction,Gradient Clipping等方法来预防不收敛的情形,保证分布式训练不会影响最终的模型准确性。

Deep Gradient Compression

Gradient Sparsification

那么如何来衡量一个梯度的重要性呢,显然如果一个网络节点的梯度大小绝对值更大的话,它会带来的权重更新就会更大,那么对于整个网络的变化趋势而言,就是更为重要的。在DGC方法当中,认为绝对值大小超过某一阈值的梯度为重要的梯度。但是如果只传递此类重要梯度的话,和损失函数的优化目标会存在一定程度上的差距,所以出于减少信息损失的考虑,把剩下不重要的梯度在本地进行累积,那么只要时间足够,最终累积梯度就会超过所设定的阈值,进行梯度交换。

由于仅仅是数值较大的梯度进行了立即传递,较小的梯度在本地进行累积更新,所以能够极大减少每个step交换梯度所需要的通信带宽。那么需要考虑的一点是这种本地梯度累积的方法是否会对于优化目标产生影响,计f()f(\cdot)为损失函数,可以首先分析一个step的更新公式如下:

wt+1(i)=wt(i)η1Nbk=1NxBk,t(i)f(x,wt) w_{t+1}^{(i)} = w_{t}^{(i)}- \eta \frac{1}{Nb}\sum_{k=1}^N \sum_{x\in \mathcal{B}_{k,t}}\nabla^{(i)}f(x,w_t)

如果在本地进行梯度累积,那么假设在经历过TT个step之后才进行梯度交换,那么更新公式可以修改为如下形式:

wt+T(i)=wt(i)η1Nbk=1Nτ=0T1xBk,t(i)f(x,wt+τ)=wt(i)ηT1NbTk=1N(τ=0T1xBk,t(i)f(x,wt+τ)) \begin{aligned} w_{t+T}^{(i)} &= w_{t}^{(i)}- \eta \frac{1}{Nb}\sum_{k=1}^N \sum_{\tau= 0}^{T-1}\sum_{x\in \mathcal{B}_{k,t}}\nabla^{(i)}f(x,w_{t+\tau}) \\ &= w_{t}^{(i)} - \eta T \frac{1}{NbT}\sum_{k=1}^N \left(\sum_{\tau= 0}^{T-1}\sum_{x\in \mathcal{B}_{k,t}}\nabla^{(i)}f(x,w_{t+\tau})\right) \end{aligned}

那么如上式所示,可以发现当针对于TT大小进行学习率缩放之后,在分子分母的TT可以消去,于是总体可以看成是人为的将batch大小从NbNb提升到了NbTNbT。所以直观上本地梯度累积的方法可以看成是随着更新时间区间的拉长来增大训练batch的大小,同时对于学习率进行同比例缩放,并不会影响最终的优化目标。

Momentum Correction

如果直接针对于普通的SGD采用以上的DGC方法,那么先让当更新十分稀疏,即间隔区间长度TT很大的时候,可能会影响网络的收敛。所以又提出了Momentum Correction和Local Gradient的方法来缓解对于收敛性质的伤害。

最普通的动量方法如下所示,其中mm的值即为动量。

ut+1=mut+k=1Nk,t,wt+1=wtηut u_{t+1} = mu_t + \sum_{k=1}^N \nabla_{k,t},\quad w_{t+1} = w_t - \eta u_t

事实上最终进行本地的梯度累积和更新都是利用左侧的utu_t来代替原始梯度t\nabla_t的,于是可以得到参数更新的表达式如下,假设稀疏更新的时间间隔为TT

wt+T(i)=wt(i)ηk=1N(τ=0T1ut)=wt(i)ηk=1N[(τ=0T1mτ)k,t(i)+(τ=0T2mτ)k,t+1(i)+] \begin{aligned} w_{t+T} ^{(i)} &= w_t^{(i)} - \eta\sum_{k=1}^N\left( \sum_{\tau=0}^{T-1}u_t \right) \\ &= w_t^{(i)} - \eta\sum_{k=1}^N\left[ \left(\sum_{\tau=0}^{T-1}m^\tau\right)\nabla_{k,t}^{(i)}+ \left(\sum_{\tau=0}^{T-2}m^\tau\right)\nabla_{k,t+1}^{(i)} +\ldots\right] \end{aligned}

对比没有动量修正的更新方法如下:

wt+T(i)=wt(i)ηk=1N[k,t(i)+k,t+1(i)+] \begin{aligned} w_{t+T} ^{(i)} &= w_t^{(i)} - \eta\sum_{k=1}^N\left[\nabla_{k,t}^{(i)}+ \nabla_{k,t+1}^{(i)} +\ldots\right] \end{aligned}

可以发现实际上缺少了τ=0T1mτ\sum_{\tau=0}^{T-1}m^\tau的求和项,当mm为0的时候得到的就是普通情形。直观上来理解可以认为越早的梯度提供了一个更大的权重。这是合理的是因为在进行梯度交换更新之后,本地参数和全局参数是相同的,而随着本地更新时间的增加,本地参数同全局参数的差异会越来越大,那么对于所得到梯度全局的影响的泛化性应当越来越差,所以理应当赋予一个更小的权重。

Local Gradient Clipping

梯度裁剪即在梯度的L2范数超过一个阈值的时候,对梯度进行一个缩放,来防止梯度爆炸的问题。通常而言,分布式训练的梯度裁剪是在进行梯度累积之后进行,然后利用裁剪过后的梯度进行更新,并分发新的网络权重给其他的训练节点。但是在DGC方法中将梯度的传送稀疏化了,同时存在本地更新,这种先汇总后裁剪的方法就不可取。这里的梯度裁剪是再将新的梯度累加到本地之前就进行。

需要做一个假设如下,假设NN个训练节点的随机梯度为独立同分布,都有着方差σ2\sigma^2,那么可以知道所有训练节点的梯度汇总之后,总的梯度应当有方差Nσ2N\sigma^2,于是单个运算节点的梯度和总梯度有关系如下:

E[Gk2]σ,E[G2]N1/2σE[Gk2]N1/2E[G2] E[||G^k||_2]\approx\sigma , \quad E[||G||_2] \approx N^{1/2}\sigma \Rightarrow E[||G^k||_2]\approx N^{-1/2}E[||G||_2]

所以应当对于所设定的阈值进行一个缩放,假设原本设定的梯度的L2范数的阈值为thrGthr_{G}的话,那么对于每一个训练节点而言,其阈值应当为:

thrGk=N1/2thrGthr_{G^k} = N^{-1/2}thr_G

其中的NN表示的是训练节点的个数。

Overcoming the Staleness Effect

​ 事实上由于梯度在本地进行累积,可能更新的时候梯度是过时的了(stale),在实验中发现绝大部分的梯度都在600~1000次迭代之后才会更新。文章中提出了两种方法来进行改善。

Momentum Factor Masking

vv记作在本地的梯度累积:

vk,t=vk,t1+uk,tv_{k,t} = v_{k,t-1} + u_{k,t}

可以利用Momentum Factor Masking的方法,这里简单起见,对于梯度uu和累积梯度vv采用同样的Mask:

Maskvk,t>thrvk,tvk,t¬Maskuk,tuk,t¬Mask\begin{aligned} Mask \leftarrow|v_{k,t}| > thr \\ v_{k,t}\leftarrow v_{k,t}\odot \neg Mask \\ u_{k,t}\leftarrow u_{k,t}\odot \neg Mask \end{aligned}

这个方法会让过于老的梯度来停止累积,防止过于旧的梯度影响模型的优化方向。

Warm-up Training

在训练的最初期,模型往往变化的特别剧烈,这时候采用DGC的问题如下:

  1. 稀疏更新会限制模型变化的范围,使得这个剧烈变化的时段变得更长。
  2. 早期留在本地的梯度,可能和实际的优化方向并不符合,在后面传递可能会把模型引入错误的方向。

所以采用Warm-up的训练trick,在一开始采用更低的学习率,同时采用更低的稀疏更新的阈值,减少被延缓传递的参数。这里在训练的最开始采用一个很低的学习率,然后以指数形式逐渐增加到默认值。

原文

Ashish has a tree consisting of nn nodes numbered 11 to nn rooted at node 11. The ii-th node in the tree has a cost aia_i, and binary digit bib_i is written in it. He wants to have binary digit cic_i written in the ii-th node in the end.

To achieve this, he can perform the following operation any number of times:

  • Select any kk nodes from the subtree of any node uu, and shuffle the digits in these nodes as he wishes, incurring a cost of kauk\cdot a_u. Here, he can choose kk ranging from 11 to the size of the subtree of uu.

He wants to perform the operations in such a way that every node finally has the digit corresponding to its target.

Help him find the minimum total cost he needs to spend so that after all the operations, every node uu has digit cuc_u written in it, or determine that it is impossible.

Input

First line contains a single integer nn (1n2105)(1\le n\le 2\cdot 10^5) denoting the number of nodes in the tree.

ii-th line of the next nn lines contains 3 space-separated integers aia_i, bib_i, cic_i (1ai109,0bi,ci1)(1\le a_i\le 10^9,0\le b_i,c_i\le1) — the cost of the ii-th node, its initial digit and its goal digit.

Each of the next n−1 lines contain two integers uu, vv (1u,vn,uv)(1\le u,v\le n, u\ne v), meaning that there is an edge between nodes uu and vv in the tree.

Output

Print the minimum total cost to make every node reach its target digit, and −1 if it is impossible.

Examples

input

1
2
3
4
5
6
7
8
9
10
5
1 0 1
20 1 0
300 0 1
4000 0 0
50000 1 0
1 2
2 3
2 4
1 5

output

1
4

input

1
2
3
4
5
6
7
8
9
10
5
10000 0 1
2000 1 0
300 0 1
40 0 0
1 1 0
1 2
2 3
2 4
1 5

output

1
24000

input

1
2
3
4
2
109 0 1
205 0 1
1 2

output

1
-1

Note

The tree corresponding to samples 11 and 22 are:

In sample 11, we can choose node 11 and k=4k=4 for a cost of 41=44\cdot1 = 4 and select nodes 1,2,3,51,2,3,5, shuffle their digits and get the desired digits in every node.

In sample 22, we can choose node 11 and k=2k=2 for a cost of 10000210000\cdot2, select nodes 1,51,5 and exchange their digits, and similarly, choose node 22 and k=2k=2 for a cost of 200022000\cdot2, select nodes 2,32,3 and exchange their digits to get the desired digits in every node.

In sample 33, it is impossible to get the desired digits, because there is no node with digit 11 initially.

题意

有一颗树,树根为节点1,每个节点有三个属性a,b,ca,b,cbb代表当前节点的状态,cc代表当前节点需要变成的状态,aa代表修改节点状态所需要的花费。

现在能够进行一种操作:

选择一颗以uu为根节点的子树,任意的修改其子树中的kk个节点的状态(互相交换),这个操作的花费为kauk \cdot a_u

现在需要求解使树中所有节点都能变成目标状态所需要的最小开销,如果不存在则进行判定。

思路

首先可以知道,如果一个节点ii的父亲节点为pp,且有a[i]>a[p]a[i]>a[p],那么所有以ii节点为根的子树上涉及到的操作都可以移动到其父节点上进行,可以对整棵树进行a[i]=min(a[i],a[p])a[i] = \min(a[i],a[p])的操作而不改变本身的结果。

完成以上操作之后,根节点到叶子结点的值实际上就是一个单调递减的序列,可以从叶子结点往根节点进行遍历,一旦当前子树上存在可以操作的成对节点就进行操作。

时间复杂度为遍历树的时间复杂度O(N)O(N)

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#include <bits/stdc++.h>
using namespace std;

#define maxn 200010
#define ll long long

int n;
ll a[maxn], b[maxn], c[maxn];
vector<int> g[maxn];
ll ans;

pair<ll, ll> solve(int u, int pa=0){
a[u] = min(a[u], a[pa]);
ll cnt0 = 0;
ll cnt1 = 0;
if(b[u] == 0 && c[u] == 1){
++cnt0;
}
else if(b[u] == 1 && c[u] == 0){
++cnt1;
}

for(auto v: g[u]){
if(v != pa){
auto tmp = solve(v, u);
cnt0 += tmp.first;
cnt1 += tmp.second;
}
}

ll change = min(cnt0, cnt1);
ans += change * a[u] * 2;
return make_pair(cnt0-change, cnt1-change);
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n;
for(int i=1;i<=n;++i){
cin>>a[i]>>b[i]>>c[i];
}

for(int i=1;i<n;++i){
int u,v;
cin>>u>>v;
g[v].push_back(u);
g[u].push_back(v);
}

a[0] = a[1];

auto res = solve(1);

if(res.first != 0 || res.second != 0){
cout<<-1<<endl;
}
else{
cout<<ans<<endl;
}
}

Exercise 1

当是ENV_TYPE_FS的时候提供读写的权限,对于普通的Env则不提供权限。

1
2
3
4
// If this is the file server (type == ENV_TYPE_FS) give it I/O privileges.
// LAB 5: Your code here.
if(type == ENV_TYPE_FS)
e->env_tf.tf_eflags |= FL_IOPL_MASK;

Exercise 2

为block分配一个页面,并且将block的内容读入。首先将addr进行对齐,这里由于page的大小和block的大小相同,所以怎样对齐都是可以的。之后利用sys_page_alloc()系统调用来进行页面分配,然后利用ide_read()进行内容的读取。注意的是ide_read()是以sector为单位进行读取的,是512个byte,同block的大小是8倍的关系,所以需要进行一个小小的处理。

1
2
3
4
5
6
7
8
9
10
11
// Allocate a page in the disk map region, read the contents
// of the block from the disk into that page.
// Hint: first round addr to page boundary. fs/ide.c has code to read
// the disk.
//
// LAB 5: you code here:
addr = ROUNDDOWN(addr, BLKSIZE);
if((r = sys_page_alloc(0, addr, PTE_U | PTE_P | PTE_W)) < 0)
panic("in bc_pgfault, sys_page_alloc: %e", r);
if((r = ide_read(blockno << 3, addr, 8)) < 0)
panic("in_bc_pgfault, ide_read: %e", r);

当必要的时候,需要将Block写回磁盘,而必要的时候就是可写页面被修改的时候,所以这里首先的判断条件就应当是当前所对应的block是被映射了的,并且是dirty的。在调用ide_write()写回之后,需要重新进行一次映射来消除掉dirty位。这样可以避免多次进行flush,访问磁盘占用大量的时间。

1
2
3
4
5
6
7
8
9
// LAB 5: Your code here.
int r;
addr = ROUNDDOWN(addr, BLKSIZE);
if(va_is_mapped(addr) && va_is_dirty(addr)){
if((r = ide_write(blockno << 3, addr, 8)) < 0)
panic("in flush_block, ide_write: %e", r);
if((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
panic("in flush block, sys_page_map: %e", r);
}

Exercise 3

首先观察free_block()函数,可以发现每一位标志着一个block的状态,其中1表示为空闲,0表示为忙碌。访问的方式如第八行所示。

1
2
3
4
5
6
7
8
9
// Mark a block free in the bitmap
void
free_block(uint32_t blockno)
{
// Blockno zero is the null pointer of block numbers.
if (blockno == 0)
panic("attempt to free zero block");
bitmap[blockno/32] |= 1<<(blockno%32);
}

那么仿照就可以完成alloc_block(),对所有的block进行一个遍历,如果发现free的就进行分配,在bitmap中标志为0,之后立即利用flush_block()将将修改写回磁盘,并且返回blockno。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Search the bitmap for a free block and allocate it.  When you
// allocate a block, immediately flush the changed bitmap block
// to disk.
//
// Return block number allocated on success,
// -E_NO_DISK if we are out of blocks.
//
// Hint: use free_block as an example for manipulating the bitmap.
int
alloc_block(void)
{
// The bitmap consists of one or more blocks. A single bitmap block
// contains the in-use bits for BLKBITSIZE blocks. There are
// super->s_nblocks blocks in the disk altogether.

// LAB 5: Your code here.
int blockno;
for(blockno = 1; blockno < super->s_nblocks; blockno++){
if(block_is_free(blockno)){
bitmap[blockno/32] &= ~(1<<(blockno%32));
flush_block(&bitmap[blockno/32]);
return blockno;
}
}
return -E_NO_DISK;
}

Exercise 4

这里的寻址实际上类似于lab2当中虚拟内存的寻址,不过在这里的设计上只有一个indirect的链接,所以实现起来相对简单很多。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
// Find the disk block number slot for the 'filebno'th block in file 'f'.
// Set '*ppdiskbno' to point to that slot.
// The slot will be one of the f->f_direct[] entries,
// or an entry in the indirect block.
// When 'alloc' is set, this function will allocate an indirect block
// if necessary.
//
// Returns:
// 0 on success (but note that *ppdiskbno might equal 0).
// -E_NOT_FOUND if the function needed to allocate an indirect block, but
// alloc was 0.
// -E_NO_DISK if there's no space on the disk for an indirect block.
// -E_INVAL if filebno is out of range (it's >= NDIRECT + NINDIRECT).
//
// Analogy: This is like pgdir_walk for files.
// Hint: Don't forget to clear any block you allocate.
static int
file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
// LAB 5: Your code here.
int r;

if(filebno >= NDIRECT + NINDIRECT)
return -E_INVAL;

if(filebno < NDIRECT){
*ppdiskbno = (f->f_direct) + filebno;
}
else{
if(alloc && (f->f_indirect == 0)){
if((r = alloc_block()) < 0)
return r;
memset(diskaddr(r), 0, BLKSIZE);
f->f_indirect = r;
}
else if(f->f_indirect == 0){
return -E_NOT_FOUND;
}
*ppdiskbno = ((uint32_t *)diskaddr(f->f_indirect)) + filebno - NDIRECT;
}
return 0;
}

通过file_block_walk()找到对应的pdiskbno,如果为0,那么就进行分配,否则的话利用diskaddr()转换成地址。成功就返回0。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Set *blk to the address in memory where the filebno'th
// block of file 'f' would be mapped.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_NO_DISK if a block needed to be allocated but the disk is full.
// -E_INVAL if filebno is out of range.
//
// Hint: Use file_block_walk and alloc_block.
int
file_get_block(struct File *f, uint32_t filebno, char **blk)
{
// LAB 5: Your code here.
uint32_t * pdiskbno;
int r;
if((r = file_block_walk(f, filebno, &pdiskbno, true)) < 0)
return r;
if(*pdiskbno == 0){
if((r = alloc_block()) < 0)
return r;
*pdiskbno = r;
}
*blk = (char *)diskaddr(*pdiskbno);
flush_block(*blk);
return 0;
}

Exercise 5

实际上就是针对于file_read()的一层封装,首先利用openfile_lookup()找到对应的文件,然后调用file_read()进行读入,之后调整文件的指针,并返回读入的大小。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Read at most ipc->read.req_n bytes from the current seek position
// in ipc->read.req_fileid. Return the bytes read from the file to
// the caller in ipc->readRet, then update the seek position. Returns
// the number of bytes successfully read, or < 0 on error.
int
serve_read(envid_t envid, union Fsipc *ipc)
{
struct Fsreq_read *req = &ipc->read;
struct Fsret_read *ret = &ipc->readRet;

if (debug)
cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// Lab 5: Your code here:
struct OpenFile * o;
int r;
if((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if((r = file_read(o->o_file, ret->ret_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;
o->o_fd->fd_offset += r;
return r;
}

Exercise 6

serve_write()的逻辑类似于serve_read(),具体代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Write req->req_n bytes from req->req_buf to req_fileid, starting at
// the current seek position, and update the seek position
// accordingly. Extend the file if necessary. Returns the number of
// bytes written, or < 0 on error.
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
if (debug)
cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// LAB 5: Your code here.
struct OpenFile *o;
int r;
if((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if((r = file_write(o->o_file, req->req_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;
o->o_fd->fd_offset += r;
return r;
}

devfile_write(0)同样可以仿照devfile_read()进行完成。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Write at most 'n' bytes from 'buf' to 'fd' at the current seek position.
//
// Returns:
// The number of bytes successfully written.
// < 0 on error.
static ssize_t
devfile_write(struct Fd *fd, const void *buf, size_t n)
{
// Make an FSREQ_WRITE request to the file system server. Be
// careful: fsipcbuf.write.req_buf is only so large, but
// remember that write is always allowed to write *fewer*
// bytes than requested.
// LAB 5: Your code here
int r;
fsipcbuf.write.req_fileid = fd->fd_file.id;
fsipcbuf.write.req_n = n;
memmove(fsipcbuf.write.req_buf, buf, n);
if((r = fsipc(FSREQ_WRITE, NULL)) < 0)
return r;
assert(r <= n);
assert(r <= PGSIZE);
return r;
}

Exercise 7

首先判断得到env,然后从18~20行依次为,开启中断,设置IOPL为0,将保护权限设置为3(CPL 3)。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Set envid's trap frame to 'tf'.
// tf is modified to make sure that user environments always run at code
// protection level 3 (CPL 3), interrupts enabled, and IOPL of 0.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
// LAB 5: Your code here.
// Remember to check whether the user has supplied us with a good
// address!
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
env->env_tf = *tf;
env->env_tf.tf_eflags |= FL_IF;
env->env_tf.tf_eflags &= ~FL_IOPL_MASK;
env->env_tf.tf_cs |= 3;
return 0;
panic("sys_env_set_trapframe not implemented");
}

Exercise 8

关于duppage()的修改,可以注意到直接只需要在可写页面修改为COW的时候进行PTE_SHARE的标志位判断就可以了,只在第11行处添加一个判断,其余内容同lab4中完全相同。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
static int
duppage(envid_t envid, unsigned pn)
{
int r;

// LAB 4: Your code here.
pte_t pte = uvpt[pn];
void * addr = (void *)(pn * PGSIZE);

uint32_t perm = pte & PTE_SYSCALL;
if(perm & (PTE_W | PTE_COW) && !(perm & PTE_SHARE)){
perm &= ~PTE_W;
perm |= PTE_COW;
}
if((r = sys_page_map(0, addr, envid, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
if((r = sys_page_map(0, addr, 0, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
return 0;
}

从UTEXT到UXSTACKTOP的所有共享页面将其映射复制到子进程的地址空间即可,代码类似于lab4里面的fork()

1
2
3
4
5
6
7
8
9
10
11
// Copy the mappings for shared pages into the child address space.
static int
copy_shared_pages(envid_t child)
{
// LAB 5: Your code here.
uint8_t* addr;
for(addr = (uint8_t *)UTEXT; addr <(uint8_t *)UXSTACKTOP; addr += PGSIZE)
if((uvpd[PDX(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_SHARE))
sys_page_map(0, (void *)addr, child, (void *)addr, (uvpt[PGNUM(addr)] & PTE_SYSCALL));
return 0;
}

Exercise 9

添加对应的分发入口即可:

1
2
3
4
5
6
7
8
9
10
11
// Handle keyboard and serial interrupts.
// LAB 5: Your code here.
if(tf->tf_trapno == IRQ_OFFSET + IRQ_KBD){
kbd_intr();
return;
}

if(tf->tf_trapno == IRQ_OFFSET + IRQ_SERIAL){
serial_intr();
return;
}

Exercise 10

重定向输出流的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
case '>':	// Output redirection
// Grab the filename from the argument list
if (gettoken(0, &t) != 'w') {
cprintf("syntax error: > not followed by word\n");
exit();
}
if ((fd = open(t, O_WRONLY|O_CREAT|O_TRUNC)) < 0) {
cprintf("open %s for write: %e", t, fd);
exit();
}
if (fd != 1) {
dup(fd, 1);
close(fd);
}
break;

仿照就可以完成输入流的重定向:

1
2
3
4
5
6
7
8
9
10
// LAB 5: Your code here.
if((fd = open(t, O_RDONLY)) < 0){
cprintf("open %s for read: %e", t, fd);
exit();
}
if(fd != 0){
dup(fd, 0);
close(fd);
}
break;

到这里可以达到150/150的满分:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
internal FS tests [fs/test.c]: OK (1.0s)
fs i/o: OK
check_bc: OK
check_super: OK
check_bitmap: OK
alloc_block: OK
file_open: OK
file_get_block: OK
file_flush/file_truncate/file rewrite: OK
testfile: OK (1.6s)
serve_open/file_stat/file_close: OK
file_read: OK
file_write: OK
file_read after file_write: OK
open: OK
large file: OK
spawn via spawnhello: OK (1.8s)
Protection I/O space: OK (1.6s)
PTE_SHARE [testpteshare]: OK (0.9s)
PTE_SHARE [testfdsharing]: OK (1.4s)
start the shell [icode]: Timeout! OK (31.5s)
testshell: OK (2.4s)
(Old jos.out.testshell failure log removed)
primespipe: OK (7.2s)
Score: 150/150

Challenge 1

Challenge的要求即为清空掉所有的没有被访问的页面。那么对于单个页面,只需要调用flush_block(),之后通过系统调用unmap就可以了。evict_policy()即对于所有的block做一个便利,清除所有从未被访问过的页面。具体代码内容如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// challenge
void
evict_block(void *addr){
uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
if(addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
panic("evict_block of bad va %08x", addr);

int r;
addr = ROUNDDOWN(addr, BLKSIZE);
flush_block(addr);
if((r = sys_page_unmap(0, addr)) < 0)
panic("in evict block, sys_page_unmap: %e", r);
}

void
evict_policy(){
uint32_t blockno;
for(blockno = 3; blockno < DISKSIZE / BLKSIZE; ++blockno){
if(!(uvpt[PGNUM(diskaddr(blockno))]&PTE_A)){
evict_block(diskaddr(blockno));
}
}
}

往常的论文对于训练过程中的方法使用并不是非常关注,通常只能从代码中去寻找训练过程所采用的trick。

这篇文章介绍了一些训练图像分类CNN网络的trick,使用这些方法可以显著的提高分类网络的准确性。同时更好的准确性在迁移到object detection和semantic segmentation等任务上也会有更好的效果。

image-20200509133659544

Training Procedures

利用随机梯度下降的方法来训练神经网络的基本流程如下所示,首先先对于baseline情况的超参数进行一个假定,之后再讨论优化的方法:

Efficient Training

使用更低的精度和更大的batch size可以使得训练更加高效,这里提供了一系列这样的方法,有的可以同时提升准确率和训练速度。

Large-batch training

使用更大的batch size可能会减慢训练进程。对于凸优化问题,随着batch size的上升,收敛速率会下降,在神经网络当中也有相同的情况。在同样数目的epoch情况下,用大的batch训练的模型在验证集上的效果会更差。

下面有四个启发式的方法用来解决上面提到的问题:

Linear scaling learning rate

理论上来说,从估计梯度的角度,提升batch size并不会使得得到梯度的大小期望改变,而是使得它的方差更小。所以可以对于学习率进行成比率的缩放,能够使得效果提升。

比如在当batch size为256的时候,选用0.1的学习率。

那么当采用一个更大的batch size,例如bb的时候,就将学习率设置为0.1×b/2560.1\times b/256

Learning rate warmup

在训练开始的时候,所有参数都是随机设置的,离最终结果可能会非常远,所以采用一个较大的学习率会导致数值上的不稳定,所以可以考虑在最开始使用一个很小的学习率,再慢慢地在训练较为稳定的时候切换到初始学习率。

例如,如果我们采用前mm个batch来进行warm up的话,并且学习率的初值为η\eta,那么在第ii个batch,1im1\le i \le m,就选用大小为iη/mi\eta/m的学习率。

Zero γ\gamma

ResNet是有很多个residual block堆叠出来的,每个residual block的结果可以看成是x+block(x)x+block(x),而block的最后一层通常是BN层,他会首先进行一个standardize,得到x^\hat{x},然后在进行一个缩放γx^+β\gamma \hat{x}+\beta,其中γ\gammaβ\beta都是可学习的参数,通常被初始化为1和0。

可以考虑在初始化的时候,将所有的参数γ\gamma都设置成0,这样网络的输出就和他的输入相同,这样可以缩小网络,使得在最开始的时候训练变得更简单。

No bias decay

weight decay通常被用于所有可学习的参数来防止正则化,这里的trick就是只作用于权重,而不对bias包括BN层当中的γ\gammaβ\beta做decay。

Low-precision training

通常是采用FP32来进行训练的,但是大量的TPU在FP16上面的效率会高很多,所以可以考虑采用FP16来进行训练,能够达到几倍的加速。

实验结果如上,其中Baseline的设置为BS=256BS=256,采用FP32,Efficient的设置为BS=1024BS=1024,采用FP16。

Model Tweaks

原本的ResNet网络结构如图所示:

这里不多做介绍,文章提出了三种针对ResNet修改方式,分别记作ResNet-B/C/D,能够得到一定程度上的准确率提升。

ResNet-B

第一个所做的就是改变了downsampling的方式,由于如果采用1×11\times 1卷积同时选择stride为2的话,会丢失3/43/4的信息,这里在3×33\times 3的的卷积层来downsampling,理论上所有信息都会被接受到。

ResNet-C

一个发现就是卷积计算量随卷积核大小变化是二次多项式级别的,所以利用三层3×33\times 3的卷积来替代7×77\times 7的卷积层。

ResNet-D

从ResNet-B的角度更进一步,旁路也同样会有信息丢失的问题,所以将一层的1×11 \times 1卷积修改成一层Average Pooling再加上一层卷积,减少信息的丢失。

Experiment Result

可以看到三者对于原始的ResNet结构准确率都有一定程度上的上升。

Training Refinements

Cosine Learning Rate Decay

采用最广泛的策略是指数衰减。这里提供的一种方法是利用cosine函数进行衰减,忽略掉最开始进行warmup的部分,假设一共有TT个batch,那么在第tt个batch上,学习率为:

ηt=12(1+cos(tπT))η\eta_t = \frac{1}{2}(1+\cos(\frac{t\pi}{T}))\eta

其中η\eta就是初始的学习率,对比结果如下:

image-20200509165038277

可以看到cosine decay在一开始和最终学习率下降的比较慢,在中间接近于线性的衰减。

Label Smoothing

对于分类任务,最后一层通常都是通过一个softmax函数来得到预测的概率,对于第ii类的结果qiq_i

qi=exp(zi)j=1Kexp(zj)q_i = \frac{\exp(z_i)}{\sum_{j=1}^K \exp(z_j)}

我们在学习过程当中最小化的是负的cross entropy:

l(p,q)=i=1Kpilogqil(p,q) = -\sum_{i=1}^K p_i \log q_i

pip_i只有当i=yi=y的时候为1,其余时候为0,于是就等价于:

l(p,q)=log(qy)=zy+log(i=1Kexp(zi))l(p,q) = -\log(q_y) = -z_y + \log (\sum_{i=1}^K \exp(z_i))

要达到最小化的效果应当有zy=infz_y^\star=\inf,同时使得其他种类的ziz_i尽可能小,换言之,这样会倾向于让预测结果趋于极端值,最终导致overfit。

label smoothing的思想就是将真实概率构造为:

pi={1ϵi=yϵ/(K1)otherwisep_i = \begin{cases}1-\epsilon &i=y\\\epsilon / (K-1) & \text{otherwise}\end{cases}

其中ϵ\epsilon是一个小的常量,此时的最优解为:

zi={log((K1)(1ϵ)/ϵ)+αi=yαotherwisez_i^\star = \begin{cases}\log((K-1)(1-\epsilon)/\epsilon)+\alpha & i=y\\\alpha & \text{otherwise}\end{cases}

其中α\alpha是一个实数,这会使得最终全连接层的输出会趋向于一个有限值,能够更好地泛化。

里面log((K1)(1ϵ)/ϵ)\log((K-1)(1-\epsilon)/\epsilon)的部分就是正确分类和错误分类之间的gap,它随着ϵ\epsilon的增大不断缩小,当ϵ=(K1)/K\epsilon = (K-1)/K的时候,实际上就是均匀分布,gap就为0了。

关于gap的可视化如下:

image-20200510131500469

可以发现smooth之后gap总体均值变小,并且大gap的极端值情况有效减少了。

Knowledge Distillation

知识蒸馏中,我们尝试用一个学生模型来学习高准确率的老师模型里面的知识。方法就是利用一个distillation loss来惩罚学生模型偏离老师模型的预测概率。利用pp作为真实概率,zzrr分别作为学生网络和老师网络全连接层的输出,利用负的cross entropy作为loss,总的loss可以写成:

l(p,softmax(z))+T2l(softmax(r/T),softmax(z/T))l(p,\text{softmax}(z)) + T^2l(\text{softmax}(r/T),\text{softmax}(z/T))

其中TT为温度超参数,使得softmax函数的输出变得更为平滑。

Mixup Training

作为一个数据增强的手段,每次随机选择两个样本(xi,yi)(x_i,y_i)(xj,yj)(x_j,y_j)。然后对两个样本加权来作为新的样本:

x^=λxi+(1λ)xjy^=λyi+(1λ)yj\begin{aligned} \hat{x} &= \lambda x_i + (1-\lambda)x_j \\ \hat{y} &= \lambda y_i + (1-\lambda)y_j \end{aligned}

其中λ[0,1]\lambda \in [0,1]是一个从Beta(α,α)\text{Beta}(\alpha,\alpha)中采样出来的数,在mixup中只对于新的样本(x^,y^)(\hat{x},\hat{y})来进行训练。

原文

A monopole magnet is a magnet that only has one pole, either north or south. They don’t actually exist since real magnets have two poles, but this is a programming contest problem, so we don’t care.

There is an n×mn\times m grid. Initially, you may place some north magnets and some south magnets into the cells. You are allowed to place as many magnets as you like, even multiple in the same cell.

An operation is performed as follows. Choose a north magnet and a south magnet to activate. If they are in the same row or the same column and they occupy different cells, then the north magnet moves one unit closer to the south magnet. Otherwise, if they occupy the same cell or do not share a row or column, then nothing changes. Note that the south magnets are immovable.

Each cell of the grid is colored black or white. Let’s consider ways to place magnets in the cells so that the following conditions are met.

  1. There is at least one south magnet in every row and every column.
  2. If a cell is colored black, then it is possible for a north magnet to occupy this cell after some sequence of operations from the initial placement.
  3. If a cell is colored white, then it is impossible for a north magnet to occupy this cell after some sequence of operations from the initial placement.

Determine if it is possible to place magnets such that these conditions are met. If it is possible, find the minimum number of north magnets required (there are no requirements on the number of south magnets).

Input

The first line contains two integers nn and mm (1n,m10001\le n,m\le 1000) — the number of rows and the number of columns, respectively.

The next nn lines describe the coloring. The ii-th of these lines contains a string of length mm, where the jj-th character denotes the color of the cell in row ii and column jj. The characters “#” and “.” represent black and white, respectively. It is guaranteed, that the string will not contain any other characters.

Output

Output a single integer, the minimum possible number of north magnets required.

If there is no placement of magnets that satisfies all conditions, print a single integer −1.

Examples

input

1
2
3
4
3 3
.#.
###
##.

output

1
1

input

1
2
3
4
5
4 2
##
.#
.#
##

output

1
-1

input

1
2
3
4
5
4 5
....#
####.
.###.
.#...

output

1
2

input

1
2
3
2 1
.
#

output

1
-1

input

1
2
3
4
3 5
.....
.....
.....

output

1
0

Note

In the first test, here is an example placement of magnets:

In the second test, we can show that no required placement of magnets exists. Here are three example placements that fail to meet the requirements. The first example violates rule 3 since we can move the north magnet down onto a white square. The second example violates rule 2 since we cannot move the north magnet to the bottom-left black square by any sequence of operations. The third example violates rule 1 since there is no south magnet in the first column.

In the third test, here is an example placement of magnets. We can show that there is no required placement of magnets with fewer north magnets.

In the fourth test, we can show that no required placement of magnets exists. Here are two example placements that fail to meet the requirements. The first example violates rule 1 since there is no south magnet in the first row. The second example violates rules 1 and 3 since there is no south magnet in the second row and we can move the north magnet up one unit onto a white square.

In the fifth test, we can put the south magnet in each cell and no north magnets. Because there are no black cells, it will be a correct placement.

题意

有S和N两种磁铁,可以互相吸引,每次可以选择一个S和一个N激活,使得N向S方向移动,问需要有多少N磁铁可以满足(不限制S磁铁的数量)

  • 每一行列都存在S磁铁
  • 所有黑色区域都能够被N磁铁到达
  • 所有白色区域都不可能被N磁铁到达

如果无法满足以上三条规则,那么就输出-1。

思路

首先排除所有不可能满足的情况:

  1. 一行/一列有两段黑色区域,如果可以到达必定有S磁铁吸引,那么两段中间的白色区域就是可达的。所以每一行列必定只有一段黑色区域或全白。
  2. 全白行但是没有全白列(行列对换相同),如果存在的话,由于每一行需要有S磁铁,而S磁铁放置的列是有黑色区域的,如果N磁铁到达该黑色区域,然后激活空白行处的S磁铁,会导致白色区域可达。

综上,放置磁铁的方式为将S磁铁放置在所有黑色块中和空白行列相交处。

N磁铁需要每一个连通区域放置一个,由于N磁铁不能跨区域移动,所以这种放置方式是最优的。

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
#include <bits/stdc++.h>
using namespace std;

#define ll long long
#define maxn 1010

char grid[maxn][maxn];
bool unused_row;
bool unused_col;
bool vis[maxn][maxn];
int dir[4][2] = {{-1,0},{1,0},{0,1},{0,-1}};
int n,m;

void bfs(int i, int j){
queue<pair<int,int>> q;
vis[i][j] = true;
q.push(make_pair(i,j));
while(!q.empty()){
auto temp = q.front();
i = temp.first;
j = temp.second;
q.pop();
for(int k=0;k<4;++k){
int ii = i + dir[k][0];
int jj = j + dir[k][1];
if(ii>=0 && jj>=0 && ii<n &&jj<m && grid[ii][jj] == '#' && !vis[ii][jj]){
vis[ii][jj] = true;
q.push(make_pair(ii,jj));
}
}
}
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n>>m;
for(int i=0;i<n;++i)
cin>>grid[i];

for(int i=0;i<n;++i){
int cnt = 0;
if(grid[i][0] == '#')
cnt = 1;
for(int j=1;j<m;++j)
if(grid[i][j] == '#' && grid[i][j-1] == '.')
++cnt;
if(cnt > 1){
cout<<"-1"<<endl;
return 0;
}
if(cnt == 0){
unused_row = true;
}
}

for(int j=0;j<m;++j){
int cnt = 0;
if(grid[0][j] == '#')
cnt = 1;
for(int i=1;i<n;++i)
if(grid[i][j] == '#' && grid[i-1][j] == '.')
++cnt;
if(cnt > 1){
cout<<"-1"<<endl;
return 0;
}
if(cnt == 0)
unused_col = true;
}

if(unused_row ^ unused_col){
cout<<"-1"<<endl;
return 0;
}

int cnt = 0;
for(int i=0;i<n;++i){
for(int j=0;j<m;++j){
if(grid[i][j] == '#' && !vis[i][j]){
++cnt;
bfs(i,j);
}
}
}
cout<<cnt<<endl;
}