0%

原文

Ashish has a tree consisting of nn nodes numbered 11 to nn rooted at node 11. The ii-th node in the tree has a cost aia_i, and binary digit bib_i is written in it. He wants to have binary digit cic_i written in the ii-th node in the end.

To achieve this, he can perform the following operation any number of times:

  • Select any kk nodes from the subtree of any node uu, and shuffle the digits in these nodes as he wishes, incurring a cost of kauk\cdot a_u. Here, he can choose kk ranging from 11 to the size of the subtree of uu.

He wants to perform the operations in such a way that every node finally has the digit corresponding to its target.

Help him find the minimum total cost he needs to spend so that after all the operations, every node uu has digit cuc_u written in it, or determine that it is impossible.

Input

First line contains a single integer nn (1n2105)(1\le n\le 2\cdot 10^5) denoting the number of nodes in the tree.

ii-th line of the next nn lines contains 3 space-separated integers aia_i, bib_i, cic_i (1ai109,0bi,ci1)(1\le a_i\le 10^9,0\le b_i,c_i\le1) — the cost of the ii-th node, its initial digit and its goal digit.

Each of the next n−1 lines contain two integers uu, vv (1u,vn,uv)(1\le u,v\le n, u\ne v), meaning that there is an edge between nodes uu and vv in the tree.

Output

Print the minimum total cost to make every node reach its target digit, and −1 if it is impossible.

Examples

input

1
2
3
4
5
6
7
8
9
10
5
1 0 1
20 1 0
300 0 1
4000 0 0
50000 1 0
1 2
2 3
2 4
1 5

output

1
4

input

1
2
3
4
5
6
7
8
9
10
5
10000 0 1
2000 1 0
300 0 1
40 0 0
1 1 0
1 2
2 3
2 4
1 5

output

1
24000

input

1
2
3
4
2
109 0 1
205 0 1
1 2

output

1
-1

Note

The tree corresponding to samples 11 and 22 are:

In sample 11, we can choose node 11 and k=4k=4 for a cost of 41=44\cdot1 = 4 and select nodes 1,2,3,51,2,3,5, shuffle their digits and get the desired digits in every node.

In sample 22, we can choose node 11 and k=2k=2 for a cost of 10000210000\cdot2, select nodes 1,51,5 and exchange their digits, and similarly, choose node 22 and k=2k=2 for a cost of 200022000\cdot2, select nodes 2,32,3 and exchange their digits to get the desired digits in every node.

In sample 33, it is impossible to get the desired digits, because there is no node with digit 11 initially.

题意

有一颗树,树根为节点1,每个节点有三个属性a,b,ca,b,cbb代表当前节点的状态,cc代表当前节点需要变成的状态,aa代表修改节点状态所需要的花费。

现在能够进行一种操作:

选择一颗以uu为根节点的子树,任意的修改其子树中的kk个节点的状态(互相交换),这个操作的花费为kauk \cdot a_u

现在需要求解使树中所有节点都能变成目标状态所需要的最小开销,如果不存在则进行判定。

思路

首先可以知道,如果一个节点ii的父亲节点为pp,且有a[i]>a[p]a[i]>a[p],那么所有以ii节点为根的子树上涉及到的操作都可以移动到其父节点上进行,可以对整棵树进行a[i]=min(a[i],a[p])a[i] = \min(a[i],a[p])的操作而不改变本身的结果。

完成以上操作之后,根节点到叶子结点的值实际上就是一个单调递减的序列,可以从叶子结点往根节点进行遍历,一旦当前子树上存在可以操作的成对节点就进行操作。

时间复杂度为遍历树的时间复杂度O(N)O(N)

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#include <bits/stdc++.h>
using namespace std;

#define maxn 200010
#define ll long long

int n;
ll a[maxn], b[maxn], c[maxn];
vector<int> g[maxn];
ll ans;

pair<ll, ll> solve(int u, int pa=0){
a[u] = min(a[u], a[pa]);
ll cnt0 = 0;
ll cnt1 = 0;
if(b[u] == 0 && c[u] == 1){
++cnt0;
}
else if(b[u] == 1 && c[u] == 0){
++cnt1;
}

for(auto v: g[u]){
if(v != pa){
auto tmp = solve(v, u);
cnt0 += tmp.first;
cnt1 += tmp.second;
}
}

ll change = min(cnt0, cnt1);
ans += change * a[u] * 2;
return make_pair(cnt0-change, cnt1-change);
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n;
for(int i=1;i<=n;++i){
cin>>a[i]>>b[i]>>c[i];
}

for(int i=1;i<n;++i){
int u,v;
cin>>u>>v;
g[v].push_back(u);
g[u].push_back(v);
}

a[0] = a[1];

auto res = solve(1);

if(res.first != 0 || res.second != 0){
cout<<-1<<endl;
}
else{
cout<<ans<<endl;
}
}

Exercise 1

当是ENV_TYPE_FS的时候提供读写的权限,对于普通的Env则不提供权限。

1
2
3
4
// If this is the file server (type == ENV_TYPE_FS) give it I/O privileges.
// LAB 5: Your code here.
if(type == ENV_TYPE_FS)
e->env_tf.tf_eflags |= FL_IOPL_MASK;

Exercise 2

为block分配一个页面,并且将block的内容读入。首先将addr进行对齐,这里由于page的大小和block的大小相同,所以怎样对齐都是可以的。之后利用sys_page_alloc()系统调用来进行页面分配,然后利用ide_read()进行内容的读取。注意的是ide_read()是以sector为单位进行读取的,是512个byte,同block的大小是8倍的关系,所以需要进行一个小小的处理。

1
2
3
4
5
6
7
8
9
10
11
// Allocate a page in the disk map region, read the contents
// of the block from the disk into that page.
// Hint: first round addr to page boundary. fs/ide.c has code to read
// the disk.
//
// LAB 5: you code here:
addr = ROUNDDOWN(addr, BLKSIZE);
if((r = sys_page_alloc(0, addr, PTE_U | PTE_P | PTE_W)) < 0)
panic("in bc_pgfault, sys_page_alloc: %e", r);
if((r = ide_read(blockno << 3, addr, 8)) < 0)
panic("in_bc_pgfault, ide_read: %e", r);

当必要的时候,需要将Block写回磁盘,而必要的时候就是可写页面被修改的时候,所以这里首先的判断条件就应当是当前所对应的block是被映射了的,并且是dirty的。在调用ide_write()写回之后,需要重新进行一次映射来消除掉dirty位。这样可以避免多次进行flush,访问磁盘占用大量的时间。

1
2
3
4
5
6
7
8
9
// LAB 5: Your code here.
int r;
addr = ROUNDDOWN(addr, BLKSIZE);
if(va_is_mapped(addr) && va_is_dirty(addr)){
if((r = ide_write(blockno << 3, addr, 8)) < 0)
panic("in flush_block, ide_write: %e", r);
if((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
panic("in flush block, sys_page_map: %e", r);
}

Exercise 3

首先观察free_block()函数,可以发现每一位标志着一个block的状态,其中1表示为空闲,0表示为忙碌。访问的方式如第八行所示。

1
2
3
4
5
6
7
8
9
// Mark a block free in the bitmap
void
free_block(uint32_t blockno)
{
// Blockno zero is the null pointer of block numbers.
if (blockno == 0)
panic("attempt to free zero block");
bitmap[blockno/32] |= 1<<(blockno%32);
}

那么仿照就可以完成alloc_block(),对所有的block进行一个遍历,如果发现free的就进行分配,在bitmap中标志为0,之后立即利用flush_block()将将修改写回磁盘,并且返回blockno。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Search the bitmap for a free block and allocate it.  When you
// allocate a block, immediately flush the changed bitmap block
// to disk.
//
// Return block number allocated on success,
// -E_NO_DISK if we are out of blocks.
//
// Hint: use free_block as an example for manipulating the bitmap.
int
alloc_block(void)
{
// The bitmap consists of one or more blocks. A single bitmap block
// contains the in-use bits for BLKBITSIZE blocks. There are
// super->s_nblocks blocks in the disk altogether.

// LAB 5: Your code here.
int blockno;
for(blockno = 1; blockno < super->s_nblocks; blockno++){
if(block_is_free(blockno)){
bitmap[blockno/32] &= ~(1<<(blockno%32));
flush_block(&bitmap[blockno/32]);
return blockno;
}
}
return -E_NO_DISK;
}

Exercise 4

这里的寻址实际上类似于lab2当中虚拟内存的寻址,不过在这里的设计上只有一个indirect的链接,所以实现起来相对简单很多。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
// Find the disk block number slot for the 'filebno'th block in file 'f'.
// Set '*ppdiskbno' to point to that slot.
// The slot will be one of the f->f_direct[] entries,
// or an entry in the indirect block.
// When 'alloc' is set, this function will allocate an indirect block
// if necessary.
//
// Returns:
// 0 on success (but note that *ppdiskbno might equal 0).
// -E_NOT_FOUND if the function needed to allocate an indirect block, but
// alloc was 0.
// -E_NO_DISK if there's no space on the disk for an indirect block.
// -E_INVAL if filebno is out of range (it's >= NDIRECT + NINDIRECT).
//
// Analogy: This is like pgdir_walk for files.
// Hint: Don't forget to clear any block you allocate.
static int
file_block_walk(struct File *f, uint32_t filebno, uint32_t **ppdiskbno, bool alloc)
{
// LAB 5: Your code here.
int r;

if(filebno >= NDIRECT + NINDIRECT)
return -E_INVAL;

if(filebno < NDIRECT){
*ppdiskbno = (f->f_direct) + filebno;
}
else{
if(alloc && (f->f_indirect == 0)){
if((r = alloc_block()) < 0)
return r;
memset(diskaddr(r), 0, BLKSIZE);
f->f_indirect = r;
}
else if(f->f_indirect == 0){
return -E_NOT_FOUND;
}
*ppdiskbno = ((uint32_t *)diskaddr(f->f_indirect)) + filebno - NDIRECT;
}
return 0;
}

通过file_block_walk()找到对应的pdiskbno,如果为0,那么就进行分配,否则的话利用diskaddr()转换成地址。成功就返回0。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Set *blk to the address in memory where the filebno'th
// block of file 'f' would be mapped.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_NO_DISK if a block needed to be allocated but the disk is full.
// -E_INVAL if filebno is out of range.
//
// Hint: Use file_block_walk and alloc_block.
int
file_get_block(struct File *f, uint32_t filebno, char **blk)
{
// LAB 5: Your code here.
uint32_t * pdiskbno;
int r;
if((r = file_block_walk(f, filebno, &pdiskbno, true)) < 0)
return r;
if(*pdiskbno == 0){
if((r = alloc_block()) < 0)
return r;
*pdiskbno = r;
}
*blk = (char *)diskaddr(*pdiskbno);
flush_block(*blk);
return 0;
}

Exercise 5

实际上就是针对于file_read()的一层封装,首先利用openfile_lookup()找到对应的文件,然后调用file_read()进行读入,之后调整文件的指针,并返回读入的大小。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Read at most ipc->read.req_n bytes from the current seek position
// in ipc->read.req_fileid. Return the bytes read from the file to
// the caller in ipc->readRet, then update the seek position. Returns
// the number of bytes successfully read, or < 0 on error.
int
serve_read(envid_t envid, union Fsipc *ipc)
{
struct Fsreq_read *req = &ipc->read;
struct Fsret_read *ret = &ipc->readRet;

if (debug)
cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// Lab 5: Your code here:
struct OpenFile * o;
int r;
if((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if((r = file_read(o->o_file, ret->ret_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;
o->o_fd->fd_offset += r;
return r;
}

Exercise 6

serve_write()的逻辑类似于serve_read(),具体代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Write req->req_n bytes from req->req_buf to req_fileid, starting at
// the current seek position, and update the seek position
// accordingly. Extend the file if necessary. Returns the number of
// bytes written, or < 0 on error.
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
if (debug)
cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);

// LAB 5: Your code here.
struct OpenFile *o;
int r;
if((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
if((r = file_write(o->o_file, req->req_buf, req->req_n, o->o_fd->fd_offset)) < 0)
return r;
o->o_fd->fd_offset += r;
return r;
}

devfile_write(0)同样可以仿照devfile_read()进行完成。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Write at most 'n' bytes from 'buf' to 'fd' at the current seek position.
//
// Returns:
// The number of bytes successfully written.
// < 0 on error.
static ssize_t
devfile_write(struct Fd *fd, const void *buf, size_t n)
{
// Make an FSREQ_WRITE request to the file system server. Be
// careful: fsipcbuf.write.req_buf is only so large, but
// remember that write is always allowed to write *fewer*
// bytes than requested.
// LAB 5: Your code here
int r;
fsipcbuf.write.req_fileid = fd->fd_file.id;
fsipcbuf.write.req_n = n;
memmove(fsipcbuf.write.req_buf, buf, n);
if((r = fsipc(FSREQ_WRITE, NULL)) < 0)
return r;
assert(r <= n);
assert(r <= PGSIZE);
return r;
}

Exercise 7

首先判断得到env,然后从18~20行依次为,开启中断,设置IOPL为0,将保护权限设置为3(CPL 3)。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Set envid's trap frame to 'tf'.
// tf is modified to make sure that user environments always run at code
// protection level 3 (CPL 3), interrupts enabled, and IOPL of 0.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
// LAB 5: Your code here.
// Remember to check whether the user has supplied us with a good
// address!
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
env->env_tf = *tf;
env->env_tf.tf_eflags |= FL_IF;
env->env_tf.tf_eflags &= ~FL_IOPL_MASK;
env->env_tf.tf_cs |= 3;
return 0;
panic("sys_env_set_trapframe not implemented");
}

Exercise 8

关于duppage()的修改,可以注意到直接只需要在可写页面修改为COW的时候进行PTE_SHARE的标志位判断就可以了,只在第11行处添加一个判断,其余内容同lab4中完全相同。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
static int
duppage(envid_t envid, unsigned pn)
{
int r;

// LAB 4: Your code here.
pte_t pte = uvpt[pn];
void * addr = (void *)(pn * PGSIZE);

uint32_t perm = pte & PTE_SYSCALL;
if(perm & (PTE_W | PTE_COW) && !(perm & PTE_SHARE)){
perm &= ~PTE_W;
perm |= PTE_COW;
}
if((r = sys_page_map(0, addr, envid, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
if((r = sys_page_map(0, addr, 0, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
return 0;
}

从UTEXT到UXSTACKTOP的所有共享页面将其映射复制到子进程的地址空间即可,代码类似于lab4里面的fork()

1
2
3
4
5
6
7
8
9
10
11
// Copy the mappings for shared pages into the child address space.
static int
copy_shared_pages(envid_t child)
{
// LAB 5: Your code here.
uint8_t* addr;
for(addr = (uint8_t *)UTEXT; addr <(uint8_t *)UXSTACKTOP; addr += PGSIZE)
if((uvpd[PDX(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_SHARE))
sys_page_map(0, (void *)addr, child, (void *)addr, (uvpt[PGNUM(addr)] & PTE_SYSCALL));
return 0;
}

Exercise 9

添加对应的分发入口即可:

1
2
3
4
5
6
7
8
9
10
11
// Handle keyboard and serial interrupts.
// LAB 5: Your code here.
if(tf->tf_trapno == IRQ_OFFSET + IRQ_KBD){
kbd_intr();
return;
}

if(tf->tf_trapno == IRQ_OFFSET + IRQ_SERIAL){
serial_intr();
return;
}

Exercise 10

重定向输出流的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
case '>':	// Output redirection
// Grab the filename from the argument list
if (gettoken(0, &t) != 'w') {
cprintf("syntax error: > not followed by word\n");
exit();
}
if ((fd = open(t, O_WRONLY|O_CREAT|O_TRUNC)) < 0) {
cprintf("open %s for write: %e", t, fd);
exit();
}
if (fd != 1) {
dup(fd, 1);
close(fd);
}
break;

仿照就可以完成输入流的重定向:

1
2
3
4
5
6
7
8
9
10
// LAB 5: Your code here.
if((fd = open(t, O_RDONLY)) < 0){
cprintf("open %s for read: %e", t, fd);
exit();
}
if(fd != 0){
dup(fd, 0);
close(fd);
}
break;

到这里可以达到150/150的满分:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
internal FS tests [fs/test.c]: OK (1.0s)
fs i/o: OK
check_bc: OK
check_super: OK
check_bitmap: OK
alloc_block: OK
file_open: OK
file_get_block: OK
file_flush/file_truncate/file rewrite: OK
testfile: OK (1.6s)
serve_open/file_stat/file_close: OK
file_read: OK
file_write: OK
file_read after file_write: OK
open: OK
large file: OK
spawn via spawnhello: OK (1.8s)
Protection I/O space: OK (1.6s)
PTE_SHARE [testpteshare]: OK (0.9s)
PTE_SHARE [testfdsharing]: OK (1.4s)
start the shell [icode]: Timeout! OK (31.5s)
testshell: OK (2.4s)
(Old jos.out.testshell failure log removed)
primespipe: OK (7.2s)
Score: 150/150

Challenge 1

Challenge的要求即为清空掉所有的没有被访问的页面。那么对于单个页面,只需要调用flush_block(),之后通过系统调用unmap就可以了。evict_policy()即对于所有的block做一个便利,清除所有从未被访问过的页面。具体代码内容如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// challenge
void
evict_block(void *addr){
uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
if(addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
panic("evict_block of bad va %08x", addr);

int r;
addr = ROUNDDOWN(addr, BLKSIZE);
flush_block(addr);
if((r = sys_page_unmap(0, addr)) < 0)
panic("in evict block, sys_page_unmap: %e", r);
}

void
evict_policy(){
uint32_t blockno;
for(blockno = 3; blockno < DISKSIZE / BLKSIZE; ++blockno){
if(!(uvpt[PGNUM(diskaddr(blockno))]&PTE_A)){
evict_block(diskaddr(blockno));
}
}
}

往常的论文对于训练过程中的方法使用并不是非常关注,通常只能从代码中去寻找训练过程所采用的trick。

这篇文章介绍了一些训练图像分类CNN网络的trick,使用这些方法可以显著的提高分类网络的准确性。同时更好的准确性在迁移到object detection和semantic segmentation等任务上也会有更好的效果。

image-20200509133659544

Training Procedures

利用随机梯度下降的方法来训练神经网络的基本流程如下所示,首先先对于baseline情况的超参数进行一个假定,之后再讨论优化的方法:

Efficient Training

使用更低的精度和更大的batch size可以使得训练更加高效,这里提供了一系列这样的方法,有的可以同时提升准确率和训练速度。

Large-batch training

使用更大的batch size可能会减慢训练进程。对于凸优化问题,随着batch size的上升,收敛速率会下降,在神经网络当中也有相同的情况。在同样数目的epoch情况下,用大的batch训练的模型在验证集上的效果会更差。

下面有四个启发式的方法用来解决上面提到的问题:

Linear scaling learning rate

理论上来说,从估计梯度的角度,提升batch size并不会使得得到梯度的大小期望改变,而是使得它的方差更小。所以可以对于学习率进行成比率的缩放,能够使得效果提升。

比如在当batch size为256的时候,选用0.1的学习率。

那么当采用一个更大的batch size,例如bb的时候,就将学习率设置为0.1×b/2560.1\times b/256

Learning rate warmup

在训练开始的时候,所有参数都是随机设置的,离最终结果可能会非常远,所以采用一个较大的学习率会导致数值上的不稳定,所以可以考虑在最开始使用一个很小的学习率,再慢慢地在训练较为稳定的时候切换到初始学习率。

例如,如果我们采用前mm个batch来进行warm up的话,并且学习率的初值为η\eta,那么在第ii个batch,1im1\le i \le m,就选用大小为iη/mi\eta/m的学习率。

Zero γ\gamma

ResNet是有很多个residual block堆叠出来的,每个residual block的结果可以看成是x+block(x)x+block(x),而block的最后一层通常是BN层,他会首先进行一个standardize,得到x^\hat{x},然后在进行一个缩放γx^+β\gamma \hat{x}+\beta,其中γ\gammaβ\beta都是可学习的参数,通常被初始化为1和0。

可以考虑在初始化的时候,将所有的参数γ\gamma都设置成0,这样网络的输出就和他的输入相同,这样可以缩小网络,使得在最开始的时候训练变得更简单。

No bias decay

weight decay通常被用于所有可学习的参数来防止正则化,这里的trick就是只作用于权重,而不对bias包括BN层当中的γ\gammaβ\beta做decay。

Low-precision training

通常是采用FP32来进行训练的,但是大量的TPU在FP16上面的效率会高很多,所以可以考虑采用FP16来进行训练,能够达到几倍的加速。

实验结果如上,其中Baseline的设置为BS=256BS=256,采用FP32,Efficient的设置为BS=1024BS=1024,采用FP16。

Model Tweaks

原本的ResNet网络结构如图所示:

这里不多做介绍,文章提出了三种针对ResNet修改方式,分别记作ResNet-B/C/D,能够得到一定程度上的准确率提升。

ResNet-B

第一个所做的就是改变了downsampling的方式,由于如果采用1×11\times 1卷积同时选择stride为2的话,会丢失3/43/4的信息,这里在3×33\times 3的的卷积层来downsampling,理论上所有信息都会被接受到。

ResNet-C

一个发现就是卷积计算量随卷积核大小变化是二次多项式级别的,所以利用三层3×33\times 3的卷积来替代7×77\times 7的卷积层。

ResNet-D

从ResNet-B的角度更进一步,旁路也同样会有信息丢失的问题,所以将一层的1×11 \times 1卷积修改成一层Average Pooling再加上一层卷积,减少信息的丢失。

Experiment Result

可以看到三者对于原始的ResNet结构准确率都有一定程度上的上升。

Training Refinements

Cosine Learning Rate Decay

采用最广泛的策略是指数衰减。这里提供的一种方法是利用cosine函数进行衰减,忽略掉最开始进行warmup的部分,假设一共有TT个batch,那么在第tt个batch上,学习率为:

ηt=12(1+cos(tπT))η\eta_t = \frac{1}{2}(1+\cos(\frac{t\pi}{T}))\eta

其中η\eta就是初始的学习率,对比结果如下:

image-20200509165038277

可以看到cosine decay在一开始和最终学习率下降的比较慢,在中间接近于线性的衰减。

Label Smoothing

对于分类任务,最后一层通常都是通过一个softmax函数来得到预测的概率,对于第ii类的结果qiq_i

qi=exp(zi)j=1Kexp(zj)q_i = \frac{\exp(z_i)}{\sum_{j=1}^K \exp(z_j)}

我们在学习过程当中最小化的是负的cross entropy:

l(p,q)=i=1Kpilogqil(p,q) = -\sum_{i=1}^K p_i \log q_i

pip_i只有当i=yi=y的时候为1,其余时候为0,于是就等价于:

l(p,q)=log(qy)=zy+log(i=1Kexp(zi))l(p,q) = -\log(q_y) = -z_y + \log (\sum_{i=1}^K \exp(z_i))

要达到最小化的效果应当有zy=infz_y^\star=\inf,同时使得其他种类的ziz_i尽可能小,换言之,这样会倾向于让预测结果趋于极端值,最终导致overfit。

label smoothing的思想就是将真实概率构造为:

pi={1ϵi=yϵ/(K1)otherwisep_i = \begin{cases}1-\epsilon &i=y\\\epsilon / (K-1) & \text{otherwise}\end{cases}

其中ϵ\epsilon是一个小的常量,此时的最优解为:

zi={log((K1)(1ϵ)/ϵ)+αi=yαotherwisez_i^\star = \begin{cases}\log((K-1)(1-\epsilon)/\epsilon)+\alpha & i=y\\\alpha & \text{otherwise}\end{cases}

其中α\alpha是一个实数,这会使得最终全连接层的输出会趋向于一个有限值,能够更好地泛化。

里面log((K1)(1ϵ)/ϵ)\log((K-1)(1-\epsilon)/\epsilon)的部分就是正确分类和错误分类之间的gap,它随着ϵ\epsilon的增大不断缩小,当ϵ=(K1)/K\epsilon = (K-1)/K的时候,实际上就是均匀分布,gap就为0了。

关于gap的可视化如下:

image-20200510131500469

可以发现smooth之后gap总体均值变小,并且大gap的极端值情况有效减少了。

Knowledge Distillation

知识蒸馏中,我们尝试用一个学生模型来学习高准确率的老师模型里面的知识。方法就是利用一个distillation loss来惩罚学生模型偏离老师模型的预测概率。利用pp作为真实概率,zzrr分别作为学生网络和老师网络全连接层的输出,利用负的cross entropy作为loss,总的loss可以写成:

l(p,softmax(z))+T2l(softmax(r/T),softmax(z/T))l(p,\text{softmax}(z)) + T^2l(\text{softmax}(r/T),\text{softmax}(z/T))

其中TT为温度超参数,使得softmax函数的输出变得更为平滑。

Mixup Training

作为一个数据增强的手段,每次随机选择两个样本(xi,yi)(x_i,y_i)(xj,yj)(x_j,y_j)。然后对两个样本加权来作为新的样本:

x^=λxi+(1λ)xjy^=λyi+(1λ)yj\begin{aligned} \hat{x} &= \lambda x_i + (1-\lambda)x_j \\ \hat{y} &= \lambda y_i + (1-\lambda)y_j \end{aligned}

其中λ[0,1]\lambda \in [0,1]是一个从Beta(α,α)\text{Beta}(\alpha,\alpha)中采样出来的数,在mixup中只对于新的样本(x^,y^)(\hat{x},\hat{y})来进行训练。

原文

A monopole magnet is a magnet that only has one pole, either north or south. They don’t actually exist since real magnets have two poles, but this is a programming contest problem, so we don’t care.

There is an n×mn\times m grid. Initially, you may place some north magnets and some south magnets into the cells. You are allowed to place as many magnets as you like, even multiple in the same cell.

An operation is performed as follows. Choose a north magnet and a south magnet to activate. If they are in the same row or the same column and they occupy different cells, then the north magnet moves one unit closer to the south magnet. Otherwise, if they occupy the same cell or do not share a row or column, then nothing changes. Note that the south magnets are immovable.

Each cell of the grid is colored black or white. Let’s consider ways to place magnets in the cells so that the following conditions are met.

  1. There is at least one south magnet in every row and every column.
  2. If a cell is colored black, then it is possible for a north magnet to occupy this cell after some sequence of operations from the initial placement.
  3. If a cell is colored white, then it is impossible for a north magnet to occupy this cell after some sequence of operations from the initial placement.

Determine if it is possible to place magnets such that these conditions are met. If it is possible, find the minimum number of north magnets required (there are no requirements on the number of south magnets).

Input

The first line contains two integers nn and mm (1n,m10001\le n,m\le 1000) — the number of rows and the number of columns, respectively.

The next nn lines describe the coloring. The ii-th of these lines contains a string of length mm, where the jj-th character denotes the color of the cell in row ii and column jj. The characters “#” and “.” represent black and white, respectively. It is guaranteed, that the string will not contain any other characters.

Output

Output a single integer, the minimum possible number of north magnets required.

If there is no placement of magnets that satisfies all conditions, print a single integer −1.

Examples

input

1
2
3
4
3 3
.#.
###
##.

output

1
1

input

1
2
3
4
5
4 2
##
.#
.#
##

output

1
-1

input

1
2
3
4
5
4 5
....#
####.
.###.
.#...

output

1
2

input

1
2
3
2 1
.
#

output

1
-1

input

1
2
3
4
3 5
.....
.....
.....

output

1
0

Note

In the first test, here is an example placement of magnets:

In the second test, we can show that no required placement of magnets exists. Here are three example placements that fail to meet the requirements. The first example violates rule 3 since we can move the north magnet down onto a white square. The second example violates rule 2 since we cannot move the north magnet to the bottom-left black square by any sequence of operations. The third example violates rule 1 since there is no south magnet in the first column.

In the third test, here is an example placement of magnets. We can show that there is no required placement of magnets with fewer north magnets.

In the fourth test, we can show that no required placement of magnets exists. Here are two example placements that fail to meet the requirements. The first example violates rule 1 since there is no south magnet in the first row. The second example violates rules 1 and 3 since there is no south magnet in the second row and we can move the north magnet up one unit onto a white square.

In the fifth test, we can put the south magnet in each cell and no north magnets. Because there are no black cells, it will be a correct placement.

题意

有S和N两种磁铁,可以互相吸引,每次可以选择一个S和一个N激活,使得N向S方向移动,问需要有多少N磁铁可以满足(不限制S磁铁的数量)

  • 每一行列都存在S磁铁
  • 所有黑色区域都能够被N磁铁到达
  • 所有白色区域都不可能被N磁铁到达

如果无法满足以上三条规则,那么就输出-1。

思路

首先排除所有不可能满足的情况:

  1. 一行/一列有两段黑色区域,如果可以到达必定有S磁铁吸引,那么两段中间的白色区域就是可达的。所以每一行列必定只有一段黑色区域或全白。
  2. 全白行但是没有全白列(行列对换相同),如果存在的话,由于每一行需要有S磁铁,而S磁铁放置的列是有黑色区域的,如果N磁铁到达该黑色区域,然后激活空白行处的S磁铁,会导致白色区域可达。

综上,放置磁铁的方式为将S磁铁放置在所有黑色块中和空白行列相交处。

N磁铁需要每一个连通区域放置一个,由于N磁铁不能跨区域移动,所以这种放置方式是最优的。

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
#include <bits/stdc++.h>
using namespace std;

#define ll long long
#define maxn 1010

char grid[maxn][maxn];
bool unused_row;
bool unused_col;
bool vis[maxn][maxn];
int dir[4][2] = {{-1,0},{1,0},{0,1},{0,-1}};
int n,m;

void bfs(int i, int j){
queue<pair<int,int>> q;
vis[i][j] = true;
q.push(make_pair(i,j));
while(!q.empty()){
auto temp = q.front();
i = temp.first;
j = temp.second;
q.pop();
for(int k=0;k<4;++k){
int ii = i + dir[k][0];
int jj = j + dir[k][1];
if(ii>=0 && jj>=0 && ii<n &&jj<m && grid[ii][jj] == '#' && !vis[ii][jj]){
vis[ii][jj] = true;
q.push(make_pair(ii,jj));
}
}
}
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n>>m;
for(int i=0;i<n;++i)
cin>>grid[i];

for(int i=0;i<n;++i){
int cnt = 0;
if(grid[i][0] == '#')
cnt = 1;
for(int j=1;j<m;++j)
if(grid[i][j] == '#' && grid[i][j-1] == '.')
++cnt;
if(cnt > 1){
cout<<"-1"<<endl;
return 0;
}
if(cnt == 0){
unused_row = true;
}
}

for(int j=0;j<m;++j){
int cnt = 0;
if(grid[0][j] == '#')
cnt = 1;
for(int i=1;i<n;++i)
if(grid[i][j] == '#' && grid[i-1][j] == '.')
++cnt;
if(cnt > 1){
cout<<"-1"<<endl;
return 0;
}
if(cnt == 0)
unused_col = true;
}

if(unused_row ^ unused_col){
cout<<"-1"<<endl;
return 0;
}

int cnt = 0;
for(int i=0;i<n;++i){
for(int j=0;j<m;++j){
if(grid[i][j] == '#' && !vis[i][j]){
++cnt;
bfs(i,j);
}
}
}
cout<<cnt<<endl;
}

原文

Calculate the number of ways to place nn rooks on n×nn\times n chessboard so that both following conditions are met:

  • each empty cell is under attack;
  • exactly kk pairs of rooks attack each other.

An empty cell is under attack if there is at least one rook in the same row or at least one rook in the same column. Two rooks attack each other if they share the same row or column, and there are no other rooks between them. For example, there are only two pairs of rooks that attack each other in the following picture:

One of the ways to place the rooks for n=3n=3 and k=2k=2

Two ways to place the rooks are considered different if there exists at least one cell which is empty in one of the ways but contains a rook in another way.

The answer might be large, so print it modulo 998244353.

Input

The only line of the input contains two integers nn and kk (1n2000001\le n\le 200000; 0kn(n1)20\le k\le\frac{n(n-1)}{2}).

Output

Print one integer — the number of ways to place the rooks, taken modulo 998244353.

Examples

input

1
3 2

output

1
6

input

1
3 3

output

1
0

input

1
4 0

output

1
24

input

1
1337 42

output

1
807905441

题意

rook可以对同一行和同一列攻击,提供两个数nnkk,求在n×nn\times n的棋盘上放置nn个rook使得:

  • 所有棋盘格都可以被攻击到
  • kk对rook可以互相供给

的放置方法总数。

思路

由于所有地方都需要被攻击到,那么就相当于一定是每个行都有一个rook或者每个列都有一个rook.本质上是一个排列组合问题,可以发现,行和列实际上是对称的,所以只需要考虑行列当中的一种。

假设每个行上都有一个rook,如果k0k\ne0,那么必然是nn个rook分布在nkn-k个列上。对应的排列方法一共有:

(nk)nCnknk1(nk1)n+Cnknk2(nk2)n=i=0nk(1)iCnki(nki)n(n-k)^n - C_{n-k}^{n-k-1}(n-k-1)^n+C_{n-k}^{n-k-2}(n-k-2)^n=\sum_{i=0}^{n-k}(-1)^i C_{n-k}^{i}(n-k-i)^n

从上面的分析也可以发现,如果knk\ge n,那么不存在符合的情况。再注意当k0k\ne0的时候需要,需要乘2,而k=0k=0的时候直接作为结果就可以了。

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#include <bits/stdc++.h>
using namespace std;

#define ll long long
#define MOD 998244353

#define add(x,y) ((x+y)%MOD)
#define mul(x,y) ((x*y)%MOD)
#define sub(x,y) ((MOD+x-y)%MOD)

ll fastpow(ll x, ll y){
ll temp = 1;
while(y){
if(y&1)
temp = mul(temp, x);
x = mul(x, x);
y>>=1;
}
return temp;
}

ll inv(ll x){
return fastpow(x,MOD-2);
}

ll frac[200100];

ll C(ll n, ll i){
return mul(frac[n], inv(mul(frac[i], frac[n-i])));
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

int n,k;
cin>>n>>k;

if(k>=n){
cout<<0<<endl;
return 0;
}

ll ans = 0;
frac[0] = 1;
for(int i=1;i<=n;++i){
frac[i] = mul(i, frac[i-1]);
}
for(int i=0;i<=n-k;++i){
ll temp = mul(C(n-k,i), fastpow(n-k-i,n));
if(i&1)
ans = sub(ans, temp);
else
ans = add(ans,temp);
}

ans = mul(ans, C(n, n-k));
if(k)
ans = mul(ans, 2);
cout<<ans<<endl;
}

Exercise 1

这个建立映射的函数实际上和kern/pmap.c当中的boot_alloc()存在类似的地方,都是利用一个静态的变量来保存当前分配空间的起始地方,然后不断的增长进行分配。由于base每次都会增长,所以每次都是映射新的页面。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
//
// Reserve size bytes in the MMIO region and map [pa,pa+size) at this
// location. Return the base of the reserved region. size does *not*
// have to be multiple of PGSIZE.
//
void *
mmio_map_region(physaddr_t pa, size_t size)
{
// Where to start the next region. Initially, this is the
// beginning of the MMIO region. Because this is static, its
// value will be preserved between calls to mmio_map_region
// (just like nextfree in boot_alloc).
static uintptr_t base = MMIOBASE;

uintptr_t result;

// Reserve size bytes of virtual memory starting at base and
// map physical pages [pa,pa+size) to virtual addresses
// [base,base+size). Since this is device memory and not
// regular DRAM, you'll have to tell the CPU that it isn't
// safe to cache access to this memory. Luckily, the page
// tables provide bits for this purpose; simply create the
// mapping with PTE_PCD|PTE_PWT (cache-disable and
// write-through) in addition to PTE_W. (If you're interested
// in more details on this, see section 10.5 of IA32 volume
// 3A.)
//
// Be sure to round size up to a multiple of PGSIZE and to
// handle if this reservation would overflow MMIOLIM (it's
// okay to simply panic if this happens).
//
// Hint: The staff solution uses boot_map_region.
//
// Your code here:
if(base + ROUNDUP(size, PGSIZE) >= MMIOLIM)
panic("mmio_map_region: out of memory\n");
boot_map_region(kern_pgdir, base, size, pa, PTE_PCD | PTE_PWT | PTE_W);
result = base;
base += ROUNDUP(size, PGSIZE);
return (void *)result;
}

Exercise 2

修改page_init()的内容如下,只是增加一个特殊处理:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// case 1:
pages[0].pp_ref = 1;
pages[0].pp_link = NULL;
// case 2, 3, 4:
for (i = 1; i < npages; i++) {
if((IOPHYSMEM <= i * PGSIZE && i * PGSIZE < pa_free_start) || page2pa(pages + i) == MPENTRY_PADDR)
{
pages[i].pp_ref = 1;
pages[i].pp_link = NULL;
}
else
{
pages[i].pp_ref = 0;
pages[i].pp_link = page_free_list;
page_free_list = &pages[i];
}
}

只要在第6行添加一个判断就可以了。

之后执行make qemu就可以看到如下的check都已经成功了。

1
2
3
check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!

Question 1

MPBOOTPHYS的作用是将高地址转换为低地址,使得可以在实模式下进行访问。在boot.S当中,本来就已经被链接到了低地址,不需要进行转换,而mpentry.S的代码都位于KERNBASE的上方,所以需要手动的利用MPBOOTPHYS宏进行转换。

Exercise 3

这里直接利用一个循环进行映射,由于每个栈之间是存在一个Gap的,所以只有[kstacktop_i - KSTKSIZE, kstacktop_i)部分需要进行映射。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Modify mappings in kern_pgdir to support SMP
// - Map the per-CPU stacks in the region [KSTACKTOP-PTSIZE, KSTACKTOP)
//
static void
mem_init_mp(void)
{
// Map per-CPU stacks starting at KSTACKTOP, for up to 'NCPU' CPUs.
//
// For CPU i, use the physical memory that 'percpu_kstacks[i]' refers
// to as its kernel stack. CPU i's kernel stack grows down from virtual
// address kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP), and is
// divided into two pieces, just like the single stack you set up in
// mem_init:
// * [kstacktop_i - KSTKSIZE, kstacktop_i)
// -- backed by physical memory
// * [kstacktop_i - (KSTKSIZE + KSTKGAP), kstacktop_i - KSTKSIZE)
// -- not backed; so if the kernel overflows its stack,
// it will fault rather than overwrite another CPU's stack.
// Known as a "guard page".
// Permissions: kernel RW, user NONE
//
// LAB 4: Your code here:
uint32_t i, kstacktop_i;
for(i=0, kstacktop_i=KSTACKTOP;i < NCPU; ++i, kstacktop_i -= KSTKSIZE + KSTKGAP)
boot_map_region(kern_pgdir, kstacktop_i - KSTKSIZE, KSTKSIZE, PADDR(percpu_kstacks[i]), PTE_W);
}

Exercise 4

这里实际上就是用thiscpu来替换原本的全局变量,使得本来是在lab3当中对于单CPU适用的情况可以适用于多个CPU的情形。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
// Initialize and load the per-CPU TSS and IDT
void
trap_init_percpu(void)
{
// The example code here sets up the Task State Segment (TSS) and
// the TSS descriptor for CPU 0. But it is incorrect if we are
// running on other CPUs because each CPU has its own kernel stack.
// Fix the code so that it works for all CPUs.
//
// Hints:
// - The macro "thiscpu" always refers to the current CPU's
// struct CpuInfo;
// - The ID of the current CPU is given by cpunum() or
// thiscpu->cpu_id;
// - Use "thiscpu->cpu_ts" as the TSS for the current CPU,
// rather than the global "ts" variable;
// - Use gdt[(GD_TSS0 >> 3) + i] for CPU i's TSS descriptor;
// - You mapped the per-CPU kernel stacks in mem_init_mp()
// - Initialize cpu_ts.ts_iomb to prevent unauthorized environments
// from doing IO (0 is not the correct value!)
//
// ltr sets a 'busy' flag in the TSS selector, so if you
// accidentally load the same TSS on more than one CPU, you'll
// get a triple fault. If you set up an individual CPU's TSS
// wrong, you may not get a fault until you try to return from
// user space on that CPU.
//
// LAB 4: Your code here:

// Setup a TSS so that we get the right stack
// when we trap to the kernel.
thiscpu->cpu_ts.ts_esp0 = KSTACKTOP - cpunum() * (KSTKSIZE + KSTKGAP);
thiscpu->cpu_ts.ts_ss0 = GD_KD;
thiscpu->cpu_ts.ts_iomb = sizeof(struct Taskstate);

// Initialize the TSS slot of the gdt.
gdt[(GD_TSS0 >> 3) + cpunum()] = SEG16(STS_T32A, (uint32_t) (&(thiscpu->cpu_ts)),
sizeof(struct Taskstate) - 1, 0);
gdt[(GD_TSS0 >> 3) + cpunum()].sd_s = 0;

// Load the TSS selector (like other segment selectors, the
// bottom three bits are special; we leave them 0)
ltr(GD_TSS0 + (cpunum() << 3));

// Load the IDT
lidt(&idt_pd);
}

Exercise 5

根据文档的描述在四个所需要插入大内核锁的地方进行lock_kernel()unlock_kernel()的操作。

  • In i386_init(), acquire the lock before the BSP wakes up the other CPUs.

    1
    2
    3
    4
    5
    6
    // Acquire the big kernel lock before waking up APs
    // Your code here:
    lock_kernel();

    // Starting non-boot CPUs
    boot_aps();
  • In mp_main(), acquire the lock after initializing the AP, and then call sched_yield() to start running environments on this AP.

    1
    2
    3
    4
    5
    6
    7
    // Now that we have finished some basic setup, call sched_yield()
    // to start running processes on this CPU. But make sure that
    // only one CPU can enter the scheduler at a time!
    //
    // Your code here:
    lock_kernel();
    sched_yield();
  • In trap(), acquire the lock when trapped from user mode. To determine whether a trap happened in user mode or in kernel mode, check the low bits of the tf_cs.

    1
    2
    3
    4
    5
    6
    // Trapped from user mode.
    // Acquire the big kernel lock before doing any
    // serious kernel work.
    // LAB 4: Your code here.
    lock_kernel();
    assert(curenv);
  • In env_run(), release the lock right before switching to user mode. Do not do that too early or too late, otherwise you will experience races or deadlocks.

    1
    2
    3
    lcr3(PADDR(e->env_pgdir));
    unlock_kernel();
    env_pop_tf(&(e->env_tf));

Question 2

从trapentry.S当中可以看到,在调用trap()之前(还没有获得大内核锁),这个时候就已经往内核栈中压入了寄存器信息,如果内核栈不分离的话,在这个时候切换就会造成错误。

Exercise 6

这里的调度方法实际上是一个非常暴力的轮询,如果找到了一个状态是ENV_RUNNABLE的进程那么就让他上CPU。如果找了一圈都没有找到合适的进程的话,就看起始进程,如果它本来就在当前CPU上运行的话,那么就继续运行,否则的话一个进程不能在两个CPU上同时运行,就调用sched_halt()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Choose a user environment to run and run it.
void
sched_yield(void)
{
struct Env *idle;

// Implement simple round-robin scheduling.
//
// Search through 'envs' for an ENV_RUNNABLE environment in
// circular fashion starting just after the env this CPU was
// last running. Switch to the first such environment found.
//
// If no envs are runnable, but the environment previously
// running on this CPU is still ENV_RUNNING, it's okay to
// choose that environment.
//
// Never choose an environment that's currently running on
// another CPU (env_status == ENV_RUNNING). If there are
// no runnable environments, simply drop through to the code
// below to halt the cpu.

// LAB 4: Your code here.
int start_i, i;
if(!curenv)
start_i = 0;
else
start_i = curenv->env_id;
for(i = 0; i < NENV; ++i)
if(envs[(start_i + i)%NENV].env_status == ENV_RUNNABLE)
env_run(&envs[(start_i + i)%NENV]);
if(envs[start_i%NENV].env_status == ENV_RUNNING && envs[start_i%NENV].env_cpunum == cpunum())
env_run(&envs[start_i%NENV]);
// sched_halt never returns
sched_halt();
}

syscall()当中添加新的系统调用的分发:

1
2
3
case SYS_yield:
sys_yield();
return 0;

mp_main()当中调用sched_yield()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Setup code for APs
void
mp_main(void)
{
// We are in high EIP now, safe to switch to kern_pgdir
lcr3(PADDR(kern_pgdir));
cprintf("SMP: CPU %d starting\n", cpunum());

lapic_init();
env_init_percpu();
trap_init_percpu();
xchg(&thiscpu->cpu_status, CPU_STARTED); // tell boot_aps() we're up

// Now that we have finished some basic setup, call sched_yield()
// to start running processes on this CPU. But make sure that
// only one CPU can enter the scheduler at a time!
//
// Your code here:
lock_kernel();
sched_yield();

// Remove this after you finish Exercise 6
for (;;);
}

Question 3

env_run()当中对应代码部分如下:

1
2
3
4
5
6
7
8
9
10
if(curenv != NULL && curenv->env_status == ENV_RUNNING)
curenv->env_status = ENV_RUNNABLE;
curenv = e;
e->env_status = ENV_RUNNING;
++(e->env_runs);
lcr3(PADDR(e->env_pgdir));

unlock_kernel();

env_pop_tf(&(e->env_tf));

lcr3()前后都能够正常对e进行解引用是因为,在设置env_pgdir的时候是以kern_pgdir作为模板进行修改的,e地址在两个地址空间中是映射到同一个物理地址的,所以这里进行解引用的操作不会有问题。

Question 4

保存寄存器信息的操作发生在kern/trapentry.S当中:

1
2
3
4
5
6
7
8
9
10
11
.global _alltraps
_alltraps:
pushl %ds
pushl %es
pushal
pushl $GD_KD
popl %ds
pushl $GD_KD
popl %es
pushl %esp
call trap

恢复寄存器的操作发生在kern/env.c的env_pop_tf()当中:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
void
env_pop_tf(struct Trapframe *tf)
{
// Record the CPU we are running on for user-space debugging
curenv->env_cpunum = cpunum();

asm volatile(
"\tmovl %0,%%esp\n"
"\tpopal\n"
"\tpopl %%es\n"
"\tpopl %%ds\n"
"\taddl $0x8,%%esp\n" /* skip tf_trapno and tf_errcode */
"\tiret\n"
: : "g" (tf) : "memory");
panic("iret failed"); /* mostly to placate the compiler */
}

Exercise 7

这里每一个系统调用的主要内容都不复杂,主要的是进行许多参数有效性的检查,只需要按照注释中的内容进行参数检查就可以。

sys_exofork()

传建一个子进程,在子进程中返回值为0,在父进程中返回的是子进程的id,先将子进程的状态设置成ENV_NOT_RUNNABLE之后再进行地址空间的复制之后可以会再设置成可运行的状态。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Allocate a new environment.
// Returns envid of new environment, or < 0 on error. Errors are:
// -E_NO_FREE_ENV if no free environment is available.
// -E_NO_MEM on memory exhaustion.
static envid_t
sys_exofork(void)
{
// Create the new environment with env_alloc(), from kern/env.c.
// It should be left as env_alloc created it, except that
// status is set to ENV_NOT_RUNNABLE, and the register set is copied
// from the current environment -- but tweaked so sys_exofork
// will appear to return 0.

// LAB 4: Your code here.
struct Env* e;
int ret;
if((ret = env_alloc(&e, curenv->env_id)))
return ret;
e->env_tf = curenv->env_tf;
e->env_tf.tf_regs.reg_eax = 0;
e->env_status = ENV_NOT_RUNNABLE;

return e->env_id;
}

sys_env_set_status()

就是23行处的设置env_status。但是这个系统调用只能在ENV_RUNNABLEENV_NOT_RUNNABLE当中设置。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Set envid's env_status to status, which must be ENV_RUNNABLE
// or ENV_NOT_RUNNABLE.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if status is not a valid status for an environment.
static int
sys_env_set_status(envid_t envid, int status)
{
// Hint: Use the 'envid2env' function from kern/env.c to translate an
// envid to a struct Env.
// You should set envid2env's third argument to 1, which will
// check whether the current environment has permission to set
// envid's status.

// LAB 4: Your code here.
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
if(status != ENV_RUNNABLE && status != ENV_NOT_RUNNABLE)
return -E_INVAL;
env->env_status = status;
return 0;
}

sys_page_alloc()

envid的地址空间中分配一个页面,除去类型检查之外所做的内容就是page_alloc()page_insert()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Allocate a page of memory and map it at 'va' with permission
// 'perm' in the address space of 'envid'.
// The page's contents are set to 0.
// If a page is already mapped at 'va', that page is unmapped as a
// side effect.
//
// perm -- PTE_U | PTE_P must be set, PTE_AVAIL | PTE_W may or may not be set,
// but no other bits may be set. See PTE_SYSCALL in inc/mmu.h.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if va >= UTOP, or va is not page-aligned.
// -E_INVAL if perm is inappropriate (see above).
// -E_NO_MEM if there's no memory to allocate the new page,
// or to allocate any necessary page tables.
static int
sys_page_alloc(envid_t envid, void *va, int perm)
{
// Hint: This function is a wrapper around page_alloc() and
// page_insert() from kern/pmap.c.
// Most of the new code you write should be to check the
// parameters for correctness.
// If page_insert() fails, remember to free the page you
// allocated!

// LAB 4: Your code here.
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
if((uint32_t)va >= UTOP || va != ROUNDDOWN(va, PGSIZE))
return -E_INVAL;
if((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P) || perm & (~PTE_SYSCALL))
return -E_INVAL;

struct PageInfo * pp;
if(!(pp = page_alloc(1)))
return -E_NO_MEM;
if(page_insert(env->env_pgdir, pp, va, perm))
{
page_free(pp);
return -E_NO_MEM;
}
return 0;
}

sys_page_map()

37行之前为参数的检查,39行之后为具体执行的内容,实际上完成的就是将srcenvid对应进程的地址空间中的srcva页面映射到dstenvid对应进程的地址空间中的dstva页面。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
// Map the page of memory at 'srcva' in srcenvid's address space
// at 'dstva' in dstenvid's address space with permission 'perm'.
// Perm has the same restrictions as in sys_page_alloc, except
// that it also must not grant write access to a read-only
// page.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if srcenvid and/or dstenvid doesn't currently exist,
// or the caller doesn't have permission to change one of them.
// -E_INVAL if srcva >= UTOP or srcva is not page-aligned,
// or dstva >= UTOP or dstva is not page-aligned.
// -E_INVAL is srcva is not mapped in srcenvid's address space.
// -E_INVAL if perm is inappropriate (see sys_page_alloc).
// -E_INVAL if (perm & PTE_W), but srcva is read-only in srcenvid's
// address space.
// -E_NO_MEM if there's no memory to allocate any necessary page tables.
static int
sys_page_map(envid_t srcenvid, void *srcva,
envid_t dstenvid, void *dstva, int perm)
{
// Hint: This function is a wrapper around page_lookup() and
// page_insert() from kern/pmap.c.
// Again, most of the new code you write should be to check the
// parameters for correctness.
// Use the third argument to page_lookup() to
// check the current permissions on the page.

// LAB 4: Your code here.
struct Env *srcenv, *dstenv;
if(envid2env(srcenvid, &srcenv, 1) || envid2env(dstenvid, &dstenv, 1))
return -E_BAD_ENV;
if((uint32_t)srcva >= UTOP || srcva != ROUNDDOWN(srcva, PGSIZE))
return -E_INVAL;
if((uint32_t)dstva >= UTOP || dstva != ROUNDDOWN(dstva, PGSIZE))
return -E_INVAL;
if((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P) || perm & (~PTE_SYSCALL))
return -E_INVAL;

pte_t *pte;
struct PageInfo *pp;
if(!(pp = page_lookup(srcenv->env_pgdir, srcva, &pte)))
return -E_INVAL;
if((((*pte) & PTE_W) == 0) && (perm & PTE_W))
return -E_INVAL;
return page_insert(dstenv->env_pgdir, pp, dstva, perm);
}

sys_page_unmap()

实际上就是19行处的page_remove()操作,剩下的是参数的有效性检查。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Unmap the page of memory at 'va' in the address space of 'envid'.
// If no page is mapped, the function silently succeeds.
//
// Return 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
// -E_INVAL if va >= UTOP, or va is not page-aligned.
static int
sys_page_unmap(envid_t envid, void *va)
{
// Hint: This function is a wrapper around page_remove().

// LAB 4: Your code here.
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
if((uint32_t)va >= UTOP || va != ROUNDDOWN(va, PGSIZE))
return -E_INVAL;
page_remove(env->env_pgdir, va);
return 0;
}

最后要在syscall()当中添加分发的方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
case SYS_exofork:
return sys_exofork();

case SYS_env_set_status:
return sys_env_set_status((envid_t)a1, (int)a2);

case SYS_page_alloc:
return sys_page_alloc((envid_t)a1, (void *)a2, (int) a3);

case SYS_page_map:
return sys_page_map((envid_t)a1, (void *)a2, (envid_t)a3, (void *)a4, (int)a5);

case SYS_page_unmap:
return sys_page_unmap((envid_t)a1, (void *)a2);

Exercise 8

又是一个系统调用的设置,当使用envid2env()的时候需要进行权限的检查,如果能够正常的得到env的话就设置对应的env_pgfault_upcall。同样要在syscall()当中添加新的case。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Set the page fault upcall for 'envid' by modifying the corresponding struct
// Env's 'env_pgfault_upcall' field. When 'envid' causes a page fault, the
// kernel will push a fault record onto the exception stack, then branch to
// 'func'.
//
// Returns 0 on success, < 0 on error. Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist,
// or the caller doesn't have permission to change envid.
static int
sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
// LAB 4: Your code here.
struct Env* env;
if(envid2env(envid, &env, 1))
return -E_BAD_ENV;
env->env_pgfault_upcall = func;
return 0;
//panic("sys_env_set_pgfault_upcall not implemented");
}

Exercise 9

这里关于page_fault_handler()在有env_pgfault_upcall的情况下,分为两种情况,如果本身在Exception Stack里面的话,那么需要空出一个word的大小,具体的作用在后面Exercise 10会体现。否则的话直接将结构体压在Exception Stack的底部就可以了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// LAB 4: Your code here.
if(curenv->env_pgfault_upcall){
struct UTrapframe * utf;
if(ROUNDDOWN(tf->tf_esp, PGSIZE) == UXSTACKTOP - PGSIZE)
utf = (struct UTrapframe *)((tf->tf_esp) - sizeof(struct UTrapframe) - 4);
else
utf = (struct UTrapframe *)(UXSTACKTOP - sizeof(struct UTrapframe));
user_mem_assert(curenv, (void *)utf, sizeof(struct UTrapframe), PTE_W);
utf->utf_fault_va = fault_va;
utf->utf_err = tf->tf_err;
utf->utf_regs = tf->tf_regs;
utf->utf_eip = tf->tf_eip;
utf->utf_eflags = tf->tf_eflags;
utf->utf_esp = tf->tf_esp;
curenv->env_tf.tf_eip = (uintptr_t)curenv->env_pgfault_upcall;
curenv->env_tf.tf_esp = (uintptr_t)utf;
env_run(curenv);
}
else{
// Destroy the environment that caused the fault.
cprintf("[%08x] user fault va %08x ip %08x\n",
curenv->env_id, fault_va, tf->tf_eip);
print_trapframe(tf);
env_destroy(curenv);
}

第8行为写入的权限检查,之后9-14行为struct UTrapframe整个结构体的压入,然后修改curenv里面的内容,转入env_pgfault_upcall当中执行。

如果没有env_pgfault_upcall的话,那么就执行env_destroy()的操作。

Exercise 10

Exception Stack中的结构如下所示:

1
2
3
4
5
6
7
8
9
//	trap-time esp
// trap-time eflags
// trap-time eip
// utf_regs.reg_eax
// ...
// utf_regs.reg_esi
// utf_regs.reg_edi
// utf_err (error code)
// utf_fault_va <-- %esp

补全的_pgfault_upcall的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// LAB 4: Your code here.
movl 0x28(%esp), %edi
movl 0x30(%esp), %esi
subl $4, %esi
movl %edi, (%esi)
movl %esi, 0x30(%esp)

// Restore the trap-time registers. After you do this, you
// can no longer modify any general-purpose registers.
// LAB 4: Your code here.
addl $8, %esp
popal

// Restore eflags from the stack. After you do this, you can
// no longer use arithmetic operations or anything else that
// modifies eflags.
// LAB 4: Your code here.
addl $4, %esp
popfl

// Switch back to the adjusted trap-time stack.
// LAB 4: Your code here.
popl %esp

// Return to re-execute the instruction that faulted.
// LAB 4: Your code here.
ret

这里要实现栈切换同时需要保存%eip,首先在2、3行,将%eip取出放入%edi中,%esp取出放入%esi中,之后将%esp向下延伸一个word的大小,然后把%eip填入,之后将修改后的%esp放回保存的位置。

这样最终得到的%esp所指向的栈顶第一个元素就是我们之前所保存的%eip寄存器的值,就同时完成了栈的切换和%eip的恢复。后面就是不断退栈恢复寄存器的过程了,非常简单。

这里如果是在Exception Stack当中的重复调用,由于之前确保重复调用会在每两个结构之间留下一个word大小的gap,这个空隙就可以填入%eip保证以上的upcall在重复调用的情况下也能正常工作。

Exercise 11

如果是第一次进行调用的话,那么需要进行初始化的设置,即给Exception Stack分配空间(17行),同时设置pgfault_upcall(19行)。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
//
// Set the page fault handler function.
// If there isn't one yet, _pgfault_handler will be 0.
// The first time we register a handler, we need to
// allocate an exception stack (one page of memory with its top
// at UXSTACKTOP), and tell the kernel to call the assembly-language
// _pgfault_upcall routine when a page fault occurs.
//
void
set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
{
int r;

if (_pgfault_handler == 0) {
// First time through!
// LAB 4: Your code here.
if(sys_page_alloc(0, (void *)(UXSTACKTOP - PGSIZE), PTE_U | PTE_P | PTE_W))
panic("set_pgfault_handler: page alloc fault!");
if(sys_env_set_pgfault_upcall(0, (void *)_pgfault_upcall))
panic("set_pgfault handler: set pgfault upcall failed!");
}
// Save handler pointer for assembly to call.
_pgfault_handler = handler;
}

Exercise 12

pgfault() 可以参照dumbfork.c里面的duppage(),事实上dumbfork就是全部都进行一个复制,而COW的fork()只有在写入写时复制页面的时候才会进行复制,所以这里首先进行一个检查,看是不是写入一个COW页面所产生的错误。如果是的话,就分配一个新的页面并且将整个页面的内容拷贝一份,这里如注释中所写明的利用三次系统调用实现。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
//
// Custom page fault handler - if faulting page is copy-on-write,
// map in our own private writable copy.
//
static void
pgfault(struct UTrapframe *utf)
{
void *addr = (void *) utf->utf_fault_va;
uint32_t err = utf->utf_err;
int r;

// Check that the faulting access was (1) a write, and (2) to a
// copy-on-write page. If not, panic.
// Hint:
// Use the read-only page table mappings at uvpt
// (see <inc/memlayout.h>).

// LAB 4: Your code here.
if(!((err & FEC_WR) && (uvpt[PGNUM(addr)] & PTE_COW)))
panic("pgfault: 0x%08x the fault page is not writable or copy-on-write page!", addr);

// Allocate a new page, map it at a temporary location (PFTEMP),
// copy the data from the old page to the new page, then move the new
// page to the old page's address.
// Hint:
// You should make three system calls.

// LAB 4: Your code here.
addr = ROUNDDOWN(addr, PGSIZE);
if((r = sys_page_alloc(0, PFTEMP, PTE_P|PTE_U|PTE_W)) < 0)
panic("pgfault: sys_page_alloc fail, %e", r);
memmove(PFTEMP, addr, PGSIZE);
if ((r = sys_page_map(0, PFTEMP, 0, addr, PTE_P|PTE_U|PTE_W)) < 0)
panic("pgfault: sys_page_map, %e", r);
if ((r = sys_page_unmap(0, PFTEMP)) < 0)
panic("pgfault: sys_page_unmap, %e", r);
}

这里duppage()的实现就是按照注释中的内容进行,首先判断原本的页面是不是writable或者COW的,如果是的话那么就将其perm设置成写时复制的。之后现在子进程的地址空间中进行映射,再在父进程的地址空间中进行映射。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
//
// Map our virtual page pn (address pn*PGSIZE) into the target envid
// at the same virtual address. If the page is writable or copy-on-write,
// the new mapping must be created copy-on-write, and then our mapping must be
// marked copy-on-write as well. (Exercise: Why do we need to mark ours
// copy-on-write again if it was already copy-on-write at the beginning of
// this function?)
//
// Returns: 0 on success, < 0 on error.
// It is also OK to panic on error.
//
static int
duppage(envid_t envid, unsigned pn)
{
int r;

// LAB 4: Your code here.
pte_t pte = uvpt[pn];
void * addr = (void *)(pn * PGSIZE);

uint32_t perm = pte & 0xFFF;
if(perm & (PTE_W | PTE_COW)){
perm &= ~PTE_W;
perm |= PTE_COW;
}
if((r = sys_page_map(0, addr, envid, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
if((r = sys_page_map(0, addr, 0, addr, perm & PTE_SYSCALL))<0)
panic("duppage: %e", r);
return 0;
}

fork()函数可以参照dumbfork的主体部分,由于只要赋值UTOP以下的地址空间,而Exception Stack是另外进行分配的,所以采用COW的复制方式到USTACKTOP就为止了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
//
// User-level fork with copy-on-write.
// Set up our page fault handler appropriately.
// Create a child.
// Copy our address space and page fault handler setup to the child.
// Then mark the child as runnable and return.
//
// Returns: child's envid to the parent, 0 to the child, < 0 on error.
// It is also OK to panic on error.
//
// Hint:
// Use uvpd, uvpt, and duppage.
// Remember to fix "thisenv" in the child process.
// Neither user exception stack should ever be marked copy-on-write,
// so you must allocate a new page for the child's user exception stack.
//
envid_t
fork(void)
{
// LAB 4: Your code here.
int r;
envid_t envid;
uint8_t * addr;
set_pgfault_handler(pgfault);
envid = sys_exofork();
if(envid < 0)
panic("fork: sys_exofork failed!");
if(envid == 0){
thisenv = &envs[ENVX(sys_getenvid())];
return 0;
}

for(addr = (uint8_t *)UTEXT; addr <(uint8_t *)USTACKTOP; addr += PGSIZE)
if((uvpd[PDX(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_P))
duppage(envid, PGNUM(addr));

if((r = sys_page_alloc(envid, (void *)(UXSTACKTOP-PGSIZE), PTE_W|PTE_P|PTE_U))<0)
panic("fork: sys_page_alloc failed, %e", r);

extern void _pgfault_upcall();
if((r = sys_env_set_pgfault_upcall(envid, _pgfault_upcall)))
panic("fork: sys_env_set_pgfault_upcall failed, %e", r);

if((r = sys_env_set_status(envid, ENV_RUNNABLE))<0)
panic("fork: sys_env_set_status failed, %e", r);

return envid;
}

Exercise 13

kern/trapentry.S和kern/trap.c当中由于我是用的是lab3里面challenge所描述的循环写法,这里并不需要做修改。

在kern/env.c的env_alloc()函数中设定EFLAG

1
2
3
// Enable interrupts while in user mode.
// LAB 4: Your code here.
e->env_tf.tf_eflags |= FL_IF;

sched_halt()当中所需要注意的就是取消掉sti的注释,设置IF位使得空闲CPU并不会屏蔽中断。

1
2
3
4
5
6
7
8
9
10
11
asm volatile (
"movl $0, %%ebp\n"
"movl %0, %%esp\n"
"pushl $0\n"
"pushl $0\n"
// Uncomment the following line after completing exercise 13
"sti\n"
"1:\n"
"hlt\n"
"jmp 1b\n"
: : "a" (thiscpu->cpu_ts.ts_esp0));

Exercise 14

只需要在trap_dispatch()当中添加分发的分支即可,这里需要按照注释内容在进行sched_yield()之前调用lapic_eoi()来确认中断。

1
2
3
4
5
6
7
8
// Handle clock interrupts. Don't forget to acknowledge the
// interrupt using lapic_eoi() before calling the scheduler!
// LAB 4: Your code here.
if(tf->tf_trapno == IRQ_OFFSET + IRQ_TIMER){
lapic_eoi();
sched_yield();
return;
}

Exercise 15

sys_ipc_recv()当中主要做的操作就是首先进行参数的检查,检查完了之后将其填入env当中,并且让出CPU等待发送消息的进程将其重新设置为RUNNABLE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Block until a value is ready.  Record that you want to receive
// using the env_ipc_recving and env_ipc_dstva fields of struct Env,
// mark yourself not runnable, and then give up the CPU.
//
// If 'dstva' is < UTOP, then you are willing to receive a page of data.
// 'dstva' is the virtual address at which the sent page should be mapped.
//
// This function only returns on error, but the system call will eventually
// return 0 on success.
// Return < 0 on error. Errors are:
// -E_INVAL if dstva < UTOP but dstva is not page-aligned.
static int
sys_ipc_recv(void *dstva)
{
// LAB 4: Your code here.
struct Env * env;
if(envid2env(0, &env, 0))
return -E_BAD_ENV;
if((uint32_t)dstva < UTOP && (dstva != ROUNDDOWN(dstva, PGSIZE)))
return -E_INVAL;

env->env_ipc_dstva = dstva;
env->env_ipc_recving = true;
env->env_status = ENV_NOT_RUNNABLE;
sys_yield();

return 0;
}

sys_ipc_try_send()的操作主要是对于注释里面所提到的所有可能的错误情形进行检查,当srcva < UTOP的时候,和sys_page_map()当中的处理非常相似。在最终修改接收方env里面对应的值,并且将返回值设置成0。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
// Try to send 'value' to the target env 'envid'.
// If srcva < UTOP, then also send page currently mapped at 'srcva',
// so that receiver gets a duplicate mapping of the same page.
//
// The send fails with a return value of -E_IPC_NOT_RECV if the
// target is not blocked, waiting for an IPC.
//
// The send also can fail for the other reasons listed below.
//
// Otherwise, the send succeeds, and the target's ipc fields are
// updated as follows:
// env_ipc_recving is set to 0 to block future sends;
// env_ipc_from is set to the sending envid;
// env_ipc_value is set to the 'value' parameter;
// env_ipc_perm is set to 'perm' if a page was transferred, 0 otherwise.
// The target environment is marked runnable again, returning 0
// from the paused sys_ipc_recv system call. (Hint: does the
// sys_ipc_recv function ever actually return?)
//
// If the sender wants to send a page but the receiver isn't asking for one,
// then no page mapping is transferred, but no error occurs.
// The ipc only happens when no errors occur.
//
// Returns 0 on success, < 0 on error.
// Errors are:
// -E_BAD_ENV if environment envid doesn't currently exist.
// (No need to check permissions.)
// -E_IPC_NOT_RECV if envid is not currently blocked in sys_ipc_recv,
// or another environment managed to send first.
// -E_INVAL if srcva < UTOP but srcva is not page-aligned.
// -E_INVAL if srcva < UTOP and perm is inappropriate
// (see sys_page_alloc).
// -E_INVAL if srcva < UTOP but srcva is not mapped in the caller's
// address space.
// -E_INVAL if (perm & PTE_W), but srcva is read-only in the
// current environment's address space.
// -E_NO_MEM if there's not enough memory to map srcva in envid's
// address space.
static int
sys_ipc_try_send(envid_t envid, uint32_t value, void *srcva, unsigned perm)
{
// LAB 4: Your code here.
struct Env* dstenv, * srcenv;
if(envid2env(envid, &dstenv, 0) || envid2env(0, &srcenv, 0))
return -E_BAD_ENV;
if(!dstenv->env_ipc_recving)
return -E_IPC_NOT_RECV;

dstenv->env_ipc_perm = 0;

if((uint32_t)srcva < UTOP){
pte_t *pte;
struct PageInfo *pp;
if(srcva != ROUNDDOWN(srcva, PGSIZE))
return -E_INVAL;
if((perm & (PTE_U | PTE_P)) != (PTE_U | PTE_P) || perm & (~PTE_SYSCALL))
return -E_INVAL;
if(!(pp = page_lookup(srcenv->env_pgdir, srcva, &pte)))
return -E_INVAL;
if((((*pte) & PTE_W) == 0) && (perm & PTE_W))
return -E_INVAL;
if(page_insert(dstenv->env_pgdir, pp, dstenv->env_ipc_dstva, perm))
return -E_NO_MEM;
dstenv->env_ipc_perm = perm;
}

dstenv->env_ipc_recving = false;
dstenv->env_ipc_value = value;
dstenv->env_ipc_from = srcenv->env_id;
dstenv->env_status = ENV_RUNNABLE;
dstenv->env_tf.tf_regs.reg_eax = 0;

return 0;
}

在lib/ipc.c当中要提供用户态可用的进行send和recv操作的接口。两个函数的相同之处在于如果没有传递地址映射的话,那么要讲地址设置成一个UTOP上方的值。

这里ipc_recv()只要根据返回值r进行两种情况的区分即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Receive a value via IPC and return it.
// If 'pg' is nonnull, then any page sent by the sender will be mapped at
// that address.
// If 'from_env_store' is nonnull, then store the IPC sender's envid in
// *from_env_store.
// If 'perm_store' is nonnull, then store the IPC sender's page permission
// in *perm_store (this is nonzero iff a page was successfully
// transferred to 'pg').
// If the system call fails, then store 0 in *fromenv and *perm (if
// they're nonnull) and return the error.
// Otherwise, return the value sent by the sender
//
// Hint:
// Use 'thisenv' to discover the value and who sent it.
// If 'pg' is null, pass sys_ipc_recv a value that it will understand
// as meaning "no page". (Zero is not the right value, since that's
// a perfectly valid place to map a page.)
int32_t
ipc_recv(envid_t *from_env_store, void *pg, int *perm_store)
{
// LAB 4: Your code here.
int r;

r = sys_ipc_recv(pg ? pg : (void *)UTOP);
if(r){
if(from_env_store)
*from_env_store = 0;
if(perm_store)
*perm_store = 0;
return r;
}
else{
if(from_env_store)
*from_env_store = thisenv->env_ipc_from;
if(perm_store)
*perm_store = thisenv->env_ipc_perm;
return thisenv->env_ipc_value;
}
return 0;
}

而对于ipc_send()则是通过一个循环来不断地尝试发送信息,为了防止一直占用CPU,每次循环中都会调用sys_yield()主动让出。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Send 'val' (and 'pg' with 'perm', if 'pg' is nonnull) to 'toenv'.
// This function keeps trying until it succeeds.
// It should panic() on any error other than -E_IPC_NOT_RECV.
//
// Hint:
// Use sys_yield() to be CPU-friendly.
// If 'pg' is null, pass sys_ipc_try_send a value that it will understand
// as meaning "no page". (Zero is not the right value.)
void
ipc_send(envid_t to_env, uint32_t val, void *pg, int perm)
{
// LAB 4: Your code here.
int r;
do{
sys_yield();
r = sys_ipc_try_send(to_env, val, pg ? pg : (void *)UTOP, perm);
if(r != 0 && r != -E_IPC_NOT_RECV)
panic("ipc_send: faild, %e", r);
}while(r);
}

实现完了之后lab4的基础内容就已经结束了,执行make grade可以得到如下的输出:

1
2
3
4
5
6
7
8
9
10
11
spin: OK (1.8s) 
stresssched: OK (3.2s)
sendpage: OK (0.9s)
(Old jos.out.sendpage failure log removed)
pingpong: OK (1.9s)
(Old jos.out.pingpong failure log removed)
primes: OK (9.1s)
(Old jos.out.primes failure log removed)
Part C score: 25/25

Score: 80/80

看到三部分都可以拿到全部分数。

Challenge 6: sfork()

这一个challenge所要完成的是一个共享除了栈之外所有的地址空间的fork操作,记为sfork()

首先实现了一个sduppage()函数,所做的是将父进程的地址映射给复制到子进程上,对于权限并不做修改,可以看做只是在sys_page_map()的基础上的封装。

1
2
3
4
5
6
7
8
9
10
11
12
13
static int
sduppage(envid_t envid, unsigned pn)
{
int r;

pte_t pte = uvpt[pn];
void * addr = (void *)(pn * PGSIZE);

uint32_t perm = pte & 0xFFF;
if((r = sys_page_map(0, addr, envid, addr, perm & PTE_SYSCALL))<0)
panic("sduppage: %e", r);
return 0;
}

之后就是sfork()函数的实现,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
int
sfork(void)
{
int r;
envid_t envid;
uint8_t * addr;
set_pgfault_handler(pgfault);
envid = sys_exofork();
if(envid < 0)
panic("fork: sys_exofork failed!");
if(envid == 0){
thisenv = &envs[ENVX(sys_getenvid())];
return 0;
}

bool in_stack = true;
for(addr = (uint8_t *)(USTACKTOP - PGSIZE); addr >= (uint8_t *)UTEXT; addr -= PGSIZE){
if((uvpd[PDX(addr)] & PTE_P) && (uvpt[PGNUM(addr)] & PTE_P)){
if(in_stack)
duppage(envid, PGNUM(addr));
else
sduppage(envid, PGNUM(addr));
}
else
in_stack = false;
}

if((r = sys_page_alloc(envid, (void *)(UXSTACKTOP-PGSIZE), PTE_W|PTE_P|PTE_U))<0)
panic("fork: sys_page_alloc failed, %e", r);

extern void _pgfault_upcall();
if((r = sys_env_set_pgfault_upcall(envid, _pgfault_upcall)))
panic("fork: sys_env_set_pgfault_upcall failed, %e", r);

if((r = sys_env_set_status(envid, ENV_RUNNABLE))<0)
panic("fork: sys_env_set_status failed, %e", r);

return envid;
}

可以看到与之前所实现的fork()函数的主体相对比,区别只存在16-26行。这里从栈底往下来进行duppage的操作,当超过栈顶之后,in_stack会被设置成false,之后就是共享的地址空间,全部调用sduppage()

到这里都是没有什么问题的,重点在于如何将thisenv这个全局变量能够使得不同的进程都能够得到其自身对应的Env结构体,否则的话,在sfork()的过程中,子进程会修改thisenv的指针。导致无论在父进程还是子进程当中,thisenv指向的都是子进程!

最为简单的想法就是通过sys_getenvid()系统调用得到envid,之后查找对应的Env结构体,由于两个进程共享地址空间,所以利用全局变量是不太方便的,一个简单方法是利用宏进行实现。

考虑第一种解决方案:

1
#define thisenv ((const volatile struct Env *)(&envs[ENVX(sys_getenvid())]))

这种写法可以完成需求,但是他是一个地址,而在libmain()当中以及fork()当中都有对于thisenv进行初始化的操作,这样需要进行额外的代码修改。

第二种解决方案:

1
2
extern const volatile struct Env *realenv;
#define thisenv (realenv = (const volatile struct Env *)(&envs[ENVX(sys_getenvid())])), realenv

利用逗号进行分隔,首先进行一个赋值操作,然后提供一个可以作为运算左值的对象,问题在于thisenv会被用作是cprintf()当中的参数,而逗号分隔会使得参数数量改变。

第三种解决方案:

1
2
extern const volatile struct Env *realenv;
#define thisenv ((const volatile struct Env *)*((realenv = (const volatile struct Env *)(&envs[ENVX(sys_getenvid())])), &realenv))

由于C中的逗号表达式以及赋值表达式所返回的都是值而不是对象,所以用先取地址再解引用的方式可以获得一个能作为运算左值的对象。这种方式理论上是没有问题的,但是由于当中会进行赋值操作,所以编译器会认为可能会导致结果出现偏差,会报warning。编译方式将warning视作error,所以这行不通。

最终采用的解决方案为利用一个新的指针数组存下所有Env结构体的地址,然后采用类似第一种解决方案的操作,不过得到的是一个可以作为赋值左值的对象。在inc/lib.c当中,添加关于penvs指针数组的声明,以及将thisenv作为一个宏进行声明。

1
2
3
4
extern const volatile struct Env *penvs[NENV];
extern const volatile struct Env envs[NENV];
extern const volatile struct PageInfo pages[];
# define thisenv penvs[ENVX(sys_getenvid())]

在lib/libmain.c当中声明penvs数组,并将其初始化。

1
2
3
4
5
6
7
8
9
10
const volatile struct Env * penvs[NENV];

//extern const volatile struct Env *thisenv;

void
libmain(int argc, char **argv)
{
int i;
for(i = 0; i < NENV; ++i)
penvs[i] = &envs[i];

在这样的操作下thisenv就可以完美兼容所有代码当中的情况了,不需要修改其他任何的实现。

执行pingpongs.c可以得到如下的输出:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
enabled interrupts: 1 2
[00000000] new env 00001000
[00001000] new env 00001001
i am 00001000; thisenv is 0xeec00000
send 0 from 1000 to 1001
1001 got 0 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 1 from 1001 (thisenv is 0xeec00000 1000)
1001 got 2 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 3 from 1001 (thisenv is 0xeec00000 1000)
1001 got 4 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 5 from 1001 (thisenv is 0xeec00000 1000)
1001 got 6 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 7 from 1001 (thisenv is 0xeec00000 1000)
1001 got 8 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 9 from 1001 (thisenv is 0xeec00000 1000)
[00001000] exiting gracefully
[00001000] free env 00001000
1001 got 10 from 1000 (thisenv is 0xeec0007c 1001)
[00001001] exiting gracefully
[00001001] free env 00001001

可以发现实际上两个进程确实是共享了地址空间,并且thisenv能够正确的指向进程自身了。

如果将其中的sfork()修改成fork()的话,得到的输出如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
enabled interrupts: 1 2
[00000000] new env 00001000
[00001000] new env 00001001
i am 00001000; thisenv is 0xeec00000
send 0 from 1000 to 1001
1001 got 0 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 0 from 1001 (thisenv is 0xeec00000 1000)
1001 got 1 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 1 from 1001 (thisenv is 0xeec00000 1000)
1001 got 2 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 2 from 1001 (thisenv is 0xeec00000 1000)
1001 got 3 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 3 from 1001 (thisenv is 0xeec00000 1000)
1001 got 4 from 1000 (thisenv is 0xeec0007c 1001)
1000 got 4 from 1001 (thisenv is 0xeec00000 1000)
...

不同进程中的val值是不会共享的,综上测试可以说明sfork()的实现没有问题。

原文

Denis came to Nastya and discovered that she was not happy to see him… There is only one chance that she can become happy. Denis wants to buy all things that Nastya likes so she will certainly agree to talk to him.

The map of the city where they live has a lot of squares, some of which are connected by roads. There is exactly one way between each pair of squares which does not visit any vertex twice. It turns out that the graph of the city is a tree.

Denis is located at vertex 1 at the time 0. He wants to visit every vertex at least once and get back as soon as possible.

Denis can walk one road in 1 time. Unfortunately, the city is so large that it will take a very long time to visit all squares. Therefore, Denis took a desperate step. He pulled out his pocket time machine, which he constructed in his basement. With its help, Denis can change the time to any non-negative time, which is less than the current time.

But the time machine has one feature. If the hero finds himself in the same place and at the same time twice, there will be an explosion of universal proportions and Nastya will stay unhappy. Therefore, Denis asks you to find him a route using a time machine that he will get around all squares and will return to the first and at the same time the maximum time in which he visited any square will be minimal.

Formally, Denis’s route can be represented as a sequence of pairs:{v1,t1},{v2,t2},{v3,t3},,{vk,tk}\{v_1,t_1\},\{v_2,t_2\},\{v_3,t_3\},\ldots,\{v_k,t_k\}, where viv_i is number of square, and tit_i is time in which the boy is now.

The following conditions must be met:

  • The route starts on square 1 at time 0, i.e. v1=1,t1=0v_1=1,t_1=0 and ends on the square 1, i.e. vk=1v_k=1.
  • All transitions are divided into two types:
    1. Being in the square change the time: {vi,ti}{vi+1,ti+1}:vi+1=vi\{v_i,t_i\}\rightarrow\{v_{i+1},t_{i+1}\}:v_{i+1}=v_i, 0ti+1<ti\ 0 \le t_{i+1}<t_i.
    2. Walk along one of the roads: {vi,ti}{vi+1,ti+1}\{v_i,t_i\}\rightarrow\{v_{i+1},t_{i+1}\}. Herewith, viv_i and vi+1v_{i+1} are connected by road, and ti+1=ti+1t_{i+1}=t_i+1
  • All pairs {vi,ti}\{v_i,t_i\} must be different.
  • All squares are among v1,v2,,vkv_1,v_2,\ldots,v_k.

You need to find a route such that the maximum time in any square will be minimal, that is, the route for which max(t1,t2,,tk)\max(t_1,t_2,\ldots,t_k) will be the minimum possible.

Input

The first line contains a single integer nn (1n105)(1\le n\le 10^5) — the number of squares in the city.

The next n−1 lines contain two integers uu and vv (1v,un,uv)(1\le v,u\le n,u\ne v) - the numbers of the squares connected by the road.

It is guaranteed that the given graph is a tree.

Output

In the first line output the integer kk (1k106)(1\le k\le 10^6) — the length of the path of Denis.

In the next kk lines output pairs vi,tiv_i,t_i — pairs that describe Denis’s route (as in the statement).

All route requirements described in the statements must be met.

It is guaranteed that under given restrictions there is at least one route and an answer whose length does not exceed 10610^6. If there are several possible answers, print any.

Example

input

1
2
3
4
5
5
1 2
2 3
2 4
4 5

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
13
1 0
2 1
3 2
3 1
2 2
4 3
4 1
5 2
5 1
4 2
2 3
2 0
1 1

题意

一个树形的结构,每次可以往相邻节点移动,需要花费一单位时间,或者是修改时间为任意非负值,但是不能够两次在同一时间到达同一节点({idx,t}\{idx,t\}的组合必须是惟一的)。问从根节点出发,最后回到根节点所需要的最大时间的最小的时候的遍历方法。如果有多个方法只需要输出任何一种。

思路

可以知道一个事情,遍历整棵树的最大时间,至少为当中所有节点的最大度数:maxv=1ndegv=T\max_{v=1}^n \deg v = T

考虑到任何节点,需要通过该节点degv1\deg v-1次来遍历所有邻居然后还需要再通过一次就才能够返回该节点的祖先。那么只要能够构造一种遍历的方法,使得可以满足最大时间为TT,即满足了题意。考虑对任何一个子树,到达子树uu的根节点父节点的时候为tt,子树一共有k=degu1k=\deg u -1个子节点:

情况1: t+1Tkt+1 \le T-k,在中途不需要进行时间回溯

(v,t)(u,t+1)(w1,t+2)(v,t)\rightarrow(u,t+1)\rightarrow(w_1,t+2)\rightarrow(u,t+2)\rightarrow(u,t+2) \rightarrow(wk,t+k+1)\rightarrow(w_k,t+k+1)\rightarrow(u,t+k)\rightarrow(u,t+k) \rightarrow (u,t)(u,t) \rightarrow (v,t+1)(v,t+1)

情况2:需要进行时间回溯

(v,t)(u,t+1)(w1,t+2)(v,t)\rightarrow(u,t+1)\rightarrow(w_1,t+2)\rightarrow(u,t+2)\rightarrow(u,t+2)\rightarrow\ldots\rightarrow(u,T)(u,T) (u,t)\rightarrow(u,t^\prime) \rightarrow\ldots(wk,t+k+1)\rightarrow(w_k,t+k+1) \rightarrow\ldots(u,t+k)\rightarrow(u,t+k)(u,t)(v,t+1)\rightarrow(u,t)\rightarrow(v,t+1)

在代码中backt就是返回父节点的时间,idx为当前的节点下标,t为当前时间节点,fa记录的是父节点的下标。整体进行一遍dfs遍历,按照上述的逻辑进行时间的修改。

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <bits/stdc++.h>
using namespace std;

#define ll long long

int n;
vector<vector<int> > edges;
vector<pair<int,int> > ans;
int maxdeg = 0;

void dfs(int idx, int& t,int backt = -1, int fa = -1){
ans.push_back(make_pair(idx, t));
int cnt = edges[idx].size() - (fa == -1? 0: 1);
for(int to: edges[idx]){
if(to == fa)
continue;
if(t == maxdeg){ // go back
t = backt - cnt - 1;
ans.push_back(make_pair(idx, t));
}
t++;
dfs(to, t, t, idx);
ans.push_back(make_pair(idx, t));
--cnt;
}
if(fa == -1) // root
return;
if(t >= backt){
t = backt - 1;
ans.push_back(make_pair(idx, t));
}
++t;
}

int main(){
ios::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);

cin>>n;
edges = vector<vector<int> >(n+1, vector<int>());
for(int i=1;i<n;++i){
int s, e;
cin>>s>>e;
edges[s].push_back(e);
edges[e].push_back(s);
}

for(int i=1;i<=n;++i)
maxdeg = max(maxdeg,(int)edges[i].size());

int t = 0;
dfs(1, t);
cout<< ans.size()<<endl;
for(auto temp: ans)
cout << temp.first <<' '<< temp.second <<endl;
}

原文

If the girl doesn’t go to Denis, then Denis will go to the girl. Using this rule, the young man left home, bought flowers and went to Nastya.

On the way from Denis’s house to the girl’s house is a road of nn lines. This road can’t be always crossed in one green light. Foreseeing this, the good mayor decided to place safety islands in some parts of the road. Each safety island is located after a line, as well as at the beginning and at the end of the road. Pedestrians can relax on them, gain strength and wait for a green light.

Denis came to the edge of the road exactly at the moment when the green light turned on. The boy knows that the traffic light first lights up gg seconds green, and then rr seconds red, then again gg seconds green and so on.

Formally, the road can be represented as a segment [0,n][0,n]. Initially, Denis is at point 00. His task is to get to point nn in the shortest possible time.

He knows many different integers d1,d2,,dmd_1,d_2,\ldots,d_m, where 0din0\le d_i\le n — are the coordinates of points, in which the safety islands are located. Only at one of these points, the boy can be at a time when the red light is on.

Unfortunately, Denis isn’t always able to control himself because of the excitement, so some restrictions are imposed:

  • He must always move while the green light is on because it’s difficult to stand when so beautiful girl is waiting for you. Denis can change his position by ±1\pm 1 in 11 second. While doing so, he must always stay inside the segment [0,n][0,n]
  • He can change his direction only on the safety islands (because it is safe). This means that if in the previous second the boy changed his position by +1+ 1 and he walked on a safety island, then he can change his position by ±1\pm 1. Otherwise, he can change his position only by +1+1. Similarly, if in the previous second he changed his position by 1-1, on a safety island he can change position by ±1\pm 1, and at any other point by 1- 1.
  • At the moment when the red light is on, the boy must be on one of the safety islands. He can continue moving in any direction when the green light is on.

Denis has crossed the road as soon as his coordinate becomes equal to nn.

This task was not so simple, because it’s possible that it is impossible to cross the road. Since Denis has all thoughts about his love, he couldn’t solve this problem and asked us to help him. Find the minimal possible time for which he can cross the road according to these rules, or find that it is impossible to do.

Input

The first line contains two integers nn and mm, (1n106,2mmin(n+1,104))(1\le n\le 10^6,2\le m\le \min(n+1,104)) — road width and the number of safety islands.

The second line contains mm distinct integers d1,d2,,dm (0din)d_1,d_2,\ldots,d_m\ (0\le d_i\le n) — the points where the safety islands are located. It is guaranteed that there are 00 and nn among them.

The third line contains two integers g,r (1g,r1000)g,r\ (1\le g,r\le 1000) — the time that the green light stays on and the time that the red light stays on.

Output

Output a single integer — the minimum time for which Denis can cross the road with obeying all the rules.

If it is impossible to cross the road output −1.

Examples

input

1
2
3
15 5
0 3 7 14 15
11 11

output

1
45

input

1
2
3
13 4
0 3 7 13
9 9

output

1
-1

Note

In the first test, the optimal route is:

  • for the first green light, go to 7 and return to 3. In this case, we will change the direction of movement at the point 7, which is allowed, since there is a safety island at this point. In the end, we will be at the point of 3, where there is also a safety island. The next 11 seconds we have to wait for the red light.
  • for the second green light reaches 14. Wait for the red light again.
  • for 11 second go to 15. As a result, Denis is at the end of the road.

In total, 45 seconds are obtained.

In the second test, it is impossible to cross the road according to all the rules.

大致意思

当绿灯亮起的时候可以不受限制的移动,但是绿灯结束的时候必须要停留在安全岛。只有在安全岛上可以进行方向的转换,问到达终点(第m个节点)所需要的最短时间是多少,如果不存在则输出-1。

思路

01-BFS,在同一个绿灯时间内的移动看做图上的边权重为0,利用push_front()否则的话认为边权重为1,利用push_back()。对于同一节点,距离绿灯开始时间相同的时刻,认为是图上相同节点。

总的时间复杂度为O(gm)O(g*m)

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#include <iostream>
#include <algorithm>
#include <cmath>
#include <map>
#include <queue>
#include <vector>
#include <stack>
#include <deque>
#include <set>
#include <cstring>
#include <cstdlib>
#include <cstdio>
using namespace std;

#define ll long long

int n, m;
int dist[10010];
int g, r;
int dp[10010][1010];

int main(){
ios::sync_with_stdio(0);
cin>>n>>m;
for(int i=0;i<m;++i)
cin>>dist[i];
cin>>g>>r;
sort(dist, dist+m);

deque<pair<int,int> > q;
q.push_back(make_pair(0,0));
dp[0][0] = 1;
int ans = 0x7fffffff;
while(!q.empty()){
pair<int,int> temp = q.front();
int idx = temp.first;
int val = temp.second;
q.pop_front();
for(int i=-1;i<=1;i+=2){
int new_idx = idx+i;
if(new_idx<0 || new_idx>=m)
continue;
else if(new_idx == m-1){
int d = abs(dist[idx]-dist[new_idx]);
if(val + d <= g){
ans = min(ans, (dp[idx][val]-1)*(g+r)+val+d);
}
}
else{
int d = abs(dist[idx]-dist[new_idx]);
if(val + d == g && dp[new_idx][0] == 0){
dp[new_idx][0] = dp[idx][val] + 1;
q.push_back(make_pair(new_idx, 0));
}
else if(val + d < g && dp[new_idx][val+d] == 0){
dp[new_idx][val+d] = dp[idx][val];
q.push_front(make_pair(new_idx, val+d));
}
}
}
}
cout<<((ans == 0x7fffffff)? -1: ans)<<endl;
}

Exercise 1

如同Exercise描述中所说的,类似pages分配的方法一样利用boot_alloc()就好,由于后面有env_init()函数负责结构体的初始化,这个地方不需要进行初始化的操作。

1
2
3
// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
// LAB 3: Your code here.
envs = (struct Env *) boot_alloc(sizeof(struct Env) * NENV);

同时采用相似的方法将其map到UENVS处:

1
2
3
4
5
6
7
8
9
// Map the 'envs' array read-only by the user at linear address UENVS
// (ie. perm = PTE_U | PTE_P).
// Permissions:
// - the new image at UENVS -- kernel R, user R
// - envs itself -- kernel RW, user NONE
// LAB 3: Your code here.
n = ROUNDUP(NENV*sizeof(struct Env), PGSIZE);
for (i = 0; i < n; i+=PGSIZE)
page_insert(kern_pgdir, pa2page(PADDR(envs) + i), (void *)(UENVS + i), PTE_U | PTE_P);

Exericse 2

env_init()

这里的操作实际上就是进行一个envs的初始化,最开始所有的都是空闲状态,将其插入env_free_list就可以了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
// Set up envs array
// LAB 3: Your code here.
memset(envs, 0, sizeof(struct Env) * NENV);

int i;
env_free_list = envs;
for(i = 1; i < NENV; ++i)
envs[i-1].env_link = envs + i;

// Per-CPU part of the initialization
env_init_percpu();
}

env_setup_vm()

这里可以知道,在UTOP下面应当是空白的,在UTOP上面都是相同的,所以首先对整个page进行清空,之后利用memcpy以kern_pgdir为模板,只需要进行page table的修改就可以了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
//
// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error. Errors include:
// -E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
int i;
struct PageInfo *p = NULL;

// Allocate a page for the page directory
if (!(p = page_alloc(ALLOC_ZERO)))
return -E_NO_MEM;

// Now, set e->env_pgdir and initialize the page directory.
//
// Hint:
// - The VA space of all envs is identical above UTOP
// (except at UVPT, which we've set below).
// See inc/memlayout.h for permissions and layout.
// Can you use kern_pgdir as a template? Hint: Yes.
// (Make sure you got the permissions right in Lab 2.)
// - The initial VA below UTOP is empty.
// - You do not need to make any more calls to page_alloc.
// - Note: In general, pp_ref is not maintained for
// physical pages mapped only above UTOP, but env_pgdir
// is an exception -- you need to increment env_pgdir's
// pp_ref for env_free to work correctly.
// - The functions in kern/pmap.h are handy.

// LAB 3: Your code here.
e->env_pgdir = page2kva(p);
++(p->pp_ref);
memset(e->env_pgdir, 0, PGSIZE);
memcpy(e->env_pgdir + PDX(UTOP), kern_pgdir + PDX(UTOP), PGSIZE - (PDX(UTOP)<<2));

// UVPT maps the env's own page table read-only.
// Permissions: kernel R, user R
e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;

return 0;
}

region_alloc()

这里可以仿照在pmap.c中多次实现的alloc操作,只不过这里的page是利用page_alloc()得到的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
//
// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
// LAB 3: Your code here.
// (But only if you need it for load_icode.)
//
// Hint: It is easier to use region_alloc if the caller can pass
// 'va' and 'len' values that are not page-aligned.
// You should round va down, and round (va + len) up.
// (Watch out for corner-cases!)
void *start_va = ROUNDDOWN(va, PGSIZE);
void *end_va = ROUNDUP(va + len, PGSIZE);
void *cur_va;
for(cur_va = start_va; cur_va < end_va; cur_va += PGSIZE)
{
struct PageInfo * pp = page_alloc(0);
if(!pp)
panic("region_alloc: Out of memory!\n");
page_insert(e->env_pgdir, pp, (void *)cur_va, PTE_U | PTE_W);
}
}

load_icode()

先看看bootmain()当中是怎么操作的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
void
bootmain(void)
{
struct Proghdr *ph, *eph;

// read 1st page off disk
readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

// is this a valid ELF?
if (ELFHDR->e_magic != ELF_MAGIC)
goto bad;

// load each program segment (ignores ph flags)
ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
eph = ph + ELFHDR->e_phnum;
for (; ph < eph; ph++)
// p_pa is the load address of this segment (as well
// as the physical address)
readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

// call the entry point from the ELF header
// note: does not return!
((void (*)(void)) (ELFHDR->e_entry))();

bad:
outw(0x8A00, 0x8A00);
outw(0x8A00, 0x8E00);
while (1)
/* do nothing */;
}

仿照第13-19行进行下面的code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
//
// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk. Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
// - How might load_icode fail? What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
// Hints:
// Load each program segment into virtual memory
// at the address specified in the ELF segment header.
// You should only load segments with ph->p_type == ELF_PROG_LOAD.
// Each segment's virtual address can be found in ph->p_va
// and its size in memory can be found in ph->p_memsz.
// The ph->p_filesz bytes from the ELF binary, starting at
// 'binary + ph->p_offset', should be copied to virtual address
// ph->p_va. Any remaining memory bytes should be cleared to zero.
// (The ELF header should have ph->p_filesz <= ph->p_memsz.)
// Use functions from the previous lab to allocate and map pages.
//
// All page protection bits should be user read/write for now.
// ELF segments are not necessarily page-aligned, but you can
// assume for this function that no two segments will touch
// the same virtual page.
//
// You may find a function like region_alloc useful.
//
// Loading the segments is much simpler if you can move data
// directly into the virtual addresses stored in the ELF binary.
// So which page directory should be in force during
// this function?
//
// You must also do something with the program's entry point,
// to make sure that the environment starts executing there.
// What? (See env_run() and env_pop_tf() below.)

// LAB 3: Your code here.
struct Elf * ELFHDR = (struct Elf *)binary;
struct Proghdr * ph, * eph;
lcr3(PADDR(e->env_pgdir));
ph = (struct Proghdr *)(binary + ELFHDR->e_phoff);
eph = ph + ELFHDR->e_phnum;
for(; ph < eph; ++ph)
{
if(ph->p_type == ELF_PROG_LOAD)
{
region_alloc(e, (void *)ph->p_va, ph->p_memsz);
memset((void *)ph->p_va, 0, ph->p_memsz);
memcpy((void *)ph->p_va, binary + ph->p_offset, ph->p_filesz);
}
}
e->env_tf.tf_eip = ELFHDR->e_entry;

lcr3(PADDR(kern_pgdir));
// Now map one page for the program's initial stack
// at virtual address USTACKTOP - PGSIZE.

// LAB 3: Your code here.
region_alloc(e, (void *)(USTACKTOP - PGSIZE), PGSIZE);
}

env_create()

直接调用env_alloc()load_icode()就可以了,可以看做是在上面的一层封装:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
//
// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
// LAB 3: Your code here.
struct Env * e;
if(env_alloc(&e, 0))
panic("env_create: env alloc failed!\n");
load_icode(e, binary);
e->env_type = type;
}

env_run()

要完成的是进行一个上下文的切换,这里主要做的就是首先对于env需要进行状态的改变,之后需要进行地址空间的切换。同时利用已经存在的env_pop_tf()函数来进行寄存器的恢复,具体的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
//
// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
// Step 1: If this is a context switch (a new environment is running):
// 1. Set the current environment (if any) back to
// ENV_RUNNABLE if it is ENV_RUNNING (think about
// what other states it can be in),
// 2. Set 'curenv' to the new environment,
// 3. Set its status to ENV_RUNNING,
// 4. Update its 'env_runs' counter,
// 5. Use lcr3() to switch to its address space.
// Step 2: Use env_pop_tf() to restore the environment's
// registers and drop into user mode in the
// environment.

// Hint: This function loads the new environment's state from
// e->env_tf. Go back through the code you wrote above
// and make sure you have set the relevant parts of
// e->env_tf to sensible values.

// LAB 3: Your code here.

if(curenv != NULL && curenv->env_status == ENV_RUNNING)
curenv->env_status = ENV_RUNNABLE;
curenv = e;
e->env_status = ENV_RUNNING;
++(e->env_runs);
lcr3(PADDR(e->env_pgdir));

env_pop_tf(&(e->env_tf));

//panic("env_run not yet implemented");
}

gdb对hello进行断点调试

在obj/user/hello.asm里面可以看到

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void
sys_cputs(const char *s, size_t len)
{
800a1c: 55 push %ebp
800a1d: 89 e5 mov %esp,%ebp
800a1f: 57 push %edi
800a20: 56 push %esi
800a21: 53 push %ebx
//
// The last clause tells the assembler that this can
// potentially change the condition codes and arbitrary
// memory locations.

asm volatile("int %1\n"
800a22: b8 00 00 00 00 mov $0x0,%eax
800a27: 8b 4d 0c mov 0xc(%ebp),%ecx
800a2a: 8b 55 08 mov 0x8(%ebp),%edx
800a2d: 89 c3 mov %eax,%ebx
800a2f: 89 c7 mov %eax,%edi
800a31: 89 c6 mov %eax,%esi
800a33: cd 30 int $0x30

对应的地址为0x800a33,利用gdb进行断点设置:

1
2
3
4
5
6
7
(gdb) b *0x800a33
Breakpoint 2 at 0x800a33
(gdb) c
Continuing.
=> 0x800a33: int $0x30

Breakpoint 2, 0x00800a33 in ?? ()

发现确实执行到了这一条指令,以上的实现应该是没有问题。

Exercise 3

内容为阅读Chapter 9,是关于Exceptions和Interrupts的内容。

Exercise 4

从inc/trap.h当中可以发现,TrapFrame有着如下的结构:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
struct Trapframe {
struct PushRegs tf_regs;
uint16_t tf_es;
uint16_t tf_padding1;
uint16_t tf_ds;
uint16_t tf_padding2;
uint32_t tf_trapno;
/* below here defined by x86 hardware */
uint32_t tf_err;
uintptr_t tf_eip;
uint16_t tf_cs;
uint16_t tf_padding3;
uint32_t tf_eflags;
/* below here only when crossing rings, such as from user to kernel */
uintptr_t tf_esp;
uint16_t tf_ss;
uint16_t tf_padding4;
} __attribute__((packed));

可以知道对于剩下的就是要保存%es和%ds,来使得最终结构为一个Trapframe,剩下的按照Exercise的描述操作就可以了,得到_alltraps的结构如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/*
* Lab 3: Your code here for _alltraps
*/
.global _alltraps
_alltraps:
pushl %ds
pushl %es
pushal
pushl $GD_KD
popl %ds
pushl $GD_KD
popl %es
pushl %esp
call trap

利用已经存在的TRAPHANDLERTRAPHANDLER_NOEC宏可以来生成handler的入口,只需要区分有没有错误码就可以了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/*
* Lab 3: Your code here for generating entry points for the different traps.
*/

TRAPHANDLER_NOEC(handler_divide, T_DIVIDE)
TRAPHANDLER_NOEC(handler_debug, T_DEBUG)
TRAPHANDLER_NOEC(handler_nmi, T_NMI)
TRAPHANDLER_NOEC(handler_brkpt, T_BRKPT)
TRAPHANDLER_NOEC(handler_oflow, T_OFLOW)
TRAPHANDLER_NOEC(handler_bound, T_BOUND)
TRAPHANDLER_NOEC(handler_illop, T_ILLOP)
TRAPHANDLER_NOEC(handler_device, T_DEVICE)
TRAPHANDLER(handler_dblflt, T_DBLFLT)
TRAPHANDLER(handler_tss, T_TSS)
TRAPHANDLER(handler_segnp, T_SEGNP)
TRAPHANDLER(handler_stack, T_STACK)
TRAPHANDLER(handler_gpflt, T_GPFLT)
TRAPHANDLER(handler_pgflt, T_PGFLT)
TRAPHANDLER_NOEC(handler_fperr, T_FPERR)
TRAPHANDLER(handler_align, T_ALIGN)
TRAPHANDLER_NOEC(handler_mchk, T_MCHK)
TRAPHANDLER_NOEC(handler_simderr, T_SIMDERR)
TRAPHANDLER_NOEC(handler_syscall, T_SYSCALL)
TRAPHANDLER_NOEC(handler_default, T_DEFAULT)

通过查询80386手册的9.10可以看到如下关于error code的总结:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Description                       Interrupt     Error Code
Number

Divide error 0 No
Debug exceptions 1 No
Breakpoint 3 No
Overflow 4 No
Bounds check 5 No
Invalid opcode 6 No
Coprocessor not available 7 No
System error 8 Yes (always 0)
Coprocessor Segment Overrun 9 No
Invalid TSS 10 Yes
Segment not present 11 Yes
Stack exception 12 Yes
General protection fault 13 Yes
Page fault 14 Yes
Coprocessor error 16 No
Two-byte SW interrupt 0-255 No

在inc/mmu.h当中可以看到有关SETGATE的描述:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Set up a normal interrupt/trap gate descriptor.
// - istrap: 1 for a trap (= exception) gate, 0 for an interrupt gate.
// see section 9.6.1.3 of the i386 reference: "The difference between
// an interrupt gate and a trap gate is in the effect on IF (the
// interrupt-enable flag). An interrupt that vectors through an
// interrupt gate resets IF, thereby preventing other interrupts from
// interfering with the current interrupt handler. A subsequent IRET
// instruction restores IF to the value in the EFLAGS image on the
// stack. An interrupt through a trap gate does not change IF."
// - sel: Code segment selector for interrupt/trap handler
// - off: Offset in code segment for interrupt/trap handler
// - dpl: Descriptor Privilege Level -
// the privilege level required for software to invoke
// this interrupt/trap gate explicitly using an int instruction.
#define SETGATE(gate, istrap, sel, off, dpl) \
{ \
(gate).gd_off_15_0 = (uint32_t) (off) & 0xffff; \
(gate).gd_sel = (sel); \
(gate).gd_args = 0; \
(gate).gd_rsv1 = 0; \
(gate).gd_type = (istrap) ? STS_TG32 : STS_IG32; \
(gate).gd_s = 0; \
(gate).gd_dpl = (dpl); \
(gate).gd_p = 1; \
(gate).gd_off_31_16 = (uint32_t) (off) >> 16; \
}

之后再trap_init()当中进行这样的填充,要注意到断点和系统调用的dpl需要设置为3(用户):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// LAB 3: Your code here.
void handler_divide();
SETGATE(idt[T_DIVIDE], 0, GD_KT, handler_divide, 0);
void handler_debug();
SETGATE(idt[T_DEBUG], 0, GD_KT, handler_debug, 0);
void handler_nmi();
SETGATE(idt[T_NMI], 0, GD_KT, handler_nmi, 0);
void handler_brkpt();
SETGATE(idt[T_BRKPT], 0, GD_KT, handler_brkpt, 3);
void handler_oflow();
SETGATE(idt[T_OFLOW], 0, GD_KT, handler_oflow, 0);
void handler_bound();
SETGATE(idt[T_BOUND], 0, GD_KT, handler_bound, 0);
void handler_illop();
SETGATE(idt[T_ILLOP], 0, GD_KT, handler_illop, 0);
void handler_device();
SETGATE(idt[T_DEVICE], 0, GD_KT, handler_device, 0);
void handler_dblflt();
SETGATE(idt[T_DBLFLT], 0, GD_KT, handler_dblflt, 0);
void handler_tss();
SETGATE(idt[T_TSS], 0, GD_KT, handler_tss, 0);
void handler_segnp();
SETGATE(idt[T_SEGNP], 0, GD_KT, handler_segnp, 0);
void handler_stack();
SETGATE(idt[T_STACK], 0, GD_KT, handler_stack, 0);
void handler_gpflt();
SETGATE(idt[T_GPFLT], 0, GD_KT, handler_gpflt, 0);
void handler_pgflt();
SETGATE(idt[T_PGFLT], 0, GD_KT, handler_pgflt, 0);
void handler_fperr();
SETGATE(idt[T_FPERR], 0, GD_KT, handler_fperr, 0);
void handler_align();
SETGATE(idt[T_ALIGN], 0, GD_KT, handler_align, 0);
void handler_mchk();
SETGATE(idt[T_MCHK], 0, GD_KT, handler_mchk, 0);
void handler_simderr();
SETGATE(idt[T_SIMDERR], 0, GD_KT, handler_simderr, 0);
void handler_syscall();
SETGATE(idt[T_SYSCALL], 1, GD_KT, handler_syscall, 3);
void handler_default();
SETGATE(idt[T_DEFAULT], 0, GD_KT, handler_default, 0);

利用make grade可以得到下面的输出:

1
2
3
4
5
6
7
divzero: OK (1.8s)
(Old jos.out.divzero failure log removed)
softint: OK (1.7s)
(Old jos.out.softint failure log removed)
badsegment: OK (2.1s)
(Old jos.out.badsegment failure log removed)
Part A score: 30/30

说明这里Part A的实现没有问题。

Question

  1. 如果所有的exception/interrupt都通过同样一个handler,那么就没有办法知道是通过哪一个中断进来的,不能设置对应的中断号,后面不能进行分发。

  2. 除了系统调用门,其他的特权级都设置成0,这里int $14本来应当触发page fault,但是这个时候权限不对,所以会触发general protection fault。如果允许他能够触发page fault的话,那么者会造成安全隐患。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    [00000000] new env 00001000
    Incoming TRAP frame at 0xefffffbc
    TRAP frame at 0xf0226000
    edi 0x00000000
    esi 0x00000000
    ebp 0xeebfdfd0
    oesp 0xefffffdc
    ebx 0x00000000
    edx 0x00000000
    ecx 0x00000000
    eax 0x00000000
    es 0x----0023
    ds 0x----0023
    trap 0x0000000d General Protection
    err 0x00000072
    eip 0x00800037
    cs 0x----001b
    flag 0x00000046
    esp 0xeebfdfd0
    ss 0x----0023
    [00001000] free env 00001000

    当允许触发page fault的时候,可以看到保存的内容如下

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    [00000000] new env 00001000
    Incoming TRAP frame at 0xefffffc0
    TRAP frame at 0xefffffc0
    edi 0x00000000
    esi 0x00000000
    ebp 0xeebfdfd0
    oesp 0xefffffe0
    ebx 0x00000000
    edx 0x00000000
    ecx 0x00000000
    eax 0x00000000
    es 0x----0023
    ds 0x----0023
    trap 0x0000000e Page Fault
    cr2 0x00000000
    err 0x00800039 [kernel, read, protection]
    eip 0x0000001b
    cs 0x----0046
    flag 0xeebfdfd0
    esp 0x00000023
    ss 0x----ff53
    [00001000] free env 00001000

    在这里我的想法是第15行往后全部进行了一位的上移,可以看到之后的err当中的内容实际上应该是eip,eip的内容实际上是cs,以此类推。应该是cr2或者err code没有进行压入导致的。

Exercise 5

只需要在trap_dispatch()当中添加一个分支即可:

1
2
3
4
5
if(tf->tf_trapno == T_PGFLT)
{
page_fault_handler(tf);
return;
}

Exercise 6

和Exercise 5相同,只需要在trap_dispatch()里面添加一个分支:

1
2
3
4
5
if(tf->tf_trapno == T_BRKPT)
{
monitor(tf);
return;
}

Question

  1. 在于前面Exercise 4中的设置:

    1
    SETGATE(idt[T_BRKPT], 0, GD_KT, handler_brkpt, 3);

    这里当最后的dpl设置为3的时候,会正确的触发为break point exception,当设置为0的时候,会触发为general protection fault。其原因在于,如果设置为0,会导致断点触发需要内核级的权限,因为权限不够从而触发GPF。

  2. 这个测试的目的主要是检查权限是否设置正确,需要正确的区分用户和内核,防止用户对于内核代码进行操作产生安全隐患。

Exercise 7

在kern/trap.c里面,同之前两个Exercise一样进行分发的设置:

1
2
3
4
5
6
7
if(tf->tf_trapno == T_SYSCALL)
{
tf->tf_regs.reg_eax = syscall(tf->tf_regs.reg_eax,
tf->tf_regs.reg_edx, tf->tf_regs.reg_ecx, tf->tf_regs.reg_ebx,
tf->tf_regs.reg_edi, tf->tf_regs.reg_esi);
return;
}

在kern/syscall.c当中,利用switch进行分发即可,注意不同系统调用的参数就可以了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
// Call the function corresponding to the 'syscallno' parameter.
// Return any appropriate return value.
// LAB 3: Your code here.

//panic("syscall not implemented");

switch (syscallno) {
case SYS_cputs:
sys_cputs((const char *)a1, (size_t)a2);
return 0;

case SYS_cgetc:
return sys_cgetc();

case SYS_getenvid:
return sys_getenvid();

case SYS_env_destroy:
return sys_env_destroy((envid_t)a1);

case NSYSCALLS:
return 0;

default:
return -E_INVAL;
}
}

Exercise 8

在lib/libmain.c当中,进行env_id的指定:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
void
libmain(int argc, char **argv)
{
// set thisenv to point at our Env structure in envs[].
// LAB 3: Your code here.
envid_t envid = sys_getenvid();
thisenv = &envs[ENVX(envid)];

// save the name of the program so that panic() can use it
if (argc > 0)
binaryname = argv[0];

// call user main routine
umain(argc, argv);

// exit gracefully
exit();
}

Exercise 9

在kern/trap.c当中的page_fault_handler()函数当中,利用tf_cs来判断是不是在kernel-mode,如果是直接触发一个panic:

1
2
3
4
// Handle kernel-mode page faults.
// LAB 3: Your code here.
if(((tf->tf_cs)&3) == 0)
panic("page fault: happen in kernel mode! %08x\n", tf->tf_cs);

在kern/pmap.c当中,采用一个for循环对虚拟地址区间进行权限的检查,具体内容遵循注释就可以。当检查没有问题的时候返回值为0,否则返回值为-E_FAULT。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
//
// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range. You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission. These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
// LAB 3: Your code here.
int newperm = perm | PTE_P;
uint32_t cur_addr;
pte_t * pte;
for(cur_addr = (uint32_t)va; cur_addr < (uint32_t)(va + len); cur_addr = ROUNDDOWN((cur_addr+PGSIZE),PGSIZE))
{
if(cur_addr >= ULIM)
{
user_mem_check_addr = cur_addr;
return -E_FAULT;
}
pte = pgdir_walk(env->env_pgdir, (void *)cur_addr, 0);
if((!pte) || ((*pte) & newperm) != newperm){
user_mem_check_addr = cur_addr;
return -E_FAULT;
}
}
return 0;
}

需要注意的是要在kern/syscall.c当中需要填充上有关检查的部分!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Print a string to the system console.
// The string is exactly 'len' characters long.
// Destroys the environment on memory errors.
static void
sys_cputs(const char *s, size_t len)
{
// Check that the user has permission to read memory [s, s+len).
// Destroy the environment if not.

// LAB 3: Your code here.
user_mem_assert(curenv, s, len, PTE_W);

// Print the string supplied by the user.
cprintf("%.*s", len, s);
}

如果这里没有进行user_mem_assert()的话,执行buggyhello会进入系统调用然后在内核态触发page fault。

之后为backtrace相关的内容,在kern/kdebug.c当中添加有关usd,stabs,stabstr的检查如下,这里注意user_mem_check()当正常的时候返回值为0:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
else {
// The user-application linker script, user/user.ld,
// puts information about the application's stabs (equivalent
// to __STAB_BEGIN__, __STAB_END__, __STABSTR_BEGIN__, and
// __STABSTR_END__) in a structure located at virtual address
// USTABDATA.
const struct UserStabData *usd = (const struct UserStabData *) USTABDATA;

// Make sure this memory is valid.
// Return -1 if it is not. Hint: Call user_mem_check.
// LAB 3: Your code here.
if(user_mem_check(curenv, (void *)usd, sizeof(struct UserStabData), PTE_U))
return -1;

stabs = usd->stabs;
stab_end = usd->stab_end;
stabstr = usd->stabstr;
stabstr_end = usd->stabstr_end;

// Make sure the STABS and string table memory is valid.
// LAB 3: Your code here.
if(user_mem_check(curenv, (void *)stabs, (uint32_t)stabs-(uint32_t)stab_end, PTE_U))
return -1;
if(user_mem_check(curenv, (void *)stabstr, (uint32_t)stabstr_end-(uint32_t)stabstr, PTE_U))
return -1;
}

在执行breakpoint之后,利用backtrace得到的结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
K> backtrace
Stack backtrace:
ebp efffff10 eip f01010d6 args 00000001 efffff28 f0228000 00000000 f01e6a40
kern/monitor.c:448: monitor+260
ebp efffff80 eip f01048ca args f0228000 efffffbc 00000000 00000000 00000000
kern/trap.c:195: trap+180
ebp efffffb0 eip f0104a43 args efffffbc 00000000 00000000 eebfdfd0 efffffdc
kern/trapentry.S:85: <unknown>+0
ebp eebfdfd0 eip 0080007b args 00000000 00000000 00000000 00000000 00000000
lib/libmain.c:27: libmain+63
ebp eebfdff0 eip 00800031 args 00000000 00000000Incoming TRAP frame at 0xeffffec
kernel panic at kern/trap.c:270: page fault: happen in kernel mode! 00000008

可以看到最后为lib/libmain.c,并且最终在内核态发生了page fault。可以发现,这个地方args在输出到第三个参数的时候突然触发,那应该是从ebp向上读取args触发的page fault。

结合mom_backtrace()的实现如下,应该是在13行的语句处出现的错误:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
cprintf("Stack backtrace:\n");
uint32_t* ebp = (uint32_t*)read_ebp();
struct Eipdebuginfo info;
while(ebp){
cprintf("ebp %08x ",ebp);
cprintf("eip %08x ",ebp[1]);
cprintf("args");
int i;
for(i=2;i<=6;++i)
cprintf(" %08x",ebp[i]);
cprintf("\n");

Exercise 10

运行evilhello可以看到如下的输出:

1
2
3
4
5
6
7
[00000000] new env 00001000
Incoming TRAP frame at 0xefffffbc
Incoming TRAP frame at 0xefffffbc
[00001000] user_mem_check assertion failure for va f010000c
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!
Welcome to the JOS kernel monitor!

用户环境被销毁了,并且kernel没有panic,说明行为符合预期。

使用make grade命令可以得到如下结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
divzero: OK (1.4s)
softint: OK (1.4s)
badsegment: OK (2.0s)
Part A score: 30/30

faultread: OK (1.9s)
faultreadkernel: OK (1.6s)
faultwrite: OK (0.9s)
faultwritekernel: OK (1.6s)
breakpoint: OK (2.0s)
testbss: OK (2.1s)
hello: OK (1.8s)
buggyhello: OK (1.7s)
buggyhello2: OK (0.8s)
evilhello: OK (1.6s)
Part B score: 50/50

Score: 80/80

说明lab3的内容已经被完成了。

Challenge 1

参考了github上https://github.com/SimpCosm/6.828/tree/master/lab3的实现。

其中TRAPHANDLER和TRAPHANDLER_NOEC的主要区别就在于有没有压入error code,这里采用一个if语句来进行判断:

1
2
3
4
5
6
7
8
9
10
11
12
13
#define GENERALHANDLER(name, num)	\
.data; \
.long name; \
.text; \
.globl name; \
.type name, @function; \
.align 2; \
name: \
.if !(num == 8 || (num >= 10 && num <= 14) || num == 17 ); \
pushl $0; \
.endif; \
pushl $(num); \
jmp _alltraps

之后构建一个数组vectors用来保存相关函数,就可以采用脚本语言批量生成重复代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
.data
.globl vectors
vectors:
GENERALHANDLER(handler0, 0)
GENERALHANDLER(handler1, 1)
GENERALHANDLER(handler2, 2)
GENERALHANDLER(handler3, 3)
GENERALHANDLER(handler4, 4)
GENERALHANDLER(handler5, 5)
GENERALHANDLER(handler6, 6)
GENERALHANDLER(handler7, 7)
GENERALHANDLER(handler8, 8)
GENERALHANDLER(handler9, 9)
GENERALHANDLER(handler10, 10)
GENERALHANDLER(handler11, 11)
GENERALHANDLER(handler12, 12)
GENERALHANDLER(handler13, 13)
GENERALHANDLER(handler14, 14)
GENERALHANDLER(handler15, 15)
GENERALHANDLER(handler16, 16)
GENERALHANDLER(handler17, 17)
GENERALHANDLER(handler18, 18)
GENERALHANDLER(handler19, 19)
GENERALHANDLER(handler20, 20)
GENERALHANDLER(handler21, 21)
GENERALHANDLER(handler22, 22)
GENERALHANDLER(handler23, 23)
GENERALHANDLER(handler24, 24)
GENERALHANDLER(handler25, 25)
GENERALHANDLER(handler26, 26)
GENERALHANDLER(handler27, 27)
GENERALHANDLER(handler28, 28)
GENERALHANDLER(handler29, 29)
GENERALHANDLER(handler30, 30)
GENERALHANDLER(handler31, 31)
GENERALHANDLER(handler32, 32)
GENERALHANDLER(handler33, 33)
GENERALHANDLER(handler34, 34)
GENERALHANDLER(handler35, 35)
GENERALHANDLER(handler36, 36)
GENERALHANDLER(handler37, 37)
GENERALHANDLER(handler38, 38)
GENERALHANDLER(handler39, 39)
GENERALHANDLER(handler40, 40)
GENERALHANDLER(handler41, 41)
GENERALHANDLER(handler42, 42)
GENERALHANDLER(handler43, 43)
GENERALHANDLER(handler44, 44)
GENERALHANDLER(handler45, 45)
GENERALHANDLER(handler46, 46)
GENERALHANDLER(handler47, 47)
GENERALHANDLER(handler48, 48)
GENERALHANDLER(handler49, 49)
GENERALHANDLER(handler50, 50)
GENERALHANDLER(handler51, 51)
GENERALHANDLER(handler52, 52)
GENERALHANDLER(handler53, 53)

之后就可以在kern/trap.c中对trap_init()采用循环构建入口,节省大量代码,对于特殊的可以单独提出来进行构造:

1
2
3
4
5
int i;
for(i = 0; i < 54; ++i)
SETGATE(idt[i], 0, GD_KT, vectors[i], 0);
SETGATE(idt[T_BRKPT], 0, GD_KT, vectors[T_BRKPT], 3);
SETGATE(idt[T_SYSCALL], 1, GD_KT, vectors[T_SYSCALL], 3);

在完成以上的修改之后,通过make grade仍然可以得到80分结果,说明没有问题:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
divzero: OK (1.4s)
softint: OK (1.4s)
badsegment: OK (1.6s)
Part A score: 30/30

faultread: OK (0.9s)
faultreadkernel: OK (1.5s)
faultwrite: OK (2.0s)
faultwritekernel: OK (1.6s)
breakpoint: OK (0.9s)
(Old jos.out.breakpoint failure log removed)
testbss: OK (1.5s)
hello: OK (1.6s)
buggyhello: OK (1.0s)
buggyhello2: OK (1.4s)
evilhello: OK (1.6s)
Part B score: 50/50

Score: 80/80

Challenge 2

Intel手册中12.3.1.4节为关于单步调试的相关内容:

This debug condition occurs at the end of an instruction if the trap flag (TF) of the flags register held the value one at the beginning of that instruction. Note that the exception does not occur at the end of an instruction that sets TF. For example, if POPF is used to set TF, a single-step trap does not occur until after the instruction that follows POPF.

意思就是设置了TF之后,执行完下一条命令会触发一个DEBUG。于是可以照如下写continue和stepi指令:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int
mon_continue(int argc, char **argv, struct Trapframe *tf)
{
if(!tf)
return 0;
tf->tf_eflags &= ~FL_TF;
return -1;
}

int
mon_stepi(int argc, char **argv, struct Trapframe *tf)
{
if(!tf)
return 0;
tf->tf_eflags |= FL_TF;
return -1;
}

这里需要注意第六行,如果在continue里面不进行eflags维护单纯返回的话,会导致在执行了stepi指令之后,continue指令无效的情况。

同时为了使得stepi触发DEBUG之后能够回到monitor,需要在trap_dispatch()当中添加关于T_DEBUG的分发:

1
2
3
4
5
if(tf->tf_trapno == T_BRKPT || tf->tf_trapno == T_DEBUG)
{
monitor(tf);
return;
}

此时利用continue可以在断点程序之后继续执行,效果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
[00000000] new env 00001000
Incoming TRAP frame at 0xefffffbc
Incoming TRAP frame at 0xefffffbc
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
TRAP frame at 0xf0228000
edi 0x00000000
esi 0x00000000
ebp 0xeebfdfd0
oesp 0xefffffdc
ebx 0x00000000
edx 0x00000000
ecx 0x00000000
eax 0xeec00000
es 0x----0023
ds 0x----0023
trap 0x00000003 Breakpoint
err 0x00000000
eip 0x00800038
cs 0x----001b
flag 0x00000046
esp 0xeebfdfd0
ss 0x----0023
K> continue
Incoming TRAP frame at 0xefffffbc
[00001000] exiting gracefully
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K>

Challenge 3

从lab中所给的链接可以找到rdmsrwrmsr的宏定义:

1
2
3
4
5
6
7
8
9
#define rdmsr(msr,val1,val2) \
__asm__ __volatile__("rdmsr" \
: "=a" (val1), "=d" (val2) \
: "c" (msr))

#define wrmsr(msr,val1,val2) \
__asm__ __volatile__("wrmsr" \
: /* no outputs */ \
: "c" (msr), "a" (val1), "d" (val2))

从IA32的手册当中可以找到在使用SYSENTER之前所需要设置的相关内容:

  • IA32_SYSENTER_CS (MSR address 174H) — The lower 16 bits of this MSR are the segment selector for the privilege level 0 code segment. This value is also used to determine the segment selector of the privilege level 0 stack segment (see the Operation section). This value cannot indicate a null selector.
  • IA32_SYSENTER_EIP (MSR address 176H) — The value of this MSR is loaded into RIP (thus, this value references the first instruction of the selected operating procedure or routine). In protected mode, only bits 31:0 are loaded.
  • IA32_SYSENTER_ESP (MSR address 175H) — The value of this MSR is loaded into RSP (thus, this value contains the stack pointer for the privilege level 0 stack). This value cannot represent a non-canonical address. In protected mode, only bits 31:0 are loaded.

添加一个sysenter_init()并且在trap_init()内进行调用来实现初始化:

1
2
3
4
5
6
7
8
void
sysenter_init(void)
{
wrmsr(0x174, GD_KT, 0);
wrmsr(0x176, syscall_fast, 0);
wrmsr(0x175, KSTACKTOP, 0);
return;
}

在lib/syscall.c里面需要仿照syscall写一个syscall_fast,与syscall不同的是,这里采用sysenter而不是int 0x30。同时需要将%esi保存为sysenter之后的位置,并且push和pop保存%ebp。这里参数同syscall相似,只是少了最后一个a5。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
static inline int32_t
syscall_fast(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4)
{
int32_t ret;
asm volatile(
"leal .after_sysenter_label, %%esi\n"
"push %%ebp\n"
"movl %%esp, %%ebp\n"
"sysenter\n"
".after_sysenter_label: popl %%ebp\n"
: "=a" (ret)
: "a" (num),
"d" (a1),
"c" (a2),
"b" (a3),
"D" (a4)
:);

if(check && ret > 0)
panic("syscall %d returned %d (> 0)", num, ret);

return ret;
}

在kern/syscall.c中要写一个handler用来处理对应的系统调用,这个就是之前在init里面所填充的入口,流程为从保存的寄存器中取得参数,执行相应的内容,结束之后将返回值保存并利用sysexit返回(这里如果参数不都采用"=m"的约束会出现bug,原因未知):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
void
syscall_fast(void)
{
uint32_t syscallno, a1, a2, a3, a4, ret;
uint32_t eip, esp;

asm volatile(
"mov %%eax, %0\n"
"mov %%edx, %1\n"
"mov %%ecx, %2\n"
"mov %%ebx, %3\n"
"mov %%edi, %4\n"
"mov %%esi, %5\n"
"mov (%%ebp), %6\n"
:"=r" (syscallno),
"=m" (a1),
"=m" (a2),
"=m" (a3),
"=m" (a4),
"=r" (eip),
"=r" (esp)
);

switch (syscallno) {
case SYS_cputs:
sys_cputs((const char *)a1, (size_t)a2);
ret = 0;
break;

case SYS_cgetc:
ret = sys_cgetc();
break;

case SYS_getenvid:
ret = sys_getenvid();
break;

case SYS_env_destroy:
ret = sys_env_destroy((envid_t)a1);
break;

case NSYSCALLS:
ret = 0;
break;

default:
panic("syscall_fast: wrong syscallno\n");
}

asm volatile(
"sysexit\n"
:
: "a" (ret),
"d" (eip),
"c" (esp)
:);
}

在lib/syscall.c中修改sys_cputs()让其调用syscall_fast()进行测试(实际上就是丢掉最后一个传入的参数就可以了):

1
2
3
4
5
6
void
sys_cputs(const char *s, size_t len)
{
syscall_fast(SYS_cputs, 0, (uint32_t)s, len, 0, 0);
//syscall(SYS_cputs, 0, (uint32_t)s, len, 0, 0, 0);
}

执行hello能够得到的输出如下:

1
2
3
4
5
6
7
8
[00000000] new env 00001000
Incoming TRAP frame at 0xefffffbc
hello, world
i am environment 00001000
Incoming TRAP frame at 0xefffffbc
[00001000] exiting gracefully
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!

对比原来的输出:

1
2
3
4
5
6
7
8
9
10
[00000000] new env 00001000
Incoming TRAP frame at 0xefffffbc
Incoming TRAP frame at 0xefffffbc
hello, world
Incoming TRAP frame at 0xefffffbc
i am environment 00001000
Incoming TRAP frame at 0xefffffbc
[00001000] exiting gracefully
[00001000] free env 00001000
Destroyed the only environment - nothing more to do!

可以发现由于在进行系统调用的时候没有采用int 0x30,所以这里在每次输出前并没有都进入trap()函数,使得少去了两行Incoming TRAP frame at ....的输出。

替换后使用make grade也能够得到满分80分,至少在这个lab中采用sysenter不会有问题。

Before start

以下是memlayout.h中对于虚拟地址空间布局的描述:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
/*
* Virtual memory map: Permissions
* kernel/user
*
* 4 Gig --------> +------------------------------+
* | | RW/--
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* : . :
* : . :
* : . :
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
* | | RW/--
* | Remapped Physical Memory | RW/--
* | | RW/--
* KERNBASE, ----> +------------------------------+ 0xf0000000 --+
* KSTACKTOP | CPU0's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| |
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* | CPU1's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| PTSIZE
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* : . : |
* : . : |
* MMIOLIM ------> +------------------------------+ 0xefc00000 --+
* | Memory-mapped I/O | RW/-- PTSIZE
* ULIM, MMIOBASE --> +------------------------------+ 0xef800000
* | Cur. Page Table (User R-) | R-/R- PTSIZE
* UVPT ----> +------------------------------+ 0xef400000
* | RO PAGES | R-/R- PTSIZE
* UPAGES ----> +------------------------------+ 0xef000000
* | RO ENVS | R-/R- PTSIZE
* UTOP,UENVS ------> +------------------------------+ 0xeec00000
* UXSTACKTOP -/ | User Exception Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebff000
* | Empty Memory (*) | --/-- PGSIZE
* USTACKTOP ---> +------------------------------+ 0xeebfe000
* | Normal User Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebfd000
* | |
* | |
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* . .
* . .
* . .
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
* | Program Data & Heap |
* UTEXT --------> +------------------------------+ 0x00800000
* PFTEMP -------> | Empty Memory (*) | PTSIZE
* | |
* UTEMP --------> +------------------------------+ 0x00400000 --+
* | Empty Memory (*) | |
* | - - - - - - - - - - - - - - -| |
* | User STAB Data (optional) | PTSIZE
* USTABDATA ----> +------------------------------+ 0x00200000 |
* | Empty Memory (*) | |
* 0 ------------> +------------------------------+ --+
*
* (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
* "Empty Memory" is normally unmapped, but user programs may map pages
* there if desired. JOS user programs map pages temporarily at UTEMP.
*/

总体的虚拟内存布局是如上的一个状态。

在inc/mmu.h文件的注释中可以看到对于线性地址的结构描述如下,是按二级页表的方式进行地址转换的。前十位是一级页表的索引,中间十位是二级页表索引,最后的12位表示的是4K页面内部的偏移量。

1
2
3
4
5
6
7
8
// A linear address 'la' has a three-part structure as follows:
//
// +--------10------+-------10-------+---------12----------+
// | Page Directory | Page Table | Offset within Page |
// | Index | Index | |
// +----------------+----------------+---------------------+
// \--- PDX(la) --/ \--- PTX(la) --/ \---- PGOFF(la) ----/
// \---------- PGNUM(la) ----------/

在inc/memlayout.h中可以看到PageInfo的结构如下:

1
2
3
4
5
6
7
8
9
10
11
12
struct PageInfo {
// Next page on the free list.
struct PageInfo *pp_link;

// pp_ref is the count of pointers (usually in page table entries)
// to this page, for pages allocated using page_alloc.
// Pages allocated at boot time using pmap.c's
// boot_alloc do not have valid reference count fields.

uint16_t pp_ref;
};

其中pp_link链接的是free list当中下一个空闲的页面,而pp_ref表示的是指向该页面的指针的个数,当清零的时候说明页面就没有被指向了。在全局是利用一个PageInfo的数组来存放物理页面状态:

1
extern struct PageInfo *pages;

可以发现页面是通过一个PageInfo类型进行描述,指针与pages地址的差值就是页面号,物理地址就直接是一个32-bit的整数,相互转换依照上方的三级结构进行:

1
2
3
4
5
6
7
8
9
10
11
12
13
static inline physaddr_t
page2pa(struct PageInfo *pp)
{
return (pp - pages) << PGSHIFT;
}

static inline struct PageInfo*
pa2page(physaddr_t pa)
{
if (PGNUM(pa) >= npages)
panic("pa2page called with invalid pa");
return &pages[PGNUM(pa)];
}

所以以上两个函数提供了一个在page和物理地址之间进行相互转换的方式。

Exercise 1

boot_alloc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// This simple physical memory allocator is used only while JOS is setting
// up its virtual memory system. page_alloc() is the real allocator.
//
// If n>0, allocates enough pages of contiguous physical memory to hold 'n'
// bytes. Doesn't initialize the memory. Returns a kernel virtual address.
//
// If n==0, returns the address of the next free page without allocating
// anything.
//
// If we're out of memory, boot_alloc should panic.
// This function may ONLY be used during initialization,
// before the page_free_list list has been set up.
static void *
boot_alloc(uint32_t n)
{
static char *nextfree; // virtual address of next byte of free memory
char *result;

// Initialize nextfree if this is the first time.
// 'end' is a magic symbol automatically generated by the linker,
// which points to the end of the kernel's bss segment:
// the first virtual address that the linker did *not* assign
// to any kernel code or global variables.
if (!nextfree) {
extern char end[];
nextfree = ROUNDUP((char *) end, PGSIZE);
}

// Allocate a chunk large enough to hold 'n' bytes, then update
// nextfree. Make sure nextfree is kept aligned
// to a multiple of PGSIZE.
//
// LAB 2: Your code here.
if(npages * PGSIZE < (uint32_t)(nextfree + n - KERNBASE)) // out of memory
panic("boot_alloc: We are out of memory.\n");
result = nextfree;
nextfree = ROUNDUP(nextfree + n, PGSIZE);

return result;
}

从34行开始为补充的代码部分,nextfree作为static变量只会初始化一次,用来表示往后第一个没有被分配的virtual address。

由于在注释中要求要对于对于out of memory的情况需要触发panic,所以这里在34行进行了一个分配内容是否超过物理内存限制的检查。如果一切正常的话就进行分配,采用已经定义好的ROUNDUP宏来进行页面对齐。

如果n为0的时候,37行代码不会产生任何改变,符合注释中所描述的代码逻辑。

mem_init

1
2
3
4
5
6
7
8
9
// Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
// The kernel uses this array to keep track of physical pages: for
// each physical page, there is a corresponding struct PageInfo in this
// array. 'npages' is the number of physical pages in memory. Use memset
// to initialize all fields of each struct PageInfo to 0.
// Your code goes here:

pages = (struct PageInfo *) boot_alloc(sizeof(struct PageInfo) * npages);
memset(pages, 0, sizeof(struct PageInfo)*npages);

这里所需要进行的就是对于pages这样一个存储PageInfo的数组进行空间分配,并且初始化为0。首先先利用boot_alloc进行空间的分配,之后利用memset进行清零就可以了。

这里可以看到,pages实际上就是在页目录之后进行分配的一串物理地址。

page_init

在memlayout.h中可以看到

1
2
3
4
5
// At IOPHYSMEM (640K) there is a 384K hole for I/O.  From the kernel,
// IOPHYSMEM can be addressed at KERNBASE + IOPHYSMEM. The hole ends
// at physical address EXTPHYSMEM.
#define IOPHYSMEM 0x0A0000
#define EXTPHYSMEM 0x100000

IOPHYSMEM对应的是640K的位置,EXTPHYSMEM对应的是1M的位置。在lab1当中内核代码就是被加载到了1M的后面,之后再之前的mem_init()当中,我们又在上面进行了pages的分配,当前可以使用的free空间应当是从之前分配的内容后面开始。boot_alloc()返回的是一个kernel virtual address,需要将其转换得到对应的physical address

从pmap.h文件中可以看到从PA向KVA的转换如下:

1
2
3
4
5
6
7
8
9
10
11
/* This macro takes a physical address and returns the corresponding kernel
* virtual address. It panics if you pass an invalid physical address. */
#define KADDR(pa) _kaddr(__FILE__, __LINE__, pa)

static inline void*
_kaddr(const char *file, int line, physaddr_t pa)
{
if (PGNUM(pa) >= npages)
_panic(file, line, "KADDR called with invalid pa %08lx", pa);
return (void *)(pa + KERNBASE);
}

那么从KVA向PA的转换只需要进行一个逆操作。

最后的代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
//
// Initialize page structure and memory free list.
// After this is done, NEVER use boot_alloc again. ONLY use the page
// allocator functions below to allocate and deallocate physical
// memory via the page_free_list.
//
void
page_init(void)
{
// The example code here marks all physical pages as free.
// However this is not truly the case. What memory is free?
// 1) Mark physical page 0 as in use.
// This way we preserve the real-mode IDT and BIOS structures
// in case we ever need them. (Currently we don't, but...)
// 2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
// is free.
// 3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
// never be allocated.
// 4) Then extended memory [EXTPHYSMEM, ...).
// Some of it is in use, some is free. Where is the kernel
// in physical memory? Which pages are already in use for
// page tables and other data structures?
//
// Change the code to reflect this.
// NB: DO NOT actually touch the physical memory corresponding to
// free pages!
size_t i;
uint32_t pa_free_start = (uint32_t)((char *)boot_alloc(0) - KERNBASE);
// case 1:
pages[0].pp_ref = 1;
pages[0].pp_link = NULL;
// case 2, 3, 4:
for (i = 1; i < npages; i++) {
if(IOPHYSMEM <= i * PGSIZE && i * PGSIZE < pa_free_start)
{
pages[i].pp_ref = 1;
pages[i].pp_link = NULL;
}
else
{
pages[i].pp_ref = 0;
pages[i].pp_link = page_free_list;
page_free_list = &pages[i];
}
}
}

pa_free_start表示,这个物理地址后面的空间在当前都是空闲的,所以3和4的一部分都需要被设置成已经分配的状态。剩下的内容都被设置成空闲页面,加入到page_free_list当中。

page_alloc

这里完成的是对于页面的分配,根据alloc_flags来判断是否对于页面进行初始化。如果进行分配的话,那么就将page_free_list的头一个页面取出进行分配即可,初始化利用page2kva得到对应的地址,然后进行初始化操作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
//
// Allocates a physical page. If (alloc_flags & ALLOC_ZERO), fills the entire
// returned physical page with '\0' bytes. Does NOT increment the reference
// count of the page - the caller must do these if necessary (either explicitly
// or via page_insert).
//
// Be sure to set the pp_link field of the allocated page to NULL so
// page_free can check for double-free bugs.
//
// Returns NULL if out of free memory.
//
// Hint: use page2kva and memset
struct PageInfo *
page_alloc(int alloc_flags)
{
// Fill this function in
struct PageInfo* alloc_page = page_free_list;
if(alloc_page == NULL)
return NULL;
page_free_list = alloc_page->pp_link;
alloc_page->pp_link = NULL;
if(alloc_flags && ALLOC_ZERO)
memset(page2kva(alloc_page), 0, PGSIZE);
return alloc_page;
}

page_free

这里做的操作是释放页面,将页面插入page_free_list的头部就可以,但是首先需要检查是否pp_ref为0且pp_link为_NULL,前者不为0表示对于仍在使用的页面进行了释放的操作,后者不为NULL说明它本身就已经是被释放的页面,进行了double free的操作,都要触发panic。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
//
// Return a page to the free list.
// (This function should only be called when pp->pp_ref reaches 0.)
//
void
page_free(struct PageInfo *pp)
{
// Fill this function in
// Hint: You may want to panic if pp->pp_ref is nonzero or
// pp->pp_link is not NULL.
if(pp->pp_ref != 0 || pp->pp_link != NULL)
panic("page_free: pp_ref or pp_link is not zero!\n");
pp->pp_link = page_free_list;
page_free_list = pp;
return;
}

到这里可以得到如下的输出内容

1
2
check_page_free_list() succeeded!
check_page_alloc() succeeded!

说明关于page_free_list的维护和page_alloc的操作都没有问题。

Exercise 2

主要是关于Intel 80386手册的描述,其中第五章主要是对分段机制的描述,第六章是对分页机制的描述。由于在JOS当中将整个空间看做一个段,所以段偏移量就是线性地址,只需要明白关于分页机制以及对应的线性地址转换到物理地址的过程就好。

Exercise 3

主要是GDB和QEMU命令的熟悉。

Exercise 4

pgdir_walk

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Given 'pgdir', a pointer to a page directory, pgdir_walk returns
// a pointer to the page table entry (PTE) for linear address 'va'.
// This requires walking the two-level page table structure.
//
// The relevant page table page might not exist yet.
// If this is true, and create == false, then pgdir_walk returns NULL.
// Otherwise, pgdir_walk allocates a new page table page with page_alloc.
// - If the allocation fails, pgdir_walk returns NULL.
// - Otherwise, the new page's reference count is incremented,
// the page is cleared,
// and pgdir_walk returns a pointer into the new page table page.
//
// Hint 1: you can turn a PageInfo * into the physical address of the
// page it refers to with page2pa() from kern/pmap.h.
//
// Hint 2: the x86 MMU checks permission bits in both the page directory
// and the page table, so it's safe to leave permissions in the page
// directory more permissive than strictly necessary.
//
// Hint 3: look at inc/mmu.h for useful macros that manipulate page
// table and page directory entries.
//
pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
// Fill this function in
uint32_t pdx = PDX(va);
uint32_t ptx = PTX(va);
if(pgdir[pdx] == 0)
{
if(create)
{
struct PageInfo* newpte = page_alloc(1);
if(newpte == NULL)
return NULL;
++(newpte->pp_ref);
pgdir[pdx] = page2pa(newpte) | PTE_P | PTE_W | PTE_U;
}
else
return NULL;
}
physaddr_t pte = PTE_ADDR(pgdir[pdx]) | (ptx << 2);
return KADDR(pte);
}

pgdir_walk()函数做的内容实际上是通过虚拟地址va来进行一个地址翻译,找到所对应的pte。这里传入的三个参数,pgdir是一个指向页目录基址的指针,va是要进行翻译的虚拟地址,create是一个标志,如果非0说明对于对应的va不存在pte的话需要进行分配。

那么首先检查就是页目录中对应的页表到底存在不存在,如果存在的话,直接取出然后进行pte的计算。那么如果不存在的话,就需要page_alloc来分配一个物理页面用来存储页表,并且将该物理页面的引用添加,之后由于关于权限的确认在后面的pte项当中也会进行,所以这里关于页表就可以直接提供全部的权限,将其填入对应的页目录的项中。

那创建了页表之后,就如同之前一样进行进一步的地址转换。但是由于这里物理地址是不能直接进行解引用操作的,所以利用KADDR宏将得到的物理地址转换成remap过的虚拟地址,这样可以通过解引用来获得对应的物理地址也能对于所存储的内容进行修改。

boot_map_region

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
//
// Map [va, va+size) of virtual address space to physical [pa, pa+size)
// in the page table rooted at pgdir. Size is a multiple of PGSIZE, and
// va and pa are both page-aligned.
// Use permission bits perm|PTE_P for the entries.
//
// This function is only intended to set up the ``static'' mappings
// above UTOP. As such, it should *not* change the pp_ref field on the
// mapped pages.
//
// Hint: the TA solution uses pgdir_walk
static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
// Fill this function in
while(size > 0)
{
pte_t* pte = pgdir_walk(pgdir, (void *)va, 1);
if(pte == NULL)
panic("boot_map_region: Fail to alloc new page, run out of memory!\n");
*pte = pa | perm | PTE_P;
size -= PGSIZE;
va += PGSIZE, pa += PGSIZE;
}
}

这个函数的作用是将一串连续的虚拟地址映射到一串连续的物理地址,其中映射的地址的大小是页面大小的整数倍。那么可以知道,直接的想法就是通过虚拟地址进行地址查询,然后将页表中对应的表项修改为映射到的物理地址就可以了。那么以每个页面单位来进行这样的操作。

首先通过pgdir_walk()来找到虚拟地址对应的表项,如果对应的二级页表不存在那么就进行空间的分配,如果分配失败则进行报错,出发一个panic。

之后就将物理地址以及对应的权限填到表项里面,然后对下一个需要映射的页进行相同的操作。

page_lookup

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
//
// Return the page mapped at virtual address 'va'.
// If pte_store is not zero, then we store in it the address
// of the pte for this page. This is used by page_remove and
// can be used to verify page permissions for syscall arguments,
// but should not be used by most callers.
//
// Return NULL if there is no page mapped at va.
//
// Hint: the TA solution uses pgdir_walk and pa2page.
//
struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
// Fill this function in
pte_t* pte = pgdir_walk(pgdir, va, 0);
if(pte == NULL)
return NULL;
physaddr_t pa = PTE_ADDR(*pte);
if(pte_store)
*pte_store = pte;
return pa2page(pa);
}

这个地方的page_lookup()想要做的是通过虚拟地址va来查找对应的映射页的PageInfo结构,这边的操作就是首先去找pte,如果找到说明该虚拟地址被映射到了一个页面,得到映射页面的物理页面首地址,再通过pa2page()完成转换。那么如果没有找到的话,说明这个虚拟地址并没有映射到任何物理页面。如果传入的pte_store非空的话,就将其进行保存。

page_remove

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
//
// Unmaps the physical page at virtual address 'va'.
// If there is no physical page at that address, silently does nothing.
//
// Details:
// - The ref count on the physical page should decrement.
// - The physical page should be freed if the refcount reaches 0.
// - The pg table entry corresponding to 'va' should be set to 0.
// (if such a PTE exists)
// - The TLB must be invalidated if you remove an entry from
// the page table.
//
// Hint: The TA solution is implemented using page_lookup,
// tlb_invalidate, and page_decref.
//
void
page_remove(pde_t *pgdir, void *va)
{
// Fill this function in
pte_t* pte_store;
struct PageInfo* pp = page_lookup(pgdir, va, &pte_store);
if(pp == NULL)
return;
*pte_store = 0;
page_decref(pp);
tlb_invalidate(pgdir, va);
}

page_remove()所做的操作是将va映射到的物理页面给取消映射。要完成remove的操作需要做两件事情,一个就是将页表项中的对应内容给修改,另外一个就是对于PageInfo结构的修改,需要将其引用数减少,如果引用数为0,那么就将其加入空闲链表。在25行处的page_decref()函数做的实际上就是上述这个减少引用的操作。

那么一开始利用page_lookup()来找到对应的页面和pte,在24行修改pte,在25行修改链表结构,在26行调用tlb_invalidate()函数把TLB里面的内容给标注为无效。

page_insert

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
//
// Map the physical page 'pp' at virtual address 'va'.
// The permissions (the low 12 bits) of the page table entry
// should be set to 'perm|PTE_P'.
//
// Requirements
// - If there is already a page mapped at 'va', it should be page_remove()d.
// - If necessary, on demand, a page table should be allocated and inserted
// into 'pgdir'.
// - pp->pp_ref should be incremented if the insertion succeeds.
// - The TLB must be invalidated if a page was formerly present at 'va'.
//
// Corner-case hint: Make sure to consider what happens when the same
// pp is re-inserted at the same virtual address in the same pgdir.
// However, try not to distinguish this case in your code, as this
// frequently leads to subtle bugs; there's an elegant way to handle
// everything in one code path.
//
// RETURNS:
// 0 on success
// -E_NO_MEM, if page table couldn't be allocated
//
// Hint: The TA solution is implemented using pgdir_walk, page_remove,
// and page2pa.
//
int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
// Fill this function in
pte_t* pte = pgdir_walk(pgdir, va, 1);
if(pte == NULL)
return -E_NO_MEM;
physaddr_t pa = page2pa(pp);
++(pp->pp_ref);
if(*pte)
page_remove(pgdir, va);
*pte = pa | perm | PTE_P;
return 0;
}

page_insert()的操作就是将va映射到pp所指向的物理页面上去,而对应的权限通过perm来进行表示。那么利用pgdir_walk()函数来获得pte, 如果没有就进行创建。那么这个情况下如果返回为NULL,只有可能是空间不足无法创建,于是返回-E_NO_MEM。那么如果能够得到pte,就对于对应的物理页面进行处理,添加引用数,然后把本来pte可能存在的映射关系给消除,之后再进行映射。

这里存在的一个问题是,如果我这里提供的pp本来就是va映射的对象,可能会出现问题。考虑将34行进行引用数增加的内容移到36行后面,那么他首先进行了page_remove。如果之前的引用数为1,那这个页面将被加入空闲链表。而之后再给他加了一个引用数,这就相当于空闲链表中存在着不空闲的页面,他可能会被二次分配。存在一个bug。而将该语句保存在34行的位置,就可以确保remove之后,如果本来就是va映射的页面,也不会被加入到空闲链表中,规避了之前所说的那种bug的出现。

Exercise 5

1
2
3
4
5
6
7
8
9
// Map 'pages' read-only by the user at linear address UPAGES
// Permissions:
// - the new image at UPAGES -- kernel R, user R
// (ie. perm = PTE_U | PTE_P)
// - pages itself -- kernel RW, user NONE
// Your code goes here:
n = ROUNDUP(npages*sizeof(struct PageInfo), PGSIZE);
for (i = 0; i < n; i += PGSIZE)
page_insert(kern_pgdir, pa2page(PADDR(pages) + i), (void *)(UPAGES + i), PTE_U | PTE_P);

这里是要将pages个映射到UPAGES以上的内容,那么这里要考虑到pages这整个内容实际上是对应着许多PageInfo结构的,在进行映射的同时需要对于PageInfo内部的引用数进行修改,这里采用一个for循环将所有页面依次进行映射。权限位由于在注释中说明,需要内核和用户都可读,所以标注成PTE_U|PTE_P。

1
2
3
4
5
6
7
8
9
10
11
// Use the physical memory that 'bootstack' refers to as the kernel
// stack. The kernel stack grows down from virtual address KSTACKTOP.
// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
// to be the kernel stack, but break this into two pieces:
// * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
// * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
// the kernel overflows its stack, it will fault rather than
// overwrite memory. Known as a "guard page".
// Permissions: kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, KSTACKTOP - KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);

这里进行的是一个连续地址的映射,完成的是内核栈的一个映射。这个地方被划分成了[KSTACKTOP-KSTKSIZE, KSTACKTOP)[KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE)两个部分,通过注释内容可以知道,前一段是需要映射到物理地址的,后一段是不需要的。所以我们要做的只是将前一段进行映射。这里bootstack是已经知道的,通过PADDR将其转换为物理地址,然后映射以KSTACKTOP-KSTKSIZE为起点,KSTKSIZE大小的内容。权限由于在boot_map_region()当中会自动加上PTE_P,所以这里只要标注PTE_W。

1
2
3
4
5
6
7
8
9
// Map all of physical memory at KERNBASE.
// Ie. the VA range [KERNBASE, 2^32) should map to
// the PA range [0, 2^32 - KERNBASE)
// We might not have 2^32 - KERNBASE bytes of physical memory, but
// we just set up the mapping anyway.
// Permissions: kernel RW, user NONE
// Your code goes here:
//cprintf("kernbase: %x 2^32-kernbase: %x", KERNBASE, (~KERNBASE)+1);
boot_map_region(kern_pgdir, KERNBASE, (~KERNBASE) + 1, 0, PTE_W);

同样的是一个进行简单的连续地址映射的操作,那么这个地方也是采用boot_map_region()来进行,但是这里需要得到2^32,而32位大小是表示不出这么大的数的,所以这里采用(~KERNBASE)+1来得到需要进行映射的大小。这里的权限同样由于boot_map_region()会自动加上PTE_P,所以只需要标注PTE_W就可以了。

到这里为止,通过执行

1
make grade

可以得到如下的结果:

1
2
3
4
5
6
running JOS: (2.8s)
Physical page allocator: OK
Page management: OK
Kernel page directory: OK
Page management 2: OK
Score: 70/70

说明已经满足所有check函数的需求,完成了虚拟内存系统的一个初始化。

Questions:

这里的value应当是一个虚拟地址,在程序里面,并不能直接对于物理地址进行操控,所有的指针都应当是虚拟地址。

这里需要注意的是:JOS将从0开始的所有物理内存映射到虚拟地址0xf0000000就是为了让内核能够读写只知道物理地址的内容。那么为了完成从物理地址到虚拟地址的转换,对于只知道物理地址的,就将其物理地址加上0xf0000000,就可以得到对应的虚拟地址了。利用定义好的宏KADDA(pa)可以做到,而宏PADDA(va)就是这个的逆操作。在Exercise4当中这两个宏能够有效地进行虚拟地址物理地址之间的转换,从而使的解引用等操作可以进行执行。

  • 表格如下:
Entry Base Virtual Address Points to (logically)
960 0xf0000000 以上映射到物理地址从0开始的位置
959 0xefff8000 内核栈
958 0xef800000 页表(UVPT)
957 0xef400000 pages数组(UPAGES)
. . .
0 0x00000000 [see next question]
  • We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory?

因为在表项中存在权限为,只有PTE_U被设置成1的时候才可以让user访问,kernel memory只需要修改权限为就可以不被user读写。

  • What is the maximum amount of physical memory that this operating system can support? Why?

在kern/pmap.c中的mem_init()函数中可以看到将pages数组映射到了线性地址的UPAGES上方。那么在inc/memlayout.h的图中可以看到,给只读的pages数组分配的空间为4M大小(一个PTSIZE)。

1
2
3
*    UVPT      ---->  +------------------------------+ 0xef400000
* | RO PAGES | R-/R- PTSIZE
* UPAGES ----> +------------------------------+ 0xef000000

一个PageInfo的大小是8Byte,一个页面的大小是4K。所以可以得到4M的pages数组对应的物理内存大小是:

422084210=2230\frac{4*2^{20}}{8}*4*2^{10} = 2*2^{30}

即操作系统能够支持的物理内存大小不会超过2G,理由如上。

  • How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?

如果所有虚拟地址都被映射的话,那么页表的开销,一级页表需要1个page,二级页表需要1024个page。总共需要1025个page。所以页表上的开销为10254=4100KB1025*4=4100KB

采用大页可以减少开销,这样只需要一级页表就可以进行索引,需要一个page也就是4KB就可以了。

  • Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?

在27行处利用jmp *%eax进行了跳转,完成了在高地址执行的转换。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
.globl entry
entry:
movw $0x1234,0x472 # warm boot

# We haven't set up virtual memory yet, so we're running from
# the physical address the boot loader loaded the kernel at: 1MB
# (plus a few bytes). However, the C code is linked to run at
# KERNBASE+1MB. Hence, we set up a trivial page directory that
# translates virtual addresses [KERNBASE, KERNBASE+4MB) to
# physical addresses [0, 4MB). This 4MB region will be
# sufficient until we set up our real page table in mem_init
# in lab 2.

# Load the physical address of entry_pgdir into cr3. entry_pgdir
# is defined in entrypgdir.c.
movl $(RELOC(entry_pgdir)), %eax
movl %eax, %cr3
# Turn on paging.
movl %cr0, %eax
orl $(CR0_PE|CR0_PG|CR0_WP), %eax
movl %eax, %cr0

# Now paging is enabled, but we're still running at a low EIP
# (why is this okay?). Jump up above KERNBASE before entering
# C code.
mov $relocated, %eax
jmp *%eax
relocated:

# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer

# Set the stack pointer
movl $(bootstacktop),%esp

# now to C code
call i386_init

同时在低EIP和高EIP访问的原因是,我们将虚拟地址的[0, 4MB)和[KERNBASE, KERNBASE+4MB)都映射到了物理地址的[0, 4MB),所以无论从低地址还是高地址都可以进行访问。

1
2
3
4
5
6
7
8
9
__attribute__((__aligned__(PGSIZE)))
pde_t entry_pgdir[NPDENTRIES] = {
// Map VA's [0, 4MB) to PA's [0, 4MB)
[0]
= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P,
// Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
[KERNBASE>>PDXSHIFT]
= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W
};

从前面的内容可以看到,他在完成分页之后还有在低地址执行的语句,如果不同时将高地址和低地址都映射到物理地址的最低4M的话,那么在低地址运行的代码会出错。

Challenge

可以在inc/mmu.h当中找到关于PTE/PDE flag的描述,具体内容如下:

1
2
3
4
5
6
7
8
9
10
// Page table/directory entry flags.
#define PTE_P 0x001 // Present
#define PTE_W 0x002 // Writeable
#define PTE_U 0x004 // User
#define PTE_PWT 0x008 // Write-Through
#define PTE_PCD 0x010 // Cache-Disable
#define PTE_A 0x020 // Accessed
#define PTE_D 0x040 // Dirty
#define PTE_PS 0x080 // Page Size
#define PTE_G 0x100 // Global

第九行所示的就是PTE_PS位,是用来调整Page Size大小的。

采用大页只有一级页表,对应的地址翻译方式如下:

image-20200310182755807

通过Intel IA32手册3.6.1节关于Page Option的描述可以知道,需要开启cr4里面的PSE标志位,来说明提供对于大页的支持,在mem_init()当中添加如下代码进行实现:

1
2
3
4
// Set CR4_PSE
cr4 = rcr4();
cr4 |= CR4_PSE;
lcr4(cr4);

考虑到lab整体要对于这种大小页混合的方式进行适配的话,需要对于页面相关的许多函数进行重写。所以这里只考虑虚拟地址高256M到物理地址低256M的映射采用大页实现,只对于boot_map_region及其相关函数进行修改。

修改pgdir_walk()函数如下,其中normal状态是针对仅存在4K大小页的情况,而ex表示的是大小页混合状态的情况:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
pte_t *
pgdir_walk_normal(pde_t *pgdir, const void *va, int create)
{
// Fill this function in
uint32_t pdx = PDX(va);
uint32_t ptx = PTX(va);
if(pgdir[pdx] == 0)
{
if(create)
{
struct PageInfo* newpte = page_alloc(1);
if(newpte == NULL)
return NULL;
++(newpte->pp_ref);
pgdir[pdx] = page2pa(newpte) | PTE_P | PTE_W | PTE_U;
}
else
return NULL;
}
physaddr_t pte = PTE_ADDR(pgdir[pdx]) | (ptx << 2);
return KADDR(pte);
}

pte_t *
pgdir_walk_ex(pde_t *pgdir, const void *va, int create)
{
uint32_t pdx = PDX(va);
if(pgdir[pdx] == 0)
{
if(create == 1)
{
struct PageInfo* newpte = page_alloc(1);
if(newpte == NULL)
return NULL;
++(newpte->pp_ref);
pgdir[pdx] = page2pa(newpte) | PTE_P | PTE_W | PTE_U;
}
else if(create == 2)
{
pgdir[pdx] = PTE_PS;
}
else
return NULL;
}
else if(create == 2 && (!(pgdir[pdx] & PTE_PS)))
{
struct PageInfo * pp = pa2page(PTE_ADDR(pgdir[pdx]));
page_decref(pp);
tlb_invalidate(pgdir, (void*)va);
pgdir[pdx] = PTE_PS;
}
uint32_t pde = pgdir[pdx];
if(pde & PTE_PS)
{
return pgdir + pdx;
}
else
{
return KADDR(PTE_ADDR(pgdir[pdx]) | (PTX(va) << 2));
}
}

pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
uint32_t size_ex = rcr4() & CR4_PSE;
if(size_ex)
return pgdir_walk_normal(pgdir, va, create);
else
return pgdir_walk_ex(pgdir, va, create);
}

这里仅仅对ex函数进行讨论,首先create可能为0、1或者2,不同于normal函数只存在0、1两种情况。0的时候表示不进行额外的分配。1的情况表示是一个小页,2的情况表示是一个大页,只要非0都是表示若不存在则进行分配。

那么如果当前当做一个大页的话,进行分配的情况不需要再去分配一个页面作为二级页表,只需要标记PTE_PS位返回填充对应的物理地址基址就好了。但是存在一个情况在于,原本这个pde指向的是一个二级页表,但是当前是采用大页进行分配的。所以在45行处有针对这种情况的特判。需要做的是将对应的二级页表的页面给清空,然后当做一个新分配的大页进行返回就可以了。

之后考虑的是boot_map_region()函数,同样重写:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
static void
boot_map_region_normal(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
// Fill this function in
while(size > 0)
{
pte_t* pte = pgdir_walk(pgdir, (void *)va, 1);
if(pte == NULL)
panic("boot_map_region: Fail to alloc new page, run out of memory!\n");
*pte = pa | perm | PTE_P;
size -= PGSIZE;
va += PGSIZE, pa += PGSIZE;
}
}

static void
boot_map_region_ex(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
while(size > 0)
{
pte_t* pte = pgdir_walk_ex(pgdir, (void *)va, 2);
if(pte == NULL)
panic("boot_map_region: Fail to alloc new page, run out of memory!\n");
*pte = pa | perm | PTE_P | PTE_PS;
size -= PTSIZE;
va += PTSIZE, pa+= PTSIZE;
}
}

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
uint32_t size_ex = rcr4() & CR4_PSE;
if(size_ex)
{
if(ROUNDUP(pa, PTSIZE) < ROUNDDOWN(pa+size, PTSIZE))
{
boot_map_region_normal(pgdir, va, ROUNDUP(pa, PTSIZE) - pa, pa, perm);
boot_map_region_ex(pgdir, va+ROUNDUP(pa, PTSIZE)-pa, ROUNDDOWN(pa+size, PTSIZE) - ROUNDUP(pa, PTSIZE), ROUNDUP(pa, PTSIZE), perm);
boot_map_region_normal(pgdir, va+ROUNDDOWN(pa+size, PTSIZE)-pa, pa+size - ROUNDDOWN(pa+size, PTSIZE), ROUNDDOWN(pa+size, PTSIZE), perm);
}
else
boot_map_region_normal(pgdir, va, size, pa, perm);
}
else
boot_map_region_normal(pgdir, va, size, pa, perm);
}

这里boot_map_region也被分成了两种情况,normal表示的是以4K为一个页面进行映射构造,ex表示的是以4M为一个页面大小进行映射构造。

在36行处的判断说明,仅当cr4被标识成拓展的页面大小且进行映射的区间内存在连续的4M空间的时候,对可以采用大页进行分配的部分采用大页,即调用boot_map_region_ex()函数。所以像是内核栈这种大小只有几十K的映射,在cr4设置之后仍然是采用原有的小页方法进行映射的。

这个时候采用原有的check函数会产生错误,原因在于原有的check函数所进行的地址转换方法是二级的。

重写原来给定的va2pa函数如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
static physaddr_t
check_va2pa(pde_t *pgdir, uintptr_t va)
{
pte_t *p;
pgdir = &pgdir[PDX(va)];
if(!(*pgdir & PTE_P))
return ~0;
if((*pgdir & PTE_PS))
{
return ((*pgdir) & 0xffc00000) | (va & 0x3ff000);
}
else
{
p = (pte_t*) KADDR(PTE_ADDR(*pgdir));
if (!(p[PTX(va)] & PTE_P))
return ~0;
return PTE_ADDR(p[PTX(va)]);
}

}

根据PTE_PS标志位来决定采用一级寻址还是二级寻址,这样就可以得到正常的结果:

1
2
3
4
5
6
7
8
check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_installed_pgdir() succeeded!
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K>

这里在清空cr4之后都是采用normal的方式来进行映射和寻址,所以会保持和原来相同的行为。

Challenge2

showmappings

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
int 
mon_showmappings(int argc, char **argv, struct Trapframe *tf)
{
if(argc != 3)
{
cprintf("showmappings: should input 3 arguments!\n");
return 0;
}
uint32_t lowerbound = strtol(argv[1], '\0', 16);
uint32_t upperbound = strtol(argv[2], '\0', 16);
uint32_t va;
cprintf("Virtual Address\tPhysical Address\turw\n");
for(va = ROUNDDOWN(lowerbound, PGSIZE); va <= ROUNDUP(upperbound, PGSIZE); va += PGSIZE)
{

pte_t * pte = pgdir_walk(kern_pgdir, (void *)va, 0);
if(pte && ((*pte) & PTE_P))
{
physaddr_t pa = PTE_ADDR(*pte);
char perm_U = ((*pte) & PTE_U) ? 'u' : '-';
char perm_P = ((*pte) & PTE_P) ? 'r' : '-';
char perm_W = ((*pte) & PTE_W) ? 'w' : '-';
cprintf(" 0x%08x\t 0x%08x\t%c%c%c\n" , va, pa, perm_U, perm_P, perm_W);
}
else
cprintf(" 0x%08x\t 0x--------\t---\n", va);
}
return 0;
}

代码如上,可以采用如下形式进行[start_va, end_va]区间内虚拟地址到物理地址页面映射的查询:

1
showmappings <start_va> <end_va>

结果如下所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
K> showmappings 0xefff0000 0xf0000000
Virtual Address Physical Address urw
0xefff0000 0x-------- ---
0xefff1000 0x-------- ---
0xefff2000 0x-------- ---
0xefff3000 0x-------- ---
0xefff4000 0x-------- ---
0xefff5000 0x-------- ---
0xefff6000 0x-------- ---
0xefff7000 0x-------- ---
0xefff8000 0x00117000 -rw
0xefff9000 0x00118000 -rw
0xefffa000 0x00119000 -rw
0xefffb000 0x0011a000 -rw
0xefffc000 0x0011b000 -rw
0xefffd000 0x0011c000 -rw
0xefffe000 0x0011d000 -rw
0xeffff000 0x0011e000 -rw
0xf0000000 0x00000000 -rw

setperm

提供权限位的设置方法,代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
int
mon_setperm(int argc, char **argv, struct Trapframe * tf)
{
if(argc == 3)
{
uint32_t va = strtol(argv[1], '\0', 16);
int perm_U = 0;
int perm_P = 0;
int perm_W = 0;
int i;
for(i = 0; argv[2][i]; ++i)
{
if(argv[2][i] == 'u')
perm_U = 1;
else if(argv[2][i] == 'p')
perm_P = 1;
else if(argv[2][i] == 'w')
perm_W = 1;
}
pte_t * pte = pgdir_walk(kern_pgdir, (void *)va, 0);
if(pte)
{
if(perm_U)
*pte = (*pte) | PTE_U;
if(perm_P)
*pte = (*pte) | PTE_P;
if(perm_W)
*pte = (*pte) | PTE_W;
}
else
{
cprintf("The virtual address 0x%08x is unmapped\n", va);
}
}
else if(argc == 4)
{
uint32_t lowerbound = strtol(argv[1], '\0', 16);
uint32_t upperbound = strtol(argv[2], '\0', 16);
int perm_U = 0;
int perm_P = 0;
int perm_W = 0;
int i, va;
for(i = 0; argv[3][i]; ++i)
{
if(argv[3][i] == 'u')
perm_U = 1;
else if(argv[3][i] == 'p')
perm_P = 1;
else if(argv[3][i] == 'w')
perm_W = 1;
}
for(va = ROUNDDOWN(lowerbound, PGSIZE); va <= ROUNDUP(upperbound, PGSIZE); va += PGSIZE)
{
pte_t * pte = pgdir_walk(kern_pgdir, (void *)va, 0);
if(pte)
{
if(perm_U)
*pte = (*pte) | PTE_U;
if(perm_P)
*pte = (*pte) | PTE_P;
if(perm_W)
*pte = (*pte) | PTE_W;
}
else
{
cprintf("The virtual address 0x%08x is unmapped\n", va);
}
}
}
else
{
cprintf("setperm: should give one address or an address range!\n");
}
return 0;
}

支持输入单个虚拟地址对对应的页表项进行更改,或者对一个虚拟地址区间进行修改:

1
2
setperm <va> <perm>
setperm <start_va> <end_va> <perm>

样例如下:

1
2
3
4
5
6
7
8
9
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -rw
0xf0001000 0x00001000 -rw
K> setperm 0xf0000000 u
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 urw
0xf0001000 0x00001000 -rw

可以发现确实对权限位进行了设置。

clearperm

对权限位进行清空,整体框架和setperm相同,只需要将修改部分的代码改成:

1
2
3
4
5
6
if(perm_U)
*pte = (*pte) & (~PTE_U);
if(perm_P)
*pte = (*pte) & (~PTE_P);
if(perm_W)
*pte = (*pte) & (~PTE_W);

测试结果如下:

1
2
3
4
5
6
7
8
9
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -rw
0xf0001000 0x00001000 -rw
K> clearperm 0xf0000000 w
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -r-
0xf0001000 0x00001000 -rw

changeperm

对权限位进行修改,整体框架和setperm相同,只需要将修改部分的代码改成:

1
2
3
4
5
6
if(perm_U)
*pte = (*pte) ^ PTE_U;
if(perm_P)
*pte = (*pte) ^ PTE_P;
if(perm_W)
*pte = (*pte) ^ PTE_W;

测试结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -rw
0xf0001000 0x00001000 -rw
K> changeperm 0xf0000000 w
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -r-
0xf0001000 0x00001000 -rw
K> changeperm 0xf0000000 w
K> showmappings 0xf0000000 0xf0001000
Virtual Address Physical Address urw
0xf0000000 0x00000000 -rw
0xf0001000 0x00001000 -rw

content

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
int
mon_content(int argc, char **argv, struct Trapframe *tf)
{
if(argc != 4)
{
cprintf("content: type \"help\" to see the example!\n");
return 0;
}
physaddr_t pa;
if(argv[1][1] == 'v')
{
uint32_t base_va = strtol(argv[2], '\0', 16);
pte_t * pte = pgdir_walk(kern_pgdir, (void *)base_va, 0);
pa = PTE_ADDR(*pte) | PGOFF(base_va);
}
else if(argv[1][1] == 'p')
{
pa = strtol(argv[2], '\0', 16);
}
else
{
cprintf("content: -p means physical address, -v means virtual address!\n");
return 0;
}
uint32_t count = strtol(argv[3], '\0', 10);
int check;
for(check = 0; count > 0; --count, ++check)
{
if(check == 0)
cprintf("0x%08x:", pa);
cprintf(" 0x%08x", *(uint32_t*)(KADDR(pa)));
pa += 4;
if(check == 3)
{
check = -1;
cprintf("\n");
}
}
if(check)
cprintf("\n");
return 0;
}

代码如上,用来查看虚拟地址或者物理地址对应的具体内容,可以利用如下的命令形式进行查询:

1
2
content -p <pa> <number>
content -v <va> <number>

其中利用-p或者-v来表明查询的是物理地址还是虚拟地址,number表示要查询的多少,示例结果如下:

1
2
3
4
5
6
K> content -p 0x0 8
0x00000000: 0xf000ff53 0xf000ff53 0xf000e2c3 0xf000ff53
0x00000010: 0xf000ff53 0xf000ff54 0xf000ff53 0xf000ff53
K> content -v 0xf0000000 8
0x00000000: 0xf000ff53 0xf000ff53 0xf000e2c3 0xf000ff53
0x00000010: 0xf000ff53 0xf000ff54 0xf000ff53 0xf000ff53

之前可以知道,KERNBASE以上的虚拟地址映射到的是从零开始的虚拟地址,所以上面得到的结果是完全相同的。利用qemu的指令进行检查:

1
2
3
(qemu) xp /8x 0x0
0000000000000000: 0xf000ff53 0xf000ff53 0xf000e2c3 0xf000ff53
0000000000000010: 0xf000ff53 0xf000ff54 0xf000ff53 0xf000ff53

可以发现得到的结果完全相同,说明指令运行没有问题。

Challenge3&4

没写代码,感觉两个Challenge是递进的关系。challenge3可以考虑只保存包括内核自身的页目录,以及内核栈地址用来往内核栈写入参数保存信息,中断向量表等陷入内核态需要的信息。这样可能只需要几个page就足够了。陷入内核之后通过内核自身的页目录来完成地址映射进行寻址以及执行。这几个和内核相关的页面都应该是内核可读写,用户没有权限。

之后Challenge4由于Challenge3已经将内核相关的地址空间大小缩小了。如果进程想要对于这些地址进行分配的话,那么由于权限不够,会触发异常。处理的手段就是将这部分内容放到暂时还没有使用的地址,并且对相应的地址链接等内容进行修改,然后再次进行分配操作。我感觉这可能是bouncing kernel的意思,找了很久也没有找到bouncing kernel相关的资料或者论文。由于在challenge3当中把需要内核相关的内容缩小到了几个page,所以就可以大大减少需要触发弹跳机制的频率,降低为了更大地址空间所带来的额外时间开销。

Challenge5

我觉得可以考虑采用类似ICS中malloc lab里面的方法,在PageInfo里面加入前后page的链接以及这个连续页面的大小(是PGSIZE的整数倍)。之后利用first-fit或者best-fit的方式进行适配,删除的时候考虑前后的合并。这样应该就可以完成连续地址的分配。对于比较大的连续分配还可以结合Challenge1当中的大页从而节省掉二级页表的空间。

题面中所说的"power-of-two allocation unit sizes from 4KB up to some reasonable maximum of your choice."应该就是伙伴系统了。感觉要完全实现除去修改自己写的函数之外需要修改check_page_free_list()以及kern/pmap.h当中的宏以及辅助函数,不知道会不会引发什么其他地方未知的错误,没有进行代码实现。