// // Reserve size bytes in the MMIO region and map [pa,pa+size) at this // location. Return the base of the reserved region. size does *not* // have to be multiple of PGSIZE. // void * mmio_map_region(physaddr_t pa, size_t size) { // Where to start the next region. Initially, this is the // beginning of the MMIO region. Because this is static, its // value will be preserved between calls to mmio_map_region // (just like nextfree in boot_alloc). staticuintptr_t base = MMIOBASE;
uintptr_t result;
// Reserve size bytes of virtual memory starting at base and // map physical pages [pa,pa+size) to virtual addresses // [base,base+size). Since this is device memory and not // regular DRAM, you'll have to tell the CPU that it isn't // safe to cache access to this memory. Luckily, the page // tables provide bits for this purpose; simply create the // mapping with PTE_PCD|PTE_PWT (cache-disable and // write-through) in addition to PTE_W. (If you're interested // in more details on this, see section 10.5 of IA32 volume // 3A.) // // Be sure to round size up to a multiple of PGSIZE and to // handle if this reservation would overflow MMIOLIM (it's // okay to simply panic if this happens). // // Hint: The staff solution uses boot_map_region. // // Your code here: if(base + ROUNDUP(size, PGSIZE) >= MMIOLIM) panic("mmio_map_region: out of memory\n"); boot_map_region(kern_pgdir, base, size, pa, PTE_PCD | PTE_PWT | PTE_W); result = base; base += ROUNDUP(size, PGSIZE); return (void *)result; }
Exercise 2
修改page_init()的内容如下,只是增加一个特殊处理:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
// case 1: pages[0].pp_ref = 1; pages[0].pp_link = NULL; // case 2, 3, 4: for (i = 1; i < npages; i++) { if((IOPHYSMEM <= i * PGSIZE && i * PGSIZE < pa_free_start) || page2pa(pages + i) == MPENTRY_PADDR) { pages[i].pp_ref = 1; pages[i].pp_link = NULL; } else { pages[i].pp_ref = 0; pages[i].pp_link = page_free_list; page_free_list = &pages[i]; } }
// Modify mappings in kern_pgdir to support SMP // - Map the per-CPU stacks in the region [KSTACKTOP-PTSIZE, KSTACKTOP) // staticvoid mem_init_mp(void) { // Map per-CPU stacks starting at KSTACKTOP, for up to 'NCPU' CPUs. // // For CPU i, use the physical memory that 'percpu_kstacks[i]' refers // to as its kernel stack. CPU i's kernel stack grows down from virtual // address kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP), and is // divided into two pieces, just like the single stack you set up in // mem_init: // * [kstacktop_i - KSTKSIZE, kstacktop_i) // -- backed by physical memory // * [kstacktop_i - (KSTKSIZE + KSTKGAP), kstacktop_i - KSTKSIZE) // -- not backed; so if the kernel overflows its stack, // it will fault rather than overwrite another CPU's stack. // Known as a "guard page". // Permissions: kernel RW, user NONE // // LAB 4: Your code here: uint32_t i, kstacktop_i; for(i=0, kstacktop_i=KSTACKTOP;i < NCPU; ++i, kstacktop_i -= KSTKSIZE + KSTKGAP) boot_map_region(kern_pgdir, kstacktop_i - KSTKSIZE, KSTKSIZE, PADDR(percpu_kstacks[i]), PTE_W); }
// Initialize and load the per-CPU TSS and IDT void trap_init_percpu(void) { // The example code here sets up the Task State Segment (TSS) and // the TSS descriptor for CPU 0. But it is incorrect if we are // running on other CPUs because each CPU has its own kernel stack. // Fix the code so that it works for all CPUs. // // Hints: // - The macro "thiscpu" always refers to the current CPU's // struct CpuInfo; // - The ID of the current CPU is given by cpunum() or // thiscpu->cpu_id; // - Use "thiscpu->cpu_ts" as the TSS for the current CPU, // rather than the global "ts" variable; // - Use gdt[(GD_TSS0 >> 3) + i] for CPU i's TSS descriptor; // - You mapped the per-CPU kernel stacks in mem_init_mp() // - Initialize cpu_ts.ts_iomb to prevent unauthorized environments // from doing IO (0 is not the correct value!) // // ltr sets a 'busy' flag in the TSS selector, so if you // accidentally load the same TSS on more than one CPU, you'll // get a triple fault. If you set up an individual CPU's TSS // wrong, you may not get a fault until you try to return from // user space on that CPU. // // LAB 4: Your code here:
// Setup a TSS so that we get the right stack // when we trap to the kernel. thiscpu->cpu_ts.ts_esp0 = KSTACKTOP - cpunum() * (KSTKSIZE + KSTKGAP); thiscpu->cpu_ts.ts_ss0 = GD_KD; thiscpu->cpu_ts.ts_iomb = sizeof(struct Taskstate);
// Initialize the TSS slot of the gdt. gdt[(GD_TSS0 >> 3) + cpunum()] = SEG16(STS_T32A, (uint32_t) (&(thiscpu->cpu_ts)), sizeof(struct Taskstate) - 1, 0); gdt[(GD_TSS0 >> 3) + cpunum()].sd_s = 0;
// Load the TSS selector (like other segment selectors, the // bottom three bits are special; we leave them 0) ltr(GD_TSS0 + (cpunum() << 3));
In i386_init(), acquire the lock before the BSP wakes up the other CPUs.
1 2 3 4 5 6
// Acquire the big kernel lock before waking up APs // Your code here: lock_kernel(); // Starting non-boot CPUs boot_aps();
In mp_main(), acquire the lock after initializing the AP, and then call sched_yield() to start running environments on this AP.
1 2 3 4 5 6 7
// Now that we have finished some basic setup, call sched_yield() // to start running processes on this CPU. But make sure that // only one CPU can enter the scheduler at a time! // // Your code here: lock_kernel(); sched_yield();
In trap(), acquire the lock when trapped from user mode. To determine whether a trap happened in user mode or in kernel mode, check the low bits of the tf_cs.
1 2 3 4 5 6
// Trapped from user mode. // Acquire the big kernel lock before doing any // serious kernel work. // LAB 4: Your code here. lock_kernel(); assert(curenv);
In env_run(), release the lock right before switching to user mode. Do not do that too early or too late, otherwise you will experience races or deadlocks.
// Choose a user environment to run and run it. void sched_yield(void) { structEnv *idle;
// Implement simple round-robin scheduling. // // Search through 'envs' for an ENV_RUNNABLE environment in // circular fashion starting just after the env this CPU was // last running. Switch to the first such environment found. // // If no envs are runnable, but the environment previously // running on this CPU is still ENV_RUNNING, it's okay to // choose that environment. // // Never choose an environment that's currently running on // another CPU (env_status == ENV_RUNNING). If there are // no runnable environments, simply drop through to the code // below to halt the cpu.
// Setup code for APs.  Entered once per application processor after
// the bootstrap trampoline; initializes this CPU's per-CPU state and
// then hands control to the scheduler.  Initialization order matters:
// the page directory switch must come first, and cpu_status is set
// last so boot_aps() only proceeds once this CPU is fully set up.
void
mp_main(void)
{
	// We are in high EIP now, safe to switch to kern_pgdir
	lcr3(PADDR(kern_pgdir));
	cprintf("SMP: CPU %d starting\n", cpunum());

	lapic_init();          // this CPU's local APIC
	env_init_percpu();     // this CPU's GDT and segment registers
	trap_init_percpu();    // this CPU's TSS and IDT
	xchg(&thiscpu->cpu_status, CPU_STARTED); // tell boot_aps() we're up

	// Now that we have finished some basic setup, call sched_yield()
	// to start running processes on this CPU.  But make sure that
	// only one CPU can enter the scheduler at a time!
	//
	// Your code here:
	lock_kernel();
	sched_yield();

	// Remove this after you finish Exercise 6
	for (;;);
}
// Allocate a new environment. // Returns envid of new environment, or < 0 on error. Errors are: // -E_NO_FREE_ENV if no free environment is available. // -E_NO_MEM on memory exhaustion. staticenvid_t sys_exofork(void) { // Create the new environment with env_alloc(), from kern/env.c. // It should be left as env_alloc created it, except that // status is set to ENV_NOT_RUNNABLE, and the register set is copied // from the current environment -- but tweaked so sys_exofork // will appear to return 0.
// Set envid's env_status to status, which must be ENV_RUNNABLE // or ENV_NOT_RUNNABLE. // // Returns 0 on success, < 0 on error. Errors are: // -E_BAD_ENV if environment envid doesn't currently exist, // or the caller doesn't have permission to change envid. // -E_INVAL if status is not a valid status for an environment. staticint sys_env_set_status(envid_t envid, int status) { // Hint: Use the 'envid2env' function from kern/env.c to translate an // envid to a struct Env. // You should set envid2env's third argument to 1, which will // check whether the current environment has permission to set // envid's status.
// Allocate a page of memory and map it at 'va' with permission // 'perm' in the address space of 'envid'. // The page's contents are set to 0. // If a page is already mapped at 'va', that page is unmapped as a // side effect. // // perm -- PTE_U | PTE_P must be set, PTE_AVAIL | PTE_W may or may not be set, // but no other bits may be set. See PTE_SYSCALL in inc/mmu.h. // // Return 0 on success, < 0 on error. Errors are: // -E_BAD_ENV if environment envid doesn't currently exist, // or the caller doesn't have permission to change envid. // -E_INVAL if va >= UTOP, or va is not page-aligned. // -E_INVAL if perm is inappropriate (see above). // -E_NO_MEM if there's no memory to allocate the new page, // or to allocate any necessary page tables. staticint sys_page_alloc(envid_t envid, void *va, int perm) { // Hint: This function is a wrapper around page_alloc() and // page_insert() from kern/pmap.c. // Most of the new code you write should be to check the // parameters for correctness. // If page_insert() fails, remember to free the page you // allocated!
// Map the page of memory at 'srcva' in srcenvid's address space // at 'dstva' in dstenvid's address space with permission 'perm'. // Perm has the same restrictions as in sys_page_alloc, except // that it also must not grant write access to a read-only // page. // // Return 0 on success, < 0 on error. Errors are: // -E_BAD_ENV if srcenvid and/or dstenvid doesn't currently exist, // or the caller doesn't have permission to change one of them. // -E_INVAL if srcva >= UTOP or srcva is not page-aligned, // or dstva >= UTOP or dstva is not page-aligned. // -E_INVAL is srcva is not mapped in srcenvid's address space. // -E_INVAL if perm is inappropriate (see sys_page_alloc). // -E_INVAL if (perm & PTE_W), but srcva is read-only in srcenvid's // address space. // -E_NO_MEM if there's no memory to allocate any necessary page tables. staticint sys_page_map(envid_t srcenvid, void *srcva, envid_t dstenvid, void *dstva, int perm) { // Hint: This function is a wrapper around page_lookup() and // page_insert() from kern/pmap.c. // Again, most of the new code you write should be to check the // parameters for correctness. // Use the third argument to page_lookup() to // check the current permissions on the page.
// Unmap the page of memory at 'va' in the address space of 'envid'. // If no page is mapped, the function silently succeeds. // // Return 0 on success, < 0 on error. Errors are: // -E_BAD_ENV if environment envid doesn't currently exist, // or the caller doesn't have permission to change envid. // -E_INVAL if va >= UTOP, or va is not page-aligned. staticint sys_page_unmap(envid_t envid, void *va) { // Hint: This function is a wrapper around page_remove().
// Set the page fault upcall for 'envid' by modifying the corresponding struct // Env's 'env_pgfault_upcall' field. When 'envid' causes a page fault, the // kernel will push a fault record onto the exception stack, then branch to // 'func'. // // Returns 0 on success, < 0 on error. Errors are: // -E_BAD_ENV if environment envid doesn't currently exist, // or the caller doesn't have permission to change envid. staticint sys_env_set_pgfault_upcall(envid_t envid, void *func) { // LAB 4: Your code here. structEnv* env; if(envid2env(envid, &env, 1)) return -E_BAD_ENV; env->env_pgfault_upcall = func; return0; //panic("sys_env_set_pgfault_upcall not implemented"); }
// Restore the trap-time registers.  After you do this, you
// can no longer modify any general-purpose registers.
// LAB 4: Your code here.
	addl $8, %esp		// skip utf_fault_va and utf_err
	popal			// pop the eight saved general-purpose registers

// Restore eflags from the stack.  After you do this, you can
// no longer use arithmetic operations or anything else that
// modifies eflags.
// LAB 4: Your code here.
	addl $4, %esp		// skip utf_eip (presumably pushed onto the
				// trap-time stack by earlier code not shown
				// here, so 'ret' below can consume it)
	popfl

// Switch back to the adjusted trap-time stack.
// LAB 4: Your code here.
	popl %esp

// Return to re-execute the instruction that faulted.
// LAB 4: Your code here.
	ret			// pops the trap-time eip from the trap-time stack
// // Set the page fault handler function. // If there isn't one yet, _pgfault_handler will be 0. // The first time we register a handler, we need to // allocate an exception stack (one page of memory with its top // at UXSTACKTOP), and tell the kernel to call the assembly-language // _pgfault_upcall routine when a page fault occurs. // void set_pgfault_handler(void (*handler)(struct UTrapframe *utf)) { int r;
if (_pgfault_handler == 0) { // First time through! // LAB 4: Your code here. if(sys_page_alloc(0, (void *)(UXSTACKTOP - PGSIZE), PTE_U | PTE_P | PTE_W)) panic("set_pgfault_handler: page alloc fault!"); if(sys_env_set_pgfault_upcall(0, (void *)_pgfault_upcall)) panic("set_pgfault handler: set pgfault upcall failed!"); } // Save handler pointer for assembly to call. _pgfault_handler = handler; }
// // Custom page fault handler - if faulting page is copy-on-write, // map in our own private writable copy. // staticvoid pgfault(struct UTrapframe *utf) { void *addr = (void *) utf->utf_fault_va; uint32_t err = utf->utf_err; int r;
// Check that the faulting access was (1) a write, and (2) to a // copy-on-write page. If not, panic. // Hint: // Use the read-only page table mappings at uvpt // (see <inc/memlayout.h>).
// LAB 4: Your code here. if(!((err & FEC_WR) && (uvpt[PGNUM(addr)] & PTE_COW))) panic("pgfault: 0x%08x the fault page is not writable or copy-on-write page!", addr);
// Allocate a new page, map it at a temporary location (PFTEMP), // copy the data from the old page to the new page, then move the new // page to the old page's address. // Hint: // You should make three system calls.
// // Map our virtual page pn (address pn*PGSIZE) into the target envid // at the same virtual address. If the page is writable or copy-on-write, // the new mapping must be created copy-on-write, and then our mapping must be // marked copy-on-write as well. (Exercise: Why do we need to mark ours // copy-on-write again if it was already copy-on-write at the beginning of // this function?) // // Returns: 0 on success, < 0 on error. // It is also OK to panic on error. // staticint duppage(envid_t envid, unsigned pn) { int r;
// // User-level fork with copy-on-write. // Set up our page fault handler appropriately. // Create a child. // Copy our address space and page fault handler setup to the child. // Then mark the child as runnable and return. // // Returns: child's envid to the parent, 0 to the child, < 0 on error. // It is also OK to panic on error. // // Hint: // Use uvpd, uvpt, and duppage. // Remember to fix "thisenv" in the child process. // Neither user exception stack should ever be marked copy-on-write, // so you must allocate a new page for the child's user exception stack. // envid_t fork(void) { // LAB 4: Your code here. int r; envid_t envid; uint8_t * addr; set_pgfault_handler(pgfault); envid = sys_exofork(); if(envid < 0) panic("fork: sys_exofork failed!"); if(envid == 0){ thisenv = &envs[ENVX(sys_getenvid())]; return0; }
// Block until a value is ready. Record that you want to receive // using the env_ipc_recving and env_ipc_dstva fields of struct Env, // mark yourself not runnable, and then give up the CPU. // // If 'dstva' is < UTOP, then you are willing to receive a page of data. // 'dstva' is the virtual address at which the sent page should be mapped. // // This function only returns on error, but the system call will eventually // return 0 on success. // Return < 0 on error. Errors are: // -E_INVAL if dstva < UTOP but dstva is not page-aligned. staticint sys_ipc_recv(void *dstva) { // LAB 4: Your code here. structEnv * env; if(envid2env(0, &env, 0)) return -E_BAD_ENV; if((uint32_t)dstva < UTOP && (dstva != ROUNDDOWN(dstva, PGSIZE))) return -E_INVAL;
// Try to send 'value' to the target env 'envid'. // If srcva < UTOP, then also send page currently mapped at 'srcva', // so that receiver gets a duplicate mapping of the same page. // // The send fails with a return value of -E_IPC_NOT_RECV if the // target is not blocked, waiting for an IPC. // // The send also can fail for the other reasons listed below. // // Otherwise, the send succeeds, and the target's ipc fields are // updated as follows: // env_ipc_recving is set to 0 to block future sends; // env_ipc_from is set to the sending envid; // env_ipc_value is set to the 'value' parameter; // env_ipc_perm is set to 'perm' if a page was transferred, 0 otherwise. // The target environment is marked runnable again, returning 0 // from the paused sys_ipc_recv system call. (Hint: does the // sys_ipc_recv function ever actually return?) // // If the sender wants to send a page but the receiver isn't asking for one, // then no page mapping is transferred, but no error occurs. // The ipc only happens when no errors occur. // // Returns 0 on success, < 0 on error. // Errors are: // -E_BAD_ENV if environment envid doesn't currently exist. // (No need to check permissions.) // -E_IPC_NOT_RECV if envid is not currently blocked in sys_ipc_recv, // or another environment managed to send first. // -E_INVAL if srcva < UTOP but srcva is not page-aligned. // -E_INVAL if srcva < UTOP and perm is inappropriate // (see sys_page_alloc). // -E_INVAL if srcva < UTOP but srcva is not mapped in the caller's // address space. // -E_INVAL if (perm & PTE_W), but srcva is read-only in the // current environment's address space. // -E_NO_MEM if there's not enough memory to map srcva in envid's // address space. staticint sys_ipc_try_send(envid_t envid, uint32_t value, void *srcva, unsigned perm) { // LAB 4: Your code here. structEnv* dstenv, * srcenv; if(envid2env(envid, &dstenv, 0) || envid2env(0, &srcenv, 0)) return -E_BAD_ENV; if(!dstenv->env_ipc_recving) return -E_IPC_NOT_RECV;
// Receive a value via IPC and return it. // If 'pg' is nonnull, then any page sent by the sender will be mapped at // that address. // If 'from_env_store' is nonnull, then store the IPC sender's envid in // *from_env_store. // If 'perm_store' is nonnull, then store the IPC sender's page permission // in *perm_store (this is nonzero iff a page was successfully // transferred to 'pg'). // If the system call fails, then store 0 in *fromenv and *perm (if // they're nonnull) and return the error. // Otherwise, return the value sent by the sender // // Hint: // Use 'thisenv' to discover the value and who sent it. // If 'pg' is null, pass sys_ipc_recv a value that it will understand // as meaning "no page". (Zero is not the right value, since that's // a perfectly valid place to map a page.) int32_t ipc_recv(envid_t *from_env_store, void *pg, int *perm_store) { // LAB 4: Your code here. int r;
// Send 'val' (and 'pg' with 'perm', if 'pg' is nonnull) to 'toenv'. // This function keeps trying until it succeeds. // It should panic() on any error other than -E_IPC_NOT_RECV. // // Hint: // Use sys_yield() to be CPU-friendly. // If 'pg' is null, pass sys_ipc_try_send a value that it will understand // as meaning "no page". (Zero is not the right value.) void ipc_send(envid_t to_env, uint32_t val, void *pg, int perm) { // LAB 4: Your code here. int r; do{ sys_yield(); r = sys_ipc_try_send(to_env, val, pg ? pg : (void *)UTOP, perm); if(r != 0 && r != -E_IPC_NOT_RECV) panic("ipc_send: faild, %e", r); }while(r); }
实现完了之后lab4的基础内容就已经结束了,执行make grade可以得到如下的输出:
1 2 3 4 5 6 7 8 9 10 11
spin: OK (1.8s) stresssched: OK (3.2s) sendpage: OK (0.9s) (Old jos.out.sendpage failure log removed) pingpong: OK (1.9s) (Old jos.out.pingpong failure log removed) primes: OK (9.1s) (Old jos.out.primes failure log removed) Part C score: 25/25
enabled interrupts: 1 2 [00000000] new env 00001000 [00001000] new env 00001001 i am 00001000; thisenv is 0xeec00000 send 0 from 1000 to 1001 1001 got 0 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 1 from 1001 (thisenv is 0xeec00000 1000) 1001 got 2 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 3 from 1001 (thisenv is 0xeec00000 1000) 1001 got 4 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 5 from 1001 (thisenv is 0xeec00000 1000) 1001 got 6 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 7 from 1001 (thisenv is 0xeec00000 1000) 1001 got 8 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 9 from 1001 (thisenv is 0xeec00000 1000) [00001000] exiting gracefully [00001000] free env 00001000 1001 got 10 from 1000 (thisenv is 0xeec0007c 1001) [00001001] exiting gracefully [00001001] free env 00001001
可以发现实际上两个进程确实是共享了地址空间,并且thisenv能够正确地指向进程自身了。
如果将其中的sfork()修改成fork()的话,得到的输出如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
enabled interrupts: 1 2 [00000000] new env 00001000 [00001000] new env 00001001 i am 00001000; thisenv is 0xeec00000 send 0 from 1000 to 1001 1001 got 0 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 0 from 1001 (thisenv is 0xeec00000 1000) 1001 got 1 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 1 from 1001 (thisenv is 0xeec00000 1000) 1001 got 2 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 2 from 1001 (thisenv is 0xeec00000 1000) 1001 got 3 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 3 from 1001 (thisenv is 0xeec00000 1000) 1001 got 4 from 1000 (thisenv is 0xeec0007c 1001) 1000 got 4 from 1001 (thisenv is 0xeec00000 1000) ...