Meet Casper

Casper was my colleague at Red Hat; we hadn’t seen each other for about four years. We had mutual friends in open-source communities before he joined Red Hat. He now works at Aliyun on system software.

His team wants to cooperate with universities: share their technology, provide guidance, and offer positions (intern or regular) to students.

They have investigated and deployed machine learning in system software. One successful use case is key-value lookup: it is faster than a hash table, but only for read-mostly workloads, which are common in their production environment.
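For flavor, here is a hypothetical sketch of the general idea behind such learned key-value lookup (my own toy illustration, not Aliyun's implementation): fit a simple model to the sorted keys, predict a position, and correct with a bounded local search. Read-mostly workloads suit this because the model must be refitted whenever keys change.

```python
import bisect

# Toy "learned index": a linear model predicts where a key sits in a sorted
# array; the worst-case prediction error bounds a local corrective search.
class LearnedIndex:
    def __init__(self, keys):
        self.keys = keys                      # must be sorted and distinct
        n = len(keys)
        # Least-squares fit of position ~= slope*key + intercept.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        var = sum((k - mean_k) ** 2 for k in keys)
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error bounds the corrective search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None

keys = [k * k for k in range(100)]   # sorted, non-uniformly spaced keys
idx = LearnedIndex(keys)
print(idx.lookup(49 * 49))   # 49
print(idx.lookup(7))         # None (key absent)
```

Unlike a hash table, this also preserves key order, which is part of why it fits scan-friendly, read-mostly stores.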

Ten years of KVM

This article was contributed by Amit Shah

We recently celebrated 25 years of the Linux project. KVM, or Kernel-based Virtual Machine, a part of the Linux kernel, celebrated its 10th anniversary in October. KVM was first announced on 19 October 2006 by its creator, Avi Kivity, in this post to the Linux kernel mailing list.

That first version of the KVM patch set had support for the VMX instructions found in Intel CPUs that were just being introduced around the time of the announcement. Support for AMD’s SVM instructions followed soon after. The KVM patch set was merged in the upstream kernel in December 2006, and was released as part of the 2.6.20 kernel in February 2007.


Running multiple guest operating systems on the x86 architecture was quite difficult without the new virtualization extensions: there are instructions that can only be executed from the highest privilege level, ring 0, and such access could not be given to each operating system without it also affecting the operation of the other OSes on the system. Additionally, some instructions do not cause a trap when executed at a lower privilege level — despite them requiring a higher privilege level to function correctly — so running a “hypervisor” that ran in ring 0, while running other OSes in lower-privileged rings was also not a solution.

The VMX and SVM instructions introduced a new ring, ring -1, to the x86 architecture. This is the privilege level where the virtual machine monitor (VMM), or the hypervisor, runs. This VMM arbitrates access to the hardware for the various operating systems so that they can continue running normally in the regular x86 environment.

There are several reasons to run multiple operating systems on one hardware system: deployment and management of OSes becomes easier with tools that can provision virtual machines (VMs). It also lowers power and cooling costs by consolidating multiple OSes, along with their corresponding applications and services, onto newer, more capable hardware. Moreover, it becomes possible to run legacy operating systems and applications on newer hardware without any changes, by emulating older hardware via the hypervisor.

The functionality of KVM itself is divided into multiple parts: the generic host kernel KVM module, which exposes the architecture-independent functionality of KVM; the architecture-specific kernel module in the host system; the user-space part that emulates the virtual machine hardware that the guest operating system runs on; and optional guest additions that make the guest perform better on virtualized systems.

At the time KVM was introduced, Xen was the de facto open source hypervisor. Since Xen was introduced before the virtualization extensions were available on x86, it had to use a different design. First, it needed to run a modified guest kernel in order to boot virtual machines. Second, Xen took over the role of the host kernel, relegating Linux to only managing I/O devices as part of Xen’s special “Dom0” virtual machine. This meant that the system couldn’t truly be called a Linux system — even the guest operating systems were modified Linux kernels with (at the time) non-upstream code.

Kivity started KVM development while working at Israeli startup Qumranet to fix issues with the Xen-related work the company was doing. The original Qumranet product idea was to replicate machine state across two different VMs to achieve fault tolerance. It was soon apparent to the engineers at Qumranet that Xen was too limiting and a poor model for their needs. The virtualization extensions were about to be introduced in AMD and Intel CPUs, so Kivity started a side-project, KVM, that was based on the new hardware virtualization specifications and would be used as the hypervisor for the fault-tolerance solution.

Development model

Since the beginning, Kivity wrote the code with upstreaming it in mind. One of the goals of the KVM model was as much reuse of existing functionality as possible: using Linux to do most of the work, with KVM just being a driver that handled the new virtualization instructions exposed by hardware. This enabled KVM to gain any new features that Linux developers added to the other parts of the system, such as improvements in the CPU scheduler, memory management, power management, and so on.

This model worked well for the rest of the Linux ecosystem as well. Features that started their life with only virtualization in mind began being useful and widely-adopted in general use cases as well, like transparent huge pages. There weren’t two separate communities for the OS and for the VMM; everyone worked as part of one project.

Also, management of the VMs would be easier as each VM could be monitored as a regular process — tools like top and ps worked out of the box. These days, perf can be used to monitor guest activity from the host and identify bottlenecks, if any. Further chipset improvements will also enable guest process perf measurement from the host.

The other side of KVM was in user space, where the machine that is presented to the guest OS is built. kvm-userspace was a fork of the QEMU project. QEMU is a machine emulator — it can run unmodified OS images for a variety of architectures that it supports, and emulate those architectures’ instructions on the host architecture it runs on. This is of course very slow, but the advantage of the QEMU project was that it had quite a few devices already emulated for the x86 architecture — such as the chipset, network cards, display adapters, and so on.

What kvm-userspace did was short-circuit the emulation code to only allow x86-on-x86 and use the KVM API for actually running the guest OS on the host CPU. When the guest OS performs a privileged operation, the CPU exits to the VMM code and KVM takes over; if it can service the request itself, it does so and gives control back to the guest. This is a “lightweight exit”. Requests that the KVM code cannot serve, like any device emulation, are deferred to QEMU. That implies exiting to user space from the host Linux kernel, and hence it is called a “heavyweight exit”.
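The lightweight/heavyweight split can be sketched with a toy model (plain Python, not the real KVM API; the exit names here are made up). From user space's point of view, the run loop only returns for exits that need device emulation; everything else is absorbed inside the kernel.

```python
# Hypothetical exit names. In this toy model, "lightweight" exits are
# consumed inside the kernel module and the guest is resumed immediately;
# a "heavyweight" exit returns control to the user-space VMM (QEMU).
LIGHTWEIGHT = {"cpuid", "cr_access"}

def kvm_run(pending):
    """Kernel side: loop until an exit that user space must handle."""
    handled = 0
    while pending:
        reason = pending.pop(0)
        if reason in LIGHTWEIGHT:
            handled += 1                 # fix up state, re-enter guest
            continue
        return reason, handled           # heavyweight: back to QEMU
    return None, handled

pending = ["cpuid", "cpuid", "mmio_write", "cr_access", "pio_read"]
print(kvm_run(pending))    # ('mmio_write', 2): two lightweight exits absorbed
print(kvm_run(pending))    # ('pio_read', 1)
```

The cost asymmetry is the point: each heavyweight exit crosses the kernel/user-space boundary, so reducing them is a recurring performance theme.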

One of the drawbacks in this model was the maintenance of the fork of QEMU. The early focus of the developers was on stabilizing the kernel module, and getting more and more guests to work without a hitch. That meant much less developer time was spent on the device emulation code, and hence the work to redo the hacks to make them suitable for upstream remained at a lower priority.

Xen too used a fork of QEMU for its device emulation in its HVM mode (the mode where Xen used the new hardware virtualization instructions). In addition, QEMU had its own non-upstream Linux kernel accelerator module (KQEMU) for x86-on-x86 that eliminated the emulation layer, making x86 guests run faster on x86 hardware. Integrating all of this required a maintainer who would understand the various needs from all the projects. Anthony Liguori stepped up as a maintainer of the QEMU project, and he had the trust of the Xen and KVM communities. Over time, in small bits, the forks were eliminated, and now KVM as well as Xen use upstream QEMU for their device model emulation.

The “do one thing, do it right” mantra, along with “everything is a file”, was exploited to the fullest. The KVM API allows one to create VMs — or, alternatively, sandboxes — on a Linux system. These can then run operating systems inside them, or just about any code that will not interfere with the running system. This also means that there are other user-space implementations that are not as heavyweight or as featureful as QEMU. Tools that can quickly boot into small applications or specialized OSes with a KVM VM started showing up — with kvmtool being the most popular one.

Developer Interest

Since the original announcement of the KVM project, many hackers were interested in exploring KVM. It helped that hacking on KVM was very convenient: a system reboot wasn’t required to install a new VMM. It was as simple as re-compiling the KVM modules, removing the older modules, and loading the newly-compiled ones. This helped immensely during the early stabilization and improvement phases. Debugging was a much faster process, and developers much preferred this way of working, as contrasted with compiling a new VMM, installing it, updating the boot loader, and rebooting the system. Another advantage, perhaps of lower importance on development systems but nonetheless essential for my work-and-development laptop, was that root permissions were not required to run a virtual machine.

Another handy debugging trick that was made possible by the separation of the KVM module and QEMU was that if something didn’t work in KVM mode, but worked in emulated mode, the fault was very likely in the KVM module. If some guest didn’t work in either of the modes, the fault was in the device model or QEMU.

The early KVM release model helped with a painless development experience as well: even though the KVM project was part of the upstream Linux kernel, Kivity maintained the KVM code on a separate release train. A new KVM release was made regularly that included the source of the KVM modules, a small compatibility layer to compile the KVM modules on any of the supported Linux kernels, and the kvm-userspace piece. This ensured that a distribution kernel, which had an older version of the KVM modules, could be used unchanged by compiling the modules from the newest KVM release for that kernel.

The compatibility layer required some effort to maintain. It needed to ensure that the new KVM code that used newer kernel APIs that were not present on older kernels continued to work, by emulating the new API. This was a one-time cost to add such API compatibility functions, but the barrier to entry for new contributors was significantly reduced. Hackers could download the latest KVM release, compile the modules against whichever kernel they were running, and see virtual machines boot. If that did not work, developers could post bug-fix patches.

Widespread adoption

Chip vendors started taking interest and porting KVM to their architectures: Intel added support for IA64 along with features and stability fixes to x86; IBM added support for s390 and POWER architectures; ARM and Linaro contributed to the ARM port; and Imagination Technologies added MIPS support. These didn’t happen all at once, though. ARM support, for example, came rather late (“it’s the reality that’s not timely, not the prediction”, quipped Kivity during a KVM Forum keynote when he had predicted the previous year that an ARM port would materialize).

Developer interest could also be seen at the KVM Forums, which is an annual gathering of people interested in KVM virtualization. The first KVM Forum in 2007 had a handful of developers in a room where many discussions about the current state of affairs, and where to go in the future, took place. One small group, headed by Rusty Russell, took over the whiteboard and started discussions on what a paravirtualized interface for KVM would look like. This is where VIRTIO started to take shape. These days, the KVM Forum is a whole conference with parallel tracks, tens of speakers, and hundreds of attendees.

As time passed, it was evident the KVM kernel modules were not where most of the action was — the instruction emulation, when required, was more or less complete, and most distributions were shipping recent Linux kernels. The focus had then switched to the user space: adding more device emulation, making existing devices perform better, and so on. The KVM releases then focused more on the user-space part, and the maintenance of the compatibility layer was eased. At this time, even though the kvm-userspace fork existed, effort was made to ensure new features went into the QEMU project rather than the kvm-userspace project. Kivity too started feeding in small changes from the kvm-userspace repository to the QEMU project.

While all this was happening, Qumranet had changed direction, and was now pursuing desktop virtualization with KVM as the hypervisor. In September 2008, Red Hat announced it would acquire Qumranet. Red Hat had supported the Xen hypervisor as its official VMM since the Red Hat Enterprise Linux 5.0 release. With the RHEL 5.4 release, Red Hat started supporting both Xen and KVM as hypervisors. With the release of RHEL 6.0, Red Hat switched to only supporting KVM. KVM continued enjoying out-of-the box support in other distributions as well.

Present and future

Today, there are several projects that use KVM as the default hypervisor; OpenStack and oVirt are the most popular ones. These projects concern themselves with large-scale deployments of KVM hosts and many VMs per deployment. They come with various use cases, and hence ask different things of KVM. As guest OSes grow larger (more RAM and virtual CPUs), they become more difficult to live-migrate without incurring too much downtime; telco deployments need low-latency network packet processing, so realtime KVM is an area of interest; and faster disk and network I/O is always an area of research. Keeping everything secure while reducing the hypervisor footprint is also being worked on, and the ways in which a malicious guest can break out of its VM sandbox, and how to mitigate such attacks, are a prime area of focus.

A lot of advancement happens with new hardware updates and devices. However, a lot of effort is also spent in optimizing the current code base, writing new algorithms, and coming up with new ways to improve performance and scalability with the existing infrastructure.

For the next ten years, the main topics of discussion may well not be the development of the hypervisor itself. More interesting will be to see how Linux gets used as a hypervisor: bringing better sandboxing for running untrusted code, especially on mobile phones, and running the cloud infrastructure, pervasive and invisible at the same time.


Play Cassandra in Docker

Last time I played with Docker on my Mac I used boot2docker [1], a lightweight Linux distribution. Today I wanted to deploy a multi-node Cassandra cluster. I could have copied my existing Fedora VM twice, but there wasn't enough disk space; no matter how much I cleaned up it still wasn't enough, mainly because I had compiled a lot of things in that VM and it used a lot of disk. By comparison, a default minimal install of the Fedora 24 Beta mini image takes only about 1.5 GB.





Starting "default"...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           ______ o           __/

docker is configured to use the default machine with IP
For help getting started, check out the docs at
# Create a container, pulling and running Cassandra from the official Docker repository
Jerusalem:~ amoskong$ docker run --name vm4 -d cassandra

# Get a shell in the container, then run cqlsh to connect to the database
Jerusalem:~ amoskong$ docker exec -it vm4 sh

# cqlsh
Connected to Test Cluster at
[cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.

Repeat the commands above with different names to create more containers, one per Cassandra node. To join multiple nodes into a cluster you need to configure the Cassandra seeds, which can be done via the configuration file, the docker command line, or the settings UI of the Kitematic GUI.

# Specify the Cassandra seed IP addresses when creating a container
# (note: -e must come before the image name to be treated as an environment variable)
Jerusalem:~ amoskong$ docker run --name vm2 -d -e CASSANDRA_SEEDS="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' vm1)" cassandra

Jerusalem:~ amoskong$ docker run --name vm3 -d -e CASSANDRA_SEEDS=",," cassandra

Cassandra in the third container kept failing to start due to insufficient memory. After some searching online, I finally fixed it by setting the MAX_HEAP_SIZE and HEAP_NEWSIZE environment variables.


Berlin in Germany (柏林)三


Jinpu's place is right next to the airport in northwest Berlin; ride east, then turn south, and you reach Alexanderplatz, where he works. It is still quite far, more than half an hour by bike. Jinpu is very fit, so his earlier "it's close" took on a new meaning for me. Not riding straight to Potsdam on the first day turned out to be very wise; Jinpu said Potsdam was "close" too 0:-)




Once riding, I forgot all about eating, and didn't stop until a street corner in the southwest of the city, where I locked up the bike and found a Turkish kebab shop. The soft flatbread inside was remarkably close to the one from my hometown, and with the vegetables it was delicious; there was beer, a football match on TV, and slot machines. The owner is from Turkey; his wife and children are back home and he returns only once every two or three years. The rent for the shop and his lodging is a heavy expense. He clears the equivalent of 300,000+ RMB a year, and living costs in Berlin are slightly higher than Beijing (roughly comparable), but business has been going downhill these past few years. He was resigned about it: there is nothing for him to do back home, so he carries on here even as things get worse. I told him to come to China, that people in my hometown love this kind of meat-in-flatbread and business would probably be better than here, and he said sure, sure. Tipsy on beer, I rattled on in wild English: life is hard, we are friends. We chatted until the buzz wore off before I left. He told me to come find him next time I am in Berlin, and better yet if I come there to work. He is a genuinely good man; a year later, all I can do is wish him and his family well.





Riding further south, I tried a shortcut (riding along the road the whole way was boring) and ended up deep in a forest, the paths covered in rotting leaves. I stubbornly refused to backtrack, and met only a father and daughter the whole time. I was a little scared. The trees were tall but not very dense; I could occasionally hear cars, yet I could not find a road. Having looked at the map earlier, I figured that heading southwest would get me out. I walked and walked, and finally saw a house and a proper road. Following it for a long while, I reached a small village; my phone showed I had been drifting northwest the whole time, and I had lost two hours in the forest. The rest was easy: straight on to the town of Potsdam. The moment I saw the town, everything opened up. Next time, no rash moves.


The way back was much easier, along the east side of the lake on a dedicated bike path, all the way to where I had taken the boat on the first day, past the Victory Column, and then, since it was not late, onward north. I had fried rice at a Thai restaurant and a glass of mango juice, so sweet and fresh. At the till I found the price was 0.5 euro more than on the menu; the owner pointed to a price posted elsewhere, and I said I could only pay the menu price. He didn't argue, knocked off the 0.5 euro, and rounded down the small change too. I felt a bit guilty :-/




Berlin in Germany (柏林)二


I contacted them before going to Germany, but their plans were already set and their Turkey trip could not be changed. They still invited me to visit Berlin, drew up detailed travel suggestions, introduced many good places and activities they had discovered, let me stay and eat at their home, and lent me their mountain bike. I was deeply grateful for such sincerity and trust; after the habitual Chinese round of polite refusals, I happily accepted 🙂

During the Linux Conference / KVM Forum in Düsseldorf I ran into Sebastian from Jinpu's company. I had only seen photos of him before, and some of his impressive work on Facebook and LinkedIn (kernel development projects, flight-simulator building and flying, games), yet I recognized him at a glance as we walked by. He seemed a little shy when speaking, but as soon as I asked about their projects he poured out idea after idea, with obvious pride. I also greeted a few other people from the company and introduced myself.


Meanwhile Spain, on the Mediterranean coast, was struggling with the economic crisis. Last time I went to Barcelona, half the city felt like a ghost town: few people on the main streets during the day, shops closed, frequent strikes and demonstrations, and rampant robbery and pickpocketing. People there get up late, shops are open only a few hours a day, and dinner is usually after 10pm; go early and the place is either not yet open or nearly empty. After dinner most people have several gatherings to attend and stay out very late. With the coastal climate, beaches and sunbathing are free, and a bit of seafood and beer makes it even better. Maybe everyone is so busy enjoying life that work gets forgotten, hence the economic crisis ;-?

The Berlin main railway station is huge, several levels, and transfers between subway and bus are convenient as long as you read the signs carefully. I bought a one-day pass; on first use you stamp it with the time at a validation machine. Nobody checks tickets, it all runs on trust, but the fine is severe if you are caught. I wanted to use the WC at a McDonald's and it cost 3.5 euros; labor is expensive here, free services are rare, and free wifi is scarce across the whole city. The subway network is dense, each station is fairly small, and the equipment is not as new as China's; it is a developED country, after all. After a few transfers I felt I had arrived in the eastern part of town: plenty of graffiti along the way, and not many brand-new things.

First I went to pick up the key to Jinpu's place from Sebastian. Just as I got off the subway I happened to run into Sebastian's Indian colleague, whom I had met at the conference; he and his wife and children had moved here less than a year ago, and he took me to the company. I had roughly scouted the area on Google Street View beforehand; it is very close to the bustling shopping district at Alexanderplatz. From the outside it still looks like a European residential building, not a bright, fashionable office tower. Behind the front door is a long corridor, then a large courtyard, and only through the windows do you see the many modern companies inside. The meeting rooms are small but the lounge is large, with all kinds of beer, desserts, and games; yet I did not see a single person actually lounging, and many people bring their own lunch from home.





Looking for a SW Engineer to join the Avocado team (Power8)

Hi folks. (from:

We’re looking for a SW Engineer to work on the KVM Test Automation team (currently working on Avocado), focusing on Power8.

The primary assignment of this person will be to work on the Power8 port of Avocado, and to be responsible, in the long term, for anything related to Power8 in our KVM test automation infrastructure.

If you know of somebody who would fit the description below and is willing to join our team, please point them to the Red Hat job site or contact me in private.

The position is public at this URL: (don’t worry about the listed work location)

Company Description:
    At Red Hat, we connect an innovative community of customers,
    partners, and contributors to deliver an open source stack of
    trusted, high-performing technologies that solve business
    problems. We’re a billion dollar S&P 500 company offering
    solutions from Linux to middleware, storage to cloud,
    together with award-winning global customer support,
    consulting, and implementation services. 

Job summary:
    The Red Hat Engineering team in Brno is looking for a
    Software Engineer to join the KVM team in Brno, Czech
    Republic. Focusing on Power8 (ppc64), you will work on the
    development of testing frameworks and tools to automate the
    test coverage of Red Hat virtualization technologies running
    on the IBM Power8 architecture. As a Software Engineer,
    you'll work with a globally distributed team to develop
    critical technology for Red Hat products while collaborating
    with the open source community on many projects. 

Primary job responsibilities:
    * Design and implement new features in open source testing
      frameworks like Avocado and Autotest
    * Integrate multiple automation tools in a Continuous
      Integration (CI) system for KVM on Power8
    * Assist the Quality Assurance team in developing complex
      tests involving virtualization technologies
    * Promote a culture of test automation internally within Red Hat

Required skills:
    * 2-3 years of significant software development experience on
    * Good understanding of the inner workings of a Linux
    * Experience with test automation or continuous integration
    * Familiarity with development languages like Python and C
    * Experience with open source projects and development tools
    * Familiarity with IBM's Power8 architecture
    * Bachelor's degree in computer science or equivalent





+ Reclaimed space from my old guest images

  1) Fill unused space to zero:
    In Linux guests:
     # dd if=/dev/zero of=./a.out bs=1M
     # rm a.out

    In Windows guests:
     # download sdelete from
     # sdelete -c c:

  2) # qemu-img convert -p -O qcow2 orig.qcow2 new.qcow2
     # rm orig.qcow2

I shrank several images that had been in use for a long time; with 40+ GB of free space, my laptop feels rich again 🙂

level-triggered interrupt & edge-triggered interrupt

(Thanks to Wanpeng Li for the feedback on this post, which led me to rediscover some problems)


  • Edge-triggered: the interrupt fires on a transition of the line's level (usually from high to low); if the CPU is busy at that moment, the interrupt can be lost.
  • Level-triggered: the device holds the interrupt request pin (the interrupt line) at the preset active level (usually high), so the interrupt keeps being requested and is not lost even while the CPU is busy. To avoid a request being handled more than once, the handler first marks the irq, then ACKs it (resets the request pin to the inactive level so that other requests can be accepted), services the interrupt, and finally clears the irq mark.
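The two behaviors above can be sketched in a simplified model (illustrative Python, not hardware-accurate): an edge event that arrives while the CPU is busy is missed, while a level request survives because the line stays asserted.

```python
# Simplified model of the two trigger modes.
# Edge: only the asserting transition matters; if the CPU has interrupts
# masked at that moment, the event is lost. Level: the pending state follows
# the line itself, so the request persists until the device is acked.
class IrqLine:
    def __init__(self, mode):
        self.mode = mode             # "edge" or "level"
        self.level = 0               # current electrical level of the line
        self.pending = False         # does the CPU see a deliverable request?

    def set_level(self, level, cpu_masked=False):
        if self.mode == "edge":
            rising = level and not self.level
            if rising and not cpu_masked:
                self.pending = True  # transition observed and delivered
            # a transition while masked is simply missed
        else:
            self.pending = bool(level)   # pending tracks the line itself
        self.level = level

edge, lvl = IrqLine("edge"), IrqLine("level")
edge.set_level(1, cpu_masked=True)   # device asserts while the CPU is busy
lvl.set_level(1, cpu_masked=True)
print(edge.pending, lvl.pending)     # False True: the edge event was lost
```

This is exactly why the level-triggered path needs the explicit ack/de-assert dance described above: without it, the still-asserted line would be delivered again.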

KVM at first supported only edge-triggered interrupts, using an irqfd to notify the guest; the assert and the falling edge can both be completed within a single injection in the emulation. In the code below, kvm_set_irq() is called twice in a row; the second-to-last parameter is the level.

Injecting level-triggered interrupts into a VM still relies on user space; QEMU emulates the PIC (8259), the IOAPIC, and the LAPIC. The KVM_IRQFD ioctl is the interrupt-injection interface; its only argument, a kvm_irqfd structure, contains an irqfd, a GSI (global system interrupt) number, and flags.

In 2012, KVM also gained level-triggered interrupt emulation [1]: a resamplefd (an eventfd) was added to the kvm_irqfd structure, and it is used for the notification that the guest has acked and the irq mark can be cleared. Requests from all guests and devices are kept on a linked list, and all resample irqfds share a single IRQ source ID. The de-assert (resetting the request pin to the inactive level, i.e. clearing the interrupt status register, ISR) therefore needs to happen only once, while each irqfd is signalled individually (so that every resampler using this GSI is notified).

static void
irqfd_inject(struct work_struct *work)
{
	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
	struct kvm *kvm = irqfd->kvm;

	if (!irqfd->resampler) {
		/* edge-triggered: pulse the line high then low in one injection */
		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1, false);
		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0, false);
	} else
		/* level-triggered: assert the irq and leave the line high */
		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
			    irqfd->gsi, 1, false);
}

/*
 * Since resampler irqfds share an IRQ source ID, we de-assert once
 * then notify all of the resampler irqfds using this GSI.  We can't
 * do multiple de-asserts or we risk racing with incoming re-asserts.
 */
static void
irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
{
	struct _irqfd_resampler *resampler;
	struct _irqfd *irqfd;

	resampler = container_of(kian, struct _irqfd_resampler, notifier);

	kvm_set_irq(resampler->kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
		    resampler->notifier.gsi, 0, false); /* de-assert the level-triggered irq */

	rcu_read_lock();
	list_for_each_entry_rcu(irqfd, &resampler->list, resampler_link)
		eventfd_signal(irqfd->resamplefd, 1); /* notify resamplers */
	rcu_read_unlock();
}

[1] commit 7a84428af ("KVM: Add resampling irqfds for level triggered interrupts")
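The ack path in irqfd_resampler_ack() above can be modeled as a toy sketch (plain Python with hypothetical names, not the kernel API): because all resample irqfds share one IRQ source ID and GSI, the line is de-asserted exactly once, and then every registered resamplefd is signalled.

```python
# Toy model of resampling irqfds: several irqfds share one GSI, so the ack
# handler de-asserts the shared line once and then signals every resamplefd.
class Resampler:
    def __init__(self, gsi):
        self.gsi = gsi
        self.line_asserted = False
        self.irqfds = []             # each entry models one resamplefd

    def inject(self):
        self.line_asserted = True    # level-triggered: assert, leave high

    def ack(self):
        """Guest EOI: a single de-assert, then notify all resamplers."""
        self.line_asserted = False   # one de-assert for the shared source ID
        signalled = 0
        for fd in self.irqfds:
            fd.append(1)             # models eventfd_signal(irqfd->resamplefd, 1)
            signalled += 1
        return signalled

r = Resampler(gsi=10)
r.irqfds = [[], [], []]              # three devices sharing this GSI
r.inject()
print(r.line_asserted)               # True
print(r.ack())                       # 3: every resamplefd signalled once
print(r.line_asserted)               # False
```

Each signalled resampler can then re-check its device and re-assert if the interrupt condition still holds, which is exactly the "resample" in the name.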

  irqfd_inject(struct work_struct *work)
  irqfd_shutdown(struct _irqfd *irqfd)
  irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
  struct kvm_irqfd
  struct _irqfd
  struct _irqfd_resampler
  kvm_set_irq(kvm, irq_source_id, irq, level, line_status)

  kvm_request_irq_source_id(struct kvm *kvm)


QEMU provides a sendkey Monitor command for sending a single key, or a key combination, to the guest. Only single keys are supported because characters such as space, Ctrl, and Enter have to be translated, and keycode input has to be supported as well.


[amos@amosk qemu]$ cat 

# Wrap virsh send-key in a sendkey() function to send a whole string to the guest
DOM=rhel6u5_x64

function sendkey() {
    str="$1"
    length=`expr length "$str"`
    for ((i=1; i<=$length; i++)); do
        char=`expr substr "$str" $i 1`
        if [ "$char" = " " ]; then
            char=spc
        fi
        echo virsh send-key $DOM "$char"
    done
}

sendkey "root"
echo virsh send-key $DOM kp_enter
sendkey "shutdown -h now"
echo virsh send-key $DOM kp_enter
[amos@amosk qemu]$ bash 
virsh send-key rhel6u5_x64 r
virsh send-key rhel6u5_x64 o
virsh send-key rhel6u5_x64 o
virsh send-key rhel6u5_x64 t
virsh send-key rhel6u5_x64 kp_enter
virsh send-key rhel6u5_x64 s
virsh send-key rhel6u5_x64 h
virsh send-key rhel6u5_x64 u
virsh send-key rhel6u5_x64 t
virsh send-key rhel6u5_x64 d
virsh send-key rhel6u5_x64 o
virsh send-key rhel6u5_x64 w
virsh send-key rhel6u5_x64 n
virsh send-key rhel6u5_x64 spc
virsh send-key rhel6u5_x64 -
virsh send-key rhel6u5_x64 h
virsh send-key rhel6u5_x64 spc
virsh send-key rhel6u5_x64 n
virsh send-key rhel6u5_x64 o
virsh send-key rhel6u5_x64 w
virsh send-key rhel6u5_x64 kp_enter

SeaBIOS study (1)

SeaBIOS [1] is an open-source implementation of a BIOS for the x86 architecture. It can do work similar to a payload run after coreboot [2] has initialized the hardware, implementing the boot logic.

After the CPU is initialized, the address of the first instruction it executes (held in EIP) is 0xFFFFFFF0. This is an Intel CPU hack (an inelegant but effective solution to a computing problem) called the reset vector. The instructions at 0xFFFFFFF0–0xFFFFFFFF (the last 16 bytes below 4 GB) jump the CPU to the system BIOS entry at 0xF0000; the system BIOS is pre-loaded at 0xF0000–0xFFFFF (960 KB–1 MB).
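A quick arithmetic check of those addresses (plain Python, just verifying the numbers):

```python
# The reset vector sits 16 bytes below the top of the 32-bit address space,
# and the system BIOS region spans the last 64 KB below 1 MB.
top = 1 << 32                        # 4 GiB
print(hex(top - 16))                 # 0xfffffff0, the reset vector
print((0x100000 - 0xF0000) // 1024)  # 64 (KB): BIOS region 0xF0000-0xFFFFF
```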

There are only two assembly files under SeaBIOS/src/:
>>> seabios/src/entryfuncs.S : macro definitions for calling C functions from assembly
>>> seabios/src/romlayout.S :
The BIOS entry function is “entry_post”; POST stands for power-on self-test. The top of the file uses the DECLFUNC directive to define several functions: entry points for interrupt handling, 32/16big/16 mode transitions, resume, PMM (the POST memory manager), PnP (Plug and Play), APM (Advanced Power Management), PCI-BIOS, BIOS32, ELF, and so on. In entry_post, the various classes of devices are set up in turn through jump functions or interrupts.
“jmp entry_19” jumps to the entry_19 function, which, through the macros defined in entryfuncs.S, actually calls handle_19() in src/boot.c to load and boot the operating system.
entry_18/handle_18() handles failure of the boot (INT 19).

+ 18h Execute Cassette BASIC: true IBM computers contain BASIC in ROM, to be interpreted and executed by this routine (called by the BIOS) in the event of a boot failure
+ 19h After POST, this interrupt is used by the BIOS to load the operating system.

QEMU [4] parses its boot options (for example “-boot order=ndc,menu=on,boot-timeout=1000 ..”) and passes them to the BIOS through fw_cfg files in the ROM; the BIOS reads those files and applies the options. SeaBIOS is not used by the QEMU project alone, though, so the default options and boot policies differ between users.
There is a specification for BIOS booting (the BIOS Boot Specification) [5]; it is fairly complex because it has to remain compatible with and support a lot of hardware, and I am still reading it…

The BIOS provides the DSDT (Differentiated System Description Table) for the system's ACPI, so that ACPI can initialize and configure different kinds of devices through a uniform interface. The table is described in the ASL language, and the compiled hex file can be used by standard systems. Take hot-plug as an example: the BIOS describes PCI devices (such as a NIC) in the DSDT and defines power-management callbacks; the _EJ0 method is used to eject a device. Correspondingly, inside the operating system, the PCI hotplug drivers (code: linux/drivers/pci/hotplug*) probe PCI devices at fixed I/O ports, register and initialize them, manage them, and finally tear them down.

SMBIOS (System Management BIOS): the uniform specification that motherboard and OS vendors follow to expose product management information
DMI (Desktop Management Interface): a framework that helps collect management information about a computer system