Teleporting Virtual Machines: Flipping the Script
In the previous post in this series, Teleporting Servers, we examined how to move a live virtual machine between physical hosts using the pre-copy method. Pre-copy transfers the VM's memory in multiple rounds while the source keeps running. When the number of dirtied pages falls below a configurable writable working set (WWS) threshold, or a preset maximum number of iterations is reached, the VM is suspended and its CPU state plus any remaining dirty pages are sent to the target host; that final transfer is the service downtime phase.

The Pre-Copy Bottleneck

We were good with pre-copy, right? Why do we need a new method? Well, pre-copy can take seconds, and in the virtual world a second is an eternity.

Let's see why pre-copy takes time. What stops the memory-transfer iterations? Right, the WWS floor. But what if the smallest WWS the VM reaches is still large? Then many dirty pages remain to be sent while the VM is suspended, so the downtime is long and the processes that need to keep running stall for a while. This means pre-copy works well only when the VM's workload is read-intensive (little page dirtying). If the workload is write-intensive, the downtime is higher and application performance suffers as well.

The Post-Copy Flip

Take the analogy from the previous post, the moving-in example. What if you take your essentials on the first run and start living in the new apartment? Then you can move your other big items later and hand the key back to the owner.

Now apply that to VM migration. First you transmit the VM's processor state to the target and start the VM there. Then the source actively pushes the VM's memory pages to the target. Meanwhile, if the VM faults on a page that has not arrived yet, that page is fetched from the source over the network.

The Four Pillars of Post-Copy

Under the hood, post-copy uses four techniques to make the migration efficient.

Demand Paging - When the VM at the target faults on a page that has not arrived yet, the page is requested from the source over the network.
Active Push - The source sends pages to the target even when no page fault has occurred, so that the target stops depending on the source as soon as possible.
Prepaging - A forecasting technique that identifies the page access pattern and fetches the pages likely to be needed before they are faulted on.
Dynamic Self-Ballooning (DSB) - Why send free pages over when you can just drop them? DSB takes care of that.

Predicting the Future with Bubbles

As mentioned above, prepaging forecasts which pages will be faulted on at the target and makes them available before the running VM touches them. Let's see what happens behind the scenes.

The effectiveness of prepaging is measured by the percentage of page faults that require an explicit page request to be sent over the network to the source. The smaller the percentage, the better the prepaging algorithm.

But how are the pages needed in the future chosen? Computer programs don't usually access memory completely at random. If a program reads data at memory address 100, there is a very high probability that in the next millisecond it will need addresses 101, 102, and 103. This is called "spatial locality."

There are two methods for deciding which pages to send.

Bubbling with a Single Pivot

What happens when you throw a rock into a pond? Ripples spread outwards in circles. Now compare that with a network page fault. Initially the pages are sent in one direction (0, 1, 2, ...), with the pivot at page 0. When a network page fault occurs, the pivot moves to the faulted page. From there the pages are sent in both directions, forwards and backwards. Assume the fault is on page 50. The pivot becomes 50, and pages are then sent backwards (49, 48, 47, ...) and forwards (51, 52, 53, ...).
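
As a rough illustration of the single-pivot scheme, here is a minimal Python simulation of the order in which pages would be pushed. The function name bubble_order, the faults dictionary used to inject page faults, and the strict forward/backward alternation are assumptions made for this sketch, not details taken from a real implementation. The sketch also skips pages that were already sent, so that each page goes out only once.

```python
def bubble_order(num_pages, faults=None):
    """Return the order in which pages are pushed with a single pivot.

    faults is a hypothetical way to inject page faults into the simulation:
    it maps "number of pages sent so far" to the page that faults then.
    """
    faults = dict(faults or {})
    sent = [False] * num_pages
    order = []
    fwd, back = 0, -1            # pivot starts at page 0, nothing behind it
    go_forward = True

    def send(page):
        sent[page] = True
        order.append(page)

    while len(order) < num_pages:
        fault = faults.pop(len(order), None)
        if fault is not None and not sent[fault]:
            # Demand paging: serve the faulted page, then move the pivot there.
            send(fault)
            fwd, back = fault + 1, fault - 1
            go_forward = True
            continue

        # Skip pages that were already transferred.
        while fwd < num_pages and sent[fwd]:
            fwd += 1
        while back >= 0 and sent[back]:
            back -= 1

        # Alternate between the two edges of the bubble; fall back to the
        # other edge when one of them has run off the end of memory.
        if (go_forward and fwd < num_pages) or back < 0:
            send(fwd)
            fwd += 1
        else:
            send(back)
            back -= 1
        go_forward = not go_forward

    return order


if __name__ == "__main__":
    # Fault on page 50 after three pages have been pushed.
    print(bubble_order(100, {3: 50})[:8])
```

With a fault on page 50 after three pages have gone out, the first eight pages in the order are 0, 1, 2, 50, 51, 49, 52, 48: the sequential push is interrupted and the bubble starts growing around the new pivot.
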
Whenever a page that has already been transferred is encountered, it is skipped, ensuring that each page is transmitted only once.

Bubbling with Multiple Pivots

Now imagine the VM has multiple processes running; the migrated VM would fault on pages in multiple locations. So there need to be multiple pivots, causing multiple bubbles. Each bubble expands around an independent pivot. If one edge of a bubble reaches a page that has already been transmitted, that edge stops.

As for efficiency, limiting the number of pivots to around 7 has been found to work well. So whenever a new pivot occurs and the limit has been hit, the new pivot replaces the oldest one. As for the direction of bubble growth, it has been found that forward expansion is essential, backward-only expansion is counterproductive, and bi-directional expansion performs just right.

What to Send First?

Generally, Linux maintains two linked lists in which pages are kept in Least Recently Used (LRU) order: one for active pages and one for inactive pages. A kernel daemon periodically moves pages between the two lists. Later on, pages on the inactive list are swapped out of RAM to the swap device. This is quite helpful in a post-copy implementation when deciding which pages to send first.

But Linux is lazy. If there is enough memory and no swap device, Linux just sits there and leaves the lists unsorted. And because the migration's pseudo-paging device is only turned on at the last moment of the migration, there is no time to organize the lists, so the migration algorithm has no idea which pages are active and which are inactive.

Therefore the developers implemented a kernel thread that runs in the background long before the migration starts. Each time an application touches a page, a tiny switch on the page called the "Referenced bit" is flipped on. As the thread walks through memory, if it sees a page with the Referenced bit set, it clears the bit and moves the page to the top of the active list.
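
The scanning pass just described can be modeled in a few lines of Python. This is only a toy sketch: the Page class, the scan_active_list function, and the use of a deque as the active list are hypothetical stand-ins for kernel structures, chosen here for illustration.

```python
from collections import deque


class Page:
    """Hypothetical stand-in for a kernel page descriptor."""

    def __init__(self, number):
        self.number = number
        self.referenced = False   # models the "Referenced bit"


def scan_active_list(active):
    """One pass of the background thread over the active list.

    Pages whose Referenced bit is set get the bit cleared and are moved
    to the head (left end) of the list; untouched pages keep their
    relative order and drift towards the tail.
    """
    touched = [page for page in active if page.referenced]
    for page in touched:
        page.referenced = False
        active.remove(page)
        active.appendleft(page)


if __name__ == "__main__":
    pages = [Page(n) for n in range(5)]
    active = deque(pages)
    pages[3].referenced = True    # an application touches page 3
    pages[1].referenced = True    # ... and page 1
    scan_active_list(active)
    print([page.number for page in active])
```

After the pass, the two touched pages sit at the head of the list with their bits cleared, and the untouched pages follow in their original order.
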
If a page sits there for a long time without its Referenced bit being set again, it slides down into the inactive list.

Setting the Trap: Catching Page Faults

There are three ways for the demand-paging component of post-copy to trap page faults.

Shadow Paging

The hypervisor keeps a read-only page table for each VM that maps its pseudo-physical pages to physical page frames. If the VM tries to read or write a page that hasn't arrived yet, the hypervisor catches the violation, pauses the VM, fetches the page, and lets the VM continue.

Page Tracking

During downtime, all the pages of the target VM are marked as not present in their page table entries (PTEs). When the VM wakes up, every access to such a page faults, and custom software intercepts the fault. This needs a lot of hacking into the guest OS.

Pseudo-paging

As soon as migration starts, the memory pages of the migrating VM at the source are swapped out to an in-memory pseudo-paging device, which resides in the guest kernel. Then the CPU state and non-pageable memory are transferred to the target during downtime.

But there's a catch: if the OS detects that any of its physical memory is missing, it panics (a kernel panic) and crashes. The solution? A heist. If you found out there was a priceless diamond in a capital museum, how would you steal it without tripping the alarms? The easiest way is to replace the original with a replica of exactly the same weight. The same goes for pseudo-paging. If the OS won't let us take the memory, we just replace the original data pages with empty pages. This is called the MFN Exchange.

Here's how it works: Before the migration starts, the hypervisor goes to its reserves, gathers a large pile of completely empty physical memory (the sandbags), and temporarily doubles the VM's memory reservation. Next, all the running applications are temporarily frozen so they stop writing new data. The guest OS is then instructed to swap out its memory.
As it does this, the hypervisor intercepts the pointers. It takes the VM's internal addresses (PFNs, pseudo-physical frame numbers) and reconnects them to those empty physical frames (MFNs, machine frame numbers). The VM is satisfied. Meanwhile, the hypervisor takes the real physical frames holding the actual data (the diamonds) and quietly hands them over to Domain 0 to be beamed to the new server.

With the memory safely stolen, the whole VM is suspended for just a few milliseconds and its "brain" is sent over to the target. Once awake on the new server, if the OS hits one of those empty replica pages, it throws a page fault. A third-party software driver (the MemX client) intercepts this fault and immediately pulls the missing data across the network from the old server's Domain 0.

Handle the Free Memory

Transferring a large number of free pages wastes resources and increases the total migration time regardless of the migration algorithm you use. Also, if the migrated VM asks for a brand-new empty page, there will be a page fault and an empty page will be fetched from the source, which wastes time because that page is overwritten as soon as it arrives anyway.

A technique called ballooning is used to resize the memory allocation of a VM. Usually there is a balloon driver in the guest kernel. It can either take free memory from the guest and give it back to the hypervisor (inflate the balloon), or request pages from the hypervisor and return them to the guest (deflate the balloon).

Dynamic Self-Ballooning

That same mechanism is used to avoid transmitting free pages during both pre-copy and post-copy migrations. The VM performs ballooning continuously over its execution lifetime, which is called Dynamic Self-Ballooning (DSB). DSB has three components:

Inflate the balloon - The VM has a kernel-level DSB thread that allocates as much free memory as possible and hands it over to the hypervisor.
Detect memory pressure - Memory pressure means some entity needs to access a free page.
Deflate the balloon - In response to memory pressure the balloon must be partially deflated, i.e. the reverse of inflating. The DSB thread reclaims free memory from the hypervisor and then releases it to the guest kernel.

How to detect memory pressure?

Imagine you're a manager of a r