
University College London, Department of Computer Science
COMP0023 Networked Systems
Individual Coursework 1: The Domain Name System
Due: 4th November 2020

By mapping names to IP addresses, the Domain Name System (DNS) plays a critical role in enabling end users to access services all over the Internet. For this coursework, assume that some UK universities have decided to run a DNS hierarchy (from the root to authoritative name servers) on their own, as part of a larger experiment aimed at reproducing the dynamics of the Internet in a controlled environment (note that this scenario is, to the best of our knowledge, entirely fictitious). As soon as you become aware of the project, you enthusiastically volunteer to join it, eager to experiment with the concepts you are studying in COMP0023. Soon after you volunteer, you are granted access to the machines prepared to host the university DNS infrastructure. You are then given the task of implementing name servers (NSes) that support specific name resolution patterns decided by your new team leader.

Getting Started

Before your code is deployed in production, you are asked to demonstrate the correctness and robustness of your implementation in a virtual environment – a common practice, also used in industry, to limit the number of bugs and misbehaviours experienced by actual users. To complete this coursework, you will therefore have to download a VirtualBox Virtual Machine (VM) and run an emulated network in it. In the following, we also refer to such an emulated network as a lab. A link to the VM you will have to use for this coursework is available at the following web page: http://www0.cs.ucl.ac.uk/staff/S.Vissicchio/comp0023/cw.html

The VM includes all the software needed to complete your coursework. In fact, you are strongly advised to install no additional software in the VM. This is in your own interest: our marking scripts will test your submission inside a new copy of the VM for this coursework, and you will get zero marks if our scripts fail because your code relies on additional software (e.g., tools or libraries) you customised your local copy of the VM with.

Our original VM comes with no graphical interface. In other words, when you connect to the running VM, all you will see is a command line interface (CLI). This is a deliberate choice of ours, motivated by the fact that a command line is really all you need (for this coursework, at least). Avoiding graphical interfaces additionally lets you gain first-hand experience of the convenience and expressiveness of the Linux CLI (in case you haven't had the chance to appreciate it yet), while also limiting the hardware requirements for running the VM. If you feel you absolutely need a graphical interface to be productive, you can of course install one. In that case, however, do remember to test your coursework's solution inside a copy of our original VM.

Downloading your lab

After installing the VM, the first step is to download and install your lab for this coursework. To do so, log in to the VM specifying vagrant as both username and password. Then, simply run the installer provided in the VM: position yourself in the directory /home/vagrant/comp0023_platform, and execute the following command.

vagrant@vm:~$ cd comp0023_platform
vagrant@vm:~/comp0023_platform$ bash install_lab.sh <student_number>

where the argument of the script is your own Portico student number (as it appears on Moodle).
Starting and stopping an emulated network

Now you are ready to start a new network emulation, that is, to spawn virtual hosts and routers inside your machine. To do so, position yourself in the /home/vagrant/comp0023_platform directory, and execute the startup.sh script from there, as follows:

vagrant@vm:~$ cd /home/vagrant/comp0023_platform
vagrant@vm:~/comp0023_platform$ sudo bash startup.sh

where sudo is needed to run the startup script as a privileged user inside the VM.

Important: the first instructions run by the startup.sh script destroy all the emulated hosts and routers running in the VM at the moment the startup script is executed. So, when executing the above command, always be sure that no data, files or scripts have to be saved from the emulated network devices before they are torn down: you would lose them otherwise!

Also, note that the images for the emulated network devices will be downloaded the first time you start a lab. This download can take quite some time, depending on the speed of your Internet connection. It is however done only once, at the very first execution of startup.sh.

To stop a network emulation without starting a new one, use the cleanup script, specifying the directory /home/vagrant/comp0023_platform as argument – or "." if you are already positioned in that directory.

vagrant@vm:~/comp0023_platform$ sudo bash cleanup/cleanup.sh .

Interacting with an emulated network

To perform network emulations, we rely on Docker. Roughly speaking, Docker enables software to run in (partially) isolated environments, called containers, that can be thought of as lightweight Virtual Machines. In this coursework, we emulate both network intermediate systems (i.e., routers) and hosts (i.e., machines running DNS name servers) as Docker containers. You can retrieve diagnostic information about Docker containers and interact with them through the docker CLI command. Although a full tutorial on the docker command line interface is beyond the scope of this document, we report hereafter a few commands that we expect to be useful for you. We also encourage you to check the docker documentation, e.g., by typing man docker or man docker-ps on the VM CLI, and similar man-page calls for other docker sub-commands.

Once an emulated network is started with the startup.sh command described in the previous section, you can obtain a list of running containers by typing the following command.

vagrant@vm:~$ sudo docker ps | grep "host"

In particular, the above command will print information about the Docker containers emulating hosts. For this coursework, you will have to work on hosts only, that is, the containers whose names terminate with the "host" substring – so please ignore the containers emulating routers that you may notice if you execute sudo docker ps without the grep instruction.

To log in to a container (e.g., to access the container running a name server), you can use docker exec:

vagrant@vm:~$ sudo docker exec -it <container_name> bash

The above command will launch a (bash) command line interface on the given container. Hence, if you specified 1_R1host as container name, you will be presented with the following prompt:

root@1_R1host:/#

indicating that successive commands you type will be executed inside the container named 1_R1host. Once logged in to a container, you are conceptually working in the machine emulated by the container: each container has its own filesystem and runs its own processes.
Try, for example, typing ls in the command line to check which files are locally available.

You may find it useful to retrieve information about the network interfaces of the hosts emulated by containers. Use ifconfig to do so:

root@1_R1host:/# ifconfig

The output of ifconfig will specify the number, type and various parameters of each network interface of the emulated host. Among them, the IP address of each interface is reported next to the keyword inet. Important: ignore the interface called lo, which is a "virtual" interface, not corresponding to any physical link.

You may want to develop code outside containers, and then transfer files (e.g., source code) to them; or, vice versa, you may want to perform some debugging inside a container and then analyse the collected information outside it. You can transfer files between the VM and the containers running in it through docker cp. In particular, the following command will copy a file from the VM to a container:

vagrant@vm:~$ sudo docker cp <src_file> <container_name>:<dst_path>

You can use docker cp to copy a file from a container to the VirtualBox VM, too:

vagrant@vm:~$ sudo docker cp <container_name>:<src_file> <dst_path>
DNS development and debugging utilities

As a starting point for your work, we provide an implementation of the root name server for the portion of the fictitious university-hosted DNS hierarchy you have to implement. After starting your emulated network, you will find the Python 3 implementation of such a root name server up and running on the container 1_R2host. In the same container, you will also find the source code of this implementation: check the /home/root-ns.py file inside 1_R2host.

The source code for the root name server exemplifies the usage of a few building blocks you will likely need for the implementation of your name servers. Prominently, root-ns.py shows how to programmatically send and receive UDP packets through a network interface, using so-called sockets in Python. It additionally illustrates how to use the dnslib Python library for parsing received DNS packets, and packing replies to them. Although we believe that the code is mostly self-explanatory, don't hesitate to consult the official Python documentation on sockets (see https://docs.python.org/3.5/library/socket.html) and on the dnslib library (see https://pypi.org/project/dnslib/) if you need additional information.

Of course, you can copy the root-ns.py file (e.g., using docker cp) to other hosts, and use it as a basis for a name server responsible for DNS zones other than ".". Since the source file is in Python, running the software on a container named hostX boils down to connecting to that container, and executing the script from its CLI:

root@hostX:/home# python3 root-ns.py -a X.Y.Z.W

where the string X.Y.Z.W following the -a option is the IP address at which you would like other hosts to contact the name server. Once the above command is executed, other hosts can therefore send packets to the process running root-ns.py by specifying X.Y.Z.W as destination IP address. For example, you can log in to another emulated host, say 1_R1host, and send a DNS query to the root name server using the dig tool:

root@1_R1host:/home# dig @1.102.0.1 www.example.com.

The above dig command will print the records returned by the name server at the IP address 1.102.0.1 when that NS is asked about the input name www.example.com. You can further inspect the content of the query packet received by the root name server, as well as its response, by adding print(..) statements inside the root-ns.py file on 1_R2host.

In addition to dig, you can also debug your DNS hierarchy with the tracedns script located in the home directory of every emulated host of a running lab. This script tries to resolve a name provided as argument the same way a local name server with an empty cache would. For instance, the following command would trigger iterative DNS queries to the name servers in the emulated DNS hierarchy, aimed at resolving the input name, www.example.com.

root@1_R1host:/# python3 home/tracedns.py www.example.com.

More precisely, the above command will send a first DNS query for www.example.com. to the hard-coded IP address of the root NS (i.e., 1.102.0.1). If any NS record is provided in the reply, tracedns will then send a new query for the same name to the returned authoritative name server, and it will continue to traverse the DNS hierarchy until either it resolves the initial name, or it cannot proceed further (e.g., because of an unresponsive name server). We provide more details on the tracedns output in the following section, since such output is also used to specify the name resolution patterns your NS implementation should support.
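For orientation, here is a minimal sketch combining the two building blocks just mentioned – a UDP socket bound to port 53, and dnslib's parsing and packing routines – into a name server main loop. This is not the provided root-ns.py: the record table, names and addresses below are invented for illustration, and a real NS for this coursework must serve the records implied by its trace files instead.

import argparse
import socket

from dnslib import QTYPE, RR, DNSRecord

# Hypothetical record store, for illustration only.
RECORDS = {
    ("www.example.com.", QTYPE.A): RR.fromZone("www.example.com. 300 IN A 10.0.0.1"),
}

def serve(ip):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
    sock.bind((ip, 53))                     # 53 is the port reserved for DNS
    while True:
        data, client = sock.recvfrom(4096)  # one DNS query per UDP datagram
        query = DNSRecord.parse(data)       # dnslib parses the DNS wire format
        reply = query.reply()               # copies the query id and question
        key = (str(query.q.qname), query.q.qtype)
        for rr in RECORDS.get(key, []):     # empty ANSWER section if unknown
            reply.add_answer(rr)
        sock.sendto(reply.pack(), client)   # pack() serialises back to bytes

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", required=True, help="IP address to listen on")
    serve(parser.parse_args().a)

Binding to port 53 requires root privileges; inside the containers you run as root, so this is not an obstacle.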
Of course, you are also allowed to build your own additional debugging tools. For example, you could implement a local name server answering recursive queries. We stress, however, that no custom tool beyond dig and tracedns is needed to complete this coursework.

Stage 1: Supporting Name Resolution Patterns

Your primary objective in this coursework is to support the resolution of names according to the unmodified DNS protocol. This means that you will have to implement support for the very same interactions that happen in the Internet when a name is mapped to an IP address. Yet, you will have to do so for specific names and IP addresses, and supporting pre-defined resolution patterns.

Input: All the information you need to complete Stage 1 is provided in the text files inside the stage1 directory (created after the installation of your lab). Those files store the output that tracedns should return for specific names, once your name servers are implemented. The content of those files is similar to the following, although names, IP addresses and TTL values will be different:

trace recursive DNS query to resolve www.example.com (10.58.210.228)
1 root-server.net. [10.0.0.1]
2 tld.net. [10.41.162.30]
3 ns2.example.com. [10.239.34.10]

final reply
;; ->>HEADER<<-
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 4

;; QUESTION SECTION:
;www.example.com. IN A

;; ANSWER SECTION:
www.example.com. 299 IN A 10.58.210.228

;; AUTHORITY SECTION:
example.com. 172800 IN NS ns2.example.com.

;; ADDITIONAL SECTION:
ns2.example.com. 172800 IN A 10.239.34.10

The content of each resolution pattern's file encompasses two main parts. The first part shows the trace of the query, i.e., the sequence of name servers contacted during a recursive query to resolve a given name. The second part of the file details the final answer to the query, in a format very close to that of the dig command's output.

A couple of observations are worth making. Each trace you are provided with lists the name and IP address of the name servers that are sequentially queried by a local name server with an empty cache when resolving the name specified in the first line of the file. Thus, the first name server in the list is always the DNS root, and the last one is the name server returning the final answer to the query. The first four lines in the above example trace therefore indicate that to resolve the name www.example.com, a local NS with an empty cache would first query root-server.net, then tld.net and finally ns2.example.com. In analysing the traces in the stage1 directory, assume that all NSes in the trace provided an answer when queried, and that no DNS packet was lost when the files were generated.

Task: implement a hierarchy of name servers supporting the input resolution patterns. In other words, for each trace file t in the stage1 directory, running tracedns with the name in t as parameter on any host in the emulated lab should produce the same output as in t. [70 marks]

• Deliverable: a zip file cw1-task1.zip. The zip file must include one file per NS, with your Python implementation of that NS, and no other files (any file additional to the Python implementations of NSes will anyway be ignored). Each Python file inside the zip archive must be named <container_name>.py, where <container_name> is the name of the Docker container where the Python file has to be copied and run. Important: you will lose marks if your submission does not follow the above format.

• Constraints: Each name server must be implemented in Python 3, and it must be possible to run it in the same way root-ns.py is run on 1_R2host when a new lab is started.
More precisely, given a script hostX.py, it must be possible to run the script by copying it to the container named hostX, logging in to that container, and executing the following command from the directory where the script is copied:

root@hostX:/# python3 hostX.py -a X.Y.Z.W

where X.Y.Z.W is the IP address of hostX's network interface other than lo. The above command should launch a process that answers DNS queries sent to port 53 (i.e., the port reserved for DNS) at the given IP address X.Y.Z.W.

To implement NSes, you must not use any library other than the ones already provided inside the Docker containers emulating hosts, that is, those included in the installed Python distribution plus dnslib. More generally, you should not install additional software on any Docker container.

• Suggestions: Although this task requires the implementation of several name servers, we stress that the NSes to be implemented share the vast majority of their functionality (e.g., parsing received messages, processing them, sending replies, and so on) between them – and with the root name server. The most important thing that changes across different name servers is the DNS zones for which they are authoritative, and hence the DNS records they store and return in their replies. In the implementation of your NSes, you may therefore want to structure your code to isolate the functions shared across NSes (a sketch of one such structure follows the marking scheme below). Important: irrespective of how you develop your code, remember to submit only one Python script per NS; you will lose marks otherwise!

• Marking scheme: We will assign marks for each trace that can be reproduced after running your NSes' implementations. We will assign additional marks for non-disclosed queries that are consistent with the traces you are given. Those non-disclosed queries are meant to check that your implementation does not over-fit the input traces. More schematically:

– support for trace01: 10 marks
– support for trace02: 10 marks
– support for trace03: 10 marks
– support for trace04: 10 marks
– support for trace05: 15 marks
– support for additional queries: 15 marks
– [bonus] support for trace06: 10 marks

The additional queries can test the behaviour of your NS implementations when queried for: (i) zones in your portion of the DNS hierarchy, (ii) names of other name servers in your portion of the DNS hierarchy, and (iii) names with no A record. For each of those additional queries we will check that your name servers answer as production NSes deployed in the Internet would.

Note that you are actually given six traces. We believe that the sixth trace presents exceptional challenges. Your implementation of the DNS hierarchy is therefore not required to support trace06: you can still get the maximum number of marks (i.e., 100) irrespective of whether the sixth trace is supported or not. Yet, you may want to accept the challenge of supporting this last trace too: if you succeed, you will get ten bonus marks – in addition to glory and fame! Those marks will not enable you to exceed 100 marks, but they can compensate for marks you may lose on other parts of the coursework. Evaluate the pros and cons of supporting trace06 carefully!
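As a concrete illustration of the suggestion above about isolating shared functionality, the sketch below shows one possible shape for a reply-building helper that all your per-host scripts could share; only the two dictionaries would differ across NSes. The design, function name and records are our own invention, not a required structure:

from dnslib import RR

def build_reply(query, answers, delegations):
    """Answer authoritatively if we store records for the queried name;
    otherwise return a referral for the matching child zone."""
    reply = query.reply()
    qname = str(query.q.qname)
    if qname in answers:
        for rr in answers[qname]:
            reply.add_answer(rr)
    else:
        for zone, (ns_rrs, glue_rrs) in delegations.items():
            if qname.endswith(zone):
                for rr in ns_rrs:
                    reply.add_auth(rr)  # NS records go in the AUTHORITY section
                for rr in glue_rrs:
                    reply.add_ar(rr)    # glue A records go in the ADDITIONAL section
                break
    return reply

# Hypothetical data for an NS authoritative for example.com.:
ANSWERS = {"www.example.com.": RR.fromZone("www.example.com. 300 IN A 10.0.0.1")}
DELEGATIONS = {"sub.example.com.": (
    RR.fromZone("sub.example.com. 172800 IN NS ns.sub.example.com."),
    RR.fromZone("ns.sub.example.com. 172800 IN A 10.0.0.2"))}

With a structure like this, each hostX.py reduces to the socket loop sketched earlier plus the dictionaries describing the zones that particular host is responsible for.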
To check that a trace is correctly implemented, our marking scripts will download a copy of the VM and of the lab you are initially provided with, copy your submitted zip file into the VM, copy and run each Python script on the container indicated by the script's name, and then launch tracedns for each of the names in your resolution pattern files. Our scripts will automatically compute your mark by comparing the output of such tracedns commands with the content of the corresponding resolution pattern files.

Stage 2: Avoiding Overloads

Suppose now that while operating the hierarchy, university researchers note that some name servers receive too many requests for their resources. Because of one of those overloads, a name server in the portion of the hierarchy you implemented in Stage 1 is observed to fail at times. You are then asked to deploy on 1_R1host a replica of the failing NS.

Input: You are provided with a minimalistic workload under which the non-root NS receiving the highest number of requests is likely to fail: your task is to identify that name server, and replicate it on 1_R1host. The workload is specified in the stage2/workload.txt file, which has the following format:

LNS 1: ...
LNS 2: ...

The file reports queries sent to your portion of the DNS hierarchy by two different local name servers (LNSes). For each LNS, the workload file lists the names that the LNS queried when the NS you will have to replicate failed.

In computing the NS which received the highest number of queries under the provided workload, assume that no packet was lost during any query, and that every LNS started with an empty cache, resolved the names in workload.txt one after the other (i.e., no parallelisation), and did not discard any resource record because of TTL expiration.

Important: running tracedns on the queried names, one after the other, and counting the queries generated this way does not enable you to determine the NS you should replicate. Before starting Stage 2, be sure you understand why; the toy example below may help.
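The reason is caching. The following toy model (with an invented delegation chain, not your hierarchy) shows an LNS resolving names one after the other while caching delegations: once it has learned which NS serves example.com., later names under that zone never touch the root or TLD servers, whereas independent tracedns runs start from the root every time.

DELEGATIONS = {          # zone suffix -> NS authoritative for that zone
    ".": "root-server.net.",
    "com.": "tld.net.",
    "example.com.": "ns2.example.com.",
}

def resolve(name, cache, counts):
    zones = sorted((z for z in DELEGATIONS if name.endswith(z)), key=len)
    # Resume from the deepest zone whose delegation is already cached.
    start = max((i for i, z in enumerate(zones) if z in cache), default=0)
    for zone in zones[start:]:
        ns = DELEGATIONS[zone]
        counts[ns] = counts.get(ns, 0) + 1  # one query sent to this NS
        cache.add(zone)                     # its delegation is now cached

counts, cache = {}, set()
for name in ["www.example.com.", "ftp.example.com."]:
    resolve(name, cache, counts)
print(counts)  # {'root-server.net.': 1, 'tld.net.': 1, 'ns2.example.com.': 2}

Two separate tracedns runs would instead query the root and TLD servers twice each, so the per-NS totals differ.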
Task: Reduce the load of the non-root NS which receives the maximum number of queries when two LNSes with an initially empty cache resolve the names in workload.txt. You are allowed to change the implementation of any name server implemented in Stage 1 (including the root NS we provide), and to replicate exactly one NS on the container named 1_R1host. For the replicated NS, the failure of either of the two replicas should not reduce the number of names that can be resolved by an external LNS. [30 marks]

• Deliverable: a text file named cw1-task2.txt, and a zip file cw1-task2.zip. The text file must include a single line with the name of the Python file to be deployed on the container 1_R1host. For example, if you intend to replicate the name server running on 1_R100host (this will obviously not be your case, since there are far fewer than one hundred hosts in your lab), you should submit a cw1-task2.txt file with the following line only:

1_R100host.py

The cw1-task2.zip archive must exclusively include one file per NS: each of those files must be the Python implementation of an NS. Each Python file must be named <container_name>.py, where <container_name> is the name of the Docker container where the Python file has to be copied and run. Do not include in the zip file any NS implementation for the replica to be run on 1_R1host – we will run the script indicated in cw1-task2.txt on 1_R1host. Important: you will lose marks if your submission does not follow the above format.

• Constraints: the implementation of all NSes must be in Python 3. As for Stage 1, it must be possible to run every submitted Python file from the CLI, using a command equivalent to the one used to start the root NS. You must not change the mapping implied by Stage 1 between each NS and the zones for which that NS is authoritative. You must not install new software on any Docker container. Additionally, you must not change the IP address at which existing NSes run – doing so would anyway provide no benefit in the context of this coursework. Important: the NS replica deployed on 1_R1host must appear in the traces with a name different from that of any other NS in the original lab.

• Assumptions: Your input workload may include names that are not mentioned in Stage 1: assume that no name server in the DNS hierarchy stores an A record for any of those names. If multiple NSes are specified in the authority section of a DNS reply, all LNSes always try to contact them in order: they send a query to the first listed NS; if they get no response, they try the second one, and so on. In computing the load for each NS, assume that: (i) no packet is lost during any of those queries, and (ii) the implementations of LNSes and NSes are bug-free.

• Marking scheme: We will assign marks as follows.

– correct selection of the NS to replicate, i.e., cw1-task2.txt specifies the name of the NS receiving the highest number of queries under the input workload: 10 marks
– reduction of the number of queries for the replicated NS under the input workload: 10 marks
– robustness to one replica failure: 10 marks

Your implementation will be automatically checked by marking scripts. Our scripts first deploy the Python files on the Docker containers indicated by the corresponding file names. They then check how many queries are served by every NS. Load-reduction checks will be repeated five times, each time simulating LNSes restarting with an empty cache: your replica implementation will therefore pass our tests if it reduces the load of the NS that was most loaded before replication, either in a single execution of the given workload or across up to five executions. A sketch of the referral mechanics relevant to replica failover follows.
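To hint at those mechanics (this is not a complete solution), recall from the assumptions above that LNSes contact the NSes listed in a referral in order. A parent zone can therefore advertise both replicas of a child zone, so that an unresponsive first replica is transparently backed up by the second. All names and addresses below are invented:

from dnslib import RR

def referral_with_replicas(query):
    """A referral listing two replicas of the same (hypothetical) child
    zone: an LNS that gets no response from ns-a falls back to ns-b."""
    reply = query.reply()
    for rr in RR.fromZone(
            "child.example.com. 172800 IN NS ns-a.child.example.com.\n"
            "child.example.com. 172800 IN NS ns-b.child.example.com."):
        reply.add_auth(rr)
    for rr in RR.fromZone(
            "ns-a.child.example.com. 172800 IN A 10.0.0.10\n"
            "ns-b.child.example.com. 172800 IN A 10.0.0.11"):
        reply.add_ar(rr)
    return reply

Since LNSes try the listed NSes in order, the order in which the replicas appear also determines which replica absorbs queries under normal operation – a point worth considering for the load-reduction requirement.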
Academic Honesty

You are permitted to discuss the lectures' and assigned readings' content about the Domain Name System with your classmates, but you are not permitted to share details of the assignment, show your code (in whole or in part) to any other student, or contribute any lines of code to any other student's solution. All code that you submit must be written entirely by you alone. We use sophisticated copying-detection software that exhaustively compares code submitted by all students from this year's class and past years' classes, and produces color-coded copies of students' submissions, showing exactly which parts of pairs of submissions are highly similar. Do not copy code from anyone, either in the current year or from a past year of the class. You will be caught, just as students have been caught in years past. Copying of code from student to student is a serious infraction; it will result in the automatic awarding of zero marks to all students involved, and is viewed by the UCL administration as cheating under the regulations concerning Examination Irregularities (normally resulting in exclusion from all further examinations at UCL). You have been warned!

Questions and Piazza Site

If you have questions about the coursework, please don't hesitate to visit us during office hours, or to ask questions on Piazza. When asking questions on Piazza, please be careful to mark your question as private if it reveals details of your solution. Questions that don't reveal details of your solution, such as those about how to interpret the coursework text or lecture material, should be left public, though, so that everyone in the class may benefit from seeing the answers. As always, please monitor the Piazza site of the course. Any announcements (e.g., helpful tips on how to work around unexpected problems encountered by others) will be posted there.

Credits

The support for network emulation used in this coursework is derived from the mini-Internet project developed by the ETH network research group: T. Holterbach, T. Bühler, T. Rellstab, and L. Vanbever, "An Open Platform to Teach How the Internet Practically Works," SIGCOMM Comput. Commun. Rev., 2020.
