Class 2
CS 480-008
28 January 2016

On the board
------------

1. Last time
2. DNS
3. Network layer (IP)
4. Transport layer (UDP, TCP)
5. Application layer (anything)
    --sockets API

---------------------------------------------------------------------------

1. Last time

    Intro to course

    Intro to networking unit

    Computer networks are interesting

	--end-points highly programmable, middle kind of boring (only
	kind of).

	    --can program all of the nodes!

	    --extremely easy to innovate and develop new uses of the
	    network (the Web was not designed by computer scientists or
	    network architects! the Web was an application of the
	    network that required zero buy-in from network engineers)

	--contrast: telephone network: end-points ridiculously simple,
	middle has complexity.

	    --worse, can't program most phones, need FCC approval for
	    new devices, no visibility, etc.

    [Aside: if you're interested in this stuff, take classes in
    networking! Or program away! Or read the RFCs (short for "Request
    For Comments"; despite the name, many of them are the Internet's
    standards).
    
    ** Few things are as open and well-documented as the various
    protocols that form, and run over, the Internet.]

    Layered picture
    
        [redraw it]

    Today: cover IP, transport, app

    Going to simplify what is happening below IP

2. DNS: how do names turn into addresses

    DNS = Domain Name System. One of the most successful distributed
    systems in history.

    type:

    $ dig www.cims.nyu.edu


    ask "." (the root) for the name server (NS) for .edu.
        (".edu" is known as a TLD, or top-level domain)

    ask that NS for the NS for .nyu.edu.

    ask that NS for the NS for cims.nyu.edu.

    ask that NS for the address (A record) of www.cims.nyu.edu.
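
    Applications rarely walk this hierarchy themselves. They hand the
    name to the C library's resolver, which gets the walk done on their
    behalf. A minimal sketch using the standard getaddrinfo() call;
    resolve_ipv4 is a hypothetical helper name, and restricting the
    lookup to IPv4 is an assumption made to match these notes:

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve `name` to its first IPv4 address, written as dotted-quad
 * text into `out`. Returns 0 on success, -1 on failure.
 * (resolve_ipv4 is a hypothetical helper, not a library function.) */
int resolve_ipv4(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    struct sockaddr_in *sin;
    const char *p;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, as in these notes */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    if (getaddrinfo(name, NULL, &hints, &res) != 0)
        return -1;

    /* Take the first result and turn its binary address into text. */
    sin = (struct sockaddr_in *)res->ai_addr;
    p = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return p ? 0 : -1;
}
```

    Usage: declare "char ip[INET_ADDRSTRLEN];" and call
    resolve_ipv4("www.cims.nyu.edu", ip, sizeof ip). The address you get
    back depends on what the name servers return at the time.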

    where do names come from?

        ICANN has authority over the name space.
        
        registry holds the names.
        
        registrars assign names within the given domain;
            resulting records go in the registry


3. IP

    Internet Protocol (IP): classic technology
    (took over the world, almost literally)

    --IP used to connect multiple networks

    --Runs over a variety of physical networks

    --Most computers today speak IP

        --We will focus on the classic version of IP: IPv4.

    Fundamentals:

    --Every host has a unique 4-byte (32-bit) IP address

        --for example: 
	    access.cims.nyu.edu is 128.122.49.15
	    linserv.cims.nyu.edu is 128.122.49.125

    (Notice: IPv4 permits 2^{32} addresses. And portions of the space
    are not usable. Not enough for all devices that want to connect!
    Middleboxes, private networks, NAT, etc. help deal with the
    shortage, at the cost of complexity. 
   
    IPv6 also deals with the shortage: IPv6 has 16-byte (128-bit)
    addresses. With 2^{128} addresses, it seems hard to run out. But
    IPv6 is still not deployed everywhere.)
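
    The dotted-quad notation is just a readable spelling of the 32-bit
    number. A small sketch of the conversion, using the standard
    inet_pton() and ntohl() calls; ipv4_to_u32 is a hypothetical helper
    name:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Parse dotted-quad text into a host-byte-order 32-bit integer.
 * Returns 0 on success, -1 if `text` is not a valid IPv4 address.
 * (ipv4_to_u32 is a hypothetical helper name.) */
int ipv4_to_u32(const char *text, uint32_t *out)
{
    struct in_addr a;

    if (inet_pton(AF_INET, text, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);  /* s_addr is stored in network byte order */
    return 0;
}
```

    For 128.122.49.15 this yields 0x807A310F, that is,
    128*2^24 + 122*2^16 + 49*2^8 + 15.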

    --Where do addresses come from?

 	--The top-level assignment is by IANA, which delegates to ARIN
 	(for North America), which assigns to either NYU or NYU's
 	providers.
       
	--For example, NYU gets:

	    128.122.0.0 - 128.122.255.255
	    192.76.177.0 - 192.76.177.255
	    192.86.139.0 - 192.86.139.255
            216.165.0.0 - 216.165.127.255

            see http://whois.arin.net/rest/org/NYU/nets

        --This is a different name space from Domain Names.

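	For example:
	    access.cims.nyu.edu is 128.122.49.15
	    fox.geekny.com is      128.122.140.111
	        (a non-NYU name pointing into NYU's address space)
	could have:
	    foo.cims.nyu.edu being 5.17.35.6
	        (an NYU name pointing outside NYU's address space)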

    --How do packets get where they're going?

        *Forwarding*: router sees a packet with a destination, looks up
        the destination, decides which link to send it out.

            DEMO:

            $ traceroute .....

        *Routing* solves the problem of knowing where all of the hosts
        are attached, and how to reach them

	    --Dijkstra's algorithm, Link state, path vector, etc., etc.

    --Address space structured to make routing practical at global scale
    (because of the hierarchy and aggregation)

        --Result: number of routing entries across the Internet vastly
        smaller than the number of addresses

	    --this was hugely important for scaling. still is, though
	    becoming less so (as memory gets cheaper)
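
    Aggregation works because a router stores prefixes rather than
    individual addresses, and forwards on the longest prefix that
    matches the destination. A rough sketch of that lookup follows. The
    struct, the function names, and the linear scan are all assumptions
    made for illustration; real routers use specialized data structures
    and hardware for this.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding-table entry: a prefix, its length in bits, and the
 * outgoing link. All names here are hypothetical. */
struct route {
    uint32_t prefix;  /* host byte order, e.g. 0x807A0000 = 128.122.0.0 */
    int      len;     /* prefix length in bits, 0..32 */
    int      link;    /* identifier of the outgoing link */
};

/* Build a mask with the top `len` bits set; len == 0 gives 0. */
static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Return the link of the longest matching prefix, or -1 if none match. */
int lookup(const struct route *table, size_t n, uint32_t dst)
{
    int best_len = -1, best_link = -1;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].len);

        if ((dst & mask) == (table[i].prefix & mask) &&
            table[i].len > best_len) {
            best_len = table[i].len;
            best_link = table[i].link;
        }
    }
    return best_link;
}
```

    With one entry for 128.122.0.0/16, a router outside NYU can reach
    all 65,536 addresses in that block. A more specific entry such as
    128.122.49.0/24 overrides it only for the addresses it covers.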


        DEMO:
        
        $ netstat -arn


    --How do hosts get IP addresses? two possibilities:

	--manual configuration
	    --BTW, even edge routers get their addresses configured
	    manually. A third-tier ISP is told: "here's the IP
	    address of the other end of this link."
	    --If you have a cable modem, it does this	

	--DHCP


    --Commercial providers exchange prefixes, using BGP.

        Lots of complexity and considerations

        (technical concerns and business ones interact)

    --Internally, use other protocols: IS-IS, OSPF, etc.


    TRANSITION

    --we do not yet have a way to indicate what application or process
    on the destination computer gets the packet

    --we also don't cleanly handle things like failure, congestion in
    the network, etc.

4. Transport layer


    DRAW PICTURE:

		layer                          role

	TCP    UDP    ICMP("ping")	{flow control, port space}
		    IP			{forwarding}
		Ethernet		{framing}
	    radio  copper_wires  fiber  {signal propagation}
	
    --Onboard:
        * Motivation    
        * TCP vs UDP
        * port space
        * Congestion control

    --Motivation: failure, demultiplexing, flow control, etc.

        Several types of error can affect packet delivery

        --Bit errors (e.g., electrical interference, cosmic rays)

        --Packet loss (packets dropped when queues fill on overload)

        --Link and node failure

        In addition, properly delivered frames can be delayed,
        reordered, even duplicated

    How much should OS (or the networking modules) expose to application?

        --Some failures cannot be masked (e.g., server dead)

        --Others can be (e.g., retransmit lost packet)

        --But masking errors may be wrong for some applications (e.g.,
        old audio packet no longer interesting if too late to play)

    UDP and TCP most popular protocols on IP

    --Both use 16-bit _port_ number as well as 32-bit IP address

    --Applications _bind_ to a port and receive traffic to that port
	(discuss later what the interface is)

    UDP: User Datagram Protocol

    --Exposes packet-switched nature of Internet

    --Sent packets may be dropped, reordered, even duplicated
    (but generally not corrupted). Application's problem to deal
    with these errors
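
    A minimal sketch of UDP's message-at-a-time behavior, run entirely
    over the loopback interface of one machine. udp_two_datagrams is a
    hypothetical name. The sketch assumes loopback delivers both
    datagrams in order, which is typical on a single host but is not
    something UDP promises:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send two datagrams to ourselves and record the size that each
 * recvfrom() call returns. Returns 0 on success, -1 on setup failure.
 * (udp_two_datagrams is a hypothetical name.) */
int udp_two_datagrams(ssize_t *n1, ssize_t *n2)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char buf[64];
    int rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);  /* receiver */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);  /* sender */

    if (rx < 0 || tx < 0)
        goto done;

    /* Bind the receiver to 127.0.0.1 and let the OS choose the port. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* No connect(): each sendto() names its destination explicitly. */
    if (sendto(tx, "hello", 5, 0, (struct sockaddr *)&addr, sizeof addr) != 5 ||
        sendto(tx, "world!", 6, 0, (struct sockaddr *)&addr, sizeof addr) != 6)
        goto done;

    /* Each recvfrom() returns exactly one datagram, never a blend. */
    *n1 = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    *n2 = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);
    rc = 0;
done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return rc;
}
```

    Over a real network either datagram could be lost, in which case the
    matching recvfrom() would block; a real application needs timeouts
    and its own recovery.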

    TCP: Transmission Control Protocol

    --Provides illusion of a reliable "pipe" between two
      processes on two different machines

    --Masks lost and reordered packets so apps don't have to worry

    --Handles congestion and flow control

    Uses of TCP

    --Most applications use TCP

    --Easier interface to program to (reliability)

    --Automatically avoids congestion (don't need to worry about
      taking down network)

    --Example: Interacting with www.cs.nyu.edu
	--Browser resolves IP address of www.cs.nyu.edu 
	--Browser connects to TCP port 80 on that IP address
	--Over TCP connection, browser requests and gets home page
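
    The three browser steps above can be sketched as follows. This is a
    bare-bones illustration, not how a real browser works: format_get
    and fetch_home_page are hypothetical names, the request is a minimal
    HTTP/1.0 GET, and many sites today answer port 80 with a redirect
    to HTTPS.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a minimal HTTP/1.0 GET request for `path` on `host`.
 * Returns the request length, or -1 if it does not fit in `buf`. */
int format_get(char *buf, size_t buflen, const char *host, const char *path)
{
    int n = snprintf(buf, buflen, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Resolve `host`, connect to TCP port 80, send a GET for "/", and copy
 * the reply to stdout. Returns 0 on success, -1 on any failure. */
int fetch_home_page(const char *host)
{
    struct addrinfo hints, *res, *ai;
    char buf[4096];
    int fd = -1, len;
    ssize_t n;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) != 0)  /* step 1: resolve */
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {   /* step 2: connect */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    len = format_get(buf, sizeof buf, host, "/");    /* step 3: request */
    if (len < 0 || send(fd, buf, (size_t)len, 0) != len) {
        close(fd);
        return -1;
    }
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)   /* read until close */
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd);
    return n == 0 ? 0 : -1;
}
```

    Usage: fetch_home_page("www.cs.nyu.edu") prints whatever the server
    sends back, headers included.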

5. Application layer

    Servers typically listen on well-known ports
	SSH: 22
	Email (SMTP): 25
	Finger: 79
	Web / HTTP: 80

    What is the interface to the networking stack?

     --Application programmer classically sees *sockets*. 

	(Inspired by pipes )

	Write data on one machine, read it on another

	*sockets* can represent many different network protocols, but:

	--classically an interface to TCP/IP and UDP/IP
	--sometimes an interface to IP or Ethernet (raw sockets)

	--sockets API:

    DEMO

	/* senders and receivers */
	int sockfd = socket(AF_INET, SOCK_STREAM /* or SOCK_DGRAM */, 0);
	    [note: with AF_INET in the first position, the setting of
	    SOCK_STREAM vs SOCK_DGRAM controls whether the app's data is
	    going to go over TCP or UDP].
	    
	    [with UDP sockets, send atomic messages that may be
	    reordered or lost]

	    [with TCP sockets, bytes written on one end are read on the
	    other, provided no failures. but there is no guarantee that a
	    read will return the full amount requested ... or that the
	    data will arrive in chunks matching the sender's calls to
	    send(). With TCP, you *must* sit there in a loop and keep
	    reading. You know you're done because either (a) the
	    application-level protocol defines where messages begin and
	    end or (b) the other end closed the connection, in which
	    case recv() returns 0]

	int rc = close(sockfd);
	select(); /* for asynchronous network I/O;
	             we won't use this in lab1 */

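	struct sockaddr_in {
	    sa_family_t    sin_family;  /* AF_INET */
	    in_port_t      sin_port;    /* 16-bit port, network byte order */
	    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte order */
	    char           sin_zero[8]; /* padding */
	};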

	/* senders */
	int rc = connect(sockfd, (struct sockaddr *)&addr, addrlen);
	int rc = send(sockfd, buf, len, 0);
	int rc = sendto(sockfd, buf, len, 0, (struct sockaddr *)&addr, addrlen);

	/* receivers */
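	int rc = bind(sockfd, (struct sockaddr *)&addr, addrlen);
	int rc = listen(sockfd, backlog_len);
	int connfd = accept(sockfd, (struct sockaddr *)&addr, &addrlen);
	    /* TCP: accept() returns a new socket for this one connection */
	int rc = recv(connfd, buf, len, 0);
	int rc = recvfrom(sockfd, buf, len, 0, (struct sockaddr *)&addr, &addrlen);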
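
	The "sit there in a loop" advice for TCP usually ends up wrapped
	in a pair of helpers like the following. This is one common way
	to write them, not the only one; recv_full and send_all are
	hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling recv() until `want` bytes have arrived or the peer
 * closes the connection. Returns the number of bytes read (less than
 * `want` only if the peer closed early), or -1 on error. */
ssize_t recv_full(int fd, void *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = recv(fd, (char *)buf + got, want - got, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Keep calling send() until all `len` bytes have been handed to the
 * OS. Returns 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, (const char *)buf + sent, len - sent, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```

	Two 3-byte sends followed by recv_full(fd, buf, 6) yield all 6
	bytes, however the stream was split along the way. Once the
	sender has closed, recv_full() returns 0.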


	NOTES:

	* connections are named by 5 components:

	    protocol (TCP), local IP address, local port, remote IP
	    address, remote port

	* UDP does not require connected sockets

	* OS tracks all of this state in a PCB (protocol control block).
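
	To see the naming components for yourself, ask the OS with
	getsockname() and getpeername(). The sketch below opens a TCP
	connection from a process to itself over loopback and records
	both ends' views. struct conn_name, name_of, and
	tcp_loopback_names are hypothetical names, and connecting to
	yourself in a single thread is a shortcut for illustration only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* The two endpoints of a connection as seen from one socket.
 * (All names in this sketch are hypothetical.) */
struct conn_name {
    struct sockaddr_in local;
    struct sockaddr_in remote;
};

/* Ask the OS for the local and remote (IP, port) of a connected socket. */
static int name_of(int fd, struct conn_name *cn)
{
    socklen_t len = sizeof cn->local;

    if (getsockname(fd, (struct sockaddr *)&cn->local, &len) < 0)
        return -1;
    len = sizeof cn->remote;
    return getpeername(fd, (struct sockaddr *)&cn->remote, &len);
}

/* Connect to ourselves over loopback and report how each end names
 * the connection. Returns 0 on success, -1 on failure. */
int tcp_loopback_names(struct conn_name *client, struct conn_name *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int rc = -1, sfd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);  /* listening socket */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);  /* client socket */

    if (lfd < 0 || cfd < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                   /* OS picks the port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* The kernel completes the handshake for a listening socket, so a
     * blocking connect() can return before we call accept(). */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;

    if (name_of(cfd, client) == 0 && name_of(sfd, server) == 0)
        rc = 0;
done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```

	The client's (local, remote) pair is the server's (remote,
	local) pair: same four values, mirrored. Together with the
	protocol, they make up the five components.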


-------------------------------------------------------------------

Reference: poke around with some tools:

        --"ifconfig -a" (Unix)
        --"netstat -arn" (Unix)
        --"dig [hostname]" (Unix)
        --"dig -x [IP address]" (Unix)
        --"ipconfig /all" (Windows)
        --"route print" (Windows)
        --"arp -a" (Unix)