Class 17 (make-up, video-taped)
CS 480-008
1 April 2016

On the board
------------

1. Last time
2. Public key crypto concepts, continued
    Public key encryption 
    Textbook RSA
    Digital signatures
3. Web security intro

---------------------------------------------------------------------------

1. Last time

    --defending against untrusted OSes: Haven

    --public key crypto intro

2. Public key crypto concepts, continued

    We're looking at a few primitives:
        
        [last time] key exchange
        public-key encryption
        digital signatures


    Public key encryption

        Each party has a public-key pair: (pk, sk)
        pk is public key; broadcast to the world
        sk is secret key; guard it!

        Setup: two algorithms: Enc, Dec (and a key generation
        algorithm, which we will mostly neglect)

        Interface:
            
            (pk,sk)    <-- Gen(security_parameter)
            ciphertext <-- Enc(public key, message)
            plaintext  <-- Dec(secret key, ciphertext)

            Want/need:

              **  Dec(sk, Enc(pk, m)) = m

              **  Without sk, eavesdropping adversary (who sees
                  ciphertext) cannot guess message m

        NOTICE: if the encryption algorithm is deterministic, the
        scheme is insecure (why???)
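
        One way to see the problem, as a minimal sketch: pk is public,
        so the eavesdropper can run Enc too. If Enc is deterministic and
        the set of likely messages is small, the eavesdropper encrypts
        each candidate and compares against the observed ciphertext. The
        toy enc() below (modular exponentiation with made-up small
        numbers) and the candidate messages are assumptions for
        illustration, not a real scheme.

```python
# Hypothetical public key with toy-sized numbers; insecure by design.
N, e = 3233, 17

def enc(m):
    """Deterministic 'encryption': same message always gives same ciphertext."""
    return pow(m, e, N)

def guess_plaintext(ciphertext, candidates):
    """Eavesdropper's attack: re-encrypt each candidate under pk and compare."""
    for m in candidates:
        if enc(m) == ciphertext:
            return m
    return None

# Suppose the sender only ever sends 42 ("attack") or 99 ("retreat").
observed = enc(42)
recovered = guess_plaintext(observed, [42, 99])
print("eavesdropper recovered:", recovered)
```

        The attack never touches sk. A randomized Enc would produce a
        different ciphertext on each call, so the comparison would fail.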

    RSA

        Three starting facts/definitions from group theory:

        (1) Two integers x,y are _relatively prime_ if they have no
        common factors besides 1. This is often written as gcd(x,y)=1,
        meaning that the greatest common divisor is 1. 

            Ex: 6,10 are not relatively prime (they have a common factor
            of 2)

            Ex: 8,9 are relatively prime because gcd(8,9)=1.

        (2) Any integer N>1 induces a multiplicative group (the
        operation is multiplication mod N) consisting of all integers
        in {1,...,N-1} that are relatively prime to N. In notation:

            ZN* = { a \in {1,...,N-1} | gcd(a, N) = 1}

            The number of elements in ZN* is written phi(N), known as
            "Euler's phi function."

            In other words, phi(N) is the number of positive integers
            less than N that are relatively prime to N.

        (3) For all a in ZN*,
                a^{phi(N)} = 1 mod N

            This is because for any finite group G and any element g
            in G, g^{|G|} = e, where e is the group identity (1 in the
            case above), and that follows from basic group theory
            (it is a consequence of Lagrange's theorem).
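
        Facts (2) and (3) can be checked by brute force for a small N.
        The sketch below is only a sanity check, not an efficient way to
        compute phi; the helper names zn_star and phi and the choice
        N = 15 are ours.

```python
from math import gcd

def zn_star(n):
    """Elements of ZN*: integers in {1,...,n-1} relatively prime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def phi(n):
    """Euler's phi function, computed by counting the elements of ZN*."""
    return len(zn_star(n))

N = 15  # = 3 * 5
group = zn_star(N)
print(group)    # [1, 2, 4, 7, 8, 11, 13, 14]
print(phi(N))   # 8

# Fact (3): a^phi(N) = 1 mod N for every a in ZN*.
assert all(pow(a, phi(N), N) == 1 for a in group)
```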

    "Textbook RSA"
    
        Here is an INSECURE variant of RSA, the so-called "Textbook
        RSA." Do not use this in your systems!

        Key generation:
            -- generate N = p q , for two primes p, q
              [fact: for N of this form, phi(N) = (p-1)(q-1)]
            -- identify e such that e and phi(N) are relatively prime.
            -- identify d such that 
                d*e = 1 mod phi(N)
            -- return N, e, d

            -- public key: (N,e); private key: (N,d)
    
        Encrypt(pubkey (N,e), msg m):

            regard m as an element in the group ZN*.
                
            c = m^e mod N

        Decrypt(secretkey (N, d), ciphertext c) 

            m = c^d mod N


        This scheme is insecure, so we won't and can't show the
        second "want/need" above. But we can show the first:

         (m^e)^d mod N = m^(ed) mod N
                       = m^(k*phi(N) + 1) mod N,  [b/c ed=1 mod phi(N),
                                                   for some integer k]
                       = (m^phi(N))^k * m mod N,  [rewriting]
                       = m mod N                  [fact (3)]
                       = m                        [b/c m < N]
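
        The key generation, Encrypt, and Decrypt steps above can be
        sketched with toy numbers. This is Textbook RSA, so it is
        INSECURE, and the primes p=61, q=53 and exponent e=17 are
        made-up small values chosen for readability. The function names
        are ours. pow(e, -1, phi) needs Python 3.8 or later.

```python
from math import gcd

def keygen(p, q, e):
    """Textbook RSA key generation from primes p, q and public exponent e."""
    N = p * q
    phi = (p - 1) * (q - 1)          # phi(N) for N = p*q
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(N)")
    d = pow(e, -1, phi)              # d*e = 1 mod phi(N)
    return (N, e), (N, d)            # (public key, private key)

def encrypt(pubkey, m):
    N, e = pubkey
    return pow(m, e, N)              # c = m^e mod N

def decrypt(secretkey, c):
    N, d = secretkey
    return pow(c, d, N)              # m = c^d mod N

pub, sec = keygen(61, 53, 17)
m = 65
c = encrypt(pub, m)
assert decrypt(sec, c) == m          # first "want/need": Dec(sk, Enc(pk, m)) = m
print("pub =", pub, "sec =", sec)
```

        Note that encrypt() is deterministic, which is one of the
        reasons the scheme cannot meet the second "want/need".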

    Digital signatures

        Each party has a public-key pair: (pk, sk)
        pk is public key; broadcast to the world
        sk is secret key; guard it!

        Setup: two algorithms: Sign, Verify (and a key generation
        algorithm, which we will mostly neglect)

        sig   <-- Sign(sk, msg)
        {0,1} <-- Verify(pk, msg, sig)

        Want/need:

            ** For all msg: Verify(pk, msg, Sign(sk, msg)) = 1

            ** Without sk, an adversary cannot forge signatures,
            i.e., if adversary does not have sk but tries to
            produce a (msg, sig) pair for a msg that the owner of
            sk never signed, then

                Verify(pk, msg, sig) = 0

        One can implement an insecure version of digital signatures
        with Textbook RSA.

            Key generation:
                Same as earlier

            Sign(secretkey (N,d), msg m):
                s = m^d mod N

            Verify(pubkey (N,e), msg m, sig s):
                
                if (s^e mod N == m)    return 1
                else                   return 0


        Because Sign here has the same form as Decrypt, and Verify
        recomputes Encrypt, some people say that digital signatures
        are the "inverse" of encryption, but this view obscures more
        than it illuminates, so we encourage you to keep the two
        concepts separate.

        Also, the scheme above is ridiculously insecure. Why?

            adversary chooses arbitrary s' in ZN*.

            adversary sets the message m' to s'^e mod N.

            adversary outputs (m', s'). This is a forgery: it will pass
            verification, but the owner of sk never created this
            (msg, sig) pair.
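
        The forgery can be reproduced in a few lines. The sketch below
        reuses made-up toy parameters (N=3233, e=17, d=2753); the
        function names and the choice s' = 1234 are ours. Note that
        forge() uses only the public key.

```python
# Toy Textbook RSA parameters (hypothetical, insecure): N = 61 * 53.
N, e, d = 3233, 17, 2753

def sign(m):
    """Owner of sk: s = m^d mod N."""
    return pow(m, d, N)

def verify(m, s):
    """Anyone with pk: accept iff s^e mod N == m."""
    return 1 if pow(s, e, N) == m else 0

def forge(s_prime):
    """Adversary without d: pick s' first, then derive the message m'."""
    m_prime = pow(s_prime, e, N)
    return m_prime, s_prime

# Honest use still works.
assert verify(65, sign(65)) == 1

# The forged pair verifies, although sign() was never called on m'.
m_forged, s_forged = forge(1234)
assert verify(m_forged, s_forged) == 1
```

        The adversary does not get to choose m' freely here; it gets
        whatever s'^e mod N happens to be. That is still a break under
        the definition above, since the pair verifies and the owner of
        sk never produced it.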

        Other attacks possible too.

        A modification is to hash the message first with a
        collision-resistant hash function, but one again needs to be
        careful.

    Use of signing + Diffie-Hellman key exchange:

        S-->C: (g, p, g^x)_[signed]
        C-->S: g^y
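
        A rough sketch of this two-message flow follows. Everything
        concrete in it is an assumption made for illustration: the tiny
        Diffie-Hellman parameters (p=23, g=5), the toy RSA signing key,
        the hash-then-sign construction (SHA-256 reduced mod N), and
        all function names. None of it is secure at these sizes.

```python
import hashlib
import secrets

# Server's toy signing key (Textbook RSA, made-up numbers, insecure).
N, e, d = 3233, 17, 2753

def digest(values):
    """Hash a tuple of integers down to an integer mod N (toy-sized)."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(values):
    return pow(digest(values), d, N)

def verify(values, sig):
    return pow(sig, e, N) == digest(values)

def server_hello(g, p):
    """S-->C: (g, p, g^x), signed. Returns the secret x, the message, the sig."""
    x = secrets.randbelow(p - 2) + 1          # secret exponent in 1..p-2
    msg = (g, p, pow(g, x, p))
    return x, msg, sign(msg)

def client_reply(msg, sig):
    """C checks the signature, then returns (g^y to send, shared key)."""
    if not verify(msg, sig):
        raise ValueError("bad signature on server's message")
    g, p, gx = msg
    y = secrets.randbelow(p - 2) + 1
    return pow(g, y, p), pow(gx, y, p)

def server_finish(x, p, gy):
    """S computes the shared key (g^y)^x mod p."""
    return pow(gy, x, p)

x, msg, sig = server_hello(g=5, p=23)
gy, client_key = client_reply(msg, sig)
server_key = server_finish(x, 23, gy)
assert client_key == server_key               # both sides hold g^(xy) mod p
```

        As in the flow above, only the server's message is signed, so
        this authenticates the server to the client but not the
        reverse.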

3. Web security

    A. Intro

    We are switching gears.

    Now we are going to study isolation between *sites* in a
    *client-side* Web browser.

    Overall plan is called the "same-origin policy" (SOP)
        --The SOP is described in _The Tangled Web_
        --New mechanisms have come out since "The Tangled Web"
            But mostly adding to the design, rather than replacing it

    What's the top-level motivation for the SOP?
        
        Browser visits site foo.com. Assume:
            foo.com malicious. 
            foo.com delivers adversarial JavaScript to the browser.

        What could go wrong? Assume no SOP.

        Ex.1. User is also visiting site bar.com
            --The JavaScript in foo.com manipulates the DOM
              (JavaScript's representation of the page) for bar.com.
            --Can deface bar.com
            --Can also rewrite links in bar.com so that when the user
            clicks on 'Enter password', the password is delivered to
            foo.com

        Ex.2. User isn't visiting another site.
            --But the JavaScript from foo initiates Web requests to moo.com,
            where the user is authorized (perhaps moo.com stores private
            data for the user). The authority might be encapsulated in
            the cookie that the browser presents to moo.com.
            --Now the restricted data is living in program objects (the
            DOM, JavaScript variables) that foo.com's code can
            manipulate. This is a problem, because:
            --The adversarial JavaScript can then issue Web
            requests to foo.com to exfiltrate the content of the
            restricted data:
                http://www.foo.com/heres-the-users-data-for-moo-com?abc.....

        The SOP prevents these and other issues by regarding each site
        as a separate *security principal*, or origin, and imposing the
        rule that JavaScript from one origin may not modify the DOM of a
        page from a different origin. This addresses Ex1. The SOP also
        stipulates that JavaScript from one origin may not, in general,
        read the responses to HTTP requests that it issues to a
        different origin, which addresses Ex2.
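
        To make "origin" concrete, here is a minimal sketch of the
        comparison a browser performs. It assumes the usual definition
        of an origin as the triple (scheme, host, port), which these
        notes have not yet introduced, and it ignores the many special
        cases in real browsers. The function names are ours.

```python
from urllib.parse import urlsplit

# Ports implied when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a, url_b):
    """True iff the two URLs belong to the same origin."""
    return origin(url_a) == origin(url_b)

# Same scheme, host, and (default) port: same origin.
assert same_origin("http://foo.com/index.html", "http://foo.com:80/other.html")
# Different host: foo.com's script may not touch bar.com's DOM.
assert not same_origin("http://foo.com/", "http://bar.com/")
# Different scheme counts as a different origin too.
assert not same_origin("http://foo.com/", "https://foo.com/")
```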

        The SOP is big and complicated.... let's take a step back


    How did the browser security plan come about?

      Origin: Netscape browser introduced SOP when adding support for JavaScript

      Incremental design/development: no single coherent design.
        No one expected web browsers to be used in the ways they are today.
        Security issues patched as they were discovered, with extra rules/checks.

      Browser vendors competed (and to some extent still compete) on functionality.
        Adding new features (or even security mechanisms) before standards.

      Historically, W3C has largely been documenting what browsers already do,
        instead of proposing new standards that browsers will then implement.

      Browsers didn't always agree on overall plan, or the implementation details.
        Browser vendors do something that roughly resembles the specs
        Many quirks. See quirksmode.org. 
        As a result, many inconsistent corner cases that can be exploited. 

      Now, there's quite a bit of collaboration "behind the scenes".
        Developers of Chrome, Firefox, IE talk to each other a fair amount.

      Important issues get fixed slowly over time.
        Compatibility is a huge constraint, hard to break old sites.
        (Users will stop using your web browser!)

      Some of the fixes take place in the browser and JavaScript libraries (jQuery, etc).
        When possible, just a compatibility layer on top of raw browser APIs.

      Some of the improvements through new headers
        E.g., Content-Security-Policy

      Many of the attacks we will talk about in class have become more difficult to pull off
        E.g., most of the attacks we will see in lab5 don't work with Chrome

 
    B. Background, threat model, setting

    What is the Web, really?
    
        In the old days, it was like in lab1: a simple client/server
        architecture (client was your web browser, server was a machine
        on the network that could deliver static text and images to your
        browser).

        The web has changed: now the browser is very complicated.

          --JavaScript: Allows a page to execute client-side code.

          --DOM: Provides a JavaScript interface to the page's HTML, allowing the
           page to add/remove tags, change their styling, etc.

          --Cookies: storage in browser, used for e.g. user authentication

          --XMLHttpRequests (AJAX): Asynchronous HTTP requests.

          --Web sockets: Full-duplex client-server communication
           over TCP.

          --Web workers: Multi-threading support.

          --Multimedia support: <video>, web cams, screen-sharing.

          --Geolocation: Browser can determine your location by examining GPS
           units. Firefox can also locate you by passing your WiFi information to the
           Google Location Service.

          --<canvas> and WebGL: Bitmap manipulation and interactive 2D/3D graphics.

          --NaCl: browser can even run native code supplied by others!

    We will focus on the browser for this unit. Goal is isolation among
    sites within browser.

    Threat model / assumptions (are they reasonable?)

        Attacker controls his/her own web site, attacker.com.
            --Inevitable, with some other domain name.
        Attacker's web site is loaded in your browser.
            --Advertisements, links, emailed links, etc.
        Attacker cannot intercept/inject packets into the network.
            --Network security (SSL, TLS, etc.) addresses that 
        Browser/server doesn't have implementation bugs (e.g., buffer overflows).
            --We've tried to address server bugs. Browser is inevitably
            part of the TCB (Trusted Computing Base). Research has
            studied how to shrink it.

    Web applications often contain several types of content from
    multiple principals. As an example:
            http://foo.com/index.html

      +--------------------------------------------+
      |  +--------------------------------------+  |
      |  |        ad.gif from ads.com           |  |
      |  +--------------------------------------+  |
      |  +-----------------+ +------------------+  |
      |  | Analytics .js   | | jQuery.js        |  |
      |  | from google.com | | from cdn.foo.com |  |
      |  +-----------------+ +------------------+  |
      |                                            |
      |        HTML (text inputs, buttons)         |
      |                                            |
      |  +--------------------------------------+  |
      |  | Inline .js from foo.com (defines     |  |
      |  | event handlers for HTML GUI inputs)  |  |
      |  +--------------------------------------+  |
      |+------------------------------------------+|
      || frame: https://facebook.com/likeThis.html||
      ||                                          ||
      || +----------------------+ +--------------+||
      || | Inline .js from      | | f.jpg from   |||
      || | https://facebook.com | | https://     |||
      || |                      | | facebook.com |||
      || +----------------------+ +--------------+||
      ||                                          ||
      |+------------------------------------------+|
      |                                            |
      +--------------------------------------------+

    Question: Which pieces of JavaScript code can access which pieces of state?
    For example . . .
        *Can the analytics code from google.com access state in the jQuery code
         from cdn.foo.com? [Seems maybe bad since different principals wrote the
         code, but they are included in the same frame . . .]
        *Can the jQuery code from cdn.foo.com access state in the inline
         JavaScript code defined by foo.com? [They're *almost* from the same
         place . . .]
        *Can the analytics code or jQuery access the HTML text inputs? [We've
         got to make that content interactive somehow.]
        *Can JavaScript in the Facebook frame touch any state in the foo.com
         frame?  Does it matter that the Facebook frame is https://, but the
         foo.com frame is regular http://?


    Browsers answer these questions with the SOP. But in order to
    understand the SOP, we need a few more concepts. Next time...

---------------------------------------------------------------------------

References:

    A good text is "Introduction to Modern Cryptography", by Jonathan
    Katz and Yehuda Lindell.