Code, Cook, Sleep, repeat.

It’s my third-last week in Hacker School, and people seem to play around a lot with ASTs (mostly in Python with provides run-time AST modification - how cool is that!?!). No wonder, after Paul Tagliamonte demonstrated how much fun it is!

I’ve been using them to create my learning tool that overloads JS operators with more transparent versions modified from V8 code.

This is a simplified walkthrough of the essential parts of the overloading process.

The REPL is still under construction but I’ll link it to this article when I get it to a stable state.

Goal

For my REPL I need to overload javascript operators and functions with my own functions. Since JS doesn’t have operator overloading, I’ll need to take the code in, replace the things I want to overload, and run the code.

Other possible uses for similar pipeline are syntax checking, adding logging magically to all functions in the codebase, code minification / obfuscation lisp/c-like macros etc. So this is very powerful for most kind of JS preprocessing.

Say I’d like to transform

var a = 4 + "foo";

var a = ADD(4, foo);

Seems clear? Let’s see how we’ll go about doing it.

Summary of actions

Create an abstract syntax tree from the code
Replace the function calls (from the tree)
Re-create the JS-code (from the syntax tree)
Run the modified code (good old eval)

How

Getting ready

First thing is to include esprima.js, which you can get through bower. Esprima parses your JS code to an abstract syntax tree (AST). This is a tree-representation of your program, which can be manipulated more easily and systematically than text-shaped code. You can play with the esprima demo and see what kind of ASTs your code produces.

To get it running on your own machine, first step is naturally to install it.

bower install esprima

Include in your JS, and you’re ready to dabble!

1. Creating abstract syntax trees

I want to see the syntax trees my different rows generate, so my test is

var ast = esprima.parse("var answer = 34 + 8"),
    ast2 = esprima.parse("var answer2 = ADD(34, 8)")

Which means I can see how the syntax trees differ for my source and target tree.

{
  "type": "Program",
  "body": [
    {
      "type": "VariableDeclaration",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "answer"
          },
          "init": {
            "type": "BinaryExpression",
            "operator": "+",
            "left": {
              "type": "Literal",
              "value": 34,
              "raw": "34"
            },
            "right": {
              "type": "Literal",
              "value": 8,
              "raw": "8"
            }
          }
        }
      ],
      "kind": "var"
    }
  ]
}

The ast2 part seems otherwise the same, but init-part is different..

  "init": {
    "type": "CallExpression",
    "callee": {
      "type": "Identifier",
      "name": "ADD"
    },
    "arguments": [
      {
        "type": "Literal",
        "value": 34,
        "raw": "34"
      },
      {
        "type": "Literal",
        "value": 8,
        "raw": "8"
      }
    ]
  }

2. Replace the function calls

So I’ll make a function that overwrites the former with the latter:

function replacePlusByADD(node) {
    // get operands from +
    var a = node.left,
        b = node.right;

    node.type = "CallExpression";
    node.callee = {
      "type": "Identifier",
      "name": "ADD"
    };
    node.arguments = [a, b];

    // reset unnecessary properties
    node.left = null;
    node.right = null;
    node.operator = null;

    return node;
}

I’ll use estraverse for the AST traversal

var ast  = esprima.parse(code);
estraverse.traverse(ast, {
    enter: function(node) {
        if (node.type === "BinaryExpression") {
            replacePlusByADD(node);
        }
    }
});

3 & 4. Re-create the JS-code and run

This is really just

var modified_code = escodegen.generate(ast);

eval(modified_code);

And Voilà! We have just created a JS pre-processor.

I found playing around with these tools, very much fun, in addition to helping me make pretty complex things in very few lines of code.

More resources: Esprima tutorial Fun with Esprima

Hash extension attack

We had Filippo Valsorda hold a small presentation at Hacker School today. The topic has how to exploit a hash extension vulnerability.

Problem lies in a combination of how hash-functions work, and unsafe ways of transmitting an API-secret in a request. The example-case was VIMEO’s old API that had this vulnerability some years ago (nowdays the vulnerability is fixed).

About MD5-hashes

MD5 hashes can’t be reversed, it’s a one-way functions.

This means checking equality a server checks hashes a string and compares it with the received hashed string. The actual passwords are never compared, just the hashes.For ex. a naive hashed password check would mean taking a password from the database and comparing it with the hashed password from client.

MD5 hashses are nearly unique - accidental collisions are extremely rare, but a collision attack is possible. It’s an iterative hash function, which means hashes are created by combining 512-bit (=64 byte) chunks of the input the the hash seed (which is predefined in MD5 spec). The hash is created only based on the input, so the next seed is formed from the created hash etc. This means the hash has to be extensible.

The vulnerability

The extensibility is built so, that there is no secret component. This means if someone captures the hash, they can extend it. This is important.

In this case, the hashed signature string was created by concatenating secret and key-value pairs alphabetically, md5-hashing that string, and adding it as api_sig value to the request. So (in this request) naive pseudocode representation of the request could be:

method: vimeo.test.login
api_key: e1df59512d6e136e5ba0936e8ac273cb
api_sig: hash_md5(api_secret + "api_key" + api_key + "method" + method)

So the hash would be created from string

sec12345api_keye1df59512d6e136e5ba0936e8ac273cbmethodvimeo.test.login

This had two design flaws: - The secret was prepended to the string that was hashed - All the other components (except the secret) is passed in plaintext in the request

If the attacker captures a request (eg. by doing a man-in-the-middle-attack), changes the content, extends the signature hash to match the new request content, she can successfully pretend to be the original user and run any command she wishes (presuming it’s allowed for the original user).

Example of the forged request:

a: api_keye1df59512d6e136e5ba0936e8ac273cbmethodvimeo.test.login\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe8\x02\x00\x00\x00\x00\x00\x00
video_id: 1337
favorite: 1
method: vimeo.videos.setFavorite
api_key: e1df59512d6e136e5ba0936e8ac273cb
api_sig: caf3d8384c2fbf2917c2b78dcf8ac588

api_sig is formed by hashing the whole new request key-value pairs in alphabetical order. An important part is the a-attribute, which contains the old request + padding required by MD5, which can be considered just an implementation detail. The main idea is that the new signature can be constructed from the old request and new added key-value pairs, which may be completely different from the original ones.