Games People Play - Bayesian filters in recruiting

dpapathanasiou · on March 5, 2008

Somebody has actually implemented this (http://londonmiddleware.org/chaff/), though it's not clear how seriously the results are used in a real candidate evaluation.

raganwald · on March 5, 2008

Thanks for the link, I will add it to the post. +1!

apathy · on March 5, 2008

You end your article with 'Bayesian filters will not outperform a human in hiring'. But that's not the point.

A classifier, and especially a supervised classifier, is really just a tool for intelligence amplification. Make a dumb man perform smarter, and a smart man perform better than almost anyone without the advantage. Similar to providing physicians with a checklist for common procedures which has been developed by iterative analysis of outcomes. It's very hard to become a physician if you're dumb or lazy, but it's quite easy to get fatigued on a 36-hour residency shift, or get complacent if some 'trivial' procedure interrupts your microsurgery specialty. And the stakes are far higher in most surgical interventions.

Naturally, many doctors resist. But the best figure out how to use this to their advantage. It increases the efficiency of the system. Likewise, having the 'advice' of a machine that has been trained on a corpus of good multiple-human-actor decisions, over time, can provide individuals with better judgment than their experience alone. This is more apparent with a larger corpus and finer-grained classification -- e.g. multidimensional classification with a huge corpus and eigenclasses of suitability. Game that, and you're smart enough to be in management, most likely ;-)

So, I don't believe you should let your detractors off so easily. Maybe a talented human will outperform a filter with a small corpus. But I'd bet dollars to donuts that, for someone who isn't a full-time interviewer, the assistance of a well-trained filter will increase their acuity and throughput, allowing them to get on with their real jobs and worry less about dumb hires.

You can't really avoid the enthusiasm of junior employees who haven't been burned, and this is another scenario where a filter can help them gauge their judgment by providing a historical perspective. "You know the last guy we hired who interviewed like this, one of your coworkers spent 2 hours a day for 3 months training him, and then we fired him!" That's something you want to avoid, and I have seen this happen at places like Google where you might think they'd be immune. But once you let the dumb or negative folks in, it's all downhill from there.

So -- replace humans? No. Augment them? Yes. It's what computers (and statistical analyses) are meant for!

mynameishere · on March 5, 2008

An employee will cost anywhere from 20K/year to 500K+/year. I think it is worth spending 2 minutes personally reading each resume.

bayareaguy · on March 5, 2008

Reading each resume could take 2 minutes but joining it to the set of open positions could take a lot longer. It's cheap and easy to do when you only have a few positions to fill but depending on your HR's matching algorithm it could be a lot more expensive if you have a lot of open positions.

raganwald · on March 5, 2008

Ah, my old friend whose name is here. Your comment is absolutely true.

If I may ask, are you mentioning it because you think the post is suggesting otherwise? Or are you just mentioning it?

mynameishere · on March 5, 2008

No, you're not suggesting otherwise. But using "Bayesian filtering" (or whatever variation of it) is best on huge data sets. Working manually, I could tell you with near 100 percent reliability which email is spam--better than any filter. It's inefficient for a human to do it, so a process that can remove 95 percent instead of 100 is acceptable. Inefficiency matters less as the data becomes smaller and more important.

Real life example: My current manager has some twisted filter on his brain, whereat he is convinced that a mastery of certain things (like design patterns, or "OO architecture") is extremely important. We were interviewing a while back, and some kid said his 'proudest achievement' was a Pac-Man clone he made. Well, my manager's filter did not include the words "Pac-Man clone" and so we never even looked at it.

Every good candidate in a creative field is going to go outside the bounds of any filter you can come up with, training or otherwise. The better they are, the more likely this is true. A tool that is suitable for flagging "V!agr3" is not necessarily the tool for...identifying good pharmaceutical researchers.

pchristensen · on March 5, 2008

That was the main concern I had about the Bayesian resume filter - would it work at small (or even mid-sized) companies? Sure, with Google getting 10K's of resumes a month, they could mine some monster data out of it, but if you hire a couple people a year and get 100 resumes, do you have enough data?

I guess Reg's point was that even if it isn't perfect, it gives you some data, which is a heck of a lot better than no data.
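
To make the small-corpus worry concrete, here is a minimal sketch of a naive Bayes resume classifier with add-one smoothing. It is not the filter from the post or from the linked project; the class name, the "hire"/"pass" labels, and whitespace tokenising are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny two-class naive Bayes with add-one (Laplace) smoothing.

    Hypothetical sketch: the labels and the tokeniser are illustrative only.
    """

    LABELS = ("hire", "pass")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Record one labelled resume.
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def prob_hire(self, text):
        vocab = set(self.word_counts["hire"]) | set(self.word_counts["pass"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior, so a label with no examples is never impossible.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves a slot for unseen words.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Turn the two log scores into a probability for "hire".
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["hire"] / (weights["hire"] + weights["pass"])
```

With one example per class, a word the filter has seen moves the score, but a word it has never seen (say "pac-man") leaves it at an even 0.5. The filter has nothing to say about that resume, which is the situation a company with 100 resumes a year would be in most of the time.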

cstejerean · on March 6, 2008

Some data is not always better than no data. Sometimes data can provide a bias that you don't want, so it has to be the right data. For example, knowing that someone attended a prestigious university is likely to create a bias towards hiring that person and ignoring clues that indicate otherwise.

If your classification filter recommends somebody it's possible to let that piece of information provide the same kind of bias. So if the data you have is likely to be unreliable you might want to just ignore it.

raganwald · on March 6, 2008

"So if the data you have is likely to be unreliable you might want to just ignore it."

That prompts me to ask two questions:

1. So should you forget about data, or pay attention to collecting good data? Which course of action is more important?

2. If you don't make decisions based on data... Just how are you making decisions?

I am serious about question #2. We aren't talking about face to face interviews here, we're talking about looking at 200 resumes and deciding which ten people to call with the expectation of bringing 3-5 of the ten in for interviews.

In my experience, when people tell me they are using their "experience" and "judgment," they are actually using a highly biased process, such as selecting people who went to their university or preferring people who share the same hobbies.

cstejerean · on March 7, 2008

1. Try to collect better data if possible. If that's not possible, ignore the source of that data (and likely find another source that can provide more reliable data). So I might ignore the automatic resume filter and perhaps judge the fitness of a candidate based on how enthusiastic they seem about work they've done in the past (just an example).

2. You always need to make decisions based on data. But sometimes you need to allow your brain to interpret the data for you, even for resumes. If you're getting more resumes than you can handle reading by hand, you can implement a spam filter. For example, require candidates to solve a sample problem and submit the answer along with their resume.

And as far as gaming by providing stock answers to the coding problems, try this: extract actual problems (bugs or new features) from your real application, write automated tests for them and then put the problems online for interested candidates to solve. After a while, identify a new problem from your application and put that online. Not only are you finding quality applicants, you're solving real problems at the same time.

These are just some thoughts that went through my head by the way, I can't speak for how effective these methods are (or would be).
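
The "write automated tests and put the problem online" idea above could look roughly like this. It is only a sketch: the `grade` helper, the whitespace problem, and the sample submission are all made up for illustration, not taken from any real application.

```python
def grade(candidate_fn, cases):
    """Run candidate_fn on each (args, expected) case and count the passes."""
    passed = 0
    for args, expected in cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            # A crash fails that one case; the remaining cases still run.
            pass
    return passed


# Hypothetical problem statement: "collapse runs of whitespace into one space".
CASES = [
    (("a  b",), "a b"),
    (("  lead",), "lead"),
    (("tab\tsep",), "tab sep"),
]


def sample_submission(text):
    # One possible candidate answer, included only to exercise the harness.
    return " ".join(text.split())
```

Calling `grade(sample_submission, CASES)` counts how many cases a submission passes. Swapping in a new `CASES` list whenever a fresh bug turns up is what would keep stock answers from being reusable.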

raganwald · on March 5, 2008

Something to think about--and this may become a small post--is that people already use mechanical filters right now.

People use keywords or other criteria for selecting resumes to read when they use a database like monster.com or workopolis.com to hire. They do not read every resume.
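
For comparison, a mechanical keyword filter of that kind is about this simple. This is a sketch with made-up names (`keyword_filter`, `min_hits`), not how monster.com or workopolis.com actually search:

```python
def keyword_filter(resumes, required, min_hits=1):
    """Keep resumes that mention at least min_hits of the required keywords."""
    wanted = {keyword.lower() for keyword in required}
    selected = []
    for resume in resumes:
        # Naive whitespace tokenising; multi-word keywords will never match.
        words = set(resume.lower().split())
        if len(words & wanted) >= min_hits:
            selected.append(resume)
    return selected


resumes = [
    "Built a Pac-Man clone in C",
    "Design patterns and OO architecture expert",
]
shortlist = keyword_filter(resumes, ["patterns", "architecture"])
```

Here `shortlist` ends up holding only the second resume. The Pac-Man clone is dropped without anyone reading it, so the filter is only as good as the keyword list someone typed in.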

Obviously the expected likelihood of a positive outcome is lower than with placing certain types of ads or asking employees for referrals. But there's an opportunity for employers that use these kinds of databases to get less shitty results than the employers who advertise and read every resume they receive :-)

So we are in agreement of sorts: if you are asking employees to submit their friends, read every resume. If you are dealing with a huge data set--and monster.com is a huge data set--filtering helps.

That being said, using a filter--no matter how well trained--on monster.com is still a terrible way to hire. Thus... more posts about playing the hiring game in the future...

Thanks very much for clarifying your point.