Cool stuff! We use a similar process internally to rerank and filter our cold outbound lists. We just use an off-the-shelf model as the judge, give it custom criteria, and let it run for a set number of iterations. It's helped narrow wide searches down to the maximally relevant set of people (a few thousand mediocre matches down to a few hundred good ones).
It's not cheap and it's not fast, but it definitely works pretty well!
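For anyone curious, the shape of it is roughly this minimal sketch (the OpenAI client, the model name, the prompt, and the pass count are placeholders for whatever off-the-shelf judge and criteria you use, not our exact setup):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any off-the-shelf judge model works

def judge_match(profile_text: str, criteria: str, model: str = "gpt-4o-mini") -> bool:
    """Ask the judge model for a strict yes/no on one profile against the criteria."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "You are a strict recruiter. Answer only YES or NO."},
            {"role": "user", "content": f"Criteria:\n{criteria}\n\nProfile:\n{profile_text}\n\nDoes this profile meet the criteria?"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def filter_list(profiles: list[str], criteria: str, passes: int = 3) -> list[str]:
    """Run a set number of judging passes, keeping only profiles that survive every one."""
    survivors = profiles
    for _ in range(passes):
        survivors = [p for p in survivors if judge_match(p, criteria)]
    return survivors
```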
It's all unstructured text (title, company, company size, experience, skills, raw text, etc.), and LLMs are pretty bad at assigning numerical scores in a vacuum. To make that work, we'd have to provide a representative set of examples, break scoring down by specific field, and so on.
Kind of a lot of work compared to just dumping the text of two profiles into a context window along with a vague description of what I want, and having the LLM make the binary judgment.
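Something like this, sketched with a generic chat-completions call just for illustration (the model name and prompt wording are made up, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

def better_fit(profile_a: str, profile_b: str, want: str, model: str = "gpt-4o-mini") -> str:
    """Dump two raw profiles plus a loose description and ask for a binary A/B call."""
    prompt = (
        f"I'm looking for: {want}\n\n"
        f"Profile A:\n{profile_a}\n\n"
        f"Profile B:\n{profile_b}\n\n"
        "Which profile is the better match? Reply with exactly one letter: A or B."
    )
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return "A" if resp.choices[0].message.content.strip().upper().startswith("A") else "B"
```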
Yeah, that's exactly what we observed. Our goal was to create an absolute score that's completely independent of the corpus, which is difficult because every ELO distribution is inherently tied to the corpus itself!
When we were exploring the mathematical foundations, we considered ELO scoring against a "Universal Corpus" based on the natural entropy of human language (obviously that's intractable, but sometimes that term cancels out, as it does in the DPO proof).
But eventually we figured out a method that uses cross-query comparisons to assign an "ELO bias" to all document ELOs within a given query's candidate list. This normalizes things correctly: when a candidate list is all bad, the ELOs shift low, and when it's all good, they shift high, even when the relative ELOs are identical.
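Very roughly, the idea looks something like this heavily simplified sketch (the shared reference list, the logistic-style fit for the bias, and all the constants are illustrative choices for doing cross-query comparisons, not the exact method):

```python
import random
from typing import Callable

K = 32  # standard Elo update step

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """One Elo update after a single pairwise LLM judgment."""
    s = 1.0 if a_won else 0.0
    e = expected(r_a, r_b)
    return r_a + K * (s - e), r_b + K * ((1.0 - s) - (1.0 - e))

def rate_list(docs: list[str], beats: Callable[[str, str], bool], rounds: int = 200) -> dict[str, float]:
    """Relative ELOs within one query's candidate list, from random pairwise matchups."""
    ratings = {d: 1000.0 for d in docs}
    for _ in range(rounds):
        a, b = random.sample(docs, 2)
        ratings[a], ratings[b] = update(ratings[a], ratings[b], beats(a, b))
    return ratings

def elo_bias(docs: list[str], ratings: dict[str, float],
             reference: list[str], ref_ratings: dict[str, float],
             beats: Callable[[str, str], bool], trials: int = 100, lr: float = 40.0) -> float:
    """Fit a single per-query offset so that cross-list matchups against an
    already-calibrated reference list agree with what the shifted ELOs predict."""
    bias = 0.0
    for _ in range(trials):
        d, r = random.choice(docs), random.choice(reference)
        won = 1.0 if beats(d, r) else 0.0
        pred = expected(ratings[d] + bias, ref_ratings[r])
        bias += lr * (won - pred)  # gradient step on the pairwise log-loss
    return bias
```

The final score for each document is just `ratings[d] + bias`: a list full of weak candidates loses most of its cross-list matchups, picks up a negative bias, and the whole list shifts low even though the relative ELOs inside it are unchanged.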