Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think the approach I would use is as follows:

0. Get a dictionary.

1. Form a directed graph, with an edge from each word to every word that uses that word in its definition.

2. Remove all words that have no outgoing edges.

3. If you removed some words, go to step 1. Otherwise, all words left in the dictionary are minimal.

EDIT: If anyone knows of a machine-readable dictionary, I'd love to actually do this.



This will not yield a minimal set; in a cycle, it is only necessary to remove at least one word. The problem is thus to delete the minimum number of vertices to remove all cycles. This is the NP-hard Feedback Vertex Set problem. Here's a paper that solves it for a dictionary (there is some more): https://arxiv.org/abs/0911.5703


I checked the comments to check if this paper was mentioned anywhere. Good recommendation.


Looks like you found our answer! Someone's already done the hard work.


This is not necessarily the answer. It's an upper-bound for the answer.


Looks like somebody made txt and json versions of the Oxford Unabridged English Dictionary here: https://github.com/adambom/dictionary. The json version should let build up the graph structures you're talking about pretty easily.


But you will come across a lot of words used in definitions that could easily be replaced with more common words. In some cases the change to the definition would be tiny, in others it might be more significant.


I'd like to see a DAG of WordNet, a database of English synonyms. Mapping single word synonyms solves the problem of common words in definitions.

https://en.wikipedia.org/wiki/WordNet


Came here to say exactly the same! Great minds think alike!


It seems like a good start. Once you do that, you could start finding vertices with large amounts of incoming edges, attempt to redefine the word as a phrase composed of only words still in the graph, remove that vertex, and repeat.


That will get you much closer but it does ignore the ability to apply creativity to definitions to further reduce them. In the end, a machine-driven technique can give an approximate answer to this problem but it will never be the "perfect" answer.


You could try this using WordNet

https://wordnet.princeton.edu/


and word2vec


Doesn't every word in a given definition have an "outgoing edge?"


Yes, but not all words defined in the dictionary are used in a definition.

So if "multitudinous" isn't used in a definition of another word, you remove it from the set. Maybe you then find out that "myriad" was only used in the definition of multitudinous, so you can take myriad out, and so on.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: