Kythe has one schema, whereas with Glean each language has its own schema with arbitrary amounts of language-specific detail. You can get a language-agnostic view by defining an abstraction layer as a schema. Our current (work in progress) language-agnostic layer is called "codemarkup" https://github.com/facebookincubator/Glean/blob/main/glean/s...
For wiring up the indexer there are various methods; it tends to depend very much on the language and the build system. For Flow, for example, Glean output is built into the typechecker: you run it with some flags to spit out the Glean data. For C++, you need to get the compiler flags from the build system to pass to the Clang frontend. For Java the indexer is a compiler plugin; for Python it's built on libCST. Some indexers send their data directly to a Glean server; others generate files of JSON that get sent using a separate command-line tool.
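As a rough illustration, a JSON batch of facts is a list of predicates with their facts (the predicate name here is made up; see the Glean docs for the exact write format):

```json
[
  { "predicate": "example.FileDefines.1",
    "facts": [
      { "key": { "file": "src/Foo.js", "symbol": "foo" } }
    ]
  }
]
```

A file like this is what the separate command-line tool sends to the server.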
References use different methods depending on the language. For Flow for example there is a fact for an import that matches up with a fact for the export in the other file. For C++ there are facts that connect declarations with definitions, and references with declarations.
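As a rough sketch of the idea (these types and field names are hypothetical, not the real Glean schema), matching an import fact in one file against the export fact in another comes down to a join on a shared key:

```haskell
import Data.List (find)

-- Hypothetical, simplified fact types (not the real Glean schema):
-- an export fact in one file and an import fact in another file are
-- joined on the (module, name) pair to resolve a cross-file reference.
data ExportFact = ExportFact { exModule :: String, exName :: String, exDefLine :: Int }
  deriving (Eq, Show)

data ImportFact = ImportFact { imModule :: String, imName :: String, imUseLine :: Int }
  deriving (Eq, Show)

-- Resolve an import by finding the export fact for the same module/name.
resolve :: [ExportFact] -> ImportFact -> Maybe ExportFact
resolve exports imp =
  find (\e -> exModule e == imModule imp && exName e == imName imp) exports
```

In the real system this join is expressed as a query over the facts rather than a list scan, but the shape is the same: the import and export facts share a key that connects them.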
If using Kythe was an option, what was the rationale for not using it?
One major limitation of Kythe is handling different versions. For example, Kythe can produce a well-connected index of Stackage, but an index of Hackage would have many holes (not all references would be found, since a unique reference name needs the library version).
How does Glean handle different library versions?
EDIT: the language agnostic view is already mentioned.
There will be more indexers: we have Python, C++/Objective C, Rust, Java and Haskell. It's just a case of getting them ready to open source. You can see the schemas for most of these already in the repo: https://github.com/facebookincubator/Glean/tree/main/glean/s...
The graph is sorted by performance, with the worst performing (not necessarily the most common) on the left. We've also done more profiling and optimisation since we took those measurements.
FXL employed some tricks that were sometimes beneficial but often weren't: for example, it memoized much more aggressively than we do in Haskell. Mostly that's a loss, but just occasionally it's a win. When a profile turns up one of these cases, we can squash it by fixing the original code.
What matters most is overall throughput for the typical workload, and we win comfortably there.
For example, let's say that one of the things you want to compute is the number of friends of the current user. This value is used all over the codebase, but it only makes sense in the context of the current request (because every request has a different idea of "the current user"). So this is a memoized value, even though in the language it looks like a top-level expression.
Memoization only stores results during a request. It starts empty at the beginning of the request and is discarded at the end, and it is not shared with any other requests. It's just a map that's passed around (inside the monad) during a request.
Thanks for the response. Just trying to expand my brain here =), so I have a followup question.
I always thought of memoization as storing the parameters to, and result of, a function call in a memotable. Doing some quick research, I came across this definition of memoization from NIST that sounds more general "Save (memoize) a computed answer for possible later reuse, rather than recomputing the answer." What I understand from what you said is that when a request is processed, it produces a map that is passed around for the duration of the request.
The memo table (map) is a bit of state that is maintained throughout the request's lifetime. When we compute a memoized value, it is inserted into the map, and if we need the value again we can just grab it from the map instead of recomputing it.
The "automatic" bit is that we insert the code that consults the map so the programmer doesn't have to write it. The map itself is already invisible, because it's inside the monad. So the overall effect is a form of automatic memoization.
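A minimal sketch of the mechanism, assuming a simplified request monad built on State (the real implementation lives inside Haxl's monad, and `Request`, `memo`, and `runRequest` here are made-up names for illustration):

```haskell
import qualified Data.Map as Map
import Control.Monad.Trans.State (State, evalState, get, modify)

-- The memo table is just a map threaded through the request. It starts
-- empty, is never shared between requests, and is dropped at the end.
type Request a = State (Map.Map String Int) a

-- Consult the map first; otherwise run the computation and record the
-- result for the rest of the request. In the real system this lookup
-- and insert is generated automatically, so the programmer never writes it.
memo :: String -> Request Int -> Request Int
memo key compute = do
  table <- get
  case Map.lookup key table of
    Just v  -> return v
    Nothing -> do
      v <- compute
      modify (Map.insert key v)
      return v

-- Each request starts with an empty memo table.
runRequest :: Request a -> a
runRequest req = evalState req Map.empty
```

So if two parts of a request both ask for `memo "numFriends" expensive`, the expensive computation runs once and the second caller just reads the map.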
Is it a cultural reference? I found it in bestcomments, so it looks like many people get it, but I don't. Genuinely interested, as English is my second language.
Segmented stacks are better even if you have a relocatable stack. GHC switched from monolithic copy-the-whole-thing-to-grow-it to segmented stacks a while ago, and it was a big win. Not just because we waste less space, but because the GC knows when individual stack segments are dirty so that it doesn't need to traverse the whole stack. To avoid the thrashing issue we copy the top 1KB of the previous stack chunk into the new stack chunk when we allocate a new chunk.