Hacker News | dlwh's comments

Stanford CRFM | Bay Area | Full-Time | On-Site, Hybrid

Foundation models like ChatGPT, PaLM, and Stable Diffusion are transforming the world around us. The Stanford Center for Research on Foundation Models (CRFM; https://crfm.stanford.edu/), which is part of Stanford HAI, is an interdisciplinary initiative that aims to make foundation models more reliable, transparent and accessible to the world. We take on ambitious projects that seek to rigorously evaluate existing foundation models and to build new ones.

We are currently seeking a research engineer to join our engineering team. This is a unique opportunity to work with seasoned engineers who have spent many years in industry, as well as PhD students, post-docs, and faculty at CRFM. You will contribute to cutting-edge research, publish papers, gain access to the latest foundation models, and be immersed in the vibrant CRFM community.

You will work on our open source software projects, including:

  - HELM, a framework for holistic evaluation of large language models 
  - Levanter, a framework for transparent and accessible large-scale language model training 

For more information or to apply, please go to https://careersearch.stanford.edu/jobs/research-engineer-213...


I wish this were open to remote, at least within California if not further out!


I highly recommend the book 1491, which goes pretty deeply into what we can infer about the pre-Columbian Americas.


I’m currently reading it. It’s mind-blowing to learn that there were peoples in the Americas more than 13,000 years ago.

For those looking to read it, make sure to get the 400-some page version of this book. There is a similarly named book aimed at grade school kids.

https://www.amazon.com/1491-Revelations-Americas-Before-Colu...

https://www.amazon.com/Before-Columbus-Americas-Charles-Mann...



Thanks for the kind words.

Breeze does a large chunk of (dense) compute via netlib-java, which calls out to "real" lapack if you set it up. Are things really faster than that? Or are you referring to the non BLAS/non Lapack things?


A few things about netlib-java:

1. It's a read-only repository now. It's retired, and the lack of maintenance will hurt its long-term prospects.

2. The license on netlib-java's native binaries isn't commercial-friendly.

3. netlib-java does everything on-heap with double arrays; we do everything off-heap. There's no copying to worry about, and our data buffers give us much lower latency and more flexibility.

4. Thanks to JavaCPP, we have better control and interop with other C++ libraries like OpenCV. This makes it easier to write native code and use it from Java later on. It allowed us to write and maintain all of our own C/C++ code with the same API (see libnd4j) - https://github.com/deeplearning4j/libnd4j
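To make the on-heap vs. off-heap distinction in point 3 concrete, here's a minimal sketch using plain JDK direct buffers. This is not ND4J's actual API, just an illustration of the underlying idea: a direct buffer lives outside the GC-managed heap, so native code can touch it without copying.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // On-heap: a plain Java double[] lives inside the JVM heap.
        // Handing it to native BLAS generally means pinning or copying.
        double[] onHeap = {1.0, 2.0, 3.0};

        // Off-heap: a direct buffer is allocated outside the heap, so
        // native code can read/write it directly, with no copy.
        DoubleBuffer offHeap = ByteBuffer
                .allocateDirect(onHeap.length * Double.BYTES)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
        for (double d : onHeap) offHeap.put(d);
        offHeap.rewind();

        System.out.println(offHeap.isDirect()); // true
        System.out.println(offHeap.get(1));     // 2.0
    }
}
```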

So yes, it ends up being faster in practice for a lot of scenarios. Aside from that, we also have more control over the BLAS libraries we pick.

This means we also have access to cuBLAS, as well as (see below) more configuration and flexibility.

netlib-java tries to be "pure," which, while elegant, isn't practical if you want to benefit from GPUs and DL. We implemented the proper shims to make things "just work" from the user's perspective, on top of having more flexibility (see: MKL's OpenMP knobs, etc.)

ND4J has its own built-in garbage collector and memory management, which means we don't have to worry about any strange workarounds when working with CPUs/GPUs, and we can keep off-heap buffers in a managed manner.

See:

http://deeplearning4j.org/workspaces

http://deeplearning4j.org/native

In general, "just BLAS" isn't enough. I know from personal experience: I wrote ND4J after trying every Java library for matrix compute, and all of them fell flat in speed and in interop with other C++ libraries, and the need to use Java arrays was highly limiting. Over the years, we built up ND4J to handle harder scenarios.

That includes other features, like distributed parameter servers.

Other things aside: I like what Breeze attempted, but it ultimately didn't scratch the itch for me when I was looking hard at the various Java matrix libraries (I've tried all of them).

When I originally built out ND4J, it had this backend architecture:

http://nd4j.org/backend.html

It was so we could use whatever matrix backend we wanted. None of them worked well enough, given the flexibility we needed.

I also had an inherent problem with Java-based for loops in any setting. We wrote our own fork/join implementation as well, attempting to make it fast, and it just couldn't beat plain C.

We've found that, especially beyond matrices of around 128 x 128, we hands-down beat every JVM library out there, no matter the language. The last bit we're working on is smaller matrices.

The other thing we're working on is our sparse support, which could use some work. The basics are in there, but it's not quite ready for prime time yet.

After that (I'm obviously biased), I don't see how anything could compete with us, especially once we add our autodiff/PyTorch-like stack on top of all these primitives.

Hope that helps!


Main author here.

Breeze has breeze-viz, which is very basic but at the time there wasn't anything else. I highly endorse using something else. I personally like http://sameersingh.org/scalaplot/

They're under the same aegis basically because they're all mine. ScalaNLP started out as really being just NLP, but it scope-crept. That said, Epic is a library for structured prediction first and foremost, and one of the main applications of structured prediction is NLP.

Breeze is basically like SciPy and large chunks of it power Epic. It's really the only thing that doesn't belong in the namespace.


I'm glad you're bringing something like this to the JVM/Scala-ecosystem.

There are some things that I've been interested in asking for in a high-level scientific computing library. If you're planning on continuing your visualization library, can you please come up with some solution for layout specification? Whenever I'm plotting something and I spend 30 minutes getting all of the data in order, the last thing I want to do is fight with the plotting library's label positions because they overlap. Or if I say "let me take this plot, add some more stacked subplots, and show different categories," I don't want my labels to be perfect but my scatter plots crammed into a 10x10 pixel box to draw into.

On the HPC/numerical computing side of things, have you looked into implicit GPU operation types? Something that would let you queue up operations that can be run on a parallel computing system. Basically, describe complex operations with the high-level object's normal operations. The objects aren't actually calculating anything; they just organize a GPU kernel in the background. As the final stage, you can turn the queued operations into a compiled function:

    gpumat a(3, 5);
    gpumat b(5, 3);
    gpumat gpu_op_queue = (a * b) + (a * b) * 5;

    function(a, b) operation = gpu_op_queue.compile();
    mat output = operation(some_3x5, some_5x3);
In the backend you'd hopefully be able to create your own types like 'cpumat', 'computerclustermat', or 'gpuclustermat'.
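A toy version of this "record now, evaluate later" idea can be sketched with a lazy expression tree. All names here are hypothetical, and plain CPU evaluation stands in for a real GPU backend that would compile the recorded graph to a kernel:

```java
import java.util.function.BinaryOperator;

// Minimal lazy expression tree: building "(a * b) + (a * b) * 5"
// only records the operations; nothing is computed until eval()
// walks the tree. A real backend would compile the tree instead.
interface Expr {
    double eval();

    static Expr val(double v) { return () -> v; }

    static Expr bin(Expr l, Expr r, BinaryOperator<Double> op) {
        return () -> op.apply(l.eval(), r.eval());
    }

    default Expr mul(Expr o) { return bin(this, o, (x, y) -> x * y); }
    default Expr add(Expr o) { return bin(this, o, (x, y) -> x + y); }
}

public class LazyOps {
    public static void main(String[] args) {
        Expr a = Expr.val(3), b = Expr.val(5);
        // Describe the computation; evaluation is deferred.
        Expr graph = a.mul(b).add(a.mul(b).mul(Expr.val(5)));
        System.out.println(graph.eval()); // 15 + 75 = 90.0
    }
}
```

Scalars stand in for matrices here; the same shape works with an `Expr`-of-matrices once a backend defines the primitive ops.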

If you had some easy way to generically express extremely parallel numerical operations, an abstract way of implementing high-performance back-ends that take those operations and compile them to GPU kernels, and a visualization engine that doesn't feel like it's from the 80s then your library will really take off.

Personally I feel GPU-optimization and fighting with visualization libraries are the two biggest pain points in scientific computing.


Thanks for the questions.

I am very unlikely to take on visualization. I don't acutely need it for what I do, and I am some-but-not-nearly-enough interested in visualization for its own sake. I started to read about the grammar of graphics stuff at one point and decided it was too far down the rabbit hole.

I have looked more into gpu stuff, and agree specifying a compute graph (and then implicitly optimizing it) is more likely to be the future. FWIW, this is basically what XLA (from TensorFlow) and whatever it was FB announced on Friday are doing.

I wrote my thoughts up recently on the Breeze mailing list here: https://groups.google.com/forum/#!topic/scala-breeze/_hEFpnI...

I'm starting to think it through but I'm not sure I have time for that either :(. A 4-month old and a startup take up a lot of time.


So, this is (largely) my project.

Not sure why this is on the front page of HN, but I'm happy to answer any questions.

I'm not really giving these libraries the love they need these days. I mostly started them in grad school before the deep learning revolution really hit my subfield (NLP), and I haven't had time to modernize them. They still have their uses, especially Breeze, which is used in Spark's MLlib and directly by a number of companies.


Sure, but the FP side of Scala is definitely firmly in the advanced static type system camp.


Semantic Machines | Software Engineers and Machine Learning Engineers | Boston, MA and SF Bay Area, CA (Berkeley) | http://www.semanticmachines.com/careers/

Semantic Machines is developing technology to power the next generation of conversational artificial intelligence: AIs that you can actually have a conversation with. Think Google Assistant or Alexa or Siri, but without having to carefully craft commands like you're talking to a Bash shell.

Our team has built much of the core technology underlying Siri and Google Now, and our founders (including both the former Chief Speech Scientist for Siri and the head of UC Berkeley's Natural Language Processing group) have multiple >$100 million exits under their belts.

We're looking to hire a few talented software engineers and machine learning engineers to help build out our technology, by expanding our core NLP infrastructure, data processing pipelines, neural net clusters, and backend services.

Experience with natural language processing systems is a plus, as is experience with the JVM (especially Scala), but we're mainly interested in passionate engineers who can learn quickly and work effectively in complex systems.

Please reach out to me directly or email info@semanticmachines.com. Thanks!


Not a neuroscientist, but I do do NLP, and I only lightly skimmed the paper.

This doesn't really speak to UG.

First, you can believe in the structures they purport to show without accepting the existence of UG, by appealing to the existence of general mechanisms in the brain for assembling hierarchical structures, which is equally validated by this experiment.

Second, they looked at two languages with sentences of up to ~7 syllables each, with at most two constituents (Noun Phrase, Verb Phrase). You can't show any evidence for any hierarchy of interest in 7 syllables. They demonstrated that phrases exist and phrase boundaries exist, but it's entirely possible to have "flat" grammars without interesting hierarchy, especially in simple sentences. If they want to show interesting hierarchy, they should conduct experiments with more interesting structure (say, some internal PPs and some limited center embedding) and show something that correlates with multiple levels of the "stack" getting popped, or something.

It's still interesting work, but as usual oversold by the university press office.


First, you can believe in the structures they purport to show without accepting the existence of UG, by appealing to the existence of general mechanisms in the brain for assembling hierarchical structures, which is equally validated by this experiment.

That was kinda my impression as well, but I don't want to say much more as I'm so far from an expert on this and I'll probably just make an idiot out of myself. Still, as you say, it is interesting work in its own right.


Prosopagnosia[1] is a real condition that affects ≈2.5% of people. You might check out "The Man Who Mistook His Wife for a Hat" if you're interested.

[1] https://en.wikipedia.org/wiki/Prosopagnosia


Thanks for the link, I didn't know about this thing. Sacks's book is actually (I think) somewhere on my desk gathering dust, I'll need to get around to reading it.

Anyway, in my case it's nowhere near that bad. I can recognize faces with enough exposure, but I have trouble remembering them on first sight, as well as enumerating their features, imagining faces, and comparing them to each other. I can do all of that, but it takes a lot of effort and time.


to add another:

    System.out.println(0.0 == -0.0); // true
    System.out.println(java.util.Arrays.equals(new double[] {0.0}, new double[] {-0.0})); // false
(It's documented in the contract for Arrays.equals, but still kind of ridiculous.)
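For anyone wondering why: `Arrays.equals(double[], double[])` is specified to compare elements by boxed-`Double` equality, which boils down to comparing `Double.doubleToLongBits` values, while `==` follows IEEE 754 semantics. A quick sketch of the difference:

```java
public class NegZero {
    public static void main(String[] args) {
        // IEEE 754: == treats +0.0 and -0.0 as equal.
        System.out.println(0.0 == -0.0);                       // true

        // Bit-level: the sign bits differ, so the raw longs differ.
        System.out.println(Double.doubleToLongBits(0.0)
                == Double.doubleToLongBits(-0.0));             // false

        // Boxed equality follows the bit-level rule, which is
        // what Arrays.equals uses element-by-element.
        System.out.println(Double.valueOf(0.0).equals(-0.0));  // false
    }
}
```

The same bit-level rule is why `Double.valueOf(Double.NaN).equals(Double.NaN)` is `true` even though `Double.NaN == Double.NaN` is `false`.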

