DistiLLM 01

I don’t like how social media sometimes equates ‘sharing’ with actually ‘reading’ a good article. Occasionally, a shared piece turns out to be a time-waster only after I’ve invested 15 minutes deciphering thousands of words. To fulfill two objectives: 1) challenging myself to read 52 papers this year as a New Year’s resolution, and 2) saving you time amid information overload, I present you this:

An Untitled LLM-related Reading Curation DistiLLM

Alternatively, interpret it as:

Hacker News posts mostly filtered by the keyword “LLM” that don’t actually suck.

I hope this self-learning process can somehow benefit a few strangers.

What happened in Feb 2024?

Let’s get started with some name-dropping:

ℹ️ Performance-wise, Gemma is slightly better than Mistral 7B v0.2 Instruct in terms of safety, but close to or worse than it at instruction following.

This might be my read of the month:

A quick taste of other interesting articles before wrapping up this month’s contents

These are probably not related to LLMs, but I personally find them interesting to share:

  1. “Monosemanticity” is a neural network design in which each neuron is “dedicated to a single, specific concept.” This one-to-one design increases the model’s interpretability. ↩︎

  2. Matryoshka refers to the nested Russian dolls. ↩︎

  3. An example of model routing: if we know in advance that, for a given prompt, users will prefer model A’s response over model B’s, and model A is usually cheaper/faster, then we can route this prompt to model A. ↩︎
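The routing idea in footnote 3 can be sketched in a few lines. This is a hypothetical illustration, not any real router’s API: the function name `route_prompt`, the model labels, and the preference table are all made up for the example.

```python
# Hypothetical model-routing sketch: send a prompt to the cheaper model A
# when we already know users prefer A's response for that prompt,
# otherwise fall back to the stronger (pricier) model B.

# Toy preference table: prompt -> model users are known to prefer.
# In practice this would come from logged preference data, not a dict.
PREFERENCES = {
    "summarize this email": "model_a",  # cheap model A is good enough here
}

def route_prompt(prompt: str, default: str = "model_b") -> str:
    """Return the model to use for this prompt."""
    return PREFERENCES.get(prompt, default)

print(route_prompt("summarize this email"))  # model_a (cheaper/faster)
print(route_prompt("an unseen prompt"))      # model_b (safe default)
```

The design choice worth noting: routing only saves money if the preference lookup is much cheaper than running the expensive model, which is why real routers typically use a small classifier rather than an exact-match table.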

#nlp #distillm