howaboutqiu

DistiLLM 03

tl;dr
A roundup of notable LLM developments in April 2024, covering the releases of Mistral 8x22B, Llama 3, Phi-3, and OpenELM, progress in hallucination detection, and surrounding work on Python tooling and deep learning frameworks.

Here I am, again during my quick lunch break, bringing the third installment in our series where I curate some NLP topics/blogs/papers. I must confess, keeping up with the fast-paced world of NLP while juggling my own schedule has been overwhelming lately. In fact, as I type this, there’s a pile of travel bags eyeing me, begging to be packed for my upcoming trip. So I find myself wondering if, perhaps, a good old “copy and paste” might be the way to go, just for this month.

What happened in April 2024?

Something that didn’t happen in April, or isn’t strictly about LLMs, but

My notes on hallucination

I’ve been studying Representation Engineering (mentioned previously), and recently spent some time on hallucination.

The status quo of hallucination spotting is empirical: once you see it, you call it a hallucination. As time goes by, you may build an overall impression of how often a model hallucinates, even though you are never sure whether your prompts were controlled. Or, in a different scenario: if a prompt is altered, will the model hallucinate the same way as before, or totally differently?

We don’t have universal answers to any of these questions.

Plus, what is the ultimate goal of hallucination evaluation? Just to say that one model is superior to the others? Is it possible that model A hallucinates in area X, while model B hallucinates more in area Y?
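
To make that last question concrete, here is a minimal sketch of what a per-area comparison could look like, assuming we already had per-example hallucination judgments from somewhere; every name and number below is a hypothetical placeholder, not a real measurement.

```python
# Hypothetical sketch: compare two models' hallucination rates per topic area,
# assuming binary hallucination judgments already exist for each example.
from collections import defaultdict

# (area, model, hallucinated) -- placeholder data
judgments = [
    ("medicine", "model_a", True),
    ("medicine", "model_b", False),
    ("finance",  "model_a", False),
    ("finance",  "model_b", True),
]

def per_area_rates(judgments):
    counts = defaultdict(lambda: [0, 0])  # (area, model) -> [num_hallucinated, total]
    for area, model, hallucinated in judgments:
        counts[(area, model)][0] += int(hallucinated)
        counts[(area, model)][1] += 1
    return {key: num / total for key, (num, total) in counts.items()}

print(per_area_rates(judgments))
# A single aggregate leaderboard score would hide exactly this kind of area-level split.
```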

I stumbled upon this leaderboard (plus an associated model) from Vectara.
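
For context, leaderboards like this typically score whether a model’s summary is actually supported by the source document, using a small factual-consistency classifier. I haven’t verified the exact interface of Vectara’s model, so the sketch below uses a generic off-the-shelf NLI model from Hugging Face in the same spirit; the model name and the example texts are my own placeholders.

```python
# Rough sketch of NLI-style factual-consistency scoring; not Vectara's actual model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/deberta-large-mnli"  # any off-the-shelf NLI model works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

source = "The company reported a 10% revenue increase in Q1 2024."
summary = "Revenue grew by roughly ten percent in the first quarter of 2024."

# Premise = source document, hypothesis = generated summary
inputs = tokenizer(source, summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# A low entailment probability is one crude signal that the summary hallucinates
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx]:.3f}")
```

As far as I can tell, Vectara’s own model is a dedicated classifier fine-tuned for exactly this summary-versus-source setting, and the leaderboard then ranks LLMs by how often their summaries pass the check.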

#NLP