I recently realized I don't have many papers to point to as examples of amazing overall presentation (writing, story, figures, engagement). So I asked around on Twitter and Bluesky, and here are some responses.
- A Theory of Usable Information Under Computational Constraints (Tomás Vergara Browne): I really like this one.
- Inherent Disagreements in Human Textual Inferences (Gaurav Kamath): Extremely clear high-level motivation, solid statistical rigour, incredible writing, and really stark findings!
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (Bruce (Zhi) Wen): I liked how they used the same observation-takeaway-discussion-practical-guidance structure to present multiple main results. Also, I find that papers from Danqi Chen's lab are generally good at giving you an intuitive sense of the main message.
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs and Beyond Language Modeling: An Exploration of Multimodal Pretraining (Myself): "Peter Tong, Ellis Brown, Saining Xie etc have this unique way of writing papers that are long, exploratory and with clear takeaways. The title is something like 'An Exploration of X' with X being something central to the field."
- On the Paradox of Learning to Reason from Data (Michael Rizvi-Martel): When I first read this, I felt the idea of sampling reasoning problems in two different ways to isolate "reasoning" from "fitting the distribution" was quite clever! Also a clean use of theory to motivate the existence of a solution.
- Neural Summarization by Extracting Sentences and Words and Online Large-Margin Training of Dependency Parsers (Desmond Elliott): A Problem Formulation section near the start of a paper can make clear to readers what your model is trying to solve and what data or optimization setup you are assuming, e.g. Section 2 in the first and Section 2.1 in the second.
- Fully Character-Level Neural Machine Translation without Explicit Segmentation (Desmond Elliott): Examples of figures that work very well when read alongside the text, e.g. Figure 3 with Section 4, and Figure 2 with Section 3.
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (Sathwik-70B on Twitter)