
LLM-Assisted Lit Review Workflow and Reflections on the Future of White Collar Work

Below is my current workflow for conducting a literature review using LLMs.

High-level reflections:

  1. Different LLMs are good at different parts of a task. Or rather, what we usually think of as a single task (“conducting a literature review”) is actually many tasks: finding the relevant areas of research or relevant topics, locating the relevant papers, physically downloading the papers to your disk, pre-processing the text, processing the text, and extracting actionable insights from the text.

  2. The Jevons Paradox is real: more work will be created. What was previously “future researchers should look into XYZ” or “others who specialize in ABC can study this” is now expected from you. Once norms shift, what you are expected to accomplish per hour rises accordingly.

  3. LLMs make mistakes (truncate text, hallucinate). But there are intermediary steps you can ask them to take to make their processes more legible.

  4. The most cumbersome, difficult-to-automate step of all this is actually downloading the papers. Take that as an analogy for all white collar work.

  5. LLMs progress very quickly – if your main complaint is that the models hallucinate, only read the first 10 pages, or won’t download the papers, you just haven’t found the right workflow yet.

  6. I made a tremendous amount of improvement to my workflow just by asking Claude Code or ChatGPT. If Claude Code mostly wrote itself, you can definitely improve your workflow by asking.

  7. Token economics: right now I can afford to do all this verification for $200/mo, but when it’s no longer subsidized, how will the cost compare to a human’s?

  8. Some kinds of jobs/tasks require processing large quantities of documents (e.g. quantitative history research, equity research, the discovery phase of a trial). But some require close reading of a few documents (e.g. certain types of history research, contract and terms-of-service drafting).


Different Kinds of Lit Reviews

For a structured lit review (e.g. RCTs, where you want to compare studies by number of patients, size of dose, and effect size; or observational studies where you already know the parameters, e.g. size of treatment and outcome), you can use Elicit and the like.
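To make concrete what “knowing the parameters” buys you, the extraction target can be a fixed schema, one row per paper. This is a minimal illustrative sketch; the field names are hypothetical, not from Elicit or any other tool:

```python
from dataclasses import dataclass, asdict

# Hypothetical schema for one row of a structured RCT review table.
@dataclass
class RctRow:
    paper_title: str
    n_patients: int
    dose_mg: float
    effect_size: float  # e.g. Cohen's d as reported

# Illustrative rows, not real studies.
rows = [
    RctRow("Example Trial A", n_patients=120, dose_mg=50.0, effect_size=0.31),
    RctRow("Example Trial B", n_patients=480, dose_mg=100.0, effect_size=0.18),
]

# Sorting by sample size makes it easy to weight larger studies first.
rows.sort(key=lambda r: r.n_patients, reverse=True)
for r in rows:
    print(asdict(r))
```

Once every paper is forced into the same row shape, comparing studies becomes a sorting/filtering problem rather than a reading problem.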

For lit reviews where the graphs convey almost all of the information you need (e.g. CS papers, or medical papers where the illustrated mechanism is of the utmost importance), perhaps use only step #1 to find the papers. A human reading graphs is much more efficient than a machine.


1. Finding the Right Literatures

My solution is to ask ChatGPT in Heavy Thinking and/or Research mode in Chrome.

First of all, you need to be on Research or Pro or whichever of the expensive tiers allows internet access and extended thinking. If you aren’t, the model could hallucinate, may not search the web at all, or may not be comprehensive in its search (is the problem precision or recall?).

As to which models to use, I’ve found that, as of Feb 2026, ChatGPT (with Heavy Thinking or Research) provides the best answers. Gemini (Pro) doesn’t know the hierarchies of academic publishing and the relationships of different subfields to each other, so the papers it suggests, although relevant, are not as good. Claude (Opus 4.6) is getting close to ChatGPT, but not as comprehensive in its search, for whatever reason.

I recommend asking ChatGPT.com (the web version) in Chrome because Chrome has Claude and Zotero extensions that we’ll use in later steps.

2. Downloading the Papers

3. Reading the PDFs

The model by default will batch read the PDFs and summarize the literature in a matrix/table format (e.g. “Paper Title,” “Research Question,” “Research Method,” “Findings”). This way of doing a lit review is problematic for the following reasons:

  1. LLMs may skip important information during batch reading to save on tokens. You have no way of verifying, because all of this work is new to you.

Some common modes of failure I’ve observed:

  • The LLM reads the first and last page(s), skipping the middle.
  • The LLM does the right thing by reading the text in smaller chunks, but sentences and paragraphs that sit at a chunk border are not processed properly (e.g. a paragraph has two sentences; sentence 1 sits in the previous chunk and gets ignored because it’s not a complete thought; sentence 2 sits in the next chunk but also gets ignored for the same reason).
  2. You have no idea what about these papers will be relevant for your particular project.

  3. One paper may have multiple ideas you want to rely on/cite, and LLMs by default will tend to group your papers by theme/idea.

Claude’s Read tool supports PDFs, but with limits: a maximum of 20 pages per request, and for large PDFs (10+ pages) you must specify page ranges.
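Given a per-request page cap like the one above, the page ranges to request can be computed mechanically. A small sketch (the 20-page cap is an assumption about the tool’s current limits and may change):

```python
def page_ranges(total_pages: int, max_per_request: int = 20):
    """Split a document into (start, end) 1-indexed inclusive page
    ranges, each no larger than max_per_request pages, so every page
    is covered by exactly one request."""
    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + max_per_request - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges

print(page_ranges(47))  # [(1, 20), (21, 40), (41, 47)]
```

Feeding these ranges to the read requests one by one guarantees no page is silently dropped because a single request was too large.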

  • [LEARN:large-files] When reading large files (>30K chars), tile with 3+ non-overlapping chunks (e.g., cut -c 1-20000, cut -c 20001-40000, cut -c 40001-60000) instead of reading from the head and tail — the middle gets missed. Alternatively, use awk by character offset. Always verify full coverage before writing summaries.
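The tiling-with-verification rule above can be sketched as a small helper that computes non-overlapping character offsets and checks full coverage before any summarizing happens (a hypothetical helper, not part of any tool):

```python
def tile_offsets(total_chars: int, chunk: int = 20_000):
    """Non-overlapping [start, end) character offsets covering the
    entire file, so the middle cannot be silently skipped."""
    tiles = [(i, min(i + chunk, total_chars))
             for i in range(0, total_chars, chunk)]
    # Verify full coverage before summarizing: tiles must start at 0,
    # end at the last character, and abut with no gaps or overlaps.
    assert tiles[0][0] == 0 and tiles[-1][1] == total_chars
    assert all(a[1] == b[0] for a, b in zip(tiles, tiles[1:]))
    return tiles

print(tile_offsets(65_000))
# [(0, 20000), (20000, 40000), (40000, 60000), (60000, 65000)]
```

The assertions are the point: if a chunk goes missing, the coverage check fails loudly instead of producing a summary with a hole in the middle.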

4. Verification

Explicitly build in a separate agent to verify the output for completeness and accuracy (e.g. of quotes). Assume the model is lazy by default and can’t be trusted.
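One cheap, deterministic piece of this verification can be done outside the model entirely: check that every quote in the summary actually appears verbatim in the source text. A minimal sketch (function name and example text are illustrative):

```python
import re

def verify_quotes(summary_quotes: list[str], source_text: str) -> list[str]:
    """Return the quotes that do NOT appear verbatim in the source.
    Whitespace is normalized so PDF line breaks don't cause false
    alarms; anything returned needs human review."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip()
    src = norm(source_text)
    return [q for q in summary_quotes if norm(q) not in src]

source = "The treatment group improved\nsignificantly over twelve weeks."
quotes = [
    "improved significantly over twelve weeks",  # genuine quote
    "improved dramatically over twelve weeks",   # fabricated quote
]
print(verify_quotes(quotes, source))
# ['improved dramatically over twelve weeks']
```

A verifier agent can then be pointed only at the flagged quotes, rather than re-reading everything, which keeps the token cost of verification down.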