Hacker News Clone

tecoholic Nov 29, 2025, 9:32 PM

> Converts an image to a single-page PDF with a hidden text layer using Tesseract. This is the 'State Preservation' step.

Does this mean the text-only PDF page is transformed into an image that covers the full page, but the text is still under there? So any machine-based extraction would still get the text but would probably lose all the bounding box information, and regular users could no longer just use their mouse to select text?
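
For context, a hidden text layer is commonly drawn with PDF text render mode 3 (invisible) alongside the page image. Below is a minimal sketch of such a content stream, not taken from the project's code. The resource names /Im0 and /F1, the page size, and the word list are all assumptions made up for illustration.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_page_stream(page_w, page_h, words):
    """Build a content stream: full-page image plus invisible, positioned words.

    words: list of (text, x, y, font_size) in PDF points, origin bottom-left.
    /Im0 and /F1 are hypothetical resource names for this illustration.
    """
    ops = [
        "q",
        f"{page_w} 0 0 {page_h} 0 0 cm",  # scale the unit image to the full page
        "/Im0 Do",                         # paint the page image
        "Q",
        "BT",
        "3 Tr",                            # render mode 3: no fill, no stroke (invisible)
    ]
    for text, x, y, size in words:
        ops.append(f"/F1 {size} Tf")
        ops.append(f"1 0 0 1 {x} {y} Tm")  # place each word at its own position
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


# Made-up sample words on a US Letter page (612 x 792 points).
stream = build_page_stream(612, 792, [("Hello", 72, 720, 12), ("(world)", 110, 720, 12)])
print(stream)
```

In a layer built like this, each word keeps its own position, so an extraction tool could in principle still recover per-word coordinates. Whether this project's output does so is a separate question.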

kumarm Nov 29, 2025, 11:36 PM

Seems true, and I really wish the project included some sample PDF output.

My Text to Speech app uses bounding boxes to display what text in the PDF is being read and would not work well with PDFs from this project.

GavCo Nov 30, 2025, 9:50 AM

OP here, I added a sample PDF output in the project assets and put screenshots in the README. The text is selectable after rehydration. Would this work with your app?

tecoholic Nov 30, 2025, 12:11 PM

Wait! What? This is incredible. Amazing work.

kumarm Nov 30, 2025, 7:03 PM

Amazing. Worked really well. Thank you.

lxe Nov 29, 2025, 9:03 PM

This is nuts and I absolutely love this. So you convert the PDF into an image, edit the image, then convert the image back into a PDF.

thenthenthen Nov 30, 2025, 1:34 AM

This is the usual workflow when dealing with PDFs (unfortunately).

esafak Nov 30, 2025, 10:06 PM

No, it's not, unless you are dealing with scans. Lots of apps let you edit PDFs.

shevis Nov 29, 2025, 9:59 PM

A side effect of replacing entire pages with images is that the file size will expand dramatically. Most PDFs only contain a couple of images.
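
A rough back-of-the-envelope sketch of the scale involved. The page size (US Letter), 300 DPI, and 24-bit RGB are illustrative assumptions, and real files will be smaller because embedded images are compressed.

```python
def raw_raster_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size in bytes of one page rendered as a raster image."""
    px_w = round(width_in * dpi)
    px_h = round(height_in * dpi)
    return px_w * px_h * bytes_per_pixel


# Assumed example: US Letter at 300 DPI, 3 bytes per pixel (RGB).
size = raw_raster_bytes(8.5, 11, 300)
print(f"{size:,} bytes, about {size / 1_000_000:.1f} MB uncompressed per page")
```

That is 2550 x 3300 pixels, or 25,245,000 bytes before compression, for every page that gets replaced.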

falcor84 Nov 30, 2025, 4:45 AM

It might be feasible to have an intermediate AI call take the generated image and slice it into individual text and image elements that it would then render into the PDF page.

moezd Nov 30, 2025, 6:50 AM

Behold, the might of LLMs! Instead of ushering in the age of AGI as advertised 6 months ago, now it cleans your PDFs for you.

Many thanks to humanity for failing to standardise PDF and this project for paying interest on that tech debt with datacenter levels of energy consumption.

treetalker Nov 29, 2025, 9:12 PM

I'd love to see clearer examples: a video, or original PDF / command / result PDF. Very cool!

struc_so Dec 2, 2025, 4:07 AM

Interesting approach. I've spent a lot of time wrangling PDF internals recently, and the issue is usually maintaining the xref table integrity when you inject new content streams.

Does this approach rewrite the entire file structure on save, or are you appending incremental updates to the EOF? Incremental is safer against corruption, but file size bloats quickly with AI-generated diffs.
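
One quick way to tell the two apart from the outside is to look at the end-of-file markers: each incremental update appends a new xref section, a `startxref` offset, and another `%%EOF`. A minimal sketch of that check follows. The sample bytes are a made-up stand-in rather than a valid PDF, and the marker count is only a heuristic, since linearized files or stream data can also contain `%%EOF`.

```python
def summarize_pdf_revisions(data: bytes) -> dict:
    """Count %%EOF markers and read the offset after the last startxref."""
    eof_count = data.count(b"%%EOF")
    xref_offset = None
    last = data.rfind(b"startxref")
    if last != -1:
        # The byte offset of the newest xref section follows the keyword.
        tail = data[last + len(b"startxref"):].split()
        if tail and tail[0].isdigit():
            xref_offset = int(tail[0])
    return {"eof_markers": eof_count, "last_xref_offset": xref_offset}


# Hypothetical stand-in bytes: an "original" file and one appended update.
original = b"%PDF-1.7\n...objects...\nstartxref\n120\n%%EOF\n"
updated = original + b"...new objects...\nstartxref\n480\n%%EOF\n"

print(summarize_pdf_revisions(original))
print(summarize_pdf_revisions(updated))
```

More than one marker suggests the file was saved incrementally at least once; a full rewrite would normally leave a single trailer.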

perfectritone Nov 30, 2025, 2:23 AM

It's incredible how many hacks there are to make PDFs semi-usable.

itsmevictor Nov 29, 2025, 9:26 PM

Very nice! I wonder whether that could be used to get LLMs to annotate PDFs. Say an "agentic" CLI like Claude Code or Gemini-cli reviews a PDF and finds typos: could it use this to annotate the PDF, like underlining them in red or something of that sort? That could be nice.

mentalgear Nov 29, 2025, 9:27 PM

Nice - but consider adding an animated screengrab like: https://github.com/pythops/oryx

yoavm Nov 29, 2025, 10:43 PM

Please don't add an animated gif to your README. Nothing worse than an autoplaying video with no controls, that has 10 frames but takes 5.4MB to download. GitHub supports normal video files. It allows the user to rewind or pause, and it results in a much smaller file size.

varenc Nov 30, 2025, 12:22 AM

Generally agreed! Though, fun point of info: you can use the .avif format to get something that behaves just like a gif (auto-playing, no sound, no controls) but supports modern features (HDR/transparency channel) and is compressed as well as a modern video is, since it's just AV1. And it's supported in almost all modern browsers these days: https://caniuse.com/?search=avif

ornornor Nov 30, 2025, 6:48 AM

I tend to use webm but I’m curious, is avif better (performance, size) as a gif replacement?

varenc Nov 30, 2025, 7:04 PM

Webm is better in many ways, but it doesn't give you gif-like behavior, I think. As in, you can't just include it in an <img> tag and get an autoplaying looping video. Though you can simulate it with <video>.

Basically, .avif is an "animated image" format, like .gif, but .webm is only a video format.

Edit: just realized .webp, I think, can be an animated image! So that seems like the alternative.

ornornor Nov 30, 2025, 8:02 PM

Thanks

iamflimflam1 Nov 30, 2025, 3:31 AM

The lack of examples makes me very reluctant to commit any time to trying this out - despite it being something that I’m interested in.

Has anyone given it a go? Does it work?

stingraycharles Nov 30, 2025, 3:42 AM

What? There are examples in the repo and even in OP’s post.

I haven’t tried it, but there are plenty of examples.

albert_e Nov 30, 2025, 4:14 AM

Do you mean example commands? We see those examples in the GitHub README, yes.

But people here are probably also looking for example input and output PDFs (or images/screenshots) showing the actual work done to get a sense of what to expect.

iamflimflam1 Nov 30, 2025, 4:34 AM

Exactly - if these examples work really well, then include some screenshots.

ThrowawayTestr Nov 29, 2025, 9:56 PM

I recently tried to change a single word in a PDF and nearly tore my hair out (thank you, LibreOffice). I'll definitely keep this in mind for next time, thank you.

tkfoss Nov 29, 2025, 10:18 PM

Try Photopea next time.

albert_e Nov 30, 2025, 4:26 AM

Wow - didn't know about this tool for PDF editing - thanks!

https://www.photopea.com/

PS: in my quick test of editing PDF text -- the output PDF had weirdly added an extra "&" symbol at the end of every existing line of text. Will try out more to see if it was something in the input PDF that was causing it.

fzysingularity Nov 30, 2025, 3:47 PM

What is Photopea built on?

tkfoss Dec 1, 2025, 10:24 PM

Author does yearly AMAs on reddit, you should look it up.

McNulty2 Nov 30, 2025, 12:59 AM

I like the example of updating latest market data. Updating a deck one-off is tedious. Keeping it updated long-term was never going to happen. But now it can.

toddmorey Nov 30, 2025, 1:03 AM

I thought it was kinda funny that Google Slides' own built-in “beautify this slide” button converts the whole slide into an uneditable image.

albert_e Nov 30, 2025, 4:16 AM

AFAIK -- even the "Designer" feature of Microsoft PowerPoint (now folded under the Copilot license, I believe) gives slide designs with shapes etc. that are not editable. Thankfully the text remains editable. But if we want to ever so slightly modify the suggested design by removing or reshaping some of the shapes ... nope. Feels like they are worried about humans with taste ripping off the AI output :D

ohans Dec 2, 2025, 3:19 PM

Really cool! I reckon a nice UI would be a good addition.

mlpoknbji Nov 30, 2025, 12:01 AM

Somewhat unrelated, but can anyone recommend a way to edit the text of a PDF using an LLM? Something like AI + Acrobat Pro?

informal007 Nov 30, 2025, 1:37 AM

It would be more exciting if I could use this feature in an application with a GUI. It's not convenient to check the result after editing the PDF; I need to switch between the CLI and a PDF reader.

vood Nov 30, 2025, 6:38 AM

Congratulations on the release; that's a really good job.

John7878781 Nov 29, 2025, 10:23 PM

Love this.

After several iterations of edits, would the image quality decrease?

Zopieux Nov 30, 2025, 2:27 PM

I am disappointed that this doesn't modify the underlying PDF structure (which is a horror show, I know) but instead relies on fairly lossy OCR back-and-forths.

I wish an agent with validation and rendering tools could instead manipulate the structure to accomplish those edits way less destructively, checking its progress with the tools.

Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana
