Show HN: AutoShorts – Local, GPU-accelerated AI video pipeline for creators

https://github.com/divyaprakash0426/autoshorts

Comments

divyaprakashJan 25, 2026, 7:37 AM
I built this because I was tired of "AI tools" that were just wrappers around expensive APIs with high latency. As a developer who lives in the terminal (Arch/Nushell), I wanted something that felt like a CLI tool and respected my hardware.

The Tech:

    GPU Heavy: It uses decord and PyTorch for scene analysis. I’m calculating action density and spectral flux locally to find hooks before hitting an LLM.

    Local Audio: I’m using ChatterBox locally for TTS to avoid recurring costs and privacy leaks.

    Rendering: Final assembly is offloaded to NVENC.
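
The spectral-flux scoring mentioned above can be sketched in plain Python. This is a toy version (the real pipeline does this vectorized in torch on the GPU; the function names here are illustrative, not from the repo):

```python
import math

def dft_mag(frame):
    """Magnitude spectrum of one frame via a naive DFT (fine for a sketch)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(-2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(-2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def spectral_flux(samples, frame_size=256, hop=128):
    """Sum of positive magnitude changes between consecutive frames.
    Peaks in this curve mark onsets/loud moments -- candidate highlights."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, hop)]
    if len(frames) < 2:
        return []
    prev = dft_mag(frames[0])
    flux = []
    for frame in frames[1:]:
        cur = dft_mag(frame)
        flux.append(sum(max(c - p, 0.0) for c, p in zip(cur, prev)))
        prev = cur
    return flux
```

Peaks in the flux curve line up with sudden loud or busy moments, which become candidate highlight timestamps before the LLM ever sees anything.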

Looking for Collaborators: I’m currently looking for PRs specifically around:

    Intelligent Auto-Zoom: Using YOLO/RT-DETR to follow the action in a 9:16 crop.

    Voice Engine Upgrades: Moving toward ChatterBoxTurbo or NVIDIA's latest TTS.

It's fully dockerized and ships with a Makefile. Would love some feedback on the pipeline architecture!
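
For the NVENC step, final assembly boils down to shelling out to ffmpeg with the GPU encoder. A minimal sketch (the flags are a plausible baseline, not the project's exact command):

```python
import subprocess

def build_nvenc_cmd(src, dst, start, duration):
    """Build an ffmpeg command that cuts a clip and encodes it on the GPU
    with h264_nvenc. Preset choices here are illustrative."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),        # seek before -i for fast input seeking
        "-t", str(duration),
        "-i", src,
        "-c:v", "h264_nvenc",     # NVIDIA hardware encoder
        "-preset", "p5",          # quality/speed middle ground
        "-c:a", "copy",           # pass audio through untouched
        dst,
    ]

# Running it requires an ffmpeg build with NVENC support:
# subprocess.run(build_nvenc_cmd("raw.mp4", "short.mp4", 93.0, 45.0), check=True)
```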
ameliusJan 25, 2026, 1:55 PM
> Multi-Provider Support: Choose between OpenAI (GPT-5-mini, GPT-4o) or Google Gemini for scene analysis

This is the first sentence in your features section, so it's not strange that users can't tell whether this tool runs locally or not.

divyaprakashJan 25, 2026, 2:06 PM
Fair point. I used SOTA models for the analysis to prioritize quality, but since the heavy media processing is local, API costs stay negligible (or free). The architecture is modular, though—you can definitely swap in a local LLM for a fully air-gapped setup.
ramon156Jan 25, 2026, 9:59 AM
I don't get this reasoning. You were tired of LLM wrappers, but what is your tool? These two requirements (feeling like a CLI and respecting your hardware) don't line up with that complaint.

Still a cool tool though! Although it seems partly AI generated.

rustyhancockJan 25, 2026, 10:06 AM
I've started including a statement of AI usage in my docs.

HN is a niche audience but it seems like it's the first question everyone has when opening a repo.

Which is odd, because the first question we should have is: does it work?

Personally I can't see myself ever writing the bulk of the README again, life's too short.

divyaprakashJan 25, 2026, 11:08 AM
Fair points all around. To be transparent: yes, I used an AI coding assistant (Antigravity) to help with the heavy lifting of refactoring the original legacy code and drafting the README. I’m with @rustyhancock on this—I’d rather focus my brainpower on the pipeline logic and hardware integration than on writing boilerplate and Markdown.

However, orchestrating things like decord with CUDA kernels, managing VRAM across parallel processes, and getting audio sync right with local TTS requires a deep understanding of the stack. An LLM can help write a function, but it won't solve the architectural 'glue' needed to make it a reliable CLI tool.

The project is open-source precisely because it’s a work in progress. It needs the 'human touch' for things like the RT-DETR auto-zoom and more nuanced video editing logic. PRs are more than welcome—I'd love to see where the community can push this beyond its current state.

HamukoJan 25, 2026, 11:12 AM
I think my life's too short to ever read your READMEs.
pelasacoJan 25, 2026, 4:05 PM
Life is too short to read AI-generated READMEs, which are clearly not written for humans.
foucJan 25, 2026, 11:21 AM
Seems like the post you're replying to has since been edited to clarify that he's referring to the wrappers that rely on third party AI APIs over the internet rather than running locally.
pelasacoJan 25, 2026, 4:04 PM
You were tired of "AI tools", then you vibe-coded an AI tool to deal with that? I'm not sure I get why it deserves to be on "Show HN".
ithkuilJan 25, 2026, 5:06 PM
The sentence continued with "that were just wrappers ...".
HeartofCPUJan 25, 2026, 9:42 AM
It looks like it was written by an LLM.
divyaprakashJan 25, 2026, 11:36 AM
Guilty as charged. I used Antigravity to handle the refactoring and docs so I could stay focused on the CUDA and VRAM orchestration.
wasmainiacJan 25, 2026, 6:57 PM
This isn’t a job interview, drop the corpo speak. What’s going on with CUDA and VRAM? We are all friends here.
divyaprakashJan 25, 2026, 8:00 PM
Haha, fair enough. The actual internals are basically one big fight with VRAM. I'm using decord to decode frames straight into GPU memory so the CPU doesn't bottleneck the pipeline. From there, everything (scene detection, HSV transforms, action scoring) is vectorized in torch, mostly fp16 to avoid OOMing. I also had to chunk the audio STFT/flux math because long files were eating the card alive. The TTS model stays cached as a singleton so it's snappy after the first run, and I'm manually tracking allocated vs. reserved memory to keep it from choking. Still plenty of refinement left on the roadmap, but it's a fun weekend project to mess around with.
wasmainiacJan 25, 2026, 10:47 PM
Nice! Thanks :) what is ooming?
shaugenJan 25, 2026, 11:15 PM
Out Of Memory-ing.
JgraceJan 25, 2026, 3:28 PM
[flagged]
wasmainiacJan 25, 2026, 4:36 PM
This does not seem local first. Misleading.

Regardless, we need more tools like this to speed social media towards death.

divyaprakashJan 25, 2026, 4:43 PM
If social is heading that way, at least my tool saves you the manual labor of editing the funeral.
wasmainiacJan 25, 2026, 4:52 PM
Huh?
divyaprakashJan 25, 2026, 4:55 PM
I was just joking about your comment on social media's 'death'.
techjamieJan 25, 2026, 5:16 PM
I watched a video[1] recently that posited the idea of AI slop farms making large, auto-moderated spaces impossible to find meaningful human content in. With the idea that it'll lead to a renaissance for smaller, more personal websites like forums or other niche places to flourish.

I think that sounds a little too convenient and idealistic to be what really happens, but I did find the concept to be a potential positive to what's happening around it. Facebook is already a good portion of the way there, being stuffed with bots consuming stolen or AI content from other bots, with confused elderly people in the middle.

[1] https://youtu.be/_QlsGkDvVHU

Yash16Jan 25, 2026, 11:20 AM
Can I use this for other use cases instead of game videos? I want to create film-style scenes, cinematic elements, and smooth motion effects. I’m also thinking of deploying it as a SaaS and using it for video creation features in my app: https://picxstudio.com/
divyaprakashJan 25, 2026, 11:35 AM
Definitely. The architecture is modular: just swap the LLM prompts for 'cinematic' styles. It's headless and dockerized, so it fits well as a SaaS backend worker.
myky22Jan 25, 2026, 9:52 AM
Wow, great job.

I did something similar 4 years ago with YOLO (Ultralytics).

Back then I used chat-message spikes as one of several variables to detect highlight and fail moments. It needed a lot of human validation, but it was so fun.

Keep going

divyaprakashJan 25, 2026, 11:36 AM
Great idea. Integrating YOLO for 'Action Following' is high on the roadmap—I'd love a PR for that if you're interested!
8organicbitsJan 25, 2026, 2:21 PM
What's the intended use case for this? It seems like you'd create slop videos for social media. I'd love to see more AI use cases that aren't: uninteresting content people would prefer to avoid.
divyaprakashJan 25, 2026, 2:30 PM
It’s actually designed for your own gameplay: it scans hours-long raw sessions to find the best highlights and clips them into shorts. It's more about automating the tedious editing process for your own content than generating "slop" from scratch.
8organicbitsJan 25, 2026, 3:46 PM
Personal consumption is an interesting angle. I'm starting to think AI content is only desirable to the creator, but no one else wants to see the slop.
ares623Jan 25, 2026, 8:04 PM
It’s like dreams.
simianparrotJan 25, 2026, 2:55 PM
Automating editing is by definition making it slop.
JgraceJan 25, 2026, 3:27 PM
[flagged]
mpaepperJan 25, 2026, 12:33 PM
How much memory do you need locally? Is an RTX 3090 with 24 GB enough?
divyaprakashJan 25, 2026, 12:36 PM
Yes, more than enough. I'm running it on an RTX 4080 laptop GPU with 12 GB of VRAM.
Huston1992Jan 25, 2026, 11:54 AM
big fan of the 'respects my hardware' philosophy. i feel like 90% of ai tools right now are just expensive middleware for openai, so seeing something that actually leverages local compute (and doesn't leak data) is refreshing