Since then, I've taken it further with RunMat Accelerate: the runtime now automatically fuses operations and routes work between the CPU and GPU. You write MATLAB-style code, and RunMat schedules the computation across both for speed. No CUDA, no kernel code.
Under the hood, it builds a graph of your array math, fuses long chains into a few kernels, keeps data on the GPU when that helps, and falls back to CPU JIT / BLAS for small cases.
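To make that concrete, here's a toy Python sketch of the idea (not RunMat's actual implementation): record an elementwise chain as a tiny graph, then collapse it into a single callable. A real runtime would compile that composition into one kernel instead of invoking each ufunc in turn.

    import numpy as np

    # Toy expression graph: each node records one elementwise op lazily.
    class Node:
        def __init__(self, fn, parent=None):
            self.fn = fn
            self.parent = parent

        def then(self, fn):
            return Node(fn, self)

        def fuse(self):
            # Walk the chain root-to-tip and compose it into one callable.
            # A real runtime would codegen a single kernel here; this sketch
            # still calls each ufunc, but all graph bookkeeping is gone.
            ops = []
            node = self
            while node is not None:
                ops.append(node.fn)
                node = node.parent
            ops.reverse()

            def kernel(x):
                for op in ops:
                    x = op(x)
                return x

            return kernel

    # sin -> exp -> cos recorded lazily, then executed as one fused call.
    chain = Node(np.sin).then(np.exp).then(np.cos)
    kernel = chain.fuse()
    y = kernel(np.linspace(0.0, 1.0, 1_000_000))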
On an Apple M2 Max (32 GB), here are some current benchmarks (median of several runs):
* 5M-path Monte Carlo: RunMat ≈ 0.61 s, PyTorch ≈ 1.70 s, NumPy ≈ 79.9 s → ~2.8× faster than PyTorch and ~130× faster than NumPy on this test (a NumPy sketch of this workload follows the list).

* 64 × 4K image preprocessing pipeline (mean/std, normalize, gain/bias, gamma, MSE): RunMat ≈ 0.68 s, PyTorch ≈ 1.20 s, NumPy ≈ 7.0 s → ~1.8× faster than PyTorch and ~10× faster than NumPy.

* 1B-point elementwise chain (sin / exp / cos / tanh mix): RunMat ≈ 0.14 s, PyTorch ≈ 20.8 s, NumPy ≈ 11.9 s → ~140× faster than PyTorch and ~80× faster than NumPy.
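For a sense of the first workload, here's a hypothetical NumPy sketch of a 5M-path Monte Carlo; the parameters and payoff are illustrative only, and the actual benchmark scripts are in the repo:

    import numpy as np

    # Hypothetical 5M-path Monte Carlo: price a European call by simulating
    # terminal prices under geometric Brownian motion.
    n_paths = 5_000_000
    s0, strike, r, sigma, t = 100.0, 105.0, 0.03, 0.2, 1.0

    rng = np.random.default_rng(0)
    z = rng.standard_normal(n_paths)
    # Terminal price, then the discounted mean payoff.
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    price = np.exp(-r * t) * np.maximum(s_t - strike, 0.0).mean()
    print(price)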
If you want more detail on how the fusion and CPU/GPU routing work, I wrote up a longer post here: https://runmat.org/blog/runmat-accel-intro-blog
You can run the same benchmarks yourself from the GitHub repo in the main HN link. Feedback, bug reports, and “here’s where it breaks or is slow” examples are very welcome.
I haven't granted permission. When I asked ChatGPT, it said: 'Safari creates an audio capture session even if the user has not granted microphone permission, for capability negotiation, but no audio data leaves the device until permission is granted.' Is that accurate?
Demo video: https://youtu.be/W7ejMqZ6EUQ
Repo: https://github.com/schoblaska/jargon
You can paste an article, PDF link, or YouTube video to parse, or ask questions directly and it'll find its own content. Sources get summarized, broken into insight cards, and embedded for semantic search. Similar ideas automatically cluster together. Each insight can spawn research threads - questions that trigger web searches to pull in related content, which flows through the same pipeline.
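As a language-neutral sketch of that ingest flow (Jargon itself is Rails, and these helper functions are hypothetical stubs, not its real code):

    from dataclasses import dataclass, field

    @dataclass
    class InsightCard:
        text: str
        embedding: list[float] = field(default_factory=list)

    def summarize(text: str) -> str:
        # Stub: in practice this step would be an LLM summarization call.
        return text

    def split_into_insights(summary: str) -> list[str]:
        # Stub: splits on blank lines; the real pipeline extracts insight cards.
        return [p.strip() for p in summary.split("\n\n") if p.strip()]

    def embed(text: str) -> list[float]:
        # Stub: a real embedding model produces the vector stored for search.
        return [0.0] * 8

    def ingest(source_text: str) -> list[InsightCard]:
        # Summarize -> break into insight cards -> embed for semantic search.
        cards = split_into_insights(summarize(source_text))
        return [InsightCard(c, embed(c)) for c in cards]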
You can explore the graph of linked ideas directly, or ask questions and it'll RAG over your whole library plus fresh web results.
Jargon uses Rails + Hotwire with Falcon for async processing, pgvector for embeddings, Exa for neural web search, crawl4ai as a fallback scraper, and pdftotext for academic papers.
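For the semantic-search step, the pgvector side looks roughly like this; a minimal Python/psycopg sketch against a hypothetical insights table, not Jargon's actual schema:

    import psycopg

    # Hypothetical schema: insights(text TEXT, embedding VECTOR(1536)).
    # pgvector's <=> operator is cosine distance, so ascending order
    # returns the nearest neighbors first.
    def nearest_insights(conn, query_embedding, k=5):
        vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
        with conn.cursor() as cur:
            cur.execute(
                "SELECT text FROM insights ORDER BY embedding <=> %s::vector LIMIT %s",
                (vec, k),
            )
            return [row[0] for row in cur.fetchall()]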
What's a viable path for a hobbyist to get started in very late 2025?