I didn't write a single line of code.
Of course no-code doesn't mean no-engineering. This project took a lot more manual labor than I'd hoped!
I wrote a deep dive on the workflow and some thoughts about the future of AI coding and creativity:
But I didn't want to call it a "SimCity" map, though that's really the vibe/inspiration I wanted to capture, because that implies a lot of other things, so I used the term "pixel art" even though I figured it might get a lot of (valid) pushback...
In general, labels and genres are really hard - "techno" to a deep head in Berlin is very different than "techno" to my cousins. This issue has always been fraught, because context and meaning and technique are all tangled up in these labels which are so important to some but so easily ignored by others. And those questions are even harder in the age of AI where the machine just gobbles up everything.
But regardless, it was a fun project! And to me at least it's better to just make cool ambitious things in good faith and understand that art by definition is meaningful and therefore makes people feel things from love to anger to disgust to fascination.
Hot take: "photo realistic" is just a style.
If it doesn't exist and wasn't taken with a camera, then "photo realistic" is just the name of the style.
Same with "pixel art", "water color", and everything else.
still respects what it is but clearly differentiates itself as something new
One day the majority of pixel art in the world, and indeed even photoreal photos, will be generated.
We'll call it pixel art even if the original motivation is gone.
Actually, if you only look at (midtown and uptown) Manhattan, it looks more "pixel art"-y because of the 90-degree street grid and most buildings being aligned with it. But the other boroughs don't lend themselves to this style so well. Of course, you could have forced all buildings to have angles in 45° steps, but that would have deviated a lot from "ground truth".
This looks like early-2000s 2.5D art - Diablo style.
And hell, I even use three em-dashes! But maybe the fact that I typed them out using hyphens is the telltale sign this is actually human...
Feels like something is missing... maybe just a pixelation effect over the actual result? Seems like a lot of the images also lack continuity (something they go over in the article)
Overall, such a cool usage of AI that blends Art and AI well.
It's very cool and I don't mind the use of AI at all but I think calling it pixel art is just very misleading. It's closer to a filter but not quite that either.
It kind of looks like a Google Sketchup render that someone then went and used the Photoshop Clone and Patch tools on in arbitrary ways.
Doesn’t really look anything like pixel art at all. Because it isn’t.
Everything is just a style now. And these names will become attached to the style rather than the technique.
Otherwise every digital image could be classified as pixel art.
This person shares lots of authentic-looking AI-generated pixel art. Using it as a reference should give the buildings a more realistic pixel-art look.
Edit: example showing small houses https://x.com/RealAstropulse/status/2004195065443549691 Searching for buildings
At some point I couldn't fiddle with the style / pipeline anymore and just had to roll with "looks ok to me" - the whole point of the project wasn't to faithfully adhere to a style but to push the limits of new technology and learn a lot along the way
> I spent a decade as an electronic musician, spending literally thousands of hours dragging little boxes around on a screen. So much of creative work is defined by this kind of tedious grind. ... This isn't creative. It's just a slog. Every creative field - animation, video, software - is full of these tedious tasks. Of course, there’s a case to be made that the very act of doing this manual work is what refines your instincts - but I think it’s more of a “Just So” story than anything else. In the end, the quality of art is defined by the quality of your decisions - how much work you put into something is just a proxy for how much you care and how much you have to say.
Great insights here, thanks for sharing. That opening question really clicked for me.
I agree that "push button get image" AI generation is at best a bit cheap, at worst deeply boring. Art is resistance in a medium - but at what point is that resistance just masochism?
George Perec took this idea to the max when he wrote an entire novel without the letter "E" - in French! And then someone had the audacity to translate it to English (e excluded)! Would I ever want to do that? Hell no, but I'm very glad to live in a world where someone else is crazy enough to.
I've spent my 10,000 hours making "real" art and don't really feel the need to justify myself - but to all of the young people out there who are afraid to play with something new because some grumps on hacker news might get upset:
It doesn't matter what you make or how you make it. What matters is why you make it and what you have to say.
I want to add one point: that you make/ship something at all.
When the first image generating models came out my head was spinning with ideas for different images I'd want to generate, maybe to print out and hang on the wall. After an initial phase of playing around the excitement faded, even though the models are more than good enough with a bit of fiddling. My walls are still pretty bare.
Turns out even reducing the cost of generating an image to zero does not make me in particular churn out ideas. I suspect this is true for most applications of AI for most people.
When I first saw all the items I was like "yea, I'm going to cover my house in custom stuff". But other than a few personal t-shirts I haven't done anything.
It's like if someone says my job as a SWE is just pressing keys, or looking at screens. I mean, technically that's true, and a lot of what I do daily can certainly be considered mundane. But from an outsider's perspective, both mundane and creative tasks may look identical.
I play around with image/video gen, using both "single prompt to generate" à la nano banana or sora, and also ComfyUI. Though what I create in ComfyUI often pales in comparison to what Nano or Sora can generate given my hardware constraints, I would consider the stuff I make in ComfyUI more creative than what I make from Sora or Nano, mainly because of how I need to orchestrate my ComfyUI workflow, loras, knobs, fine tuning, control net, etc, not to mention prompt refinement.
I think creativity in art just boils down to the process required to get there, which I think has always been true. I can shred papers in my office, but when Banksy shredded his painting, it became a work of art, because of the circumstances in which it was created.
Where to place boxes to make good music is not obvious, and typically takes a tremendous understanding of music theory, prior art, and experimentation. I think the comparison to an author or programmer "just pressing keys" is apt. Reducing it to the most basic physical representation undercuts all of the knowledge and creativity involved in the work. While it can be tedious sometimes, if you've thought of a structure that sounds good but there is a lot of repetition involved in notating it, there are a lot of software features to reduce the tedious aspects. A DAW is not unlike an IDE, and there are ways to package and automate repetitive musical structures and tasks and make them easy to re-use, just as programmers have tools to create structures that reduce the repetitive parts of writing code so they can spend more of their attention on the creative parts.
I have no idea how to translate it to actual audio anyone else could hear in any way, apart from learning to ~code assembler~ drag a million boxes around in a DAW.
This gap will be filled.
If you had instead drawn this, there would be charm and fun details EVERYWHERE. Little buildings you know would have inside jokes, there would be references snuck into everything. Who YOU are would come through, but it would also be much smaller.
This is HUGE, and the zoomed out view is actually an insanely useful map. It's so cool to see reality shifted into a style like this, and there's enough interesting things in "real new york" to make scrolling around this a fun thing to do. I have no impression of you here other than a vague idea you like the older sim city games BUT I have a really interesting impression of NYC.
IMO, that's two totally different pieces of art, with two totally different goals. Neither takes away from the other, since they're both making something impactful rather than one thing trying to simulate the impact of the other. Really nice job with this.
Also, does someone have an intuition for how the "masking" process worked here to generate seamless tiles? I sort of grok it but not totally.
Reference image from the article: https://cannoneyed.com/img/projects/isometric-nyc/training_d...
You have to zoom in, but here the inputs on the left are mixed pixel art / photo textures. The outputs on the right are seamless pixel art.
Later on he talks about 2x2 squares of four tiles each as input and having trouble automating input selection to avoid seams. So with his 512x512 tiles, he's actually sending in 1024x1024 inputs. You can avoid seams if every new tile can "see" all its already-generated neighbors.
You get a seam if you generate a new tile next to an old tile but that old tile is not input to the infill algorithm. The new tile can't see that boundary, and the style will probably not match.
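A rough sketch of that constraint (illustrative Python, not the author's actual pipeline - it assumes finished tiles live in a dict keyed by (x, y) and that the fine-tuned model takes a 1024x1024 context image plus a mask marking the cells to fill):

    from PIL import Image

    TILE = 512  # per the article: 512x512 tiles, the model sees a 2x2 block (1024x1024)

    def build_infill_input(tiles, tx, ty):
        # 'tiles' maps (x, y) -> already-generated PIL image for finished cells.
        # Returns the 1024x1024 context canvas plus a mask where 255 marks the
        # cells the model still has to fill.
        canvas = Image.new("RGB", (2 * TILE, 2 * TILE))
        mask = Image.new("L", (2 * TILE, 2 * TILE), 255)
        for dy in range(2):
            for dx in range(2):
                neighbor = tiles.get((tx + dx, ty + dy))
                if neighbor is not None:
                    canvas.paste(neighbor, (dx * TILE, dy * TILE))
                    # finished neighbors are kept verbatim, so the new cell is
                    # generated against them and the shared edges stay seamless
                    mask.paste(0, (dx * TILE, dy * TILE, (dx + 1) * TILE, (dy + 1) * TILE))
        return canvas, mask

The generation order then just has to guarantee that every new 2x2 window already contains at least one finished neighbor, e.g. sweeping the grid row by row.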
More interestingly, not even the biggest smartest image models can tell if a seam exists or not (likely due to the way they represent image tokens internally)
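For what it's worth, you don't need a model to at least flag candidates for regeneration - a dumb pixel-space check (nothing from the article, just a generic heuristic) catches the obvious hard edges:

    import numpy as np
    from PIL import Image

    def seam_score(left: Image.Image, right: Image.Image) -> float:
        # Ratio of the color jump across the shared vertical edge to the typical
        # column-to-column variation inside each tile; values well above 1
        # suggest a visible seam.
        l = np.asarray(left, dtype=float)
        r = np.asarray(right, dtype=float)
        across = np.abs(l[:, -1] - r[:, 0]).mean()
        within = (np.abs(np.diff(l, axis=1)).mean() + np.abs(np.diff(r, axis=1)).mean()) / 2
        return across / max(within, 1e-6)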
The issue is that the overall style was not consistent from tile to tile, so you'd see some drift, particularly in the color - and you can see it in quite a few places on the map because of this.
Maybe to process the Nano-Banana generated dataset before fine-tuning, and then also to fix the generated Qwen output?
Would you mind sharing a ballpark estimate?
Maybe, though a guy did physically carve/sculpt the majority of NYC: https://mymodernmet.com/miniature-model-new-york-minninycity...
That being said I have three kids (one a newborn) - there's no possible way I could have done this in the before times!
Granted, it was a team effort, but that's a lot more laborious than a pixel-art view.
New York City is being recreated at 1:1 scale inside Minecraft
> The link to this photo or video may be broken, or the post may have been removed. > [Visit Instagram]
Transcribed: "We probably deleted or misplaced documentation of that thing someone spent 21 years creating. We don't care - this way to a feed of AI-generated cat videos."
I know you'll get flak for the agentic coding, but I think it's really awesome you were able to realize an idea that otherwise would've remained relegated to "you know what'd be cool.." territory. Also, just because the activation energy to execute a project like this is lower doesn't mean the creative ceiling isn't just as high as before.
The reality is, this is a HUGE project to do with no GIS / image gen experience without Gen AI.
Firefox, Ubuntu latest.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://isometric-nyc-tiles.cannoneyed.com/dzi/tiles_metadat.... (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 429.
Edit: I see now, the error is due to the Cloudflare worker being rate limited :/ I read the writeup though, pretty cool, especially the insight about tool -> lib -> application
- Chromium: Failed to load tiles: Failed to fetch
- Zen: Failed to load tiles: NetworkError when attempting to fetch resource.
Absolutely loved zooming around to see:
- my old apartment
- places I've worked
- restaurants and rooftop lounges I've been to etc
The explanation of how this was put together was even cooler: https://cannoneyed.com/projects/isometric-nyc
I wonder if for almost any bulk inference / generation task, it will generally be dramatically cheaper to (use fancy expensive model to generate examples, perhaps interactively with refinements) -> (fine tune smaller open-source model) -> (run bulk task).
Interestingly enough, the model could NOT learn how to reliably generate trees or water, no matter how much data I threw at it or how many strategies I tried...
This to me is the big failure mode of fine-tuning - it's practically impossible to understand what will work well and what won't and why
- the way they represent image tokens isn't conducive to this kind of task
- text-to-image space is actually quite finicky, it's basically impossible to describe to the model what trees ought to look like and have them "get it"
- there's no reliable way to few-shot prompt these models for image tasks yet (!!)
Transparency also exists, e.g. GPT Image does it, and Nano Banana Pro should have it supported soon as well.
We have a blog post on a similar workflow here: https://www.oxen.ai/blog/how-we-cut-inference-costs-from-46k...
On the inference cost and speed: we're actively working on that and have a pretty massive upgrade there coming soon.
> Oxen.ai supports datasets in a variety of file formats. The only requirement is that you have a column where each row is a list of messages. Each message is a dictionary with a role and content key. The role can be “system”, “user”, or “assistant”. The content is the message content.
Oh, so you're forced to use the ChatML multi-turn conversation format.
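For anyone unfamiliar, that just means every row has to look roughly like this (column name and content below are placeholders, going by the quoted docs):

    # one row = a list of messages, each a dict with "role" and "content"
    row = {
        "messages": [
            {"role": "system", "content": "You convert map imagery into isometric pixel-art tiles."},
            {"role": "user", "content": "<input tile / prompt goes here>"},
            {"role": "assistant", "content": "<target output goes here>"},
        ]
    }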
gemini 3.5 pro reverse engineered it - if you use the code at the following gist, you can jump to any specific lat lng :-)
https://gist.github.com/gregsadetsky/c4c1a87277063430c26922b...
also, check out https://cannoneyed.com/isometric-nyc/?debug=true ..!
---
code below (copy & paste into your devtools, change the lat lng on the last line):
const calib={p1:{pixel:{x:52548,y:64928},geo:{lat:40.75145020893891,lng:-73.9596826628078}},p2:{pixel:{x:40262,y:51982},geo:{lat:40.685498640229675,lng:-73.98074283976926}},p3:{pixel:{x:45916,y:67519},geo:{lat:40.757903901085726,lng:-73.98557060196454}}};function getAffineTransform(){let{p1:e,p2:l,p3:g}=calib,o=e.geo.lat*(l.geo.lng-g.geo.lng)-l.geo.lat*(e.geo.lng-g.geo.lng)+g.geo.lat*(e.geo.lng-l.geo.lng);if(0===o)return console.error("Points are collinear, cannot solve."),null;let n=(e.pixel.x*(l.geo.lng-g.geo.lng)-l.pixel.x*(e.geo.lng-g.geo.lng)+g.pixel.x*(e.geo.lng-l.geo.lng))/o,x=(e.geo.lat*(l.pixel.x-g.pixel.x)-l.geo.lat*(e.pixel.x-g.pixel.x)+g.geo.lat*(e.pixel.x-l.pixel.x))/o,i=(e.geo.lat*(l.geo.lng*g.pixel.x-g.geo.lng*l.pixel.x)-l.geo.lat*(e.geo.lng*g.pixel.x-g.geo.lng*e.pixel.x)+g.geo.lat*(e.geo.lng*l.pixel.x-l.geo.lng*e.pixel.x))/o,t=(e.pixel.y*(l.geo.lng-g.geo.lng)-l.pixel.y*(e.geo.lng-g.geo.lng)+g.pixel.y*(e.geo.lng-l.geo.lng))/o,p=(e.geo.lat*(l.pixel.y-g.pixel.y)-l.geo.lat*(e.pixel.y-g.pixel.y)+g.geo.lat*(e.pixel.y-l.pixel.y))/o,a=(e.geo.lat*(l.geo.lng*g.pixel.y-g.geo.lng*l.pixel.y)-l.geo.lat*(e.geo.lng*g.pixel.y-g.geo.lng*e.pixel.y)+g.geo.lat*(e.geo.lng*l.pixel.y-l.geo.lng*e.pixel.y))/o;return{Ax:n,Bx:x,Cx:i,Ay:t,By:p,Cy:a}}function jumpToLatLng(e,l){let g=getAffineTransform();if(!g)return;let o=g.Ax*e+g.Bx*l+g.Cx,n=g.Ay*e+g.By*l+g.Cy,x=Math.round(o),i=Math.round(n);console.log(` Jumping to Geo: ${e}, ${l}`),console.log(` Calculated Pixel: ${x}, ${i}`),localStorage.setItem("isometric-nyc-view-state",JSON.stringify({target:[x,i,0],zoom:13.95})),window.location.reload()};
jumpToLatLng(40.757903901085726,-73.98557060196454);

100 people built this in 1964: https://queensmuseum.org/exhibition/panorama-of-the-city-of-...
One person built this in the 21st century: https://gothamist.com/arts-entertainment/truckers-viral-scal...
AI certainly let you do it much faster, but it’s wrong to write off doing something like this by hand as impossible when it has actually been done before. And the models built by hand are the product of genuine human creativity and ingenuity; this is a pixelated satellite image. It’s still a very cool site to play around with, but the framing is terrible.
Even worked for all four cardinal directions.
https://www.bing.com/api/maps/sdk/mapcontrol/isdk/birdseyev2...
Slapping a style transfer on top of this is not that complex.
Oh man...
I especially appreciated the deep dive on the workflow and challenges. It's the best generally accessible explication I've yet seen of the pros and cons of vibe coding an ambitious personal project with current tooling. It gives a high-level sense of "what it's generally like" with enough detail and examples to be grounded in reality while avoiding slipping into the weeds.
Personally I'm extremely excited about all of the creative domains that this technology unlocks, and also extremely saddened/worried about all of the crafts it makes obsolete (or financially non-viable)...
[1] https://files.catbox.moe/1uphaw.png
This is a fairly cool and novel application of generative AI[2], but it did not generate pixel art and it's still wildly incoherent slop when you examine it closely. This mostly works because it uses scale to obfuscate the flaws; users are expected to be zoomed out and not looking at the details. But the details are what makes art work. You could not sell a game or an animation like this. This is not replacing anybody.
[2] It's also wholly unrepresentative of general use-cases. 99.99999999% of generative AI usage does not involve a ton of manual engineering labour fine-tuning a model and doing the things you did to get this set up. Even with all of that effort, what you've produced here is never replacing a commercially viable pixel artist. The rest of the world slapping a prompt into an online generator is even further away from doing that.
If you don’t see these tools as a way for ALL of us to more-intimately reach more of our intended audiences,
whether as a musician, marketer, small business, whatever,
then I don’t know if you were really passionate or excited about what you were doing in the first place.
Feature idea: For those of us who aren't familiar with The City, could you allow clicks on the image to identify specific landmarks (buildings, etc.)? Or is that too computationally intensive? I can identify a few things, but it would sure be nice to know what I'm looking at.
Upvote for the cool thing I haven’t seen before but cancelled out by this sentiment. Oof.
That's not to say they're not very important issues! They are, and I think it's reasonable to have strong opinions here because they cut to the core of how people exist in the world. I was a musician for my entire 20s - trust me that I deeply understand the precarity of art in the age of the internet, and I can deeply sympathize with people dealing with precarity in the age of AI.
But I also think it's worth being excited about the birth of a fundamentally new way of interacting with computers, and for me, at this phase in my life, that's what I want to write and think about.
You get your votes back from me.
> If you can push a button and get content, then that content is a commodity. Its value is next to zero.
> Counterintuitively, that’s my biggest reason to be optimistic about AI and creativity. When hard parts become easy, the differentiator becomes love.
Love that. I've been struggling to succinctly put that feeling into words, bravo.
I expect artists will experiment with the new tools and produce incredibly creative works with them, far beyond the quality I can produce by typing in "a pelican riding a bicycle".
An awesome takeaway from this is that self-hosted models are the future! Can't wait for hardware to catch up so we can run many more experiments on our laptops!
I don't think there are enough artists in the world to achieve this in a reasonable amount of time (1-5 years) and you're probably looking at a $10M cost?
Part of me wonders if you put a kickstarter together if you could raise the funds to have it hand drawn but no way the very artists you hire wouldn't be tempted to use AI themselves.
Cool project!
I am especially impressed with the “i didn’t write a single line of code” part, because I was expecting it to be janky or slow on mobile, but it feels blazing fast just zooming around different areas.
And it is very up to date too, as I found a building across the street from me that got finished only last year being present.
I found a nitpicky error though: in downtown Brooklyn, where Cadman Plaza Park is, your website makes it look like there is a large rectangular body of water (e.g., a pool or a fountain). In reality, there is no water at all, it is just a concrete slab area.
Makes me feel insane that we're passing this off as art now.
That's what we call a monk's job in Holland. Kudos
Maybe you can use that^ to snap the pixels to a perfect grid
One thing I would suggest is to also post-process the pixel art with something like this tool to have it be even sharper. The details fall off as you get closer, but running this over larger patch areas may really drive the pixel art feel.
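Not that exact tool, but the basic grid-snapping idea is easy to prototype - a nearest-neighbor downscale/upscale with an optional palette quantize (illustrative sketch only):

    from PIL import Image

    def snap_to_pixel_grid(img: Image.Image, cell: int = 4, colors: int = 64) -> Image.Image:
        # Collapse every cell x cell block to a single color, optionally reduce
        # the palette, then blow it back up with hard nearest-neighbor edges.
        w, h = img.size
        small = img.resize((w // cell, h // cell), Image.NEAREST)
        small = small.quantize(colors).convert("RGB")
        return small.resize((w, h), Image.NEAREST)

    # e.g. snap_to_pixel_grid(Image.open("tile.png").convert("RGB"), cell=4).save("snapped.png")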
It would be neat if you could drag and click to select an area to inpaint. Let's see everyone's new Penn Station designs!
Would guess it'd have to be BYOK but it works pretty well:
https://i.imgur.com/EmbzThl.jpeg
Much better than trying to inpaint directly on Google Earth data
Edit: this submission has a few links that could be what I had in mind but most of them no longer work: https://news.ycombinator.com/item?id=2282466
As you say: software engineering doesn’t go away in the age of AI - it just moves up the ladder of abstraction ! At least in the mid term :)
It's as if NYC was built in Transport Tycoon Deluxe.
I'll be honest, I've been pretty skeptical about AI and agentic coding for real-life problems and projects. But this one seems like the final straw that'll change my mind.
Thanks for making it, I really enjoy the result (and the educational value of the making-of post)!
Very cool work and great write up.
The 3D/street view version is an obvious and natural progression from here, but from what I've read in your dev log, it's also probably a lot of extra work.
All told I probably put in less than 20 hours of actual software engineering work, though, which consisted entirely of writing specs and iterating with various coding agents.
Since the output is so cool and generally interesting, there might be an opportunity for those forking this to do other cities to deploy a web app to crowd source identifying broken tiles and maybe classifying the error or even providing manual hinting for the next run. It takes a village to make a (sim) city! :-)
I must admit I spent way too much time finding landmarks I visited when I last holidayed there from Australia. Now I feel nostalgic.
Thanks so much for sharing!
Could even extend it to add a "night mode" too, though that'd require extensive retexturing.
Reminds me of https://queensmuseum.org/exhibition/panorama-of-the-city-of-...
I actually have a nice little water shader that renders waves on the water tiles via a "depth mask", but my fine-tunes for generating the shader mask weren't reliable enough and I'd spent far too much time on the project to justify going deeper. Maybe I'll try again when the next generation of smarter, cheaper models get released.
SF/Mountain View etc don't even have one! You get a little piece of the NYC brand just for you!
At first I thought this was someone working thousands of hours putting this together, and I thought: I wonder if this could be done with AI…
You probably need to adjust how caching is handled with this.
I too have been giving Cloudflare $5 for a while now :D
Nicely put.
To me, the appeal of pixel art is that each pixel looks deliberately placed, with clever artistic tricks to circumvent the limitations of the medium. For instance, look at the piano keys here [1]. They deliberately lack the actual groupings of real piano keys (since that wouldn't be feasible to render at this scale), but are asymmetrically spaced in their own way to convey the essence of a keyboard. It's the same sort of cleverness that goes into designing LEGO sets.
None of these clever tricks are apparent in the AI-generated NYC.
On another note, a big appeal of pixel art for me is the sheer amount of manual labor that went into it. Even if AI were capable of rendering pixel art indistinguishable from [0] or [1], I'm not sure I'd be impressed. It would be like watching a humanoid robot compete in the Olympics. Sure, a Boston Dynamics bot from a couple years in the future will probably outrun Usain Bolt and outgymnast Simone Biles, but we watch Bolt and Biles compete because their performance represents a profound confluence of human effort and talent. Likewise, we are extremely impressed by watching human weightlifters throw 200kg over their heads but don't give a second thought to forklifts lifting 2000kg or 20000kg.
OP touches on this in his blog post [2]:
> I spent a decade as an electronic musician, spending literally thousands of hours dragging little boxes around on a screen. So much of creative work is defined by this kind of tedious grind. [...] This isn't creative. It's just a slog. Every creative field - animation, video, software - is full of these tedious tasks. In the end, the quality of art is defined by the quality of your decisions - how much work you put into something is just a proxy for how much you care and how much you have to say.
I would argue that in some cases (e.g. pixel art), the slog is what makes the art both aesthetically appealing (the deliberately placed nature of each pixel is what defines the aesthetic) and awe-inspiring (the slog represents an immense amount of sustained focus).

[0] https://platform.theverge.com/wp-content/uploads/sites/2/cho...
[1] https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fu...
But I didn't want to call it a "SimCity" map, though that's really the vibe/inspiration I wanted to capture, because that implies other things, so I used the term "pixel art" even though I knew it'd get a lot of (valid) pushback...
As with all things art, labels are really difficult, and the context / meaning / technique is at once completely tied to genre and completely irrelevant. Think about the label "techno" - it is deeply meaningful and subtle to some and almost meaningless to others.
It really makes me not have any interest in the actual thing you're showing.
Imagine if Wendy Carlos wrote an article about how it's cool she could use this new piece of technology called a Moog Modular synthesizer to create a recording of Bach, and now harpsichords are dead, acoustic instruments suck, and all Baroque music will be made on modular synthesizers forever more.
Unfortunately with AI it's much worse because it's more like if Wendy Carlos has slurped up all Baroque music ever recorded by everyone and then somehow regurgitated that back out as slop with some algorithmic modifications.
I'm sorry, I'm unable to accept that's where we're at now.
Nope, it was Stalin who said that, regarding his "meatwave" strategy.
It's genuinely astonishing how much clearer this is than a traditional satellite map -- how it has just the right amount of complexity. I'm looking at areas I've spent a lot of time in, and getting an even better conceptual understanding of the physical layout than I've ever been able to get from satellite (technically airplane) images. This hits the perfect "sweet spot" of detail with clear "cartoon" coloring.
I see a lot of criticism here that this isn't "pixel art", so maybe there's some better term to use. I don't know what to call this precise style -- it's almost pixel art without the pixels? -- but I love it. Serious congratulations.