I recently built a Chrome Extension that converts any snippet of a website into an isolated React component.
It was one of the most difficult things I've built, but I also think it's pretty cool. So, I thought I'd share how it works.
Converting HTML to valid React was the first step. That was straightforward once I discovered window.getComputedStyle, because I could extract all the CSS properties of an element.
The very first version of the algorithm leaned entirely on window.getComputedStyle.
While this works, the generated code is unusable because every element has literally every CSS property explicitly set. No human would ever write code like this.
Note: If you know someone who writes React like this, I have something to tell you.
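Here is a minimal sketch of what a naive pass like that looks like. It is not the extension's actual code. The names `snapshot`, `ElementLike`, `StyleLike` and `StyledNode` are made up for illustration, and the style lookup is passed in as a parameter so the sketch also runs outside a browser.

```typescript
// Hypothetical structural types: real DOM elements and the object returned by
// window.getComputedStyle have these members, but the names here are illustrative.
interface StyleLike {
  readonly length: number;
  item(index: number): string;
  getPropertyValue(name: string): string;
}

interface ElementLike {
  readonly tagName: string;
  readonly children: ArrayLike<ElementLike>;
}

interface StyledNode {
  tag: string;
  style: Record<string, string>;
  children: StyledNode[];
}

// "font-size" -> "fontSize", the key format React's style prop expects.
function toCamelCase(prop: string): string {
  return prop.replace(/-([a-z])/g, (_match, letter: string) => letter.toUpperCase());
}

// Naive pass: copy every computed property of every element in the subtree.
function snapshot(el: ElementLike, getStyle: (el: ElementLike) => StyleLike): StyledNode {
  const computed = getStyle(el);
  const style: Record<string, string> = {};
  for (let i = 0; i < computed.length; i++) {
    const prop = computed.item(i);
    style[toCamelCase(prop)] = computed.getPropertyValue(prop);
  }
  return {
    tag: el.tagName.toLowerCase(),
    style,
    children: Array.from(el.children, (child) => snapshot(child, getStyle)),
  };
}
```

In a page, the second argument would be something like `(el) => window.getComputedStyle(el as Element)`. Because the computed style lists every property the browser knows about, each node ends up with hundreds of entries, which is exactly the problem.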
I quickly realized that getting useful React code while preserving the original styling is the primary challenge here.
It's 2024, so naturally my first attempt was to feed the extracted HTML into an LLM. I prompted an LLM to clean up the HTML, remove redundant styles, and "prepare the component for production use."
While this worked ok for simple components, there were major problems:
getComputedStyle
My goal became reducing the number of CSS properties needed to maintain the same visual look.
The breakthrough came when I realized that I only needed the CSS properties that were explicitly set on the element, either via inline styles or via a stylesheet.
Then, only for the explicitly set CSS properties, I could extract and include the value.
My updated algorithm looked like this:
Call getComputedStyle on the list of properties to get the values we need to set.

The browser has really powerful stylesheet APIs. You can manually create, edit, and query stylesheets in JavaScript, which I didn't know until working on this.
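To make the idea concrete, here is a rough sketch of collecting only the explicitly set properties and then reading their computed values. It is a sketch under assumptions, not the extension's code: `explicitProperties`, `minimalStyle` and the `*Like` interfaces are hypothetical names, nested rules such as those inside `@media` blocks are ignored, and stylesheets whose rules can't be read are skipped.

```typescript
// Hypothetical structural types mirroring the parts of the CSSOM this sketch uses.
interface DeclarationLike {
  readonly length: number;
  item(index: number): string;
  getPropertyValue(name: string): string;
}

interface RuleLike {
  selectorText?: string; // only style rules have a selector
  style?: DeclarationLike;
}

interface SheetLike {
  readonly cssRules: ArrayLike<RuleLike>;
}

interface TargetLike {
  readonly style: DeclarationLike; // inline styles
  matches(selector: string): boolean;
}

// Reading cssRules on a cross-origin stylesheet throws, so treat it as empty.
function readRules(sheet: SheetLike): ArrayLike<RuleLike> {
  try {
    return sheet.cssRules;
  } catch {
    return [];
  }
}

// Names of properties set inline or by any stylesheet rule matching the element.
function explicitProperties(el: TargetLike, sheets: ArrayLike<SheetLike>): Set<string> {
  const props = new Set<string>();
  const collect = (decl: DeclarationLike): void => {
    for (let i = 0; i < decl.length; i++) props.add(decl.item(i));
  };
  collect(el.style);
  for (const sheet of Array.from(sheets)) {
    for (const rule of Array.from(readRules(sheet))) {
      if (rule.selectorText && rule.style && el.matches(rule.selectorText)) {
        collect(rule.style);
      }
    }
  }
  return props;
}

// Look up the computed value for the explicitly set properties only.
function minimalStyle(
  el: TargetLike,
  sheets: ArrayLike<SheetLike>,
  computed: DeclarationLike
): Record<string, string> {
  const style: Record<string, string> = {};
  explicitProperties(el, sheets).forEach((prop) => {
    style[prop] = computed.getPropertyValue(prop);
  });
  return style;
}
```

In a browser, `el` would be an HTMLElement, `sheets` would be `document.styleSheets`, and `computed` would come from `window.getComputedStyle(el)`, possibly with a type cast or two.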
On average, the output went from 200+ CSS properties to around 5-10 properties on each element. I was super excited because this meant:
Now that we have the baseline styles, we can trim them down even more.
Oftentimes, websites will have certain global styles. For example, resetting the box-sizing property is pretty popular:
Obviously, I didn't want to include boxSizing: 'border-box' on every single div in the component. So, I wrote a function that looks for shared properties and abstracts them into a top-level style tag.
On average, this helped reduce the lines of code in the component by ~5%.
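A simplified sketch of that kind of hoisting, assuming each element's styles are already a plain object. `hoistSharedStyles` and `HoistResult` are illustrative names rather than the extension's real API, and this version only hoists declarations shared by every element.

```typescript
type Style = Record<string, string>;

interface HoistResult {
  shared: Style; // declarations present, with the same value, on every element
  trimmed: Style[]; // per-element styles with the shared declarations removed
}

function hoistSharedStyles(styles: Style[]): HoistResult {
  // With fewer than two elements there is nothing worth hoisting.
  if (styles.length < 2) {
    return { shared: {}, trimmed: styles.map((style) => ({ ...style })) };
  }
  const [first, ...rest] = styles;
  const shared: Style = {};
  for (const prop of Object.keys(first)) {
    if (rest.every((style) => style[prop] === first[prop])) {
      shared[prop] = first[prop];
    }
  }
  const trimmed = styles.map((style) => {
    const copy: Style = {};
    for (const prop of Object.keys(style)) {
      if (!(prop in shared)) copy[prop] = style[prop];
    }
    return copy;
  });
  return { shared, trimmed };
}
```

The `shared` object can then be written out once, for example as a single rule in a style tag, while each element keeps only what is left in `trimmed`.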
Another optimization I made was removing styles that were already being inherited from parent elements.
For example, if you look at this HTML:
The span element doesn't need to explicitly set color: 'blue' since it will inherit that from its parent div. I wrote logic to detect these cases and remove redundant style properties.
This was super effective for properties like:
color
font-family
font-size
line-height
On average, this helped reduce the lines of code in the component by ~10%.
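Roughly, the pruning could look like the sketch below. The `StyledNode` shape, the `pruneInherited` name and the short `INHERITED` list are assumptions for illustration. One caveat: a browser default on the child tag (link colors, heading sizes) can still override an inherited value, and this sketch does not check for that.

```typescript
interface StyledNode {
  tag: string;
  style: Record<string, string>;
  children: StyledNode[];
}

// Illustrative subset of inherited properties, using React's camelCase keys.
const INHERITED = new Set(["color", "fontFamily", "fontSize", "lineHeight"]);

// Drop an inherited property when the nearest ancestor that sets it uses the same value.
function pruneInherited(node: StyledNode, inherited: Record<string, string> = {}): StyledNode {
  const style: Record<string, string> = {};
  const nextInherited = { ...inherited };
  for (const prop of Object.keys(node.style)) {
    const value = node.style[prop];
    if (INHERITED.has(prop)) {
      if (inherited[prop] === value) continue; // an ancestor already provides this value
      nextInherited[prop] = value; // descendants now inherit the new value
    }
    style[prop] = value;
  }
  return {
    tag: node.tag,
    style,
    children: node.children.map((child) => pruneInherited(child, nextInherited)),
  };
}
```

For a div with color: 'blue' wrapping a span that also sets color: 'blue', the span comes back with no color entry, while non-inherited properties such as padding are left alone.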
This was an obvious one when you saw the output at this stage. SVGs tend to contribute a lot of noise and bloat the component size. A single SVG is like 200 lines of mumbo jumbo code. And they appeared more often than I originally expected, primarily because of icons.
I now pull the SVGs into their own components and import them to make the core component smaller. (Later on, this proved to be extra useful because users tend to have their own icon set regardless.)
In most cases, this helped reduce the lines of code in the component by ~20%.
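As an illustration only, here is one way the extraction step could work on a simple element tree. The `TreeNode` shape, the `extractSvgs` name and the `Icon1`, `Icon2` naming scheme are all hypothetical.

```typescript
interface TreeNode {
  tag: string;
  children: TreeNode[];
}

interface Extraction {
  tree: TreeNode; // the original tree with each svg replaced by a placeholder
  icons: Map<string, TreeNode>; // placeholder component name -> original svg subtree
}

function extractSvgs(root: TreeNode): Extraction {
  const icons = new Map<string, TreeNode>();
  const visit = (node: TreeNode): TreeNode => {
    if (node.tag === "svg") {
      const name = `Icon${icons.size + 1}`;
      icons.set(name, node);
      return { tag: name, children: [] }; // stands in for a component reference
    }
    return { ...node, children: node.children.map(visit) };
  };
  return { tree: visit(root), icons };
}
```

Each entry in `icons` can then be emitted as its own small component file, and the main component only references it by name.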
In CSS, there's a shorthand property for several styles. For example, padding: 10px 50px 20px; is the same as padding-top: 10px; padding-right: 50px; padding-bottom: 20px; padding-left: 50px;
So, I wrote a function that condenses the styles for border, padding, and margin. It's gnarly because the logic differs depending on how many values are specified. In the case of padding, when three values are specified, the first applies to the top, the second to the right and left, and the third to the bottom. But when one value is specified, it applies the same padding to all four sides.
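Here is a small sketch of the condensing direction for padding and margin, picking the shortest shorthand that expresses the same four sides. `condenseBox` and `condenseSides` are illustrative names. Border is left out because its shorthand combines width, style and color rather than four sides.

```typescript
type Style = Record<string, string>;

// Shortest shorthand value for four side values, following the CSS 1-4 value rules.
function condenseBox(top: string, right: string, bottom: string, left: string): string {
  if (right !== left) return `${top} ${right} ${bottom} ${left}`; // four values
  if (top !== bottom) return `${top} ${right} ${bottom}`; // top | right and left | bottom
  if (top !== right) return `${top} ${right}`; // top and bottom | right and left
  return top; // all four sides equal
}

// Replace paddingTop/Right/Bottom/Left (or the margin equivalents) with one shorthand key.
function condenseSides(style: Style, name: "padding" | "margin"): Style {
  const keys = ["Top", "Right", "Bottom", "Left"].map((side) => name + side);
  const values = keys.map((key) => style[key]);
  // Only condense when all four longhands are present.
  if (values.some((value) => value === undefined)) return style;
  const result: Style = {};
  for (const key of Object.keys(style)) {
    if (keys.indexOf(key) === -1) result[key] = style[key];
  }
  result[name] = condenseBox(values[0], values[1], values[2], values[3]);
  return result;
}
```

For example, `condenseBox("10px", "50px", "20px", "50px")` gives back the three-value form `"10px 50px 20px"`, matching the padding example above.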
I'll be honest, there are still a few bugs. When the entire internet is your input set, there are a lot of edge cases. One particular bug that I haven't cracked yet is dealing with images that are referenced by their relative path.
Here are some real examples if you're curious what the final output looks like:
I've been thinking about open-sourcing the core library, but the code is pretty messy. If you're interested in this, let me know.
Here's the extension if you want to give it a try.
Thanks for reading! This was fun to hack on - Teddy