<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI &#8211; Deep Core Labs</title>
	<atom:link href="https://deepcorelabs.com/category/ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://deepcorelabs.com</link>
	<description>Building Extraordinary Brands</description>
	<lastBuildDate>Mon, 30 Mar 2026 19:38:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://deepcorelabs.com/wp-content/uploads/2015/09/deep-core-labs-logo-small-50x50.png</url>
	<title>AI &#8211; Deep Core Labs</title>
	<link>https://deepcorelabs.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Green Difference Studio — Free Online Green Screen &#038; Chroma Key Tool in Your Browser</title>
		<link>https://deepcorelabs.com/green-difference-studio-free-online-green-screen-chroma-key-tool-in-your-browser/</link>
					<comments>https://deepcorelabs.com/green-difference-studio-free-online-green-screen-chroma-key-tool-in-your-browser/#respond</comments>
		
		<dc:creator><![CDATA[Miro Hristov]]></dc:creator>
		<pubDate>Mon, 16 Mar 2026 01:37:40 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[three.js]]></category>
		<category><![CDATA[Video]]></category>
		<guid isPermaLink="false">https://deepcorelabs.com/?p=5239</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[
		<div id="fws_69d41b8c5337d"  data-column-margin="default" data-midnight="dark"  class="wpb_row vc_row-fluid vc_row top-level"  style="padding-top: 0px; padding-bottom: 0px; "><div class="row-bg-wrap" data-bg-animation="none" data-bg-animation-delay="" data-bg-overlay="false"><div class="inner-wrap row-bg-layer" ><div class="row-bg viewport-desktop"  style=""></div></div></div><div class="row_col_wrap_12 col span_12 dark left">
	<div  class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding inherit_tablet inherit_phone "  data-padding-pos="all" data-has-bg-color="false" data-bg-color="" data-bg-opacity="1" data-animation="" data-delay="0" >
		<div class="vc_column-inner" >
			<div class="wpb_wrapper">
				
	<div class="wpb_video_widget wpb_content_element vc_clearfix   vc_video-aspect-ratio-169 vc_video-el-width-100 vc_video-align-left" >
		<div class="wpb_wrapper">
			
			<div class="wpb_video_wrapper"><iframe title="Green Difference Studio — Free Open-Source Chroma Key Tool in the Browser" width="1080" height="608" src="https://www.youtube.com/embed/FnODFxK4WuE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></div>
		</div>
	</div>

			</div> 
		</div>
	</div> 
</div></div>
		<div id="fws_69d41b8c54a6b"  data-column-margin="default" data-midnight="dark"  class="wpb_row vc_row-fluid vc_row"  style="padding-top: 0px; padding-bottom: 0px; "><div class="row-bg-wrap" data-bg-animation="none" data-bg-animation-delay="" data-bg-overlay="false"><div class="inner-wrap row-bg-layer" ><div class="row-bg viewport-desktop"  style=""></div></div></div><div class="row_col_wrap_12 col span_12 dark left">
	<div  class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding inherit_tablet inherit_phone "  data-padding-pos="all" data-has-bg-color="false" data-bg-color="" data-bg-opacity="1" data-animation="" data-delay="0" >
		<div class="vc_column-inner" >
			<div class="wpb_wrapper">
				<a class="nectar-button jumbo regular extra-color-gradient-2"  style="color: #ffffff; background-color: #00d195;" target="_blank" href="https://deepcorelabs.com/tools/green-difference-studio/" data-color-override="#00d195" data-hover-color-override="false" data-hover-text-color-override="#fff"><span class="start loading">Open Green Difference Studio - Remove Green Screen Online</span><span class="hover">Open Green Difference Studio - Remove Green Screen Online</span></a>
			</div> 
		</div>
	</div> 
</div></div>
		<div id="fws_69d41b8c559ed"  data-column-margin="default" data-midnight="dark"  class="wpb_row vc_row-fluid vc_row"  style="padding-top: 0px; padding-bottom: 0px; "><div class="row-bg-wrap" data-bg-animation="none" data-bg-animation-delay="" data-bg-overlay="false"><div class="inner-wrap row-bg-layer" ><div class="row-bg viewport-desktop"  style=""></div></div></div><div class="row_col_wrap_12 col span_12 dark left">
	<div  class="vc_col-sm-12 wpb_column column_container vc_column_container col no-extra-padding inherit_tablet inherit_phone "  data-padding-pos="all" data-has-bg-color="false" data-bg-color="" data-bg-opacity="1" data-animation="" data-delay="0" >
		<div class="vc_column-inner" >
			<div class="wpb_wrapper">
				
<div class="wpb_text_column wpb_content_element " >
	<div class="wpb_wrapper">
		<p><a href="https://deepcorelabs.com/tools/green-difference-studio/"><strong>Open Demo Online</strong></a> | <a href="https://github.com/deepcorelabs/green-difference-studio"><strong>Source on GitHub</strong></a></p>
<hr />
<p>So a couple weeks ago Corridor Crew dropped their video about CorridorKey — an open-source, AI-powered chroma keyer that uses a transformer network to solve the green screen &#8220;unmixing problem.&#8221; The thing is genuinely impressive. It takes a raw green screen frame and a rough alpha hint, then predicts true foreground color and a clean linear alpha for every pixel, including all the nightmare stuff like motion blur, hair, and out-of-focus edges. They trained it on procedurally generated 3D renders with mathematically perfect alpha data. It outputs 16-bit and 32-bit EXR files for Nuke and DaVinci. Serious tool for serious work.</p>
<p>I actually installed a quantized build, <a href="https://github.com/edenaion/EZ-CorridorKey">EZ-CorridorKey</a>, on my 4080 Super workstation, and it does work. The keying results are legitimately great — better than anything traditional keying can do on difficult footage. But the premask step is a pain. You need to feed it a decent black-and-white outline of your subject, and getting that right is its own little project. For clean studio footage it&#8217;s fine, but the workflow isn&#8217;t exactly &#8220;drop a file and go&#8221;. The included premask options (GVM AUTO, SAM2, MatAnyone2, VideoMaMa, etc.) produced very underwhelming results for me, and the tool crashed quite a bit.</p>
<p>That got me thinking. What if you just want to pull a quick key on a talking-head video and export it with alpha? What if you don&#8217;t want to install anything at all? What if you&#8217;re on a laptop with no dedicated GPU?</p>
<p>That&#8217;s how Green Difference Studio happened.</p>
<h2 id="standing-on-shoulders">Standing on shoulders</h2>
<p>I should give credit where it&#8217;s due. The original chroma key shader that got this project started came from <a href="https://www.urbanpixellab.com/realtime-greenscreen-keyer/">Urban Pixel Lab&#8217;s Realtime Greenscreen Keyer</a> (<a href="https://github.com/urbanpixellab/greenscreen-shader">GitHub</a>). Their WebGL shader was the foundation — the hue-based keying approach, the basic spill suppression logic, the general structure of doing chroma math in a fragment shader. From there it got extended pretty heavily with sampled key colors, curve-based threshold falloff, despill depth, choke/feather morphology, and all the other controls, but it wouldn&#8217;t exist without that starting point.</p>
<h2 id="the-whole-thing-was-vibe-coded">The whole thing was vibe-coded</h2>
<p>I&#8217;m not going to pretend this was some carefully architected project with a Jira board and sprint planning. I opened Claude Code, described what I wanted, and started iterating. Every feature in the app was built through conversation — me describing what I needed, sometimes yelling at the screen when the mute button wouldn&#8217;t toggle (SVG <code>hidden</code> attribute, never again), and watching the code take shape in real time.</p>
<p>The shader pipeline, the tracker system, the background frame cache, the export pipeline — all of it came from back-and-forth with an AI pair programmer. Some sessions were smooth. Others involved me typing in all caps because the video was blasting audio during frame extraction for the third time. That&#8217;s vibe coding. You ride the wave and sometimes the wave rides you.</p>
<p>No Figma mockups. No PRD. No architecture diagram. Just &#8220;I want this thing to exist&#8221; and then making it exist, one conversation at a time.</p>
<h2 id="what-it-actually-does">What it actually does</h2>
<p>Green Difference Studio runs entirely in your browser. You drop a video in, and it keys out the green screen in real time using a WebGL fragment shader powered by Three.js. No server, no upload, no waiting for a cloud GPU. Everything stays on your machine.</p>
<p>The keying controls are what you&#8217;d expect from a decent compositor — hue range, saturation floor, light range, edge feather. There&#8217;s spill suppression with despill lift to recover natural skin tones. You can preview the alpha channel to check your matte quality, and use choke/feather to clean up edges.</p>
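<p>Under the hood, the per-pixel decision is simple to state. Here&#8217;s a rough CPU-side JavaScript illustration of the hue-based test the shader performs; the real version runs in GLSL with more controls, and these parameter names are mine, not the app&#8217;s:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// Simplified, CPU-side illustration of the hue-based key test.
// Parameter names are illustrative, not Green Difference Studio's actual API.
function keyAlpha(px, key, hueRange, satFloor) {
  // px and key are {h, s, v} colors (hue 0-360, sat/val 0-1)
  const hueDist = Math.min(Math.abs(px.h - key.h), 360 - Math.abs(px.h - key.h));
  if (px.s &lt; satFloor) return 1.0; // too desaturated to be screen color: keep opaque
  // Inside the hue range: transparent. Outside: ramp back up to opaque (feather).
  return Math.min(1.0, Math.max(0.0, (hueDist - hueRange) / hueRange));
}</pre>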
<p>But the part I&#8217;m most happy with is the tracker system. You can place tween trackers (static points you drag per frame) or mouse trackers (hold mouse on the subject while the video plays, release to stop tracking). Each tracker can be set to Keep or Discard mode with flood-fill-based alpha masking. There&#8217;s an &#8220;auto invert remaining&#8221; toggle that makes everything outside the tracked region transparent (or opaque, depending on mode). It&#8217;s not automatic motion tracking — that&#8217;s on the roadmap — but it&#8217;s surprisingly usable for isolating subjects in tricky shots.</p>
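<p>The masking step is the classic region-growing idea. A minimal sketch of how a tracker point could seed a Keep/Discard mask from the keyer&#8217;s alpha output (simplified relative to the app&#8217;s actual implementation):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// Grow a region from the tracker point over pixels whose keyed alpha clears
// a threshold; the resulting mask is kept or discarded depending on tracker mode.
function floodFillMask(alpha, width, height, startX, startY, threshold) {
  const mask = new Uint8Array(width * height);
  const stack = [startY * width + startX];
  while (stack.length) {
    const idx = stack.pop();
    if (mask[idx] || alpha[idx] &lt; threshold) continue;
    mask[idx] = 1;
    const x = idx % width;
    if (x &gt; 0) stack.push(idx - 1);
    if (x &lt; width - 1) stack.push(idx + 1);
    if (idx &gt;= width) stack.push(idx - width);
    if (idx &lt; width * (height - 1)) stack.push(idx + width);
  }
  return mask; // 1 = connected to the tracker point
}</pre>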
<p>Export gives you WebM with embedded alpha channel, a standalone grayscale matte, or PNG for single frames. The frame cache builds progressively in the background after upload, so you&#8217;re never staring at a loading bar. You see the first frame immediately and start working while thumbnails populate the timeline behind the scenes.</p>
<h2 id="why-browser-based-matters">Why browser-based matters</h2>
<p>CorridorKey requires a minimum 24GB VRAM GPU. That&#8217;s a $1,500+ graphics card. It outputs EXR sequences meant for professional compositing software that costs hundreds or thousands of dollars a year. Even with EZ-CorridorKey making the install easier, you&#8217;re still dealing with Python environments, model downloads, and the premask workflow.</p>
<p>Green Difference Studio requires Chrome. That&#8217;s it.</p>
<p>It won&#8217;t give you the same quality on difficult shots — ML-based unmixing is fundamentally more capable than traditional threshold-based keying for things like hair detail and translucent materials. But for the vast majority of green screen footage — talking heads, product shots, simple VFX work — a well-tuned traditional keyer running at GPU speed in a browser tab gets the job done. And it gets it done right now, on whatever laptop you happen to have.</p>
<h2 id="the-tech-under-the-hood">The tech under the hood</h2>
<p>The rendering pipeline is a Three.js fragment shader that does all the chroma math on the GPU. Spill suppression, edge feathering, alpha generation — it&#8217;s all happening in GLSL. The tracker flood fill runs on the CPU (Web Worker offloading is on the TODO list), and export uses the WebCodecs API for hardware-accelerated encoding with a MediaRecorder fallback for alpha-channel WebM.</p>
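<p>The fallback decision is roughly this shape. A hedged sketch of the feature detection, not the app&#8217;s exact export code (the codec string and options are illustrative):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// Prefer WebCodecs when the browser can encode the requested config.
async function supportsWebCodecsVp9(width, height) {
  if (!('VideoEncoder' in window)) return false;
  const { supported } = await VideoEncoder.isConfigSupported({
    codec: 'vp09.00.10.08', // VP9 profile 0, level 1.0, 8-bit
    width,
    height,
  });
  return !!supported;
}

// Otherwise fall back to recording the preview canvas stream, the path
// used for alpha-channel WebM when WebCodecs isn't available.
function makeFallbackRecorder(canvas, fps) {
  return new MediaRecorder(canvas.captureStream(fps), {
    mimeType: 'video/webm; codecs=vp9',
  });
}</pre>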
<p>Timeline thumbnails are built from a canvas-based frame cache that populates asynchronously using a separate hidden video element — this was one of the trickier problems to solve, because you can&#8217;t seek two different positions on the same video element simultaneously without the browser fighting you.</p>
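<p>The two-element trick, in sketch form (simplified, not the app&#8217;s actual code):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// A second, hidden video element does all the seeking, so the visible
// player never jumps around. Muted, so no surprise audio during extraction.
async function buildThumbnails(src, times, w, h, onThumb) {
  const video = document.createElement('video');
  video.src = src;
  video.muted = true;
  video.preload = 'auto';
  await new Promise((res) =&gt; video.addEventListener('loadeddata', res, { once: true }));

  const canvas = document.createElement('canvas');
  canvas.width = w;
  canvas.height = h;
  const ctx = canvas.getContext('2d');

  for (const t of times) {
    video.currentTime = t; // seek the hidden element only
    await new Promise((res) =&gt; video.addEventListener('seeked', res, { once: true }));
    ctx.drawImage(video, 0, 0, w, h);
    onThumb(t, canvas.toDataURL('image/jpeg', 0.7)); // hand a thumbnail to the timeline
  }
}</pre>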
<p>Other dependencies: GSAP for smooth tracker animations, iro.js for the color picker, noUiSlider for the range controls, and webm-muxer for standalone matte export.</p>
<h2 id="what-s-next">What&#8217;s next</h2>
<p>The README has a proper roadmap, but the highlights:</p>
<ul>
<li><strong>Mask tools</strong> — brush, shape, polygon, and lasso masks for garbage mattes</li>
<li><strong>Automatic motion tracking</strong> — Lucas-Kanade or correlation-based point tracking</li>
<li><strong>Image sequence export</strong> — PNG+Alpha and JPG matte sequences</li>
<li><strong>Better despill</strong> — edge-aware spill suppression for hair and translucent materials</li>
<li><strong>Undo/redo</strong> — full history stack</li>
<li><strong>Web Worker flood fill</strong> — the CPU-heavy tracker math should be off the main thread</li>
</ul>
<p>And the dream entry at the bottom of the list: CorridorKey in the browser. A transformer model running in WebGPU doing neural green screen removal with no install. Probably not happening tomorrow. But WebGPU is maturing fast, and ONNX Runtime Web is getting better every month. One day, maybe.</p>
	</div>
</div>




	<div class="wpb_raw_code wpb_content_element wpb_raw_html" >
		<div class="wpb_wrapper">
			<div id="gds-demo">
  <style>
    #gds-demo {
      width: 100%;
      max-width: 900px;
      margin: 2em auto;
      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
    }
    #gds-demo .gds-video-wrap {
      position: relative;
      width: 100%;
      aspect-ratio: 1 / 1;
      overflow: hidden;
      border-radius: 8px;
      border: 1px solid #333;
    }
    #gds-demo .gds-video-wrap video {
      display: block;
      width: 100%;
      height: 100%;
      object-fit: contain;
    }
    #gds-demo .gds-checker-dark {
      background-image:
        linear-gradient(45deg, #1a1a1a 25%, transparent 25%),
        linear-gradient(-45deg, #1a1a1a 25%, transparent 25%),
        linear-gradient(45deg, transparent 75%, #1a1a1a 75%),
        linear-gradient(-45deg, transparent 75%, #1a1a1a 75%);
      background-size: 20px 20px;
      background-position: 0 0, 0 10px, 10px -10px, -10px 0;
      background-color: #111;
    }
    #gds-demo .gds-checker-light {
      background-image:
        linear-gradient(45deg, #ccc 25%, transparent 25%),
        linear-gradient(-45deg, #ccc 25%, transparent 25%),
        linear-gradient(45deg, transparent 75%, #ccc 75%),
        linear-gradient(-45deg, transparent 75%, #ccc 75%);
      background-size: 20px 20px;
      background-position: 0 0, 0 10px, 10px -10px, -10px 0;
      background-color: #e8e8e8;
    }
    #gds-demo .gds-solid {
      background-image: none;
    }
    #gds-demo .gds-controls {
      display: flex;
      align-items: center;
      gap: 8px;
      margin-top: 10px;
      flex-wrap: wrap;
    }
    #gds-demo .gds-label {
      font-size: 13px;
      color: #555;
      margin-right: 2px;
      font-weight: 600;
    }
    #gds-demo .gds-swatch {
      width: 28px;
      height: 28px;
      border-radius: 5px;
      border: 2px solid #ddd;
      cursor: pointer;
      transition: border-color 0.15s, box-shadow 0.15s;
      flex-shrink: 0;
    }
    #gds-demo .gds-swatch:hover {
      border-color: #888;
    }
    #gds-demo .gds-swatch.gds-active {
      border-color: #333;
      box-shadow: 0 0 0 2px rgba(0,0,0,0.2);
    }
    #gds-demo .gds-swatch-checker-dark {
      background-image:
        linear-gradient(45deg, #1a1a1a 25%, transparent 25%),
        linear-gradient(-45deg, #1a1a1a 25%, transparent 25%),
        linear-gradient(45deg, transparent 75%, #1a1a1a 75%),
        linear-gradient(-45deg, transparent 75%, #1a1a1a 75%);
      background-size: 10px 10px;
      background-position: 0 0, 0 5px, 5px -5px, -5px 0;
      background-color: #111;
    }
    #gds-demo .gds-swatch-checker-light {
      background-image:
        linear-gradient(45deg, #ccc 25%, transparent 25%),
        linear-gradient(-45deg, #ccc 25%, transparent 25%),
        linear-gradient(45deg, transparent 75%, #ccc 75%),
        linear-gradient(-45deg, transparent 75%, #ccc 75%);
      background-size: 10px 10px;
      background-position: 0 0, 0 5px, 5px -5px, -5px 0;
      background-color: #e8e8e8;
    }
    #gds-demo .gds-tip {
      margin-top: 16px;
      font-size: 15px;
      line-height: 1.7;
      color: #333;
    }
    #gds-demo .gds-tip strong {
      color: #111;
    }
  </style>

  <div class="gds-video-wrap gds-checker-light" id="gds-video-bg">
    <video autoplay loop muted playsinline controls src="https://deepcorelabs.com/tools/green-difference-studio/grok-video-0c16267b-ecf0-4cb2-846b-ba012b2b2713.webm?v=0.1"></video>
  </div>

  <div class="gds-controls">
    <span class="gds-label">Background:</span>
    <div class="gds-swatch gds-swatch-checker-light gds-active" onclick="gdsBg(this,'checker-light')" title="Light checkerboard"></div>
    <div class="gds-swatch gds-swatch-checker-dark" onclick="gdsBg(this,'checker-dark')" title="Dark checkerboard"></div>
    <div class="gds-swatch gds-solid" style="background:#1a1a1a;" onclick="gdsBg(this,'#1a1a1a')" title="Dark gray"></div>
    <div class="gds-swatch gds-solid" style="background:#ccc;" onclick="gdsBg(this,'#ccc')" title="Light gray"></div>
    <div class="gds-swatch gds-solid" style="background:#3d2b2b;" onclick="gdsBg(this,'#3d2b2b')" title="Muted brown"></div>
    <div class="gds-swatch gds-solid" style="background:#2e2640;" onclick="gdsBg(this,'#2e2640')" title="Muted purple"></div>
  </div>

  <p class="gds-tip">
    This transparent WebM was generated with an AI text-to-video model prompted to render on a green screen, then keyed with <strong>Green Difference Studio</strong> — completely free, right in the browser. Works great with <strong>any text-to-video or image-to-video tool</strong> if you prompt it to shoot on a green screen. Free transparent videos, no need to wait for the big guys to support alpha channel. <img src="https://s.w.org/images/core/emoji/17.0.2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" />
  </p>

  <script>
    function gdsBg(el, val) {
      var wrap = document.getElementById('gds-video-bg');
      var swatches = document.querySelectorAll('#gds-demo .gds-swatch');

      for (var i = 0; i < swatches.length; i++) {
        swatches[i].classList.remove('gds-active');
      }

      el.classList.add('gds-active');

      if (val === 'checker-light') {
        wrap.className = 'gds-video-wrap gds-checker-light';
        wrap.style.backgroundColor = '';
      } else if (val === 'checker-dark') {
        wrap.className = 'gds-video-wrap gds-checker-dark';
        wrap.style.backgroundColor = '';
      } else {
        wrap.className = 'gds-video-wrap gds-solid';
        wrap.style.backgroundColor = val;
      }
    }
  </script>
</div>
		</div>
	</div>

<div class="wpb_text_column wpb_content_element " >
	<div class="wpb_wrapper">
<p>And if Corridor Crew ever reads this — thanks for the inspiration. CorridorKey is genuinely amazing work and I hope it keeps pushing the industry forward. This little browser tool exists because you made me want to build something.</p>
	</div>
</div>




			</div> 
		</div>
	</div> 
</div></div>
]]></content:encoded>
					
					<wfw:commentRss>https://deepcorelabs.com/green-difference-studio-free-online-green-screen-chroma-key-tool-in-your-browser/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Open Wake Word on the Web</title>
		<link>https://deepcorelabs.com/open-wake-word-on-the-web/</link>
					<comments>https://deepcorelabs.com/open-wake-word-on-the-web/#comments</comments>
		
		<dc:creator><![CDATA[Miro Hristov]]></dc:creator>
		<pubDate>Sat, 12 Jul 2025 03:47:47 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[sound]]></category>
		<category><![CDATA[audio]]></category>
		<guid isPermaLink="false">https://deepcorelabs.com/?p=4455</guid>

					<description><![CDATA[How I Ported a Python Wake Word System to the Browser When the LLMs Gave Up I started this project with a goal that seemed simple on paper: take openWakeWord,...]]></description>
										<content:encoded><![CDATA[<a class="nectar-button n-sc-button jumbo accent-color regular-button"  href="https://deepcorelabs.com/projects/openwakeword" data-color-override="false" data-hover-color-override="false" data-hover-text-color-override="#fff"><span>OpenWakeWord - Web Demo</span></a>
<h2 id="how-i-ported-a-python-wake-word-system-to-the-browser-when-the-llms-gave-up">How I Ported a Python Wake Word System to the Browser When the LLMs Gave Up</h2>
<p>I started this project with a goal that seemed simple on paper: take <a href="https://github.com/dscripka/openWakeWord/" target="_blank" rel="noopener">openWakeWord</a>, a powerful open-source library for wake word detection, and make it run entirely in a web browser. And when I say &#8220;in the browser,&#8221; I mean it. No tricks. No websockets streaming audio to a Python server. I wanted the models, the audio processing, and the detection logic running completely on the client.<br />
My initial approach was to &#8220;vibe-code&#8221; it with the new generation of LLMs. I fed my high-level goal to <strong>Gemini 2.5 Pro, o4-mini-high, and Grok 4</strong>. They gave me a fantastic head start, building out the initial HTML, CSS, and JavaScript structure with impressive speed. But after dozens of messages just refining the vibe, we hit a hard wall. The models would run, but the output score was just a flat line at zero. No errors, no crashes, just… nothing.<br />
This is where the real story begins. The vibe was off. Vibe coding had failed. I had to pivot from being a creative director to a deep-dive detective. It&#8217;s a tale of how I used a novel cross-examination technique with these same LLMs to solve a problem that each one, individually, had given up on.</p>
<h3 id="tl-dr-the-openwakeword-javascript-architecture-that-actually-works">TL;DR: The <code>openWakeWord</code> JavaScript Architecture That Actually Works</h3>
<p>For the engineers who just want the final schematics, here is the stateful, multi-buffer pipeline required to make this work (a condensed code sketch follows the list).</p>
<ul>
<li><strong>Pipeline:</strong> <code>[Audio Chunk]</code> -&gt; <code>Melspectrogram Model</code> -&gt; <code>Melspectrogram Buffer</code> -&gt; <code>Embedding Model</code> -&gt; <code>Wake Word Model</code> -&gt; <code>Score</code></li>
<li><strong>Stage 1: Audio to Image (Melspectrogram):</strong>
<ul>
<li><strong>Audio Source:</strong> 16kHz, 16-bit, Mono PCM audio.</li>
<li><strong>Chunking:</strong> The pipeline operates on <strong>1280 sample</strong> chunks (80ms). This is non-negotiable.</li>
<li><strong>Model Input:</strong> The chunk is fed into <code>melspectrogram.onnx</code> as a <code>[1, 1280]</code> <strong>float32</strong> tensor.</li>
<li><strong>Mandatory Transformation:</strong> The output from the melspectrogram model <strong>must</strong> be transformed with the formula <code>output = (value / 10.0) + 2.0</code>.</li>
</ul>
</li>
<li><strong>Stage 2: Image Analysis (Feature Embedding):</strong>
<ul>
<li><strong>Melspectrogram Buffer:</strong> The 5 transformed spectrogram frames from Stage 1 are pushed into a buffer.</li>
<li><strong>Sliding Window:</strong> This stage only executes when the <code>mel_buffer</code> contains at least <strong>76 frames</strong>. A <strong>76-frame</strong> window is sliced from the <em>start</em> of the buffer.</li>
<li><strong>Model Input:</strong> This window is fed into <code>embedding_model.onnx</code> as a <code>[1, 76, 32, 1]</code> tensor.</li>
<li><strong>Window Step:</strong> After processing, the buffer is slid forward by <strong>8 frames</strong> (<code>splice(0, 8)</code>).</li>
</ul>
</li>
<li><strong>Stage 3: Prediction:</strong>
<ul>
<li><strong>Embedding Buffer:</strong> The 96-value feature vector from Stage 2 is pushed into a second, fixed-size buffer that holds the last <strong>16</strong> embeddings.</li>
<li><strong>Model Input:</strong> Once full, the 16 embeddings are flattened and fed into the final wake word model as a <code>[1, 16, 96]</code> tensor. This <code>[batch, sequence, features]</code> shape is the critical insight that resolved a key error.</li>
</ul>
</li>
</ul>
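<p>Condensed into code, the whole loop looks roughly like this. A sketch under stated assumptions: <code>runMel</code>, <code>runEmbedding</code>, <code>runWake</code>, and <code>handleScore</code> are placeholder wrappers around the corresponding ONNX <code>session.run()</code> calls, not the demo&#8217;s actual function names.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">const mel_buffer = [];       // transformed spectrogram frames (Stage 1 output)
const embedding_buffer = []; // last 16 feature vectors (Stage 2 output)

async function processChunk(chunk) { // chunk: Float32Array of 1280 samples
  // Stage 1: audio -&gt; spectrogram frames, with the mandatory transform
  const melFrames = await runMel(chunk); // 5 frames per 80ms chunk
  for (const frame of melFrames) {
    mel_buffer.push(frame.map((v) =&gt; v / 10.0 + 2.0));
  }

  // Stage 2: slide a 76-frame window over the buffer, stepping by 8
  while (mel_buffer.length &gt;= 76) {
    const melWindow = mel_buffer.slice(0, 76);
    embedding_buffer.push(await runEmbedding(melWindow)); // 96-value vector
    if (embedding_buffer.length &gt; 16) embedding_buffer.shift();
    mel_buffer.splice(0, 8);

    // Stage 3: once 16 embeddings are buffered, predict a score
    if (embedding_buffer.length === 16) {
      const flat = embedding_buffer.flatMap((e) =&gt; Array.from(e)); // [1, 16, 96]
      handleScore(await runWake(flat));
    }
  }
}</pre>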
<hr />
<h3 id="the-unvarnished-truth-my-journey-into-debugging-hell">The Unvarnished Truth: My Journey into Debugging Hell</h3>
<p>After the initial burst of productivity, all three LLMs hit the same wall and gave up. They settled on the same, demoralizing conclusion: the problem was <strong>floating-point precision differences</strong> between Python and the browser&#8217;s ONNX Runtime. They suggested the complex math in <code>openWakeWord</code> was too sensitive and that a 100% client-side implementation was likely <strong>impossible</strong>.<br />
Something about that felt fishy. The separate VAD (Voice Activity Detection) model was working perfectly fine. This felt like a logic problem, not a fundamental platform limitation.<br />
This is where the breakthrough happened. I realized &#8220;vibe coding&#8221; wasn&#8217;t enough. I had to get specific. I decided to change my approach and use the LLMs as specialized, focused tools rather than general-purpose partners:</p>
<ol>
<li><strong>The Analyst:</strong> I tasked one LLM with a single, focused job: analyze the <code>openwakeword</code> Python source code and describe, in painstaking detail, exactly what it was doing at every step.</li>
<li><strong>The Coder:</strong> I took the detailed blueprint from the &#8220;Analyst&#8221; and fed it to a <em>different</em> LLM. Its job was to take that blueprint and write the JavaScript implementation.</li>
</ol>
<p>This cross-examination process was like a magic trick. It bypassed the ruts the models had gotten into and started revealing the hidden architectural assumptions that had been causing all the problems.</p>
<h4 id="the-first-wall-the-sound-to-image-pipeline">The First Wall: The Sound-to-Image Pipeline</h4>
<p>The &#8220;Analyst&#8221; LLM immediately revealed my most basic misunderstanding. I thought I was feeding a sound model, but that&#8217;s not how it works. These models don&#8217;t &#8220;hear&#8221; sound; they &#8220;see&#8221; it.<br />
<strong>Aha! Moment #1: It&#8217;s an Image Recognition Problem.</strong> The first model in the chain, <code>melspectrogram.onnx</code>, doesn&#8217;t process audio waves. Its entire job is to convert a raw 80ms audio chunk into a <strong>melspectrogram</strong>—a 2D array of numbers that is essentially an image representing the intensity of different frequencies in that sound. The subsequent models are doing pattern recognition on these sound-images, not on the audio itself. This also explained the second part of the puzzle: the models were trained on specifically processed images, which is why this transformation was mandatory:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// This isn't just a normalization; it's part of the "image processing" pipeline
// that the model was trained on. It fails silently without it.

for (let j = 0; j &lt; new_mel_data.length; j++) {
  new_mel_data[j] = (new_mel_data[j] / 10.0) + 2.0;
}</pre>
<h4 id="the-second-wall-the-audio-history-tax">The Second Wall: The Audio History Tax</h4>
<p>With the formula in place, my test WAV file still failed. The &#8220;Analyst&#8221; LLM&#8217;s breakdown of the Python code&#8217;s looping was the key. I realized the pipeline&#8217;s second stage needs a history of <strong>76 spectrogram frames</strong> to even begin its work. Each 80ms audio chunk only produces <strong>5 frames</strong>, meaning the system has to process <strong>16 chunks</strong> (1.28 seconds) of audio before it can even think about generating the first feature vector. My test file was too short.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// This logic checks if the audio is long enough and pads it with silence if not.

const minRequiredSamples = 16 * frameSize; // 16 chunks * 1280 samples/chunk = 20480

if (audioData.length &lt; minRequiredSamples) {
  const padding = new Float32Array(minRequiredSamples - audioData.length);
  const newAudioData = new Float32Array(minRequiredSamples);
  newAudioData.set(audioData, 0);
  newAudioData.set(padding, audioData.length);
  audioData = newAudioData; // Use the new, padded buffer
}</pre>
<h4>The Third Wall: The Treachery of Optimization</h4>
<p>The system came to life, but it was unstable, crashing with a bizarre <code>offset is out of bounds</code> error. This wasn&#8217;t a floating-point issue; it was a memory management problem. I discovered that for performance, ONNX Runtime Web <strong>reuses its memory buffers</strong>. The variable I was saving wasn&#8217;t the data, but a temporary <em>reference</em> to a memory location that was being overwritten.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// AHA Moment: ONNX Runtime reuses its output buffers. We MUST create a *copy*
// of the data instead of just pushing a reference to the buffer.

const new_embedding_data_view = embeddingOut[embeddingModel.outputNames[0]].data;
const stable_copy_of_embedding = new Float32Array(new_embedding_data_view);
embedding_buffer.push(stable_copy_of_embedding); // Push the stable copy, not the temporary view.</pre>
<h4 id="the-final-wall-the-purpose-of-the-vad">The Final Wall: The Purpose of the VAD</h4>
<p>The system was finally stable, and I could see the chart spike to 1.0 when I spoke the wake word. But the success sound wouldn&#8217;t play reliably. This was due to my most fundamental misconception. I had assumed the VAD&#8217;s purpose was to save resources. My thinking was: &#8220;VAD is cheap, the wake word model is expensive. So, I should only run the expensive model when the VAD detects speech.&#8221;<br />
This is completely wrong.<br />
<strong>Aha! Moment #4: The VAD is a Confirmation, Not a Trigger.</strong> The wake word pipeline must run <em>continuously</em> to maintain its history buffers. The VAD&#8217;s true purpose is to act as a <strong>confirmation signal</strong>. A detection is only valid if two conditions are met simultaneously: the wake word model reports a high score, AND the VAD confirms that human speech is currently happening. It’s a two-factor authentication system for your voice. This led to the final race condition: the VAD is fast, but the wake word pipeline is slow. The solution was a <strong>VAD Hangover</strong>—what I call &#8220;Redemption Frames&#8221;—to keep the detection window open just a little longer.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="js">// These constants define the VAD Hangover logic

const VAD_HANGOVER_FRAMES = 12; // Keep speech active for ~1 second after VAD stops

let vadHangoverCounter = 0;
let isSpeechActive = false;

// Later, the final check uses this managed state:
if (score &gt; 0.5 &amp;&amp; isSpeechActive) {
 // Detection is valid!
}</pre>
<h3 id="the-backend-betrayal-a-final-hurdle">The Backend Betrayal: A Final Hurdle</h3>
<p>With the core logic finally perfected, I implemented a feature to switch between the WASM, WebGL, and WebGPU backends. WASM and WebGPU worked, but WebGL crashed instantly with the error: <code>Error: no available backend found. ERR: [wasm] backend not found</code>.<br />
The issue was that the <code>melspectrogram.onnx</code> model uses specialized audio operators that the WebGL backend in ONNX Runtime simply does not support. My code was trying to force all models onto the selected backend, which is impossible when one is incompatible. The solution was a hybrid backend approach: force the incompatible pre-processing models (melspectrogram and VAD) to run on the universally supported WASM backend, while allowing the heavy-duty neural network models to run on the user&#8217;s selected GPU backend for a performance boost. I&#8217;ve left the WebGL option in the demo as a reference for this interesting limitation.</p>
<h3 id="the-final-product">The Final Product</h3>
<p>This journey was a powerful lesson in the limitations of &#8220;vibe coding&#8221; for complex technical problems. While LLMs are incredible for scaffolding, they can&#8217;t replace rigorous, first-principles debugging. By pivoting my strategy—using one LLM to deconstruct the source of truth and another to implement that truth—I was able to solve a problem that a single LLM, or even a committee of them, declared impossible. The result is a working, robust web demo that proves this complex audio pipeline can indeed be tamed, running <strong>100% on the client, in the browser</strong>, no Python backend required.</p>
<a class="nectar-button n-sc-button jumbo accent-color regular-button"  href="https://deepcorelabs.com/projects/openwakeword" data-color-override="false" data-hover-color-override="false" data-hover-text-color-override="#fff"><span>OpenWakeWord - Web Demo</span></a>
]]></content:encoded>
					
					<wfw:commentRss>https://deepcorelabs.com/open-wake-word-on-the-web/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>Stable Diffusion PNG Prompt Text Extractor</title>
		<link>https://deepcorelabs.com/stable-diffusion-png-prompt-text-extractor/</link>
					<comments>https://deepcorelabs.com/stable-diffusion-png-prompt-text-extractor/#respond</comments>
		
		<dc:creator><![CDATA[Miro Hristov]]></dc:creator>
		<pubDate>Wed, 26 Mar 2025 06:18:23 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Stable Diffusion]]></category>
		<guid isPermaLink="false">https://deepcorelabs.com/?p=4432</guid>

					<description><![CDATA[A simple tool that extracts hidden prompt text from Stable Diffusion-generated PNG files &#8212; online (in the browser). What it does Upload a PNG file to extract the embedded generation...]]></description>
										<content:encoded><![CDATA[<p class="whitespace-pre-wrap break-words">A simple tool that extracts hidden prompt text from Stable Diffusion-generated PNG files &#8212; online (in the browser).</p>
<h2 class="text-xl font-bold text-text-200 mt-1 -mb-0.5">What it does</h2>
<ul class="[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc space-y-1.5 pl-7">
<li class="whitespace-normal break-words">Upload a PNG file to extract the embedded generation prompts (stored in iTXt chunks)</li>
<li class="whitespace-normal break-words">Works entirely in your browser &#8211; no server uploads needed, completely client-side</li>
<li class="whitespace-normal break-words">Super fast &#8212; instantly reveals the exact prompt used to create the SD image</li>
</ul>
<p class="whitespace-pre-wrap break-words">Perfect for artists studying prompt techniques, content verification, or understanding how specific AI images were created.</p>
<p class="whitespace-pre-wrap break-words">Try it now to decode the text behind your Stable Diffusion images.</p>
<a class="nectar-button n-sc-button medium accent-color regular-button" target="_blank" href="https://deepcorelabs.com/tools/prompt-extractor/" data-color-override="false" data-hover-color-override="false" data-hover-text-color-override="#fff"><span>Stable Diffusion PNG Prompt Extractor</span></a>
<p><a href="https://deepcorelabs.com/tools/prompt-extractor/"><img fetchpriority="high" decoding="async" class="alignnone wp-image-4434 size-full" src="https://deepcorelabs.com/wp-content/uploads/2025/03/2025-03-26_021316.jpg" alt="Stable Diffusion PNG Prompt Text Extractor Online Tool" width="739" height="1177" srcset="https://deepcorelabs.com/wp-content/uploads/2025/03/2025-03-26_021316.jpg 739w, https://deepcorelabs.com/wp-content/uploads/2025/03/2025-03-26_021316-188x300.jpg 188w, https://deepcorelabs.com/wp-content/uploads/2025/03/2025-03-26_021316-643x1024.jpg 643w" sizes="(max-width: 739px) 100vw, 739px" /></a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://deepcorelabs.com/stable-diffusion-png-prompt-text-extractor/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
