Hacker News
by arijo 606 days ago
We could maybe choose the target window as the screenshot capture source instead of the full screen, to prevent it from being hidden by the Agent:

```
import { desktopCapturer } from 'electron';

const getScreenshot = async (windowTitle: string) => {
  const { width, height } = getScreenDimensions();
  const aiDimensions = getAiScaledScreenDimensions();

  // Request window sources only, with thumbnails up to the screen size
  const sources = await desktopCapturer.getSources({
    types: ['window'],
    thumbnailSize: { width, height },
  });

  // Match on the exact window title
  const targetWindow = sources.find(source => source.name === windowTitle);

  if (targetWindow) {
    const screenshot = targetWindow.thumbnail;
    // Resize the screenshot to AI dimensions
    const resizedScreenshot = screenshot.resize(aiDimensions);
    // Convert the resized screenshot to a base64-encoded PNG
    const base64Image = resizedScreenshot.toPNG().toString('base64');
    return base64Image;
  }
  throw new Error(`Window with title "${windowTitle}" not found`);
};
```
1 comments

Yup that could help, although if the key content is behind the window, clicks would bug out. I'm writing a PR to hide the window for now as a simple solution.
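For what it's worth, the "hide the window for now" approach could look roughly like the sketch below. This is not the actual PR, just a guess at the shape of it. `HideableWindow`, `withWindowHidden` and the `settleMs` delay are made-up names and values; the only real API assumed is that Electron's `BrowserWindow` has `hide()` and `showInactive()`.

```typescript
// Hypothetical minimal shape of the agent window. Electron's BrowserWindow
// provides hide() and showInactive(), so it satisfies this interface.
interface HideableWindow {
  hide(): void;
  showInactive(): void;
}

// Hide the window, run the action (a screenshot or a click), then restore
// the window even if the action throws.
async function withWindowHidden<T>(
  win: HideableWindow,
  action: () => Promise<T>,
  settleMs = 100 // assumed delay so the desktop can repaint before capture
): Promise<T> {
  win.hide();
  try {
    await new Promise((resolve) => setTimeout(resolve, settleMs));
    return await action();
  } finally {
    // Show again without taking focus away from the target window
    win.showInactive();
  }
}
```

Usage would be something like `await withWindowHidden(agentWindow, () => getScreenshot(title))`, where `agentWindow` is whatever window object the app already holds.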

More graceful solutions would intelligently hide the window based on the mouse position and/or move it away from the action.
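A rough sketch of the "move it away from the action" idea, as pure geometry so it does not depend on any windowing API. Everything here (`Rect`, `Point2D`, `isNear`, `dodgeAction`, the 40px margin) is hypothetical and only meant to illustrate one possible policy: leave the window alone unless the next click lands on or near it, otherwise jump to the screen corner farthest from the click.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface Point2D { x: number; y: number; }

// True when the point lies inside the rectangle grown by `margin` pixels.
function isNear(rect: Rect, p: Point2D, margin: number): boolean {
  return (
    p.x >= rect.x - margin && p.x <= rect.x + rect.width + margin &&
    p.y >= rect.y - margin && p.y <= rect.y + rect.height + margin
  );
}

// Return where the agent window should sit: unchanged if the action point is
// far enough away, otherwise the screen corner opposite the action point.
function dodgeAction(agent: Rect, action: Point2D, screen: Rect, margin = 40): Rect {
  if (!isNear(agent, action, margin)) return agent;
  const x = action.x < screen.x + screen.width / 2
    ? screen.x + screen.width - agent.width // action on the left: go right
    : screen.x;                             // action on the right: go left
  const y = action.y < screen.y + screen.height / 2
    ? screen.y + screen.height - agent.height // action near top: go to bottom
    : screen.y;                               // action near bottom: go to top
  return { ...agent, x, y };
}
```

In an Electron app the returned rectangle could presumably be passed to `BrowserWindow.setBounds`. A window larger than half the screen could still overlap the action after moving, so hiding would remain the fallback.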

I think you can use the nut-js desktop automation tool to send commands straight to the target window:

```
import { getWindows, mouse, Point, Window } from '@nut-tree-fork/nut-js';

async function clickLinkInWindow(
  windowTitle: string,
  linkCoordinates: { x: number; y: number }
) {
  try {
    // Find the first window whose title matches (windowTitle is used as a regex).
    // nut-js exposes getWindows() as a top-level function, so filter by title here.
    const pattern = new RegExp(windowTitle);
    let targetWindow: Window | undefined;
    for (const candidate of await getWindows()) {
      if (pattern.test(await candidate.title)) {
        targetWindow = candidate;
        break;
      }
    }
    if (!targetWindow) {
      throw new Error(`No window found matching title: ${windowTitle}`);
    }

    // Get window position and dimensions
    const windowRegion = await targetWindow.region;
    console.log('Window region:', windowRegion);

    // Focus the window
    await targetWindow.focus();

    // Convert window-relative coordinates to absolute screen coordinates
    const clickPoint = new Point(
      windowRegion.left + linkCoordinates.x,
      windowRegion.top + linkCoordinates.y
    );

    // Move mouse to target and click
    await mouse.setPosition(clickPoint);
    await mouse.leftClick();

    return true;
  } catch (error) {
    console.error('Error clicking link:', error);
    throw error;
  }
}
```
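One thing to watch if the two snippets are combined: the screenshot is resized to the AI dimensions, so any coordinates the model returns would be in that scaled space, not in window pixels. A minimal sketch of the conversion, assuming the screenshot covers exactly the target window and was stretched to the AI size (`toWindowCoordinates` and the two interfaces are hypothetical names, not part of either library):

```typescript
interface Size { width: number; height: number; }
interface Coord { x: number; y: number; }

// Map a point in the resized (AI-scaled) screenshot back to window-relative
// pixels by scaling each axis independently.
function toWindowCoordinates(aiPoint: Coord, aiSize: Size, windowSize: Size): Coord {
  return {
    x: Math.round(aiPoint.x * (windowSize.width / aiSize.width)),
    y: Math.round(aiPoint.y * (windowSize.height / aiSize.height)),
  };
}
```

The result is what a function like `clickLinkInWindow` would take as `linkCoordinates`, since it adds the window's top-left offset itself. If the capture preserves aspect ratio or includes padding, the scale factors would need adjusting.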

Maybe instead of a floating window, do it like Zoom does when you're sharing your screen: become a frame around the desktop with a little toolbar at the top. Bonus points if you can give Claude an avatar in a PiP window that talks you through what it's doing.