Windows-Use is a powerful automation agent that interacts directly with Windows at the GUI layer. It bridges the gap between AI agents and the Windows OS to perform tasks such as opening apps, clicking buttons, typing, executing shell commands, and capturing UI state, all without relying on traditional computer vision models. This lets any LLM perform computer automation instead of depending on models built specifically for it.

## 🛠️ Installation Guide

### Prerequisites

- Python 3.12 or higher
- UV (or pip)
- Windows 7, 8, 10, or 11

### Installation Steps

Install using uv:

```bash
uv pip install windows-use
```

Or with pip:

```bash
pip install windows-use
```

## ⚙️ Basic Usage

```python
# main.py
from langchain_google_genai import ChatGoogleGenerativeAI
from windows_use.agent import Agent
from dotenv import load_dotenv

# Load environment variables (e.g. GOOGLE_API_KEY for Gemini) from a .env file
load_dotenv()

# Create the LLM and hand it to the agent
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash')
agent = Agent(llm=llm, browser='chrome', use_vision=True)

# Read a task from the user, run it, and print the agent's response
query = input("Enter your query: ")
agent_result = agent.invoke(query=query)
print(agent_result.content)
```

## Run Agent

You can run the agent from a script:

```
python main.py
Enter your query: <YOUR TASK>
```

## 🎥 Demos

**PROMPT:** Write a short note about LLMs and save to the desktop

**PROMPT:** Change from Dark mode to Light mode

## Vision

Talk to your computer. Watch it get things done.

## ⚠️ Caution

The agent interacts directly with your Windows OS at the GUI layer to perform actions. While it is designed to act intelligently and safely, it can make mistakes that cause undesired system behaviour or unintended changes. Where possible, run the agent in a sandboxed environment.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please check the CONTRIBUTING file for setup and development workflow.

Made with ❤️ by Jeomon George
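The Installation Guide lists Python 3.12 or higher running on Windows as prerequisites. The helper below is a minimal sketch for confirming those two requirements before installing; it is hypothetical and not part of windows-use, and the name `check_prerequisites` is illustrative only. It does not check which Windows version (7, 8, 10, or 11) is installed.

```python
# check_prereqs.py: hypothetical helper, NOT part of windows-use.
# Checks two prerequisites from the Installation Guide:
# Python 3.12 or higher, running on Windows.
import platform
import sys


def check_prerequisites(version_info=None, system=None):
    """Return a list of human-readable problems; an empty list means OK.

    version_info: a (major, minor, ...) tuple; defaults to sys.version_info.
    system: an OS name as returned by platform.system(); defaults to the current OS.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system

    problems = []
    # Compare only (major, minor) against the documented minimum of 3.12
    if tuple(version_info[:2]) < (3, 12):
        problems.append(
            f"Python 3.12 or higher is required, found "
            f"{version_info[0]}.{version_info[1]}"
        )
    # platform.system() returns "Windows" on Windows and "" if it cannot tell
    if system != "Windows":
        problems.append(f"Windows is required, found {system or 'unknown OS'}")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and OS; passing `version_info` and `system` explicitly keeps the check easy to test on any machine.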