Created Feb 15, 2025 by Jackson Queale

Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy


Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.

For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make inaccurate predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, which hurts the model's overall performance.
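As a rough illustration of why balancing can be costly, the following Python sketch balances a dataset by randomly downsampling every subgroup to the size of the smallest one. The function name `balance_by_subsampling` and the 900/100 group split are hypothetical, chosen only to show how much data this simple form of balancing discards; it is not the MIT team's method.

```python
from collections import Counter
import random


def balance_by_subsampling(groups, seed=0):
    """Return the indices kept after downsampling every group to the smallest group's size."""
    counts = Counter(groups)
    target = min(counts.values())  # size of the smallest subgroup
    rng = random.Random(seed)
    kept = []
    for g in counts:
        idx = [i for i, label in enumerate(groups) if label == g]
        kept.extend(rng.sample(idx, target))  # keep a random subset of this group
    return sorted(kept)


# Hypothetical imbalanced dataset: 900 examples from group "A", 100 from group "B".
groups = ["A"] * 900 + ["B"] * 100
kept = balance_by_subsampling(groups)
print(len(kept), "kept,", len(groups) - len(kept), "removed")  # 200 kept, 800 removed
```

With a 9-to-1 imbalance, 800 of the 1,000 examples are thrown away, which is the kind of data loss that can drag down overall accuracy.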

MIT researchers developed a new technique that identifies and removes specific points in a training dataset that contribute most to a model's failures on minority subgroups. Because it removes far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data for many applications.

This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren't misdiagnosed because of a biased AI model.

"Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.

She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng '18, PhD '23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.

Removing bad examples

Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.

Researchers also know that some data points affect a model's performance on certain downstream tasks more than others.

The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
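Worst-group error can be made concrete by computing accuracy separately for each subgroup and taking the minimum; worst-group error is then one minus that value. The sketch below assumes every evaluation example carries a subgroup label, and the toy labels, predictions, and function names are made up for illustration.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy on the subgroup where the model does worst."""
    return min(group_accuracies(y_true, y_pred, groups).values())


# Toy example: the model gets all 8 majority examples right but only 1 of 2 minority examples.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
groups = ["maj"] * 8 + ["min"] * 2

print(group_accuracies(y_true, y_pred, groups))      # {'maj': 1.0, 'min': 0.5}
print(worst_group_accuracy(y_true, y_pred, groups))  # 0.5
```

Here overall accuracy is 90 percent (9 of 10 correct), yet worst-group accuracy is only 50 percent, which is the kind of gap the researchers' technique targets.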

The researchers' new technique builds on prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.

For this technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to each incorrect prediction.

"By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall," Ilyas explains.

Then they remove those specific samples and retrain the model on the remaining data.
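The aggregate-then-remove step could look roughly like the sketch below. It does not implement TRAK: it assumes a precomputed matrix of attribution scores (one row per test example, one column per training example) of the kind a data-attribution method produces, and it assumes a larger score means the training example pushed the model toward its wrong answer. Summing scores over the failed minority-group test predictions is one simple aggregation chosen for illustration; the paper's exact scoring and the number of removed points `k` may differ. The function names are hypothetical, and retraining is left to whatever training code is already in use.

```python
def select_points_to_remove(scores, failed_test_ids, k):
    """Pick the k training examples with the largest total attribution over failed predictions.

    scores[j][i]: hypothetical attribution score of training example i for test example j
        (assumed precomputed; larger = pushed the model toward its wrong answer on j).
    failed_test_ids: indices of minority-group test examples the model got wrong.
    """
    n_train = len(scores[0])
    totals = [0.0] * n_train
    for j in failed_test_ids:  # aggregate only over the bad predictions
        for i in range(n_train):
            totals[i] += scores[j][i]
    ranked = sorted(range(n_train), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:k])


def retained_indices(n_train, removed):
    """Training indices left over for retraining."""
    removed = set(removed)
    return [i for i in range(n_train) if i not in removed]


# Toy attribution matrix: 3 test examples x 5 training examples (made-up numbers).
scores = [
    [0.1, 0.9, 0.0, 0.2, 0.0],  # test 0: failed minority-group prediction
    [0.5, 0.5, 0.5, 0.5, 0.5],  # test 1: predicted correctly, so it is not aggregated
    [0.0, 0.7, 0.1, 0.8, 0.0],  # test 2: failed minority-group prediction
]
failed = [0, 2]

to_remove = select_points_to_remove(scores, failed, k=2)
print(to_remove)                       # [1, 3]
print(retained_indices(5, to_remove))  # [0, 2, 4]
```

Only the flagged examples are dropped, so most of the data survives for retraining, in contrast to balancing the whole dataset.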

Since having more data usually yields better overall performance, removing just the samples that drive worst-group failures maintains the model's overall accuracy while boosting its performance on minority subgroups.

A more accessible approach

Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their technique also achieved higher accuracy than methods that require making changes to the inner workings of a model.

Because the MIT method involves changing a dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.

It can also be used when bias is unknown because subgroups in a training dataset are not labeled. By identifying the datapoints that contribute most to a feature the model is learning, researchers can understand the variables it is using to make a prediction.

"This is a tool anybody can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model," says Hamidieh.

Using the technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.

They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy to use for practitioners who could someday deploy it in real-world environments.

"When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable," Ilyas says.

This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.
