# Siemens - 2022

K-Nearest Neighbors Problem for Moderately High Dimensional Spaces

## Background

HEEDS is Siemens software which does “design space exploration” for engineering-design problems. The problem has a design space, which is defined by the design parameters that can be varied. The number of design parameters is usually less than 10, sometimes it can be much more than that.

HEEDS search algorithms vary the design parameters to create trial designs and then simulate their performance with appropriate analysis tools. A common goal is optimizing the dimensions or material properties of a structure to get the best tradeoff between strength and cost. The number of trial designs can be anywhere from a few dozen to 10’s of thousands. As the search accumulates trial designs, HEEDS identify interesting designs and then finds nearby designs from among those that it has already analyzed.

This is a k-nearest neighbors problem. As HEEDS users take on higher-dimensional problems and create larger design sets, the search starts to encounter performance issues. The execution time of a naïve implementation increases exponentially with problem size. There are more sophisticated algorithms available for 2D and 3D spaces and there is some work in very high dimensional problems (text, image recognition) but not as much in the range of our problems.

## About Project

The goal of this project is to collect and evaluate KNN search implementations to determine which, if any, offer lower complexity compared to simple approaches. R-tree and X-tree are possible techniques to test, for example.

The problem size of interest is roughly 25-50 dimensions, 50,000-100,000 points, and 10 < k < 500. Example data sets can be provided.

Review available algorithms from the open-source community, select those that seem appropriate for this size problem, and test them to determine how their speed increases as the problem size increases. Compare to a naïve approach.

## Deliverables

- Report outlining the testing process, techniques tested, and measured performance.
- Recommendations on most promising techniques to explore further.
- Code that was written for the project to execute the testing.