knarrow package

Subpackages

Submodules

knarrow.angle_method module

knarrow.angle_method.angle(x, y, **kwargs)

Find a knee by looking at the maximum change of the angle of the line going through consecutive point pairs.

Quite sensitive to noise, use with cubic spline smoothing.

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are used)

Returns:

the index of the knee

Return type:

int

knarrow.c_method module

knarrow.c_method.c_method(x, y, **kwargs)

Implements the C-method as described in https://blog.msmetko.xyz/posts/2

Parameters:
  • x (np.ndarray) – the ground truth \(x\) coordinates

  • y (np.ndarray) – the ground truth \(y\) coordinates

  • **kwargsdict, optional arguments (should be empty)

Returns:

the index of the knee

Return type:

int

knarrow.c_method.d2e_dc2(y, x, c)

The second derivative of the energy function \(E(x)\) with respect to \(c\).

Used in newton_raphson method for the optimization of the shape parameter \(c\).

Parameters:
  • y (np.ndarray) – the ground truth function values we wish to fit the knee curve \(f(x)\) on

  • x (np.ndarray) – the \(x\) coordinates of the points

  • c (float) – the shape parameter

Returns:

values of the second derivative of \(E(x)\) evaluated at \(x\)

Return type:

np.ndarray

knarrow.c_method.d2f_dc2(x, c)

The second derivative of the knee curve \(f(x)\) with respect to \(x\), conditioned on \(c\).

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • c (float) – the shape parameter

Returns:

values of the second derivative of \(f(x)\) evaluated at \(x\)

Return type:

np.ndarray

knarrow.c_method.de_dc(y, x, c)

The first derivative of the energy function \(E(x)\) with respect to \(c\).

Used in newton_raphson method for the optimization of the shape parameter \(c\).

Parameters:
  • y (np.ndarray) – the ground truth function values we wish to fit the knee curve \(f(x)\) on

  • x (np.ndarray) – the \(x\) coordinates of the points

  • c (float) – the shape parameter

Returns:

values of the first derivative of \(E(x)\) evaluated at \(x\)

Return type:

np.ndarray

knarrow.c_method.df_dc(x, c)

The first derivative of the knee curve \(f(x)\) with respect to \(x\), conditioned on \(c\).

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • c (float) – the shape parameter

Returns:

values of the slopes of tangents on \(f(x)\) evaluated at \(x\)

Return type:

np.ndarray

knarrow.c_method.f(x, c)

The knee curve.

This function is being fitted to the input data by tweaking the c parameter.

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • c (float) – the shape parameter

Returns:

the values of the function evaluated at \(x\) with respect to. c

Return type:

np.ndarray

knarrow.c_method.get_knee(c)

Calculates the position of the point of maximal curvature.

The position depends solely on the shape parameter \(c\).

Parameters:

c (float) – the shape parameter

Returns:

the \(x\) position where the knee curve \(f(x)\) is most curved

Return type:

float

knarrow.c_method.newton_raphson(x, y)

The implementation of the Newton-Raphson optimization procedure. It fits the knee curve \(f(x)\) to the \(y\) s of the corresponding \(x\) s by tweaking the shape parameter \(c\) from an initial guess.

Parameters:
  • x (np.ndarray) – the ground truth \(x\) coordinates

  • y (np.ndarray) – the ground truth \(y\) coordinates

Returns:

the optimal shape parameter \(c\) which minimizes the squared error, up to a predefined tolerance level

Return type:

float

knarrow.distance_method module

knarrow.distance_method.distance(x, y, **kwargs)

Find a knee by finding a point which is most distant from the line \(y=x\) (after normalizing the inputs)

Fun fact: vertical distance of a point P from line \(y=x\) is just a scaled version of the orthogonal distance of the same point P from line \(y=x\).

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are actually used)

Returns:

the index of the knee

Return type:

int

knarrow.distance_method.distance_adjacent(x, y, **kwargs)

Find a knee by finding a point which is most distant from the line going through the neighbouring points.

This method is quite sensitive to noise, so only use with cubic spline smoothing.

Note: I developed a (somewhat) fancy linear algebra implementation so this should be quite fast.

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are actually used)

Returns:

the index of the knee

Return type:

int

knarrow.kneedle module

knarrow.kneedle.kneedle(x, y, **kwargs)

Kneedle method from https://doi.org/10.1109/ICDCSW.2011.20

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs

Returns:

the index of the knee

Return type:

int

knarrow.main module

knarrow.menger module

knarrow.menger.double_triangle_area(vertices)

Return twice the area of a triangle with the given vertices.

Parameters:

vertices (np.ndarray) – \(3 \times 2\) matrix where every row is a new 2-D vertex \((x, y)\)

Returns:

\(1 \times 1\) array, twice the area

Return type:

np.ndarray

knarrow.menger.get_curvature(vertices)

Calculate the Menger curvature defined by the three points

Parameters:

vertices (np.ndarray) – \(3 \times 2\) matrix where every row is a new 2-D vertex \((x, y)\)

Returns:

the curvature, i.e. the reciprocal of the radius of the circumcircle around the vertices

Return type:

np.ndarray

knarrow.menger.get_squared_vector_lengths(vertices)

Return the square of the lengths between neighbouring vertices

Parameters:

vertices (np.ndarray) – array of vertices

Returns:

lenghts; the entry at lenghts[i] is a squared distance between the vertex i and i+1

Return type:

np.ndarray

knarrow.menger.menger_anchored(x, y, **kwargs)

Find a knee using the Menger curvature on the first point, last point, and varying the middle point. More resistant to the noise in the data than menger_successive

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are used)

  • Returnsint: the index of the knee

knarrow.menger.menger_successive(x, y, **kwargs)

Find a knee using the Menger curvature on the three successive points

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are used)

Returns:

the index of the knee

Return type:

int

knarrow.ols module

knarrow.ols.ols_swiping(x, y, **kwargs)

Performs OLS swiping method.

By first selecting a pivot point, all the data points are implicitly split into two parts: the left one and the right one. Then, the algorithm regresses a line onto \(y\) with respect to \(x\) for both parts using ordinary least squares. If both lines fit quite well, meaning the \(R^2\) for both left and the right fit is particularly high, the pivot point is declared as a knee.

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • **kwargs – possible additional arguments (none are used)

Returns:

the index of the knee

Return type:

int

knarrow.ols.r_squared(x, y)

Fit a linear regression to \(y\) and return the \(R^2\) statistical measure of a fit of a linear regression

Parameters:
  • x (np.ndarray) – The \(x\) coordinates of the points

  • y (np.ndarray) – The \(y\) coordinates of the points

Returns:

The \(R^2\) measure of a fit. Bounded by \(\left[0, 1\right]\). The closer \(R^2\) is to \(1\), the better the fit.

Return type:

float

knarrow.util module

class knarrow.util.KneeType(*values)

Bases: Enum

DECREASING_CONCAVE = 2
DECREASING_CONVEX = 0
INCREASING_CONCAVE = 1
INCREASING_CONVEX = 3
knarrow.util.cubic_spline_smoothing(x, y, smoothing_factor=0)

Smoothes the \(y\) vecetor by minimizing the second derivative.

Parameters:
  • x (np.ndarray) – the \(x\) coordinates of the points

  • y (np.ndarray) – the \(y\) coordinates of the points

  • smoothing_factor (float) – the cubic spline smoothing hyperparameter

Returns:

the \(x\) and \(y\) coordinates of the smoothed points

Return type:

tuple of np.ndarray

knarrow.util.detect_knee_type(y1, y2, y3, y4)

Detects the type of a knee using the first two and the last two points.

Simplifies the problem by assuming noiseless input.

Parameters:
  • y1 (float) – the \(y\) coordinate of the first point

  • y2 – (float): the \(y\) coordinate of the second point

  • y3 – (float): the \(y\) coordinate of the second-to-last point

  • y4 – (float): the \(y\) coordinate of the last point

Returns:

the type of the knee detected

Return type:

KneeType

knarrow.util.get_delta_matrix(h)

Creates \(\Delta\) matrix. Used for cubic spline smoothing.

Adapted from https://en.wikipedia.org/wiki/Smoothing_spline#Derivation_of_the_cubic_smoothing_spline

[h1, h2, h3, h4] -> [[1/h1 -1/h1-1/h2 1/h2 0 0] [ 0 1/h2 -1/h2-1/h3 0 0] [ 0 0 1/h3 -1/h3-1/h4 1/h4]]

Parameters:

h (np.ndarray) – the differences vector

Returns:

the \(\Delta\) matrix

Return type:

np.ndarray

knarrow.util.get_weight_matrix(h)

Creates the weight matrix. Used for cubic spline smoothing.

Adapted from https://en.wikipedia.org/wiki/Smoothing_spline#Derivation_of_the_cubic_smoothing_spline

Parameters:

h (np.ndarray) – the differences vector

Returns:

the weight matrix \(W\)

Return type:

np.ndarray

knarrow.util.normalize(x)

Helper function for normalizing the inputs.

Normalization is an affine transformation such that the minimal element of x maps to 0, and maximal element of x maps to 1

Parameters:

x (np.ndarray) – the array to be normalized

Returns:

a normalized array such that the minimum is 0 and the maximum is 1

Return type:

np.ndarray

knarrow.util.np_anchored(length: int) ndarray[tuple[Any, ...], dtype[int64]]

Return indices x such that every row in array[x] is a slice of the array with a first, i-th and the last element Helper function which uses numpy broadcasting to work

Parameters:

length (int) – the total length of the array to be indexed in the anchored way

Returns:

indices for windowing an array

Return type:

np.ndarray

knarrow.util.np_windowed(length: int, window_size: int, stride: int = 1, dilation: int = 1) ndarray

Return indices x such that every row in array[x] is a windowed slice of the array Helper function which uses numpy broadcasting to work Adapted from: https://stackoverflow.com/a/42258242

Parameters:
  • length (int) – the total length of the array to be indexed

  • window_size (int) – the size of the window to be slided across the array

  • stride (int) – the size of the “jumps” between the consecutive windows. Defaults to 1

  • dilation (int) – the distance between the elements in the window. Defaults to 1

Returns:

indices for windowing an array

Return type:

np.ndarray

knarrow.util.prepare(f)
knarrow.util.projection_distance(vertices)

Return the projection distance of the point P1 to the line through both P2 and the origin.

Parameters:

vertices (np.ndarray) – array of shape (..., 2, 2). Coordinates of the points such that vertices[..., 0, :] are the coordinates of the point we wish to project on a line defined with the origin and vertices[..., 1, :]

Returns:

one dimensional array denoting the distance the point vertices[..., 0, :] must travel to be projected onto the line defined by vertices[..., 1, :] and the origin

Return type:

np.ndarray

Module contents

knarrow.all(*args, **kwargs)
knarrow.find_knee(*args, **kwargs)