
PySlope Optimised - Achieving 5x Faster Calculations in Python

Published at 02:03 AM


Introduction

A few years ago I released pyslope, a 2D slope stability module based on Bishop's method of slices.

This blog post follows my efforts to improve the speed of the code, starting with minor changes that gave a 1.2x speed increase and finishing with the vectorisation of calculations to deliver a 5x speed increase.

What does pyslope do?

PySlope calculates the factor of safety (or likelihood of failure) for hundreds of possible failure planes using a guess-and-check style iterative method.

Bishop’s method of slices is a numerical approach where a failure plane is assumed, and a series of vertical strips are used for calculations.

One of the biggest trade-offs is that accuracy improves with the number of calculations performed, so the program's accuracy is often limited by the need to stop at a reasonable runtime.

The PySlope implementation simplifies inputs to speed up calculations, only considering horizontal strata and a single uni-directional slope.
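The "guess and check" iteration comes from the fact that the factor of safety appears on both sides of Bishop's simplified equation, so it must be solved by fixed-point iteration. The sketch below is a minimal illustration of that solve, assuming simplified per-slice inputs (weight W, base angle alpha, cohesion c, width b, friction angle phi) and no pore pressure; pyslope's actual strip bookkeeping is considerably more involved.

```python
from math import sin, cos, tan, radians

def bishop_fs(slices, tol=1e-4, max_iter=50):
    """Fixed-point iteration for Bishop's simplified factor of safety.

    slices: list of (W, alpha_deg, c, b, phi_deg) tuples, one per vertical
    slice. This is a hypothetical minimal input format for illustration.
    """
    fs = 1.0  # initial guess
    for _ in range(max_iter):
        numerator = 0.0
        denominator = 0.0
        for W, alpha_deg, c, b, phi_deg in slices:
            a = radians(alpha_deg)
            tan_phi = tan(radians(phi_deg))
            # m_alpha depends on the current FS estimate
            m_a = cos(a) + sin(a) * tan_phi / fs
            numerator += (c * b + W * tan_phi) / m_a
            denominator += W * sin(a)
        new_fs = numerator / denominator
        if abs(new_fs - fs) < tol:
            return new_fs
        fs = new_fs
    return fs
```

Because each trial failure circle needs its own iteration like this, the total work scales with both the number of slices and the number of trial circles.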

Current Runtime

To test the runtime I have constructed a simple example that uses 100 vertical slices and 2500 iterations (possible failure planes).

On my i5-13600K I get an average runtime of 2.26 seconds.
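The averaging harness can be sketched as below. The workload here is a stand-in lambda; in the real benchmark it would be the pyslope analysis call on the 100-slice / 2500-iteration example described above.

```python
import time
from statistics import mean

def benchmark(fn, runs=5):
    """Return the average wall-clock runtime of fn over several runs."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return mean(times)

# stand-in CPU-bound workload for illustration only
avg = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Averaging over several runs smooths out variation from the OS scheduler and CPU frequency scaling.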

Profiling python code

To profile the Python code I used pyinstrument.

I was able to use it without any changes to my actual code, through simple command-line arguments.

Profiling slows down the code, inflating total runtime. However, we are mainly interested in the proportion of time spent in each function, so this does not significantly affect our analysis. The average runtime across five independent runs was 5.15 seconds when using the profiler.

pip install pyinstrument

pyinstrument .\speed_benchmark.py

5.080 <module>  speed_benchmark.py:1
└─ 5.059 Slope.analyse_slope  pyslope.py:1171
   └─ 4.989 Slope._analyse_circular_failure_bishop  pyslope.py:1358
      ├─ 1.898 [self]  pyslope.py
      ├─ 1.094 Slope._analyse_circular_failure_ordinary  pyslope.py:1200
      │  ├─ 0.563 [self]  pyslope.py
      │  ├─ 0.101 Slope._calculate_strip_weight  pyslope.py:1707
      │  ├─ 0.057 Slope._calculate_strip_ll  pyslope.py:1813
      │  └─ 0.056 Slope._calculate_strip_udl_force  pyslope.py:1777
      ├─ 0.369 Slope._calculate_strip_weight  pyslope.py:1707
      ├─ 0.254 Slope._calculate_strip_udl_force  pyslope.py:1777
      ├─ 0.160 Slope._calculate_strip_ll  pyslope.py:1813
      ├─ 0.136 radians  <built-in>
      ├─ 0.128 Slope.get_external_y_intersection  pyslope.py:1890
      ├─ 0.116 tan  <built-in>
      ├─ 0.115 sin  <built-in>
      ├─ 0.109 Slope._get_material_at_depth  pyslope.py:1756
      ├─ 0.107 max  <built-in>
      ├─ 0.102 Slope.get_external_x_intersection  pyslope.py:1903
      ├─ 0.091 atan  <built-in>
      ├─ 0.088 min  <built-in>
      ├─ 0.085 cos  <built-in>
      ├─ 0.057 sqrt  <built-in>
      └─ 0.051 abs  <built-in>

We can also get an interactive HTML version with the following command:

pyinstrument -r html -o pyslope_speed.html .\speed_benchmark.py

Insights from Profiling

Overall, function runtimes met expectations, but the following stood out:

Small optimisations, such as limiting calls to simple math functions where possible, could lead to measurable runtime improvements.

Optimisation 1 - Removing radians calls

We can see that radians accounts for 0.136 seconds of runtime; however, the function is essentially only used when working with a material's friction angle to compute tan(radians(self.friction_angle)).

Since a material's friction angle never changes, we can precompute and store this value.

self.tan_friction_angle = tan(radians(self.friction_angle))

We can then use this precomputed value throughout the code.
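In context, the precomputation lives in the material's constructor, so the trigonometry runs once per material rather than once per strip per iteration. The class below is a hypothetical minimal version for illustration; pyslope's actual Material class carries more fields.

```python
from math import tan, radians

class Material:
    """Hypothetical minimal material; pyslope's real class has more fields."""

    def __init__(self, friction_angle: float):
        self.friction_angle = friction_angle
        # precompute once, since the friction angle never changes
        self.tan_friction_angle = tan(radians(friction_angle))

m = Material(30.0)
```

Every later use of tan(radians(self.friction_angle)) becomes a cheap attribute lookup of self.tan_friction_angle.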

This simple optimisation alone surprisingly results in a new average total runtime of 4.6 seconds (a 10% speed increase), with radians and tan no longer appearing in the profiler.

4.557 <module>  speed_benchmark.py:1
└─ 4.532 Slope.analyse_slope  pyslope.py:1173
   └─ 4.461 Slope._analyse_circular_failure_bishop  pyslope.py:1362
      ├─ 1.647 [self]  pyslope.py
      ├─ 1.017 Slope._analyse_circular_failure_ordinary  pyslope.py:1202
      │  ├─ 0.499 [self]  pyslope.py
      │  ├─ 0.098 Slope._calculate_strip_weight  pyslope.py:1711
      │  ├─ 0.091 Slope._calculate_strip_udl_force  pyslope.py:1781
      │  ├─ 0.051 max  <built-in>
      │  └─ 0.048 Slope._calculate_strip_ll  pyslope.py:1817
      ├─ 0.380 Slope._calculate_strip_weight  pyslope.py:1711
      ├─ 0.262 Slope._calculate_strip_udl_force  pyslope.py:1781
      ├─ 0.157 Slope._calculate_strip_ll  pyslope.py:1817
      ├─ 0.143 sin  <built-in>
      ├─ 0.129 Slope.get_external_y_intersection  pyslope.py:1894
      ├─ 0.110 Slope._get_material_at_depth  pyslope.py:1760
      ├─ 0.105 min  <built-in>
      ├─ 0.096 max  <built-in>
      ├─ 0.092 Slope.get_external_x_intersection  pyslope.py:1907
      ├─ 0.079 atan  <built-in>
      ├─ 0.077 cos  <built-in>
      ├─ 0.068 sqrt  <built-in>
      └─ 0.054 abs  <built-in>

Optimisation 2 - Precalculating boundaries

Another minor optimisation addresses the UDL and line load calculations, which repeatedly recompute the left and right edges of each strip from its centre point and width.

A simple refactor is to calculate these values outside of the function call and pass in the left and right coordinates.

strip_xl = s_x - (b / 2)
strip_xr = s_x + (b / 2)

# if there is a udl load on the strip apply it.
for udl in self._udls:
    W += self._calculate_strip_udl_force(b, udl, strip_xl, strip_xr)

# if there is a point load on the strip apply it.
for ll in self._lls:
    W += self._calculate_strip_ll(b, ll, strip_xl, strip_xr)

This, on top of some other minor improvements, brings the runtime down further to 4.4 seconds, with an observable improvement in the Slope._calculate_strip_udl_force function.

Optimisation 3 - Removing multi-threading

At first glance, this task seems well-suited for multi-threading, since the calculations are independent and can be combined easily. However, creating and managing threads comes with an upfront cost. For a job that runs in roughly 2 seconds, the overhead of multi-threading outweighed any potential performance gains.

Additionally, the current multi-threaded implementation caused issues on some hardware configurations. For these reasons, I opted to remove multi-threading, simplifying the code while maintaining consistent performance across different systems.
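The core problem is that this workload is CPU-bound pure Python, so under the GIL threads cannot run the calculations in parallel; you pay the thread-management overhead and get identical results no slower to compute sequentially. The toy comparison below illustrates this with a stand-in workload, not pyslope's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

def analyse(trial):
    """Stand-in for a CPU-bound factor-of-safety calculation."""
    return sum(i * i for i in range(1000)) + trial

trials = list(range(200))

# sequential: no thread setup or scheduling overhead
sequential = [analyse(t) for t in trials]

# threaded: same results, but with pool creation and GIL contention on top
with ThreadPoolExecutor() as pool:
    threaded = list(pool.map(analyse, trials))
```

For CPU-bound Python, multiprocessing or vectorisation (as in the next section) are the approaches that actually buy parallelism.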

Optimisation 4 - Vectorising slice calculations with NumPy

A more significant optimisation was replacing Python for loops over slices with NumPy vectorised operations. Previously, the code calculated slice positions, weights, and forces one at a time in a loop, which added considerable overhead.

By using np.linspace to generate slice center points, np.sqrt and array arithmetic for slice bottom elevations, and np.maximum/np.where for top slice elevations and masking, we can now perform these calculations across the entire array at once. Material properties, slice weights, and force components were also converted to array operations instead of per-slice loops.

# Old approach (for loop)
s_x = x_left + half_width
for _ in range(num_slices):
    s_yb = c_y - sqrt(radius**2 - (s_x - c_x)**2)
    s_x += b

# New approach (NumPy vectorised)
slice_x = np.linspace(x_left + half_width, x_right - half_width, num_slices)
slice_yb = c_y - np.sqrt(radius**2 - (slice_x - c_x)**2)

This change is beneficial because NumPy operations run in compiled C code and avoid Python’s per-iteration overhead. The calculations that previously iterated over hundreds of slices now execute in parallel across arrays, resulting in a roughly 5x speed improvement while keeping the code concise and easier to maintain.
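Putting the pieces together, a fuller sketch of the vectorised slice geometry and weight calculation looks like the following. All the geometry values and the unit weight here are assumed illustrative numbers, not pyslope's real inputs, and the external surface is simplified to a flat line.

```python
import numpy as np

# hypothetical trial circle and geometry (illustrative values only)
c_x, c_y, radius = 10.0, 15.0, 12.0
x_left, x_right, num_slices = 2.0, 18.0, 100
b = (x_right - x_left) / num_slices
half_width = b / 2.0

# slice centre points, generated all at once
slice_x = np.linspace(x_left + half_width, x_right - half_width, num_slices)

# slice bottoms on the circular failure surface
slice_yb = c_y - np.sqrt(radius**2 - (slice_x - c_x) ** 2)

# slice tops clipped to a flat external surface at y = 14 (assumed)
slice_yt = np.full_like(slice_x, 14.0)
height = np.maximum(slice_yt - slice_yb, 0.0)

# weights for every slice in one array operation (assumed unit weight)
unit_weight = 18.0  # kN/m^3
weights = np.where(height > 0.0, unit_weight * b * height, 0.0)
total_weight = weights.sum()
```

Each array expression replaces an entire per-slice loop body, which is where the bulk of the 5x improvement comes from.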

With this change running the code now feels instant!

Conclusion

I applied a combination of minor and major improvements to the slope analysis code. The first three optimisations focused on low-hanging fruit: precomputing friction angles, precalculating strip boundaries, and removing multi-threading overhead. Each of these delivered small but measurable gains, simplifying the code and reducing runtime in targeted areas.

The fourth optimisation involved a major refactor of the slice calculations, replacing Python for loops with NumPy vectorised operations. This required restructuring the core logic for slice positions, weights, and forces, but it resulted in a dramatic performance improvement, with slice calculations running roughly 5x faster.

Profiling with pyinstrument was straightforward and extremely helpful, allowing me to identify bottlenecks without modifying the package. I plan to use it again for future performance tuning.
