Introduction
A few years ago I released pyslope, a 2D slope stability module based on Bishop's method of slices.
This blog follows my efforts to improve the speed of the code, starting with minor code changes that gave a 1.2x speed increase and finishing with the vectorisation of calculations to deliver a 5x increase in speed.
What does pyslope do?
Pyslope calculates the factor of safety (or likelihood of failure) for hundreds of possible failure planes using a guess-and-check style iterative method.
Bishop’s method of slices is a numerical approach where a failure plane is assumed, and a series of vertical strips are used for calculations.
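For context, the standard form of Bishop's simplified factor of safety from the literature (quoted here as background, not as pyslope's exact code) is:

$$FS = \frac{\sum \left[\left(c' b + (W - u b)\tan\phi'\right)/m_\alpha\right]}{\sum W \sin\alpha}, \qquad m_\alpha = \cos\alpha\left(1 + \frac{\tan\alpha\,\tan\phi'}{FS}\right)$$

where c' is the effective cohesion, b the strip width, W the strip weight, u the pore pressure, φ' the effective friction angle and α the inclination of the strip base. Because FS appears on both sides (inside m_α), each trial failure surface must be solved iteratively, which is the guess-and-check step described above.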
One of the biggest trade-offs is that accuracy scales directly with the number of calculations performed, so the program is often limited in accuracy by the need to stop at a reasonable runtime.
The PySlope implementation simplifies inputs to speed up calculations, only considering horizontal strata and a single uni-directional slope.
Current Runtime
In order to test the runtime I have constructed a simple example that uses 100 vertical slices and 2500 iterations (possible failure planes).
On my i5-13600K I get an average runtime of 2.26 seconds.
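As a rough sketch of the benchmark (the slope set-up below is an assumption for illustration; only the call to analyse_slope is taken from the profile that follows), the timing boils down to something like:

# speed_benchmark.py (illustrative sketch, not the actual file)
import time
from pyslope import Slope

s = Slope()  # assumed default construction; geometry, materials, slice count
             # and iteration count would be configured per the pyslope docs

runs = []
for _ in range(5):
    start = time.perf_counter()
    s.analyse_slope()            # the method profiled throughout this post
    runs.append(time.perf_counter() - start)

print(f"average runtime: {sum(runs) / len(runs):.2f} s")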
Profiling Python code
To profile the Python code I used pyinstrument.
I was able to use it without any changes to my actual code, simply through command-line arguments.
Profiling slows down the code, inflating total runtime. However, we are mainly interested in the proportion of time spent in each function, so this does not significantly affect our analysis. The average runtime across five independent runs was 5.15 seconds when using the profiler.
pip install pyinstrument
pyinstrument .\speed_benchmark.py
5.080 <module> speed_benchmark.py:1
└─ 5.059 Slope.analyse_slope pyslope.py:1171
└─ 4.989 Slope._analyse_circular_failure_bishop pyslope.py:1358
├─ 1.898 [self] pyslope.py
├─ 1.094 Slope._analyse_circular_failure_ordinary pyslope.py:1200
│ ├─ 0.563 [self] pyslope.py
│ ├─ 0.101 Slope._calculate_strip_weight pyslope.py:1707
│ ├─ 0.057 Slope._calculate_strip_ll pyslope.py:1813
│ └─ 0.056 Slope._calculate_strip_udl_force pyslope.py:1777
├─ 0.369 Slope._calculate_strip_weight pyslope.py:1707
├─ 0.254 Slope._calculate_strip_udl_force pyslope.py:1777
├─ 0.160 Slope._calculate_strip_ll pyslope.py:1813
├─ 0.136 radians <built-in>
├─ 0.128 Slope.get_external_y_intersection pyslope.py:1890
├─ 0.116 tan <built-in>
├─ 0.115 sin <built-in>
├─ 0.109 Slope._get_material_at_depth pyslope.py:1756
├─ 0.107 max <built-in>
├─ 0.102 Slope.get_external_x_intersection pyslope.py:1903
├─ 0.091 atan <built-in>
├─ 0.088 min <built-in>
├─ 0.085 cos <built-in>
├─ 0.057 sqrt <built-in>
└─ 0.051 abs <built-in>
We can also get an interactive HTML version with the following command:
pyinstrument -r html -o pyslope_speed.html .\speed_benchmark.py
Insights from Profiling
Overall, function runtimes met expectations, but the following stood out:
- _calculate_strip_weight and _calculate_strip_udl_force take a significant amount of time relative to their limited scope.
- Simple math built-ins (max, min, cos, radians, etc.) take up a considerable amount of time.
Small optimisations, such as limiting calls to simple math functions where possible, could lead to improvements in the runtime.
Optimisation 1 - Removing radians calls
We can see that radians accounts for 0.136 seconds of runtime in the profile above; however, the function is essentially only used when working with a material's friction angle to compute tan(radians(self.friction_angle)).
Since a material's friction angle never changes, we can precompute and store this value once:
self.tan_friction_angle = tan(radians(self.friction_angle))
We can then use this precomputed value throughout the code in place of the repeated tan(radians(...)) calls.
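As a minimal sketch of where this could live (the real pyslope Material class has more fields; this is a simplified stand-in), the value can be computed once when the material is created:

from math import radians, tan

class Material:
    """Simplified stand-in for pyslope's Material class."""

    def __init__(self, friction_angle, cohesion=0.0, unit_weight=20.0):
        self.friction_angle = friction_angle
        self.cohesion = cohesion
        self.unit_weight = unit_weight
        # computed once here, rather than calling tan(radians(...))
        # for every strip in every iteration
        self.tan_friction_angle = tan(radians(friction_angle))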
Surprisingly, this simple optimisation alone results in a new average total runtime of 4.6 seconds (a roughly 10% speed increase), with radians and tan no longer appearing in the profiler output.
4.557 <module> speed_benchmark.py:1
└─ 4.532 Slope.analyse_slope pyslope.py:1173
└─ 4.461 Slope._analyse_circular_failure_bishop pyslope.py:1362
├─ 1.647 [self] pyslope.py
├─ 1.017 Slope._analyse_circular_failure_ordinary pyslope.py:1202
│ ├─ 0.499 [self] pyslope.py
│ ├─ 0.098 Slope._calculate_strip_weight pyslope.py:1711
│ ├─ 0.091 Slope._calculate_strip_udl_force pyslope.py:1781
│ ├─ 0.051 max <built-in>
│ └─ 0.048 Slope._calculate_strip_ll pyslope.py:1817
├─ 0.380 Slope._calculate_strip_weight pyslope.py:1711
├─ 0.262 Slope._calculate_strip_udl_force pyslope.py:1781
├─ 0.157 Slope._calculate_strip_ll pyslope.py:1817
├─ 0.143 sin <built-in>
├─ 0.129 Slope.get_external_y_intersection pyslope.py:1894
├─ 0.110 Slope._get_material_at_depth pyslope.py:1760
├─ 0.105 min <built-in>
├─ 0.096 max <built-in>
├─ 0.092 Slope.get_external_x_intersection pyslope.py:1907
├─ 0.079 atan <built-in>
├─ 0.077 cos <built-in>
├─ 0.068 sqrt <built-in>
└─ 0.054 abs <built-in>
Optimisation 2 - Precalculating boundaries
Another minor optimisation: the UDL and line load calculations repeatedly recompute the left and right edges of each strip from its centre point and width.
A simple refactor is to calculate these values outside of the function call and pass in the left and right coordinates.
strip_xl = s_x - (b / 2)
strip_xr = s_x + (b / 2)

# if there is a udl load on the strip apply it.
for udl in self._udls:
    W += self._calculate_strip_udl_force(b, udl, strip_xl, strip_xr)

# if there is a point load on the strip apply it.
for ll in self._lls:
    W += self._calculate_strip_ll(b, ll, strip_xl, strip_xr)
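For illustration, the UDL helper might then look roughly like the sketch below; the udl attribute names (left, right, magnitude) are assumptions for this example, not pyslope's actual fields:

def _calculate_strip_udl_force(self, b, udl, strip_xl, strip_xr):
    # the strip edges arrive precomputed instead of being re-derived
    # from the centre point and width on every call
    left = max(strip_xl, udl.left)      # assumed attribute names
    right = min(strip_xr, udl.right)
    if right <= left:
        return 0.0                      # the UDL does not act on this strip
    return udl.magnitude * (right - left)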
This, on top of some other minor improvements, brings the runtime down further to 4.4 seconds, with an observable improvement in the Slope._calculate_strip_udl_force timing.
Optimisation 3 - Removing multi-threading
At first glance, this task seems well-suited for multi-threading, since the calculations are independent and can be combined easily. However, creating and managing threads comes with an upfront cost. For a job that runs in roughly 2 seconds, the overhead of multi-threading outweighed any potential performance gains.
Additionally, the current multi-threaded implementation caused issues on some hardware configurations. For these reasons, I opted to remove multi-threading, simplifying the code while maintaining consistent performance across different systems.
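To illustrate the point with a toy example (not pyslope code), threading pure-Python, CPU-bound work through a ThreadPoolExecutor typically runs no faster than a plain loop, because the GIL keeps the computation serial while adding coordination overhead:

from concurrent.futures import ThreadPoolExecutor
import time

def analyse(_plane):
    # stand-in for analysing one failure plane: pure-Python, CPU-bound work
    return sum(i * i for i in range(10_000))

planes = range(2500)

t0 = time.perf_counter()
serial = [analyse(p) for p in planes]
t1 = time.perf_counter()

with ThreadPoolExecutor() as pool:
    threaded = list(pool.map(analyse, planes))
t2 = time.perf_counter()

# Because of the GIL, the pure-Python work above cannot run in parallel,
# so the threaded version pays coordination overhead for no speed-up.
print(f"serial: {t1 - t0:.2f} s, threaded: {t2 - t1:.2f} s")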
Optimisation 4 - Vectorising slice calculations with NumPy
A more significant optimisation was replacing Python for loops over slices with NumPy vectorised operations. Previously, the code calculated slice positions, weights, and forces one at a time in a loop, which added considerable overhead.
By using np.linspace to generate slice center points, np.sqrt and array arithmetic for slice bottom elevations, and np.maximum/np.where for top slice elevations and masking, we can now perform these calculations across the entire array at once. Material properties, slice weights, and force components were also converted to array operations instead of per-slice loops.
# Old approach (for loop)
s_x = x_left + half_width
for _ in range(num_slices):
    s_yb = c_y - sqrt(radius**2 - (s_x - c_x)**2)
    s_x += b

# New approach (NumPy vectorised)
slice_x = np.linspace(x_left + half_width, x_right - half_width, num_slices)
slice_yb = c_y - np.sqrt(radius**2 - (slice_x - c_x)**2)
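The top-of-slice and masking step mentioned above might look roughly like this; external_y stands in for the ground-surface elevation evaluated at each slice_x and is an illustrative name, not pyslope's actual variable:

# top of each slice follows the external ground surface; slices where the
# trial circle sits above the surface end up with zero height
slice_yt = np.maximum(slice_yb, external_y)
heights = np.where(slice_yt > slice_yb, slice_yt - slice_yb, 0.0)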
This change is beneficial because NumPy operations run in compiled C code and avoid Python's per-iteration overhead. The calculations that previously iterated over hundreds of slices now execute as single whole-array operations, resulting in a roughly 5x speed improvement while keeping the code concise and easier to maintain.
With this change, running the code now feels instant!
Conclusion
I applied a combination of minor and major improvements to the slope analysis code. The first three optimisations focused on low-hanging fruit: precomputing friction angles, precalculating strip boundaries, and removing multi-threading overhead. Each of these delivered small but measurable gains, simplifying the code and reducing runtime in targeted areas.
The fourth optimisation involved a major refactor of the slice calculations, replacing Python for loops with NumPy vectorised operations. This required restructuring the core logic for slice positions, weights, and forces, but it resulted in a dramatic performance improvement, with slice calculations running roughly 5x faster.
Profiling with pyinstrument was straightforward and extremely helpful, allowing me to identify bottlenecks without modifying the package. I plan to use it again for future performance tuning.