Optimize algorithms with loop unrolling
Remove the redundant trigonometric computation from the C++ sum_of_squares; by the identity sin²+cos²=1 it contributed nothing to the result. Unroll the loops in both the C++ and Cython implementations to reduce per-iteration loop overhead.
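For reviewers, a minimal sketch of what a 4-way unrolled sum of squares might look like. The signature, the `std::vector<double>` input, and the unroll factor of 4 are assumptions made for illustration, not the code in this commit. The sketch also assumes the removed trig term was a sin²+cos² factor, which equals 1 and can simply be dropped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical signature: the real sum_of_squares in this repository may differ.
double sum_of_squares(const std::vector<double>& values) {
    const std::size_t n = values.size();

    // Four independent accumulators, one per unrolled lane.
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;

    std::size_t i = 0;
    // Main loop: four elements per iteration, so the loop test and
    // increment run roughly a quarter as often.
    for (; i + 4 <= n; i += 4) {
        acc0 += values[i]     * values[i];
        acc1 += values[i + 1] * values[i + 1];
        acc2 += values[i + 2] * values[i + 2];
        acc3 += values[i + 3] * values[i + 3];
    }

    double total = (acc0 + acc1) + (acc2 + acc3);

    // Remainder loop: the 0 to 3 elements left when n is not a multiple of 4.
    for (; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Splitting the sum across several accumulators changes the order of floating-point additions, so results can differ from a plain loop in the last bits. Any speedup depends on the compiler and flags, since optimizers often unroll such loops on their own.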
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>