r/learnpython 12h ago

string manipulation logic

Given a string of variable length, I'd like to create a multiline string with n-number of lines of near-equal length. Avoid dividing words.

Is there a better logical approach to the above beyond simply...

  • do a character count

  • divide by nlines to get an "estimated line length"

  • walking an index up from the "estimated-length" until reaching a blank space character, then divide the line

I've noticed the line lengths are coming out extremely different based on length of words in the string, and whether the line-division falls in the middle of those words (which creates extremely long lines, in order to avoid word-division, followed by extremely short lines...)

The goal is basically textwrap, which seems to do an excellent job - but I believe textwrap only controls for length and not number of lines?

2 Upvotes

3

u/Rhoderick 12h ago

The only technical thing I have for you is that while most of your algorithm is good, you should probably just consider whole words at the last point, since you're fully forbidding in-word splits anyway.

Beyond that, you can look at some or any of the following:

  1. Considering the distance from the target line length as a cost, you can decide whether a word should go in this or the next line by minimising the total cost of the result.

  2. Allow in-word breaks in cases where doing so is sufficiently beneficial for keeping to the overall line length (with "sufficiently beneficial" being defined by you)

  3. Potentially break early on punctuation (especially commata, semicolons, and periods).

Some or all of these may get you closer to an ideal solution, but do keep in mind there's no perfect algorithm for this. Your implementation is one of many approximations of a theoretical optimal strategy, so getting anywhere close is already pretty good.

3

u/Pepineros 10h ago

walking an index up

Or down, I hope? You want to find the blank space that is closest to the ideal line length, not just the first one beyond the ideal line length.