Re: [PATCH execline] Add forx -P

From: Sertonix <sertonix_at_posteo.net>
Date: Thu, 11 Jul 2024 16:42:59 +0000

> See, this is why I prefer technical discussions first and *maybe*
> patches later:
> 1. I'm not sure exactly what feature you want;
> 2. That's a bit too much change for me to read the diff in-line and
> insta-review it. It requires more effort. Which I'm not willing to do
> if I don't know what the change is supposed to be doing.

Sorry, I already used the patch and wanted to share it. We can ignore the
patch for now and I will try to explain what I want.

In my case I wanted to extract data from the aports git repository.
Unfortunatly I only found a way which required 1 git call per directory
so in the hope to speed up the process I tried 'forx -p':

elglob -0 paths main/* forx -E -p i { ${paths} } git -P log -1 --format=%at\ ${i} -- ${i}

Which made my system unresponsive for a while.

> >Using forx in parallel mode with a large number of arguments can lead to
> >all instances not being able to finish their jobs due to resource
> >constrains.
>
> That's only going to be a problem for an insane amount of arguments,
> which have to be explicitly listed in the block, so I don't think it's
> a real problem in the first place. If your machine isn't powerful
> enough to handle 5k arguments, don't use forx -p with a block that
> has 5k arguments?

I forgot to mention that this is more important for resource intensive jobs.
In my case ~1500 git instances were able to fill 8GB of ram very quickly.

And I would like to add that the number of arguments may be unknown when
writing a script or may even be determined by user input (DOS attack). So
a limit of parallel processes is in my opinion a requirement for any
execline script using 'for -p' to be robust.

> > With an option to specify a max amount of parallel instances
> >this can be handled properly.
>
> From what I can understand at first glance, your change puts a cap
> of maxpar over the amount of arguments that can be processed, and the
> remaining arguments are just ignored. If it is the case, I don't think
> it's useful, because maxpar has to be input by the user, and the user
> can already know how many arguments there are in the block, so the
> list can be truncated prior to the forx invocation.
> Better semantics for -P $maxpar (what I thought you had implemented
> before looking at the diff) would be a cap over the number of
> arguments that are *processed in parallel*, akin to make -j$maxpar,
> while still processing the full list;

My change does work like '-j$maxpar' (not truncation).

> that's a feature that I would
> consider implementing, but it's a little more involved, requiring a
> full refactor of the loop to always have a window of $maxpar children
> at a time.

A refactor isn't required. It may help the code readability though.

On the technical side it would be nice to add the same to forstdin.
Received on Thu Jul 11 2024 - 18:42:59 CEST

This archive was generated by hypermail 2.4.0 : Thu Jul 11 2024 - 18:43:29 CEST