One needs to send 1 million HTTP requests concurrently, in batches, and read the responses, with no more than 100 requests in flight at a time.

Which approach is better, recommended, or more idiomatic?

  • Send 100 requests, wait for all of them to finish, send another 100, wait for them to finish, and so on.

  • Send 100 requests. As soon as any one of the 100 finishes, add a new one to the pool: “Done, add a new one. Done, add a new one.” Like a stream.

  • Big P
    1 year ago

    Try asking this question 1 million times

  • paysrenttobirds@sh.itjust.works
    1 year ago

    Yes, I think the second. You have a pool of 100 HTTP clients, a queue of one million requests, a queue to accept the responses as the clients complete, and a little machine that waits for free capacity in the client pool and sends the next request until there are no more requests. If the response matters to this process, your machine also pulls from the response queue as responses become available and computes whatever it needs from them, for example to decide whether to abort the rest of the requests. Any other use of the responses can be handled outside this loop.
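A minimal sketch of that "little machine" in Python's `asyncio`, under stated assumptions: `fake_request` is a hypothetical stand-in for a real HTTP call, and `run_sliding_window` and `MAX_IN_FLIGHT` are names made up for illustration. The loop tops the in-flight set up to the limit, waits for at least one task to finish, collects its result, and repeats.

```python
import asyncio
import random

MAX_IN_FLIGHT = 100  # the limit stated in the question


async def fake_request(i: int) -> int:
    """Hypothetical stand-in for one HTTP request; returns its own index."""
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return i


async def run_sliding_window(total: int, limit: int = MAX_IN_FLIGHT) -> tuple[list[int], int]:
    pending = iter(range(total))          # the "queue" of requests still to send
    in_flight: set[asyncio.Task] = set()  # requests currently running
    results: list[int] = []               # the "response queue"
    peak = 0                              # highest concurrency observed

    while True:
        # Top up: start new requests until the limit is reached or none remain.
        while len(in_flight) < limit:
            i = next(pending, None)
            if i is None:
                break
            in_flight.add(asyncio.create_task(fake_request(i)))

        if not in_flight:
            break  # nothing running and nothing left to send

        peak = max(peak, len(in_flight))

        # Wake up as soon as at least one request finishes.
        done, in_flight = await asyncio.wait(
            in_flight, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            # A decision to abort the remaining requests could be made here.
            results.append(task.result())

    return results, peak


if __name__ == "__main__":
    responses, peak = asyncio.run(run_sliding_window(1_000))
    print(len(responses), peak)
```

Results arrive in completion order, not submission order, so anything that needs to match a response to its request should carry an identifier, as the index does here.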

    The other way would work fine, but I think it’s actually slightly more complicated and slower, because you now have a queue of 10,000 batches of 100 requests each and the machine has to wait for all 100 clients to complete before sending off the next batch. Otherwise, it’s the same situation.
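For comparison, a rough sketch of the batch-by-batch variant, again with a hypothetical `fake_request` in place of a real HTTP call and made-up names (`run_in_batches`, `BATCH_SIZE`). Each `asyncio.gather` call returns only when the slowest request in its batch is done, so the other slots sit idle in the meantime.

```python
import asyncio

BATCH_SIZE = 100  # the limit stated in the question


async def fake_request(i: int, delay: float) -> int:
    """Hypothetical stand-in for one HTTP request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return i


async def run_in_batches(delays: list[float], batch_size: int = BATCH_SIZE) -> list[int]:
    results: list[int] = []
    for start in range(0, len(delays), batch_size):
        chunk = delays[start:start + batch_size]
        batch = [fake_request(start + k, d) for k, d in enumerate(chunk)]
        # gather() waits for every request in the batch, so one slow
        # request holds back the start of the next batch.
        results.extend(await asyncio.gather(*batch))
    return results


if __name__ == "__main__":
    responses = asyncio.run(run_in_batches([0.001] * 250))
    print(len(responses))
```

Since `gather` preserves the order of its arguments, the results here come back in submission order, which is one thing the batch approach gives for free.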

  • vmaziman@lemm.ee
    1 year ago

    Maybe producer-consumer?

    The producer puts all the messages to send onto a message queue, FIFO or whatever suits you.

    Parallelizable consumers (think deployed containers) listen to the queue, execute each request, get the response, and save it.

    Scale the consumer count up or down as needed to deal with rate limits.
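An in-process sketch of the same producer-consumer shape, assuming an `asyncio.Queue` in place of a real message broker and coroutines in place of deployed containers. `fake_request`, `producer`, `consumer`, and `run_pipeline` are hypothetical names, and the `saved` dict stands in for wherever responses would actually be stored.

```python
import asyncio


async def fake_request(i: int) -> str:
    """Hypothetical stand-in for one HTTP request."""
    await asyncio.sleep(0.001)
    return f"response-{i}"


async def producer(queue: asyncio.Queue, total: int) -> None:
    for i in range(total):
        # put() suspends while the queue is full, which gives backpressure.
        await queue.put(i)


async def consumer(queue: asyncio.Queue, saved: dict[int, str]) -> None:
    while True:
        i = await queue.get()
        try:
            saved[i] = await fake_request(i)  # execute the request, save the response
        finally:
            queue.task_done()


async def run_pipeline(total: int, consumer_count: int) -> dict[int, str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=consumer_count * 2)
    saved: dict[int, str] = {}

    # consumer_count is the knob to turn up or down for rate limits.
    consumers = [
        asyncio.create_task(consumer(queue, saved)) for _ in range(consumer_count)
    ]

    await producer(queue, total)
    await queue.join()  # wait until every queued message has been processed

    for c in consumers:
        c.cancel()  # consumers loop forever, so stop them explicitly
    await asyncio.gather(*consumers, return_exceptions=True)
    return saved


if __name__ == "__main__":
    saved = asyncio.run(run_pipeline(total=500, consumer_count=20))
    print(len(saved))
```

With a fixed number of consumers, this behaves much like the streaming option from the question: each consumer picks up the next message as soon as it finishes the previous one.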