SPARK-51966 (sub-task of SPARK-54137 Prepare Apache Spark 4.2.0)

Replace select.select() with select.poll() when running on POSIX OS

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.5, 4.0.0
    • Fix Version/s: 4.2.0
    • Component/s: PySpark

    Description

      On glibc-based Linux systems, select() can monitor only file descriptors numbered below FD_SETSIZE (1024).

      This is an unreasonably low limit for many modern applications.
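
      The limit can be reproduced without opening a thousand descriptors, because CPython rejects an out-of-range descriptor number before it issues the system call. The following is a minimal sketch, assuming CPython on a POSIX system where FD_SETSIZE is 1024; the helper name select_error_for_fd is hypothetical and used only for illustration.

```python
import select


def select_error_for_fd(fd):
    """Return the ValueError message select.select() raises for fd, or None.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    try:
        select.select([fd], [], [], 0)
    except ValueError as exc:
        # Raised when the descriptor number is outside the range select() supports.
        return str(exc)
    except OSError:
        # The number was in range, but the descriptor is not open (EBADF).
        return None
    return None


if __name__ == "__main__":
    # 5000 is well above FD_SETSIZE on glibc-based Linux.
    print(select_error_for_fd(5000))
```

      The descriptor does not have to be open for the error to appear: the number alone is enough, which is why a long-running process that holds many sockets or files can hit this even for a freshly accepted connection.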

      When running via PySpark, we frequently observe:

      Exception occurred during processing of request from ('127.0.0.1', 46334)
      Traceback (most recent call last):
        File "/usr/lib/python3.11/socketserver.py", line 317, in _handle_request_noblock
          self.process_request(request, client_address)
        File "/usr/lib/python3.11/socketserver.py", line 348, in process_request
          self.finish_request(request, client_address)
        File "/usr/lib/python3.11/socketserver.py", line 361, in finish_request
          self.RequestHandlerClass(request, client_address, self)
        File "/usr/lib/python3.11/socketserver.py", line 755, in __init__
          self.handle()
        File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 293, in handle
          poll(authenticate_and_accum_updates)
        File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 266, in poll
          r, _, _ = select.select([self.rfile], [], [], 1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ValueError: filedescriptor out of range in select()
      

      On POSIX systems, poll() should be used instead of select(), because poll() is not limited by FD_SETSIZE.
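
      One way the replacement could look is sketched below. This is not the actual Spark patch; it is a minimal illustration that assumes the caller only needs to know whether a file object becomes readable within a timeout. The names wait_readable and timeout_s are hypothetical. It uses select.poll() where the platform provides it and falls back to select.select() elsewhere (for example on Windows, where select.poll() is not available).

```python
import os
import select


def wait_readable(fileobj, timeout_s):
    """Return True if fileobj is reported ready within timeout_s seconds.

    Hypothetical helper for illustration; it is not the PySpark implementation.
    fileobj may be an integer descriptor or any object with a fileno() method.
    """
    if hasattr(select, "poll"):
        poller = select.poll()
        poller.register(fileobj, select.POLLIN)
        # poll() takes its timeout in milliseconds, select() in seconds.
        events = poller.poll(int(timeout_s * 1000))
        # Any reported event (including hang-up or error) counts as ready,
        # so the caller's next read observes the data, EOF, or the failure.
        return bool(events)
    # Fallback for platforms without poll().
    readable, _, _ = select.select([fileobj], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    r_fd, w_fd = os.pipe()
    print(wait_readable(r_fd, 0.1))  # nothing has been written yet
    os.write(w_fd, b"x")
    print(wait_readable(r_fd, 0.1))  # one byte is now waiting
    os.close(r_fd)
    os.close(w_fd)
```

      The main behavioural difference to keep in mind when swapping the calls is the timeout unit: the 1 passed to select.select() in the traceback above means one second, whereas poll() would need 1000.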

          People

            wjszlachtaman Wojciech Szlachta