I usually do two passes. The first is the most time-consuming, as I'm stopping and starting a lot to identify things I want to cut (uhs, ums, noticeable breaths, stuttering, long pauses, off-the-record material). Then I listen through it again for anything I missed the first time around. Sometimes things like the breaths are easier to cut thanks to post-production processing once I've finished my first pass. But more often than not, the second pass just takes the episode's run-time plus 10 minutes or so. Usually I'd say the first pass takes twice as long as the actual episode.
And now with the video versions, I usually edit both at the same time so I don't get too lost with the timecodes. Things like automatically truncating silence result in a noticeable difference in run-time. The audio versions normally end up about 15 minutes shorter than the video versions.
TL;DR - A long time.