When I use rayQueryProceedEXT(query), the reported work register count is very high.
More threads help with latency hiding, but half the thread count is enough to keep the core busy, so cutting register usage to get full occupancy isn't guaranteed to improve performance. A lot of complex gaming content uses shaders with more than 32 registers, so I wouldn't worry too much.
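To make the trade-off concrete, here is a minimal Python sketch of the occupancy arithmetic. It assumes, as the answer implies, that a shader using more than 32 work registers runs at half the thread count. The threshold constant and the `occupancy_fraction` helper are hypothetical illustrations, not part of any vendor tool or API.

```python
# Hypothetical occupancy model, assuming a GPU that runs at full thread
# occupancy up to 32 work registers per thread and at half occupancy above that.
FULL_OCCUPANCY_REGISTER_LIMIT = 32  # assumption, not read from any vendor API


def occupancy_fraction(work_registers: int) -> float:
    """Return the fraction of the maximum thread count that can be resident."""
    if work_registers <= 0:
        raise ValueError("work_registers must be positive")
    if work_registers <= FULL_OCCUPANCY_REGISTER_LIMIT:
        return 1.0  # at or under the limit: every thread slot can be used
    return 0.5  # over the limit: only half the threads fit (assumed rule)


if __name__ == "__main__":
    # Print the assumed occupancy for a few register counts.
    for regs in (24, 32, 33, 48):
        print(f"{regs} registers -> {occupancy_fraction(regs):.0%} occupancy")
```

Under this model, trimming a shader from 33 to 32 registers doubles the number of resident threads, but as noted above, that extra latency hiding only pays off if half the threads weren't already enough to keep the core busy.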