Given a decent settling time, you can settle to within +/- 1 encoder count, so it mostly comes down to the resolution of your encoder. The highest-resolution encoders we know of that are still affordable are multipole magnetic encoders, like the one presented at the very bottom of this post: Project HoverArm, and in this topic: Where to source magnetic encoder and ring?.
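To put a number on what +/- 1 count means, here is a rough sketch that converts encoder resolution into an angle and into a lateral error at the target. The function names, the 16384 counts/rev figure, and the 300 mm distance are made up for illustration. The `mirror` flag assumes the motor turns a mirror, in which case the reflected beam moves by twice the mechanical angle; set it to False for a directly pointed load.

```python
import math


def count_angle_deg(counts_per_rev: int) -> float:
    """Mechanical angle represented by one encoder count, in degrees."""
    if counts_per_rev <= 0:
        raise ValueError("counts_per_rev must be positive")
    return 360.0 / counts_per_rev


def spot_error_mm(counts_per_rev: int, distance_mm: float, mirror: bool = True) -> float:
    """Approximate lateral error at the target for a one-count position error.

    Assumption: with mirror=True the motor rotates a mirror, so the beam
    deflects by twice the mechanical angle.
    """
    angle_rad = math.radians(count_angle_deg(counts_per_rev))
    if mirror:
        angle_rad *= 2.0
    return distance_mm * math.tan(angle_rad)


if __name__ == "__main__":
    # Hypothetical example values: 14-bit encoder, target 300 mm away.
    cpr = 16384
    print(f"one count = {count_angle_deg(cpr):.5f} deg")
    print(f"error at 300 mm = {spot_error_mm(cpr, 300.0):.3f} mm")
```

If the resulting error is larger than the spot size you need, the encoder resolution is the thing to improve, not the control loop.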
Since you are moving a tiny mass, I would recommend an inrunner motor, because its rotor has a small moment of inertia. Something like this:
That said, why not just buy an off-the-shelf system like this one?
I've been wondering for a long time whether, instead of an F-theta lens, you could do active focus, like CD/DVD drives do. With a suitable lens attached to the solenoid, it should be possible to compute the required parameters for this virtual Z axis from the target X/Y position.
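As a first approximation of that computation, here is a minimal sketch. It assumes both scan axes pivot about a single point at a perpendicular distance `throw_mm` from a flat target plane, which a real two-mirror setup only approximates. The function name and the numbers are hypothetical, and turning the path-length change into an actual lens displacement depends on the optics, which is not modelled here.

```python
import math


def focus_offset_mm(x_mm: float, y_mm: float, throw_mm: float) -> float:
    """Extra beam path length to target point (x, y) compared with the centre.

    Assumption: a single pivot point located throw_mm above the centre of a
    flat target plane. The result is how much farther the focus has to reach.
    """
    if throw_mm <= 0:
        raise ValueError("throw_mm must be positive")
    path = math.sqrt(throw_mm ** 2 + x_mm ** 2 + y_mm ** 2)
    return path - throw_mm


if __name__ == "__main__":
    # Hypothetical geometry: pivot 120 mm above the plane.
    for x, y in [(0.0, 0.0), (30.0, 40.0), (60.0, 60.0)]:
        print(f"x={x:5.1f} y={y:5.1f} -> focus offset {focus_offset_mm(x, y, 120.0):.2f} mm")
```

The offset grows roughly with the square of the distance from the centre, so the focus actuator needs the most travel at the corners of the work area.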