Input -> some Layers -> Final layer of size N -> Output
The trick here is to teach the NN to output the same as the input; so if the NN has to learn to emit what is on the input, it has to learn to extract intermediate representations that allow it to reconstruct the input; in this way, on the "final layer of size N" you have a representation of the input (otherwise you cannot obtain the output equal as input).
With that N you can now represent them in a N-dimensional space.
Imagine we have an input of size 20 but choose N=2 for that layer, then we train the NN, and we pass it a new sample, with that we have obtained a 2D representation of a 20D data.
Of course this approach has limitations and it's not going to have 100% acc in many cases, as by reducing dimension you may lose information too.