I know we can guarantee correctness either by locking or by using a dedicated thread whose sole job is to read and write the file, and communicating with it through a queue.
But this approach seems logically sound, so I would like to avoid implementing either of them, especially because both carry a performance penalty.
In general, no.
Concurrent reading and writing behavior is heavily dependent on both the underlying operating system and filesystem.
You may be able to get something working by reading and writing chunks that are both a multiple of the underlying block size and block-aligned. Even then, you are likely in the world of "undefined behavior": nothing guarantees that a reader never sees a partially written chunk.
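To make the block-aligned idea concrete, here is a minimal sketch in Python, assuming a POSIX system where `os.pread` and `os.pwrite` are available. The helper names (`block_size`, `write_block`, `read_block`) and the 4096-byte fallback are made up for illustration. Note that this only arranges the I/O the way described above; it does not make concurrent access safe or atomic.

```python
import os
import tempfile


def block_size(fd: int) -> int:
    """Preferred I/O block size reported for this file (assumed fallback: 4096)."""
    return getattr(os.fstat(fd), "st_blksize", 0) or 4096


def write_block(fd: int, index: int, payload: bytes, size: int) -> None:
    """Write exactly one block at a block-aligned offset, zero-padding the payload."""
    if len(payload) > size:
        raise ValueError("payload does not fit in one block")
    data = payload.ljust(size, b"\0")
    # pwrite takes an explicit offset, so threads do not share a file position.
    written = os.pwrite(fd, data, index * size)
    if written != size:
        raise OSError("short write")


def read_block(fd: int, index: int, size: int) -> bytes:
    """Read one block from a block-aligned offset (may be shorter at end of file)."""
    return os.pread(fd, size, index * size)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        size = block_size(fd)
        write_block(fd, 0, b"first", size)
        write_block(fd, 1, b"second", size)
        print(read_block(fd, 1, size).rstrip(b"\0"))
    finally:
        os.close(fd)
        os.remove(path)
```

Using explicit offsets avoids races on the shared file position, but whether a reader can observe a half-written block still depends on the operating system and filesystem, so a lock or a single I/O thread remains the only portable guarantee.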
See also the related question: How do filesystems handle concurrent read/write?