⚛ Hz
like sqlite3
⚛ Hz
https://sqlite.org/appfileformat.html
⚛ Hz
But as a beginner, it is understandable to implement a crappy database of your own
+
I suggest to use a proper database for that... (>1MB)
the file is not mine... My employer gave it to me and wants me to parse it
Anonymous
thank you, I will check it. The problem with writing directly with ofstream is that I need to write a few bytes at a time, many times over, so if I use ofstream and write all of that directly to the file, it starts to slow down the program. So first I need to finish creating the new changed data and then write the changed data to the file in one step with ofstream
Writing by itself doesn't put the data in the file; it stays in the stream's buffer until you flush the writes or set up automatic flushing. That being said, I think if you are writing something like a word processor, the current context or page of the document should be completely in RAM in different structs and not in the file's to-be-written buffer.
> so first I need to finish creating new changed data and then write the changed data to file in one step with ofstream
do exactly this
Anonymous
i was just saying that they won't impact performance because not every write will be an actual disk write
Anonymous
he was talking about doing multiple small writes
I don't know the entire context, but it is especially important that you don't flush the buffer yourself when you are doing multiple small writes
+
i was just saying that they won't impact performance because not every write will be an actual disk write
so if that's the case, I can simply use ofstream directly and add my data to it, and also freely seek into it? And it doesn't impact performance?
⚛ Hz
(unless your program is killed accidentally)
⚛ Hz
* you may want to write to a temporary file, and swap it with the original after closing it
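A minimal sketch of the temporary-file idea, assuming C++17 and `std::filesystem`. The function name `write_file_via_temp` and the `.tmp` suffix are made up for illustration. The rename step replaces the target in one operation on POSIX systems; behavior on other platforms may differ, and nothing here forces the data to disk.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: write `data` to a sibling "<target>.tmp" file first,
// then rename it over `target`, so a crash mid-write leaves the old file intact.
bool write_file_via_temp(const std::filesystem::path& target, const std::string& data) {
    assert(!target.empty());
    std::filesystem::path tmp = target;
    tmp += ".tmp";  // arbitrary suffix chosen for this sketch

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.close();  // flushes the buffer; sets failbit if that fails
        if (!out) {
            std::error_code ignored;
            std::filesystem::remove(tmp, ignored);
            return false;
        }
    }

    std::error_code ec;
    std::filesystem::rename(tmp, target, ec);  // swap in the finished file
    if (ec) {
        std::error_code ignored;
        std::filesystem::remove(tmp, ignored);
        return false;
    }
    return true;
}
```

Calling `write_file_via_temp("out.bin", bytes)` either leaves the previous `out.bin` untouched or replaces it with the complete new contents.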
Anonymous
i gave an example for a word processor. see if you can do something similar where the file can be divided into pages where you keep the current page on RAM in structs and only put it on the ofstream when the user moves to a different page.
+
> and also freely seek into it?
that will almost certainly cause automatic flushes
so the only way for me is to create the full buffer with all the data first and then write it to the file at the end, but I'm not sure how to do that. Maybe create a vector of chars, append each piece of my data to it, and save it to the file at the end? I mean, I need to be able to move around in the file to update the offsets
+
btw thank you for trying to help me
⚛ Hz
you can reserve space if you don't want to let it allocate memory when appending
⚛ Hz
https://en.cppreference.com/w/cpp/string/basic_string/reserve
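A rough sketch of the "build everything in memory, fix up offsets, write once" approach discussed above. The record layout (a 4-byte little-endian offset field followed by a payload) and all function names are invented for illustration; the real file format will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Append a 32-bit value in little-endian order; return where it was written.
std::size_t append_u32(std::vector<unsigned char>& buf, std::uint32_t v) {
    std::size_t at = buf.size();
    for (int shift = 0; shift < 32; shift += 8)
        buf.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
    return at;
}

// Overwrite 4 bytes that were appended earlier (this is the "seek back" step).
void patch_u32(std::vector<unsigned char>& buf, std::size_t at, std::uint32_t v) {
    assert(at + 4 <= buf.size());
    for (std::size_t i = 0; i < 4; ++i)
        buf[at + i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
}

// Hypothetical layout: [u32 offset of payload][payload bytes].
std::vector<unsigned char> build_record(const std::string& payload) {
    std::vector<unsigned char> buf;
    buf.reserve(4 + payload.size());                  // avoid reallocation while appending
    std::size_t offset_field = append_u32(buf, 0);    // placeholder, patched below
    std::uint32_t payload_offset = static_cast<std::uint32_t>(buf.size());
    buf.insert(buf.end(), payload.begin(), payload.end());
    patch_u32(buf, offset_field, payload_offset);     // now the real offset is known
    return buf;
}

// Write the finished buffer with a single write() call.
bool write_all(const std::string& path, const std::vector<unsigned char>& buf) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(buf.data()),
              static_cast<std::streamsize>(buf.size()));
    return static_cast<bool>(out);
}
```

Because the offsets are patched by index into the vector, no seeking on the stream is needed at all.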
⚛ Hz
and if you have to read the whole file, you don't need to seek any more
⚛ Hz
tip: memory mapped file cannot be easily extended
⚛ Hz
it has to be fixed size
Anonymous
it has to be fixed size
You can resize shared memory
⚛ Hz
no, it won't work for a file mapped with mmap / MapViewOfFile. The only way to resize is to destroy the old map, resize the file, and re-map
⚛ Hz
Especially the situation here is the whole file, which means you can't use mremap
+
Your use case shouldnt be using fstream at all. You should instead be using memory mapped files for this.
I don't think so. I need to first read the data with ifstream, and then create a new file with some changes, written to another file with ofstream
+
std::string is basically a special std::vector(it is just a char container)
about using std::string: I need to use other data types like uint32_t too, and I don't know if I can store that type of data in it as well
⚛ Hz
* a portable way is to repeatedly apply & and >> to split the value into individual bytes (casting to a byte array may cause byte order problems)
⚛ Hz
or simply
    uint32_t value;
    char result[4];
    result[0] = (value & 0x000000ff);
    result[1] = (value & 0x0000ff00) >> 8;
    result[2] = (value & 0x00ff0000) >> 16;
    result[3] = (value & 0xff000000) >> 24;
Anonymous
about using std::string I need to use another data type like uint32_t too I dont know if I can store this type of data on that too
you only need to store things as bytes on the disk. when your data is in RAM you can use a struct that contains uint32_t, an std::variant<uint32_t, something something>, or an std::any or just use a uint32_t itself.
Anonymous
no, it won't work for mmap for file / MapViewOfFile the only way to resize is destory old map, resize file, and re-map
The point was to create a memory map larger than the size of the file and store the contents there along with the edits. At the time of writing, call ftruncate on the file (after determining the required size) and store the contents of the memory back into the file.
⚛ Hz
The point was to create a memory map larger than the size of the file and store the contents there along with the edits. At the time of writing, call ftruncate on the file (after determining the required size) and store the contents of the memory back into the file.
it is unspecified behavior. From the Linux mmap(2) man page: "A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."
Anonymous
I dont think so, I need to first read data with ifstream, and create new file with some change to another file but ofstream
If you are going to write to a different file, then it is an even easier case. Using fstream for such cases is not efficient.
Anonymous
it is unspecified behavior ( A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
Create an anonymous mapped region. If you are going to map a file into memory then yes, you can't change the size. I was saying: read the contents of the file into a shared memory region bigger than the file length (which can be determined using fstat) and write the contents of this shared memory region into another file.
⚛ Hz
wait, is that any different from using std::string??
Anonymous
wait, does it has any difference as use std::string ??
It does. It is not MT safe while memory mapped regions can be made MT safe
⚛ Hz
hmmm, I don't think that this issue needs to be considered....The original question did not even mention to use it in a different thread. and it is not mt-safe when you want to resize the shared memory
Anonymous
hmmm, I don't think that this issue needs to be considered....The original question did not even mention to use it in a different thread. and it is not mt-safe when you want to resize the shared memory
Doesnt hurt to be safe. The general idea is to not resize shared memory if possible. If you cant know the changes or how big the new file will be then resizing memory is still a better option than resizing string with changes in between.
⚛ Hz
but you can use std::string::reserve to reserve space
Anonymous
but you can use std::string::reserve to reserve space
You can also create an anonymous memory mapping with the size you require
⚛ Hz
it has no difference any way (according to the c++ memory model, dynamically allocated memory can be accessed by multiple threads by default)
Anonymous
it has no difference any way (according to the c++ memory model, dynamically allocated memory can be accessed by multiple threads by default)
Yes but the memory is not meant to be shared across threads in the case of a string though it uses the heap
⚛ Hz
unless you decide to override the global memory allocator, all memory allocated by STL containers should be globally accessible
Anonymous
unless you decide to override the global memory allocator, all memory allocated by stl container should be global accessable
It should be as long as you dont use any of the string member functions which defeats the whole point. If someone somewhere uses the string member functions in any way such access will lead to UB
⚛ Hz
You can always use a pre-allocated string as a general byte buffer: keep a std::string buffer and a std::mutex mtx, and only lock the mutex when you need to resize (operator[], c_str() and data() should be safe since the space has already been reserved, so resizing causes no additional allocation, which means the address won't change). But it sounds crazy to modify the size from multiple threads; how can you ensure that other data structures have not been destroyed before that? Anyway, it should be no less safe than using raw system-provided functions like mmap, since you need to read the whole documentation to use them correctly
Anonymous
you can always use pre-allocated string as general byte buffer [...]
1. Create an anonymous mapping with the required size (it may be greater than the size of the file to account for edits).
2. Read the file into this memory.
3. Edit the memory.
4. Write the changes into a new file.
This should be the right way for multiple reasons:
A) It is more efficient because you interface directly with the OS and you know what it is doing, rather than using an STL data structure which does so many things behind your back.
B) It can be made thread safe much more easily than strings. I am not sure how you are going to enforce locking the mutex when calling any function that requires a resize, except by locking on every non-const operation on the string, which makes it even more inefficient. With memory mapped regions, you can easily set up a guard region at the end of the mapped region and, only when writes flow into this region, lock a mutex and call mremap with the MREMAP_MAYMOVE flag, which ensures that only the pointers into the region are invalidated while offsets remain valid even after the move. Typically with file operations you will mostly be working with offsets, which makes it easier.
C) Strings are supposed to represent characters and not bytes of memory, and hence must not be exploited for such purposes.
D) With a memory mapped region, any attempt to write beyond the region can be tracked and made to fault (with additional system calls), but with strings it may silently lead to Undefined Behavior.
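The four steps above can be sketched roughly as follows, assuming a POSIX system (Linux or macOS). The function name `copy_with_suffix` is hypothetical, the "edit" is simply appending a suffix, and the sketch uses a private anonymous mapping rather than shared memory. It is single threaded and does not retry on `EINTR`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: load `src` into an anonymous mapping with spare room,
// append `suffix` in memory, then write the result to `dst`.
bool copy_with_suffix(const char* src, const char* dst, const char* suffix) {
    int in = open(src, O_RDONLY);
    if (in < 0) return false;
    struct stat st;
    if (fstat(in, &st) != 0) { close(in); return false; }

    const std::size_t file_len = static_cast<std::size_t>(st.st_size);
    const std::size_t suffix_len = std::strlen(suffix);
    const std::size_t total = file_len + suffix_len;
    const std::size_t map_len = total > 0 ? total : 1;  // mmap rejects length 0

    // 1. Anonymous mapping sized for the file plus the planned edit.
    void* mem = mmap(nullptr, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { close(in); return false; }
    char* bytes = static_cast<char*>(mem);

    // 2. Read the whole file into the mapping.
    bool ok = true;
    std::size_t got = 0;
    while (got < file_len) {
        ssize_t n = read(in, bytes + got, file_len - got);
        if (n <= 0) { ok = false; break; }
        got += static_cast<std::size_t>(n);
    }
    close(in);

    // 3. Edit in memory (here: append the suffix after the file contents).
    if (ok) std::memcpy(bytes + file_len, suffix, suffix_len);

    // 4. Write the edited bytes to the new file.
    if (ok) {
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        ok = out >= 0;
        std::size_t put = 0;
        while (ok && put < total) {
            ssize_t n = write(out, bytes + put, total - put);
            if (n <= 0) ok = false;
            else put += static_cast<std::size_t>(n);
        }
        if (out >= 0 && close(out) != 0) ok = false;
    }

    munmap(mem, map_len);
    return ok;
}
```

For a single-threaded tool, a `std::vector<char>` sized the same way would behave similarly from the caller's point of view; the mapping mainly pays off if guard pages or remapping are needed.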
⚛ Hz
1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) [...]
(If you consider UB, all system interfaces are UB in standard C++.) Yes, that is a downside of string/mutex: since C++ doesn't have a Rust-like ownership system, it is all the programmer's responsibility. And char is not byte, yes; that is a historical issue. The defects of using the system interface are more serious in my opinion: it is not portable. It is not even just about Linux / Windows; how about macOS? Do you know all the differences between macOS and Linux? (Example: nginx misused SO_REUSEPORT/SO_REUSEADDR, so it did not work on macOS in some versions.) After all, I don't think such a simple task needs a complex and platform-dependent solution. It may be a good program, but it is definitely a bad design choice... Do you really need to write to a single file from multiple threads? If that is a serious requirement, then there are definitely more problems to consider
+
1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) [...]
I fully agree with reason C. I don't actually know how I can convert data types like uint16_t, store them in a std::string, and at the end write them as chars to the file
+
use that
oh I dont see it
⚛ Hz
cutting it down to uint16 is trivial
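One way the shift-and-mask approach might be generalized to `uint16_t`, `uint32_t`, and similar types when the buffer is a `std::string`. The names `append_le` and `read_le` are made up for this sketch, and it assumes the file format is little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Append an unsigned integer to `out`, least significant byte first.
template <typename T>
void append_le(std::string& out, T value) {
    static_assert(std::is_unsigned<T>::value, "use unsigned integer types");
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFFu));
}

// Read an unsigned integer back from `in` starting at byte offset `at`.
template <typename T>
T read_le(const std::string& in, std::size_t at) {
    static_assert(std::is_unsigned<T>::value, "use unsigned integer types");
    assert(at + sizeof(T) <= in.size());
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        T byte = static_cast<T>(static_cast<unsigned char>(in[at + i]));
        value = static_cast<T>(value | (byte << (8 * i)));
    }
    return value;
}
```

For example, `append_le<std::uint16_t>(buf, 0xBEEF)` adds the two bytes `EF BE` to `buf`, and `read_le<std::uint16_t>(buf, 0)` recovers `0xBEEF` regardless of the host's byte order.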
Anonymous
(if you consider ub, all system interface are ub in standard c++) [...]
If you are going to complain about portability when he didn't even mention that he wants his application to be portable, why is it so wrong to expect a program to be thread safe? His question was about efficiency, and the option I gave was more efficient and safer. Why are system calls UB in C++? Lol, what does that even mean? Even the C++ runtime relies on system calls. It is not portable, but making it portable is easy because all the major operating systems support creating a shared memory region, extending it, and writing the contents of a shared memory region into a file. It will be slightly more work to make it portable across all OSes, but the benefits are that 1. it will be much more efficient, 2. it will be safer to use, and 3. it is the right thing to do. And using a string for this is the bad design choice, as you are using something for a purpose it was not intended for. It is just a hack, even forgetting for a while all the other disadvantages I mentioned.
Anonymous
@you_Know_it0 you learning C for hacking?
Anonymous
some people learn C for ethical hacking
+
@you_Know_it0 you learning C for hacking?
no, Im just working with a file in binary
Anonymous
ah okey
+
some people learn C for ethical hacking
no, a single thread should be OK for this job
⚛ Hz
That's the point: no multiple threads means no need for that complexity
Anonymous
1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) [...]
Points A, C and D here apply even in the case of a single threaded application.
⚛ Hz
If you are really concerned about safety, you may want to handle the case where the process is killed accidentally and only half of the file has been written
Anonymous
If you really concerned about safety, you may want to handle process killed by accidentally, and only half of the file have been written
Yes you should if it is a problem you need to handle. Why do you think that you shouldnt? And you still have not answered what you meant by "all system calls are UB in c++"
+
I found this small project https://github.com/m-byte918/Binary-Reader-Writer/blob/master/Buffer.cpp it stores all the data in a vector
⚛ Hz
The std interface is the only thing you can rely on (if you need the behavior to be defined by the standard)
Anonymous
I found this small project https://github.com/m-byte918/Binary-Reader-Writer/blob/master/Buffer.cpp its store all the data in a vector
Yes, this seems good, and he even handles little-endian and big-endian issues, making it possible to deserialize data that was serialized on a system with a different endianness. This buffer doesn't support edits and writes in the middle, btw. Again, if it is efficiency that you seek, then memory mapping may be your best option.
Anonymous
if it has been wrapped in std, it has been defined
Do you even know what Undefined Behavior means?
Anonymous
Read about what the C++ standard defines as Undefined Behavior here. And no, system calls are not UB according to the standard as long as you call them the right way.
⚛ Hz
Ok, that's my fault, I confused it with Rust's unsafe
Anonymous
Ok, that's my fault, I'm confused with rust's unsafe
Unsafe in Rust is just a way of asking the rust compiler to not bother about borrow checkers and immutability issues and to allow pointers and other stuff
⚛ Hz
Unsafe in Rust is just a way of asking the rust compiler to not bother about borrow checkers and immutability issues and to allow pointers and other stuff
(no, an unsafe block still enforces the borrow checker and immutability; it only allows you to do things like call unsafe functions and dereference raw pointers)
Anonymous
(no, unsafe block also enforce the borrow checker and immutable, it only allow you to do something like call unsafe function, dereferencing pointer
There are ways to overcome the borrow checker in unsafe code using aliased pointers that the Rust compiler cant/wont catch.