@programminginc - page 3147

⚛ Hz

As I said, I have some files (max size 40 MB). What I want to do is open the files and make some changes (the new data can be bigger or smaller than the original). To add a change I need to seek many times in the data: first create a string buffer (by converting a std::string to char with c_str()), then save the size of the string before it, then get the size of the buffer, and after that append both of them to another buffer.

I suggest using a proper database for that... (>1MB)

⚛ Hz

like sqlite3

⚛ Hz

https://sqlite.org/appfileformat.html

⚛ Hz

But as a beginner, it is understandable to implement a crappy database of your own

+

I suggest using a proper database for that... (>1MB)

The file is not mine... my employer gave it to me and wants me to parse it.

Anonymous

Thank you, I will check it. The problem with writing directly with ofstream is that I need to write a few bytes at a time, many times over. If I write all of that directly to the file, the program starts to slow down, so I first need to finish building the changed data and then write it to the file in one step with ofstream.

Writing itself doesn't put the data in the file; you need to either flush the writes or set up automatic flushing. That being said, I think if you are writing something like a word processor, the current context or page of the document should be completely in RAM in separate structs, not in the file's to-be-written buffer.
> so first I need to finish creating new changed data and then write the changed data to file in one step with ofstream
Do exactly this.

Anonymous

Writing itself doesn't put the data in the file; you need to either flush the writes or set up automatic flushing. That being said, I think if you are writing something like a word processor, the current context or page of the document should be completely in RAM in separate structs, not in the file's to-be-written buffer.
> so first I need to finish creating new changed data and then write the changed data to file in one step with ofstream
Do exactly this.

You don't need to explicitly flush the writes. The C++ runtime will work with the OS to flush the buffer at some point you can't predict. Unless you have a reason to flush the buffer yourself, you shouldn't.

Anonymous

You don't need to explicitly flush the writes. The C++ runtime will work with the OS to flush the buffer at some point you can't predict. Unless you have a reason to flush the buffer yourself, you shouldn't.

he was talking about doing multiple small writes

Anonymous

i was just saying that they won't impact performance because not every write will be an actual disk write

Anonymous

he was talking about doing multiple small writes

I don't know the entire context, but it is especially important that you don't flush the buffer yourself when you are doing multiple small writes.

+

i was just saying that they won't impact performance because not every write will be an actual disk write

So if that's the case, I can simply use ofstream directly, add my data to it, and also seek freely in it? And it doesn't impact performance?

⚛ Hz

(unless your program is killed accidentally)

⚛ Hz

* you may want to write to a temporary file, and swap them after close
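A minimal sketch of the temporary-file idea above, assuming C++17 and std::filesystem. The function name write_via_temp_file and the ".tmp" suffix are made up for illustration; the point is only that the original file is replaced after the new data has been fully written and the stream closed.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Writes `data` to "<target>.tmp" (hypothetical naming scheme), then renames
// the temporary file over `target` once the stream has closed successfully.
void write_via_temp_file(const fs::path& target, const std::string& data) {
    assert(!target.empty());
    fs::path tmp = target;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) throw std::runtime_error("cannot open temporary file");
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.close();  // hands the stream's buffered bytes to the OS
        if (!out) throw std::runtime_error("write to temporary file failed");
    }
    fs::rename(tmp, target);  // replaces `target` if it already exists
}
```

If the program is killed before the rename, the old file is still intact and only the temporary file is incomplete.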

Anonymous

So if that's the case, I can simply use ofstream directly, add my data to it, and also seek freely in it? And it doesn't impact performance?

> and also freely seek into it? that will almost certainly cause automatic flushes

Anonymous

So if that's the case, I can simply use ofstream directly, add my data to it, and also seek freely in it? And it doesn't impact performance?

what type of program is it?

Anonymous

i gave an example for a word processor. see if you can do something similar where the file can be divided into pages where you keep the current page on RAM in structs and only put it on the ofstream when the user moves to a different page.

+

> and also freely seek into it? that will almost certainly cause automatic flushes

So the only way for me is to build the full buffer with all the data first and then write it to the file at the end, but I'm not sure how to do that. Maybe create a vector of chars, append each piece of data to it, and save it to the file at the end? I mean, I need to be able to move around in the file to update the offsets.

+

i gave an example for a word processor. see if you can do something similar where the file can be divided into pages where you keep the current page on RAM in structs and only put it on the ofstream when the user moves to a different page.

Yes, I understand what you mean, but I can't see how to create this buffer with a dynamic size.

+

btw thank you for trying to help me

Anonymous

So the only way for me is to build the full buffer with all the data first and then write it to the file at the end, but I'm not sure how to do that. Maybe create a vector of chars, append each piece of data to it, and save it to the file at the end? I mean, I need to be able to move around in the file to update the offsets.

it can be a vector of any standard layout type
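A rough sketch of the "build everything in a vector, write once" approach discussed here. The layout is an assumption made for illustration (each string preceded by a 4-byte little-endian length, plus an offset field that gets patched later), and all function names are hypothetical. Because the whole buffer is in memory, "seeking" is just indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Overwrites 4 bytes at `pos` with `value`, least significant byte first.
void patch_u32_le(Bytes& buf, std::size_t pos, std::uint32_t value) {
    assert(pos + 4 <= buf.size());
    for (std::size_t i = 0; i < 4; ++i)
        buf[pos + i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
}

// Appends a 4-byte little-endian value and returns the offset it was written at.
std::size_t append_u32_le(Bytes& buf, std::uint32_t value) {
    const std::size_t pos = buf.size();
    buf.resize(pos + 4);
    patch_u32_le(buf, pos, value);
    return pos;
}

// Appends a string preceded by its 4-byte length (assumed format).
void append_sized_string(Bytes& buf, const std::string& s) {
    append_u32_le(buf, static_cast<std::uint32_t>(s.size()));
    buf.insert(buf.end(), s.begin(), s.end());
}

// Writes the finished buffer to disk with a single write call.
bool save(const Bytes& buf, const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(buf.data()),
              static_cast<std::streamsize>(buf.size()));
    return static_cast<bool>(out);
}
```

A typical use would be to append a placeholder with append_u32_le, remember the returned offset, append the payload, and then call patch_u32_le with the final size before calling save.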

⚛ Hz

So the only way for me is to build the full buffer with all the data first and then write it to the file at the end, but I'm not sure how to do that. Maybe create a vector of chars, append each piece of data to it, and save it to the file at the end? I mean, I need to be able to move around in the file to update the offsets.

std::string is basically a special std::vector(it is just a char container)

⚛ Hz

you can reserve space if you don't want to let it allocate memory when appending

⚛ Hz

https://en.cppreference.com/w/cpp/string/basic_string/reserve

⚛ Hz

and if you have to read the whole file, you don't need to seek any more
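As an illustration of reading the whole file into a std::string with reserved space, here is a small sketch. The function name and the `headroom` parameter are invented for this example; `headroom` is simply how many extra bytes to reserve for later edits.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Reads the entire file into a std::string used as a byte container.
std::string read_whole_file(const std::string& path, std::size_t headroom = 0) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) throw std::runtime_error("cannot open " + path);
    const auto size = static_cast<std::size_t>(in.tellg());
    std::string data;
    data.reserve(size + headroom);  // one allocation covering the file plus planned growth
    data.resize(size);
    in.seekg(0);
    in.read(&data[0], static_cast<std::streamsize>(size));
    if (!in) throw std::runtime_error("short read on " + path);
    return data;
}
```

After this call, every "seek" is an index into the string, and the file itself is not touched again until the final write.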

Anonymous

As I said, I have some files (max size 40 MB). What I want to do is open the files and make some changes (the new data can be bigger or smaller than the original). To add a change I need to seek many times in the data: first create a string buffer (by converting a std::string to char with c_str()), then save the size of the string before it, then get the size of the buffer, and after that append both of them to another buffer.

Your use case shouldn't be using fstream at all. You should instead be using memory-mapped files for this.

⚛ Hz

tip: memory mapped file cannot be easily extended

⚛ Hz

it has to be fixed size

Anonymous

it has to be fixed size

You can resize shared memory

⚛ Hz

No, that won't work for a file mapping (mmap / MapViewOfFile). The only way to resize is to destroy the old map, resize the file, and re-map.

⚛ Hz

Especially the situation here is the whole file, which means you can't use mremap

+

Your use case shouldn't be using fstream at all. You should instead be using memory-mapped files for this.

I don't think so. I need to first read the data with ifstream, and then write it with some changes to another, new file with ofstream.

+

std::string is basically a special std::vector(it is just a char container)

About using std::string: I also need to use other data types like uint32_t, and I don't know if I can store that kind of data in it too.

⚛ Hz

* a portable way is to repeatedly apply & and >> to split the value into individual bytes (casting to a byte array may cause byte-order problems)

⚛ Hz

or simply uint32_t value; char result[4]; result[0] = (value & 0x000000ff); result[1] = (value & 0x0000ff00) >> 8; result[2] = (value & 0x00ff0000) >> 16; result[3] = (value & 0xff000000) >> 24;
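The shift-and-mask snippet above can be wrapped into small helpers that append to a std::string and read the value back. This is only a sketch: the names append_le and read_le and the `width` parameter (1 to 4 bytes, so 2 gives the uint16_t case) are made up here. Bytes are cast through unsigned char when reading so that values above 0x7F are not sign-extended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the low `width` bytes of `value`, least significant byte first.
void append_le(std::string& out, std::uint32_t value, int width) {
    assert(width >= 1 && width <= 4);
    for (int i = 0; i < width; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Reads `width` little-endian bytes starting at `pos` and reassembles the value.
std::uint32_t read_le(const std::string& in, std::size_t pos, int width) {
    assert(width >= 1 && width <= 4);
    assert(pos + static_cast<std::size_t>(width) <= in.size());
    std::uint32_t value = 0;
    for (int i = 0; i < width; ++i) {
        const auto byte = static_cast<unsigned char>(in[pos + static_cast<std::size_t>(i)]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}
```

Because the byte order is fixed by the shifts rather than by the machine's memory layout, the resulting file reads the same on little-endian and big-endian hosts.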

Anonymous

About using std::string: I also need to use other data types like uint32_t, and I don't know if I can store that kind of data in it too.

you only need to store things as bytes on the disk. when your data is in RAM you can use a struct that contains uint32_t, an std::variant<uint32_t, something something>, or an std::any or just use a uint32_t itself.

Anonymous

No, that won't work for a file mapping (mmap / MapViewOfFile). The only way to resize is to destroy the old map, resize the file, and re-map.

The point was to create a memory map larger than the file and store the contents there along with the edits. At the time of writing, call ftruncate on the file (after determining the required size) and store the contents of the memory back into the file.

⚛ Hz

The point was to create a memory map larger than the file and store the contents there along with the edits. At the time of writing, call ftruncate on the file (after determining the required size) and store the contents of the memory back into the file.

It is unspecified behavior: "A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."

Anonymous

I don't think so. I need to first read the data with ifstream, and then write it with some changes to another, new file with ofstream.

If you are going to write to a different file, then it is an even easier case. Using fstream for such cases is not efficient.

Anonymous

It is unspecified behavior: "A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."

Create an anonymous mapped region. If you are going to map a file into memory then yes, you can't change the size. I was saying: read the contents of the file into a shared memory region bigger than the file length (which can be determined using fstat) and write the contents of this shared memory region into another file.

⚛ Hz

Wait, does that make any difference compared to using std::string?

Anonymous

Wait, does that make any difference compared to using std::string?

It does. std::string is not MT-safe, while memory-mapped regions can be made MT-safe.

⚛ Hz

Hmm, I don't think this issue needs to be considered... The original question did not even mention using it from a different thread. And it is not MT-safe when you want to resize the shared memory.

Anonymous

Hmm, I don't think this issue needs to be considered... The original question did not even mention using it from a different thread. And it is not MT-safe when you want to resize the shared memory.

Doesn't hurt to be safe. The general idea is to not resize shared memory if possible. If you can't know the changes or how big the new file will be, then resizing memory is still a better option than resizing a string with changes in between.

⚛ Hz

but you can use std::string::reserve to reserve space

+

Create an anonymous mapped region. If you are going to map a file into memory then yes, you can't change the size. I was saying: read the contents of the file into a shared memory region bigger than the file length (which can be determined using fstat) and write the contents of this shared memory region into another file.

But I'm not sure exactly how I can do this. I have never worked with shared memory and I think it's a little too much for this job; in the end the file itself is not so big.

Anonymous

but you can use std::string::reserve to reserve space

You can also create an anonymous memory mapping with the size you require

Anonymous

But I'm not sure exactly how I can do this. I have never worked with shared memory and I think it's a little too much for this job; in the end the file itself is not so big.

Not knowing how to do something is not a reason for not doing it if it is the right thing to do

⚛ Hz

It makes no difference anyway (according to the C++ memory model, dynamically allocated memory can be accessed by multiple threads by default).

Anonymous

It makes no difference anyway (according to the C++ memory model, dynamically allocated memory can be accessed by multiple threads by default).

Yes but the memory is not meant to be shared across threads in the case of a string though it uses the heap

⚛ Hz

Unless you decide to override the global memory allocator, all memory allocated by STL containers should be globally accessible.

Anonymous

Unless you decide to override the global memory allocator, all memory allocated by STL containers should be globally accessible.

It should be, as long as you don't use any of the string member functions, which defeats the whole point. If someone somewhere uses the string member functions in any way, such access will lead to UB.

⚛ Hz

You can always use a pre-allocated string as a general byte buffer: declare std::string buffer; std::mutex mtx; and only lock when you need to resize. (operator[], c_str() and data() should be safe since the space has already been reserved, so resizing needs no additional allocation, which means the address won't change.) But it sounds crazy to modify the size from multiple threads; how can you ensure that other data structures have not been destroyed before this? Anyway, it should be no less safe than using raw system functions like mmap, since you need to read their whole documentation to use them correctly.

Anonymous

You can always use a pre-allocated string as a general byte buffer: declare std::string buffer; std::mutex mtx; and only lock when you need to resize. (operator[], c_str() and data() should be safe since the space has already been reserved, so resizing needs no additional allocation, which means the address won't change.) But it sounds crazy to modify the size from multiple threads; how can you ensure that other data structures have not been destroyed before this? Anyway, it should be no less safe than using raw system functions like mmap, since you need to read their whole documentation to use them correctly.

1. Create an anonymous mapping with the required size (may be greater than the size of the file to account for edits)
2. Read the file into this memory
3. Edit the memory
4. Write the changes into a new file
This should be the right way for multiple reasons:
A) It is more efficient because you interface directly with the OS and you know what it is doing, rather than using an STL data structure which does so many things behind your back.
B) It can be made thread safe much more easily than strings. I am not sure how you are going to enforce locking the mutex when calling any function that requires a resize, other than locking on every non-const operation on the string, which makes it even more inefficient. With memory-mapped regions, you can easily set up a guard region at the end of the mapped memory and, only when writes flow into this region, lock a mutex and call mremap with the MREMAP_MAYMOVE flag, which ensures that only pointers into the region are invalidated while offsets remain valid even after the move. Typically with file operations you will mostly be working with offsets, which makes it easier.
C) Strings are supposed to represent characters and not bytes of memory, and hence must not be exploited for such purposes.
D) With a memory-mapped region, any attempt to write beyond the region can be tracked and made to fault (with additional system calls), but with strings it may silently lead to Undefined Behavior.
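Steps 1 to 4 above might look roughly like the sketch below. It assumes a POSIX system (mmap from <sys/mman.h>); on Windows, which turns out to be the target platform later in the thread, a different API such as VirtualAlloc would be needed. The function name copy_with_insert is hypothetical, and the single "edit" shown (inserting some bytes at one offset) is only a stand-in for whatever changes are actually required.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <sys/mman.h>  // POSIX only

// Copies `src` to `dst`, inserting `text` at byte offset `at`.
// Returns false on any I/O or mapping failure.
bool copy_with_insert(const std::string& src, const std::string& dst,
                      std::size_t at, const std::string& text) {
    std::ifstream in(src, std::ios::binary | std::ios::ate);
    if (!in) return false;
    const auto size = static_cast<std::size_t>(in.tellg());
    assert(at <= size);
    in.seekg(0);

    // 1. Anonymous mapping sized for the file plus the edit.
    const std::size_t capacity = size + text.size() + 1;  // +1 avoids a zero-length map
    void* p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    char* mem = static_cast<char*>(p);

    // 2. Read the file into the region.
    in.read(mem, static_cast<std::streamsize>(size));
    bool ok = static_cast<bool>(in);

    // 3. Edit in memory: shift the tail right, then copy the new bytes in.
    std::memmove(mem + at + text.size(), mem + at, size - at);
    std::memcpy(mem + at, text.data(), text.size());

    // 4. Write the result to the new file in one call.
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    out.write(mem, static_cast<std::streamsize>(size + text.size()));
    out.close();
    ok = ok && static_cast<bool>(out);

    munmap(p, capacity);
    return ok;
}
```

The guard-region and mremap parts of point B are not shown; this version simply sizes the mapping up front from the known edit.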

⚛ Hz

1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) 2. Read file into this memory 3. Edit the memory 4. Write changes into new file. This should be the right way for multiple reasons A) It is more efficient because you directly interface with the OS and you know what it is doing rather than using an STL data structure which does so many things behind your back. B) It can be made thread safe much easily when compared to strings. I am not sure how you are going to enforce the lock the mutex when calling any functions that require a resize except for to call lock on any non const operation on the string which makes it even more inefficient. With memory mapped regions, you can easily set up a guard region at the end of the mapped memory region and only when writes flow into this region, lock a mutex and call remap with MAY_MOVE option which will ensure only the pointers into the region will be invalidated but offsets will remain valid even after the move. Typically with file operations you will mostly be working with offsets which makes it easier. C) strings are supposed to represent characters and not bytes of memory and hence must not be exploited for such purposes. D) With memory mapped region any attempt to write beyond the memory region can be tracked and made to fault (with additional system calls) but with strings it may silently lead to Undefined Behavior if you do so

(If you count UB, all system interfaces are UB in standard C++.) Yes, that is a downside of string/mutex: since C++ doesn't have a Rust-like ownership system, it's all the programmer's responsibility. And char is not byte, yes; that is a historical issue. The defects of using the system interface are more serious in my opinion: it is not portable. It is not even just about Linux vs. Windows; what about macOS? Do you know all the differences between macOS and Linux? (Example: nginx misused SO_REUSEPORT/SO_REUSEADDR, so it did not work on macOS in some versions.) After all, I don't think such a simple task needs a complex, platform-dependent solution. It may be a good program, but it is definitely a bad design choice... Do you really need to write to a single file from multiple threads? If that is a serious requirement, then there are definitely more problems to consider.

+

1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) 2. Read file into this memory 3. Edit the memory 4. Write changes into new file. This should be the right way for multiple reasons A) It is more efficient because you directly interface with the OS and you know what it is doing rather than using an STL data structure which does so many things behind your back. B) It can be made thread safe much easily when compared to strings. I am not sure how you are going to enforce the lock the mutex when calling any functions that require a resize except for to call lock on any non const operation on the string which makes it even more inefficient. With memory mapped regions, you can easily set up a guard region at the end of the mapped memory region and only when writes flow into this region, lock a mutex and call remap with MAY_MOVE option which will ensure only the pointers into the region will be invalidated but offsets will remain valid even after the move. Typically with file operations you will mostly be working with offsets which makes it easier. C) strings are supposed to represent characters and not bytes of memory and hence must not be exploited for such purposes. D) With memory mapped region any attempt to write beyond the memory region can be tracked and made to fault (with additional system calls) but with strings it may silently lead to Undefined Behavior if you do so

I fully agree with reason C. I don't actually know how to convert a data type like uint16_t, store it in a std::string, and at the end write it to the file as char.

⚛ Hz

or simply uint32_t value; char result[4]; result[0] = (value & 0x000000ff); result[1] = (value & 0x0000ff00) >> 8; result[2] = (value & 0x00ff0000) >> 16; result[3] = (value & 0xff000000) >> 24;

use that

+

use that

Oh, I didn't see it.

⚛ Hz

cutting it down to uint16 is trivial

Anonymous

(if you consider ub, all system interface are ub in standard c++) yes, it is downside to string/mutex thanks to c++ don't have rust-like ownship system, it's all about the programmer's responsibility and the char is not byte, yes, this is a historical issue. The defects of using the system interface are more serious in my opinion: it is not portable. it is not even about linux / windows, how about macos? did you know about the all differences between macos and linux? (example: nginx misuse the SO_REUSEPORT/SO_REUSEADDR so it not woring in macos in some version) after all, I don't think such a simple task need to use a complex and platform-dependent solution, it maybe a good program, but it is definitely a bad design choose... do you really need to write to a single file in multiple threads? If this is a serious demand, then there are definitely more problems need to consider

If you are going to bitch about portability when he didn't even mention that he wants his application to be portable, why is it so wrong to expect a program to be thread safe? His question was about efficiency, and the option I gave was more efficient and safer. Why are system calls UB in C++? Lol, what does that even mean? Even the C++ runtime relies on system calls. It is not portable, but making it portable is easy because all the major operating systems support creating a shared memory region, extending it, and writing the contents of a shared memory region into a file. It will be slightly more work to make it portable across all OSes, but the benefits are that:
1. It will be much more efficient
2. It will be safer to use
3. It is the right thing to do
And using a string for this is the bad design choice, as you are using something for a purpose it was not intended for. That is just a hack, leaving aside for a moment all the other disadvantages I mentioned.

+

If you are going to bitch about portability when he didnt even mention that he wants his application to be portable why is it so wrong to expect a program to be thread safe? His question was efficiency and the option I gave was more efficient and safer. Why are system calls UB in C++? Lol what does that even mean? Even the C++RT relies on system calls. It is not portable. But making it portable is easy because all the major operating systems support creating a shared memory region, extending it and writing the contents of a shared memory region into a file. It will be slightly more work to make it portable across all OSes but the benefit are that 1. It will be much more efficient 2. It will be safer to use 3. It is the right thing to do And using a string for this is what is a bad design choice as you are using something for purposes that it was not intended for. This is just a hack forgetting for a while all the other disadvantages I mentioned.

My target platform is just Windows, so that's not a problem. First I need to learn how to work with shared memory.

Anonymous

@you_Know_it0 you learning C for hacking?

⚛ Hz

My target platform is just Windows, so that's not a problem. First I need to learn how to work with shared memory.

So do you need to operate in different threads?

Anonymous

some people learn C for ethical hacking

+

@you_Know_it0 you learning C for hacking?

no, Im just working with a file in binary

Anonymous

ah okey

+

some people learn C for ethical hacking

No, a single thread should be OK for this job.

⚛ Hz

That's the point: no multiple threads means no need for that complexity.

Anonymous

1. Create Anonymous mapping with required size (may be greater than the size of the file to account for edits) 2. Read file into this memory 3. Edit the memory 4. Write changes into new file. This should be the right way for multiple reasons A) It is more efficient because you directly interface with the OS and you know what it is doing rather than using an STL data structure which does so many things behind your back. B) It can be made thread safe much easily when compared to strings. I am not sure how you are going to enforce the lock the mutex when calling any functions that require a resize except for to call lock on any non const operation on the string which makes it even more inefficient. With memory mapped regions, you can easily set up a guard region at the end of the mapped memory region and only when writes flow into this region, lock a mutex and call remap with MAY_MOVE option which will ensure only the pointers into the region will be invalidated but offsets will remain valid even after the move. Typically with file operations you will mostly be working with offsets which makes it easier. C) strings are supposed to represent characters and not bytes of memory and hence must not be exploited for such purposes. D) With memory mapped region any attempt to write beyond the memory region can be tracked and made to fault (with additional system calls) but with strings it may silently lead to Undefined Behavior if you do so

Points A, C and D here apply even in the case of a single threaded application.

⚛ Hz

If you are really concerned about safety, you may want to handle the process being killed accidentally when only half of the file has been written.

Anonymous

If you are really concerned about safety, you may want to handle the process being killed accidentally when only half of the file has been written.

Yes, you should, if it is a problem you need to handle. Why do you think that you shouldn't? And you still have not answered what you meant by "all system calls are UB in c++".

+

I found this small project https://github.com/m-byte918/Binary-Reader-Writer/blob/master/Buffer.cpp and it stores all the data in a vector.

⚛ Hz

Yes, you should, if it is a problem you need to handle. Why do you think that you shouldn't? And you still have not answered what you meant by "all system calls are UB in c++".

if it has been wrapped in std, it has been defined

⚛ Hz

The std interface is the only thing you can rely on (if you need the behavior to be defined by the standard).

Anonymous

I found this small project https://github.com/m-byte918/Binary-Reader-Writer/blob/master/Buffer.cpp and it stores all the data in a vector.

Yes, this seems good, and he even handles little-endian and big-endian issues, making it possible to deserialize data that was serialized on a system with a different endianness. This buffer doesn't support edits and writes in the middle, by the way. Again, if it is efficiency that you seek, then memory mapping may be your best option.

Anonymous

if it has been wrapped in std, it has been defined

Do you even know what Undefined Behavior means?

Anonymous

Read about what the C++ standard defines as Undefined Behavior here. And no, system calls are not UB according to the standard as long as you call them the right way.

⚛ Hz

OK, that's my fault, I confused it with Rust's unsafe.

Anonymous

OK, that's my fault, I confused it with Rust's unsafe.

Unsafe in Rust is just a way of asking the rust compiler to not bother about borrow checkers and immutability issues and to allow pointers and other stuff

⚛ Hz

Unsafe in Rust is just a way of asking the rust compiler to not bother about borrow checkers and immutability issues and to allow pointers and other stuff

(No, an unsafe block still enforces the borrow checker and immutability; it only allows you to do things like calling unsafe functions and dereferencing raw pointers.)

Anonymous

(No, an unsafe block still enforces the borrow checker and immutability; it only allows you to do things like calling unsafe functions and dereferencing raw pointers.)

There are ways to overcome the borrow checker in unsafe code using aliased pointers that the Rust compiler can't/won't catch.