Recover un-synced data after reboot #344
@geky should be able to comment better on whether this would work. As a suggestion, set a flag in your low-level flash write routine at startup. Then write some data to your file. Logically this would attempt to write to the first free flash location, which is probably where your code started writing before the reset. Note the flash location accessed (and probably return an error, and clear the flag). From what I can remember of how free space is identified, this may not work all the time.

Alternatively, set a flag each time you start to write to your file, and in the low-level driver note the block number written in some area of RAM which isn't cleared (this entry may be trickier to identify, since there will be directory updates as well).
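For illustration, a rough sketch of this driver-flag idea; the names (real_nor_prog, the globals) are hypothetical and not from this thread, and placing the flags in RAM that survives a reset is platform-specific:

    #include <stdbool.h>
    #include "lfs.h"

    // hypothetical: the real low-level NOR program routine
    extern int real_nor_prog(const struct lfs_config *c, lfs_block_t block,
                             lfs_off_t off, const void *buffer, lfs_size_t size);

    // set true early at boot, before the probe write
    static volatile bool first_write_pending;
    static volatile lfs_block_t first_block_after_boot;

    int traced_prog(const struct lfs_config *c, lfs_block_t block,
                    lfs_off_t off, const void *buffer, lfs_size_t size)
    {
        if (first_write_pending) {
            first_write_pending = false;
            // probably near where the pre-reset data was being written;
            // optionally return an error here to abort the probe write
            first_block_after_boot = block;
        }
        return real_nor_prog(c, block, off, buffer, size);
    }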
Not possible; lfs_file_sync is how you mark the data you want to persist. Otherwise, as @e107steved mentions, it may be in RAM.

One option is to store a "commit offset", either as a separate file or as a custom attribute (both have similar costs). This would be a single integer offset that tracks the size of the log that is "committed". This would free up the size of the log, so you could call lfs_file_sync as often as you need.

Note that lfs_file_sync has a cost, so the best approach may be to call it at whatever rate balances that cost against how much data you can afford to lose.
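A minimal sketch of the commit-offset idea, assuming littlefs v2's custom-attribute API (lfs_setattr/lfs_getattr); v1.7.1 as used in this thread has no custom attributes, so a small separate file would play the same role there. COMMIT_OFFSET_ATTR and the helper names are made up:

    #include "lfs.h"

    // hypothetical attribute type for the commit offset
    #define COMMIT_OFFSET_ATTR 0x74

    // After appending complete log records, sync and record how much of
    // the log is considered committed.
    int log_commit(lfs_t *lfs, lfs_file_t *file, const char *path)
    {
        int err = lfs_file_sync(lfs, file);
        if (err) {
            return err;
        }

        lfs_soff_t size = lfs_file_size(lfs, file);
        if (size < 0) {
            return (int)size;
        }

        uint32_t offset = (uint32_t)size;
        return lfs_setattr(lfs, path, COMMIT_OFFSET_ATTR,
                           &offset, sizeof(offset));
    }

    // After a reboot, anything past this offset is suspect.
    int log_committed_size(lfs_t *lfs, const char *path, uint32_t *offset)
    {
        lfs_ssize_t res = lfs_getattr(lfs, path, COMMIT_OFFSET_ATTR,
                                      offset, sizeof(*offset));
        return res < 0 ? (int)res : 0;
    }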
Just so I can understand this better and explain it better to the rest of the team: is it not possible because there is no persistent (non-volatile) record of the block which contains the unsynchronized data, which has been programmed to flash but not yet sync'ed into the file? Would it be possible to record in the metadata which blocks are storing the unsync'ed data?

I understand that data in RAM certainly would not be recoverable. We were hopeful that since only 256 bytes can be held in the programming cache, the remainder of the data (which must have already been written to the non-volatile storage) could be located and therefore recovered.

In addition, we check the LFS internals to determine when a write has progressed into a new block, and we always sync after a write which has moved into a new block. (This method minimizes the time delay of the sync by only needing to copy a minimal amount of data... we call it "lazy sync".) Therefore we only have at most 1-2 blocks which have not been sync'ed. Unfortunately, with 32k blocks, quite a lot of our logging messages are lost when an unplanned reset occurs.

We have chosen not to sync on a timer, because the block copy on the next file append can have a very high latency. Blocking our real-time system while LFS erases a block and copies up to 32k of data is very difficult to accommodate.
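For readers following along, a minimal sketch of what such a "lazy sync" policy might look like at the application level; the helper and its bookkeeping are illustrative, not the poster's actual code:

    #include "lfs.h"

    // block_size must match the mounted filesystem's block size
    static lfs_off_t last_synced_block;

    int log_append_lazy(lfs_t *lfs, lfs_file_t *file,
                        const void *buf, lfs_size_t len, lfs_size_t block_size)
    {
        lfs_ssize_t written = lfs_file_write(lfs, file, buf, len);
        if (written < 0) {
            return (int)written;
        }

        lfs_soff_t pos = lfs_file_tell(lfs, file);
        if (pos < 0) {
            return (int)pos;
        }

        lfs_off_t block = (lfs_off_t)pos / block_size;
        if (block != last_synced_block) {
            // the write crossed into a new block; sync now while the
            // amount of data to copy on the next append is still small
            int err = lfs_file_sync(lfs, file);
            if (err) {
                return err;
            }
            last_synced_block = block;
        }
        return 0;
    }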
If we can store the new block location of the unwritten data after each sync, could we recover the data?
I have the same question. We log continuously and at a high data rate. I noticed that after a power loss I get a 0B file, no matter how many times I had written to flash (I mean real writes, calls to write_stuff_to_flash()) or how many blocks I had used. I tried lfs_file_sync() just before a reset and I could get the data back, which is terrific as it's exactly what I needed.

I thought that after a write buffer was filled, a call to sync was made, and that sync was only needed to flush the buffer. What are the drawbacks of calling sync often? Does it take one "slot" in the dir file on every sync? Would that cause sync times to steadily grow over time, as happened with open and close times in #214?
Ah interesting. So if I understand correctly, it's not the overhead of writing the metadata that's the problem, but actually that littlefs freezes the file's blocks, forcing the next write to have to copy the last block of the file. That is a tricky problem to solve.

You could imagine on NOR flash, where you can program a byte at a time, littlefs could simply continue to write data to the file without copying the block. So after a sync the file could continue to use the last block without copying. This would make a sync as cheap as a single metadata update.

Unfortunately, this doesn't work when prog_size > 1. If you end up writing data that isn't aligned to the program block and call sync, littlefs must write out the full program block, including garbage padding. This would prevent you from being able to continue writing to the block.

We could modify littlefs to continue using blocks if they were synced and the data was aligned to a prog-size boundary. Then you could call sync every 256 bytes without as big a disruption. I didn't do this originally because it's a rather niche optimization and I didn't think it would be worth the complexity.

Thinking generally, we could also repurpose the inline file mechanism used to store small files to also store data written to the end of files. This would let littlefs avoid freezing the tail of the file, though it may add overhead to the metadata. This would be something I'd want to profile to see if it would be worth it. It would also be a bit complicated to implement.
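To make the alignment constraint concrete, a small sketch assuming prog_size = 256 (an assumption matching the buffers later in this thread): a sync at an unaligned offset would force littlefs to program the remainder of the block as garbage padding.

    #include <stdbool.h>
    #include <stdint.h>

    #define PROG_SIZE 256u  // assumed program size

    // true if a sync at this file offset would need no garbage padding
    static bool sync_is_aligned(uint32_t file_pos) {
        return (file_pos % PROG_SIZE) == 0;
    }

    // bytes of padding littlefs would have to program for a sync here
    static uint32_t sync_padding(uint32_t file_pos) {
        uint32_t rem = file_pos % PROG_SIZE;
        return rem == 0 ? 0 : PROG_SIZE - rem;
    }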
Yes, this is also a concern. The runtime cost in #214 comes from the time it takes to scan the log, which grows as more commits are added. One difference, though, is that these sync commits get cleaned up during metadata compaction, whereas additional file entries reside in the metadata block until the file is deleted.
Ah, this is because lfs_file_open creates a 0B file. This isn't strictly necessary, but we need to store the file name somewhere (see #353). We could have a temporary "uncommitted" file placeholder written to metadata, so the file doesn't exist after a power loss, but I haven't seen this be a big request. Other than that quirk, littlefs strictly does not update the on-disk file unless sync is called.
Yes, if you are willing to read directly from the raw block device, as long as you know the last block the data should still be there. The problem is that littlefs doesn't know which block was a part of the file unless you call sync. You could store this in a custom attribute:

    #define UNSYNC_BLOCK 0x75
    lfs_setattr(&lfs, "path/to/file", UNSYNC_BLOCK, &file->block, sizeof(file->block));

Normally you can attach custom attributes to open files with the lfs_file_opencfg API, but those attributes are only written out when the file is synced, which would defeat the purpose here.
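For the recovery side after a reboot, a hedged sketch building on the attribute above: fetch the recorded block with lfs_getattr, then read it straight through the block device's read callback, bypassing littlefs. The function name is illustrative, and buf must be block_size bytes:

    #include "lfs.h"

    int recover_unsynced(lfs_t *lfs, const struct lfs_config *cfg,
                         const char *path, uint8_t *buf)
    {
        lfs_block_t block;
        lfs_ssize_t res = lfs_getattr(lfs, path, UNSYNC_BLOCK,
                                      &block, sizeof(block));
        if (res < 0) {
            return (int)res;  // attribute missing or read error
        }

        // read the raw block directly, bypassing the filesystem
        return cfg->read(cfg, block, 0, buf, cfg->block_size);
    }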
Well, it's a real problem; see this raised issue in the ESP8266 Arduino core. Summary: the file is lost if it isn't closed (synced) before the reset.

That hits the guarantee that littlefs provides:

    "Power-loss resilience - littlefs is designed to handle random power
    failures. All file operations have strong copy-on-write guarantees and
    if power is lost the filesystem will fall back to the last known good
    state."
During extensive testing I noticed that sometimes an fsync is not enough to get all data actually written. Because we are working with small blocks of data (64 bytes), I had to artificially flood the buffer when I really want to have all data written:

    void LOG_QuickSync(bool bFloodBuffer)
    {
        if (logFileWriteHandle != NULL)
        {
            if (bFloodBuffer)
            {
                /* Fill the cache with enough data that definitely all events are written. */
                static const uint8_t DataBufferFiller[CONFIG_LITTLEFS_WRITE_SIZE] = {0};
                fwrite((void *) DataBufferFiller, sizeof(uint8_t), sizeof(DataBufferFiller), logFileWriteHandle);
                ESP_LOGI(TAG, "Forced quick sync for up-to-date log done!");
            }
            fsync(fileno(logFileWriteHandle));
        }
    }

All tests with different littlefs configurations (changed PAGE, READ, WRITE, LOOKAHEAD and CACHE sizes) did not help. So I am posting this here in case it resolves the problem for others - or maybe someone comes up with a solution so that the workaround is not needed.
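Given the alignment discussion earlier in the thread, the flood presumably works because it pushes the write cache past a full program-block boundary. A variant that pads only up to the next write-size boundary might look like this; a sketch assuming the log file is append-only, so ftell() gives the current end offset:

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Sketch: pad the log file only to the next CONFIG_LITTLEFS_WRITE_SIZE
     * boundary before fsync, instead of always writing a full block. */
    static void LOG_QuickSyncPadded(FILE *f)
    {
        if (f == NULL) {
            return;
        }

        long pos = ftell(f);
        if (pos >= 0) {
            size_t rem = (size_t)(pos % CONFIG_LITTLEFS_WRITE_SIZE);
            if (rem != 0) {
                /* pad with zeros up to the next write-size boundary */
                static const uint8_t zeros[CONFIG_LITTLEFS_WRITE_SIZE] = {0};
                fwrite(zeros, 1, CONFIG_LITTLEFS_WRITE_SIZE - rem, f);
            }
        }

        fsync(fileno(f));
    }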
I've just been doing some testing on this, and I have this exact behaviour (i.e. a 0B file after a power cut before the file is synced or closed).
Hi @trullock, I'm not sure about the above issue, but I just wanted to mention that littlefs currently creates a zero-length file when you call lfs_file_open, so the 0B file you're seeing may just be that.
@geky thanks, it probably is, but on the 8266 I get all written bytes up until the reset, and on the 32 I get none.
Huh, sounds like one library is just calling sync more often under the hood than the other.
Yeah, see here for some more clues: joltwallet/esp_littlefs#144
I'm looking for some way to recover, after a reboot, data that was written into flash by lfs but not yet sync'ed with the file. Is this possible with some existing calls, or could some additional algorithm be added to implement this as a new feature?
I am interested in doing this for a single file which is always appended with logging data. We hope the log data will better explain why the system rebooted in the first place.
We're using lfs v1.7.1 with NOR flash; our static buffers are configured as follows:
    static uint8_t read_buffer[256];
    static uint8_t prog_buffer[256];
    static uint32_t lookahead_buffer[512/32];
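For context, a sketch of how buffers like these would plug into the v1 lfs_config. The driver hooks and block geometry are placeholders; only the 32k block size mentioned earlier and the 256-byte read/prog sizes implied by the buffers come from this thread:

    #include "lfs.h"

    // hypothetical NOR driver hooks
    extern int nor_read(const struct lfs_config *c, lfs_block_t block,
                        lfs_off_t off, void *buffer, lfs_size_t size);
    extern int nor_prog(const struct lfs_config *c, lfs_block_t block,
                        lfs_off_t off, const void *buffer, lfs_size_t size);
    extern int nor_erase(const struct lfs_config *c, lfs_block_t block);
    extern int nor_sync(const struct lfs_config *c);

    static const struct lfs_config cfg = {
        .read  = nor_read,
        .prog  = nor_prog,
        .erase = nor_erase,
        .sync  = nor_sync,

        .read_size   = 256,     // matches read_buffer above
        .prog_size   = 256,     // matches prog_buffer above
        .block_size  = 32768,   // 32k blocks, per the discussion
        .block_count = 256,     // placeholder
        .lookahead   = 512,     // bits, matches lookahead_buffer above

        .read_buffer      = read_buffer,
        .prog_buffer      = prog_buffer,
        .lookahead_buffer = lookahead_buffer,
    };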