Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding basic version of DUMP and RESTORE commands #899

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

s3w3nofficial
Copy link

Currently only work's on string type and does not support compression, but it's still a solid starting point.

Implemented

  • redis length encoding
  • redis compatible crc64 implementation
  • DUMP command that works on strings without compression
  • RESTORE command that works on strings without compression and additional modifiers

Not Implemented

  • support for other types than strings
  • support for lzf compression
  • optional modifiers for RESTORE

Also about the copyright for the crc64, i only ported the pycrc generated version, but still i'm not sure if and which copyright should i add.

@s3w3nofficial
Copy link
Author

@microsoft-github-policy-service agree

@s3w3nofficial s3w3nofficial force-pushed the s3w3nofficial/add-dump-and-restore-command branch from f9c9838 to 2954d6e Compare January 3, 2025 01:24
@badrishc
Copy link
Contributor

badrishc commented Jan 6, 2025

Also about the copyright for the crc64, i only ported the pycrc generated version, but still i'm not sure if and which copyright should i add.

since you only ported the pycrc generated version, it seems we should be okay as it is ...

@TalZaccai TalZaccai requested a review from vazois January 7, 2025 19:28
libs/common/Crc64.cs Show resolved Hide resolved
libs/server/Resp/Parser/RespCommand.cs Outdated Show resolved Hide resolved
libs/server/Resp/Parser/RespCommand.cs Outdated Show resolved Hide resolved
@TalZaccai TalZaccai linked an issue Jan 14, 2025 that may be closed by this pull request
@vazois
Copy link
Contributor

vazois commented Jan 14, 2025

@s3w3nofficial, any updates on the open comments?

@s3w3nofficial
Copy link
Author

@s3w3nofficial, any updates on the open comments?

Only open comment left is from @badrishc about the renaming which i already implemented, i guess i can close that one ?

Copy link
Contributor

@vazois vazois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

libs/common/Crc64.cs Show resolved Hide resolved
@s3w3nofficial
Copy link
Author

Every test is passing except the RestoreACLsAsync. It throws this exception: "NOAUTH Authentication required.".

I can't figure out why. Can you guys please take a look at it ?

@vazois vazois self-requested a review January 16, 2025 17:50
libs/server/Resp/Parser/RespCommand.cs Outdated Show resolved Hide resolved
libs/server/Resp/Parser/RespCommand.cs Outdated Show resolved Hide resolved
libs/server/Resp/RespServerSession.cs Outdated Show resolved Hide resolved
@@ -2652,6 +2652,50 @@
}
]
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these commands to playground/CommandInfoUpdater/SupportedCommand.cs and run the CommandInfoUpdater tool with --force, then ensure that the produced json files (RespCommandDocs.json & RespCommandInfo.json) are the same (it might move these commands around as it's alphabetizing the commands & sub-commands). Thank you!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@badrishc
Copy link
Contributor

Every test is passing except the RestoreACLsAsync. It throws this exception: "NOAUTH Authentication required.".

I can't figure out why. Can you guys please take a look at it ?

@kevin-montrose, can you help how to fix the ACL unit test here? The error is:

image

@kevin-montrose
Copy link
Contributor

Every test is passing except the RestoreACLsAsync. It throws this exception: "NOAUTH Authentication required.".
I can't figure out why. Can you guys please take a look at it ?

@kevin-montrose, can you help how to fix the ACL unit test here? The error is:

image

@badrishc @s3w3nofficial I suspect it's that try/catch not leaving the client in a nice state when an exception is encountered. The pattern elsewhere for ACLs with commands that aren't repeatable is to use a counter to differentiate each invocation.

[Test]
public async Task RestoreACLsAsync()
{
    var count = 0;

    await CheckCommandsAsync(
        "RESTORE",
        [DoRestoreAsync]
    );

    async Task DoRestoreAsync(GarnetClient client)
    {
        var payload = new byte[]
        {
            0x00, // value type 
            0x03, // length of payload
            0x76, 0x61, 0x6C,       // 'v', 'a', 'l'
            0x0B, 0x00, // RDB version
            0xDB, 0x82, 0x3C, 0x30, 0x38, 0x78, 0x5A, 0x99 // Crc64
        };

        count++;

        var val = await client.ExecuteForStringResultAsync(
            "$7\r\nRESTORE\r\n"u8.ToArray(),
            [
                Encoding.UTF8.GetBytes($"foo-{count}"), 
                "0"u8.ToArray(), 
                payload
            ]);

        ClassicAssert.AreEqual("OK", val);
    }
}

Works, and all ACL tests pass.

Copy link
Contributor

@kevin-montrose kevin-montrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of little notes for improvements, and a couple questions around correctness.


// return error if key already exists
var keyExists = storageApi.EXISTS(key);
switch (keyExists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a 1 case switch instead of an if?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

// get footer (2 bytes of rdb version + 8 bytes of crc)
var footer = valueSpan[^10..];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should an error be reported if valueSpan is too small? Just raising an exception is kinda hostile.


if (ttl > 0)
{
storageApi.SETEX(key, valArgSlice, TimeSpan.FromMilliseconds(ttl));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is reported if the set fails?

More generally, the Exists call on L45 won't keep the key from being created by the time we get here (or to line 107).

Perhaps a SETEX with NX (and dealing with the responses) would be a better approach? Redis has a REPLACE flag to control that behavior, but that can be future work.

}
else
{
storageApi.SET(key, valArgSlice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as above on SETEX.


var status = storageApi.GET(key, out var value);

switch (status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same "why switch and not if"-note.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


// Total len (% + length of ascii bytes + CR LF + payload type + redis encoded payload len + payload len + rdb version + crc64 + CR LF)
var totalLength = 1 + lengthInASCIIBytesLen + 2 + 1 + encodedLength.Length + value.ReadOnlySpan.Length + 2 + 8 + 2;
Span<byte> buffer = stackalloc byte[totalLength];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be pretty big, right? So unconditionally stackalloc'ing is dangerous, probably get a rented array past a certain limit (64 or 128 are probably reasonable?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if renting here is a good idea, if this method will get called, it's probably going to be multiple times, the renting and disposing might be overall less efficient than just heap allocating.

For now i changed it to heap alloc when above 128.

buffer[offset++] = 0x00;

// length of the span
foreach (var b in encodedLength)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why copy byte-by-byte and not use a Span CopyTo method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Len of the dump (payload type + redis encoded payload len + payload len + rdb version + crc64)
var len = 1 + encodedLength.Length + value.ReadOnlySpan.Length + 2 + 8;
var lengthInASCIIBytes = new Span<byte>(new byte[NumUtils.NumDigitsInLong(len)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max length here is like 10 right? You can just stackalloc that instead of allocating an array.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// <param name="buff"></param>
/// <returns></returns>
/// <exception cref="ArgumentException"></exception>
public static (long length, byte payloadStart) DecodeLength(ref ReadOnlySpan<byte> buff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: C# convention is to PascalCase tuple members. So (long Length, byte PayloadStart).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// <param name="length"></param>
/// <returns></returns>
/// <exception cref="ArgumentOutOfRangeException"></exception>
public static byte[] EncodeLength(long length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole method is pretty allocate-y, which worries me if it's ever extensively used.

Copy link
Contributor

@PaulusParssinen PaulusParssinen Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree bit @kevin-montrose here. A common approach is to have small static method to first calculate the required buffer size (it can be simple switch expression) for the caller (i.e. in our case NetworkDUMP method) to know how much buffer it needs (to slice from existing buffer, stackalloc, rent etc., doesn't matter).

Then the actual encoding method takes a Span<byte> output argument which ownership belongs to caller. Now we could do the encoding on this buffer without allocations. You can see this pattern all over .NET with TryWrite and other encoding/serialization methods.

@s3w3nofficial
Copy link
Author

Hey @kevin-montrose

Thanks for the help with test and for the comments. I will try to resolve these over the weekend.

Regarding the parsing, i'm working on a parser that support's the whole rdb format and i hope i will be able to implement that one in the future. For now i will look into optimizing the RespLengthEncodingUtils since the parser won't be ready any time soon.

@@ -344,6 +344,9 @@
/* Fails if encounters error during AOF replay or checkpointing */
"FailOnRecoveryError": false,

/* Skips crc64 validation in restore command */
"SkipChecksumValidation": false,
Copy link
Contributor

@badrishc badrishc Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to SkipRDBRestoreChecksumValidation to clarify that this skip is specifically for the restore command (we have checksum validations in other unrelated places).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named it like this because that's how it's named in the redis source code skip_checksum_validation. Also in the redis it only can be se as a build option and not in redis.conf.

Copy link
Contributor

@badrishc badrishc Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is fine to diverge from other resp caches in such respects. having a conf option is clearly more powerful, and we can name it in a way that makes sense for garnet.

Comment on lines 20 to 43
public static (long Length, byte PayloadStart) DecodeLength(ref ReadOnlySpan<byte> buff)
{
// remove the value type byte
var encoded = buff.Slice(1);

if (encoded.Length == 0)
{
throw new ArgumentException("Encoded length cannot be empty.", nameof(encoded));
}

var firstByte = encoded[0];
return (firstByte >> 6) switch
{
// 6-bit encoding
0 => (firstByte & 0x3F, 1),
// 14-bit encoding
1 when encoded.Length < 2 => throw new ArgumentException("Not enough bytes for 14-bit encoding."),
1 => (((firstByte & 0x3F) << 8) | encoded[1], 2),
// 32-bit encoding
2 when encoded.Length < 5 => throw new ArgumentException("Not enough bytes for 32-bit encoding."),
2 => ((long)((encoded[1] << 24) | (encoded[2] << 16) | (encoded[3] << 8) | encoded[4]), 5),
_ => throw new ArgumentException("Invalid encoding type.", nameof(encoded))
};
}
Copy link
Contributor

@PaulusParssinen PaulusParssinen Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest change of approach to this length decoding. Exception handling is expensive, to avoid it you can use the Try-pattern seen a lot in the .NET world. This way caller can decide how to handle failure, i.e. throw expensive exception or to maybe just write RESP error without EH catching logic?

The method should also only concern itself with the length decoding and leave the slicing of value type byte in the beginning to the caller who hands the input buffer to this method.

My alternate approach:

    public static bool TryReadLength(ReadOnlySpan<byte> input, out int length, out int bytesRead)
    {
        length = 0;
        bytesRead = 0;
        if (input.Length < 1) return false;
        
        var firstByte = input[0];
        switch (firstByte >> 6)
        {
            case 0:
                bytesRead = 1;
                length = firstByte & 0x3F;
                return true;
            case 1 when input.Length > 1:
                bytesRead = 2;
                length = ((firstByte & 0x3F) << 8) | input[1];
                return true;
            case 2:
                bytesRead = 5;
                return BinaryPrimitives.TryReadInt32BigEndian(input.Slice(1), out length);
            default: 
                return false;
        }
    }

With this approach, the JITted assembly will be much tighter: https://www.diffchecker.com/UMTqkaAE/

/// <param name="length"></param>
/// <returns></returns>
/// <exception cref="ArgumentOutOfRangeException"></exception>
public static byte[] EncodeLength(long length)
Copy link
Contributor

@PaulusParssinen PaulusParssinen Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree bit @kevin-montrose here. A common approach is to have small static method to first calculate the required buffer size (it can be simple switch expression) for the caller (i.e. in our case NetworkDUMP method) to know how much buffer it needs (to slice from existing buffer, stackalloc, rent etc., doesn't matter).

Then the actual encoding method takes a Span<byte> output argument which ownership belongs to caller. Now we could do the encoding on this buffer without allocations. You can see this pattern all over .NET with TryWrite and other encoding/serialization methods.

Comment on lines 162 to 164
var buffer = totalLength <= 128
? stackalloc byte[totalLength]
: ArrayPool<byte>.Shared.Rent(totalLength);
Copy link
Contributor

@PaulusParssinen PaulusParssinen Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! This rent still needs to be returned to the ArrayPool<byte> after we're done. I'd also recommend bumping the threshold to 256 bytes for it to be worth it.

This is the common pattern:

byte[]? rentedBuffer = null;
var buffer = totalLength <= 256 
    ? stackalloc byte[256] // constant sized stackallocs result better codegen (We should bunch of stack left here). 
    : (rentedBuffer = ArrayPool<byte>.Shared.Rent(totalLength));

then after we're done:

if (rentedBuffer is not null)
    ArrayPool<byte>.Shared.Return(rentedBuffer);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bytesRead = 1;
length = firstByte & 0x3F;
return true;
case 1 when input.Length > 2:
Copy link
Contributor

@PaulusParssinen PaulusParssinen Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had small mistake in my snippet, sorry! 😅 (Off by ones, am I right?)

Suggested change
case 1 when input.Length > 2:
case 1 when input.Length > 1:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DUMP Command Not Supported in Garnet
6 participants