The Great Parsing #27

CosmicHorrorDev · 2021-11-07T20:15:05Z

Now that #26 worked out how to extract the contents of .vpk files, we have over 60k VDF files to test parsing against, taken from just a few Valve games.

The full corpus is much too large (and probably a no-no) to include here. Instead, I'll hack together a program that tries to parse each file and dumps any that fail to a separate location. Once I get that running, I'll post any failures here.
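Here is a minimal sketch of such a harness in Rust. It assumes the parser can be wrapped as a function from `&str` to `Result<(), String>`. `dump_failures` and `collect_files` are hypothetical names, and the `parse` argument stands in for whatever parser is being tested:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every regular file under `dir`.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, out)?;
        } else if path.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

/// Try to parse every file under `corpus` with the stand-in `parse` function,
/// copy each failure into `failures`, and return how many files failed.
fn dump_failures<F>(corpus: &Path, failures: &Path, parse: F) -> io::Result<usize>
where
    F: Fn(&str) -> Result<(), String>,
{
    fs::create_dir_all(failures)?;
    let mut files = Vec::new();
    collect_files(corpus, &mut files)?;

    let mut failed = 0;
    for (i, path) in files.iter().enumerate() {
        // Files that can't be read as UTF-8 (or at all) also count as failures.
        let ok = match fs::read_to_string(path) {
            Ok(text) => parse(&text).is_ok(),
            Err(_) => false,
        };
        if !ok {
            failed += 1;
            // Prefix with an index so same-named files from different games don't collide.
            let name = path.file_name().unwrap_or_default().to_string_lossy();
            fs::copy(path, failures.join(format!("{i}-{name}")))?;
        }
    }
    Ok(failed)
}
```

Files that aren't valid UTF-8 are counted as failures here as well. Each copied file gets an index prefix so identically named files from different games don't overwrite each other.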


CosmicHorrorDev · 2021-11-07T21:56:21Z

Exactly two files (with the exact same contents) use some weird platform tag identifier thing, like so:

"Foo"
{
    "Bar" [$WIN32]
    {
    }
}

Handling this would probably be a pain, especially since I have no clue what the possible values are or all the ways it can be applied. I'm assuming the above makes "Bar" and its value exclusive to 32-bit Windows. Can it also be applied to a value that is a string? Where else could it be used?

CosmicHorrorDev · 2021-11-07T21:58:35Z

It seems common to use a bare \ as a path separator rather than to escape a character. I suppose the easiest way to handle this would be to not parse escape sequences by default and add an option to parse them, since they seem incredibly rare.
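Here is a rough sketch of that opt-in behavior in Rust. The `unescape` helper and its `parse_escapes` flag are hypothetical. The escape set (`\n`, `\t`, `\\`, `\"`) is an assumption, not something confirmed from the corpus:

```rust
/// Turn the raw contents of a quoted string into its value.
///
/// With `parse_escapes == false` (the proposed default), backslashes are kept
/// verbatim, so Windows-style paths like `materials\models\foo` survive intact.
/// With `parse_escapes == true`, the assumed escape set `\n`, `\t`, `\\` and
/// `\"` is decoded, and any other backslash is kept as-is.
fn unescape(raw: &str, parse_escapes: bool) -> String {
    if !parse_escapes {
        return raw.to_string();
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Look at the character after the backslash without consuming it yet.
        match chars.peek().copied() {
            Some('n') => { out.push('\n'); chars.next(); }
            Some('t') => { out.push('\t'); chars.next(); }
            Some('\\') => { out.push('\\'); chars.next(); }
            Some('"') => { out.push('"'); chars.next(); }
            // Not a recognised escape: keep the backslash literally.
            _ => out.push('\\'),
        }
    }
    out
}
```

This only covers turning a string's raw contents into its value. When escapes are disabled, the tokenizer would also need to agree on whether `\"` ends a quoted string.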

CosmicHorrorDev · 2021-11-07T22:00:33Z

It seems somewhat common to include a null byte at the end of the file. I'm not sure whether this is specific to packed files (and just isn't being handled right) or whether it's normally present too. Hopefully it's just the former, for consistency.
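If it does turn out to be normal, one lightweight option is to strip a single trailing null byte before parsing. This is an assumption, not necessarily how it should end up being handled, and `strip_trailing_nul` is a hypothetical helper:

```rust
/// Strip a single trailing NUL byte (if present) before handing the text to
/// the parser. Only the very end of the input is touched, so a NUL anywhere
/// else still surfaces as a parse error.
fn strip_trailing_nul(text: &str) -> &str {
    text.strip_suffix('\0').unwrap_or(text)
}
```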

CosmicHorrorDev · 2021-11-07T22:02:01Z

Some files failed to read because they're not UTF-8 encoded. I need to dig into the different encodings used. It may be reasonable to expect users to handle the encoding and convert the text to UTF-8 for us.
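If users are expected to hand over UTF-8, the decoding on their side could look something like this sketch. It assumes, without having checked the corpus yet, that the non-UTF-8 files are UTF-16 little-endian with a `FF FE` byte-order mark. `decode_to_utf8` is a hypothetical name:

```rust
/// Decode raw file bytes into a `String`, as a caller might do before parsing.
///
/// Assumption: non-UTF-8 files are UTF-16 little-endian and start with the
/// byte-order mark `FF FE`. Anything else must already be UTF-8.
fn decode_to_utf8(bytes: &[u8]) -> Result<String, String> {
    if let Some(rest) = bytes.strip_prefix(b"\xFF\xFE") {
        if rest.len() % 2 != 0 {
            return Err("odd number of bytes in UTF-16 data".to_string());
        }
        // Reassemble little-endian byte pairs into UTF-16 code units.
        let units: Vec<u16> = rest
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        return String::from_utf16(&units).map_err(|e| e.to_string());
    }
    // Drop a UTF-8 BOM if present, then require valid UTF-8.
    let bytes = bytes.strip_prefix(b"\xEF\xBB\xBF").unwrap_or(bytes);
    String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
}
```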

CosmicHorrorDev · 2021-11-07T22:20:03Z

It looks like the platform-specific tags may be more common than those two files suggested, and they do seem to indicate the platform that a value is used for. Here's a snippet from another file:

"xpos"	"r223" [$WIN32]
"xpos"	"r223" [$X360HIDEF]
"xpos"	"r220" [$X360LODEF]

This also shows that tags can be applied to string values. The tags I've seen so far include WIN32, WIN32WIDE, X360, X360HIDEF, X360LODEF, X360WIDE, DEMO, ENGLISH, JAPANESE, KOREAN, etc. Beyond that, there appears to be some conditional logic as well, like [$WIN32 && $ENGLISH] or [$WIN32 && !$ENGLISH].

The tag's position is a bit awkward to parse as well. For Key-String it appears at the end of the pair, but for Key-Obj it appears between the two tokens. With so many possible values, it doesn't seem worth trying to parse the specifics; we could just return the string for what's inside the brackets.
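Here is a sketch of returning the raw tag text, in Rust. `take_tag` is a hypothetical helper. It would be called right after a value (Key-String) or right after a key (Key-Obj), and it doesn't interpret the contents:

```rust
/// If `input` (after optional whitespace) starts with a `[...]` tag, return the
/// raw text inside the brackets together with the remaining input. The tag is
/// not interpreted; callers get the string and decide what to do with it.
fn take_tag(input: &str) -> Option<(&str, &str)> {
    let rest = input.trim_start().strip_prefix('[')?;
    let end = rest.find(']')?;
    Some((rest[..end].trim(), &rest[end + 1..]))
}
```

In the Key-Obj case, calling it on the input that follows `"Bar"` yields `$WIN32` and leaves the rest of the input, including the `{`, for the object parser. Input without a tag returns `None`, so the normal path is unaffected.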

Of the 16,353 failures, 345 include these tags.
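If the parser only hands back the raw string, a consumer that does care about the conditional logic could evaluate it separately. Here is a minimal sketch. It assumes a grammar inferred only from the examples above: `$NAME` terms, `!`, `&&`, and `||`, with `&&` binding tighter than `||` and no parentheses. `eval_condition` is a hypothetical name:

```rust
use std::collections::HashSet;

/// Evaluate a conditional such as `$WIN32 && !$ENGLISH` against the set of
/// active platform/language flags.
///
/// Assumed grammar (inferred from examples, not from any spec): terms are
/// `$NAME` or `!$NAME`, joined by `&&` and `||`, with `&&` binding tighter
/// than `||` and no parentheses. Returns `None` for anything else.
fn eval_condition(cond: &str, active: &HashSet<&str>) -> Option<bool> {
    let mut any = false;
    for alternative in cond.split("||") {
        let mut all = true;
        for term in alternative.split("&&") {
            let term = term.trim();
            let (negated, flag) = match term.strip_prefix('!') {
                Some(rest) => (true, rest.trim_start()),
                None => (false, term),
            };
            let name = flag.strip_prefix('$')?;
            if name.is_empty() || !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
                return None;
            }
            // A negated term is true exactly when the flag is not active.
            all &= active.contains(name) != negated;
        }
        any |= all;
    }
    Some(any)
}
```

Anything outside that assumed grammar returns `None` instead of guessing.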

CosmicHorrorDev · 2021-11-07T23:44:13Z

Of the 16,353 failures, 292 use #base.

In those files, #base always appears to be used at the top level. I'll have to dig in more to see whether #base is ever used to pull in a file that has a #base of its own.
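Here is one way to answer the nesting question over the extracted files, sketched in Rust. It assumes `#base "path"` directives sit on their own lines before the top-level value, and it uses an in-memory map from path to contents as a stand-in for the corpus. `base_includes` and `base_depth` are hypothetical helpers:

```rust
use std::collections::{HashMap, HashSet};

/// Collect the paths named by `#base "path"` directives that appear before the
/// top-level value (assumed: directives only ever precede it).
fn base_includes(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with("//") {
            continue;
        }
        match line.strip_prefix("#base") {
            Some(rest) => out.push(rest.trim().trim_matches('"')),
            None => break, // reached the top-level value
        }
    }
    out
}

/// Length of the longest `#base` chain starting at `path` (0 = no `#base`).
/// `files` stands in for the extracted corpus; `seen` holds the current chain
/// so that cycles are reported as `None`.
fn base_depth(path: &str, files: &HashMap<&str, &str>, seen: &mut HashSet<String>) -> Option<usize> {
    if !seen.insert(path.to_string()) {
        return None; // cycle
    }
    let mut depth = 0;
    if let Some(text) = files.get(path) {
        for inc in base_includes(text) {
            depth = depth.max(1 + base_depth(inc, files, seen)?);
        }
    }
    seen.remove(path);
    Some(depth)
}
```

A depth greater than 1 for any file would mean that some file pulled in through #base has a #base of its own.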

CosmicHorrorDev · 2021-11-07T23:45:18Z

Finally, 15,079 files use \ without it representing an escaped character, which makes this a very prevalent issue.
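A count like this needs some heuristic for deciding when a \ is "not an escape". Here is a sketch of one such heuristic. It assumes the recognized escapes are `\n`, `\t`, `\\` and `\"`. `has_bare_backslash` is a hypothetical name, and this is not necessarily how the number above was computed:

```rust
/// Heuristic for classifying a file: does it contain a backslash that is *not*
/// followed by one of the assumed escape characters (`n`, `t`, `\`, `"`)?
/// Such a backslash is treated as a literal, e.g. a path separator.
fn has_bare_backslash(text: &str) -> bool {
    let mut chars = text.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n' | 't' | '\\' | '"') => {} // looks like an escape
                _ => return true,                  // anything else, or end of input
            }
        }
    }
    false
}
```

Note that it undercounts. A path like `materials\textures` reads as a `\t` escape and isn't flagged.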

This was referenced Nov 11, 2021:

Parse optional ending NULL byte #28 (Closed)

Support parsing tags #30 (Open)
