I've written a bit about querying structured logging with SQLite and the power of data science when it comes to logging by using Apache Spark. Structured logging and databases are a natural match – there's easily consumed structured data on one side, and tools for querying and presenting data on the other. Using SQL has a number of advantages over using JSON processing tools or log viewers, such as the ability to progressively build up views while filtering or querying, better timestamp support, and the ability to do aggregate query logic.

But structured logging isn't what most databases are used to. The de-facto standard for structured logging is newline-delimited JSON (NDJSON), and there is only a loose concept of a "schema" – structured logging can have high cardinality, and there are usually only a few guaranteed common fields such as timestamp, level, and logger_name. Getting an actual schema so that you can get NDJSON into a database is still a somewhat manual process compared to CSV. Spark is great at NDJSON dataframes, but Spark is a heavyweight solution that we can't just install on a host. What we really want is an in-process "no dependencies" database that understands NDJSON.

TL;DR: With NDJSON support, slurping structured logs into a "no dependencies" database like SQLite or DuckDB is easier than ever.

sqlite-lines

Alex Garcia released sqlite-lines in June specifically to read NDJSON. Using sqlite3 can be more convenient than using jq or other JSON processing command-line tools for digging around in logs. Adding the sqlite-lines extension is as simple as getting the static library. Saving the table and exporting it to your local desktop is also very simple, and gives you the option of using a database GUI like DB Browser for SQLite. Interestingly, sqlite-lines can be used with Datasette with datasette data.db --load-extension lines_nofs0, which would provide a web application UI for SQLite, but I haven't tried this.

DuckDB

SQLite does have some disadvantages in that it processes rows sequentially, and so asking it aggregate or analytical questions like "what are the 10 most common user agent strings" can take a while on large datasets. DuckDB is like SQLite, but focused on analytics – it processes entire columns at once, rather than a row at a time. I haven't used DuckDB extensively, as it requires that a schema is defined before you import. That's not a problem with the new version: as of DuckDB 0.7.0, DuckDB can read NDJSON files and infer a schema from the values. Installation is a single binary zip file. There's a blog post with examples – let's try it out on logs and see what happens.
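Loadable extensions aside, the basic "slurp NDJSON into SQLite and ask aggregate questions" idea can be sketched with nothing but Python's standard library, storing each raw log line and letting SQLite's built-in JSON functions do the extraction. The table name, field names, and sample log lines here are all hypothetical, not from the tools above:

```python
import sqlite3

# Hypothetical NDJSON log lines; in practice these would be read from a file.
ndjson_lines = [
    '{"timestamp": "2023-01-01T00:00:00Z", "level": "INFO", "user_agent": "curl/7.68"}',
    '{"timestamp": "2023-01-01T00:00:01Z", "level": "WARN", "user_agent": "Mozilla/5.0"}',
    '{"timestamp": "2023-01-01T00:00:02Z", "level": "INFO", "user_agent": "curl/7.68"}',
]

conn = sqlite3.connect(":memory:")
# Store each log line as raw JSON text -- no up-front schema needed,
# since json_extract() pulls fields out at query time.
conn.execute("CREATE TABLE logs (line TEXT)")
conn.executemany(
    "INSERT INTO logs (line) VALUES (?)",
    [(line,) for line in ndjson_lines],
)

# The aggregate question from the article: most common user agent strings.
rows = conn.execute("""
    SELECT json_extract(line, '$.user_agent') AS ua, COUNT(*) AS n
    FROM logs
    GROUP BY ua
    ORDER BY n DESC
    LIMIT 10
""").fetchall()
print(rows)  # [('curl/7.68', 2), ('Mozilla/5.0', 1)]
```

This is the schema-less approach the article is after: the only column is the raw line, and views over `json_extract()` can be built up progressively as the interesting fields emerge.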