From a5e71c90ac44e7ce5362209177b40eaf39673540 Mon Sep 17 00:00:00 2001
From: Nolan Darilek
Date: Tue, 17 Jun 2025 11:33:07 -0400
Subject: [PATCH] Add README.

---
 README.md | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f482e9a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,57 @@
+# Assignment
+
+To run the tests, install Docker or Podman and then run:
+
+```bash
+$ cargo test
+    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.70s
+     Running unittests src/lib.rs (target/debug/deps/supermetal_assignment-a8159c3e18c41f11)
+
+running 1 test
+(took 2197.842923 ms)
+test test::test_mysql ... ok
+
+test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 29.89s
+...
+```
+
+## Future direction

+
+There are a few obvious places to go with this. You probably don't really
+care about these, but just so you know *I* know:
+
+* Nothing is configurable outside of code (e.g. the Parquet batch size is a constant). It might have been better to build a CLI rather than a test; production code would of course be more configurable/tunable. There's a rough sketch of the CLI idea at the end of this README.
+* My dataset is a bit lopsided: an employees table with 500K records and a departments table with only 9. That made it hard to establish definitively how much concurrency actually helped in some areas, but it made sense intuitively and would probably bear out on real data.
+* As previously stated, I'm not familiar with Parquet/Arrow outside of my use of
+  it with DuckDB/Overture Maps. There are probably additional ways to speed
+  things up, and I look forward to learning about those when we work together.
+
+Thanks, and do let me know if you have any additional questions.
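+
+For what it's worth, here's a rough sketch of the CLI idea from the first
+bullet above. The flag names, defaults, and clap usage are illustrative
+assumptions, not lifted from the actual code:
+
+```rust
+// Hypothetical CLI surface; assumes clap 4 with the "derive" feature.
+use clap::Parser;
+
+/// Export MySQL tables to Parquet.
+#[derive(Parser, Debug)]
+struct Args {
+    /// MySQL connection string.
+    #[arg(long, default_value = "mysql://root@localhost:3306/employees")]
+    database_url: String,
+
+    /// Rows per Parquet record batch (a hard-coded constant today).
+    #[arg(long, default_value_t = 10_000)]
+    batch_size: usize,
+}
+
+fn main() {
+    let args = Args::parse();
+    // The actual export would run here; this sketch only shows the
+    // configuration surface that would replace the constants.
+    println!("exporting {} in batches of {}", args.database_url, args.batch_size);
+}
+```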