Assignment

To launch, install Docker or Podman and run:

$ cargo test
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.70s
     Running unittests src/lib.rs (target/debug/deps/supermetal_assignment-a8159c3e18c41f11)

running 1 test
(took 2197.842923 ms)
test test::test_mysql ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 29.89s
...

Future direction

There are a few obvious places to go with this, and you probably don't really care about these, but just so you know I know...:

Nothing is configurable outside of code. (E.g. the Parquet batch size is a constant.) Might have been better to do a CLI rather than a test. Production code would of course be more configurable/tunable.
My dataset is a bit lopsided--an employees table with 500K records and a departments table with only 9. This made it a bit difficult to definitively establish how much concurrency actually benefitted in some areas but intuitively it made sense, and would probably bear out on actual data.
As previously stated, I'm not familiar with Parquet/Arrow outside of my use of it with DuckDB/Overture Maps. There are probably additional ways to speed things up, and I look forward to learning about those when we work together.

Thanks, and do let me know if you have any additional questions.

1.3 KiB Raw Blame History

Assignment

Future direction

1.3 KiB

Raw Blame History