Problem 1: Dimension reduction (2017/2018)
While analyzing data and real-world problems we often deal with high dimensional data. They are usually hard to visualize and
may even pose a threat to validity of the model, e.g. when multicollinearity is an issue. In such situations, one of the solutions
can be applying to the dataset a dimension reduction algorithm, such as PCA or t-SNE.
Let us think about a very naive dimension reduction algorithm, that can be applied to vector data' = [a0, a1, ... , aN-1].
The dimension reduction is achieved by finding the linear combination of the elements of the vector, so that the resulting vector would
be of length 4. In general case, one could put fractional weights in the formulas; however, for simplicity let us assume that
weights are either 1 or 0, and the elements of the resulting vector have to be the sums of the subsequent elements of the original vector.
Since we do not want to lose any information, each element of the original vector should be added to exactly one of the elements of the result vector.
Write a function is_dim_reduction which,
given the original vector data of positive integers of length N,
and K queries, each of which is a vector of positive integers of length 4,
decides for each of the queries whether it is possible that the given query could be the result of such a naive dimension reduction algorithm.
For example, for data' = [1,2,3,4,5,6,7,8,9,10]:
- For the query [15,6,15,19] the answer should be YES (15=1+2+3+4+5, 6=6, 15=7+8, 19 = 9+10).
- For the query [16,5,15,19] the answer should be NO (16=1+2+3+4+6, 5=5, 15=7+8, 19 = 9+10 is not correct, because we have to sum consecutive elements).
- For the query [5,10,19,21] the answer should be YES (10=1+2+3+4, 5=5, 21=6+7+8, 19 = 9+10 -- elements of the query can be given in any order).
- For the query [21,7,8,9] the answer should be NO (21=1+2+3+4+5+6, 7=7, 8=8, 9=9 is not correct, because 10 is left unused).
- For the query [21,7,8,19] the answer should be YES (21=1+2+3+4+5+6, 7=7, 8=8, 19=9+10).
- For the query [28,15,9,10] the answer should be NO (28=1+2+3+4+5+6+7, 15=7+8, 9=9, 10=10 is not correct, because 7 is used twice).
- For the query [28,8,9,10] the answer should be YES (28=1+2+3+4+5+6+7, 8=8, 9=9, 10=10 is not correct, because 7 is used twice).
Testing program in C++
#include <iostream>
#include <vector>
#include <array>
using namespace std;
// you can add extra includes or auxiliary functions/variables if needed
vector<bool> is_dim_reduction(const vector<int>& data, const vector<array<int,4>>& queries) {
// write your solution here
return {};
}
int main() {
vector<int> test_data =
{1,2,3,4,5,6,7,8,9,10};
vector<array<int,4>> test_queries = {
{15,6,15,19},
{16,5,15,19},
{5,10,19,21},
{21,7,8,9},
{21,7,8,19},
{28,15,9,10},
{28,8,9,10}
};
vector<bool> expected_result =
{true, false, true, false, true, false, true};
auto result = is_dim_reduction(test_data, test_queries);
for(auto v: result)
cout << (v?"YES\n":"NO\n");
if(result == expected_result)
cout << "Result correct!\n";
else
cout << "Result incorrect!\n";
return 0;
}
Testing program in Python
# add your own imports and auxiliary functions if needed
def is_dim_reduction(data, queries):
# implement your solution here
return []
test_data = [1,2,3,4,5,6,7,8,9,10]
test_queries = [
[15,6,15,19],
[16,5,15,19],
[5,10,19,21],
[21,7,8,9],
[21,7,8,19],
[28,15,9,10],
[28,8,9,10]
]
expected_result = [True, False, True, False, True, False, True]
my_result = is_dim_reduction(test_data, test_queries)
print(my_result)
if my_result == expected_result:
print("Result correct!")
else:
print("Result incorrect!")
Rules
- Your solution will be tested partially automatically.
Please do not change the format of the parameters, the output format, or in general, parts of the testing program other than the places which are marked.
(In case if your solution fails the automatic tests, I will look manually, but you might get penalty points for changing the format.)
- Other programming languages are allowed, provided you have an approval of the lecturer, and write a "testing program" such as ones above yourself.
- To get the full score, the time complexity of your algorithm should be at least as good as that of the intended solution.
(This time complexity is not given, but the intended solution runs in a few seconds in Python, and <1 second in C++, for N=1000000, K=10000.)
- You do not receive penalty points for bad programming style, high memory complexity, or high constant in time complexity (unless somebody really goes overboard with one of these).
- Remember to test your program! The tests provided might not be sufficient.
- Using the standard libraries is allowed.
- Using non-standard libraries, or code snippets found in Internet, is allowed, as long as the source is given (although it should not be useful in these problems).
- Discussing the algorithm and the implementation with other people is not allowed.
- Send your solution to erykk@mimuw.edu.pl before Nov 29, 2017, 11:59 am (deadline extended).
- One of the four solutions may be fixed after the deadline (but before the exam).