Yuri
06.07.2018
10:25:17
Looks like a very good dataset!
k
06.07.2018
10:26:00
Could you please clarify what the timestep definition would be in my case?
Yuri
06.07.2018
10:26:01
So, basically, you could read DeepSpeech2 for an idea of what architecture you can take
You can group them, yes, with some conv layer on the front
Yuri
06.07.2018
10:26:41
Or can pass them individually
Or you can pass them in parallel but without a conv layer, though for an LSTM it'll be a little harder to figure out all the connections in the data
k
06.07.2018
10:28:03
I already used Conv1D with LSTM, and even LSTM alone, but I'm stuck on a very basic question: what is the timestep here for me?
Yuri
06.07.2018
10:28:40
For most packages you need to flatten both dimensions into one then
But other than this, feels right
Like, (T*F, batchsize)
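A minimal NumPy sketch of that flattening (the shapes are made up for illustration, and the axis order depends on the framework's convention):

```python
import numpy as np

# Hypothetical shapes: B sequences per batch, T timesteps per group, F features.
T, F, B = 40, 19, 8
batch = np.random.randn(B, T, F)

# Flatten the time and feature axes into a single dimension, as in (T*F, batchsize):
flat = batch.reshape(B, T * F)
print(flat.shape)  # (8, 760)
```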
k
06.07.2018
10:31:06
Yuri
06.07.2018
10:34:43
Yes
Also, if you were to pass the data individually and use a custom metric that checks every 40 steps, it wouldn't have any benefit over this grouped solution, I guess.
k
06.07.2018
10:42:16
Yuri
06.07.2018
10:46:08
k
06.07.2018
10:51:11
So then what do you suggest for an inter-person policy? It seems the model cannot find a common pattern across different people.
Someone suggested splitting everyone's signal into 10-second chunks, for instance, shuffling this bucket of 10-second chunks, and then feeding the network with large batches.
But the problem here is that even in a stateless LSTM the internal state is kept within a batch, so if we have a sequence like seq1, seq2, ..., seqN which are randomly shuffled and each seq can belong to any arbitrary person, the LSTM treats seq1 as the history of seq2 and so on, which is not correct.
Arcady
06.07.2018
11:16:31
there must be a "clear" option in the lib you are using
k
06.07.2018
11:20:30
Arcady
06.07.2018
11:21:34
Consider batch = 1 sample. Then you can reset the state of the LSTM after training it on each sample.
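In Keras (the library k mentions using later in the thread), that per-sample reset could be sketched like this; the layer sizes and the dummy per-person data are assumptions, not anything from the dataset:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# A stateful LSTM with batch size 1: its state carries over between
# batches until we reset it explicitly.
model = Sequential([
    LSTM(32, stateful=True, batch_input_shape=(1, 10, 19)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# One hypothetical sample per person; clear the state between people
# so one person's history never leaks into the next.
people = [(np.random.randn(1, 10, 19), np.random.randn(1, 1)) for _ in range(3)]
for person_data, person_label in people:
    model.train_on_batch(person_data, person_label)
    model.reset_states()  # the "clear" option mentioned above
```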
k
06.07.2018
11:24:03
Arcady
06.07.2018
11:25:59
A completely different method, e.g. in the case of machine translation or text classification, is to put a "termination symbol" at the end of each text sample.
k
06.07.2018
11:27:22
Since the temporal dependency exists within each person's data, a simple shuffling won't work, so I tried walk-forward validation, etc., but it's still not robust enough.
Arcady
06.07.2018
11:28:55
k
06.07.2018
11:29:33
I'm using Keras with a TF backend
Arcady
06.07.2018
11:29:40
k
06.07.2018
11:35:38
Arcady
06.07.2018
11:36:00
apart from using another lib no :)
k
06.07.2018
11:37:56
Yuri
06.07.2018
15:41:07
There should be no problem with having multiple people's data in a batch.
Also, regarding learning stateful behaviour: you can try to subsample the data. Instead of providing every 40 timesteps, give only 10 or 5 or 1. Or instead of measuring every second, do 5x sampling of the data and average 5 labels into 1 (*corrected). This might prevent the kind of overfitting/generalization failure that comes from learning local behaviour instead of global behaviour.
Also it's important to remember that there are cases and tasks where an LSTM layer doesn't help much, because there isn't much reliable correlation in the global behaviour.
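The 5x-sampling-and-averaging idea above could be sketched in NumPy (all shapes are hypothetical):

```python
import numpy as np

# Hypothetical data: 100 timesteps of 19 features, one label per timestep.
x = np.random.randn(100, 19)
y = np.random.randn(100)

k = 5  # keep every 5th input frame, and average every 5 labels into 1
x_ds = x[::k]                         # (20, 19): 1 frame out of each 5
y_ds = y.reshape(-1, k).mean(axis=1)  # (20,): 5 labels collapsed into 1
print(x_ds.shape, y_ds.shape)  # (20, 19) (20,)
```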
Yuri
06.07.2018
15:50:54
Imagine that you would do the same as you do for EEG but for the sounds that people make:
Once per second, measure the sound they pronounce, and attempt to find a correlation between the sounds made 1 second ago, 2 seconds ago, 3 seconds ago and now. On that timeframe, the only thing that stays the same is a person's timbre, and it's different for each person: exactly what you found in your dataset!
k
06.07.2018
15:55:16
Yuri
06.07.2018
15:57:10
Well, in a few words, I rather don't understand how an LSTM could help you at all.
Let's talk in terms of the second FFT transform: at what frequencies does your learnable signal mostly live?
k
06.07.2018
15:58:57
Yuri
06.07.2018
15:59:02
Again, for a comparison with sounds, frequencies longer than 1 Hz are of no interest.
k
06.07.2018
16:00:36
Yuri
06.07.2018
16:00:46
Aha, no: you have a label for every 40 ms and data for every 1 ms.
k
06.07.2018
16:04:30
Yuri
06.07.2018
16:05:13
A CNN, I think. And maybe an FFT before the CNN.
k
06.07.2018
16:06:26
Actually, I have used some calculated features on this 40 ms time frame, like FFT, and used them instead of the raw data. I even applied Conv1D to them, and eventually I saw that mixing that with an LSTM gives me a slightly better result.
Yuri
06.07.2018
16:09:53
Also consider different preprocessing options and find the best one. First of all, you can try a fixed model like CNN+FC layers and compare different preprocessing steps in front of it.
For sounds, scientists found the best preprocessing a long time ago, and training an NN to reproduce it as part of itself isn't rational, because it makes learning much slower and much more data is needed to learn it.
k
06.07.2018
16:17:22
Yuri
06.07.2018
16:17:50
yeah, indeed
so good research would look like finding a good combination of parameters for the following:
0) choosing a baseline performance for your study (and optionally performing a baseline analysis)
1) preprocessing (no FFT; FFT: hann/hamming, frame size, window size, overlap)
2) architecture: finding the best architecture.
3) learning the possible reasons for overfitting and measuring their impact on the final quality.
4) "theoretical maximum" quality: probably a kind of analysis of the data variance across people, data noise (maybe by trying to smooth the data) and label noise (how often similar data leads to different labels).
You can take a small part of the dataset for most of these studies, so the network trains very fast (in several minutes on modern GPUs).
I'd also suggest taking initial values for all the parts from other people's work.
k
06.07.2018
16:23:08
Thanks a lot, Yuri. I'm now thinking about how I can combine the raw data and calculated features at the same time: I mean a CNN with several filters for the raw data, plus some domain-related features like heart rate, at the same time. Is it possible, and do you recommend it?
Yuri
06.07.2018
19:41:29
Yes, absolutely. You can approximate heart beats with a linear, a cosine, or an exponentially decaying function, I think.
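A two-branch Keras model along these lines might look as follows; every size here (window length, filter counts, 19 hand-crafted features) is a hypothetical placeholder, not something from the dataset:

```python
import numpy as np
from tensorflow.keras import layers, Model

# Branch 1: Conv1D over raw signal windows (e.g. 40 samples, 1 channel).
raw_in = layers.Input(shape=(40, 1))
x = layers.Conv1D(16, kernel_size=5, activation="relu")(raw_in)
x = layers.GlobalMaxPooling1D()(x)

# Branch 2: hand-crafted features (e.g. heart rate, FFT statistics).
feat_in = layers.Input(shape=(19,))

# Concatenate both representations and regress on top of the merged vector.
merged = layers.concatenate([x, feat_in])
hidden = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1)(hidden)

model = Model(inputs=[raw_in, feat_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
pred = model.predict([np.random.randn(2, 40, 1), np.random.randn(2, 19)])
print(pred.shape)  # (2, 1)
```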
Evgeniy
07.07.2018
05:29:18
k
07.07.2018
07:50:12
Yuri
07.07.2018
10:07:15
k
07.07.2018
11:18:48
Also, this is the description of them in the dataset:
"- features_ECG/*.arff
arff files containing features computed from the filtered ECG signal with a sliding centred window which size depends on the modality (arousal -> ws=4s, valence -> ws=10s; optimised on the dev partition).
Features are thus provided separately for each of those two dimensions.
The first feature vector is assigned to the center of the window, and duplicated for the previous frames - from 1 to ws/(2*sampling_period)-1, with the sampling period being equal to 40ms.
The last feature vector is also duplicated for the last frames.
19 features x 7501 frames per file"
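The duplication count in that description can be checked with the stated numbers (ws = 4 s for arousal, 10 s for valence, sampling period 40 ms):

```python
# Number of leading frames that receive the duplicated first feature vector:
# ws / (2 * sampling_period) - 1, per the dataset description.
sampling_period = 0.04  # 40 ms

ws_arousal = 4.0
n_dup_arousal = ws_arousal / (2 * sampling_period) - 1
print(n_dup_arousal)  # 49.0

ws_valence = 10.0
n_dup_valence = ws_valence / (2 * sampling_period) - 1
print(n_dup_valence)  # 124.0
```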
This is the raw data structure:
If I want to feed this data to the model, which approach do you suggest is better?
1) scaling all the data, for instance into the range [-1, 1]
2) or instead, using a BatchNormalization layer as the input layer of the model, without any data scaling?
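Option 1, min-max scaling into [-1, 1], is a few lines of NumPy; the data here is made up, and the key point is to fit the min/max on the training split only, to avoid leaking test statistics:

```python
import numpy as np

train = np.random.randn(100, 19) * 7 + 3  # hypothetical feature matrix

# Per-feature range, estimated from the TRAINING data only.
lo, hi = train.min(axis=0), train.max(axis=0)

def scale(a):
    """Map each feature linearly so that [lo, hi] becomes [-1, 1]."""
    return 2 * (a - lo) / (hi - lo) - 1

train_s = scale(train)
print(train_s.min(), train_s.max())  # -1.0 1.0
```

At inference time the same `lo`/`hi` are applied to new data, so values outside the training range can fall slightly outside [-1, 1].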
Yuri
07.07.2018
12:57:41
Andrey
07.07.2018
14:53:28
> tools to predict feature importance
https://github.com/marcotcr/lime (https://github.com/thomasp85/lime) as an example. It can be used for DL models, too (see original paper https://arxiv.org/abs/1602.04938)
k
07.07.2018
15:36:30
Roman
07.07.2018
17:02:11
Good evening!
I'm looking for someone who would be responsible for the deep learning side of a project.
Right now the team is me (I cover almost all questions, plus a bit of development) and an Android developer.
If anyone is interested, write me a DM.
k
09.07.2018
20:14:49
Hi again everybody
My new question is:
Let's say we have a psychology-related experiment to collect data, and each person is asked to give rating feedback.
As usual in almost all psychology domains, people give quite different ratings based on their understanding, mood, knowledge, etc.
If we want to use these ratings as labels for classification or regression on some dataset, we will face a wide variety of ratings for every specific rating question, and the problem is that we won't have a reliable ground truth just by taking the simple average over all raters.
We need a weighted average that gives higher weights to more correlated and similar raters.
So could you please guide me on how to define such an inter-rater reliability coefficient?
Or do you have other suggestions?
Especially when the feedbacks are not categorical but real numbers in a range [a, b].
There is Cohen's kappa statistic, but it's only for categorical data, and I don't know any others.
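One possible sketch of the weighted average described above: weight each rater by the correlation of their ratings with the mean of the other raters. This is only an illustration of the idea, not a standard coefficient; for continuous ratings, the intraclass correlation coefficient (ICC) or Krippendorff's alpha are the usual reliability measures. The data here is synthetic, with one deliberately noisy rater:

```python
import numpy as np

# ratings[i, j]: rating of item j by rater i (synthetic: raters 0 and 1
# are close to a common "truth", rater 2 is much noisier).
rng = np.random.default_rng(0)
truth = rng.uniform(0, 10, size=50)
ratings = np.stack([truth + rng.normal(0, s, size=50) for s in (0.5, 0.5, 5.0)])

# Weight each rater by their correlation with the mean of the OTHER raters,
# clipped at zero so anti-correlated raters get no weight.
n = ratings.shape[0]
weights = np.empty(n)
for i in range(n):
    others = np.delete(ratings, i, axis=0).mean(axis=0)
    weights[i] = max(np.corrcoef(ratings[i], others)[0, 1], 0.0)
weights /= weights.sum()

consensus = weights @ ratings  # weighted average per item
print(weights)  # the noisy third rater gets the smallest weight
```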
Диер
13.07.2018
14:09:20
Hello guys.
I'm fairly new to reinforcement learning. I have implemented DQN in the past and now I'm working on A3C for a custom environment. And I noticed that in DQN I used an epsilon greedy policy, so I used something like this to force exploration:
if random.random() < eps:
    return random.randint(0, num_actions - 1)  # explore with probability eps
else:
    return np.argmax(model.predict(state))  # otherwise exploit the greedy action
But in A3C I am using this instead:
policy = model.predict(state)
return np.random.choice(num_actions, p=policy)
As far as I know, this is used to make the model conservative about its actions: we are trying to encourage the model to give a much higher probability (close to 1) to good actions and reduce unpredictability.
In A3C we use a critic model to predict the value, which is basically an n-step return (expected reward over the next n steps), right?
But the question is, why do we use different approaches? Can I use an epsilon-greedy policy in A3C or vice versa? Which one is better, and when? Or is there a certain type of environment that requires one of them? And what if my environment is impossible to predict (I mean the future reward), but it is possible to develop a strategy that can beat the game? Let's say it is a game where you start from a random point and never know what obstacle will come out, but you know for sure that you have to avoid them. Do I have to predict the value then?
k
19.07.2018
19:43:09
Hi everybody
As you know, it's common to reuse a pretrained DNN model for the early layers in most image-processing projects. I'm looking for such a transfer-learning strategy for ECG or EEG signal processing to speed things up. Do you know any existing model you could refer me to?
Or do you have any comment or idea about such a decision? Do you recommend it?
Thanks.
Michael
20.07.2018
20:52:02
Guys, can you advise on speech recognition: I see different kinds of audio processing there: MFCC, filter banks, including delta + delta-delta. The input sizes end up very different: from (timesteps, 13) with MFCC, to (timesteps, 39), or even (timesteps, 161) for linear spectrograms. This is all for LibriSpeech on DeepSpeech models.
Konstantin
20.07.2018
20:52:42
Michael
20.07.2018
20:53:37
What code are you using?
Konstantin
20.07.2018
20:54:28
At the moment I'm writing my own; so far even without neural nets. I stumbled over them and realized I need to look at classical methods first.
Michael
20.07.2018
20:54:46
Stumbled how?
Yuri
21.07.2018
03:42:41
Michael
22.07.2018
03:17:23
Yuri
22.07.2018
04:01:32